basic-tidb-0 keeps failing to start: CrashLoopBackOff

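[TiDB environment] Production / Test / PoC
kind test environment
[TiDB version]
6.5
[Steps to reproduce] What operations led to the problem
Set up the cluster with kind by following the official tutorial
[Problem encountered: symptoms and impact]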
root@ubuntu:/home/cxd# kubectl get pods -n tidb-cluster
NAME READY STATUS RESTARTS AGE
basic-discovery-5fbdb874d8-9btwp 1/1 Running 5 14h
basic-monitor-0 4/4 Running 16 14h
basic-pd-0 1/1 Running 4 14h
basic-tidb-0 1/2 CrashLoopBackOff 16 12h
basic-tidb-dashboard-0 1/1 Running 4 14h
basic-tikv-0 1/1 Running 4 13h

[Resource configuration]
[Attachments: screenshots / logs / monitoring]
[terror.go:300] [“unexpected error”] [error=“path "/docker/2e800829730d53f792c9ac0b32a64ff153094e9b7df0208d2bc9a14d31f4526b" is not a descendant of mount point root "/docker/2e800829730d53f792c9ac0b32a64ff153094e9b7df0208d2bc9a14d31f4526b/kubelet" and cannot be exposed from "/sys/fs/cgroup/rdma/kubelet"”] [stack=“github.com/pingcap/tidb/parser/terror.MustNil\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/parser/terror/terror.go:300\nmain.setGlobalVars\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/tidb-server/main.go:615\nmain.main\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/tidb-server/main.go:208\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250”] [stack=“github.com/pingcap/tidb/parser/terror.MustNil\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/parser/terror/terror.go:300\nmain.setGlobalVars\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/tidb-server/main.go:615\nmain.main\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/tidb-server/main.go:208\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250”]

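This is my first time using TiDB and I'm new to cloud-native tooling, so I'd appreciate some advice from the experts here.

I've spent more than a day trying to get it installed and running. Having reached this point, it feels like only the last step is missing.

I've been stuck for a long time without finding a good solution, so any suggestions are welcome.

Or just point me in a direction and I'll investigate along those lines.

This is a kind test setup, and I'm running it inside a VM.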

Host machine

| Item | Value |
|---|---|
| Processor | 11th Gen Intel(R) Core™ i5-1135G7 @ 2.40GHz 2.42 GHz |
| Installed RAM | 16.0 GB (15.7 GB usable) |
| System type | 64-bit operating system, x64-based processor |

basic-tidb-0 1/2 CrashLoopBackOff 16 12h

I confirmed the error with describe, following https://docs.pingcap.com/zh/tidb-in-kubernetes/stable/deploy-failures

Events:
Type Reason Age From Message


Normal Scheduled 12h default-scheduler Successfully assigned tidb-cluster/basic-tidb-0 to kind-control-plane
Normal Pulled 12h kubelet Container image “alpine:3.16.0” already present on machine
Normal Created 12h kubelet Created container slowlog
Normal Started 12h kubelet Started container slowlog
Normal Pulled 12h (x2 over 12h) kubelet Container image “uhub.service.ucloud.cn/pingcap/tidb:v6.5.0” already present on machine
Normal Created 12h (x2 over 12h) kubelet Created container tidb
Normal Started 12h (x2 over 12h) kubelet Started container tidb
Warning BackOff 12h (x3 over 12h) kubelet Back-off restarting failed container
Normal SandboxChanged 12h kubelet Pod sandbox changed, it will be killed and re-created.
Normal Pulled 12h kubelet Container image “alpine:3.16.0” already present on machine
Normal Created 12h kubelet Created container slowlog
Normal Started 12h kubelet Started container slowlog
Normal Pulled 12h (x3 over 12h) kubelet Container image “uhub.service.ucloud.cn/pingcap/tidb:v6.5.0” already present on machine
Normal Created 12h (x3 over 12h) kubelet Created container tidb
Normal Started 12h (x3 over 12h) kubelet Started container tidb
Warning BackOff 12h (x6 over 12h) kubelet Back-off restarting failed container
Normal SandboxChanged 40m kubelet Pod sandbox changed, it will be killed and re-created.
Normal Pulled 40m kubelet Container image “alpine:3.16.0” already present on machine
Normal Created 40m kubelet Created container slowlog
Normal Started 40m kubelet Started container slowlog
Normal Pulled 39m (x4 over 40m) kubelet Container image “uhub.service.ucloud.cn/pingcap/tidb:v6.5.0” already present on machine
Normal Created 39m (x4 over 40m) kubelet Created container tidb
Normal Started 39m (x4 over 40m) kubelet Started container tidb
Warning BackOff 16s (x185 over 40m) kubelet Back-off restarting failed container
root@ubuntu:/home/cxd#

The above is the output of kubectl describe pod basic-tidb-0 -n tidb-cluster

Output of kubectl logs -n tidb-cluster -f basic-tidb-0 -c tidb:
start tidb-server …
/tidb-server --store=tikv --advertise-address=basic-tidb-0.basic-tidb-peer.tidb-cluster.svc --host=0.0.0.0 --path=basic-pd:2379 --config=/etc/tidb/tidb.toml
--log-slow-query=/var/log/tidb/slowlog
[2023/04/24 01:34:12.357 +00:00] [INFO] [cpuprofile.go:113] [“parallel cpu profiler started”]
[2023/04/24 01:34:12.357 +00:00] [FATAL] [terror.go:300] [“unexpected error”] [error=“path "/docker/2e800829730d53f792c9ac0b32a64ff153094e9b7df0208d2bc9a14d31f4526b" is not a descendant of mount point root "/docker/2e800829730d53f792c9ac0b32a64ff153094e9b7df0208d2bc9a14d31f4526b/kubelet" and cannot be exposed from "/sys/fs/cgroup/rdma/kubelet"”] [stack=“github.com/pingcap/tidb/parser/terror.MustNil\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/parser/terror/terror.go:300\nmain.setGlobalVars\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/tidb-server/main.go:615\nmain.main\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/tidb-server/main.go:208\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250”] [stack=“github.com/pingcap/tidb/parser/terror.MustNil\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/parser/terror/terror.go:300\nmain.setGlobalVars\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/tidb-server/main.go:615\nmain.main\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/tidb-server/main.go:208\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250”]

The VM is allocated 10 GB of memory and 8 cores.

It looks like a directory permission problem. What does your deployment configuration file look like?

Hmm... I followed the steps from the official site:
https://docs.pingcap.com/zh/tidb-in-kubernetes/stable/get-started

Is there any special cgroup configuration? Some TiDB features need to read the node's memory information first.

I haven't configured anything; it's a freshly installed VM.

#subsys_name hierarchy num_cgroups enabled
cpuset 9 64 1
cpu 6 182 1
cpuacct 6 182 1
blkio 5 182 1
memory 7 276 1
devices 4 183 1
freezer 11 65 1
net_cls 2 64 1
perf_event 3 64 1
net_prio 2 64 1
hugetlb 12 64 1
pids 8 187 1
rdma 13 6 1
misc 10 1 1
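
The listing above has the column layout of /proc/cgroups (an assumption; the thread does not say which command produced it). As a rough sketch for reading it programmatically, the Python below parses that layout into records. The names `Controller`, `parse_proc_cgroups`, and `SAMPLE` are made up for illustration, and `SAMPLE` contains only a few rows copied from the listing. As far as I know, non-zero hierarchy IDs mean the controllers are mounted as separate cgroup v1 hierarchies, which would be consistent with the `/sys/fs/cgroup/rdma/...` path in the error.

```python
from typing import Dict, NamedTuple


class Controller(NamedTuple):
    """One row of a /proc/cgroups-style listing (hypothetical helper type)."""
    name: str
    hierarchy: int
    num_cgroups: int
    enabled: bool


def parse_proc_cgroups(text: str) -> Dict[str, Controller]:
    """Parse 'subsys_name hierarchy num_cgroups enabled' rows into a dict."""
    controllers = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#subsys_name ...' header line.
        if not line or line.startswith("#"):
            continue
        name, hierarchy, num_cgroups, enabled = line.split()
        controllers[name] = Controller(
            name, int(hierarchy), int(num_cgroups), enabled == "1"
        )
    return controllers


# A few rows copied from the listing in this thread.
SAMPLE = """\
#subsys_name hierarchy num_cgroups enabled
cpu 6 182 1
cpuacct 6 182 1
memory 7 276 1
rdma 13 6 1
"""

if __name__ == "__main__":
    for c in parse_proc_cgroups(SAMPLE).values():
        print(f"{c.name}: hierarchy={c.hierarchy} enabled={c.enabled}")
```

On the real machine the same function could be fed `open("/proc/cgroups").read()` instead of `SAMPLE`.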

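Try creating a Docker container directly inside the VM and see whether it reports an error. I suspect the cause is that you are running Docker inside a VM, whereas the official documentation assumes Docker runs directly on the local machine.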

Does this prove it?
root@ubuntu:/home/cxd# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
hello-world latest feb5d9fea6a5 19 months ago 13.3kB
kindest/node 094599011731 2 years ago 1.17GB
root@ubuntu:/home/cxd# docker run hello-world

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:

  1. The Docker client contacted the Docker daemon.
  2. The Docker daemon pulled the “hello-world” image from the Docker Hub.
    (amd64)
  3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
  4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
$ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
https://hub.docker.com/

For more examples and ideas, visit:
Overview | Docker Documentation

root@ubuntu:/home/cxd#

The error comes from https://github.com/pingcap/tidb/blob/v6.5.0/tidb-server/main.go#L614-L615. The package involved is described as "Automatically set GOMAXPROCS to match Linux container CPU quota.", so my guess is that the VM environment is still the cause.
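
To make the error message easier to reason about, here is a minimal Python sketch of the kind of check that could produce it: a process's cgroup path is mapped onto a cgroup mount, which only works if the path lies under the mount's root. This is an illustration written for this thread, not the code TiDB or the Go package actually runs; the function name `translate` and its parameters are hypothetical, and the container ID is shortened to `2e80` for readability.

```python
import posixpath


def translate(cgroup_path: str, mount_root: str, mount_point: str) -> str:
    """Return the filesystem path under mount_point that exposes cgroup_path.

    mount_root is the part of the cgroup hierarchy that the mount exposes.
    If cgroup_path is not inside mount_root, it cannot be reached through
    this mount, so an error is raised (hypothetical re-creation of the check).
    """
    rel = posixpath.relpath(cgroup_path, mount_root)
    if rel == ".." or rel.startswith("../"):
        raise ValueError(
            f'path "{cgroup_path}" is not a descendant of mount point root '
            f'"{mount_root}" and cannot be exposed from "{mount_point}"'
        )
    return posixpath.normpath(posixpath.join(mount_point, rel))


if __name__ == "__main__":
    root = "/docker/2e80/kubelet"
    mount = "/sys/fs/cgroup/rdma/kubelet"
    # A path below the mount root translates cleanly.
    print(translate("/docker/2e80/kubelet/pod1", root, mount))
    # The situation in the log: the cgroup path is the parent of the root.
    try:
        translate("/docker/2e80", root, mount)
    except ValueError as exc:
        print(exc)
```

Under this reading, the log says the tidb process's rdma cgroup is `/docker/<id>`, while the rdma hierarchy visible inside the container is rooted one level deeper at `/docker/<id>/kubelet`, so the lookup fails before tidb-server finishes starting. Comparing `/proc/self/cgroup` with `/proc/self/mountinfo` inside the pod would be one way to confirm this.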