TiDB fails to start

I added the mount option to every entry in /etc/fstab, but it still doesn't work.

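You don't need to add it to every entry, only to the ones that have to meet TiDB's mount requirements. Every environment is different, so you'll have to track this down yourself :joy: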

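Normally you don't use the / root directory at all. Everything is installed under a mounted file system. Flesh out your yaml config file: set both data_dir and deploy_dir, and don't use the / root directory.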

Do you mean only the disks that hold data_dir and deploy_dir need to be mounted?

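Yes. Don't you have three disks, mounted as three file systems: /data1, /data2, /data?
Put the install directory and the data directory on those mounted disks. Setting the nodelalloc and noatime options on those disks is enough. Leave the root directory alone.
Specify data_dir and deploy_dir in your yaml file, and tiup should then stop checking /.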
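
To confirm the options actually took effect after editing /etc/fstab and remounting, something like the following could help. It is only a sketch: `has_mount_opts` is a made-up helper name, and it assumes the file it reads uses the /proc/mounts column layout (device, mount point, fs type, options).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if MOUNTPOINT carries every listed option.
# Usage: has_mount_opts MOUNTPOINT MOUNTS_FILE OPT [OPT...]
# Returns 0 if all options are present, 1 if one is missing,
# 2 if the mount point is not listed in MOUNTS_FILE.
has_mount_opts() {
    local mountpoint="$1" mounts_file="$2"
    shift 2
    local opts want
    # Column 2 is the mount point, column 4 the comma-separated option list.
    opts=$(awk -v mp="$mountpoint" '$2 == mp { print $4 }' "$mounts_file" | tail -n 1)
    [ -n "$opts" ] || return 2
    for want in "$@"; do
        # Wrap in commas so "noatime" cannot match inside another option name.
        case ",$opts," in
            *",$want,"*) ;;
            *) return 1 ;;
        esac
    done
    return 0
}
```

On each host you would run, for example, `has_mount_opts /data1 /proc/mounts nodelalloc noatime && echo ok`, and repeat for /data2 and /data.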

After apply, the output is as follows. Starting the cluster still reports check bootstrapped failed.

Node          Check         Result  Message
----          -----         ------  -------
172.17.0.157  memory        Pass    memory size is 32768MB
172.17.0.157  selinux       Pass    SELinux is disabled
172.17.0.157  command       Pass    numactl: policy: default
172.17.0.157  timezone      Pass    time zone is the same as the first PD machine: Etc/UTC
172.17.0.157  cpu-cores     Pass    number of CPU cores / threads: 24
172.17.0.157  cpu-governor  Pass    CPU frequency governor is performance
172.17.0.157  service       Pass    service firewalld not found, ignore
172.17.0.157  os-version    Warn    OS is Ubuntu 18.04.6 LTS 18.04.6 (ubuntu support is not fully tested, be careful), auto fixing not supported
172.17.0.157  network       Pass    network speed of eno1 is 1000MB
172.17.0.157  network       Pass    network speed of eno2 is 1000MB
172.17.0.157  network       Pass    network speed of eno3 is 1000MB
172.17.0.157  network       Pass    network speed of eno4 is 1000MB
172.17.0.157  thp           Pass    THP is disabled
172.17.0.158  cpu-governor  Pass    CPU frequency governor is performance
172.17.0.158  network       Pass    network speed of eno1 is 1000MB
172.17.0.158  network       Pass    network speed of eno2 is 1000MB
172.17.0.158  network       Pass    network speed of eno3 is 1000MB
172.17.0.158  network       Pass    network speed of eno4 is 1000MB
172.17.0.158  selinux       Pass    SELinux is disabled
172.17.0.158  command       Pass    numactl: policy: default
172.17.0.158  timezone      Pass    time zone is the same as the first PD machine: Etc/UTC
172.17.0.158  os-version    Warn    OS is Ubuntu 18.04.6 LTS 18.04.6 (ubuntu support is not fully tested, be careful), auto fixing not supported
172.17.0.158  cpu-cores     Pass    number of CPU cores / threads: 24
172.17.0.158  memory        Pass    memory size is 32768MB
172.17.0.158  thp           Pass    THP is disabled
172.17.0.158  service       Pass    service firewalld not found, ignore
172.17.0.159  os-version    Warn    OS is Ubuntu 18.04.6 LTS 18.04.6 (ubuntu support is not fully tested, be careful), auto fixing not supported
172.17.0.159  selinux       Pass    SELinux is disabled
172.17.0.159  thp           Pass    THP is disabled
172.17.0.159  service       Pass    service firewalld not found, ignore
172.17.0.159  command       Pass    numactl: policy: default
172.17.0.159  timezone      Pass    time zone is the same as the first PD machine: Etc/UTC
172.17.0.159  cpu-cores     Pass    number of CPU cores / threads: 24
172.17.0.159  cpu-governor  Pass    CPU frequency governor is performance
172.17.0.159  memory        Pass    memory size is 32768MB
172.17.0.159  network       Pass    network speed of eno4 is 1000MB
172.17.0.159  network       Pass    network speed of eno1 is 1000MB
172.17.0.159  network       Pass    network speed of eno2 is 1000MB
172.17.0.159  network       Pass    network speed of eno3 is 1000MB
172.17.0.154  network       Pass    network speed of eno1 is 1000MB
172.17.0.154  network       Pass    network speed of eno2 is 1000MB
172.17.0.154  network       Pass    network speed of eno3 is 1000MB
172.17.0.154  network       Pass    network speed of eno4 is 1000MB
172.17.0.154  selinux       Pass    SELinux is disabled
172.17.0.154  thp           Pass    THP is disabled
172.17.0.154  command       Pass    numactl: policy: default
172.17.0.154  os-version    Warn    OS is Ubuntu 18.04.6 LTS 18.04.6 (ubuntu support is not fully tested, be careful), auto fixing not supported
172.17.0.154  cpu-cores     Pass    number of CPU cores / threads: 24
172.17.0.154  cpu-governor  Pass    CPU frequency governor is performance
172.17.0.154  memory        Pass    memory size is 32768MB
172.17.0.155  cpu-cores     Pass    number of CPU cores / threads: 24
172.17.0.155  network       Pass    network speed of eno2 is 1000MB
172.17.0.155  network       Pass    network speed of eno3 is 1000MB
172.17.0.155  network       Pass    network speed of eno4 is 1000MB
172.17.0.155  network       Pass    network speed of eno1 is 1000MB
172.17.0.155  thp           Pass    THP is disabled
172.17.0.155  selinux       Pass    SELinux is disabled
172.17.0.155  service       Pass    service firewalld not found, ignore
172.17.0.155  command       Pass    numactl: policy: default
172.17.0.155  timezone      Pass    time zone is the same as the first PD machine: Etc/UTC
172.17.0.155  os-version    Warn    OS is Ubuntu 18.04.6 LTS 18.04.6 (ubuntu support is not fully tested, be careful), auto fixing not supported
172.17.0.155  cpu-governor  Pass    CPU frequency governor is performance
172.17.0.155  memory        Pass    memory size is 32768MB
172.17.0.156  timezone      Pass    time zone is the same as the first PD machine: Etc/UTC
172.17.0.156  cpu-cores     Pass    number of CPU cores / threads: 24
172.17.0.156  selinux       Pass    SELinux is disabled
172.17.0.156  thp           Pass    THP is disabled
172.17.0.156  command       Pass    numactl: policy: default
172.17.0.156  os-version    Warn    OS is Ubuntu 18.04.6 LTS 18.04.6 (ubuntu support is not fully tested, be careful), auto fixing not supported
172.17.0.156  cpu-governor  Pass    CPU frequency governor is performance
172.17.0.156  memory        Pass    memory size is 32768MB
172.17.0.156  network       Pass    network speed of eno1 is 1000MB
172.17.0.156  network       Pass    network speed of eno2 is 1000MB
172.17.0.156  network       Pass    network speed of eno3 is 1000MB
172.17.0.156  network       Pass    network speed of eno4 is 1000MB
172.17.0.156  service       Pass    service firewalld not found, ignore
+ Try to apply changes to fix failed checks
  - Applying changes on 172.17.0.159 ... Done
  - Applying changes on 172.17.0.154 ... Done
  - Applying changes on 172.17.0.155 ... Done
  - Applying changes on 172.17.0.156 ... Done
  - Applying changes on 172.17.0.157 ... Done
  - Applying changes on 172.17.0.158 ... Done

Is it a problem with my OS? But the docs say 6.1.0 supports Ubuntu 16.04 and later.

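It shouldn't be related to the OS; your disk configuration is still the problem. Here is my configuration for reference. Adapt it to your environment.
vi /etc/fstab
UUID=3c7e03d9-1ce3-46a6-b982-3c201e127673 /tidb6.1 ext4 defaults,nodelalloc,noatime 0 2

mkdir /tidb6.1 && mount -a
[root@dbserver ~]$ mount -t ext4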
/dev/mapper/vg-root on / type ext4 (rw,relatime,data=ordered)
/dev/sda1 on /boot type ext4 (rw,relatime,data=ordered)
/dev/sdb on /tidb6.1 type ext4 (rw,noatime,nodelalloc,data=ordered)

tiup cluster template > topology.yaml

global:
  user: "tidb"
  ssh_port: 22
  deploy_dir: "/tidb6.1/tidb-deploy"
  data_dir: "/tidb6.1/tidb-data"
server_configs: {}
pd_servers:
  - host: 192.168.80.174
tidb_servers:
  - host: 192.168.80.179
tikv_servers:
  - host: 192.168.80.176
  - host: 192.168.80.177
  - host: 192.168.80.178
monitoring_servers:
  - host: 192.168.80.174
grafana_servers:
  - host: 192.168.80.174
alertmanager_servers:
  - host: 192.168.80.174

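The frame github.com/pingcap/tidb/session.getStoreBootstrapVersion in the stack means the store did not come up, in other words TiKV did not come up. The cluster starts its components in the order PD, TiKV, TiDB.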

Doesn't this show that TiKV came up?

pan@admin:~$ tiup cluster start tidb-test --init
tiup is checking updates for component cluster ...
Starting component `cluster`: /home/pan/.tiup/components/cluster/v1.11.0/tiup-cluster start tidbt --init
Starting cluster tidb-test...
+ [ Serial ] - SSHKeySet: privateKey=/home/pan/.tiup/storage/cluster/clusters/tidb-test/ssh/id_rpublicKey=/home/pan/.tiup/storage/cluster/clusters/tidb-test/ssh/id_rsa.pub
+ [Parallel] - UserSSH: user=tidb, host=172.17.0.154
+ [Parallel] - UserSSH: user=tidb, host=172.17.0.154
+ [Parallel] - UserSSH: user=tidb, host=172.17.0.156
+ [Parallel] - UserSSH: user=tidb, host=172.17.0.159
+ [Parallel] - UserSSH: user=tidb, host=172.17.0.155
+ [Parallel] - UserSSH: user=tidb, host=172.17.0.158
+ [Parallel] - UserSSH: user=tidb, host=172.17.0.158
+ [Parallel] - UserSSH: user=tidb, host=172.17.0.157
+ [Parallel] - UserSSH: user=tidb, host=172.17.0.158
+ [ Serial ] - StartCluster
Starting component pd
        Starting instance 172.17.0.154:2379
        Start instance 172.17.0.154:2379 success
Starting component tikv
        Starting instance 172.17.0.157:20160
        Starting instance 172.17.0.155:20160
        Starting instance 172.17.0.156:20160
        Start instance 172.17.0.155:20160 success
        Start instance 172.17.0.157:20160 success
        Start instance 172.17.0.156:20160 success
Starting component tidb
        Starting instance 172.17.0.154:4000

The contents of tidb.log:
tidb.log (64 KB)

[2022/11/01 17:06:13.297 +08:00] [FATAL] [session.go:3052] ["check bootstrapped failed"] [error="context deadline exceeded"] [stack="github.com/pingcap/tidb/session.getStoreBootstrapVersion\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/session/session.go:3052\ngithub.com/pingcap/tidb/session.BootstrapSession\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/session/session.go:2827\nmain.createStoreAndDomain\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/tidb-server/main.go:296\nmain.main\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/tidb-server/main.go:202\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"]
[2022/11/01 17:02:47.500 +08:00] [FATAL] [session.go:3052] ["check bootstrapped failed"] [error="[tikv:9002]TiKV server timeout"] [stack="github.com/pingcap/tidb/session.getStoreBootstrapVersion\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/session/session.go:3052\ngithub.com/pingcap/tidb/session.BootstrapSession\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/session/session.go:2827\nmain.createStoreAndDomain\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/tidb-server/main.go:296\nmain.main\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/tidb-server/main.go:202\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"]

The request to TiKV timed out over the network. Check the network between the TiDB host and the TiKV hosts.
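
A quick first check is whether the TiDB host can open a TCP connection to each PD and TiKV port at all. The sketch below assumes bash (for its /dev/tcp pseudo-device) and the coreutils `timeout` command. `port_open` and `check_endpoints` are names made up for this example, and a successful connect only proves the port is reachable, not that TiKV is healthy.

```shell
#!/usr/bin/env bash
# Return 0 if a TCP connection to HOST:PORT succeeds within 3 seconds.
port_open() {
    local host="$1" port="$2"
    # Inside the child bash, $0 is the host and $1 the port.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Print one line per host:port endpoint; return 1 if any is unreachable.
check_endpoints() {
    local ep rc=0
    for ep in "$@"; do
        if port_open "${ep%:*}" "${ep##*:}"; then
            echo "$ep reachable"
        else
            echo "$ep UNREACHABLE"
            rc=1
        fi
    done
    return "$rc"
}
```

Using the addresses from the start output above, you would run this on the TiDB host (172.17.0.154): `check_endpoints 172.17.0.154:2379 172.17.0.155:20160 172.17.0.156:20160 172.17.0.157:20160`.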

It no longer times out, but now it reports Region is unavailable:
[2022/11/02 10:55:59.137 +08:00] [FATAL] [session.go:3068] ["check bootstrapped failed"] [error="[tikv:9005]Region is unavailable"] [stack="github.com/pingcap/tidb/session.getStoreBootstrapVersion\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/session/session.go:3068\ngithub.com/pingcap/tidb/session.BootstrapSession\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/session/session.go:2843\nmain.createStoreAndDomain\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/tidb-server/main.go:296\nmain.main\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/tidb-server/main.go:202\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"]

It shows a problem with TiFlash.


Is it because my machines don't meet the 32-core requirement?

Try a restart again and see.

It still fails:
[2022/11/02 16:20:38.205 +08:00] [FATAL] [session.go:3068] ["check bootstrapped failed"] [error="[tikv:9005]Region is unavailable"] [stack="github.com/pingcap/tidb/session.getStoreBootstrapVersion\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/session/session.go:3068\ngithub.com/pingcap/tidb/session.BootstrapSession\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/session/session.go:2843\nmain.createStoreAndDomain\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/tidb-server/main.go:296\nmain.main\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/tidb-server/main.go:202\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"]

[2022/11/02 16:18:34.720 +08:00] [WARN] [backoff.go:158] ["regionMiss backoffer.maxSleep 40000ms is exceeded, errors:\nepoch_not_match:<> at 2022-11-02T16:18:33.211197558+08:00\nepoch_not_match:<> at 2022-11-02T16:18:33.714191748+08:00\nepoch_not_match:<> at 2022-11-02T16:18:34.216897365+08:00\nlongest sleep type: regionMiss, time: 40010ms"]
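
Since the error inside "check bootstrapped failed" has changed over the course of this thread (context deadline exceeded, TiKV server timeout, Region is unavailable), it may help to pull just the timestamp and the error field out of tidb.log rather than reading the full stack each time. A rough sketch, assuming the log uses plain ASCII quotes and the bracketed layout shown in the lines above; `fatal_errors` is a name invented here.

```shell
#!/usr/bin/env bash
# Print "timestamp error-message" for every FATAL line in the given log files.
# Assumes lines shaped like: [time] [FATAL] [...] [error="..."] [stack="..."]
fatal_errors() {
    sed -n 's/^\[\([^]]*\)\] \[FATAL\].*\[error="\([^"]*\)"\].*/\1 \2/p' "$@"
}
```

Run it as `fatal_errors /path/to/tidb.log`, substituting wherever your deployment writes the TiDB log. WARN lines such as the regionMiss backoff above are skipped on purpose.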

I don't quite follow. Where do things stand now?

  1. Has check bootstrapped failed gone away? I assume it still hasn't?
  2. How about just uploading the relevant logs?