tiup cluster v5.0.1 fails to start; TiKV logs a compile-related error

To help resolve this faster, please provide the following information; a clearly described problem gets answered sooner:

【Test environment】
2 cores, 6 GB RAM; disk usage as follows:

[root@train ~]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 2.9G 0 2.9G 0% /dev
tmpfs 2.9G 36M 2.8G 2% /dev/shm
tmpfs 2.9G 155M 2.7G 6% /run
tmpfs 2.9G 0 2.9G 0% /sys/fs/cgroup
/dev/sda2 19G 15G 4.1G 79% /
tmpfs 581M 60K 581M 1% /run/user/0
/dev/sr0 478K 478K 0 100% /run/media/root/config-2
tmpfs 581M 36K 581M 1% /run/user/1000

【TiDB version】
v5.0.1

【Problem description】
Deployment completed successfully: tiup cluster deploy cluster-test v5.0.1 ./topo.yaml --user root -p
Startup fails: tiup cluster start cluster-test
Error log: /tidb-deploy/tikv-20160/log/tikv.log

[2021/05/13 19:07:56.096 +08:00] [ERROR] [server.rs:854] ["failed to init io snooper"] [err_code=KV:Unknown] [err="\"IO snooper is not started due to not compiling with BCC\""]
[2021/05/13 19:07:58.765 +08:00] [FATAL] [lib.rs:465] ["called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: \"No such file or directory\" }"] [backtrace="stack backtrace:
   0: tikv_util::set_panic_hook::{{closure}}
      at /home/jenkins/agent/workspace/build_tikv_multi_branch_v5.0.1/tikv/components/tikv_util/src/lib.rs:464
   1: std::panicking::rust_panic_with_hook
      at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35//library/std/src/panicking.rs:595
   2: std::panicking::begin_panic_handler::{{closure}}
      at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35//library/std/src/panicking.rs:497
   3: std::sys_common::backtrace::__rust_end_short_backtrace
      at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35//library/std/src/sys_common/backtrace.rs:141
   4: rust_begin_unwind
      at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35//library/std/src/panicking.rs:493
   5: core::panicking::panic_fmt
      at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35//library/core/src/panicking.rs:92
   6: core::option::expect_none_failed
      at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35//library/core/src/option.rs:1266
   7: core::result::Result<T,E>::unwrap
      at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/core/src/result.rs:969
      cmd::server::TiKVServer::init_fs
      at /home/jenkins/agent/workspace/build_tikv_multi_branch_v5.0.1/tikv/cmd/src/server.rs:373
      cmd::server::run_tikv
      at /home/jenkins/agent/workspace/build_tikv_multi_branch_v5.0.1/tikv/cmd/src/server.rs:133
   8: tikv_server::main
      at /home/jenkins/agent/workspace/build_tikv_multi_branch_v5.0.1/tikv/cmd/src/bin/tikv-server.rs:181
   9: core::ops::function::FnOnce::call_once
      at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/core/src/ops/function.rs:227
      std::sys_common::backtrace::__rust_begin_short_backtrace
      at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/std/src/sys_common/backtrace.rs:125
  10: main
  11: __libc_start_main
  12:
"] [location=cmd/src/server.rs:385] [thread_name=main]
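The panic is raised from `cmd::server::TiKVServer::init_fs` (server.rs:373), the step where TiKV pre-allocates the `reserve-space` placeholder file on the data disk. With only ~4.1 GB free on `/`, the 5 GB default reservation cannot be satisfied. A rough pre-flight check along these lines can catch this before starting the cluster (a sketch; the mount point is assumed from the `df -h` output above, adjust for your layout):

```shell
#!/bin/sh
# Compare free space on the TiKV data mount against the 5 GB that
# storage.reserve-space reserves by default.
avail_kb=$(df --output=avail -k / | tail -n 1)
reserve_kb=$((5 * 1024 * 1024))   # 5 GB default reserve-space, in KB

if [ "$avail_kb" -lt "$reserve_kb" ]; then
  echo "only ${avail_kb} KB free, less than the ${reserve_kb} KB TiKV reserves"
fi
```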


If your question concerns performance tuning or troubleshooting, please download and run the diagnostic script, then select, copy, and paste the full terminal output into your post.

Starting the deployed cluster fails on the tikv component:

[root@train ~]# tiup cluster start cluster-test
Starting component cluster: /root/.tiup/components/cluster/v1.4.3/tiup-cluster start cluster-test
Starting cluster cluster-test...

  • [ Serial ] - SSHKeySet: privateKey=/root/.tiup/storage/cluster/clusters/cluster-test/ssh/id_rsa, publicKey=/root/.tiup/storage/cluster/clusters/cluster-test/ssh/id_rsa.pub
  • [Parallel] - UserSSH: user=tidb, host=127.0.0.1
  • [Parallel] - UserSSH: user=tidb, host=127.0.0.1
  • [Parallel] - UserSSH: user=tidb, host=127.0.0.1
  • [Parallel] - UserSSH: user=tidb, host=127.0.0.1
  • [Parallel] - UserSSH: user=tidb, host=127.0.0.1
  • [Parallel] - UserSSH: user=tidb, host=127.0.0.1
  • [Parallel] - UserSSH: user=tidb, host=127.0.0.1
  • [Parallel] - UserSSH: user=tidb, host=127.0.0.1
  • [ Serial ] - StartCluster
    Starting component pd
    Starting instance pd 127.0.0.1:2379
    Start pd 127.0.0.1:2379 success
    Starting component node_exporter
    Starting instance 127.0.0.1
    Start 127.0.0.1 success
    Starting component blackbox_exporter
    Starting instance 127.0.0.1
    Start 127.0.0.1 success
    Starting component tikv
    Starting instance tikv 127.0.0.1:20162
    Starting instance tikv 127.0.0.1:20160
    Starting instance tikv 127.0.0.1:20161

Error: failed to start tikv: failed to start: tikv 127.0.0.1:20160, please check the instance's log(/tidb-deploy/tikv-20160/log) for more detail.: timed out waiting for port 20160 to be started after 2m0s

Verbose debug logs has been written to /root/.tiup/logs/tiup-cluster-debug-2021-05-13-19-18-24.log.
Error: run /root/.tiup/components/cluster/v1.4.3/tiup-cluster (wd:/root/.tiup/data/SXHyMqt) failed: exit status 1

Component status of the deployed cluster:

[root@train ~]# tiup cluster display cluster-test
Starting component cluster: /root/.tiup/components/cluster/v1.4.3/tiup-cluster display cluster-test
Cluster type: tidb
Cluster name: cluster-test
Cluster version: v5.0.1
SSH type: builtin
Dashboard URL: http://127.0.0.1:2379/dashboard
ID               Role        Host       Ports                            OS/Arch       Status    Data Dir                    Deploy Dir
--               ----        ----       -----                            -------       ------    --------                    ----------
127.0.0.1:3000   grafana     127.0.0.1  3000                             linux/x86_64  inactive  -                           /tidb-deploy/grafana-3000
127.0.0.1:2379   pd          127.0.0.1  2379/2380                        linux/x86_64  Up|L|UI   /tidb-data/pd-2379          /tidb-deploy/pd-2379
127.0.0.1:9090   prometheus  127.0.0.1  9090                             linux/x86_64  inactive  /tidb-data/prometheus-9090  /tidb-deploy/prometheus-9090
127.0.0.1:4000   tidb        127.0.0.1  4000/10080                       linux/x86_64  Down      -                           /tidb-deploy/tidb-4000
127.0.0.1:9000   tiflash     127.0.0.1  9000/8123/3930/20170/20292/8234  linux/x86_64  Down      /tidb-data/tiflash-9000     /tidb-deploy/tiflash-9000
127.0.0.1:20160  tikv        127.0.0.1  20160/20180                      linux/x86_64  Down      /tidb-data/tikv-20160       /tidb-deploy/tikv-20160
127.0.0.1:20161  tikv        127.0.0.1  20161/20181                      linux/x86_64  Down      /tidb-data/tikv-20161       /tidb-deploy/tikv-20161
127.0.0.1:20162  tikv        127.0.0.1  20162/20182                      linux/x86_64  Down      /tidb-data/tikv-20162       /tidb-deploy/tikv-20162
Total nodes: 8

reserve-space occupies 5 GB by default, which is why startup fails. If this is just for testing, you can change this parameter to 0.
https://docs.pingcap.com/zh/tidb/stable/tikv-configuration-file#reserve-space

[storage]
reserve-space = "200MB"
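For a tiup-managed cluster, the same option can also be set in the topology file so it applies to every TiKV instance at deploy time (a sketch; the `server_configs` form below follows the standard tiup topology layout and has not been taken from this thread):

```yaml
# Fragment of topo.yaml: lower TiKV's disk-space reservation cluster-wide.
server_configs:
  tikv:
    storage.reserve-space: 200MB
```

After editing an already-deployed cluster's config (e.g. via `tiup cluster edit-config cluster-test`), the change takes effect on the next reload/restart of the affected instances.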

After changing it to 200MB, TiKV starts normally, but TiFlash fails to start; the TiFlash proxy (tiflash-tikv) logs a similar error:

Starting component tikv
Starting instance tikv 127.0.0.1:20162
Starting instance tikv 127.0.0.1:20160
Starting instance tikv 127.0.0.1:20161
Start tikv 127.0.0.1:20160 success
Start tikv 127.0.0.1:20162 success
Start tikv 127.0.0.1:20161 success
Starting component tidb
Starting instance tidb 127.0.0.1:4000
Start tidb 127.0.0.1:4000 success
Starting component tiflash
Starting instance tiflash 127.0.0.1:9000

Error: failed to start tiflash: failed to start: tiflash 127.0.0.1:9000, please check the instance's log(/tidb-deploy/tiflash-9000/log) for more detail.: timed out waiting for port 9000 to be started after 2m0s

Verbose debug logs has been written to /root/.tiup/logs/tiup-cluster-debug-2021-05-14-09-23-32.log.
Error: run /root/.tiup/components/cluster/v1.4.3/tiup-cluster (wd:/root/.tiup/data/SXLOxrH) failed: exit status 1

Error log: /tidb-deploy/tiflash-9000/log/tiflash_tikv.log

[2021/05/14 09:24:42.417 +08:00] [ERROR] [server.rs:795] ["failed to init io snooper"] [err_code=KV:Unknown] [err="\"IO snooper is not started due to not compiling with BCC\""]

Change it to 0; a test environment doesn't need the reserved space. If it still fails after that, you may need to add some disk capacity.
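Since the TiFlash proxy is a TiKV instance, it honors the same setting. A minimal topology sketch for disabling the reservation on both TiKV and the TiFlash proxy (the `tiflash-learner` key is an assumption based on tiup's topology conventions; verify it against your tiup version's topology reference):

```yaml
# Fragment of topo.yaml: disable disk-space reservation for testing only.
server_configs:
  tikv:
    storage.reserve-space: 0MB
  tiflash-learner:
    storage.reserve-space: 0MB
```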

It works now.

:+1:

This topic was automatically closed 1 minute after the last reply. New replies are no longer allowed.