TiKV scale-out fails

[TiDB environment] Production
[TiDB version] v4.0.10
[Steps to reproduce] tiup cluster scale-out tidb-anti xxxx.yaml

cat xxxx.yaml
tikv_servers:
  - host: 10.65.66.201
  - host: 10.65.14.152

Checking the tiup debug log shows:
cat /data/tidb/.tiup/logs/tiup-cluster-debug-2024-11-19-16-04-04.log
    2024-11-19T16:04:04.560+0800 DEBUG TaskFinish {“task”: “ScaleConfig: cluster=tidb-anti, user=tidb, host=10.65.66.201, service=tikv-20160.service, deploy_dir=/data/tidb/deploy/tikv-20160, data_dir=[/data/tidb/data/tikv-20160], log_dir=/data/tidb/deploy/tikv-20160/log, cache_dir=/data/tidb/.tiup/storage/cluster/clusters/tidb-anti/config-cache”, “error”: “executor.ssh.execute_failed: Failed to execute command over SSH for ‘tidb@10.65.66.201:22’ {ssh_stderr: invalid configuration: entity not found\n, ssh_stdout: , ssh_command: export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin /data/tidb/deploy/tikv-20160/bin/tikv-server --config-check --config=/data/tidb/deploy/tikv-20160/conf/tikv.toml --pd "" --data-dir "/data/tidb/data/tikv-20160"}, cause: Process exited with status 1: check config failed”, “errorVerbose”: “check config failed\nexecutor.ssh.execute_failed: Failed to execute command over SSH for ‘tidb@10.65.66.201:22’ {ssh_stderr: invalid configuration: entity not found\n, ssh_stdout: , ssh_command: export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin /data/tidb/deploy/tikv-20160/bin/tikv-server --config-check --config=/data/tidb/deploy/tikv-20160/conf/tikv.toml --pd "" --data-dir "/data/tidb/data/tikv-20160"}, cause: Process exited with status 
1\ngithub.com/pingcap/tiup/pkg/cluster/spec.checkConfig\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/server_config.go:311\ngithub.com/pingcap/tiup/pkg/cluster/spec.(*TiKVInstance).InitConfig\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/tikv.go:302\ngithub.com/pingcap/tiup/pkg/cluster/spec.(*TiKVInstance).ScaleConfig\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/tikv.go:357\ngithub.com/pingcap/tiup/pkg/cluster/task.(*ScaleConfig).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/task/scale_config.go:50\ngithub.com/pingcap/tiup/pkg/cluster/task.(*Serial).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/task/task.go:86\ngithub.com/pingcap/tiup/pkg/cluster/task.(*StepDisplay).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/task/step.go:111\ngithub.com/pingcap/tiup/pkg/cluster/task.(*Parallel).Execute.func1\n\tgithub.com/pingcap/tiup/pkg/cluster/task/task.go:144\nruntime.goexit\n\truntime/asm_amd64.s:1594”}
    2024-11-19T16:04:04.560+0800 DEBUG TaskFinish {“task”: “UserSSH: user=tidb, host=10.65.66.201\nScaleConfig: cluster=tidb-anti, user=tidb, host=10.65.66.201, service=tikv-20160.service, deploy_dir=/data/tidb/deploy/tikv-20160, data_dir=[/data/tidb/data/tikv-20160], log_dir=/data/tidb/deploy/tikv-20160/log, cache_dir=/data/tidb/.tiup/storage/cluster/clusters/tidb-anti/config-cache”, “error”: “executor.ssh.execute_failed: Failed to execute command over SSH for ‘tidb@10.65.66.201:22’ {ssh_stderr: invalid configuration: entity not found\n, ssh_stdout: , ssh_command: export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin /data/tidb/deploy/tikv-20160/bin/tikv-server --config-check --config=/data/tidb/deploy/tikv-20160/conf/tikv.toml --pd "" --data-dir "/data/tidb/data/tikv-20160"}, cause: Process exited with status 1: check config failed”, “errorVerbose”: “check config failed\nexecutor.ssh.execute_failed: Failed to execute command over SSH for ‘tidb@10.65.66.201:22’ {ssh_stderr: invalid configuration: entity not found\n, ssh_stdout: , ssh_command: export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin /data/tidb/deploy/tikv-20160/bin/tikv-server --config-check --config=/data/tidb/deploy/tikv-20160/conf/tikv.toml --pd "" --data-dir "/data/tidb/data/tikv-20160"}, cause: Process exited with status 
1\ngithub.com/pingcap/tiup/pkg/cluster/spec.checkConfig\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/server_config.go:311\ngithub.com/pingcap/tiup/pkg/cluster/spec.(*TiKVInstance).InitConfig\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/tikv.go:302\ngithub.com/pingcap/tiup/pkg/cluster/spec.(*TiKVInstance).ScaleConfig\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/tikv.go:357\ngithub.com/pingcap/tiup/pkg/cluster/task.(*ScaleConfig).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/task/scale_config.go:50\ngithub.com/pingcap/tiup/pkg/cluster/task.(*Serial).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/task/task.go:86\ngithub.com/pingcap/tiup/pkg/cluster/task.(*StepDisplay).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/task/step.go:111\ngithub.com/pingcap/tiup/pkg/cluster/task.(*Parallel).Execute.func1\n\tgithub.com/pingcap/tiup/pkg/cluster/task/task.go:144\nruntime.goexit\n\truntime/asm_amd64.s:1594”}
    2024-11-19T16:04:04.560+0800 DEBUG TaskFinish {“task”: “UserSSH: user=tidb, host=10.65.66.201\nScaleConfig: cluster=tidb-anti, user=tidb, host=10.65.66.201, service=tikv-20160.service, deploy_dir=/data/tidb/deploy/tikv-20160, data_dir=[/data/tidb/data/tikv-20160], log_dir=/data/tidb/deploy/tikv-20160/log, cache_dir=/data/tidb/.tiup/storage/cluster/clusters/tidb-anti/config-cache\nUserSSH: user=tidb, host=10.65.14.152\nScaleConfig: cluster=tidb-anti, user=tidb, host=10.65.14.152, service=tikv-20160.service, deploy_dir=/data/tidb/deploy/tikv-20160, data_dir=[/data/tidb/data/tikv-20160], log_dir=/data/tidb/deploy/tikv-20160/log, cache_dir=/data/tidb/.tiup/storage/cluster/clusters/tidb-anti/config-cache”, “error”: “executor.ssh.execute_failed: Failed to execute command over SSH for ‘tidb@10.65.14.152:22’ {ssh_stderr: invalid configuration: entity not found\n, ssh_stdout: , ssh_command: export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin /data/tidb/deploy/tikv-20160/bin/tikv-server --config-check --config=/data/tidb/deploy/tikv-20160/conf/tikv.toml --pd "" --data-dir "/data/tidb/data/tikv-20160"}, cause: Process exited with status 1: check config failed”, “errorVerbose”: “check config failed\nexecutor.ssh.execute_failed: Failed to execute command over SSH for ‘tidb@10.65.14.152:22’ {ssh_stderr: invalid configuration: entity not found\n, ssh_stdout: , ssh_command: export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin /data/tidb/deploy/tikv-20160/bin/tikv-server --config-check --config=/data/tidb/deploy/tikv-20160/conf/tikv.toml --pd "" --data-dir "/data/tidb/data/tikv-20160"}, cause: Process exited with status 
1\ngithub.com/pingcap/tiup/pkg/cluster/spec.checkConfig\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/server_config.go:311\ngithub.com/pingcap/tiup/pkg/cluster/spec.(*TiKVInstance).InitConfig\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/tikv.go:302\ngithub.com/pingcap/tiup/pkg/cluster/spec.(*TiKVInstance).ScaleConfig\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/tikv.go:357\ngithub.com/pingcap/tiup/pkg/cluster/task.(*ScaleConfig).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/task/scale_config.go:50\ngithub.com/pingcap/tiup/pkg/cluster/task.(*Serial).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/task/task.go:86\ngithub.com/pingcap/tiup/pkg/cluster/task.(*StepDisplay).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/task/step.go:111\ngithub.com/pingcap/tiup/pkg/cluster/task.(*Parallel).Execute.func1\n\tgithub.com/pingcap/tiup/pkg/cluster/task/task.go:144\nruntime.goexit\n\truntime/asm_amd64.s:1594”}
    2024-11-19T16:04:04.560+0800 INFO Execute command finished {“code”: 1, “error”: “executor.ssh.execute_failed: Failed to execute command over SSH for ‘tidb@10.65.14.152:22’ {ssh_stderr: invalid configuration: entity not found\n, ssh_stdout: , ssh_command: export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin /data/tidb/deploy/tikv-20160/bin/tikv-server --config-check --config=/data/tidb/deploy/tikv-20160/conf/tikv.toml --pd "" --data-dir "/data/tidb/data/tikv-20160"}, cause: Process exited with status 1: check config failed”, “errorVerbose”: “check config failed\nexecutor.ssh.execute_failed: Failed to execute command over SSH for ‘tidb@10.65.14.152:22’ {ssh_stderr: invalid configuration: entity not found\n, ssh_stdout: , ssh_command: export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin /data/tidb/deploy/tikv-20160/bin/tikv-server --config-check --config=/data/tidb/deploy/tikv-20160/conf/tikv.toml --pd "" --data-dir "/data/tidb/data/tikv-20160"}, cause: Process exited with status 1\ngithub.com/pingcap/tiup/pkg/cluster/spec.checkConfig\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/server_config.go:311\ngithub.com/pingcap/tiup/pkg/cluster/spec.(*TiKVInstance).InitConfig\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/tikv.go:302\ngithub.com/pingcap/tiup/pkg/cluster/spec.(*TiKVInstance).ScaleConfig\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/tikv.go:357\ngithub.com/pingcap/tiup/pkg/cluster/task.(*ScaleConfig).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/task/scale_config.go:50\ngithub.com/pingcap/tiup/pkg/cluster/task.(*Serial).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/task/task.go:86\ngithub.com/pingcap/tiup/pkg/cluster/task.(*StepDisplay).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/task/step.go:111\ngithub.com/pingcap/tiup/pkg/cluster/task.(*Parallel).Execute.func1\n\tgithub.com/pingcap/tiup/pkg/cluster/task/task.go:144\nruntime.goexit\n\truntime/asm_amd64.s:1594”}

I inherited this cluster. I suspect someone upgraded it at some point and left the configuration in an inconsistent state.

Try setting up passwordless SSH trust again.

Alternatively, create the tidb user on the new hosts and add -u tidb -p to the scale-out command. That should work too.

I can SSH to the target machines as the tidb user, so I don't think SSH trust is the cause.

Does SSH prompt for a password when you connect? Passwordless trust is required.

No, it doesn't ask for a password.
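For anyone who wants to rule out SSH trust quickly, here is a minimal sketch of a non-interactive check. It assumes the tidb user and the two hosts from the topology file above; the function name and the SSH_BIN override are hypothetical, added only so the check can be dry-run against a stub.

```shell
# Check that key-based (passwordless) SSH works for a user on each host.
# SSH_BIN is a hypothetical override for dry runs; it defaults to the real ssh.
check_passwordless_ssh() {
    user="$1"; shift
    rc=0
    for host in "$@"; do
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        if "${SSH_BIN:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 \
               "$user@$host" true </dev/null >/dev/null 2>&1; then
            echo "OK   $user@$host"
        else
            echo "FAIL $user@$host"
            rc=1
        fi
    done
    return "$rc"
}
```

Run it from the tiup control machine as check_passwordless_ssh tidb 10.65.66.201 10.65.14.152. If every host reports OK, SSH trust is probably not the problem, which matches what happened in this thread.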

ssh_stderr: invalid configuration: entity not found\n, ssh_stdout: , ssh_command: export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin /data/tidb/deploy/tikv-20160/bin/tikv-server --config-check --config=/data/tidb/deploy/tikv-20160/conf/tikv.toml --pd "" --data-dir "/data/tidb/data/tikv-20160"

The TiKV configuration file probably contains some invalid parameters. Check the TiKV config file.

Yes. I don't know where the stale configuration came from. I strongly suspect an earlier upgrade scrambled the config file.

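I checked tikv.toml on the instances being added and found non-standard settings. After I manually created the directory and fixed its permissions, the tiup scale-out completed normally.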
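Since the fix was to create a missing directory, the "entity not found" message most likely came from a path in tikv.toml that did not exist on the new hosts (that text is the standard Rust wording for a not-found I/O error). A rough sketch for spotting such paths follows. It is a plain text scan, not a TOML parser, and the function name is hypothetical. It only catches values written as a quoted absolute path.

```shell
# List absolute paths that appear as quoted values in a TOML file
# but do not exist on this machine. Rough text scan, not a TOML parser.
list_missing_paths() {
    conf="$1"
    # Drop comment lines, then keep lines of the form: key = "/some/path"
    grep -v '^[[:space:]]*#' "$conf" \
        | sed -n 's|^[[:space:]]*\([A-Za-z0-9_.-]*\)[[:space:]]*=[[:space:]]*"\(/[^"]*\)".*$|\1 \2|p' \
        | while read -r key path; do
              [ -e "$path" ] || echo "missing: $key = $path"
          done
}
```

Run it on the target host, for example list_missing_paths /data/tidb/deploy/tikv-20160/conf/tikv.toml. Any line it prints is a candidate directory to create and chown to the tidb user, or a stale setting to remove from the cluster config.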

The error says the configuration is invalid ("invalid configuration"). You can run /data/tidb/deploy/tikv-20160/bin/tikv-server --config-check --config=/data/tidb/deploy/tikv-20160/conf/tikv.toml to check whether the config file contains errors.
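The command above can be wrapped in a small helper so it is easy to repeat on each new host. This is only a sketch: the function name is hypothetical, and it assumes the deploy directory layout seen in the log (bin/tikv-server and conf/tikv.toml under the deploy dir). It uses the same --config-check and --config flags as the command above.

```shell
# Run TiKV's config check for one deploy directory and report the result.
# Assumes <deploy_dir>/bin/tikv-server and <deploy_dir>/conf/tikv.toml exist.
check_tikv_config() {
    deploy_dir="$1"
    bin="$deploy_dir/bin/tikv-server"
    conf="$deploy_dir/conf/tikv.toml"
    [ -x "$bin" ]  || { echo "no executable at $bin" >&2; return 2; }
    [ -r "$conf" ] || { echo "cannot read $conf" >&2; return 2; }
    # Capture stdout and stderr together; the checker reports errors on stderr.
    if msg=$("$bin" --config-check --config="$conf" 2>&1); then
        echo "config OK: $conf"
    else
        rc=$?
        echo "config check failed (exit $rc): $msg"
        return 1
    fi
}
```

Run it as the tidb user on the target host, for example check_tikv_config /data/tidb/deploy/tikv-20160. Running it directly on the host shows the same message that tiup only surfaces inside its debug log.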

It was a configuration file problem.