tiup cluster list returns an empty result

tiup cluster list returns an empty result.
How can I find the name of the TiDB cluster that is currently running?

Only clusters that were deployed with tiup cluster deploy appear in tiup cluster list. Did you deploy this one with tiup cluster deploy?

I installed it with the following command (installing the TiDB components on the control machine):

4.1) tiup cluster deploy mytidb v4.0.8 mini.yaml --user root -p **8

-p is the root password.
tiup cluster list finds nothing, and tiup cluster stop mytidb also reports an error.

Could you upload the deployment log? First find the corresponding deploy command:

JoshuadeMacBook-Pro:~ joshua$ tiup cluster audit|head
Starting component `cluster`: /Users/joshua/.tiup/components/cluster/v1.2.5/tiup-cluster audit
ID           Time                       Command
--           ----                       -------
fvJhyJsdH4T  2020-12-10T10:27:31+08:00  /Users/joshua/.tiup/components/cluster/v1.2.5/tiup-cluster audit
fvJhs7Dw4Mt  2020-12-10T10:25:53+08:00  /Users/joshua/.tiup/components/cluster/v1.2.5/tiup-cluster deploy test v4.0.8 /Users/joshua/test.yaml -p

Then run tiup cluster audit fvJhs7Dw4Mt to view the log (fvJhs7Dw4Mt is the ID of the deploy command).
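
If the audit history is long, filtering it for deploy entries can save some scrolling. A minimal sketch, assuming the column layout shown in the audit output above (ID, time, binary path, subcommand) and a binary path without spaces; deploy_ids is a hypothetical helper name, not part of TiUP:

```shell
# Hypothetical helper: print the audit ID and time of every recorded deploy command.
# Reads `tiup cluster audit` output on stdin. Column 3 is the tiup-cluster binary,
# column 4 is the subcommand (assumption based on the sample output in this thread).
deploy_ids() {
  awk '$4 == "deploy" { print $1, $2 }'
}

# Usage (assumes tiup is installed on this machine):
#   tiup cluster audit | deploy_ids
```

Each ID it prints can then be passed to tiup cluster audit to see the full log of that deploy.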

tiup cluster audit fvxY1fHvVy7


  • OPERATION TIME: 2020-12-06T00:22:49 -

/root/.tiup/components/cluster/v1.2.5/tiup-cluster deploy mytidb v4.0.0 mini.yaml --user root -p
2020-12-06T00:22:42.099+0800 INFO Execute command {“command”: “tiup cluster deploy mytidb v4.0.0 mini.yaml --user root -p”}
2020-12-06T00:22:42.105+0800 INFO Please confirm your topology:
2020-12-06T00:22:42.105+0800 WARN Attention:
2020-12-06T00:22:42.105+0800 WARN 1. If the topology is not what you expected, check your yaml file.
2020-12-06T00:22:42.105+0800 WARN 2. Please confirm there is no port/directory conflicts in same host.
2020-12-06T00:22:48.917+0800 ERROR SSHCommand {“host”: “172.19.120.84”, “port”: “22”, “cmd”: “export LANG=C; PATH=$PATH:/usr/bin:/usr/sbin sudo -H bash -c "id -u tidb > /dev/null 2>&1 || (/usr/sbin/groupadd -f tidb && /usr/sbin/useradd -m -s /bin/bash -g tidb tidb) && echo ‘tidb ALL=(ALL) NOPASSWD:ALL’ > /etc/sudoers.d/tidb"”, “error”: “ssh: handshake failed: ssh: unable to authenticate, attempted methods [none], no supported methods remain”, “stdout”: “”, “stderr”: “”}
2020-12-06T00:22:48.918+0800 ERROR SSHCommand {“host”: “172.19.120.83”, “port”: “22”, “cmd”: “export LANG=C; PATH=$PATH:/usr/bin:/usr/sbin sudo -H bash -c "id -u tidb > /dev/null 2>&1 || (/usr/sbin/groupadd -f tidb && /usr/sbin/useradd -m -s /bin/bash -g tidb tidb) && echo ‘tidb ALL=(ALL) NOPASSWD:ALL’ > /etc/sudoers.d/tidb"”, “error”: “ssh: handshake failed: ssh: unable to authenticate, attempted methods [none], no supported methods remain”, “stdout”: “”, “stderr”: “”}
2020-12-06T00:22:48.918+0800 ERROR SSHCommand {“host”: “172.19.120.85”, “port”: “22”, “cmd”: “export LANG=C; PATH=$PATH:/usr/bin:/usr/sbin sudo -H bash -c "id -u tidb > /dev/null 2>&1 || (/usr/sbin/groupadd -f tidb && /usr/sbin/useradd -m -s /bin/bash -g tidb tidb) && echo ‘tidb ALL=(ALL) NOPASSWD:ALL’ > /etc/sudoers.d/tidb"”, “error”: “ssh: handshake failed: ssh: unable to authenticate, attempted methods [none], no supported methods remain”, “stdout”: “”, “stderr”: “”}
2020-12-06T00:22:48.918+0800 INFO Execute command finished {“code”: 1, “error”: “task.env_init.failed: Failed to initialize TiDB environment on remote host ‘172.19.120.84’, cause: module.user.user_add_failed: Failed to create new system user ‘tidb’ on remote host, cause: executor.ssh.execute_failed: Failed to execute command over SSH for ‘root@172.19.120.84:22’ {ssh_stderr: , ssh_stdout: , ssh_command: export LANG=C; PATH=$PATH:/usr/bin:/usr/sbin sudo -H bash -c "id -u tidb > /dev/null 2>&1 || (/usr/sbin/groupadd -f tidb && /usr/sbin/useradd -m -s /bin/bash -g tidb tidb) && echo ‘tidb ALL=(ALL) NOPASSWD:ALL’ > /etc/sudoers.d/tidb"}, cause: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none], no supported methods remain”, “errorVerbose”: “task.env_init.failed: Failed to initialize TiDB environment on remote host ‘172.19.120.84’, cause: module.user.user_add_failed: Failed to create new system user ‘tidb’ on remote host, cause: executor.ssh.execute_failed: Failed to execute command over SSH for ‘root@172.19.120.84:22’ {ssh_stderr: , ssh_stdout: , ssh_command: export LANG=C; PATH=$PATH:/usr/bin:/usr/sbin sudo -H bash -c "id -u tidb > /dev/null 2>&1 || (/usr/sbin/groupadd -f tidb && /usr/sbin/useradd -m -s /bin/bash -g tidb tidb) && echo ‘tidb ALL=(ALL) NOPASSWD:ALL’ > /etc/sudoers.d/tidb"}, cause: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none], no supported methods remain
at github.com/pingcap/tiup/pkg/cluster/executor.(*EasySSHExecutor).Execute()
\tgithub.com/pingcap/tiup@/pkg/cluster/executor/ssh.go:153
at github.com/pingcap/tiup/pkg/cluster/module.(*UserModule).Execute()
\tgithub.com/pingcap/tiup@/pkg/cluster/module/user.go:126
at github.com/pingcap/tiup/pkg/cluster/task.(*EnvInit).execute()
\tgithub.com/pingcap/tiup@/pkg/cluster/task/env_init.go:67
at github.com/pingcap/tiup/pkg/cluster/task.(*EnvInit).Execute()
\tgithub.com/pingcap/tiup@/pkg/cluster/task/env_init.go:46
at github.com/pingcap/tiup/pkg/cluster/task.(*Serial).Execute()
\tgithub.com/pingcap/tiup@/pkg/cluster/task/task.go:191
at github.com/pingcap/tiup/pkg/cluster/task.(*StepDisplay).Execute()
\tgithub.com/pingcap/tiup@/pkg/cluster/task/step.go:85
at github.com/pingcap/tiup/pkg/cluster/task.(*Parallel).Execute.func1()
\tgithub.com/pingcap/tiup@/pkg/cluster/task/task.go:236
at runtime.goexit()
\truntime/asm_amd64.s:1357”}

There were errors, but the cluster did get deployed. It is running and accepts connections normally.

Connecting to the database from Navicat also works.

This cluster does not look like it was deployed by TiUP. Could you show the contents of /etc/systemd/system/tikv-20160.service?

[Unit]
Description=tikv service
After=syslog.target network.target remote-fs.target nss-lookup.target

[Service]
LimitNOFILE=1000000
LimitSTACK=10485760

User=tidb
ExecStart=/tidb-deploy/tikv-20160/scripts/run_tikv.sh
Restart=always

RestartSec=15s

[Install]
WantedBy=multi-user.target

On the TiKV machine, could you head the log to check the startup information?
head /tidb-deploy/tikv-20160/log/tikv.log

[root@iZuf6ikybnkbb4w1xvsmfxZ soft]# head /tidb-deploy/tikv-20160/log/tikv.log
[2020/12/10 10:59:11.477 +08:00] [INFO] [gc_manager.rs:456] ["gc_worker: finished auto gc"] [processed_regions=348]
[2020/12/10 11:06:04.248 +08:00] [INFO] [util.rs:419] ["connecting to PD endpoint"] [endpoints=http://172.19.120.85:2379]
[2020/12/10 11:06:04.248 +08:00] [INFO] [] ["New connected subchannel at 0x7fe029269480 for subchannel 0x7fe0292e3680"]
[2020/12/10 11:06:04.250 +08:00] [INFO] [util.rs:419] ["connecting to PD endpoint"] [endpoints=http://172.19.120.84:2379]
[2020/12/10 11:06:04.251 +08:00] [INFO] [util.rs:484] ["connected to PD leader"] [endpoints=http://172.19.120.84:2379]
[2020/12/10 11:06:04.251 +08:00] [INFO] [util.rs:190] ["heartbeat sender and receiver are stale, refreshing ..."]
[2020/12/10 11:06:04.252 +08:00] [WARN] [util.rs:209] ["updating PD client done"] [spend=4.052198ms]
[2020/12/10 11:06:04.252 +08:00] [INFO] [client.rs:433] ["cancel region heartbeat sender"]
[2020/12/10 11:09:11.502 +08:00] [INFO] [gc_manager.rs:416] ["gc_worker: start auto gc"] [safe_point=421414574195474432]
[2020/12/10 11:09:11.960 +08:00] [INFO] [gc_manager.rs:456] ["gc_worker: finished auto gc"] [processed_regions=348]
[root@iZuf6ikybnkbb4w1xvsmfxZ soft]# dir /tidb-deploy/
monitor-9100 pd-2379 tidb-4000 tikv-20160

This does not look like the beginning of the file. Try grep -r "Welcome to TiKV" /tidb-deploy/tikv-20160/log/ and see what it finds.

According to the audit log, tiup cluster start mytidb was never executed, so it is almost certain that this cluster was not brought up by tiup. Is something else controlling this cluster?
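
To compare TiKV startup times against the audit history, the startup timestamps can be pulled out of the logs in one pass. A rough sketch, assuming the startup banner contains the text Welcome to TiKV and that every log line begins with a bracketed timestamp; tikv_start_times is a hypothetical helper name:

```shell
# Hypothetical helper: print the timestamp of every "Welcome to TiKV" line
# found under the given log directory, oldest first.
tikv_start_times() {
  # -r searches rotated log files too, -h drops the file name prefix.
  grep -rh 'Welcome to TiKV' "$1" |
    sed -E 's/^\[([^]]+)\].*/\1/' |   # keep only the leading [timestamp]
    sort
}

# Usage on the TiKV host (path taken from this thread):
#   tikv_start_times /tidb-deploy/tikv-20160/log/
```

A start or restart entry in tiup cluster audit close to one of these timestamps would suggest that this machine controls the cluster.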

It really was started with tiup cluster start, which controls the other 2 servers.

[root@iZuf6ikybnkbb4w1xvsmfvZ /]# grep -r "Welcome" /tidb-deploy/tikv-20160/log/
/tidb-deploy/tikv-20160/log/tikv.log.2020-12-08-10:56:06.504514139:[2020/12/07 10:55:56.506 +08:00] [INFO] [lib.rs:92] ["Welcome to TiKV"]

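This log shows that TiKV started on December 8, but the audit log you uploaded contains no tiup cluster start. A restart was executed on December 10 (and that restart command was wrong and could not have started the cluster). This means the cluster is not controlled from this machine.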

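The conclusion so far: the running cluster you see was not deployed by the root user on the machine where you run tiup cluster. Please check two things:

  • Confirm that the machine where you run tiup cluster is the control machine for this cluster

  • Confirm that root is the user who deployed this cluster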
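
Both checks can be made by looking for cluster metadata under each user's TiUP home. A minimal sketch, assuming every user keeps the default ~/.tiup location and that each deployed cluster has its own directory under storage/cluster/clusters; clusters_by_home is a hypothetical helper name:

```shell
# Hypothetical helper: for each home directory given, print the home and the
# name of every cluster directory found under its .tiup metadata tree.
clusters_by_home() {
  for home in "$@"; do
    for dir in "$home"/.tiup/storage/cluster/clusters/*/; do
      [ -d "$dir" ] || continue   # unmatched glob stays literal, so skip it
      printf '%s\t%s\n' "$home" "$(basename "$dir")"
    done
  done
}

# Usage (run as root so that other users' homes are readable):
#   clusters_by_home /root /home/*
```

If this prints nothing on a machine, that machine or that set of users most likely never completed a tiup cluster deploy.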

How could the control machine have the Welcome message?

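Why look for startup information on the control machine? TiDB is normally started on the TiDB hosts; the control machine only does the controlling.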

But the fact is that I started 2 TiDB servers from the control machine, and they are running now.

I don't understand why the cluster name cannot be found.

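The TiDB cluster that you say is reachable runs on tidb227, right?

Judging from your Xshell, the first tab, web62_tb_中控, should be the control machine, and it is a different machine from tidb227, right?

But the machine where you ran tiup cluster audit while troubleshooting also appears to be tidb227, not web62_tb_中控.

It looks like you mixed the two up.

On your control machine, check what this directory contains: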

ls $TIUP_HOME/storage/cluster/clusters
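
Note that if TIUP_HOME is not set in the current shell, the path above expands to /storage/cluster/clusters. A small sketch that falls back to ~/.tiup, which is assumed here to be the default TiUP home (the audit output earlier in this thread shows /root/.tiup in use); list_tiup_clusters is a hypothetical helper name:

```shell
# Hypothetical helper: list the cluster metadata directories for the current user,
# using $TIUP_HOME when it is set and ~/.tiup otherwise (assumed default).
list_tiup_clusters() {
  ls "${TIUP_HOME:-$HOME/.tiup}/storage/cluster/clusters"
}

# Usage, as the user who ran the deploy:
#   list_tiup_clusters
```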

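What I meant was to run this grep command on the TiKV machine; I never said to run it on the control machine. In your earlier reply you did run it on the TiKV machine and got a result. Judging from that result, the machine you consider the "control machine" is not the actual control machine, so the next step is to check whether tiup cluster deploy was run as a different user.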

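Step 4: Install the TiDB components on the control machine

4.1) tiup cluster deploy mytidb v4.0.8 mini.yaml --user root -p **8

-p is the root password.

4.2) Start TiDB

tiup cluster start mytidb

These are my notes; this is how I ran and started it at the time.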