Error when importing a TiDB Ansible cluster into TiUP

tidb:v4.0.0-rc

Error description:

Running tiup cluster import -d /home/tidb/tidb-ansible as the tidb user fails with the following permission error.

Error: can not detect dir paths of tiflash 192.168.1.1:9000, grep: /etc/systemd/system/tiflash-9000.service: Permission denied

Verbose debug logs has been written to /home/tidb/logs/tiup-cluster-debug-2020-06-15-15-06-17.log.
Error: run /home/tidb/.tiup/components/cluster/v1.0.4/tiup-cluster (wd:/home/tidb/.tiup/data/S1xgy64) failed: exit status 1

Manually setting the permissions of /etc/systemd/system/tiflash-9000.service to 777 did not help; the same error is still reported.

Log from cat /home/tidb/logs/tiup-cluster-debug-2020-06-15-15-06-17.log:

2020-06-15T14:38:52.101+0800 DEBUG Detecting deploy paths on 192.168.1.1…
2020-06-15T14:38:52.215+0800 INFO SSHCommand {"host": "192.168.1.1", "port": "22", "cmd": "PATH=$PATH:/usr/bin:/usr/sbin cat grep 'ExecStart' /etc/systemd/system/tiflash-9000.service | sed 's/ExecStart=//'", "stdout": "", "stderr": "grep: /etc/systemd/system/tiflash-9000.service: Permission denied\n"}
2020-06-15T14:38:52.215+0800 INFO Execute command finished {"code": 1, "error": "can not detect dir paths of tiflash 192.168.1.1:9000, grep: /etc/systemd/system/tiflash-9000.service: Permission denied\n", "errorVerbose": "can not detect dir paths of tiflash 192.168.1.1:9000, grep: /etc/systemd/system/tiflash-9000.service: Permission denied\n\ngithub.com/pingcap/tiup/pkg/cluster/ansible.readStartScript\n\tgithub.com/pingcap/tiup@/pkg/cluster/ansible/dirs.go:239\ngithub.com/pingcap/tiup/pkg/cluster/ansible.parseDirs\n\tgithub.com/pingcap/tiup@/pkg/cluster/ansible/dirs.go:44\ngithub.com/pingcap/tiup/pkg/cluster/ansible.ParseAndImportInventory\n\tgithub.com/pingcap/tiup@/pkg/cluster/ansible/inventory.go:82\ngithub.com/pingcap/tiup/components/cluster/command.newImportCmd.func1\n\tgithub.com/pingcap/tiup@/components/cluster/command/import.go:100\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/cobra@v1.0.0/command.go:842\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/cobra@v1.0.0/command.go:950\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/cobra@v1.0.0/command.go:887\ngithub.com/pingcap/tiup/components/cluster/command.Execute\n\tgithub.com/pingcap/tiup@/components/cluster/command/root.go:220\nmain.main\n\tgithub.com/pingcap/tiup@/components/cluster/main.go:19\nruntime.main\n\truntime/proc.go:203\nruntime.goexit\n\truntime/asm_amd64.s:1357"}

Hello,

Could you upload the debug log so we can look at the context, and also post the output of ll /etc/systemd/system/tiflash-9000.service?

Hello, the log has been uploaded:
tiup-cluster-debug-2020-06-15-15-06-17.log (10.8 KB)

ll -lht /etc/systemd/system/tiflash-9000.service
-rwxrwxrwx 1 tidb tidb 304 May 19 11:51 /etc/systemd/system/tiflash-9000.service
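
A side note on the listing above: mode 777 on the file does not by itself guarantee read access. The user also needs search (execute) permission on every parent directory, and ACLs or SELinux can still deny access. Whether any of that applies here is not established in this thread. The sketch below is one way to check the directory part when run as the tidb user; show_path_perms is a hypothetical helper name, not something TiUP provides.

```shell
# Hypothetical helper: list the permissions of an absolute path and of every
# parent directory, starting at / and walking down to the target.
show_path_perms() {
    local target current rest part
    target=$1
    current=""
    ls -ld / || return 1
    rest=${target#/}
    while [ -n "$rest" ]; do
        # Take the next path component and drop it from the remainder.
        part=${rest%%/*}
        case $rest in
            */*) rest=${rest#*/} ;;
            *)   rest="" ;;
        esac
        # Skip empty components produced by repeated slashes.
        [ -n "$part" ] || continue
        current="$current/$part"
        ls -ld "$current" || return 1
    done
}

# Example call (not executed here):
#   show_path_perms /etc/systemd/system/tiflash-9000.service
```

On Linux, namei -l from util-linux prints a similar per-component listing.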

Hello,

From the control machine, run ssh tidb@192.168.1.1 and then execute the following command to see whether there is a permission problem.

cat grep 'ExecStart' /etc/systemd/system/tiflash-9000.service | sed 's/ExecStart=//'

tidb>$ cat grep 'ExecStart' /etc/systemd/system/tiflash-9000.service | sed 's/ExecStart=//'
cat: grep: No such file or directory
cat: ExecStart: No such file or directory
cat: /etc/systemd/system/tiflash-9000.service: Permission denied
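
The first two errors above come from cat treating grep and ExecStart as file names, which is what happens when the grep/sed pipeline is not wrapped in a command substitution. It seems plausible that the intended command wrapped the pipeline in backticks that were lost when it was posted, but that is an assumption; nothing in this thread confirms it. A minimal sketch of the substituted form, using $(...) and two hypothetical helper names:

```shell
# Hypothetical helper: print the ExecStart target of a systemd unit file,
# using the same grep/sed pipeline as the command quoted above.
exec_start_of() {
    grep 'ExecStart' "$1" | sed 's/ExecStart=//'
}

# Hypothetical helper: print the start script that the unit file points at.
# $(...) runs the pipeline first and hands its output to cat as a file name.
show_start_script() {
    cat "$(exec_start_of "$1")"
}

# Example calls (not executed here):
#   exec_start_of /etc/systemd/system/tiflash-9000.service
#   show_start_script /etc/systemd/system/tiflash-9000.service
```

Running exec_start_of on its own first makes it easier to tell whether the permission error comes from reading the unit file or from reading the start script.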

That does not work.

Is ssh tidb@192.168.1.1 a passwordless login?

Check whether ssh -i /home/tidb/.tiup/storage/cluster/clusters/qh/ssh/id_rsa tidb@172.16.4.107 succeeds, then run the import command.

It turns out the file /home/tidb/.tiup/storage/cluster/clusters/yourClusterName/ssh/id_rsa does not exist under the tidb user:
1.1.tidb.com<2020-06-16 15:02:43> ~/.tiup/storage/cluster
tidb>$ ls
audit

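Could something be wrong with my TiUP installation? Reference document: https://pingcap.com/docs-cn/dev/upgrade-tidb-using-tiup/

192.168.1.1 is the control machine itself; I did not make that clear earlier. The permission problem occurs on the control machine.

The current line of investigation is a passwordless sudo problem. The steps so far are meant to verify whether logging in to the remote server over SSH works normally.

Er, please replace this field with the name of your cluster.

Thanks for the feedback. Write /home/tidb/.ssh/id_rsa.pub into /home/tidb/.ssh/authorized_keys and run import -d.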

yourClusterName

Er, please replace this field with the name of your cluster.
1.1.tidb.com<2020-06-16 15:02:43> ~/.tiup/storage/cluster
tidb>$ ls
audit
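This is the only file in my directory.
As for writing /home/tidb/.ssh/id_rsa.pub into /home/tidb/.ssh/authorized_keys: I checked, and the key strings are identical.
So that should not be the problem, right?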
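
For anyone who wants to repeat the key comparison mechanically rather than by eye, here is a minimal sketch. key_is_authorized is a hypothetical helper; it compares only the key type and the base64 blob, on the assumption that the trailing comment field may legitimately differ.

```shell
# Hypothetical helper: succeed if the public key in $1 appears in the
# authorized_keys file $2. Returns 2 if the public key cannot be read.
key_is_authorized() {
    local pub_file auth_file key
    pub_file=$1
    auth_file=$2
    # Keep the first two fields: key type and base64 key material.
    key=$(awk 'NR == 1 {print $1, $2}' "$pub_file") || return 2
    [ -n "$key" ] || return 2
    # Fixed-string match, so an options prefix on the line is tolerated.
    grep -qF -- "$key" "$auth_file"
}

# Example call (not executed here):
#   key_is_authorized /home/tidb/.ssh/id_rsa.pub /home/tidb/.ssh/authorized_keys
```

A matching key is necessary but not always sufficient: with sshd's default StrictModes setting, overly permissive modes on the home directory, ~/.ssh, or authorized_keys can still cause the key to be ignored.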

Please post the contents of the service file:

cat /etc/systemd/system/tiflash-9000.service

cat /etc/systemd/system/tiflash-9000.service
[Unit]
Description=tiflash-9000 service
After=syslog.target network.target remote-fs.target nss-lookup.target

[Service]
LimitNOFILE=1000000
#LimitCORE=infinity
LimitSTACK=10485760
User=tidb
ExecStart=/data0/tidb/scripts/run_tiflash.sh
Restart=always
RestartSec=15s

[Install]
WantedBy=multi-user.target

Hello,

Please run the following command to confirm the tidb user's sudo privileges:

tiup cluster exec qh --command=“sudo echo success”

PS: replace qh with the name of your own cluster.
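
The same sudo check can also be made directly on a host, without going through tiup. This is a minimal sketch under that assumption; check_passwordless_sudo is a hypothetical helper name, and it relies only on sudo's -n (non-interactive) option.

```shell
# Hypothetical helper: report whether the current user can run sudo
# without being asked for a password.
check_passwordless_sudo() {
    if ! command -v sudo >/dev/null 2>&1; then
        echo "sudo is not installed"
        return 2
    fi
    # -n makes sudo fail instead of prompting when a password is required.
    if sudo -n true 2>/dev/null; then
        echo "passwordless sudo OK"
    else
        echo "sudo needs a password or is not permitted"
        return 1
    fi
}

# Example call, run as the tidb user (not executed here):
#   check_passwordless_sudo
```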

Hello:
tidb>$ tiup cluster exec test-cluster --command=“sudo echo success”
Starting component cluster: /home/tidb/.tiup/components/cluster/v1.0.4/tiup-cluster exec test-cluster --command=“sudo echo success”
Run shell command on host in the tidb cluster

Usage:
  tiup cluster exec [flags]

Flags:
      --command string   the command run on cluster host (default "ls")
  -h, --help             help for exec
  -N, --node strings     Only exec on host with specified nodes
  -R, --role strings     Only exec on host with specified roles
      --sudo             use root permissions (default false)

Global Flags:
      --ssh-timeout int    Timeout in seconds to connect host via SSH, ignored for operations that don't need an SSH connection. (default 5)
      --wait-timeout int   Timeout in seconds to wait for an operation to complete, ignored for operations that don't fit. (default 60)
  -y, --yes                Skip all confirmations and assumes 'yes'

Change the quotes to single quotes.
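
Some background on why the quotes matter: typographic quotes (“ ”), as produced when a command is copied from a rendered web page, are ordinary characters to the shell, so the command string is split into several words and the extra words reach tiup as stray arguments. The small sketch below shows the difference; count_args is a hypothetical helper used only for illustration.

```shell
# Hypothetical helper: print how many arguments the shell passed in.
count_args() {
    echo "$#"
}

# Typographic quotes do not group words, so this is split into three arguments:
# --command=“sudo, echo, and success”
count_args --command=“sudo echo success”

# ASCII single or double quotes group the words into one argument.
count_args --command='sudo echo success'
count_args --command="sudo echo success"
```

The three calls print 3, 1, and 1 respectively.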

tidb>$ tiup cluster exec test-cluster --command='sudo echo success'
Starting component cluster: /home/tidb/.tiup/components/cluster/v1.0.4/tiup-cluster exec test-cluster --command=sudo echo success

Error: cannot execute command on non-exists cluster test-cluster

Verbose debug logs has been written to /home/tidb/logs/tiup-cluster-debug-2020-06-17-09-52-17.log.
Error: run /home/tidb/.tiup/components/cluster/v1.0.4/tiup-cluster (wd:/home/tidb/.tiup/data/S286z6L) failed: exit status 1

Debug log:
tidb>$ more /home/tidb/logs/tiup-cluster-debug-2020-06-17-09-52-17.log
2020-06-17T09:52:17.593+0800 INFO Execute command {“command”: “tiup cluster exec test-cluster --command=sudo echo success”}
2020-06-17T09:52:17.593+0800 DEBUG Environment variables {“env”: [“TIUP_HOME=/home/tidb/.tiup”, “TIUP_WORK_DIR=/home/tidb”, “TIUP_INSTANCE_DATA_DIR=/hom
e/tidb/.tiup/data/S286z6L”, “TIUP_COMPONENT_DATA_DIR=/home/tidb/.tiup/storage/cluster”, “TIUP_COMPONENT_INSTALL_DIR=/home/tidb/.tiup/components/cluster/v1.0.4”
, “TIUP_TELEMETRY_STATUS=enable”, “TIUP_TELEMETRY_UUID=bd0663f6-9535-4351-84aa-dd2ccde2496e”, “TIUP_TAG=S286z6L”, “XDG_SESSION_ID=3918”, “HOSTNAME=1.1.tidb.com”, “SHELL=/bin/bash”, “TERM=linux”, “HISTSIZE=1000”, “USER=tidb”, “LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;
33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:.tar=01;31:.tgz=01;31:.arc=01;31:.arj=01;31:.taz=01;31:.l
ha=01;31:.lz4=01;31:.lzh=01;31:.lzma=01;31:.tlz=01;31:.txz=01;31:.tzo=01;31:.t7z=01;31:.zip=01;31:.z=01;31:.Z=01;31:.dz=01;31:.gz=01;31:.lrz=01;31
:
.lz=01;31:.lzo=01;31:.xz=01;31:.bz2=01;31:.bz=01;31:.tbz=01;31:.tbz2=01;31:.tz=01;31:.deb=01;31:.rpm=01;31:.jar=01;31:.war=01;31:.ear=01;31:.sar
=01;31:
.rar=01;31:.alz=01;31:.ace=01;31:.zoo=01;31:.cpio=01;31:.7z=01;31:.rz=01;31:.cab=01;31:.jpg=01;35:.jpeg=01;35:.gif=01;35:.bmp=01;35:.pbm=01
;35:.pgm=01;35:.ppm=01;35:.tga=01;35:.xbm=01;35:.xpm=01;35:.tif=01;35:.tiff=01;35:.png=01;35:.svg=01;35:.svgz=01;35:.mng=01;35:.pcx=01;35:.mov=01;
35:
.mpg=01;35:.mpeg=01;35:.m2v=01;35:.mkv=01;35:.webm=01;35:.ogm=01;35:.mp4=01;35:.m4v=01;35:.mp4v=01;35:.vob=01;35:.qt=01;35:.nuv=01;35:.wmv=01;3
5:.asf=01;35:.rm=01;35:.rmvb=01;35:.flc=01;35:.avi=01;35:.fli=01;35:.flv=01;35:.gl=01;35:.dl=01;35:.xcf=01;35:.xwd=01;35:.yuv=01;35:.cgm=01;35:.e
mf=01;35:.axv=01;35:.anx=01;35:.ogv=01;35:.ogx=01;35:.aac=01;36:.au=01;36:.flac=01;36:.mid=01;36:.midi=01;36:.mka=01;36:.mp3=01;36:.mpc=01;36:.ogg
=01;36:
.ra=01;36:.wav=01;36:.axa=01;36:.oga=01;36:.spx=01;36:*.xspf=01;36:”, “MAVEN_HOME=/usr/local/maven/apache-maven-3.6.3”, “MAIL=/var/spool/mail/tidb”
, “PATH=/home/tidb/.tiup/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/local/maven/apache-maven-3.6.3/bin:/home/tidb/.local/bin:/home/tidb/bi
n:/usr/local/maven/apache-maven-3.6.3/bin”, “PWD=/home/tidb”, “LANG=en_US.UTF-8”, “TZ=Asia/Shanghai”, “PS1=\[\e]0;\a\]\ \[\e[1;32m\]\[\e[1;33m\]\H
[\e[1;35m\]<$(date +”%Y-%m-%d %T")> \[\e[32m\]\w\[\e[0m\]\ \u>\$ ", “HISTCONTROL=ignoredups”, “SHLVL=1”, “HOME=/home/tidb”, “LOGNAME=tidb”, “LES
SOPEN=||/usr/bin/lesspipe.sh %s”, “_=/home/tidb/.tiup/bin/tiup”, “OLDPWD=/home/tidb/tidb-ansible”]}
2020-06-17T09:52:17.599+0800 INFO Execute command finished {"code": 1, "error": "cannot execute command on non-exists cluster test-cluster", "errorVerbose": "cannot execute command on non-exists cluster test-cluster\ngithub.com/pingcap/tiup/components/cluster/command.newExecCmd.func1\n\tgithub.com/pingcap/tiup@/components/cluster/command/exec.go:47\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/cobra@v1.0.0/command.go:842\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/cobra@v1.0.0/command.go:950\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/cobra@v1.0.0/command.go:887\ngithub.com/pingcap/tiup/components/cluster/command.Execute\n\tgithub.com/pingcap/tiup@/components/cluster/command/root.go:220\nmain.main\n\tgithub.com/pingcap/tiup@/components/cluster/main.go:19\nruntime.main\n\truntime/proc.go:203\nruntime.goexit\n\truntime/asm_amd64.s:1357"}

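That cluster name does not exist:

Error: cannot execute command on non-exists cluster test-cluster
(In other words, the command cannot run because no cluster named test-cluster exists.)

The hyphen may be causing the problem; try wrapping the cluster name in single quotes.

Still the same; the error is identical to the one above. cluster_name = test-cluster, so the name is not wrong either.

Sorry, the cluster has not been imported successfully yet, so tiup does not recognize the cluster name.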
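
A quick way to tell whether tiup already knows a cluster, assuming (as the id_rsa path quoted earlier in this thread suggests) that per-cluster metadata lives under storage/cluster/clusters/ inside the TiUP home directory. cluster_is_imported is a hypothetical helper, and the directory layout is an assumption rather than a documented contract.

```shell
# Hypothetical helper: succeed if a metadata directory exists for the cluster.
# $1 is the cluster name; $2 optionally overrides the TiUP home directory,
# which otherwise comes from TIUP_HOME or defaults to ~/.tiup.
cluster_is_imported() {
    local name tiup_home
    name=$1
    tiup_home=${2:-${TIUP_HOME:-$HOME/.tiup}}
    [ -d "$tiup_home/storage/cluster/clusters/$name" ]
}

# Example call (not executed here):
#   cluster_is_imported test-cluster && echo "known to tiup" || echo "not imported yet"
```

Running tiup cluster list should give the same answer and is the more direct check.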

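Back to the original problem: in the tidb-ansible directory, run ansible-playbook -i hosts.ini create_users.yml -u root -k, and make sure hosts.ini contains every IP that appears in the inventory file. Then reconfigure SSH mutual trust and the sudo rules.

The reason is that the error message still points to a permission problem.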
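
To check the hosts.ini requirement above without comparing the files by hand, something like the following could be used. Both helper names are hypothetical, the IPv4 pattern is deliberately crude (it does not validate octet ranges), and the inventory file name in the example call is an assumption about a typical tidb-ansible checkout.

```shell
# Hypothetical helper: print the unique IPv4-looking strings found in a file.
inventory_ips() {
    grep -oE '\b([0-9]{1,3}\.){3}[0-9]{1,3}\b' "$1" | sort -u
}

# Hypothetical helper: print every IP that appears in the inventory file $1
# but not in the hosts file $2. No output means nothing is missing.
missing_ips() {
    local inv hosts ip
    inv=$1
    hosts=$2
    inventory_ips "$inv" | while IFS= read -r ip; do
        inventory_ips "$hosts" | grep -qxF -- "$ip" || echo "$ip"
    done
}

# Example call from the tidb-ansible directory (not executed here):
#   missing_ips inventory.ini hosts.ini
```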

tidb>$ ssh root@192.168.1.1
Last login: Wed Jun 17 13:45:18 2020 from 192.168.1.1

root># grep 'ExecStart' /etc/systemd/system/tiflash-9000.service | sed 's/ExecStart=//'
/data0/tidb/scripts/run_tiflash.sh

SSH from the tidb user to root works without problems. :joy: The import still fails with the same error.