Error when importing a TiDB Ansible cluster into TiUP

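The current line of investigation is passwordless sudo. The current step is to verify whether logging in to the remote servers over ssh works normally.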

Thanks for the feedback. Write /home/tidb/.ssh/id_rsa.pub into /home/tidb/.ssh/authorized_keys, then run import -d
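A minimal sketch of the key step above, assuming the default key paths mentioned in this thread. The function name install_pubkey is hypothetical. It appends the public key only if it is not already present, then applies the restrictive permissions that sshd usually expects.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a public key to authorized_keys at most once.
install_pubkey() {
  local pub="$1" auth="$2" key
  key=$(cat "$pub")
  mkdir -p "$(dirname "$auth")"
  touch "$auth"
  # -q quiet, -x whole-line match, -F fixed string (no regex)
  if ! grep -qxF -- "$key" "$auth"; then
    printf '%s\n' "$key" >> "$auth"
  fi
  # sshd commonly refuses keys when these are group- or world-writable
  chmod 700 "$(dirname "$auth")"
  chmod 600 "$auth"
}
```

It could be called as install_pubkey /home/tidb/.ssh/id_rsa.pub /home/tidb/.ssh/authorized_keys. Running it twice leaves a single copy of the key in the file.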

yourClusterName

Er, please replace this field with the name of your cluster.
1.1.tidb.com<2020-06-16 15:02:43> ~/.tiup/storage/cluster
tidb>$ ls
audit
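This is the only file in my directory.
As for writing /home/tidb/.ssh/id_rsa.pub into /home/tidb/.ssh/authorized_keys: I checked, and the key strings are identical.
So that is probably not the problem, right?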

Please share the service file:

cat /etc/systemd/system/tiflash-9000.service

cat /etc/systemd/system/tiflash-9000.service
[Unit]
Description=tiflash-9000 service
After=syslog.target network.target remote-fs.target nss-lookup.target

[Service]
LimitNOFILE=1000000
#LimitCORE=infinity
LimitSTACK=10485760
User=tidb
ExecStart=/data0/tidb/scripts/run_tiflash.sh
Restart=always
RestartSec=15s

[Install]
WantedBy=multi-user.target

Hello,

Please run the following command to confirm the sudo permission of the tidb user:

tiup cluster exec qh --command="sudo echo success"

PS: replace qh with the name of your own cluster.

Hello:
tidb>$ tiup cluster exec test-cluster --command=“sudo echo success”
Starting component cluster: /home/tidb/.tiup/components/cluster/v1.0.4/tiup-cluster exec test-cluster --command=“sudo echo success”
Run shell command on host in the tidb cluster

Usage:
tiup cluster exec [flags]

Flags:
      --command string   the command run on cluster host (default "ls")
  -h, --help             help for exec
  -N, --node strings     Only exec on host with specified nodes
  -R, --role strings     Only exec on host with specified roles
      --sudo             use root permissions (default false)

Global Flags:
      --ssh-timeout int    Timeout in seconds to connect host via SSH, ignored for operations that don't need an SSH connection. (default 5)
      --wait-timeout int   Timeout in seconds to wait for an operation to complete, ignored for operations that don't fit. (default 60)
  -y, --yes                Skip all confirmations and assumes 'yes'
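A likely explanation for the usage text above, assuming the double quotes were pasted as typographic (curly) quotes: the shell does not treat those characters as quoting, so the value is split on spaces and the command probably received extra positional arguments. The helper count_args below is purely illustrative and does not call tiup.

```shell
#!/usr/bin/env bash
# Print how many arguments the shell actually passes to a command.
count_args() { echo "$#"; }

# Typographic quotes are ordinary characters to the shell,
# so the value is split on spaces into three separate words.
curly=$(count_args --command=“sudo echo success”)

# ASCII quotes group the words into a single argument.
straight=$(count_args --command='sudo echo success')

echo "curly=$curly straight=$straight"   # prints: curly=3 straight=1
```

Retyping the quotes by hand in the terminal, rather than copying them from a rendered web page, avoids the problem.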

Switched to single quotes.

tidb>$ tiup cluster exec test-cluster --command='sudo echo success'
Starting component cluster: /home/tidb/.tiup/components/cluster/v1.0.4/tiup-cluster exec test-cluster --command=sudo echo success

Error: cannot execute command on non-exists cluster test-cluster

Verbose debug logs has been written to /home/tidb/logs/tiup-cluster-debug-2020-06-17-09-52-17.log.
Error: run /home/tidb/.tiup/components/cluster/v1.0.4/tiup-cluster (wd:/home/tidb/.tiup/data/S286z6L) failed: exit status 1

Debug log:
tidb>$ more /home/tidb/logs/tiup-cluster-debug-2020-06-17-09-52-17.log
2020-06-17T09:52:17.593+0800 INFO Execute command {"command": "tiup cluster exec test-cluster --command=sudo echo success"}
2020-06-17T09:52:17.593+0800 DEBUG Environment variables {"env": ["TIUP_HOME=/home/tidb/.tiup", "TIUP_WORK_DIR=/home/tidb", "TIUP_INSTANCE_DATA_DIR=/home/tidb/.tiup/data/S286z6L", "TIUP_COMPONENT_DATA_DIR=/home/tidb/.tiup/storage/cluster", "TIUP_COMPONENT_INSTALL_DIR=/home/tidb/.tiup/components/cluster/v1.0.4", "TIUP_TELEMETRY_STATUS=enable", "TIUP_TELEMETRY_UUID=bd0663f6-9535-4351-84aa-dd2ccde2496e", "TIUP_TAG=S286z6L", "XDG_SESSION_ID=3918", "HOSTNAME=1.1.tidb.com", "SHELL=/bin/bash", "TERM=linux", "HISTSIZE=1000", "USER=tidb", "LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:.tar=01;31:.tgz=01;31:.arc=01;31:.arj=01;31:.taz=01;31:.lha=01;31:.lz4=01;31:.lzh=01;31:.lzma=01;31:.tlz=01;31:.txz=01;31:.tzo=01;31:.t7z=01;31:.zip=01;31:.z=01;31:.Z=01;31:.dz=01;31:.gz=01;31:.lrz=01;31:.lz=01;31:.lzo=01;31:.xz=01;31:.bz2=01;31:.bz=01;31:.tbz=01;31:.tbz2=01;31:.tz=01;31:.deb=01;31:.rpm=01;31:.jar=01;31:.war=01;31:.ear=01;31:.sar=01;31:.rar=01;31:.alz=01;31:.ace=01;31:.zoo=01;31:.cpio=01;31:.7z=01;31:.rz=01;31:.cab=01;31:.jpg=01;35:.jpeg=01;35:.gif=01;35:.bmp=01;35:.pbm=01;35:.pgm=01;35:.ppm=01;35:.tga=01;35:.xbm=01;35:.xpm=01;35:.tif=01;35:.tiff=01;35:.png=01;35:.svg=01;35:.svgz=01;35:.mng=01;35:.pcx=01;35:.mov=01;35:.mpg=01;35:.mpeg=01;35:.m2v=01;35:.mkv=01;35:.webm=01;35:.ogm=01;35:.mp4=01;35:.m4v=01;35:.mp4v=01;35:.vob=01;35:.qt=01;35:.nuv=01;35:.wmv=01;35:.asf=01;35:.rm=01;35:.rmvb=01;35:.flc=01;35:.avi=01;35:.fli=01;35:.flv=01;35:.gl=01;35:.dl=01;35:.xcf=01;35:.xwd=01;35:.yuv=01;35:.cgm=01;35:.emf=01;35:.axv=01;35:.anx=01;35:.ogv=01;35:.ogx=01;35:.aac=01;36:.au=01;36:.flac=01;36:.mid=01;36:.midi=01;36:.mka=01;36:.mp3=01;36:.mpc=01;36:.ogg=01;36:.ra=01;36:.wav=01;36:.axa=01;36:.oga=01;36:.spx=01;36:*.xspf=01;36:", "MAVEN_HOME=/usr/local/maven/apache-maven-3.6.3", "MAIL=/var/spool/mail/tidb", "PATH=/home/tidb/.tiup/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/local/maven/apache-maven-3.6.3/bin:/home/tidb/.local/bin:/home/tidb/bin:/usr/local/maven/apache-maven-3.6.3/bin", "PWD=/home/tidb", "LANG=en_US.UTF-8", "TZ=Asia/Shanghai", "PS1=\[\e]0;\a\]\n\[\e[1;32m\]\[\e[1;33m\]\H[\e[1;35m\]<$(date +"%Y-%m-%d %T")> \[\e[32m\]\w\[\e[0m\]\n\u>\$ ", "HISTCONTROL=ignoredups", "SHLVL=1", "HOME=/home/tidb", "LOGNAME=tidb", "LESSOPEN=||/usr/bin/lesspipe.sh %s", "_=/home/tidb/.tiup/bin/tiup", "OLDPWD=/home/tidb/tidb-ansible"]}
2020-06-17T09:52:17.599+0800 INFO Execute command finished {"code": 1, "error": "cannot execute command on non-exists cluster test-cluster", "errorVerbose": "cannot execute command on non-exists cluster test-cluster\ngithub.com/pingcap/tiup/components/cluster/command.newExecCmd.func1\n\tgithub.com/pingcap/tiup@/components/cluster/command/exec.go:47\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/cobra@v1.0.0/command.go:842\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/cobra@v1.0.0/command.go:950\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/cobra@v1.0.0/command.go:887\ngithub.com/pingcap/tiup/components/cluster/command.Execute\n\tgithub.com/pingcap/tiup@/components/cluster/command/root.go:220\nmain.main\n\tgithub.com/pingcap/tiup@/components/cluster/main.go:19\nruntime.main\n\truntime/proc.go:203\nruntime.goexit\n\truntime/asm_amd64.s:1357"}

That cluster name does not exist:

Error: cannot execute command on non-exists cluster test-cluster
(that is, the command cannot run because tiup knows no cluster named test-cluster)

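The hyphen may be causing the problem; try wrapping the cluster name in single quotes.

Still the same; the error is identical to the one above. cluster_name = test-cluster, so the name is not wrong either.

Sorry, the cluster has not been imported successfully yet, so tiup does not recognize the cluster name.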
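A rough way to check this locally, sketched under a stated assumption: that tiup-cluster keeps per-cluster metadata at clusters/<name>/meta.yaml beneath the component data directory (TIUP_COMPONENT_DATA_DIR in the debug log above). That layout and the helper name cluster_is_imported are assumptions for illustration, not something confirmed in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether tiup-cluster has metadata for a cluster.
# ASSUMPTION: metadata lives at <data_dir>/clusters/<name>/meta.yaml.
cluster_is_imported() {
  local data_dir="$1" name="$2"
  [ -f "$data_dir/clusters/$name/meta.yaml" ]
}

data_dir="${TIUP_COMPONENT_DATA_DIR:-$HOME/.tiup/storage/cluster}"
if cluster_is_imported "$data_dir" "test-cluster"; then
  echo "test-cluster is known to tiup"
else
  echo "test-cluster has not been imported yet"
fi
```

This would be consistent with the earlier listing, where ~/.tiup/storage/cluster contained only audit. Running tiup cluster list should give the same answer from tiup itself.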

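Back to the original problem: in the tidb-ansible directory, run ansible-playbook -i hosts.ini create_users.yml -u root -k, making sure hosts.ini contains every IP listed in the inventory file. This reconfigures SSH mutual trust and the sudo rules.

The reason is that the error message still points to a permission problem.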

tidb>$ ssh root@192.168.1.1
Last login: Wed Jun 17 13:45:18 2020 from 192.168.1.1

root># grep 'ExecStart' /etc/systemd/system/tiflash-9000.service | sed 's/ExecStart=//'
/data0/tidb/scripts/run_tiflash.sh

SSH from the tidb user to root works fine. :joy: Still the same error.

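Hello,

That command does not prove that the tidb operations user has sudo permission.

  1. Re-run the command from the post above: ansible-playbook -i hosts.ini create_users.yml -u root -k
  2. Verify it the way shown in the screenshot, then run cat … and check whether it succeeds.

The goal is still to verify whether the deployment user tidb has sudo permission, so please work through these steps.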
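A minimal sketch of such a check, meant to be run as the tidb user on each host. It relies on sudo's -n (non-interactive) option, which makes sudo fail instead of prompting for a password. The function name and the SUDO_BIN override are hypothetical and exist only so the check can be exercised on a machine without sudo.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if sudo works without asking for a password.
# SUDO_BIN is an illustrative override, not a real sudo setting.
has_passwordless_sudo() {
  # -n: never prompt; exit non-zero if a password would be required
  "${SUDO_BIN:-sudo}" -n true 2>/dev/null
}

if has_passwordless_sudo; then
  echo "passwordless sudo: ok"
else
  echo "passwordless sudo: not available"
fi
```

Note that this only covers the sudo rule itself. A file that the tidb user must read without sudo can still be blocked by ordinary file or directory permissions.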

Done.
root># su - tidb
Last login: Wed Jun 17 14:45:02 CST 2020 on pts/0
Last failed login: Wed Jun 17 14:48:01 CST 2020 from 192.168.1.1 on ssh:notty
There were 3 failed login attempts since the last successful login.

tidb>$ sudo su -
Last login: Wed Jun 17 14:53:32 CST 2020 from 192.168.xx.xx on pts/0
root># cat `grep 'ExecStart' /etc/systemd/system/tiflash-9000.service | sed 's/ExecStart=//'`
#!/bin/bash
set -e
ulimit -n 1000000

# WARNING: This file was auto-generated. Do not edit!
#          All your edit might be overwritten!

cd "/data0/tidb" || exit 1

export RUST_BACKTRACE=1

export TZ=${TZ:-/etc/localtime}
export LD_LIBRARY_PATH=/data0/tidb/bin/tiflash:$LD_LIBRARY_PATH

echo -n 'sync ... '
stat=$(time sync)
echo ok
echo $stat

echo $$ > "status/tiflash.pid"

exec bin/tiflash/tiflash server --config-file conf/tiflash.toml

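Solved. It was a permission problem on the /etc/systemd/system/ directory (I don't know when that directory's permissions were changed), which meant the tidb user could not cat the files in it... Thank you! :tulip:

Could you describe what permissions /etc/systemd/system/ had before, and what you changed them to in order to solve the problem?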

tidb>$ ll -lht
total 28K
drw-r--r--. 4 root root 4.0K May 19 11:51 system

root># chmod 775 system

root># ll -lht
total 28K
drwxrwxr-x. 4 root root 4.0K May 19 11:51 system

This problem is solved. But I've now hit another problem when upgrading with tiup. I'll dig into it myself first.
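For readers wondering why drw-r--r-- blocked cat: a directory needs the search (x) bit before a user can open anything inside it, even if the file itself is readable. The sketch below reproduces the two modes from the listings on a throwaway directory and only inspects the permission string, so it behaves the same when run as root. The helper name others_can_enter is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: does a directory grant "others" the search (x) bit?
# Without it, other users cannot open files inside, even readable ones.
others_can_enter() {
  local mode
  # Characters 1-10 of `ls -ld` are the file type plus nine permission bits.
  mode=$(ls -ld "$1" | cut -c1-10)
  case "$mode" in
    d????????x|d????????t) return 0 ;;
    *) return 1 ;;
  esac
}

demo=$(mktemp -d)
chmod 644 "$demo"   # drw-r--r--, as in the listing before the fix
others_can_enter "$demo" && before=yes || before=no
chmod 775 "$demo"   # drwxrwxr-x, as in the listing after the fix
others_can_enter "$demo" && after=yes || after=no
echo "before=$before after=$after"   # prints: before=no after=yes
rmdir "$demo"
```

The thread used 775. On many distributions the usual mode for this directory is 755, which grants the same search bit without group write access.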

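OK, feel free to open a new thread to discuss the new problem. Thanks for your cooperation.

I'm the one who should be thanking you, for answering so patiently all along :tulip: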

:smiling_face_with_three_hearts::smiling_face_with_three_hearts::smiling_face_with_three_hearts: