AWS deployment error

【TiDB Environment】Production / Test / PoC
【TiDB Version】8.5.4
【Operating System】Amazon Linux 2023
【Deployment Method】AWS
【Cluster Data Size】20 GB
【Cluster Node Count】8
【Reproduction Steps】Operations performed before the problem appeared
【Problem: Symptoms and Impact】
[ec2-user@tidb-adm ~]$ tiup cluster deploy tidb-test v8.5.4 ./topology.yaml --user ec2-user -i ./wazuh.pem

  • Detect CPU Arch Name

    • Detecting node 172.244.0.143 Arch info … Done
    • Detecting node 172.244.0.146 Arch info … Done
    • Detecting node 172.244.0.165 Arch info … Done
    • Detecting node 172.244.0.241 Arch info … Done
    • Detecting node 172.244.0.135 Arch info … Done
    • Detecting node 172.244.0.151 Arch info … Done
    • Detecting node 172.244.0.50 Arch info … Done
    • Detecting node 172.244.0.55 Arch info … Done
  • Detect CPU OS Name

    • Detecting node 172.244.0.143 OS info … Done
    • Detecting node 172.244.0.146 OS info … Done
    • Detecting node 172.244.0.165 OS info … Done
    • Detecting node 172.244.0.241 OS info … Done
    • Detecting node 172.244.0.135 OS info … Done
    • Detecting node 172.244.0.151 OS info … Done
    • Detecting node 172.244.0.50 OS info … Done
    • Detecting node 172.244.0.55 OS info … Done
      Please confirm your topology:
      Cluster type: tidb
      Cluster name: tidb-test
      Cluster version: v8.5.4
      Role          Host           Ports                            OS/Arch       Directories
      ----          ----           -----                            -------       -----------
      pd            172.244.0.143  2379/2380                        linux/x86_64  /tidb-deploy/pd-2379,/tidb-data/pd-2379
      pd            172.244.0.146  2379/2380                        linux/x86_64  /tidb-deploy/pd-2379,/tidb-data/pd-2379
      pd            172.244.0.165  2379/2380                        linux/x86_64  /tidb-deploy/pd-2379,/tidb-data/pd-2379
      tikv          172.244.0.241  20160/20180                      linux/x86_64  /tidb-deploy/tikv-20160,/tidb-data/tikv-20160
      tikv          172.244.0.135  20160/20180                      linux/x86_64  /tidb-deploy/tikv-20160,/tidb-data/tikv-20160
      tikv          172.244.0.151  20160/20180                      linux/x86_64  /tidb-deploy/tikv-20160,/tidb-data/tikv-20160
      tidb          172.244.0.143  4000/10080                       linux/x86_64  /tidb-deploy/tidb-4000
      tidb          172.244.0.146  4000/10080                       linux/x86_64  /tidb-deploy/tidb-4000
      tiflash       172.244.0.50   9000/3930/20170/20292/8234/8123  linux/x86_64  /tidb-deploy/tiflash-9000,/tidb-data/tiflash-9000
      prometheus    172.244.0.55   9090/9115/9100/12020             linux/x86_64  /tidb-deploy/prometheus-9090,/tidb-data/prometheus-9090
      grafana       172.244.0.55   3000                             linux/x86_64  /tidb-deploy/grafana-3000
      alertmanager  172.244.0.55   9093/9094                        linux/x86_64  /tidb-deploy/alertmanager-9093,/tidb-data/alertmanager-9093
Attention:
1. If the topology is not what you expected, check your yaml file.
2. Please confirm there is no port/directory conflicts in same host.
Do you want to continue? [y/N]: (default=N) y

  • Generate SSH keys … Done

  • Download TiDB components

    • Download pd:v8.5.4 (linux/amd64) … Done
    • Download tikv:v8.5.4 (linux/amd64) … ⠴ Download: component=tikv, version=v8.5.4, os=linux, arch=amd64
    • Download tidb:v8.5.4 (linux/amd64) … Done
    • Download tiflash:v8.5.4 (linux/amd64) … ⠴ Download: component=tiflash, version=v8.5.4, os=linux, arch=amd64
    • Download prometheus:v8.5.4 (linux/amd64) … Done
    • Download grafana:v8.5.4 (linux/amd64) … Done
    • Download alertmanager: (linux/amd64) … ⠴ Download: component=alertmanager, version=, os=linux, arch=amd64
    • Download node_exporter: (linux/amd64) … ⠴ Download: component=node_exporter, version=, os=linux, arch=amd64
    • Download blackbox_exporter: (linux/amd64) … ⠴ Download: component=blackbox_exporter, version=, os=linux, arch=amd64
      download https://tiup-mirrors.pingcap.com/tikv-v8.5.4-linux-amd64.tar.gz 1.84 MiB / 365.44 MiB 0.50% ? MiB/s
[ec2-user@tidb-adm ~]$ tiup cluster deploy tidb-test v8.5.4 ./topology.yaml --user root -i ./wazuh.pem
  • Detect CPU Arch Name

    • Detecting node 172.244.0.143 Arch info … Error
    • Detecting node 172.244.0.146 Arch info … Error
    • Detecting node 172.244.0.165 Arch info … Error
    • Detecting node 172.244.0.241 Arch info … Error
    • Detecting node 172.244.0.135 Arch info … Error
    • Detecting node 172.244.0.151 Arch info … Error
    • Detecting node 172.244.0.50 Arch info … Error
    • Detecting node 172.244.0.55 Arch info … Error

Error: failed to fetch cpu-arch or kernel-name: executor.ssh.execute_failed: Failed to execute command over SSH for 'root@172.244.0.146:22' {ssh_stderr: , ssh_stdout: Please login as the user "ec2-user" rather than the user "root".
, ssh_command: export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin; uname -m}, cause: Process exited with status 142

Verbose debug logs has been written to /home/ec2-user/.tiup/logs/tiup-cluster-debug-2025-12-17-10-53-06.log.
[ec2-user@tidb-adm ~]$ tiup cluster list
Name User Version Path PrivateKey


/home/ec2-user/.tiup/logs/tiup-cluster-debug-2025-12-17-10-53-06.log:
2025-12-17T10:53:06.553Z INFO Execute command finished {"code": 1, "error": "failed to fetch cpu-arch or kernel-name: executor.ssh.execute_failed: Failed to execute command over SSH for 'root@172.244.0.146:22' {ssh_stderr: , ssh_stdout: Please login as the user \"ec2-user\" rather than the user \"root\".\n, ssh_command: export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin; uname -m}, cause: Process exited with status 142", "errorVerbose": "executor.ssh.execute_failed: Failed to execute command over SSH for 'root@172.244.0.146:22' {ssh_stderr: , ssh_stdout: Please login as the user \"ec2-user\" rather than the user \"root\".\n, ssh_command: export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin; uname -m}, cause: Process exited with status 142\n at github.com/pingcap/tiup/pkg/cluster/executor.(*EasySSHExecutor).Execute()\n\tgithub.com/pingcap/tiup/pkg/cluster/executor/ssh.go:174\n at github.com/pingcap/tiup/pkg/cluster/executor.(*CheckPointExecutor).Execute()\n\tgithub.com/pingcap/tiup/pkg/cluster/executor/checkpoint.go:86\n at github.com/pingcap/tiup/pkg/cluster/task.(*Shell).Execute()\n\tgithub.com/pingcap/tiup/pkg/cluster/task/shell.go:43\n at github.com/pingcap/tiup/pkg/cluster/task.(*Serial).Execute()\n\tgithub.com/pingcap/tiup/pkg/cluster/task/task.go:86\n at github.com/pingcap/tiup/pkg/cluster/task.(*StepDisplay).Execute()\n\tgithub.com/pingcap/tiup/pkg/cluster/task/step.go:111\n at github.com/pingcap/tiup/pkg/cluster/task.(*Parallel).Execute.func1()\n\tgithub.com/pingcap/tiup/pkg/cluster/task/task.go:144\n at runtime.goexit()\n\truntime/asm_amd64.s:1700\ngithub.com/pingcap/errors.Trace\n\tgithub.com/pingcap/errors@v0.11.5-0.20250523034308-74f78ae071ee/juju_adaptor.go:15\ngithub.com/pingcap/tiup/pkg/cluster/task.(*Shell).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/task/shell.go:50\ngithub.com/pingcap/tiup/pkg/cluster/task.(*Serial).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/task/task.go:86\ngithub.com/pingcap/tiup/pkg/cluster/task.(*StepDisplay).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/task/step.go:111\ngithub.com/pingcap/tiup/pkg/cluster/task.(*Parallel).Execute.func1\n\tgithub.com/pingcap/tiup/pkg/cluster/task/task.go:144\nruntime.goexit\n\truntime/asm_amd64.s:1700\nfailed to fetch cpu-arch or kernel-name"}
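The actionable part of that stack trace is the ssh_stdout text buried inside the wrapped error. A minimal shell sketch that pulls it out; the sample line is copied from the log above:

```shell
# Sample fragment copied from the tiup debug log above.
line='ssh_stdout: Please login as the user "ec2-user" rather than the user "root".'

# Extract just the login hint that AWS injects into root's authorized_keys.
hint=$(printf '%s' "$line" | grep -o 'Please login as the user "[^"]*"')
echo "$hint"
```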

The AWS host user ec2-user does have sudo privileges, so why does the output show sudo=false? Is there a problem with the script?

The sudo=false flag and the login error message are not a script problem; they reflect a common restriction on AWS EC2 hosts. I believe Amazon Linux 2023 disables direct SSH login as root by default.
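For background: on Amazon Linux this restriction comes from cloud-init rather than sshd itself. A sketch of the relevant defaults in /etc/cloud/cloud.cfg (abridged; exact values can differ per AMI):

```yaml
# /etc/cloud/cloud.cfg (Amazon Linux, abridged sketch)
disable_root: true        # cloud-init prefixes root's authorized_keys with a
                          # command= stub that prints the "Please login as the
                          # user ec2-user" message and closes the session
system_info:
  default_user:
    name: ec2-user        # the account the message tells you to use instead
```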


How do I get around this?


which sudo
env | grep -i sudo
sudo whoami
Run these and share the results.


You installed TiDB with the ec2-user account? Try switching to root.



Solved it.
It looks like the host specs were too low, which caused the failure.


Did you switch to a different machine, or change the instance configuration directly in the cloud console?


Switched to a new host; upgrading the instance configuration would also have worked.
# Clean up the old installation
tiup cluster clean tidb-test --all


Swapped the machine and cleaned up the old installation, and that solved it. Nice work; I learned something from this too.


Is tiup cluster clean tidb-test --all the cleanup command? Good to know.

On AWS, deploy as ec2-user (which has sudo privileges); direct root login is not allowed. When re-running the deploy command, specify the non-root user explicitly (--user ec2-user) and rely on its sudo access.
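In practice that means keeping --user ec2-user on the command line and not overriding it in the topology. A minimal topology.yaml sketch (hosts taken from this thread; everything else is illustrative):

```yaml
global:
  user: "ec2-user"          # SSH as ec2-user; tiup escalates with sudo where needed
  ssh_port: 22
  deploy_dir: "/tidb-deploy"
  data_dir: "/tidb-data"

pd_servers:
  - host: 172.244.0.143
tikv_servers:
  - host: 172.244.0.241
tidb_servers:
  - host: 172.244.0.143
```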

Also, from the log the download progress for large components such as TiKV and TiFlash is very low. That could be insufficient network bandwidth, slow access to the TiUP mirror, or the terminal session being interrupted during deployment.

This looks like a permissions problem. Were you using root?

tiup can't fetch basic host information, such as the OS kernel and CPU architecture.

Clean up the installation first: tiup cluster clean tidb-test --all