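【TiDB environment】Test
【TiDB version】
【Operating system】
【Deployment method】Cloud deployment (which cloud) / on-premises deployment (machine specs, disk type)
【Cluster data volume】
【Number of cluster nodes】
【Steps to reproduce】Which operations led to the problem
【Problem encountered: symptoms and impact】
【Resource configuration】Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
【Copy and paste the ERROR log】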
[2025/07/16 17:03:21.885 +08:00] [FATAL] [terror.go:309] ["unexpected error"] [error="[tikv:9001]PD server timeout: "] [stack="github.com/pingcap/tidb/pkg/parser/terror.MustNil\n\t/workspace/source/tidb/pkg/parser/terror/terror.go:309\nmain.createStoreDDLOwnerMgrAndDomain\n\t/workspace/source/tidb/cmd/tidb-server/main.go:417\nmain.main\n\t/workspace/source/tidb/cmd/tidb-server/main.go:320\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:272"] [stack="github.com/pingcap/tidb/pkg/parser/terror.MustNil\n\t/workspace/source/tidb/pkg/parser/terror/terror.go:309\nmain.createStoreDDLOwnerMgrAndDomain\n\t/workspace/source/tidb/cmd/tidb-server/main.go:417\nmain.main\n\t/workspace/source/tidb/cmd/tidb-server/main.go:320\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:272"]
【Other attachments: screenshots / logs / monitoring】
tidb小白
Well, this error mainly means the PD server timed out. Check the PD deployment log.
tiup playground v6.5.8 --host 0.0.0.0 --without-monitor --tiflash 0
Try playground with the command above first to get a feel for it.
zhanggame1
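If this is a cloud deployment, could the ports between the virtual machines be blocked? You may need to open them.
That doesn't work either.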
[root@db01 ~]# tiup playground v6.5.8 --host 0.0.0.0 --without-monitor --tiflash 0
The component `pd` version v6.5.8 is not installed; downloading from repository.
download https://tiup-mirrors.pingcap.com/pd-v6.5.8-linux-amd64.tar.gz 46.46 MiB / 46.46 MiB 100.00% 37.04 MiB/s
Start pd instance: v6.5.8
The component `tikv` version v6.5.8 is not installed; downloading from repository.
download https://tiup-mirrors.pingcap.com/tikv-v6.5.8-linux-amd64.tar.gz 254.01 MiB / 254.01 MiB 100.00% 18.60 MiB/s
Start tikv instance: v6.5.8
The component `tidb` version v6.5.8 is not installed; downloading from repository.
download https://tiup-mirrors.pingcap.com/tidb-v6.5.8-linux-amd64.tar.gz 67.89 MiB / 67.89 MiB 100.00% 24.51 MiB/s
Start tidb instance: v6.5.8
Waiting for tidb instances ready
- TiDB: 192.168.0.113:4000 … Error
pd quit: exit status 1
[2025/07/16 17:55:46.576 +08:00] [INFO] [raft.go:389] ["newRaft 1a9aae1034f4b4c4 [peers: , term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]"]
[2025/07/16 17:55:46.576 +08:00] [INFO] [raft.go:706] ["1a9aae1034f4b4c4 became follower at term 1"]
[2025/07/16 17:55:46.576 +08:00] [INFO] [raft.go:1523] ["1a9aae1034f4b4c4 switched to configuration voters=(1917035976030729412)"]
[2025/07/16 17:55:53.854 +08:00] [WARN] [store.go:1379] ["simple token is not cryptographically signed"]
[2025/07/16 17:56:02.224 +08:00] [INFO] [quota.go:126] ["enabled backend quota"] [quota-name=v3-applier] [quota-size-bytes=8589934592] [quota-size="8.6 GB"]
[2025/07/16 17:56:14.188 +08:00] [INFO] [server.go:816] ["starting etcd server"] [local-member-id=1a9aae1034f4b4c4] [local-server-version=3.4.21] [cluster-version=to_be_decided]
[2025/07/16 17:56:14.188 +08:00] [INFO] [server.go:682] ["started as single-node; fast-forwarding election ticks"] [local-member-id=1a9aae1034f4b4c4] [forward-ticks=5] [forward-duration=2.5s] [election-ticks=6] [election-timeout=3s]
[2025/07/16 17:56:14.192 +08:00] [INFO] [etcd.go:585] ["serving peer traffic"] [address="[::]:12124"]
[2025/07/16 17:56:14.192 +08:00] [INFO] [etcd.go:247] ["now serving peer/client/metrics"] [local-member-id=1a9aae1034f4b4c4] [initial-advertise-peer-urls="[http://192.168.0.113:12124]"] [listen-peer-urls="[http://0.0.0.0:12124]"] [advertise-client-urls="[http://192.168.0.113:26210]"] [listen-client-urls="[http://0.0.0.0:26210]"] [listen-metrics-urls=""]
[2025/07/16 17:56:14.193 +08:00] [FATAL] [main.go:120] ["run server failed"] [error="[PD:server:ErrCancelStartEtcd]etcd start canceled"] [stack="main.main\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/pd/cmd/pd-server/main.go:120\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"]
…
check detail log from: /root/.tiup/data/Ur6aDc8/pd-0/pd.log
tidb quit: exit status 1
[2025/07/16 17:57:23.286 +08:00] [WARN] [base_client.go:258] [“[pd] failed to get cluster id”] [url=http://192.168.0.113:26210] [error=“[PD:client:ErrClientGetMember]error:rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 192.168.0.113:26210: connect: connection refused" target:192.168.0.113:26210 status:TRANSIENT_FAILURE: error:rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 192.168.0.113:26210: connect: connection refused" target:192.168.0.113:26210 status:TRANSIENT_FAILURE”]
[2025/07/16 17:57:24.286 +08:00] [WARN] [base_client.go:258] [“[pd] failed to get cluster id”] [url=http://192.168.0.113:26210] [error=“[PD:client:ErrClientGetMember]error:rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 192.168.0.113:26210: connect: connection refused" target:192.168.0.113:26210 status:TRANSIENT_FAILURE: error:rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 192.168.0.113:26210: connect: connection refused" target:192.168.0.113:26210 status:TRANSIENT_FAILURE”]
[2025/07/16 17:57:25.287 +08:00] [WARN] [base_client.go:258] [“[pd] failed to get cluster id”] [url=http://192.168.0.113:26210] [error=“[PD:client:ErrClientGetMember]error:rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 192.168.0.113:26210: connect: connection refused" target:192.168.0.113:26210 status:TRANSIENT_FAILURE: error:rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 192.168.0.113:26210: connect: connection refused" target:192.168.0.113:26210 status:TRANSIENT_FAILURE”]
[2025/07/16 17:57:26.287 +08:00] [WARN] [base_client.go:258] [“[pd] failed to get cluster id”] [url=http://192.168.0.113:26210] [error=“[PD:client:ErrClientGetMember]error:rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 192.168.0.113:26210: connect: connection refused" target:192.168.0.113:26210 status:TRANSIENT_FAILURE: error:rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 192.168.0.113:26210: connect: connection refused" target:192.168.0.113:26210 status:TRANSIENT_FAILURE”]
[2025/07/16 17:57:27.288 +08:00] [WARN] [base_client.go:258] [“[pd] failed to get cluster id”] [url=http://192.168.0.113:26210] [error=“[PD:client:ErrClientGetMember]error:rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 192.168.0.113:26210: connect: connection refused" target:192.168.0.113:26210 status:TRANSIENT_FAILURE: error:rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 192.168.0.113:26210: connect: connection refused" target:192.168.0.113:26210 status:TRANSIENT_FAILURE”]
[2025/07/16 17:57:28.289 +08:00] [WARN] [base_client.go:258] [“[pd] failed to get cluster id”] [url=http://192.168.0.113:26210] [error=“[PD:client:ErrClientGetMember]error:rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 192.168.0.113:26210: connect: connection refused" target:192.168.0.113:26210 status:TRANSIENT_FAILURE: error:rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 192.168.0.113:26210: connect: connection refused" target:192.168.0.113:26210 status:TRANSIENT_FAILURE”]
[2025/07/16 17:57:29.290 +08:00] [WARN] [base_client.go:258] [“[pd] failed to get cluster id”] [url=http://192.168.0.113:26210] [error=“[PD:client:ErrClientGetMember]error:rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 192.168.0.113:26210: connect: connection refused" target:192.168.0.113:26210 status:TRANSIENT_FAILURE: error:rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 192.168.0.113:26210: connect: connection refused" target:192.168.0.113:26210 status:TRANSIENT_FAILURE”]
[2025/07/16 17:57:30.291 +08:00] [WARN] [base_client.go:258] [“[pd] failed to get cluster id”] [url=http://192.168.0.113:26210] [error=“[PD:client:ErrClientGetMember]error:rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 192.168.0.113:26210: connect: connection refused" target:192.168.0.113:26210 status:TRANSIENT_FAILURE: error:rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 192.168.0.113:26210: connect: connection refused" target:192.168.0.113:26210 status:TRANSIENT_FAILURE”]
[2025/07/16 17:57:31.292 +08:00] [WARN] [store.go:83] ["new store with retry failed"] [error="[pd] failed to get cluster id"]
[2025/07/16 17:57:31.292 +08:00] [FATAL] [terror.go:300] ["unexpected error"] [error="[pd] failed to get cluster id"] [stack="github.com/pingcap/tidb/parser/terror.MustNil\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/parser/terror/terror.go:300\nmain.createStoreAndDomain\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/tidb-server/main.go:317\nmain.main\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/tidb-server/main.go:219\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"] [stack="github.com/pingcap/tidb/parser/terror.MustNil\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/parser/terror/terror.go:300\nmain.createStoreAndDomain\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/tidb-server/main.go:317\nmain.main\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/tidb-server/main.go:219\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"]
…
check detail log from: /root/.tiup/data/Ur6aDc8/tidb-0/tidb.log
PD failed to start. What are this server's resource specs? Also check whether port 2379 on this node is already in use.
Run netstat -anp|grep 2379 to check.
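The playground log above shows PD advertising its client URL on 192.168.0.113:26210 and its peer URL on port 12124 rather than 2379, so it may be worth checking those ports too. Below is a minimal sketch, assuming `ss` from iproute2 is installed. The helper name `port_listening` is hypothetical, and it only parses whatever `ss -ltn` text it is given:

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `ss -ltn` output on stdin and succeeds
# if some socket is listening on the given local port.
port_listening() {
  local port="$1"
  # Column 4 of `ss -ltn` is Local Address:Port, e.g. 0.0.0.0:2379 or [::]:2379.
  # The port is whatever follows the last colon.
  awk -v p="$port" '
    NR > 1 { n = split($4, a, ":"); if (a[n] == p) found = 1 }
    END { exit !found }
  '
}

# Ports taken from this thread: the default PD client port,
# plus the client and peer ports seen in the PD log.
for port in 2379 26210 12124; do
  if ss -ltn | port_listening "$port"; then
    echo "port $port: in use"
  else
    echo "port $port: free"
  fi
done
```

A port reported as free after playground has already exited only shows that nothing else is holding it now; it does not show what happened during the failed start.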
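1. The path here is just the source-code location where the error was raised; it does not mean that github.com needs to be reachable.
2. Judging from the error output the original poster pasted, the PD node (the brain and scheduler of the cluster) most likely did not start properly, so the rest of the cluster could not be deployed.
[2025/07/16 17:56:14.193 +08:00] [FATAL] [main.go:120] ["run server failed"] [error="[PD:server:ErrCancelStartEtcd]etcd start canceled"] [stack="main.main\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/pd/cmd/pd-server/main.go:120\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"]
3. Please log in to machine 192.168.0.113 and confirm that the port is being served normally, and check for problems such as a port conflict or a firewall blocking it.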
[url=http://192.168.0.113:26210] [error=“[PD:client:ErrClientGetMember]error:rpc error: code = Unavailable desc = connection error: desc = “transport: Error while dialing dial tcp 192.168.0.113:26210: connect: connection refused” target:192.168.0.113:26210 status:TRANSIENT_FAILURE: error:rpc error: code = Unavailable desc = connection error: desc = “transport: Error while dialing dial tcp 192.168.0.113:26210: connect: connection refused” target:192.168.0.113:26210 status:TRANSIENT_FAILURE”]
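To see what led up to the `etcd start canceled` FATAL, it can help to pull only the WARN, ERROR, and FATAL entries out of the PD log that tiup points to. Below is a minimal sketch. The function name `filter_log_levels` is hypothetical, and the path is the one printed in the tiup output earlier in this thread, which changes on every playground run:

```shell
#!/usr/bin/env bash
# Hypothetical helper: keeps only log entries whose level field
# (the second bracketed field) is WARN, ERROR, or FATAL.
filter_log_levels() {
  grep -E '^\[[^]]*\] \[(WARN|ERROR|FATAL)\]'
}

# Path copied from the tiup output in this thread; adjust it for your own run.
pd_log=/root/.tiup/data/Ur6aDc8/pd-0/pd.log
if [ -r "$pd_log" ]; then
  # Show the last 20 matching entries, closest to the failure.
  filter_log_levels < "$pd_log" | tail -n 20
else
  echo "log file not found: $pd_log" >&2
fi
```

The same filter can be pointed at the tidb.log path from the tiup output to compare timestamps between the two components.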
[root@db01 ~]# free -h
              total        used        free      shared  buff/cache   available
Mem:           9.8G        3.3G        4.9G         11M        1.6G        2.6G
Swap:            0B          0B          0B
[root@db01 ~]# df -TH
Filesystem                   Type      Size  Used Avail Use% Mounted on
devtmpfs                     devtmpfs  5.3G     0  5.3G   0% /dev
tmpfs                        tmpfs     5.3G     0  5.3G   0% /dev/shm
tmpfs                        tmpfs     5.3G  9.8M  5.3G   1% /run
tmpfs                        tmpfs     5.3G     0  5.3G   0% /sys/fs/cgroup
/dev/mapper/centos-root      xfs        33G   17G   16G  53% /
/dev/sda1                    xfs       1.1G  180M  884M  17% /boot
/dev/mapper/centos-data_1    xfs        76G   65G   11G  86% /data/1
/dev/mapper/centos-home      xfs        54G  1.9G   52G   4% /home
/dev/mapper/centos-data_log1 xfs        43G   39G  4.3G  90% /data/log1
tmpfs                        tmpfs     1.1G   13k  1.1G   1% /run/user/42
tmpfs                        tmpfs     1.1G     0  1.1G   0% /run/user/500
tmpfs                        tmpfs     1.1G     0  1.1G   0% /run/user/0
[root@db01 ~]# netstat -anp|grep 2379
[root@db01 ~]#
[root@db01 ~]# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
Active: inactive (dead)
Docs: man:firewalld(1)
[root@db01 ~]#
[root@db01 ~]# sestatus
SELinux status: disabled
[root@db01 ~]#
[root@db01 ~]#