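One of the three PD nodes fails to start, while the other two work normally. During a mydumper backup, an error reports that data cannot be fetched from this node, and some tables fail to back up.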

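To make troubleshooting more efficient, please provide the following information; a clearly described problem gets resolved faster: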

[TiDB version]
TiDB Version: v4.0.3
[Problem description]
The PD node fails to start and reports the following errors at startup:

pderr.log shows this error every minute:
goroutine 323 [running]:
go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc0001451e0, 0xc00037ac00, 0x4, 0x4)
/home/jenkins/agent/workspace/build_pd_multi_branch_v4.0.3/go/pkg/mod/go.uber.org/zap@v1.13.0/zapcore/entry.go:230 +0x546
go.uber.org/zap.(*Logger).Panic(0xc000108ea0, 0x23a464a, 0x29, 0xc00037ac00, 0x4, 0x4)
/home/jenkins/agent/workspace/build_pd_multi_branch_v4.0.3/go/pkg/mod/go.uber.org/zap@v1.13.0/logger.go:225 +0x7f
go.etcd.io/etcd/mvcc.(*keyIndex).put(0xc000106c00, 0xc000108ea0, 0x42365ab, 0x0)
/home/jenkins/agent/workspace/build_pd_multi_branch_v4.0.3/go/pkg/mod/go.etcd.io/etcd@v0.5.0-alpha.5.0.20191023171146-3cf2f69b5738/mvcc/key_index.go:82 +0x5fa
go.etcd.io/etcd/mvcc.restoreIntoIndex.func1(0xc0001041c0, 0xc000479a40, 0x2c689e0, 0xc000b25710, 0xc000108ea0)
/home/jenkins/agent/workspace/build_pd_multi_branch_v4.0.3/go/pkg/mod/go.etcd.io/etcd@v0.5.0-alpha.5.0.20191023171146-3cf2f69b5738/mvcc/kvstore.go:505 +0x3e3
created by go.etcd.io/etcd/mvcc.restoreIntoIndex
/home/jenkins/agent/workspace/build_pd_multi_branch_v4.0.3/go/pkg/mod/go.etcd.io/etcd@v0.5.0-alpha.5.0.20191023171146-3cf2f69b5738/mvcc/kvstore.go:473 +0xaa

Running tiup cluster start cluster_name -N ip:port fails with:

retry error: operation timed out after 2m0s
pd 10.133.1.70:2379 failed to start: timed out waiting for port 2379 to be started after 2m0s, please check the log of the instance

Error: failed to restart: failed to start: failed to start pd: pd 10.133.1.70:2379 failed to start: timed out waiting for port 2379 to be started after 2m0s, please check the log of the instance: timed out waiting for port 2379 to be started after 2m0s

Verbose debug logs has been written to /DATA1/home/tidb/logs/tiup-cluster-debug-2021-05-10-15-23-03.log.
Error: run `` (wd:/DATA1/home/tidb/.tiup/data/SWzTcIG) failed: exit status 1

IP Active: activating (auto-restart) (Result: exit-code) since Mon 2021-05-10 14:49:05 CST; 1s ago

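One more problem:
PD has three replicas. One cannot start and the others are fine, but during a mydumper backup some tables report an error, because the backup-related data cannot be fetched from the PD node that is down.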
Error dumping table (SCHEMA.table) data: query metric error: Get http:/IP:port/pd/api/v1/config: dial tcp IP:port: connect: connection refused

Thanks, please help take a look.



1. What are the related error messages in pd.log?
2. If this is urgent, consider scaling in this PD node first and then scaling it back out into the cluster.
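For the first point, a minimal Python sketch for pulling the severe entries out of pd.log. It assumes the bracketed layout `[time] [LEVEL] [file:line] ["message"] [key=value]...`; the names `LINE_RE`, `LogEntry`, and `severe_entries` are made up for this illustration and are not part of any TiDB tooling.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Tuple

# Assumed layout: [timestamp] [LEVEL] [file:line] ["message"] [key=value]...
LINE_RE = re.compile(
    r'^\[(?P<ts>[^\]]+)\] \[(?P<level>[A-Z]+)\] '
    r'\[(?P<source>[^\]]+)\] \[(?P<msg>[^\]]*)\]'
)


class LogEntry(NamedTuple):
    timestamp: str
    level: str
    source: str
    message: str


def severe_entries(
    lines: Iterable[str],
    levels: Tuple[str, ...] = ("ERROR", "FATAL", "PANIC"),
) -> Iterator[LogEntry]:
    """Yield entries whose level is in `levels`; skip lines that do not match."""
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group("level") in levels:
            # Strip straight or typographic quotes around the message text.
            message = m.group("msg").strip('"\u201c\u201d')
            yield LogEntry(m.group("ts"), m.group("level"),
                           m.group("source"), message)
```

Feeding an open file object for pd.log to `severe_entries` and printing each result gives a short list of the lines worth posting. Goroutine dumps and other unstructured lines are skipped because they do not match the pattern.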

[2021/05/12 14:34:28.619 +08:00] [INFO] [server.go:184] ["register REST path"] [path=/pd/api/v1]
[2021/05/12 14:34:28.619 +08:00] [INFO] [server.go:184] ["register REST path"] [path=/swagger/]
[2021/05/12 14:34:28.620 +08:00] [INFO] [server.go:184] ["register REST path"] [path=/dashboard/api/]
[2021/05/12 14:34:28.620 +08:00] [INFO] [server.go:184] ["register REST path"] [path=/dashboard/]
[2021/05/12 14:34:28.620 +08:00] [INFO] [systime_mon.go:26] ["start system time monitor"]
[2021/05/12 14:34:28.620 +08:00] [INFO] [etcd.go:117] ["configuring peer listeners"] [listen-peer-urls="[http://10.133.1.70:2380]"]
[2021/05/12 14:34:28.620 +08:00] [INFO] [etcd.go:127] ["configuring client listeners"] [listen-client-urls="[http://0.0.0.0:2379]"]
[2021/05/12 14:34:28.620 +08:00] [INFO] [etcd.go:602] ["pprof is enabled"] [path=/debug/pprof]
[2021/05/12 14:34:28.621 +08:00] [INFO] [etcd.go:299] ["starting an etcd server"] [etcd-version=3.4.3] [git-sha="Not provided (use ./build instead of go build)"] [go-version=go1.13] [go-os=linux] [go-arch=amd64] [max-cpu-set=56] [max-cpu-available=56] [member-initialized=true] [name=pd-10.133.1.70-2379] [data-dir=/DATA1/home/tidb/tidb-data/pd-2379] [wal-dir=] [wal-dir-dedicated=] [member-dir=/DATA1/home/tidb/tidb-data/pd-2379/member] [force-new-cluster=false] [heartbeat-interval=500ms] [election-timeout=3s] [initial-election-tick-advance=true] [snapshot-count=100000] [snapshot-catchup-entries=5000] [initial-advertise-peer-urls="[http://10.133.1.70:2380]"] [listen-peer-urls="[http://10.133.1.70:2380]"] [advertise-client-urls="[http://10.133.1.70:2379]"] [listen-client-urls="[http://0.0.0.0:2379]"] [listen-metrics-urls="[]"] [cors="[]"] [host-whitelist="[]"] [initial-cluster=] [initial-cluster-state=new] [initial-cluster-token=] [quota-size-bytes=8589934592] [pre-vote=true] [initial-corrupt-check=false] [corrupt-check-time-interval=0s] [auto-compaction-mode=periodic] [auto-compaction-retention=1h0m0s] [auto-compaction-interval=1h0m0s] [discovery-url=] [discovery-proxy=]
[2021/05/12 14:34:28.657 +08:00] [INFO] [backend.go:79] ["opened backend db"] [path=/DATA1/home/tidb/tidb-data/pd-2379/member/snap/db] [took=36.032964ms]
[2021/05/12 14:34:28.661 +08:00] [INFO] [server.go:443] ["recovered v2 store from snapshot"] [snapshot-index=69500703] [snapshot-size="17 kB"]
[2021/05/12 14:34:28.661 +08:00] [INFO] [kvstore.go:378] ["restored last compact revision"] [meta-bucket-name=meta] [meta-bucket-name-key=finishedCompactRev] [restored-compact-revision=69465608]
[2021/05/12 14:34:28.682 +08:00] [FATAL] [kvstore.go:521] ["failed to unmarshal mvccpb.KeyValue"] [error="proto: wrong wireType = 1 for field Lease"] [stack="go.etcd.io/etcd/mvcc.restoreChunk\n\t/home/jenkins/agent/workspace/build_pd_multi_branch_v4.0.3/go/pkg/mod/go.etcd.io/etcd@v0.5.0-alpha.5.0.20191023171146-3cf2f69b5738/mvcc/kvstore.go:521\ngo.etcd.io/etcd/mvcc.(*store).restore\n\t/home/jenkins/agent/workspace/build_pd_multi_branch_v4.0.3/go/pkg/mod/go.etcd.io/etcd@v0.5.0-alpha.5.0.20191023171146-3cf2f69b5738/mvcc/kvstore.go:404\ngo.etcd.io/etcd/mvcc.NewStore\n\t/home/jenkins/agent/workspace/build_pd_multi_branch_v4.0.3/go/pkg/mod/go.etcd.io/etcd@v0.5.0-alpha.5.0.20191023171146-3cf2f69b5738/mvcc/kvstore.go:156\ngo.etcd.io/etcd/mvcc.newWatchableStore\n\t/home/jenkins/agent/workspace/build_pd_multi_branch_v4.0.3/go/pkg/mod/go.etcd.io/etcd@v0.5.0-alpha.5.0.20191023171146-3cf2f69b5738/mvcc/watchable_store.go:78\ngo.etcd.io/etcd/mvcc.New\n\t/home/jenkins/agent/workspace/build_pd_multi_branch_v4.0.3/go/pkg/mod/go.etcd.io/etcd@v0.5.0-alpha.5.0.20191023171146-3cf2f69b5738/mvcc/watchable_store.go:73\ngo.etcd.io/etcd/etcdserver.recoverSnapshotBackend\n\t/home/jenkins/agent/workspace/build_pd_multi_branch_v4.0.3/go/pkg/mod/go.etcd.io/etcd@v0.5.0-alpha.5.0.20191023171146-3cf2f69b5738/etcdserver/backend.go:105\ngo.etcd.io/etcd/etcdserver.NewServer\n\t/home/jenkins/agent/workspace/build_pd_multi_branch_v4.0.3/go/pkg/mod/go.etcd.io/etcd@v0.5.0-alpha.5.0.20191023171146-3cf2f69b5738/etcdserver/server.go:452\ngo.etcd.io/etcd/embed.StartEtcd\n\t/home/jenkins/agent/workspace/build_pd_multi_branch_v4.0.3/go/pkg/mod/go.etcd.io/etcd@v0.5.0-alpha.5.0.20191023171146-3cf2f69b5738/embed/etcd.go:211\ngithub.com/pingcap/pd/v4/server.(*Server).startEtcd\n\t/home/jenkins/agent/workspace/build_pd_multi_branch_v4.0.3/go/src/github.com/pingcap/pd/server/server.go:259\ngithub.com/pingcap/pd/v4/server.(*Server).Run\n\t/home/jenkins/agent/workspace/build_pd_multi_branch_v4.0.3/go/src/github.com/pingcap/pd/server/server.go:443\nmain.main\n\t/home/jenkins/agent/workspace/build_pd_multi_branch_v4.0.3/go/src/github.com/pingcap/pd/cmd/pd-server/main.go:118\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203"]

The pd.log output is shown above; it repeats roughly every half minute.

1. Please post the member information reported by the pd-ctl tool;
2. Check whether the tikv logs contain any error messages related to this PD.
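For the first point, a rough Python sketch that cross-checks the JSON printed by the pd-ctl `member` and `health` subcommands and lists members that are not reported healthy. The JSON shapes used here (a top-level `members` array with a `name` field, and a health array of objects with `name` and `health`) are assumptions about the output and should be checked against what your pd-ctl version actually prints; `unhealthy_members` is a hypothetical helper name.

```python
import json
from typing import List


def unhealthy_members(members_json: str, health_json: str) -> List[str]:
    """Return names of members that are not reported as healthy.

    Assumed shapes (verify against real pd-ctl output):
      members_json: {"members": [{"name": "..."}, ...], ...}
      health_json:  [{"name": "...", "health": true}, ...]
    A member missing from the health list is treated as unhealthy.
    """
    members = json.loads(members_json).get("members", [])
    health = {
        item["name"]: bool(item.get("health"))
        for item in json.loads(health_json)
    }
    return [m["name"] for m in members if not health.get(m["name"], False)]
```

Save the output of the two subcommands to files, read them as strings, and pass them in; the returned names are the members to compare against the PD-related errors in the tikv logs.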