PD fails to start after its host rebooted unexpectedly: tocommit(34008955) is out of range


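  • [TiDB version]: 5.7.10-TiDB-v2.0.2
  • [Cluster topology]: 3 TiDB, 3 TiKV, and 3 PD instances spread across 3 hosts, deployed with Docker
  • [Problem description]: One host shut down unexpectedly and stayed off for a while. It was running one TiDB, one TiKV, and one PD instance. After the host came back up, I restarted these three services: TiDB and TiKV started normally, but PD did not. The PD log is as follows: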
2020/04/13 03:52:02.720 log.go:86: [info] etcdserver/membership: [set the cluster version to 3.2 from store]
2020/04/13 03:52:02.722 log.go:86: [info] mvcc: [restore compact to 17856181]
2020/04/13 03:52:02.728 log.go:82: [warning] auth: [simple token is not cryptographically signed]
2020/04/13 03:52:02.730 log.go:86: [info] rafthttp: [starting peer 7d04ea7389cf0185...]
2020/04/13 03:52:02.730 log.go:86: [info] rafthttp: [started HTTP pipelining with peer 7d04ea7389cf0185]
2020/04/13 03:52:02.731 log.go:86: [info] rafthttp: [started streaming with peer 7d04ea7389cf0185 (writer)]
2020/04/13 03:52:02.732 log.go:86: [info] rafthttp: [started streaming with peer 7d04ea7389cf0185 (writer)]
2020/04/13 03:52:02.732 log.go:86: [info] rafthttp: [started peer 7d04ea7389cf0185]
2020/04/13 03:52:02.732 log.go:86: [info] rafthttp: [added peer 7d04ea7389cf0185]
2020/04/13 03:52:02.732 log.go:86: [info] rafthttp: [starting peer af6b3a2505e67855...]
2020/04/13 03:52:02.732 log.go:86: [info] rafthttp: [started HTTP pipelining with peer af6b3a2505e67855]
2020/04/13 03:52:02.733 log.go:86: [info] rafthttp: [started streaming with peer 7d04ea7389cf0185 (stream Message reader)]
2020/04/13 03:52:02.733 log.go:86: [info] rafthttp: [started streaming with peer 7d04ea7389cf0185 (stream MsgApp v2 reader)]
2020/04/13 03:52:02.733 log.go:86: [info] rafthttp: [started streaming with peer af6b3a2505e67855 (writer)]
2020/04/13 03:52:02.737 log.go:86: [info] rafthttp: [started streaming with peer af6b3a2505e67855 (writer)]
2020/04/13 03:52:02.737 log.go:86: [info] rafthttp: [started peer af6b3a2505e67855]
2020/04/13 03:52:02.737 log.go:86: [info] rafthttp: [added peer af6b3a2505e67855]
2020/04/13 03:52:02.737 log.go:86: [info] rafthttp: [started streaming with peer af6b3a2505e67855 (stream Message reader)]
2020/04/13 03:52:02.737 log.go:86: [info] rafthttp: [started streaming with peer af6b3a2505e67855 (stream MsgApp v2 reader)]
2020/04/13 03:52:02.737 log.go:86: [info] etcdserver: [starting server... [version: 3.2.18, cluster version: 3.2]]
2020/04/13 03:52:02.737 log.go:86: [info] rafthttp: [peer 7d04ea7389cf0185 became active]
2020/04/13 03:52:02.737 log.go:86: [info] rafthttp: [established a TCP streaming connection with peer 7d04ea7389cf0185 (stream MsgApp v2 reader)]
2020/04/13 03:52:02.738 log.go:86: [info] rafthttp: [established a TCP streaming connection with peer 7d04ea7389cf0185 (stream Message reader)]
2020/04/13 03:52:02.738 log.go:86: [info] rafthttp: [peer af6b3a2505e67855 became active]
2020/04/13 03:52:02.738 log.go:86: [info] rafthttp: [established a TCP streaming connection with peer af6b3a2505e67855 (stream MsgApp v2 reader)]
2020/04/13 03:52:02.738 log.go:86: [info] rafthttp: [established a TCP streaming connection with peer af6b3a2505e67855 (stream MsgApp v2 writer)]
2020/04/13 03:52:02.738 log.go:86: [info] rafthttp: [established a TCP streaming connection with peer af6b3a2505e67855 (stream Message reader)]
2020/04/13 03:52:02.738 log.go:86: [info] rafthttp: [established a TCP streaming connection with peer af6b3a2505e67855 (stream Message writer)]
2020/04/13 03:52:02.739 log.go:78: [fatal] raft: [tocommit(34008955) is out of range [lastIndex(34008953)]. Was the raft log corrupted, truncated, or lost?]
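To make the fatal line concrete: the etcd Raft library embedded in PD refuses to advance a member's commit index past the last entry that member actually holds in its local log. Below is a simplified, hypothetical model of that check (the raftLog type and its fields are illustrative, not etcd's real types, and etcd panics where this sketch returns an error), fed with the two indices from the log. One common explanation for this error, not confirmed in this thread, is that the PD member had acknowledged entries up to 34008955 before the power loss, but the last 2 of them never reached disk. As a result, the leader's commit index now points past the end of the restarted member's log.

```go
package main

import "fmt"

// raftLog is a deliberately simplified, hypothetical model of a Raft log.
// It only tracks the last locally persisted index and the commit index;
// it is NOT etcd's real raftLog type.
type raftLog struct {
	lastIndex uint64 // index of the last entry present in the local log
	committed uint64 // highest index this member considers committed
}

// commitTo mirrors the invariant etcd/raft enforces when a member learns a
// new commit index from the leader: it must never mark as committed an
// entry it does not have. etcd panics here; this sketch returns an error.
func (l *raftLog) commitTo(tocommit uint64) error {
	if tocommit <= l.committed {
		return nil // the commit index never moves backwards
	}
	if tocommit > l.lastIndex {
		return fmt.Errorf("tocommit(%d) is out of range [lastIndex(%d)]. "+
			"Was the raft log corrupted, truncated, or lost?", tocommit, l.lastIndex)
	}
	l.committed = tocommit
	return nil
}

func main() {
	// Indices taken from the fatal log line: the restarted member's log ends
	// at 34008953, while the leader reports 34008955 as committed.
	l := &raftLog{lastIndex: 34008953}
	if err := l.commitTo(34008955); err != nil {
		fmt.Println(err)
	}
	fmt.Println("entries missing locally:", 34008955-l.lastIndex)
}
```

Because the check fails on every restart, simply restarting the broken PD does not help: its local log has to be rebuilt from the healthy members.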

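If a majority of the PD members are still healthy, you can remove the broken member with pd-ctl delete member, clear its member directory, and then have it join the cluster again.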
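The "majority" condition for this recovery path is ordinary Raft quorum arithmetic. Removing a member is itself a Raft configuration change, so it can only be committed while a strict majority of the current members is reachable. The minimal sketch below assumes that rule. The helpers quorum and canRebuildMember are hypothetical illustrations, not part of pd-ctl or PD, and they apply the rule to this thread's 3-member PD cluster with 1 member broken.

```go
package main

import "fmt"

// quorum returns the strict majority a Raft group of n voting members
// needs to commit anything (including membership changes): floor(n/2)+1.
func quorum(n int) int {
	return n/2 + 1
}

// canRebuildMember reports whether the healthy members can still commit the
// "delete member" configuration change, i.e. whether they form a quorum of
// the current cluster size. Illustrative only; not a pd-ctl API.
func canRebuildMember(total, healthy int) bool {
	return healthy >= quorum(total)
}

func main() {
	// This thread's cluster: 3 PD members, 1 of them unable to start.
	total, healthy := 3, 2
	fmt.Printf("quorum(%d) = %d\n", total, quorum(total))
	fmt.Println("safe to delete and rejoin:", canRebuildMember(total, healthy))
}
```

With 2 of 3 members healthy, the quorum of 2 is met, so the broken member can be removed and re-added. If 2 of the 3 PD hosts had been lost instead, this path would not apply.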

That said, could you share the PD data directory and the PD logs? We would like to investigate what caused this.