Unexpected PD leader switch in production

Version: TiDB 4.0.13
Operation: none
Problem: the PD leader switched unexpectedly

The old leader logged the following errors (along with some PD:etcd:ErrEtcdKVPut errors):
[2022/08/25 17:32:21.656 +08:00] [ERROR] [tso.go:302] ["invalid timestamp"] [timestamp={}]
[2022/08/25 17:32:21.665 +08:00] [ERROR] [tso.go:302] ["invalid timestamp"] [timestamp={}]
[2022/08/25 17:32:21.673 +08:00] [ERROR] [tso.go:302] ["invalid timestamp"] [timestamp={}]

The new leader's log is as follows:

[2022/08/25 17:32:43.058 +08:00] [INFO] [raft.go:923] ["6cf576fb46c07868 is starting a new election at term 2"]
[2022/08/25 17:32:43.063 +08:00] [INFO] [raft.go:729] ["6cf576fb46c07868 became pre-candidate at term 2"]
[2022/08/25 17:32:43.063 +08:00] [INFO] [raft.go:824] ["6cf576fb46c07868 received MsgPreVoteResp from 6cf576fb46c07868 at term 2"]
[2022/08/25 17:32:43.063 +08:00] [INFO] [raft.go:811] ["6cf576fb46c07868 [logterm: 2, index: 34283891] sent MsgPreVote request to a5bb61ba213ecdc0 at term 2"]
[2022/08/25 17:32:43.063 +08:00] [INFO] [raft.go:811] ["6cf576fb46c07868 [logterm: 2, index: 34283891] sent MsgPreVote request to b6aed577daad8738 at term 2"]
[2022/08/25 17:32:43.063 +08:00] [INFO] [node.go:331] ["raft.node: 6cf576fb46c07868 lost leader b6aed577daad8738 at term 2"]
[2022/08/25 17:32:43.066 +08:00] [INFO] [raft.go:824] ["6cf576fb46c07868 received MsgPreVoteResp from a5bb61ba213ecdc0 at term 2"]
[2022/08/25 17:32:43.066 +08:00] [INFO] [raft.go:1302] ["6cf576fb46c07868 has received 2 MsgPreVoteResp votes and 0 vote rejections"]
[2022/08/25 17:32:43.067 +08:00] [INFO] [raft.go:713] ["6cf576fb46c07868 became candidate at term 3"]
[2022/08/25 17:32:43.067 +08:00] [INFO] [raft.go:824] ["6cf576fb46c07868 received MsgVoteResp from 6cf576fb46c07868 at term 3"]
[2022/08/25 17:32:43.067 +08:00] [INFO] [raft.go:811] ["6cf576fb46c07868 [logterm: 2, index: 34283891] sent MsgVote request to a5bb61ba213ecdc0 at term 3"]
[2022/08/25 17:32:43.067 +08:00] [INFO] [raft.go:811] ["6cf576fb46c07868 [logterm: 2, index: 34283891] sent MsgVote request to b6aed577daad8738 at term 3"]
[2022/08/25 17:32:43.067 +08:00] [INFO] [raft.go:824] ["6cf576fb46c07868 received MsgVoteResp from a5bb61ba213ecdc0 at term 3"]
[2022/08/25 17:32:50.469 +08:00] [ERROR] [client.go:172] ["region sync with leader meet error"] [error="[PD:grpc:ErrGRPCRecv]rpc error: code = Canceled desc = context canceled"]
[2022/08/25 17:32:51.470 +08:00] [INFO] [server.go:1099] ["leader changed, try to campaign leader"]
[2022/08/25 17:32:51.470 +08:00] [INFO] [server.go:1115] ["start to campaign leader"] [campaign-leader-name=pd-192.168.1.2-13115]
[2022/08/25 17:32:51.472 +08:00] [INFO] [server.go:1134] ["campaign leader ok"] [campaign-leader-name=pd-192.168.1.2-13115]
[2022/08/25 17:32:51.485 +08:00] [INFO] [server.go:158] ["establish sync region stream"] [requested-server=pd-192.168.1.1-13115] [url=http://192.168.1.1:13115]
[2022/08/25 17:32:51.485 +08:00] [INFO] [server.go:176] ["requested server has already in sync with server"] [requested-server=pd-192.168.1.1-13115] [server=pd-192.168.1.2-13115] [last-index=1089874144]
[2022/08/25 17:32:51.492 +08:00] [INFO] [tso.go:298] ["sync hasn't completed yet, wait for a while"]
[2022/08/25 17:32:51.495 +08:00] [INFO] [tso.go:298] ["sync hasn't completed yet, wait for a while"]
[2022/08/25 17:32:51.514 +08:00] [INFO] [tso.go:298] ["sync hasn't completed yet, wait for a while"]
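
To line up the two excerpts, it may help to compute the gaps between key events. The following is a minimal Python sketch, not part of the original post: the regular expression and the helper names `parse_prefix` and `gap_seconds` are my own, and it only reads the `[time] [LEVEL] [file:line]` prefix of each line. It also assumes the clocks on the two PD nodes agree, which this thread could not confirm.

```python
import re
from datetime import datetime

# Matches the "[time tz] [LEVEL] [file:line]" prefix of a PD log line.
PREFIX = re.compile(
    r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) ([+-]\d{2}):(\d{2})\] "
    r"\[(\w+)\] \[([^\]]+)\]"
)


def parse_prefix(line):
    """Return (timestamp, level, source) for a log line, or None if it does not match."""
    m = PREFIX.match(line)
    if m is None:
        return None
    stamp, tz_h, tz_m, level, source = m.groups()
    # Drop the colon from the offset so %z accepts it on any Python 3 version.
    when = datetime.strptime(f"{stamp}{tz_h}{tz_m}", "%Y/%m/%d %H:%M:%S.%f%z")
    return when, level, source


def gap_seconds(earlier_line, later_line):
    """Seconds elapsed between two log lines."""
    return (parse_prefix(later_line)[0] - parse_prefix(earlier_line)[0]).total_seconds()


# Prefixes copied from the excerpts above (message text omitted).
last_tso_error = "[2022/08/25 17:32:21.673 +08:00] [ERROR] [tso.go:302]"
election_start = "[2022/08/25 17:32:43.058 +08:00] [INFO] [raft.go:923]"
campaign_ok = "[2022/08/25 17:32:51.472 +08:00] [INFO] [server.go:1134]"

print(round(gap_seconds(last_tso_error, election_start), 3))
print(round(gap_seconds(election_start, campaign_ok), 3))
```

On these excerpts the gaps come out at roughly 21.4 s from the last quoted "invalid timestamp" error to the start of the election, and roughly 8.4 s from the start of the election to "campaign leader ok". The first figure only covers the lines quoted here; the old leader's full log may contain later errors.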

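It looks like updating the timestamp failed. I'd like to understand what caused this (and, more generally, what can cause a timestamp update to fail).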

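After the PD leader fails, a new election is started. The node that becomes leader for the new term then resets the TSO (so that timestamps stay monotonic and are never repeated).

The other members also sync this data, so there is a brief period of unavailability.

Once the leader election and the sync have completed, the new leader can start serving requests.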
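
The TSO reset described above can be sketched roughly as follows. This is an illustration under stated assumptions, not PD's actual code: `Store`, `sync_timestamp`, and the two constants are hypothetical names and values, and the persisted value stands in for whatever PD keeps in etcd. The idea is that the new leader never starts below the upper bound saved by the previous leader, even if its own clock is behind.

```python
from dataclasses import dataclass

# Both values are assumptions for illustration only.
UPDATE_TIMESTAMP_GUARD_MS = 1   # minimum distance above the last saved time
SAVE_INTERVAL_MS = 3000         # size of the time window persisted ahead of use


@dataclass
class Store:
    """Hypothetical stand-in for the timestamp persisted by the previous leader."""
    saved_ms: int = 0


def sync_timestamp(store, now_ms):
    """Pick the new leader's starting physical time and persist a new upper bound."""
    next_ms = now_ms
    # If the local clock is not safely past the saved bound, start from the bound
    # instead, so timestamps handed out by the old leader are never reused.
    if next_ms - store.saved_ms < UPDATE_TIMESTAMP_GUARD_MS:
        next_ms = store.saved_ms + UPDATE_TIMESTAMP_GUARD_MS
    # Persist the new window before serving; in this model, a failure of that
    # write would leave the in-memory timestamp uninitialized.
    store.saved_ms = next_ms + SAVE_INTERVAL_MS
    return next_ms


# New leader whose clock is behind the saved bound.
behind = Store(saved_ms=10_000)
print(sync_timestamp(behind, now_ms=9_000), behind.saved_ms)

# New leader whose clock is well ahead of the saved bound.
ahead = Store(saved_ms=10_000)
print(sync_timestamp(ahead, now_ms=20_000), ahead.saved_ms)
```

In this model, timestamp requests that arrive before `sync_timestamp` has finished have nothing valid to return, which would be consistent with the "sync hasn't completed yet, wait for a while" lines on the new leader.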

Did you check whether there were any network problems at the time, or any problem with the operating system clock?

I checked the network and found nothing wrong. The operating system clock can no longer be traced back.

Under normal conditions, how long is a PD leader's term?

image

I found this diagram elsewhere; have a look for reference.

I don't quite follow. Does that mean a PD leader election is triggered once 3 s is exceeded?

So these are normal log messages, and a new election is triggered once the original leader times out after 3 seconds (the default)?
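
For what it's worth, the timeout being asked about can be pictured as a lease that the leader must keep renewing. The sketch below is a simplified model, not PD's implementation: `LeaderLease` is a hypothetical class, and the 3-second value is taken from this thread rather than checked against the 4.0.13 configuration.

```python
class LeaderLease:
    """Toy model of a leader lease: valid only while it keeps being renewed."""

    def __init__(self, ttl_s):
        self.ttl_s = ttl_s
        self.expires_at = None  # no lease held until the first renewal

    def renew(self, now_s):
        """Extend the lease; in a real system this is a write that can stall or fail."""
        self.expires_at = now_s + self.ttl_s

    def is_expired(self, now_s):
        return self.expires_at is None or now_s >= self.expires_at


lease = LeaderLease(ttl_s=3.0)  # 3 s is the default mentioned in this thread

# Renewals succeed at t = 0, 1, 2 and then stop (for example, slow etcd writes).
for t in (0.0, 1.0, 2.0):
    lease.renew(t)

print(lease.is_expired(4.9))  # False: still within 3 s of the last renewal
print(lease.is_expired(5.0))  # True: other members may now campaign
```

Under this model the leader's term has no fixed length. It lasts for as long as renewals keep succeeding, and an election follows only after renewals have been missed for the whole lease duration.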