KV 节点强制重启后频繁 【starting a new election】

【 TiDB 使用环境】
【概述】:tikv节点日志中出现大量 [starting a new election]
【背景】:tiup reload -N kv 节点超时,采用stop/start方式强制重启节点后,发现日志频繁出现 starting a new election
【现象】:集群数据同步阻塞,集群读写阻塞
【业务影响】:
【TiDB 版本】:4.0.0-rc
【附件】:

  • 相关日志

[2022/01/26 11:45:52.473 +08:00] [INFO] [raft.rs:1203] [“starting a new election”] [term=31] [raft_id=76254896] [region_id=5028956]
[2022/01/26 11:45:52.473 +08:00] [INFO] [raft.rs:859] [“became pre-candidate at term 31”] [term=31] [raft_id=76254896] [region_id=5028956]
[2022/01/26 11:45:52.473 +08:00] [INFO] [raft.rs:949] ["[logterm: 261, index: 475873] sent request to 76156412"] [msg=MsgRequestPreVote] [term=262] [id=76156412] [log_index=475873] [log_term=261] [raft_id=11657442] [region_id=11037222]
[2022/01/26 11:45:52.473 +08:00] [INFO] [raft.rs:928] [“76254896 received message from 76254896”] [term=31] [msg=MsgRequestPreVote] [from=76254896] [id=76254896] [raft_id=76254896] [region_id=5028956]
[2022/01/26 11:45:52.473 +08:00] [INFO] [raft.rs:949] ["[logterm: 261, index: 475873] sent request to 11626083"] [msg=MsgRequestPreVote] [term=262] [id=11626083] [log_index=475873] [log_term=261] [raft_id=11657442] [region_id=11037222]
[2022/01/26 11:45:52.473 +08:00] [INFO] [raft.rs:949] ["[logterm: 30, index: 118] sent request to 69117052"] [msg=MsgRequestPreVote] [term=31] [id=69117052] [log_index=118] [log_term=30] [raft_id=76254896] [region_id=5028956]
[2022/01/26 11:45:52.473 +08:00] [INFO] [raft.rs:949] ["[logterm: 30, index: 118] sent request to 75733982"] [msg=MsgRequestPreVote] [term=31] [id=75733982] [log_index=118] [log_term=30] [raft_id=76254896] [region_id=5028956]
[2022/01/26 11:45:52.473 +08:00] [INFO] [raft.rs:1203] [“starting a new election”] [term=38] [raft_id=76235526] [region_id=4981503]
[2022/01/26 11:45:52.473 +08:00] [INFO] [raft.rs:859] [“became pre-candidate at term 38”] [term=38] [raft_id=76235526] [region_id=4981503]
[2022/01/26 11:45:52.474 +08:00] [INFO] [raft.rs:928] [“76235526 received message from 76235526”] [term=38] [msg=MsgRequestPreVote] [from=76235526] [id=76235526] [raft_id=76235526] [region_id=4981503]
[2022/01/26 11:45:52.474 +08:00] [INFO] [raft.rs:949] ["[logterm: 37, index: 80] sent request to 75165418"] [msg=MsgRequestPreVote] [term=38] [id=75165418] [log_index=80] [log_term=37] [raft_id=76235526] [region_id=4981503]
[2022/01/26 11:45:52.474 +08:00] [INFO] [raft.rs:949] ["[logterm: 37, index: 80] sent request to 67404619"] [msg=MsgRequestPreVote] [term=38] [id=67404619] [log_index=80] [log_term=37] [raft_id=76235526] [region_id=4981503]
[2022/01/26 11:45:52.474 +08:00] [INFO] [raft.rs:1203] [“starting a new election”] [term=36] [raft_id=76276925] [region_id=133965]
[2022/01/26 11:45:52.474 +08:00] [INFO] [raft.rs:859] [“became pre-candidate at term 36”] [term=36] [raft_id=76276925] [region_id=133965]
[2022/01/26 11:45:52.474 +08:00] [INFO] [raft.rs:928] [“76276925 received message from 76276925”] [term=36] [msg=MsgRequestPreVote] [from=76276925] [id=76276925] [raft_id=76276925] [region_id=133965]
[2022/01/26 11:45:52.474 +08:00] [INFO] [raft.rs:949] ["[logterm: 35, index: 142] sent request to 5581808"] [msg=MsgRequestPreVote] [term=36] [id=5581808] [log_index=142] [log_term=35] [raft_id=76276925] [region_id=133965]
[2022/01/26 11:45:52.474 +08:00] [INFO] [raft.rs:949] ["[logterm: 35, index: 142] sent request to 76025480"] [msg=MsgRequestPreVote] [term=36] [id=76025480] [log_index=142] [log_term=35] [raft_id=76276925] [region_id=133965]
[2022/01/26 11:45:52.474 +08:00] [INFO] [raft.rs:1203] [“starting a new election”] [term=14] [raft_id=72665270] [region_id=48411985]
[2022/01/26 11:45:52.474 +08:00] [INFO] [raft.rs:859] [“became pre-candidate at term 14”] [term=14] [raft_id=72665270] [region_id=48411985]
[2022/01/26 11:45:52.474 +08:00] [INFO] [raft.rs:928] [“72665270 received message from 72665270”] [term=14] [msg=MsgRequestPreVote] [from=72665270] [id=72665270] [raft_id=72665270] [region_id=48411985]
[2022/01/26 11:45:52.474 +08:00] [INFO] [raft.rs:949] ["[logterm: 13, index: 101] sent request to 76159687"] [msg=MsgRequestPreVote] [term=14] [id=76159687] [log_index=101] [log_term=13] [raft_id=72665270] [region_id=48411985]
[2022/01/26 11:45:52.474 +08:00] [INFO] [raft.rs:949] ["[logterm: 13, index: 101] sent request to 73941203"] [msg=MsgRequestPreVote] [term=14] [id=73941203] [log_index=101] [log_term=13] [raft_id=72665270] [region_id=48411985]
[2022/01/26 11:45:52.477 +08:00] [INFO] [raft.rs:1203] [“starting a new election”] [term=8] [raft_id=58098374] [region_id=58098372]
[2022/01/26 11:45:52.477 +08:00] [INFO] [raft.rs:859] [“became pre-candidate at term 8”] [term=8] [raft_id=58098374] [region_id=58098372]
[2022/01/26 11:45:52.477 +08:00] [INFO] [raft.rs:928] [“58098374 received message from 58098374”] [term=8] [msg=MsgRequestPreVote] [from=58098374] [id=58098374] [raft_id=58098374] [region_id=58098372]
[2022/01/26 11:45:52.477 +08:00] [INFO] [raft.rs:949] ["[logterm: 7, index: 14014] sent request to 58098373"] [msg=MsgRequestPreVote] [term=8] [id=58098373] [log_index=14014] [log_term=7] [raft_id=58098374] [region_id=58098372]
[2022/01/26 11:45:52.477 +08:00] [INFO] [raft.rs:949] ["[logterm: 7, index: 14014] sent request to 72699070"] [msg=MsgRequestPreVote] [term=8] [id=72699070] [log_index=14014] [log_term=7] [raft_id=58098374] [region_id=58098372]
[2022/01/26 11:45:52.480 +08:00] [INFO] [raft.rs:1203] [“starting a new election”] [term=18] [raft_id=12885463] [region_id=12885461]
[2022/01/26 11:45:52.480 +08:00] [INFO] [raft.rs:859] [“became pre-candidate at term 18”] [term=18] [raft_id=12885463] [region_id=12885461]
[2022/01/26 11:45:52.480 +08:00] [INFO] [raft.rs:928] [“12885463 received message from 12885463”] [term=18] [msg=MsgRequestPreVote] [from=12885463] [id=12885463] [raft_id=12885463] [region_id=12885461]
[2022/01/26 11:45:52.480 +08:00] [INFO] [raft.rs:949] ["[logterm: 17, index: 69] sent request to 75166851"] [msg=MsgRequestPreVote] [term=18] [id=75166851] [log_index=69] [log_term=17] [raft_id=12885463] [region_id=12885461]
[2022/01/26 11:45:52.480 +08:00] [INFO] [raft.rs:949] ["[logterm: 17, index: 69] sent request to 74595842"] [msg=MsgRequestPreVote] [term=18] [id=74595842] [log_index=69] [log_term=17] [raft_id=12885463] [region_id=12885461]
[2022/01/26 11:45:52.489 +08:00] [INFO] [raft.rs:1203] [“starting a new election”] [term=27] [raft_id=5167874] [region_id=5167872]
[2022/01/26 11:45:52.489 +08:00] [INFO] [raft.rs:859] [“became pre-candidate at term 27”] [term=27] [raft_id=5167874] [region_id=5167872]
[2022/01/26 11:45:52.489 +08:00] [INFO] [raft.rs:928] [“5167874 received message from 5167874”] [term=27] [msg=MsgRequestPreVote] [from=5167874] [id=5167874] [raft_id=5167874] [region_id=5167872]
[2022/01/26 11:45:52.489 +08:00] [INFO] [raft.rs:949] ["[logterm: 26, index: 56] sent request to 65992560"] [msg=MsgRequestPreVote] [term=27] [id=65992560] [log_index=56] [log_term=26] [raft_id=5167874] [region_id=5167872]
[2022/01/26 11:45:52.490 +08:00] [INFO] [raft.rs:949] ["[logterm: 26, index: 56] sent request to 75758579"] [msg=MsgRequestPreVote] [term=27] [id=75758579] [log_index=56] [log_term=26] [raft_id=5167874] [region_id=5167872]
[2022/01/26 11:45:52.490 +08:00] [INFO] [raft.rs:1203] [“starting a new election”] [term=23] [raft_id=76303898] [region_id=67857414]
[2022/01/26 11:45:52.490 +08:00] [INFO] [raft.rs:859] [“became pre-candidate at term 23”] [term=23] [raft_id=76303898] [region_id=67857414]
[2022/01/26 11:45:52.490 +08:00] [INFO] [raft.rs:928] [“76303898 received message from 76303898”] [term=23] [msg=MsgRequestPreVote] [from=76303898] [id=76303898] [raft_id=76303898] [region_id=67857414]
[2022/01/26 11:45:52.490 +08:00] [INFO] [raft.rs:949] ["[logterm: 22, index: 413871] sent request to 67857417"] [msg=MsgRequestPreVote] [term=23] [id=67857417] [log_index=413871] [log_term=22] [raft_id=76303898] [region_id=67857414]
[2022/01/26 11:45:52.490 +08:00] [INFO] [raft.rs:949] ["[logterm: 22, index: 413871] sent request to 67857415"] [msg=MsgRequestPreVote] [term=23] [id=67857415] [log_index=413871] [log_term=22] [raft_id=76303898] [region_id=67857414]

  • 节点信息

{
“store”: {
“id”: 4,
“address”: “192.168.100.151:20160”,
“labels”: [
{
“key”: “host”,
“value”: “tikv1”
},
{
“key”: “snap-gc-timeout”,
“value”: “1h”
}
],
“version”: “4.0.0-rc”,
“status_address”: “192.168.100.151:20180”,
“git_hash”: “f45d0c963df3ee4b1011caf5eb146cacd1fbbad8”,
“start_timestamp”: 1643139054,
“binary_path”: “/usr/local/tidb/tikv-20160/bin/tikv-server”,
“last_heartbeat”: 1643169120528013934,
“state_name”: “Up”
},
“status”: {
“capacity”: “879.6GiB”,
“available”: “191.4GiB”,
“used_size”: “672.6GiB”,
“leader_count”: 260,
“leader_weight”: 1,
“leader_score”: 260,
“leader_size”: 20846,
“region_count”: 39654,
“region_weight”: 1,
“region_score”: 476218276.5034647,
“region_size”: 3141193,
“start_ts”: “2022-01-26T03:30:54+08:00”,
“last_heartbeat_ts”: “2022-01-26T11:52:00.528013934+08:00”,
“uptime”: “8h21m6.528013934s”
}
}

1 个赞

tikv有多少个节点?

10个

  1. 辛苦导出一下问题时间段(包含问题前后 30 分钟)的 TiKV-detail 的监控。https://docs.pingcap.com/zh/tidb/stable/exporting-grafana-snapshots/#将-grafana-监控数据导出成快照
  2. 麻烦提供一下问题 tikv 完整的 日志。
1 个赞

Test-Cluster-TiKV-Details_2022-01-26T06_55_02.381Z.json (19.2 MB)

tikv 日志tikv.log (21.0 MB)

这个问题有后续吗?我们也遇到了。我们的场景是某个tikv节点自动重启后,频繁starting a new election,tikv.log 20分钟轮转一个300M的日志文件

我的场景符合这个帖子(https://asktug.com/t/topic/68329)中的描述,参考其中的解决方案搞定了。

此话题已在最后回复的 1 分钟后被自动关闭。不再允许新回复。