region is unavailable, 无法选举新的leader

Bug 反馈
清晰准确地描述您发现的问题,提供任何可能复现问题的步骤有助于研发同学及时处理问题
【 Bug 的影响】
客户端读写报错,返回“region is unavailable”,重试不能恢复。
【可能的问题复现步骤】

【看到的非预期行为】

【期望看到的行为】

【相关组件及具体版本】
v4.0.14
【其他背景信息或者截图】
如集群拓扑,系统和内核版本,应用 app 信息等;如果问题跟 SQL 有关,请提供 SQL 语句和相关表的 Schema 信息;如果节点日志存在关键报错,请提供相关节点的日志内容或文件;如果一些业务敏感信息不便提供,请留下联系方式,我们与您私下沟通。

有问题的region在PD中信息如下:
region 131228717
{
“id”: 131228717,
“start_key”: “7480000000000000FFF75F7286F20B86E9FFC5ED480000000000FA”,
“end_key”: “7480000000000000FFF75F7286F21D58B0FF7178C60000000000FA”,
“epoch”: {
“conf_ver”: 230,
“version”: 80
},
“peers”: [
{
“id”: 192440095,
“store_id”: 190589610
},
{
“id”: 221727388,
“store_id”: 220255427
},
{
“id”: 223128009,
“store_id”: 221796927
}
],
“leader”: {
“id”: 223128009,
“store_id”: 221796927
},
“written_bytes”: 0,
“read_bytes”: 0,
“written_keys”: 0,
“read_keys”: 0,
“approximate_size”: 68,
“approximate_keys”: 65973
}

从相关的store看到的日志限制,这个region没有leader,一直在选举,根据store上存储的region信息可以看到他应该可以选举产生新的leader,但是日志限制它发起的prevote消息被拒绝了,原因是lease没有过期,这个leader没有过期一直如此,我尝试重启其中一个store,这个store上存储的region的epch信息比较落后,这个region重新选主成功。
这是我第二次看到这样的问题了,每次都是重启那个epoch落后的store,问题恢复。

1赞

贴一段 TiKV 的选举 Log 看看哈~

1赞

[2021/10/27 14:10:21.670 +08:00] [INFO] [raft.rs:971] ["[logterm: 30, index: 1031451, vote: 192440095] ignored vote from 223128009 [logterm: 29, index: 1030801]: lease
is not expired"] [“msg type”=MsgRequestPreVote] [“remaining ticks”=8] [term=30] [msg_index=1030801] [msg_term=29] [from=223128009] [vote=192440095] [log_index=1031451
] [log_term=30] [raft_id=221727388] [region_id=131228717]

一直打印这样的日志

1赞

从日志上看是说223128009这个副本租约不过期,不能选举,但是我重启的节点是store 190589610,peer 223128009并不在这个store上。

store 190589610上的日志:
[2021/10/27 14:09:24.448 +08:00] [INFO] [raft.rs:971] ["[logterm: 30, index: 1031451, vote: 192440095] ignored vote from 223128009 [logterm: 29, index: 1030801]: lease
is not expired"] [“msg type”=MsgRequestPreVote] [“remaining ticks”=8] [term=30] [msg_index=1030801] [msg_term=29] [from=223128009] [vote=192440095] [log_index=1031451
] [log_term=30] [raft_id=192440095] [region_id=131228717]

麻烦把问题 region 涉及到的 3 个 tikv 的节点日志压缩起来发一下吧

store 190589610上的日志:
[2021/10/02 06:42:35.237 +08:00] [INFO] [raft.rs:971] ["[logterm: 24, index: 1030796, vote: 221727388] ignored vote from 223128009 [logterm: 24, index: 1030796]: lease
is not expired"] [“msg type”=MsgRequestPreVote] [“remaining ticks”=9] [term=24] [msg_index=1030796] [msg_term=24] [from=223128009] [vote=221727388] [log_index=1030796
] [log_term=24] [raft_id=192440095] [region_id=131228717]

store 220255427上的日志:
[2021/10/02 02:56:37.217 +08:00] [INFO] [peer.rs:221] [“replicate peer”] [peer_id=223128009] [region_id=131228717]
[2021/10/02 02:56:37.217 +08:00] [INFO] [raft.rs:783] [“became follower at term 0”] [term=0] [raft_id=223128009] [region_id=131228717]
[2021/10/02 02:56:37.217 +08:00] [INFO] [raft.rs:285] [newRaft] [peers="[]"] [“last term”=0] [“last index”=0] [applied=0] [commit=0] [term=0] [raft_id=223128009] [region_id=131228717]
[2021/10/02 02:56:37.217 +08:00] [INFO] [raw_node.rs:222] [“RawNode created with id 223128009.”] [id=223128009] [raft_id=223128009] [region_id=131228717]
[2021/10/02 02:56:37.217 +08:00] [INFO] [raft.rs:1003] [“received a message with higher term from 221727388”] [“msg type”=MsgHeartbeat] [message_term=24] [term=0] [from=221727388] [raft_id=223128009] [region_id=131228717]
[2021/10/02 02:56:37.217 +08:00] [INFO] [raft.rs:783] [“became follower at term 24”] [term=24] [raft_id=223128009] [region_id=131228717]
[2021/10/02 02:56:37.616 +08:00] [INFO] [] [“Connect failed: {“created”:”@1633114597.615955262",“description”:“Failed to connect to remote host: FD Shutdown”,“file”:"/rust/registry/src/github.com-1ecc6299db9ec823/grpcio-sys-0.5.3/grpc/src/core/lib/iomgr/lockfree_event.cc",“file_line”:195,“os_error”:“Timeout occurred”,“referenced_errors”:[{“created”:"@1633114597.615948725",“description”:“connect() timed out”,“file”:"/rust/registry/src/github.com-1ecc6299db9ec823/grpcio-sys-0.5.3/grpc/src/core/lib/iomgr/tcp_client_posix.cc",“file_line”:112}],“target_address”:“ipv4:10.28.208.18:20274”}"]
[2021/10/02 02:56:37.616 +08:00] [INFO] [] [“Subchannel 0x7f1976888900: Retry immediately”]
[2021/10/02 02:56:37.616 +08:00] [INFO] [] [“Failed to connect to channel, retrying”]
[2021/10/02 02:56:37.616 +08:00] [WARN] [raft_client.rs:296] [“RPC batch_raft fail”] [err=“Some(RpcFailure(RpcStatus { status: 14-UNAVAILABLE, details: Some(“failed to connect to all addresses”) }))”] [sink_err=“Some(RpcFinished(Some(RpcStatus { status: 14-UNAVAILABLE, details: Some(“failed to connect to all addresses”) })))”] [to_addr=10.28.208.18:20274]
[2021/10/02 02:56:37.616 +08:00] [INFO] [store.rs:2258] [“broadcasting unreachable”] [unreachable_store_id=222296723] [store_id=221796927]
[2021/10/02 02:56:37.662 +08:00] [WARN] [raft_client.rs:199] [“send to 10.28.208.18:20274 fail, the gRPC connection could be broken”]
[2021/10/02 02:56:37.662 +08:00] [ERROR] [transport.rs:163] [“send raft msg err”] [err=“Other(”[src/server/raft_client.rs:208]: RaftClient send fail")"]
[2021/10/02 02:56:37.674 +08:00] [INFO] [transport.rs:144] [“resolve store address ok”] [addr=10.28.208.18:20274] [store_id=222296723]
[2021/10/02 02:56:37.674 +08:00] [INFO] [raft_client.rs:48] [“server: new connection with tikv endpoint”] [addr=10.28.208.18:20274]
[2021/10/02 02:56:37.973 +08:00] [INFO] [snap.rs:225] [“saving snapshot file”] [file=/media/nvme0n1/tidb/deploy/tikv-20273/snap/rev_131228717_24_1030793_(default|lock|write).sst] [snap_key=131228717_24_1030793]
[2021/10/02 02:56:37.993 +08:00] [INFO] [raft.rs:1957] ["[commit: 0, lastindex: 0, lastterm: 0] starts to restore snapshot [index: 1030793, term: 24]"] [snapshot_term=24] [snapshot_index=1030793] [last_term=0] [last_index=0] [commit=0] [raft_id=223128009] [region_id=131228717]
[2021/10/02 02:56:37.994 +08:00] [INFO] [raft_log.rs:487] [“log [committed=0, applied=0, unstable.offset=1, unstable.entries.len()=0] starts to restore snapshot [index: 1030793, term: 24]”] [snapshot_term=24] [snapshot_index=1030793] [log=“committed=0, applied=0, unstable.offset=1, unstable.entries.len()=0”] [raft_id=223128009] [region_id=131228717]
[2021/10/02 02:56:37.994 +08:00] [INFO] [raft.rs:1893] ["[commit: 1030793, term: 24] restored snapshot [index: 1030793, term: 24]"] [snapshot_term=24] [snapshot_index=1030793] [commit=1030793] [term=24] [raft_id=223128009] [region_id=131228717]
[2021/10/02 02:56:37.994 +08:00] [INFO] [peer_storage.rs:990] [“begin to apply snapshot”] [peer_id=223128009] [region_id=131228717]
[2021/10/02 02:56:37.994 +08:00] [INFO] [peer_storage.rs:1033] [“apply snapshot with state ok”] [state=“applied_index: 1030793 truncated_state { index: 1030793 term: 24 }”] [region=“id: 131228717 start_key: 7480000000000000FFF75F7286F20B86E9FFC5ED480000000000FA end_key: 7480000000000000FFF75F7286F21D58B0FF7178C60000000000FA region_epoch { conf_ver: 228 version: 80 } peers { id: 192440095 store_id: 190589610 } peers { id: 221727388 store_id: 220255427 } peers { id: 222330609 store_id: 222296724 } peers { id: 223128009 store_id: 221796927 is_learner: true }”] [peer_id=223128009] [region_id=131228717]
[2021/10/02 02:56:37.994 +08:00] [INFO] [peer.rs:2771] [“snapshot is applied”] [region=“id: 131228717 start_key: 7480000000000000FFF75F7286F20B86E9FFC5ED480000000000FA end_key: 7480000000000000FFF75F7286F21D58B0FF7178C60000000000FA region_epoch { conf_ver: 228 version: 80 } peers { id: 192440095 store_id: 190589610 } peers { id: 221727388 store_id: 220255427 } peers { id: 222330609 store_id: 222296724 } peers { id: 223128009 store_id: 221796927 is_learner: true }”] [peer_id=223128009] [region_id=131228717]
[2021/10/02 02:56:37.994 +08:00] [INFO] [region.rs:297] [“begin apply snap data”] [region_id=131228717]
[2021/10/02 02:56:38.245 +08:00] [INFO] [region.rs:362] [“apply new data”] [time_takes=241.677221ms] [region_id=131228717]

[2021/10/02 03:00:49.930 +08:00] [INFO] [apply.rs:1231] [“execute admin command”] [command=“cmd_type: ChangePeer change_peer { peer { id: 223128009 store_id: 221796927
} }”] [index=1030794] [term=24] [peer_id=223128009] [region_id=131228717]
[2021/10/02 03:00:49.930 +08:00] [INFO] [apply.rs:1570] [“exec ConfChange”] [epoch=“conf_ver: 228 version: 80”] [type=AddNode] [peer_id=223128009] [region_id=131228717
]
[2021/10/02 03:00:49.930 +08:00] [INFO] [apply.rs:1625] [“add peer successfully”] [region=“id: 131228717 start_key: 7480000000000000FFF75F7286F20B86E9FFC5ED480000000000FA end_key: 7480000000000000FFF75F7286F21D58B0FF7178C60000000000FA region_epoch { conf_ver: 228 version: 80 } peers { id: 192440095 store_id: 190589610 } peers { id: 221727388 store_id: 220255427 } peers { id: 222330609 store_id: 222296724 } peers { id: 223128009 store_id: 221796927 is_learner: true }”] [peer=“id: 223128009 store_id: 221796927”] [peer_id=223128009] [region_id=131228717]

[2021/10/02 06:42:17.018 +08:00] [INFO] [raft.rs:1177] [“starting a new election”] [term=24] [raft_id=223128009] [region_id=131228717]
[2021/10/02 06:42:17.018 +08:00] [INFO] [raft.rs:833] [“became pre-candidate at term 24”] [term=24] [raft_id=223128009] [region_id=131228717]
[2021/10/02 06:42:17.018 +08:00] [INFO] [raft.rs:902] [“223128009 received message from 223128009”] [term=24] [msg=MsgRequestPreVote] [from=223128009] [id=223128009] [raft_id=223128009] [region_id=131228717]
[2021/10/02 06:42:17.018 +08:00] [INFO] [raft.rs:923] ["[logterm: 24, index: 1030796] sent request to 192440095"] [msg=MsgRequestPreVote] [term=24] [id=192440095] [log_index=1030796] [log_term=24] [raft_id=223128009] [region_id=131228717]
[2021/10/02 06:42:17.018 +08:00] [INFO] [raft.rs:923] ["[logterm: 24, index: 1030796] sent request to 221727388"] [msg=MsgRequestPreVote] [term=24] [id=221727388] [log_index=1030796] [log_term=24] [raft_id=223128009] [region_id=131228717]

[2021/10/02 06:42:35.235 +08:00] [INFO] [raft.rs:1177] [“starting a new election”] [term=24] [raft_id=223128009] [region_id=131228717]
[2021/10/02 06:42:35.235 +08:00] [INFO] [raft.rs:833] [“became pre-candidate at term 24”] [term=24] [raft_id=223128009] [region_id=131228717]
[2021/10/02 06:42:35.235 +08:00] [INFO] [raft.rs:902] [“223128009 received message from 223128009”] [term=24] [msg=MsgRequestPreVote] [from=223128009] [id=223128009] [raft_id=223128009] [region_id=131228717]
[2021/10/02 06:42:35.235 +08:00] [INFO] [raft.rs:923] ["[logterm: 24, index: 1030796] sent request to 192440095"] [msg=MsgRequestPreVote] [term=24] [id=192440095] [log_index=1030796] [log_term=24] [raft_id=223128009] [region_id=131228717]
[2021/10/02 06:42:35.235 +08:00] [INFO] [raft.rs:923] ["[logterm: 24, index: 1030796] sent request to 221727388"] [msg=MsgRequestPreVote] [term=24] [id=221727388] [log_index=1030796] [log_term=24] [raft_id=223128009] [region_id=131228717]

[2021/10/02 06:42:53.286 +08:00] [INFO] [raft.rs:1177] [“starting a new election”] [term=24] [raft_id=223128009] [region_id=131228717]
[2021/10/02 06:42:53.286 +08:00] [INFO] [raft.rs:833] [“became pre-candidate at term 24”] [term=24] [raft_id=223128009] [region_id=131228717]
[2021/10/02 06:42:53.286 +08:00] [INFO] [raft.rs:902] [“223128009 received message from 223128009”] [term=24] [msg=MsgRequestPreVote] [from=223128009] [id=223128009] [raft_id=223128009] [region_id=131228717]
[2021/10/02 06:42:53.286 +08:00] [INFO] [raft.rs:923] ["[logterm: 24, index: 1030796] sent request to 192440095"] [msg=MsgRequestPreVote] [term=24] [id=192440095] [log_index=1030796] [log_term=24] [raft_id=223128009] [region_id=131228717]
[2021/10/02 06:42:53.286 +08:00] [INFO] [raft.rs:923] ["[logterm: 24, index: 1030796] sent request to 221727388"] [msg=MsgRequestPreVote] [term=24] [id=221727388] [log_index=1030796] [log_term=24] [raft_id=223128009] [region_id=131228717]

[2021/10/02 06:43:06.584 +08:00] [INFO] [raft.rs:783] [“became follower at term 24”] [term=24] [raft_id=223128009] [region_id=131228717]

[2021/10/02 07:00:33.583 +08:00] [INFO] [raft.rs:1739] ["[term 24] received MsgTimeoutNow from 221727388 and starts an election to get leadership."] [from=221727388] [term=24] [raft_id=223128009] [region_id=131228717]
[2021/10/02 07:00:33.583 +08:00] [INFO] [raft.rs:1177] [“starting a new election”] [term=24] [raft_id=223128009] [region_id=131228717]
[2021/10/02 07:00:33.583 +08:00] [INFO] [raft.rs:807] [“became candidate at term 25”] [term=25] [raft_id=223128009] [region_id=131228717]
[2021/10/02 07:00:33.583 +08:00] [INFO] [raft.rs:902] [“223128009 received message from 223128009”] [term=25] [msg=MsgRequestVote] [from=223128009] [id=223128009] [raft_id=223128009] [region_id=131228717]
[2021/10/02 07:00:33.583 +08:00] [INFO] [raft.rs:923] ["[logterm: 24, index: 1030796] sent request to 192440095"] [msg=MsgRequestVote] [term=25] [id=192440095] [log_index=1030796] [log_term=24] [raft_id=223128009] [region_id=131228717]
[2021/10/02 07:00:33.583 +08:00] [INFO] [raft.rs:923] ["[logterm: 24, index: 1030796] sent request to 221727388"] [msg=MsgRequestVote] [term=25] [id=221727388] [log_index=1030796] [log_term=24] [raft_id=223128009] [region_id=131228717]
[2021/10/02 07:00:33.583 +08:00] [INFO] [raft.rs:1673] [“received from 221727388”] [term=25] [“msg type”=MsgRequestVoteResponse] [from=221727388] [raft_id=223128009] [region_id=131228717]
[2021/10/02 07:00:33.583 +08:00] [INFO] [raft.rs:874] [“became leader at term 25”] [term=25] [raft_id=223128009] [region_id=131228717]

[2021/10/02 07:03:48.540 +08:00] [INFO] [raft.rs:971] ["[logterm: 25, index: 1030797, vote: 223128009] ignored vote from 221727388 [logterm: 25, index: 1030797]: lease is not expired"] [“msg type”=MsgRequestPreVote] [“remaining ticks”=5] [term=25] [msg_index=1030797] [msg_term=25] [from=221727388] [vote=223128009] [log_index=1030797] [log_term=25] [raft_id=223128009] [region_id=131228717]
[2021/10/02 03:06:35.939 +08:00] [INFO] [pd.rs:782] [“try to change peer”] [peer=“id: 222330609 store_id: 222296724”] [change_type=RemoveNode] [region_id=131228717]
[2021/10/02 03:06:35.939 +08:00] [INFO] [pd.rs:782] [“try to change peer”] [peer=“id: 223128489 store_id: 221796927 is_learner: true”] [change_type=AddLearnerNode] [re
gion_id=164176396]
[2021/10/02 03:06:35.939 +08:00] [INFO] [pd.rs:782] [“try to change peer”] [peer=“id: 223128556 store_id: 221796927 is_learner: true”] [change_type=AddLearnerNode] [re
gion_id=221441578]
[2021/10/02 03:06:35.940 +08:00] [INFO] [peer.rs:2763] [“propose conf change peer”] [change_peer=222330609] [change_type=RemoveNode] [peer_id=221727388] [region_id=131
228717]
[2021/10/02 03:06:35.941 +08:00] [INFO] [pd.rs:782] [“try to change peer”] [peer=“id: 223128338 store_id: 221796927”] [change_type=AddNode] [region_id=177832663]
[2021/10/02 03:06:35.941 +08:00] [INFO] [pd.rs:782] [“try to change peer”] [peer=“id: 223128453 store_id: 221796927”] [change_type=AddNode] [region_id=219926510]
[2021/10/02 03:06:35.942 +08:00] [INFO] [pd.rs:782] [“try to change peer”] [peer=“id: 223128340 store_id: 221796927”] [change_type=AddNode] [region_id=116071690]
[2021/10/02 03:06:35.942 +08:00] [INFO] [pd.rs:782] [“try to change peer”] [peer=“id: 223128339 store_id: 221796927”] [change_type=AddNode] [region_id=204215808]
[2021/10/02 03:06:35.948 +08:00] [INFO] [apply.rs:1231] [“execute admin command”] [command=“cmd_type: ChangePeer change_peer { change_type: RemoveNode peer { id: 22233
0609 store_id: 222296724 } }”] [index=1030795] [term=24] [peer_id=221727388] [region_id=131228717]
[2021/10/02 03:06:35.948 +08:00] [INFO] [apply.rs:1570] [“exec ConfChange”] [epoch=“conf_ver: 229 version: 80”] [type=RemoveNode] [peer_id=221727388] [region_id=131228
717]
[2021/10/02 03:06:35.948 +08:00] [INFO] [apply.rs:1678] [“remove peer successfully”] [region=“id: 131228717 start_key: 7480000000000000FFF75F7286F20B86E9FFC5ED48000000
0000FA end_key: 7480000000000000FFF75F7286F21D58B0FF7178C60000000000FA region_epoch { conf_ver: 229 version: 80 } peers { id: 192440095 store_id: 190589610 } peers { i
d: 221727388 store_id: 220255427 } peers { id: 222330609 store_id: 222296724 } peers { id: 223128009 store_id: 221796927 }”] [peer=“id: 222330609 store_id: 222296724”]
[peer_id=221727388] [region_id=131228717]
[2021/10/02 03:06:35.951 +08:00] [INFO] [peer.rs:1956] [“notify pd with change peer region”] [region=“id: 131228717 start_key: 7480000000000000FFF75F7286F20B86E9FFC5ED
480000000000FA end_key: 7480000000000000FFF75F7286F21D58B0FF7178C60000000000FA region_epoch { conf_ver: 230 version: 80 } peers { id: 192440095 store_id: 190589610 } p
eers { id: 221727388 store_id: 220255427 } peers { id: 223128009 store_id: 221796927 }”] [peer_id=221727388] [region_id=131228717]

[2021/10/02 06:42:35.237 +08:00] [INFO] [raft.rs:971] ["[logterm: 24, index: 1030796, vote: 221727388] ignored vote from 223128009 [logterm: 24, index: 1030796]: lease is not expired"] [“msg type”=MsgRequestPreVote] [“remaining ticks”=4] [term=24] [msg_index=1030796] [msg_term=24] [from=223128009] [vote=221727388] [log_index=1030796] [log_term=24] [raft_id=221727388] [region_id=131228717]

@KS贺大伟 麻烦提供一下上文提到的 tikv log 完整的压缩一下哈

190589610.tikv.log.tar.gz (34.5 MB) 220255427.tikv.log.tar.gz (30.1 MB) 221796927.tikv.log.tar.gz (94.8 MB)

这是相关的日志,这个是核心集群的日志,不要用于分析问题以外的目的,谢谢啦

另外麻烦关注一下 down peer与pending peer在同一个store上,无法自动处理

有啥进展吗?

我们先分析一下,稍等~

大佬,有啥进展吗

222252897
[2021/10/01 21:20:55.711 +08:00] [INFO] [raft.rs:1739] ["[term 99] received MsgTimeoutNow from 221195294 and starts an election to get leadership."] [from=221195294] [term=99] [raft_id=222252897] [region_id=19947607]
[2021/10/01 21:20:55.711 +08:00] [INFO] [raft.rs:1177] ["starting a new election"] [term=99] [raft_id=222252897] [region_id=19947607]
[2021/10/01 21:20:55.711 +08:00] [INFO] [raft.rs:807] ["became candidate at term 100"] [term=100] [raft_id=222252897] [region_id=19947607]
[2021/10/01 21:20:55.711 +08:00] [INFO] [raft.rs:902] ["222252897 received message from 222252897"] [term=100] [msg=MsgRequestVote] [from=222252897] [id=222252897] [raft_id=222252897] [region_id=19947607]
[2021/10/01 21:20:55.711 +08:00] [INFO] [raft.rs:923] ["[logterm: 99, index: 65709] sent request to 19947609"] [msg=MsgRequestVote] [term=100] [id=19947609] [log_index=65709] [log_term=99] [raft_id=222252897] [region_id=19947607]
[2021/10/01 21:20:55.711 +08:00] [INFO] [raft.rs:923] ["[logterm: 99, index: 65709] sent request to 221195294"] [msg=MsgRequestVote] [term=100] [id=221195294] [log_index=65709] [log_term=99] [raft_id=222252897] [region_id=19947607]
[2021/10/01 21:20:55.711 +08:00] [INFO] [raft.rs:1673] ["received from 19947609"] [term=100] ["msg type"=MsgRequestVoteResponse] [from=19947609] [raft_id=222252897] [region_id=19947607]
[2021/10/01 21:20:55.711 +08:00] [INFO] [raft.rs:874] ["became leader at term 100"] [term=100] [raft_id=222252897] [region_id=19947607]
...
[2021/10/01 22:29:57.147 +08:00] [INFO] [raft.rs:971] ["[logterm: 100, index: 65710, vote: 222252897] ignored vote from 19947609 [logterm: 100, index: 65710]: lease is not expired"] ["msg type"=MsgRequestPreVote] ["remaining ticks"=10] [term=100] [msg_index=65710] [msg_term=100] [from=19947609] [vote=222252897] [log_index=65710] [log_term=100] [raft_id=222252897] [region_id=19947607]
[2021/10/01 22:29:57.147 +08:00] [INFO] [raft.rs:971] ["[logterm: 100, index: 65710, vote: 222252897] ignored vote from 19947609 [logterm: 100, index: 65710]: lease is not expired"] ["msg type"=MsgRequestVote] ["remaining ticks"=10] [term=100] [msg_index=65710] [msg_term=100] [from=19947609] [vote=222252897] [log_index=65710] [log_term=100] [raft_id=222252897] [region_id=19947607]



221195294
[2021/10/01 22:29:57.149 +08:00] [INFO] [raft.rs:1192] ["[logterm: 100, index: 65710, vote: 222252897] cast vote for 19947609 [logterm: 100, index: 65710] at term 100"] ["msg type"=MsgRequestPreVote] [term=100] [msg_index=65710] [msg_term=100] [from=19947609] [vote=222252897] [log_index=65710] [log_term=100] [raft_id=221195294] [region_id=19947607]
[2021/10/01 22:29:57.149 +08:00] [INFO] [raft.rs:1003] ["received a message with higher term from 19947609"] ["msg type"=MsgRequestVote] [message_term=101] [term=100] [from=19947609] [raft_id=221195294] [region_id=19947607]
[2021/10/01 22:29:57.149 +08:00] [INFO] [raft.rs:783] ["became follower at term 101"] [term=101] [raft_id=221195294] [region_id=19947607]
[2021/10/01 22:29:57.149 +08:00] [INFO] [raft.rs:1192] ["[logterm: 100, index: 65710, vote: 0] cast vote for 19947609 [logterm: 100, index: 65710] at term 101"] ["msg type"=MsgRequestVote] [term=101] [msg_index=65710] [msg_term=100] [from=19947609] [vote=0] [log_index=65710] [log_term=100] [raft_id=221195294] [region_id=19947607]
...
[2021/10/02 03:48:52.251 +08:00] [INFO] [raft.rs:1003] ["received a message with higher term from 222252897"] ["msg type"=MsgRequestVote] [message_term=102] [term=101] [from=222252897] [raft_id=221195294] [region_id=19947607]
[2021/10/02 03:48:52.251 +08:00] [INFO] [raft.rs:783] ["became follower at term 102"] [term=102] [raft_id=221195294] [region_id=19947607]
[2021/10/02 03:48:52.251 +08:00] [INFO] [raft.rs:1192] ["[logterm: 101, index: 65711, vote: 0] cast vote for 222252897 [logterm: 101, index: 65711] at term 102"] ["msg type"=MsgRequestVote] [term=102] [msg_index=65711] [msg_term=101] [from=222252897] [vote=0] [log_index=65711] [log_term=101] [raft_id=221195294] [region_id=19947607]

这是 region_id=19947607 在10/01 晚上两个节点的日志片段,可以看到是 peer 19947609 在发起选举,但是我没在整个集群的 tikv 日志中找到 19947609 的任何日志(包括发起选举等等)

220528533
[2021/10/02 05:11:05.340 +08:00] [INFO] [raft.rs:971] ["[logterm: 66, index: 76, vote: 220528532] ignored vote from 222243997 [logterm: 66, index: 76]: lease is not expired"] ["msg type"=MsgRequestPreVote] ["remaining ticks"=9] [term=66] [msg_index=76] [msg_term=66] [from=222243997] [vote=220528532] [log_index=76] [log_term=66] [raft_id=220528533] [region_id=220528531]
[2021/10/02 05:11:24.217 +08:00] [INFO] [raft.rs:971] ["[logterm: 66, index: 76, vote: 220528532] ignored vote from 222243997 [logterm: 66, index: 76]: lease is not expired"] ["msg type"=MsgRequestPreVote] ["remaining ticks"=8] [term=66] [msg_index=76] [msg_term=66] [from=222243997] [vote=220528532] [log_index=76] [log_term=66] [raft_id=220528533] [region_id=220528531]

又比如这个 region 220528531 只出现在这一个节点的日志文件中

有问题的region是131228717,其他的region我没有关注,至少业务请求没有触及到其他的问题region,所以我收集的日志都是131228717这个region的