A down peer and a pending peer are on the same store and cannot be handled automatically

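I fixed this by restarting the problematic store and then manually removing the peer through PD. However, when I checked the TiKV logs, I found that the store hosting the down peer keeps printing the log entries below, while the other stores print nothing similar. Restarting this store does not make them go away.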
[2021/10/29 11:21:27.583 +08:00] [INFO] [raft.rs:923] ["[logterm: 5, index: 5] sent request to 221662017"] [msg=MsgRequestPreVote] [term=5] [id=221662017] [log_index=5] [log_term=5] [raft_id=221662016] [region_id=221662015]
[2021/10/29 11:21:27.684 +08:00] [ERROR] [util.rs:342] ["request failed, retry"] [err_code=KV:Unknown] [err="Grpc(RpcFinished(Some(RpcStatus { status: 1-CANCELLED, details: Some("Cancelled") })))"]
[2021/10/29 11:21:27.684 +08:00] [ERROR] [util.rs:387] ["request failed"] [err_code=KV:PD:gRPC] [err="Grpc(RpcFailure(RpcStatus { status: 2-UNKNOWN, details: Some("invalid store ID 9, not found") }))"]
[2021/10/29 11:21:27.684 +08:00] [INFO] [util.rs:462] ["connecting to PD endpoint"] [endpoints=http://10.36.68.167:2383]
[2021/10/29 11:21:27.684 +08:00] [INFO] [] ["New connected subchannel at 0x7fcaa623c940 for subchannel 0x7fc7c6ff0180"]
[2021/10/29 11:21:27.685 +08:00] [INFO] [util.rs:462] ["connecting to PD endpoint"] [endpoints=http://10.36.144.171:2383]
[2021/10/29 11:21:27.686 +08:00] [INFO] [util.rs:527] ["connected to PD leader"] [endpoints=http://10.36.144.171:2383]
[2021/10/29 11:21:27.686 +08:00] [INFO] [util.rs:225] ["heartbeat sender and receiver are stale, refreshing ..."]
[2021/10/29 11:21:27.686 +08:00] [ERROR] [client.rs:457] ["failed to send heartbeat"] [err_code=KV:PD:gRPC] [err="Grpc(RpcFinished(Some(RpcStatus { status: 1-CANCELLED, details: Some("Cancelled") })))"]
[2021/10/29 11:21:27.686 +08:00] [ERROR] [util.rs:342] ["request failed, retry"] [err_code=KV:Unknown] [err="Grpc(RpcFinished(Some(RpcStatus { status: 1-CANCELLED, details: Some("Cancelled") })))"]
[2021/10/29 11:21:27.778 +08:00] [WARN] [util.rs:243] ["updating PD client done"] [spend=93.699425ms]
[2021/10/29 11:21:27.778 +08:00] [ERROR] [util.rs:387] ["request failed"] [err_code=KV:PD:gRPC] [err="Grpc(RpcFailure(RpcStatus { status: 2-UNKNOWN, details: Some("invalid store ID 9, not found") }))"]
[2021/10/29 11:21:27.778 +08:00] [ERROR] [util.rs:396] ["reconnect failed"] [err_code=KV:PD:Unknown] [err="Other("[components/pd_client/src/util.rs:181]: cancel reconnection due to too small interval")"]
[2021/10/29 11:21:28.779 +08:00] [ERROR] [util.rs:387] ["request failed"] [err_code=KV:PD:gRPC] [err="Grpc(RpcFailure(RpcStatus { status: 2-UNKNOWN, details: Some("invalid store ID 9, not found") }))"]
[2021/10/29 11:21:28.779 +08:00] [INFO] [util.rs:462] ["connecting to PD endpoint"] [endpoints=http://10.36.68.167:2383]
[2021/10/29 11:21:28.779 +08:00] [INFO] [] ["New connected subchannel at 0x7fcaa623cdc0 for subchannel 0x7fc7c6ff3980"]
[2021/10/29 11:21:28.780 +08:00] [INFO] [util.rs:462] ["connecting to PD endpoint"] [endpoints=http://10.36.144.171:2383]
[2021/10/29 11:21:28.780 +08:00] [INFO] [util.rs:527] ["connected to PD leader"] [endpoints=http://10.36.144.171:2383]
[2021/10/29 11:21:28.780 +08:00] [INFO] [util.rs:225] ["heartbeat sender and receiver are stale, refreshing ..."]
[2021/10/29 11:21:28.780 +08:00] [ERROR] [client.rs:457] ["failed to send heartbeat"] [err_code=KV:PD:gRPC] [err="Grpc(RpcFinished(Some(RpcStatus { status: 1-CANCELLED, details: Some("Cancelled") })))"]
[2021/10/29 11:21:28.780 +08:00] [ERROR] [util.rs:342] ["request failed, retry"] [err_code=KV:Unknown] [err="Grpc(RpcFinished(Some(RpcStatus { status: 1-CANCELLED, details: Some("Cancelled") })))"]
[2021/10/29 11:21:28.868 +08:00] [WARN] [util.rs:243] ["updating PD client done"] [spend=89.768229ms]
[2021/10/29 11:21:28.869 +08:00] [ERROR] [util.rs:387] ["request failed"] [err_code=KV:PD:gRPC] [err="Grpc(RpcFailure(RpcStatus { status: 2-UNKNOWN, details: Some("invalid store ID 9, not found") }))"]
[2021/10/29 11:21:28.869 +08:00] [ERROR] [util.rs:396] ["reconnect failed"] [err_code=KV:PD:Unknown] [err="Other("[components/pd_client/src/util.rs:181]: cancel reconnection due to too small interval")"]
[2021/10/29 11:21:29.869 +08:00] [ERROR] [util.rs:387] ["request failed"] [err_code=KV:PD:gRPC] [err="Grpc(RpcFailure(RpcStatus { status: 2-UNKNOWN, details: Some("invalid store ID 9, not found") }))"]
[2021/10/29 11:21:29.869 +08:00] [ERROR] [transport.rs:137] ["resolve store address failed"] [err="Other("[src/server/resolve.rs:72]: description() is deprecated; use Display")"] [store_id=9]
[2021/10/29 11:21:29.869 +08:00] [WARN] [mod.rs:89] ["handle task resolve store 9 address"] [takes=5470]

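From these logs I can see that it is trying to resolve the address of store 9, but PD currently has no store 9. It is presumably a decommissioned store whose information has already been cleaned up on PD.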
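
To confirm that store 9 is the only store ID PD rejects, the TiKV log can be scanned for the two messages that name a store. This is only a rough Python sketch, assuming the log lines look like the excerpt above. The function name `missing_store_ids` and the two regular expressions are my own illustration and are not part of TiKV or PD.

```python
import re
from collections import Counter

# Patterns taken from the messages in the log excerpt; they do not depend on
# the quote characters, so they tolerate forum-mangled quoting as well.
INVALID_STORE = re.compile(r"invalid store ID (\d+), not found")
RESOLVE_FAILED = re.compile(r"resolve store address failed.*\[store_id=(\d+)\]")


def missing_store_ids(lines):
    """Count, per store ID, the log lines in which that store could not be found or resolved."""
    counts = Counter()
    for line in lines:
        match = INVALID_STORE.search(line) or RESOLVE_FAILED.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Three lines copied from the log above: two mention store 9, one does not.
sample = [
    '[2021/10/29 11:21:27.684 +08:00] [ERROR] [util.rs:387] ["request failed"] [err_code=KV:PD:gRPC] [err="Grpc(RpcFailure(RpcStatus { status: 2-UNKNOWN, details: Some("invalid store ID 9, not found") }))"]',
    '[2021/10/29 11:21:27.686 +08:00] [INFO] [util.rs:527] ["connected to PD leader"] [endpoints=http://10.36.144.171:2383]',
    '[2021/10/29 11:21:29.869 +08:00] [ERROR] [transport.rs:137] ["resolve store address failed"] [err="Other("[src/server/resolve.rs:72]: description() is deprecated; use Display")"] [store_id=9]',
]

print(missing_store_ids(sample))
```

Run against the full log file (for example `missing_store_ids(open(path))`, where `path` is wherever the TiKV log lives), the resulting IDs can be compared with the stores PD still knows about, such as the output of `pd-ctl store`.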
