TiKV frequently reports timeout and related errors

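To improve efficiency, please provide the following information when asking a question. Clearly described problems may get a faster response.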

  • [TiDB version]: TiDB 4.0.0
  • [Problem description]: The TiKV logs frequently show the errors below. The cluster is currently working normally.
    2020-07-03 17:50:17 ERROR TiKV 192.168.0.155 [] ["ipv4:192.168.1.154:46886: Keepalive watchdog fired. Closing transport."]
    2020-07-03 17:50:17 ERROR TiKV 192.168.0.155 [snap.rs:354] ["failed to recv snapshot"] [err=RemoteStopped]
    2020-07-03 17:50:17 ERROR TiKV 192.168.1.154 [snap.rs:394] ["failed to send snap"] [err="Grpc(RpcFinished(Some(RpcStatus { status: 14-UNAVAILABLE, details: Some("keepalive watchdog timeout") })))"] [to_addr=192.168.0.155:20171]
    2020-07-03 17:50:23 ERROR TiKV 192.168.0.154 [] ["ipv4:192.168.1.153:48536: Keepalive watchdog fired. Closing transport."]
    2020-07-03 17:50:23 ERROR TiKV 192.168.1.153 [snap.rs:394] ["failed to send snap"] [err="Grpc(RpcFinished(Some(RpcStatus { status: 14-UNAVAILABLE, details: Some("keepalive watchdog timeout") })))"] [to_addr=192.168.0.154:20171]
    2020-07-03 17:50:23 ERROR TiKV 192.168.0.154 [snap.rs:354] ["failed to recv snapshot"] [err=RemoteStopped]
    2020-07-03 17:50:58 ERROR TiKV 192.168.0.155 [] ["ipv4:192.168.1.159:34940: Keepalive watchdog fired. Closing transport."]
    2020-07-03 17:50:58 ERROR TiKV 192.168.0.155 [snap.rs:354] ["failed to recv snapshot"] [err=RemoteStopped]
    2020-07-03 17:50:58 ERROR TiKV 192.168.1.159 [snap.rs:394] ["failed to send snap"] [err="Grpc(RpcFinished(Some(RpcStatus { status: 14-UNAVAILABLE, details: Some("keepalive watchdog timeout") })))"] [to_addr=192.168.0.155:20171]
    2020-07-03 17:52:20 ERROR TiKV 192.168.0.155 [] ["ipv4:192.168.1.154:46896: Keepalive watchdog fired. Closing transport."]
    2020-07-03 17:52:20 ERROR TiKV 192.168.1.154 [snap.rs:394] ["failed to send snap"] [err="Grpc(RpcFinished(Some(RpcStatus { status: 14-UNAVAILABLE, details: Some("keepalive watchdog timeout") })))"] [to_addr=192.168.0.155:20171]
    2020-07-03 17:52:20 ERROR TiKV 192.168.0.155 [snap.rs:354] ["failed to recv snapshot"] [err=RemoteStopped]
    2020-07-03 17:52:35 ERROR TiKV 192.168.1.157 [] ["ipv4:192.168.0.154:20419: Keepalive watchdog fired. Closing transport."]
    2020-07-03 17:52:35 ERROR TiKV 192.168.1.157 [snap.rs:354] ["failed to recv snapshot"] [err=RemoteStopped]
    2020-07-03 17:53:01 ERROR TiKV 192.168.0.154 [snap.rs:394] ["failed to send snap"] [err="Grpc(RpcFinished(Some(RpcStatus { status: 14-UNAVAILABLE, details: Some("keepalive watchdog timeout") })))"] [to_addr=192.168.1.157:20171]
    2020-07-03 17:57:14 ERROR TiKV 192.168.0.153 [] ["ipv4:192.168.1.154:49378: Keepalive watchdog fired. Closing transport."]
    2020-07-03 17:57:14 ERROR TiKV 192.168.0.153 [snap.rs:354] ["failed to recv snapshot"] [err=RemoteStopped]
    2020-07-03 17:57:14 ERROR TiKV 192.168.1.154 [snap.rs:394] ["failed to send snap"] [err="Grpc(RpcFailure(RpcStatus { status: 14-UNAVAILABLE, details: Some("keepalive watchdog timeout") }))"] [to_addr=192.168.0.153:20171]
    2020-07-03 17:58:10 ERROR TiKV 192.168.0.155 [] ["ipv4:192.168.1.154:46940: Keepalive watchdog fired. Closing transport."]
    2020-07-03 17:58:10 ERROR TiKV 192.168.0.155 [snap.rs:354] ["failed to recv snapshot"] [err=RemoteStopped]
    2020-07-03 17:58:10 ERROR TiKV 192.168.1.154 [snap.rs:394] ["failed to send snap"] [err="Grpc(RpcFinished(Some(RpcStatus { status: 14-UNAVAILABLE, details: Some("keepalive watchdog timeout") })))"] [to_addr=192.168.0.155:20171]
    2020-07-03 17:58:25 ERROR TiKV 192.168.0.154 [] ["ipv4:192.168.1.153:48586: Keepalive watchdog fired. Closing transport."]
    2020-07-03 17:58:25 ERROR TiKV 192.168.0.154 [snap.rs:354] ["failed to recv snapshot"] [err=RemoteStopped]
    2020-07-03 17:58:51 ERROR TiKV 192.168.1.153 [snap.rs:394] ["failed to send snap"] [err="Grpc(RpcFinished(Some(RpcStatus { status: 14-UNAVAILABLE, details: Some("keepalive watchdog timeout") })))"] [to_addr=192.168.0.154:20171]
    2020-07-03 18:04:57 ERROR TiKV 192.168.0.154 [] ["ipv4:192.168.0.155:62493: Keepalive watchdog fired. Closing transport."]
    2020-07-03 18:04:57 ERROR TiKV 192.168.0.154 [snap.rs:354] ["failed to recv snapshot"] [err=RemoteStopped]
    2020-07-03 18:04:57 ERROR TiKV 192.168.0.155 [snap.rs:394] ["failed to send snap"] [err="Grpc(RpcFinished(Some(RpcStatus { status: 14-UNAVAILABLE, details: Some("keepalive watchdog timeout") })))"] [to_addr=192.168.0.154:20171]
    2020-07-03 18:05:14 ERROR TiKV 192.168.0.155 [] ["ipv4:192.168.1.154:46982: Keepalive watchdog fired. Closing transport."]
    2020-07-03 18:05:14 ERROR TiKV 192.168.0.155 [snap.rs:354] ["failed to recv snapshot"] [err=RemoteStopped]
    2020-07-03 18:05:14 ERROR TiKV 192.168.1.154 [snap.rs:394] ["failed to send snap"] [err="Grpc(RpcFinished(Some

Is this a network problem?

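These errors are reported when network jitter causes a request to fail while a snapshot is being sent or received. If the service is not affected, you can ignore them.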
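To see whether the failures are spread evenly or concentrated on a few links, the `failed to send snap` entries can be tallied per sender and receiver. The following is only a rough sketch: `count_failed_links` and the regular expression are illustrative helpers written for the log layout pasted in this thread, not part of any TiKV tooling, and the sample holds just three entries copied from that log.

```python
import re
from collections import Counter

# Matches one "failed to send snap" entry: the reporting TiKV address, then the
# message ending in [to_addr=host:port]. \s+ allows the address and the message
# to sit on separate lines; the "." around the message accepts straight or
# curly quotes.
SEND_FAIL = re.compile(
    r"TiKV\s+(?P<src>\d+(?:\.\d+){3})\s+"
    r"\[snap\.rs:\d+\]\s+\[.failed to send snap.\]"
    r"[^\n]*?\[to_addr=(?P<dst>\d+(?:\.\d+){3}):\d+\]"
)


def count_failed_links(log_text: str) -> Counter:
    """Count failed snapshot sends per (sender, receiver) address pair."""
    return Counter((m["src"], m["dst"]) for m in SEND_FAIL.finditer(log_text))


# Three entries copied from the log in this thread.
SAMPLE = """\
2020-07-03 17:50:17
ERROR
TiKV 192.168.1.154
[snap.rs:394] ["failed to send snap"] [err="Grpc(RpcFinished(Some(RpcStatus { status: 14-UNAVAILABLE, details: Some("keepalive watchdog timeout") })))"] [to_addr=192.168.0.155:20171]
2020-07-03 17:50:23
ERROR
TiKV 192.168.1.153
[snap.rs:394] ["failed to send snap"] [err="Grpc(RpcFinished(Some(RpcStatus { status: 14-UNAVAILABLE, details: Some("keepalive watchdog timeout") })))"] [to_addr=192.168.0.154:20171]
2020-07-03 17:52:20
ERROR
TiKV 192.168.1.154
[snap.rs:394] ["failed to send snap"] [err="Grpc(RpcFinished(Some(RpcStatus { status: 14-UNAVAILABLE, details: Some("keepalive watchdog timeout") })))"] [to_addr=192.168.0.155:20171]
"""

if __name__ == "__main__":
    for (src, dst), n in count_failed_links(SAMPLE).most_common():
        print(f"{src} -> {dst}: {n}")
```

On the three sample entries this prints `192.168.1.154 -> 192.168.0.155: 2` followed by `192.168.1.153 -> 192.168.0.154: 1`. Running it over the full log would show whether most failures cross between the 192.168.1.x and 192.168.0.x ranges, which would point at the path between them rather than at a single node.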

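The service is not affected so far, but the errors are quite frequent, and monitoring occasionally raises alerts for network latency of 4.5 s between the TiKV nodes.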

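`keepalive watchdog timeout` looks like an operating-system watchdog timeout. Please check the operating system's messages log for any warnings at those moments, and also check whether the network is healthy. Thanks.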
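The `Keepalive watchdog fired. Closing transport.` line is logged by the gRPC transport when a keepalive ping is not answered in time, so one rough way to check the network is to time a plain TCP connect from each TiKV host to its peers. The sketch below makes several assumptions: the helper names are hypothetical, port 20171 is taken from the `to_addr` fields in the log, and the 3 s threshold is what I believe TiKV's default `server.grpc-keepalive-timeout` to be, so verify it against your own configuration. A TCP handshake is only a proxy for the gRPC ping, not the same measurement.

```python
import socket
import time
from typing import Optional

# Assumption: 3 s, believed to be the default server.grpc-keepalive-timeout.
# Check the value actually configured in your cluster.
ASSUMED_KEEPALIVE_TIMEOUT_S = 3.0


def tcp_connect_seconds(host: str, port: int, timeout: float = 5.0) -> Optional[float]:
    """Return the seconds taken to open a TCP connection, or None on failure."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.perf_counter() - start
    except OSError:  # covers refused connections, unreachable hosts, timeouts
        return None


def exceeds_keepalive(latency_s: Optional[float],
                      limit_s: float = ASSUMED_KEEPALIVE_TIMEOUT_S) -> bool:
    """True if the probe failed or took longer than the keepalive timeout."""
    return latency_s is None or latency_s > limit_s


if __name__ == "__main__":
    # Peer addresses taken from the to_addr fields in the log.
    peers = ["192.168.0.153", "192.168.0.154", "192.168.0.155", "192.168.1.157"]
    for host in peers:
        latency = tcp_connect_seconds(host, 20171)
        status = "SLOW or FAILED" if exceeds_keepalive(latency) else "ok"
        print(host, latency, status)
```

Under the 3 s assumption, the 4.5 s latency mentioned in the alerts would exceed the keepalive timeout (`exceeds_keepalive(4.5)` is `True`), which would be consistent with the watchdog closing the connection during snapshot transfer. Run the probe repeatedly around the times the errors appear, since the latency spikes are reported as occasional.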
