TiKV stops abnormally while syncing data?

[TiDB version] 4.0.12

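[Problem description] The cluster has three TiKV nodes. During DM initialization, one TiKV instance went into an abnormal state (Disconnected), and monitoring also showed that TiKV as down. Is this caused by high load?
At that point only DM was replicating data from the upstream MySQL and there was no business traffic, although I/O pressure on TiKV was fairly high.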

[2021/05/14 23:15:35.438 +08:00] [ERROR] [client.rs:438] [“failed to send heartbeat”] [err_code=KV:PD:gRPC] [err=“Grpc(RpcFinished(Some(RpcStatus { status: 0-OK, details: None })))”]
[2021/05/14 23:15:35.438 +08:00] [ERROR] [util.rs:343] [“request failed, retry”] [err_code=KV:Unknown] [err=“Grpc(RpcFinished(Some(RpcStatus { status: 0-OK, details: None })))”]
[2021/05/14 23:15:35.438 +08:00] [ERROR] [util.rs:343] [“request failed, retry”] [err_code=KV:Unknown] [err=“Other(SendError("…"))”]
[2021/05/14 23:15:35.438 +08:00] [ERROR] [util.rs:343] [“request failed, retry”] [err_code=KV:Unknown] [err=“Other(SendError("…"))”]
[2021/05/14 23:57:54.919 +08:00] [ERROR] [store.rs:487] [“handle raft message failed”] [err_code=KV:Raftstore:Unknown] [err=“Other("[components/raftstore/src/store/fsm/store.rs:1360]: [region 2032] region not exist but not tombstone: region { id: 2032 start_key: 7480000000000003FFC900000000000000F8 end_key: 7480000000000003FFCB00000000000000F8 region_epoch { conf_ver: 5 version: 462 } peers { id: 2033 store_id: 1 } peers { id: 2034 store_id: 4 } peers { id: 2035 store_id: 5 } }")”] [store_id=1]
[2021/05/14 23:59:27.080 +08:00] [ERROR] [store.rs:487] [“handle raft message failed”] [err_code=KV:Raftstore:Unknown] [err=“Other("[components/raftstore/src/store/fsm/store.rs:1360]: [region 2084] region not exist but not tombstone: region { id: 2084 start_key: 7480000000000001FFE300000000000000F8 end_key: 7480000000000001FFE35F698000000000FF0000010380000000FF05F7B7C003800000FF0005F7B7C0038000FF0000000018700380FF000000000005EC01FF3230303330303700FFFE00000000000000F8 region_epoch { conf_ver: 5 version: 230 } peers { id: 2085 store_id: 1 } peers { id: 2086 store_id: 4 } peers { id: 2087 store_id: 5 } }")”] [store_id=1]
[2021/05/15 00:12:06.616 +08:00] [ERROR] [store.rs:487] [“handle raft message failed”] [err_code=KV:Raftstore:Unknown] [err=“Other("[components/raftstore/src/store/fsm/store.rs:1360]: [region 9480] region not exist but not tombstone: region { id: 9480 start_key: 7480000000000006FF3000000000000000F8 end_key: 7480000000000006FF305F728000000000FF0EA60B0000000000FA region_epoch { conf_ver: 5 version: 589 } peers { id: 9481 store_id: 1 } peers { id: 9482 store_id: 4 } peers { id: 9483 store_id: 5 } }")”] [store_id=1]
[2021/05/15 00:12:12.182 +08:00] [ERROR] [store.rs:487] [“handle raft message failed”] [err_code=KV:Raftstore:Unknown] [err=“Other("[components/raftstore/src/store/fsm/store.rs:1360]: [region 9492] region not exist but not tombstone: region { id: 9492 start_key: 7480000000000004FF5A5F698000000000FF0000040131373635FF36393600FE017369FF642D31304138FF39FF3442312D343245FFFF382D344139422D39FFFF4434312D423941FF42FF453139443331FF3733FF0000000000FF000000F703800000FF00000306E3000000FC end_key: 7480000000000004FF5A5F698000000000FF0000050131303031FF32333239FF000000FF0000000000F70173FF69642D45343439FFFF413941422D333238FFFF432D344443372DFF41FF3937372D4643FF3744FF3246353245FF343437FF00000000FF00000000F7038000FF00000000D5160000FD region_epoch { conf_ver: 5 version: 500 } peers { id: 9493 store_id: 1 } peers { id: 9494 store_id: 4 } peers { id: 9495 store_id: 5 } }")”] [store_id=1]
[2021/05/15 00:12:21.188 +08:00] [ERROR] [store.rs:487] [“handle raft message failed”] [err_code=KV:Raftstore:Unknown] [err=“Other("[components/raftstore/src/store/fsm/store.rs:1360]: [region 9484] region not exist but not tombstone: region { id: 9484 start_key: 7480000000000004FF3F00000000000000F8 end_key: 7480000000000004FF3F5F698000000000FF0000040003800000FF000001BB6C000000FC region_epoch { conf_ver: 5 version: 494 } peers { id: 9485 store_id: 1 } peers { id: 9486 store_id: 4 } peers { id: 9487 store_id: 5 } }")”] [store_id=1]
[2021/05/15 00:12:24.038 +08:00] [ERROR] [store.rs:487] [“handle raft message failed”] [err_code=KV:Raftstore:Unknown] [err=“Other("[components/raftstore/src/store/fsm/store.rs:1360]: [region 9484] region not exist but not tombstone: region { id: 9484 start_key: 7480000000000004FF3F00000000000000F8 end_key: 7480000000000004FF3F5F698000000000FF0000040003800000FF000001BB6C000000FC region_epoch { conf_ver: 5 version: 494 } peers { id: 9485 store_id: 1 } peers { id: 9486 store_id: 4 } peers { id: 9487 store_id: 5 } }")”] [store_id=1]
[2021/05/15 00:12:26.494 +08:00] [ERROR] [store.rs:487] [“handle raft message failed”] [err_code=KV:Raftstore:Unknown] [err=“Other("[components/raftstore/src/store/fsm/store.rs:1360]: [region 9484] region not exist but not tombstone: region { id: 9484 start_key: 7480000000000004FF3F00000000000000F8 end_key: 7480000000000004FF3F5F698000000000FF0000040003800000FF000001BB6C000000FC region_epoch { conf_ver: 5 version: 494 } peers { id: 9485 store_id: 1 } peers { id: 9486 store_id: 4 } peers { id: 9487 store_id: 5 } }")”] [store_id=1]
[2021/05/15 00:13:10.710 +08:00] [ERROR] [store.rs:487] [“handle raft message failed”] [err_code=KV:Raftstore:Unknown] [err=“Other("[components/raftstore/src/store/fsm/store.rs:1360]: [region 9508] region not exist but not tombstone: region { id: 9508 start_key: 7480000000000006FFC000000000000000F8 end_key: 7480000000000006FFC200000000000000F8 region_epoch { conf_ver: 5 version: 649 } peers { id: 9509 store_id: 1 } peers { id: 9510 store_id: 4 } peers { id: 9511 store_id: 5 } }")”] [store_id=1]
[2021/05/15 00:13:12.784 +08:00] [ERROR] [store.rs:487] [“handle raft message failed”] [err_code=KV:Raftstore:Unknown] [err=“Other("[components/raftstore/src/store/fsm/store.rs:1360]: [region 9512] region not exist but not tombstone: region { id: 9512 start_key: 7480000000000004FF5A5F698000000000FF0000050131373635FF36373300FE017369FF642D42304441FF46FF4331342D333934FFFF432D343138302D39FFFF4538462D303034FF45FF434645433533FF4534FF0000000000FF000000F703800000FF000003028E000000FC end_key: 7480000000000004FF5A5F698000000000FF0000050137323930FF39383500FE017369FF642D32313135FF32FF3831442D373134FFFF362D343030432D39FFFF3641302D324534FF33FF344433373543FF3036FF0000000000FF000000F703800000FF00000FADCD000000FC region_epoch { conf_ver: 5 version: 500 } peers { id: 9513 store_id: 1 } peers { id: 9514 store_id: 4 } peers { id: 9515 store_id: 5 } }")”] [store_id=1]
[2021/05/15 00:13:14.854 +08:00] [ERROR] [store.rs:487] [“handle raft message failed”] [err_code=KV:Raftstore:Unknown] [err=“Other("[components/raftstore/src/store/fsm/store.rs:1360]: [region 9512] region not exist but not tombstone: region { id: 9512 start_key: 7480000000000004FF5A5F698000000000FF0000050131373635FF36373300FE017369FF642D42304441FF46FF4331342D333934FFFF432D343138302D39FFFF4538462D303034FF45FF434645433533FF4534FF0000000000FF000000F703800000FF000003028E000000FC end_key: 7480000000000004FF5A5F698000000000FF0000050137323930FF39383500FE017369FF642D32313135FF32FF3831442D373134FFFF362D343030432D39FFFF3641302D324534FF33FF344433373543FF3036FF0000000000FF000000F703800000FF00000FADCD000000FC region_epoch { conf_ver: 5 version: 500 } peers { id: 9513 store_id: 1 } peers { id: 9514 store_id: 4 } peers { id: 9515 store_id: 5 } }")”] [store_id=1]
[2021/05/15 00:13:17.232 +08:00] [ERROR] [store.rs:487] [“handle raft message failed”] [err_code=KV:Raftstore:Unknown] [err=“Other("[components/raftstore/src/store/fsm/store.rs:1360]: [region 9512] region not exist but not tombstone: region { id: 9512 start_key: 7480000000000004FF5A5F698000000000FF0000050131373635FF36373300FE017369FF642D42304441FF46FF4331342D333934FFFF432D343138302D39FFFF4538462D303034FF45FF434645433533FF4534FF0000000000FF000000F703800000FF000003028E000000FC end_key: 7480000000000004FF5A5F698000000000FF0000050137323930FF39383500FE017369FF642D32313135FF32FF3831442D373134FFFF362D343030432D39FFFF3641302D324534FF33FF344433373543FF3036FF0000000000FF000000F703800000FF00000FADCD000000FC region_epoch { conf_ver: 5 version: 500 } peers { id: 9513 store_id: 1 } peers { id: 9514 store_id: 4 } peers { id: 9515 store_id: 5 } }")”] [store_id=1]
[2021/05/15 00:13:19.336 +08:00] [ERROR] [store.rs:487] [“handle raft message failed”] [err_code=KV:Raftstore:Unknown] [err=“Other("[components/raftstore/src/store/fsm/store.rs:1360]: [region 9512] region not exist but not tombstone: region { id: 9512 start_key: 7480000000000004FF5A5F698000000000FF0000050131373635FF36373300FE017369FF642D42304441FF46FF4331342D333934FFFF432D343138302D39FFFF4538462D303034FF45FF434645433533FF4534FF0000000000FF000000F703800000FF000003028E000000FC end_key: 7480000000000004FF5A5F698000000000FF0000050137323930FF39383500FE017369FF642D32313135FF32FF3831442D373134FFFF362D343030432D39FFFF3641302D324534FF33FF344433373543FF3036FF0000000000FF000000F703800000FF00000FADCD000000FC region_epoch { conf_ver: 5 version: 500 } peers { id: 9513 store_id: 1 } peers { id: 9514 store_id: 4 } peers { id: 9515 store_id: 5 } }")”] [store_id=1]
[2021/05/15 00:13:19.336 +08:00] [ERROR] [store.rs:487] [“handle raft message failed”] [err_code=KV:Raftstore:Unknown] [err=“Other("[components/raftstore/src/store/fsm/store.rs:1360]: [region 9496] region not exist but not tombstone: region { id: 9496 start_key: 7480000000000006FF2C00000000000000F8 end_key: 7480000000000006FF2C5F698000000000FF0000050132303130FF33313130FF303534FF3030313030FF3536FF000000000000F903FF8AB6D55B2A800000FF0000000000000000F7 region_epoch { conf_ver: 5 version: 590 } peers { id: 9497 store_id: 1 } peers { id: 9498 store_id: 4 } peers { id: 9499 store_id: 5 } }")”] [store_id=1]

Attachment: 14.log (217.6 KB)
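
To see at a glance which regions the "region not exist but not tombstone" errors refer to, the log can be tallied per region. The following is a minimal Python sketch, assuming the log lines have the shape shown in the excerpt above; the function name `count_missing_regions` is made up for illustration, and the sample lines are shortened copies of the messages in this post.

```python
import re
from collections import Counter

# Matches the region id in messages such as
# "[region 9512] region not exist but not tombstone".
REGION_RE = re.compile(r"\[region (\d+)\] region not exist but not tombstone")


def count_missing_regions(lines):
    """Return a Counter mapping region id -> number of matching error lines."""
    counts = Counter()
    for line in lines:
        match = REGION_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Shortened sample lines; in practice, pass an open log file instead.
sample = [
    "[region 9484] region not exist but not tombstone: ...",
    "[region 9484] region not exist but not tombstone: ...",
    "[region 9512] region not exist but not tombstone: ...",
]
for region_id, n in count_missing_regions(sample).most_common():
    print(f"region {region_id}: {n} error line(s)")
```

Because the function accepts any iterable of strings, an open file object for the TiKV log can be passed directly.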

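I also found that TiKV was using a lot of memory at that point in time. Judging from the logs, it was most likely killed by the OOM killer.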
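
One way to confirm an OOM kill is to look for the kernel's OOM-killer message in the system log (for example the output of `dmesg -T`). The sketch below is a heuristic only: the exact message wording differs between kernel versions, the helper name `find_oom_kills` is hypothetical, and the sample lines are invented for illustration rather than taken from this cluster.

```python
import re

# Typical kernel OOM-killer message: "Killed process <pid> (<name>) ...".
# The wording varies by kernel version, so treat this as a heuristic.
OOM_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(kernel_log_lines, process_name="tikv-server"):
    """Return the lines reporting that the OOM killer killed `process_name`."""
    hits = []
    for line in kernel_log_lines:
        match = OOM_RE.search(line)
        if match and match.group(2) == process_name:
            hits.append(line.rstrip("\n"))
    return hits


# Hypothetical sample lines, for illustration only.
sample = [
    "[12345.678] Out of memory: Killed process 4321 (tikv-server) total-vm:...",
    "[12346.000] Out of memory: Killed process 999 (other-proc) total-vm:...",
]
print(find_oom_kills(sample))
```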

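When DM is syncing a large volume of data into TiKV (8 cores, 16 GB of memory), why does TiKV consume so much memory? Is there a way to prevent OOM while syncing large amounts of data?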

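  1. Are the TiKV nodes deployed as multiple instances on a single machine?
  2. For TiKV memory usage, see the explanations in the following thread:
    tikv内存不释放 ("TiKV memory is not released")
  3. You can adjust the block cache parameter to limit memory usage.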

Each node runs a single TiKV instance. From what I can see, it normally uses about 54% of the OS memory.

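The OS currently has 16 GB of memory, and TiKV peaked at 14 GB. The default setting is 45% of OS memory. There were no large queries at the time, only DM syncing data.
With a single TiKV instance per machine, is block-cache the only thing that can be limited?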

At present, the block cache is the only part of TiKV's memory usage that can be controlled through a parameter.
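
As a rough illustration of the numbers in this thread, the sketch below works out what 45% of a 16 GB host comes to and renders a configuration fragment that caps the block cache. It assumes the `storage.block-cache.capacity` item in the TiKV configuration file; the helper names are hypothetical, and the 4 GB value is an arbitrary example rather than a tuning recommendation.

```python
def default_block_cache_gb(total_mem_gb, ratio=0.45):
    """Block cache size implied by a ratio of total memory (45% per the thread)."""
    return total_mem_gb * ratio


def block_cache_toml(capacity_gb):
    """Render a TiKV config fragment that caps the shared block cache."""
    return f'[storage.block-cache]\ncapacity = "{capacity_gb}GB"\n'


total = 16  # GB of memory on the TiKV host, as stated in the thread
print(f"default block cache: {default_block_cache_gb(total):.1f} GB")

# 4 is an arbitrary example value, not a recommendation.
print(block_cache_toml(4))
```

The block cache is only one part of TiKV's memory footprint, which is consistent with the 14 GB peak reported above being well beyond 45% of 16 GB. Lowering the capacity therefore reduces memory use but does not by itself guarantee that an OOM cannot happen.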