ticdc延迟问题排查

版本:4.0.10
tps:1K左右,但是是不是收到cdc延迟告警15s,查看了当时cdc的日志如下:
[2021/09/16 14:36:59.456 +08:00] [WARN] [client.go:1237] [“drop resolved ts due to region feed stopped”] [regionID=2] [requestID=101363] [addr=172.10.168.45:27001]
[2021/09/16 14:37:00.458 +08:00] [WARN] [client.go:1237] [“drop resolved ts due to region feed stopped”] [regionID=2] [requestID=101363] [addr=172.10.168.45:27001]
[2021/09/16 14:37:01.459 +08:00] [WARN] [client.go:1237] [“drop resolved ts due to region feed stopped”] [regionID=2] [requestID=101363] [addr=172.10.168.45:27001]
[2021/09/16 14:37:01.812 +08:00] [INFO] [client.go:871] [“EventFeed disconnected”] [regionID=992189] [requestID=119890] [span="[7480000000000008ff0f5f720000000000fa, 7480000000000008ff0f5f730000000000fa)"] [checkpoint=427759821499662348] [error="[CDC:ErrEventFeedEventError]not_leader:<region_id:992189 > “]
[2021/09/16 14:37:01.812 +08:00] [INFO] [region_range_lock.go:365] [“unlocked range”] [lockID=350] [regionID=992189] [startKey=7480000000000008ff0f5f720000000000fa] [endKey=7480000000000008ff0f5f730000000000fa] [checkpointTs=427759821499662348]
[2021/09/16 14:37:01.812 +08:00] [INFO] [region_cache.go:827] [“switch region peer to next due to NotLeader with NULL leader”] [currIdx=0] [regionID=992189]
[2021/09/16 14:37:01.812 +08:00] [INFO] [region_range_lock.go:217] [“range locked”] [lockID=350] [regionID=992189] [startKey=7480000000000008ff0f5f720000000000fa] [endKey=7480000000000008ff0f5f730000000000fa] [checkpointTs=427759821499662348]
[2021/09/16 14:37:01.813 +08:00] [INFO] [client.go:667] [“cannot get rpcCtx, retry span”] [regionID=992189] [span=”[7480000000000008ff0f5f720000000000fa, 7480000000000008ff0f5f730000000000fa)"]
[2021/09/16 14:37:01.813 +08:00] [INFO] [region_range_lock.go:365] [“unlocked range”] [lockID=350] [regionID=992189] [startKey=7480000000000008ff0f5f720000000000fa] [endKey=7480000000000008ff0f5f730000000000fa] [checkpointTs=427759821499662348]
[2021/09/16 14:37:01.813 +08:00] [INFO] [region_range_lock.go:217] [“range locked”] [lockID=350] [regionID=992189] [startKey=7480000000000008ff0f5f720000000000fa] [endKey=7480000000000008ff0f5f730000000000fa] [checkpointTs=427759821499662348]
[2021/09/16 14:37:01.813 +08:00] [INFO] [client.go:765] [“start new request”] [request="{“header”:{“cluster_id”:6940895255969595236,“ticdc_version”:“4.0.11”},“region_id”:992189,“region_epoch”:{“conf_ver”:5,“version”:307},“checkpoint_ts”:427759821499662348,“start_key”:“dIAAAAAAAAj/D19yAAAAAAD6”,“end_key”:“dIAAAAAAAAj/D19zAAAAAAD6”,“request_id”:120117,“extra_op”:1}"] [addr=172.10.168.45:27001]
[2021/09/16 14:37:01.816 +08:00] [INFO] [client.go:871] [“EventFeed disconnected”] [regionID=992189] [requestID=120117] [span="[7480000000000008ff0f5f720000000000fa, 7480000000000008ff0f5f730000000000fa)"] [checkpoint=427759821499662348] [error="[CDC:ErrEventFeedEventError]not_leader:<region_id:992189 > “]
[2021/09/16 14:37:01.816 +08:00] [INFO] [region_range_lock.go:365] [“unlocked range”] [lockID=350] [regionID=992189] [startKey=7480000000000008ff0f5f720000000000fa] [endKey=7480000000000008ff0f5f730000000000fa] [checkpointTs=427759821499662348]
[2021/09/16 14:37:01.816 +08:00] [INFO] [region_cache.go:827] [“switch region peer to next due to NotLeader with NULL leader”] [currIdx=0] [regionID=992189]
[2021/09/16 14:37:01.816 +08:00] [INFO] [region_range_lock.go:217] [“range locked”] [lockID=350] [regionID=992189] [startKey=7480000000000008ff0f5f720000000000fa] [endKey=7480000000000008ff0f5f730000000000fa] [checkpointTs=427759821499662348]
[2021/09/16 14:37:01.816 +08:00] [INFO] [client.go:765] [“start new request”] [request=”{“header”:{“cluster_id”:6940895255969595236,“ticdc_version”:“4.0.11”},“region_id”:992189,“region_epoch”:{“conf_ver”:5,“version”:307},“checkpoint_ts”:427759821499662348,“start_key”:“dIAAAAAAAAj/D19yAAAAAAD6”,“end_key”:“dIAAAAAAAAj/D19zAAAAAAD6”,“request_id”:120118,“extra_op”:1}"] [addr=10.84.117.171:7301]
[2021/09/16 14:37:01.817 +08:00] [INFO] [client.go:871] [“EventFeed disconnected”] [regionID=992189] [requestID=120118] [span="[7480000000000008ff0f5f720000000000fa, 7480000000000008ff0f5f730000000000fa)"] [checkpoint=427759821499662348] [error="[CDC:ErrEventFeedEventError]not_leader:<region_id:992189 leader:<id:992192 store_id:5 > > “]
[2021/09/16 14:37:01.817 +08:00] [INFO] [region_range_lock.go:365] [“unlocked range”] [lockID=350] [regionID=992189] [startKey=7480000000000008ff0f5f720000000000fa] [endKey=7480000000000008ff0f5f730000000000fa] [checkpointTs=427759821499662348]
[2021/09/16 14:37:01.817 +08:00] [INFO] [region_cache.go:840] [“switch region leader to specific leader due to kv return NotLeader”] [regionID=992189] [currIdx=1] [leaderStoreID=5]
[2021/09/16 14:37:01.817 +08:00] [INFO] [region_range_lock.go:217] [“range locked”] [lockID=350] [regionID=992189] [startKey=7480000000000008ff0f5f720000000000fa] [endKey=7480000000000008ff0f5f730000000000fa] [checkpointTs=427759821499662348]
[2021/09/16 14:37:01.817 +08:00] [INFO] [client.go:765] [“start new request”] [request=”{“header”:{“cluster_id”:6940895255969595236,“ticdc_version”:“4.0.11”},“region_id”:992189,“region_epoch”:{“conf_ver”:5,“version”:307},“checkpoint_ts”:427759821499662348,“start_key”:“dIAAAAAAAAj/D19yAAAAAAD6”,“end_key”:“dIAAAAAAAAj/D19zAAAAAAD6”,“request_id”:120119,“extra_op”:1}"] [addr=10.84.94.14:7301]
[2021/09/16 14:37:02.460 +08:00] [WARN] [client.go:1237] [“drop resolved ts due to region feed stopped”] [regionID=2] [requestID=101363] [addr=172.10.168.45:27001]
[2021/09/16 14:37:02.511 +08:00] [INFO] [statistics.go:126] [“sink replication status”] [name=MQ] [changefeed=cdc-us-tradeorder] [capture=172.10.168.55:28001] [count=9] [qps=0]
[2021/09/16 14:37:02.511 +08:00] [INFO] [statistics.go:126] [“sink replication status”] [name=MQ] [changefeed=cdc-us-seo] [capture=172.10.168.55:28001] [count=0] [qps=0]
[2021/09/16 14:37:02.511 +08:00] [INFO] [statistics.go:126] [“sink replication status”] [name=MQ] [changefeed=cdc-us-support] [capture=172.10.168.55:28001] [count=0] [qps=0]
[2021/09/16 14:37:03.352 +08:00] [INFO] [statistics.go:126] [“sink replication status”] [name=MQ] [changefeed=cdc-us-user] [capture=172.10.168.55:28001] [count=14] [qps=0]
[2021/09/16 14:37:03.462 +08:00] [WARN] [client.go:1237] [“drop resolved ts due to region feed stopped”] [regionID=2] [requestID=101363] [addr=172.10.168.45:27001]
[2021/09/16 14:37:04.264 +08:00] [INFO] [statistics.go:126] [“sink replication status”] [name=MQ] [changefeed=cdc-us-sl-datacenter] [capture=172.10.168.55:28001] [count=149] [qps=4]
[2021/09/16 14:37:04.462 +08:00] [WARN] [client.go:1237] [“drop resolved ts due to region feed stopped”] [regionID=2] [requestID=101363] [addr=172.10.168.45:27001]
[2021/09/16 14:37:05.463 +08:00] [WARN] [client.go:1237] [“drop resolved ts due to region feed stopped”] [regionID=2] [requestID=101363] [addr=172.10.168.45:27001]
[2021/09/16 14:37:05.495 +08:00] [INFO] [statistics.go:126] [“sink replication status”] [name=MQ] [changefeed=cdc-us-sl-datacenter-dwv] [capture=172.10.168.55:28001] [count=0] [qps=0]
[2021/09/16 14:37:05.495 +08:00] [INFO] [statistics.go:126] [“sink replication status”] [name=MQ] [changefeed=cdc-us-product-1] [capture=172.10.168.55:28001] [count=55] [qps=1]
[2021/09/16 14:37:06.464 +08:00] [WARN] [client.go:1237] [“drop resolved ts due to region feed stopped”] [regionID=2] [requestID=101363] [addr=172.10.168.45:27001]
[2021/09/16 14:37:06.714 +08:00] [INFO] [client.go:871] [“EventFeed disconnected”] [regionID=966689] [requestID=119978] [span="[7480000000000006ff015f720000000000fa, 7480000000000006ff015f728000000000ff02c8eb0000000000fa)"] [checkpoint=427759822508916861] [error="[CDC:ErrEventFeedEventError]not_leader:<region_id:966689 > “]
[2021/09/16 14:37:06.714 +08:00] [INFO] [region_range_lock.go:365] [“unlocked range”] [lockID=295] [regionID=966689] [startKey=7480000000000006ff015f720000000000fa] [endKey=7480000000000006ff015f728000000000ff02c8eb0000000000fa] [checkpointTs=427759822508916861]
[2021/09/16 14:37:06.714 +08:00] [INFO] [region_cache.go:827] [“switch region peer to next due to NotLeader with NULL leader”] [currIdx=0] [regionID=966689]
[2021/09/16 14:37:06.714 +08:00] [INFO] [region_range_lock.go:217] [“range locked”] [lockID=295] [regionID=966689] [startKey=7480000000000006ff015f720000000000fa] [endKey=7480000000000006ff015f728000000000ff02c8eb0000000000fa] [checkpointTs=427759822508916861]
[2021/09/16 14:37:06.714 +08:00] [INFO] [client.go:667] [“cannot get rpcCtx, retry span”] [regionID=966689] [span=”[7480000000000006ff015f720000000000fa, 7480000000000006ff015f728000000000ff02c8eb0000000000fa)"]
[2021/09/16 14:37:06.714 +08:00] [INFO] [region_range_lock.go:365] [“unlocked range”] [lockID=295] [regionID=966689] [startKey=7480000000000006ff015f720000000000fa] [endKey=7480000000000006ff015f728000000000ff02c8eb0000000000fa] [checkpointTs=427759822508916861]
[2021/09/16 14:37:06.714 +08:00] [INFO] [region_range_lock.go:217] [“range locked”] [lockID=295] [regionID=966689] [startKey=7480000000000006ff015f720000000000fa] [endKey=7480000000000006ff015f728000000000ff02c8eb0000000000fa] [checkpointTs=427759822508916861]
[2021/09/16 14:37:06.714 +08:00] [INFO] [client.go:765] [“start new request”] [request="{“header”:{“cluster_id”:6940895255969595236,“ticdc_version”:“4.0.11”},“region_id”:966689,“region_epoch”:{“conf_ver”:5,“version”:155},“checkpoint_ts”:427759822508916861,“start_key”:“dIAAAAAAAAb/AV9yAAAAAAD6”,“end_key”:“dIAAAAAAAAb/AV9ygAAAAAD/AsjrAAAAAAD6”,“request_id”:120120,“extra_op”:1}"] [addr=172.10.168.45:27001]
[2021/09/16 14:37:06.719 +08:00] [INFO] [client.go:871] [“EventFeed disconnected”] [regionID=966689] [requestID=120120] [span="[7480000000000006ff015f720000000000fa, 7480000000000006ff015f728000000000ff02c8eb0000000000fa)"] [checkpoint=427759822508916861] [error="[CDC:ErrEventFeedEventError]not_leader:<region_id:966689 > “]
[2021/09/16 14:37:06.719 +08:00] [INFO] [region_range_lock.go:365] [“unlocked range”] [lockID=295] [regionID=966689] [startKey=7480000000000006ff015f720000000000fa] [endKey=7480000000000006ff015f728000000000ff02c8eb0000000000fa] [checkpointTs=427759822508916861]
[2021/09/16 14:37:06.719 +08:00] [INFO] [region_cache.go:827] [“switch region peer to next due to NotLeader with NULL leader”] [currIdx=0] [regionID=966689]
[2021/09/16 14:37:06.719 +08:00] [INFO] [region_range_lock.go:217] [“range locked”] [lockID=295] [regionID=966689] [startKey=7480000000000006ff015f720000000000fa] [endKey=7480000000000006ff015f728000000000ff02c8eb0000000000fa] [checkpointTs=427759822508916861]
[2021/09/16 14:37:06.719 +08:00] [INFO] [client.go:765] [“start new request”] [request=”{“header”:{“cluster_id”:6940895255969595236,“ticdc_version”:“4.0.11”},“region_id”:966689,“region_epoch”:{“conf_ver”:5,“version”:155},“checkpoint_ts”:427759822508916861,“start_key”:“dIAAAAAAAAb/AV9yAAAAAAD6”,“end_key”:“dIAAAAAAAAb/AV9ygAAAAAD/AsjrAAAAAAD6”,“request_id”:120121,“extra_op”:1}"] [addr=10.84.117.171:7301]
[2021/09/16 14:37:06.719 +08:00] [INFO] [client.go:871] [“EventFeed disconnected”] [regionID=966689] [requestID=120121] [span="[7480000000000006ff015f720000000000fa, 7480000000000006ff015f728000000000ff02c8eb0000000000fa)"] [checkpoint=427759822508916861] [error="[CDC:ErrEventFeedEventError]not_leader:<region_id:966689 leader:<id:966692 store_id:5 > > “]
[2021/09/16 14:37:06.719 +08:00] [INFO] [region_range_lock.go:365] [“unlocked range”] [lockID=295] [regionID=966689] [startKey=7480000000000006ff015f720000000000fa] [endKey=7480000000000006ff015f728000000000ff02c8eb0000000000fa] [checkpointTs=427759822508916861]
[2021/09/16 14:37:06.719 +08:00] [INFO] [region_cache.go:840] [“switch region leader to specific leader due to kv return NotLeader”] [regionID=966689] [currIdx=1] [leaderStoreID=5]
[2021/09/16 14:37:06.719 +08:00] [INFO] [region_range_lock.go:217] [“range locked”] [lockID=295] [regionID=966689] [startKey=7480000000000006ff015f720000000000fa] [endKey=7480000000000006ff015f728000000000ff02c8eb0000000000fa] [checkpointTs=427759822508916861]
[2021/09/16 14:37:06.719 +08:00] [INFO] [client.go:765] [“start new request”] [request=”{“header”:{“cluster_id”:6940895255969595236,“ticdc_version”:“4.0.11”},“region_id”:966689,“region_epoch”:{“conf_ver”:5,“version”:155},“checkpoint_ts”:427759822508916861,“start_key”:“dIAAAAAAAAb/AV9yAAAAAAD6”,“end_key”:“dIAAAAAAAAb/AV9ygAAAAAD/AsjrAAAAAAD6”,“request_id”:120122,“extra_op”:1}"] [addr=10.84.94.14:7301]
[2021/09/16 14:37:07.465 +08:00] [WARN] [client.go:1237] [“drop resolved ts due to region feed stopped”] [regionID=2] [requestID=101363] [addr=172.10.168.45:27001]
[2021/09/16 14:37:08.466 +08:00] [WARN] [client.go:1237] [“drop resolved ts due to region feed stopped”] [regionID=2] [requestID=101363] [addr=172.10.168.45:27001]
[2021/09/16 14:37:09.467 +08:00] [WARN] [client.go:1237] [“drop resolved ts due to region feed stopped”] [regionID=2] [requestID=101363] [addr=172.10.168.45:27001]
[2021/09/16 14:37:10.469 +08:00] [WARN] [client.go:1237] [“drop resolved ts due to region feed stopped”] [regionID=2] [requestID=101363] [addr=172.10.168.45:27001]
[2021/09/16 14:37:11.014 +08:00] [INFO] [client.go:871] [“EventFeed disconnected”] [regionID=801257] [requestID=120111] [span="[7480000000000006ff015f728000000002fffd8bae0000000000fa, 7480000000000006ff015f730000000000fa)"] [checkpoint=427759823872065538] [error="[CDC:ErrEventFeedEventError]not_leader:<region_id:801257 > “]
[2021/09/16 14:37:11.014 +08:00] [INFO] [region_range_lock.go:365] [“unlocked range”] [lockID=295] [regionID=801257] [startKey=7480000000000006ff015f728000000002fffd8bae0000000000fa] [endKey=7480000000000006ff015f730000000000fa] [checkpointTs=427759823872065538]
[2021/09/16 14:37:11.014 +08:00] [INFO] [region_cache.go:827] [“switch region peer to next due to NotLeader with NULL leader”] [currIdx=0] [regionID=801257]
[2021/09/16 14:37:11.014 +08:00] [INFO] [region_range_lock.go:217] [“range locked”] [lockID=295] [regionID=801257] [startKey=7480000000000006ff015f728000000002fffd8bae0000000000fa] [endKey=7480000000000006ff015f730000000000fa] [checkpointTs=427759823872065538]
[2021/09/16 14:37:11.014 +08:00] [INFO] [client.go:765] [“start new request”] [request=”{“header”:{“cluster_id”:6940895255969595236,“ticdc_version”:“4.0.11”},“region_id”:801257,“region_epoch”:{“conf_ver”:5,“version”:156},“checkpoint_ts”:427759823872065538,“start_key”:“dIAAAAAAAAb/AV9ygAAAAAL//YuuAAAAAAD6”,“end_key”:“dIAAAAAAAAb/AV9zAAAAAAD6”,“request_id”:120123,“extra_op”:1}"] [addr=10.84.117.171:7301]
[2021/09/16 14:37:11.017 +08:00] [INFO] [client.go:871] [“EventFeed disconnected”] [regionID=801257] [requestID=120123] [span="[7480000000000006ff015f728000000002fffd8bae0000000000fa, 7480000000000006ff015f730000000000fa)"] [checkpoint=427759823872065538] [error="[CDC:ErrEventFeedEventError]not_leader:<region_id:801257 > “]
[2021/09/16 14:37:11.017 +08:00] [INFO] [region_range_lock.go:365] [“unlocked range”] [lockID=295] [regionID=801257] [startKey=7480000000000006ff015f728000000002fffd8bae0000000000fa] [endKey=7480000000000006ff015f730000000000fa] [checkpointTs=427759823872065538]
[2021/09/16 14:37:11.017 +08:00] [INFO] [region_cache.go:827] [“switch region peer to next due to NotLeader with NULL leader”] [currIdx=1] [regionID=801257]
[2021/09/16 14:37:11.017 +08:00] [INFO] [region_range_lock.go:217] [“range locked”] [lockID=295] [regionID=801257] [startKey=7480000000000006ff015f728000000002fffd8bae0000000000fa] [endKey=7480000000000006ff015f730000000000fa] [checkpointTs=427759823872065538]
[2021/09/16 14:37:11.017 +08:00] [INFO] [client.go:765] [“start new request”] [request=”{“header”:{“cluster_id”:6940895255969595236,“ticdc_version”:“4.0.11”},“region_id”:801257,“region_epoch”:{“conf_ver”:5,“version”:156},“checkpoint_ts”:427759823872065538,“start_key”:“dIAAAAAAAAb/AV9ygAAAAAL//YuuAAAAAAD6”,“end_key”:“dIAAAAAAAAb/AV9zAAAAAAD6”,“request_id”:120124,“extra_op”:1}"] [addr=10.84.94.14:7301]
[2021/09/16 14:37:11.469 +08:00] [WARN] [client.go:1237] [“drop resolved ts due to region feed stopped”] [regionID=2] [requestID=101363] [addr=172.10.168.45:27001]
[2021/09/16 14:37:12.470 +08:00] [WARN] [client.go:1237] [“drop resolved ts due to region feed stopped”] [regionID=2] [requestID=101363] [addr=172.10.168.45:27001]
[2021/09/16 14:37:13.268 +08:00] [INFO] [statistics.go:126] [“sink replication status”] [name=MQ] [changefeed=cdc-us-customer] [capture=172.10.168.55:28001] [count=0] [qps=0]
[2021/09/16 14:37:13.471 +08:00] [WARN] [client.go:1237] [“drop resolved ts due to region feed stopped”] [regionID=2] [requestID=101363] [addr=172.10.168.45:27001]
[2021/09/16 14:37:14.473 +08:00] [WARN] [client.go:1237] [“drop resolved ts due to region feed stopped”] [regionID=2] [requestID=101363] [addr=172.10.168.45:27001]
[2021/09/16 14:37:15.474 +08:00] [WARN] [client.go:1237] [“drop resolved ts due to region feed stopped”] [regionID=2] [requestID=101363] [addr=172.10.168.45:27001]
[2021/09/16 14:37:15.759 +08:00] [INFO] [statistics.go:126] [“sink replication status”] [name=MQ] [changefeed=cf-us-check-heartbeat-not-del] [capture=172.10.168.55:28001] [count=0] [qps=0]
[2021/09/16 14:37:16.475 +08:00] [WARN] [client.go:1237] [“drop resolved ts due to region feed stopped”] [regionID=2] [requestID=101363] [addr=172.10.168.45:27001]

1 个赞

有几个疑问:

  1. range locked 是怎么产生的?
    2)switch region peer to next due to NotLeader with NULL leader
    3)switch region leader to specific leader due to kv return NotLeader

region leader switch 对cdc的延迟有这么大的影响吗

反馈以下内容:版本
出现的问题
可复现操作
脚本
日志
你这缺了挺多的,而且尽可能在提问的时候一次性描述清楚。追问会被误以为是“有回复”,所以有可能被漏掉

tps:1K左右,但是时不时会收到cdc延迟告警,告警阈值15s,整个集群tps1k左右,为什么cdc解析会延迟这么久呢?(cdc server 负载很低,tikv的负载也很低,同机房),看cdc的日志提示 region leader发生了切换,但是仅仅region leader切换就延迟这么多,有点不解。

建议升级到 4014 再看下

此话题已在最后回复的 1 分钟后被自动关闭。不再允许新回复。