cdc同步数据到kafka中止

【 TiDB 使用环境】生产环境
【 TiDB 版本】4.0.6
【复现路径】未发现导致问题的路径
【遇到的问题:问题现象及影响】
问题:cdc 无法同步数据,tidb 4.0.6, ticdc 4.0.14,kafka topic 正常;
碰到问题:无法同步数据,tso也不再推进,cdc 重启、任务重建也都不同步,除非把cdc日志中提及的超时 region 或 not leader 的 region 相关的表从cdc任务中排除,同步恢复(本身tidb集群读取和写入数据正常)
统计的相关 region_id:

1772637 tabel1

1012586 tabel1

633916 tabel1

404133 tabel2

411363 tabel2

1218954 table3

细节:当日志中有相关 region 日志时,有两种现象,一是延时很大,二是可能 tso 完全不推进,排除 region 相关的表,tso 又可以推进,数据也可以同步
cdc日志,截取了部分日志,大多数是重复的:
[2022/12/16 06:20:32.259 +00:00] [INFO] [client.go:1347] [“region state entry will be replaced because received message
of newer requestID”] [regionID=1772637] [oldRequestID=130] [requestID=131] [addr=ip1:20160]
[2022/12/16 06:20:31.793 +00:00] [INFO] [client.go:860] [“start new request”] [request=“{"header":{"cluster_id":6929
472922985843426,"ticdc_version":"4.0.14"},"region_id":1772637,"region_epoch":{"conf_ver":38117,"version":380
},"checkpoint_ts":438087496947204097,"start_key":"dIAAAAAAAAH/cV9yAAAAAAD6","end_key":"dIAAAAAAAAH/cV9ykAAAAAD/
BLihAAAAAAD6","request_id":131,"extra_op":1}”] [addr=ip1:20160]
[2022/12/16 06:20:31.793 +00:00] [INFO] [region_range_lock.go:218] [“range locked”] [lockID=36] [regionID=1772637] [star
tKey=7480000000000001ff715f720000000000fa] [endKey=7480000000000001ff715f729000000000ff04b8a10000000000fa] [checkpointTs
=438087496947204097]
[2022/12/16 06:20:31.790 +00:00] [INFO] [region_range_lock.go:370] [“unlocked range”] [lockID=36] [regionID=1772637] [st
artKey=7480000000000001ff715f720000000000fa] [endKey=7480000000000001ff715f729000000000ff04b8a10000000000fa] [checkpoint
Ts=438087496947204097]
[2022/12/16 06:20:31.790 +00:00] [INFO] [client.go:1041] [“EventFeed disconnected”] [regionID=1772637] [requestID=130] [
span=“[7480000000000001ff715f720000000000fa, 7480000000000001ff715f729000000000ff04b8a10000000000fa)”] [checkpoint=43808
7496947204097] [error=“[CDC:ErrEventFeedEventError]region_not_found:<region_id:1772637 > “]
[2022/12/16 06:20:31.790 +00:00] [INFO] [client.go:1347] [“region state entry will be replaced because received message
of newer requestID”] [regionID=1772637] [oldRequestID=122] [requestID=130] [addr=ip1:20160]
[2022/12/16 06:20:31.787 +00:00] [INFO] [client.go:860] [“start new request”] [request=”{"header":{"cluster_id":6929
472922985843426,"ticdc_version":"4.0.14"},"region_id":1772637,"region_epoch":{"conf_ver":38115,"version":380
},"checkpoint_ts":438087496947204097,"start_key":"dIAAAAAAAAH/cV9yAAAAAAD6","end_key":"dIAAAAAAAAH/cV9ykAAAAAD/
BLihAAAAAAD6","request_id":130,"extra_op":1}”] [addr=ip1:20160]
[2022/12/16 06:20:31.787 +00:00] [INFO] [region_range_lock.go:218] [“range locked”] [lockID=36] [regionID=1772637] [star
tKey=7480000000000001ff715f720000000000fa] [endKey=7480000000000001ff715f729000000000ff04b8a10000000000fa] [checkpointTs
=438087496947204097]
[2022/12/16 06:20:31.787 +00:00] [INFO] [region_cache.go:829] [“switch region peer to next due to NotLeader with NULL le
ader”] [currIdx=0] [regionID=1772637]
[2022/12/16 06:20:31.787 +00:00] [INFO] [region_range_lock.go:370] [“unlocked range”] [lockID=36] [regionID=1772637] [st
artKey=7480000000000001ff715f720000000000fa] [endKey=7480000000000001ff715f729000000000ff04b8a10000000000fa] [checkpoint
Ts=438087496947204097]
[2022/12/16 06:20:31.787 +00:00] [INFO] [client.go:1041] [“EventFeed disconnected”] [regionID=1772637] [requestID=129] [
span=“[7480000000000001ff715f720000000000fa, 7480000000000001ff715f729000000000ff04b8a10000000000fa)”] [checkpoint=43808
7496947204097] [error=“[CDC:ErrEventFeedEventError]not_leader:<region_id:1772637 > “]
[2022/12/16 06:20:31.787 +00:00] [INFO] [client.go:1347] [“region state entry will be replaced because received message
of newer requestID”] [regionID=1772637] [oldRequestID=123] [requestID=129] [addr=172.23.50.156:20160]
[2022/12/16 06:20:31.785 +00:00] [INFO] [client.go:860] [“start new request”] [request=”{"header":{"cluster_id":6929
472922985843426,"ticdc_version":"4.0.14"},"region_id":1772637,"region_epoch":{"conf_ver":38115,"version":380
},"checkpoint_ts":438087496947204097,"start_key":"dIAAAAAAAAH/cV9yAAAAAAD6","end_key":"dIAAAAAAAAH/cV9ykAAAAAD/
BLihAAAAAAD6","request_id":129,"extra_op":1}”] [addr=172.23.50.156:20160]
[2022/12/16 06:20:31.784 +00:00] [INFO] [region_range_lock.go:218] [“range locked”] [lockID=36] [regionID=1772637] [star
tKey=7480000000000001ff715f720000000000fa] [endKey=7480000000000001ff715f729000000000ff04b8a10000000000fa] [checkpointTs
=438087496947204097]
[2022/12/16 06:20:31.784 +00:00] [INFO] [region_cache.go:829] [“switch region peer to next due to NotLeader with NULL le
ader”] [currIdx=2] [regionID=1772637]
[2022/12/16 06:20:31.784 +00:00] [INFO] [region_range_lock.go:370] [“unlocked range”] [lockID=36] [regionID=1772637] [st
artKey=7480000000000001ff715f720000000000fa] [endKey=7480000000000001ff715f729000000000ff04b8a10000000000fa] [checkpoint
Ts=438087496947204097]
[2022/12/16 06:20:31.784 +00:00] [INFO] [client.go:1041] [“EventFeed disconnected”] [regionID=1772637] [requestID=125] [
span=“[7480000000000001ff715f720000000000fa, 7480000000000001ff715f729000000000ff04b8a10000000000fa)”] [checkpoint=43808
7496947204097] [error=“[CDC:ErrEventFeedEventError]not_leader:<region_id:1772637 > “]
[2022/12/16 06:20:30.561 +00:00] [INFO] [lock_resolver.go:122] [“resolve lock successfully”] [regionID=1012517] [maxVers
ion=438087613208068096]
[2022/12/16 06:20:30.558 +00:00] [WARN] [client.go:1531] [“region not receiving resolved event from tikv or resolved ts
is not pushing for too long time, try to resolve lock”] [regionID=1012517] [span=”[7480000000000001ffd25f720000000000fa,
7480000000000001ffd25f730000000000fa)”] [duration=39m55.35s] [lastEvent=39m55.000141463s] [resolvedTs=43808698790287769
7]
[2022/12/16 06:20:30.557 +00:00] [WARN] [client.go:1515] [“region not receiving event from tikv for too long time”] [reg
ionID=1012517] [span=“[7480000000000001ffd25f720000000000fa, 7480000000000001ffd25f730000000000fa)”] [duration=39m55.000
141463s]
[2022/12/16 06:20:28.982 +00:00] [INFO] [statistics.go:126] [“sink replication status”] [name=MQ] [changefeed=ticdc-dept
-task] [capture=172.23.51.40:8300] [count=0] [qps=0]
[2022/12/16 06:20:28.880 +00:00] [INFO] [lock_resolver.go:122] [“resolve lock successfully”] [regionID=1772637] [maxVers
ion=438087612683780096]
[2022/12/16 06:20:28.877 +00:00] [WARN] [client.go:1531] [“region not receiving resolved event from tikv or resolved ts
is not pushing for too long time, try to resolve lock”] [regionID=1772637] [span=“[7480000000000001ff715f720000000000fa,
7480000000000001ff715f729000000000ff04b8a10000000000fa)”] [duration=7m31.5s] [lastEvent=3m18.901974009s] [resolvedTs=43
8087496947204097]
[2022/12/16 06:20:28.877 +00:00] [WARN] [client.go:1515] [“region not receiving event from tikv for too long time”] [reg
ionID=1772637] [span=“[7480000000000001ff715f720000000000fa, 7480000000000001ff715f729000000000ff04b8a10000000000fa)”] [
duration=3m18.901974009s]

[2022/12/16 07:06:05.911 +00:00] [INFO] [region_range_lock.go:218] [“range locked”] [lockID=36] [regionID=1772637] [startKey=7480000000000001ff715f720000000000fa] [endKey=7480000000000001ff715f729000000000ff04b8a10000000000fa] [checkpointTs=438088323251306497]
[2022/12/16 07:06:05.911 +00:00] [INFO] [client.go:860] [“start new request”] [request=“{"header":{"cluster_id":6929472922985843426,"ticdc_version":"4.0.14"},"region_id":1772637,"region_epoch":{"conf_ver":38135,"version":380},"checkpoint_ts":438088323251306497,"start_key":"dIAAAAAAAAH/cV9yAAAAAAD6","end_key":"dIAAAAAAAAH/cV9ykAAAAAD/BLihAAAAAAD6","request_id":198,"extra_op":1}”] [addr=172.23.50.154:20160]
[2022/12/16 07:06:05.914 +00:00] [INFO] [client.go:1347] [“region state entry will be replaced because received message of newer requestID”] [regionID=1772637] [oldRequestID=196] [requestID=198] [addr=172.23.50.154:20160]
[2022/12/16 07:06:05.914 +00:00] [INFO] [client.go:1041] [“EventFeed disconnected”] [regionID=1772637] [requestID=198] [span=“[7480000000000001ff715f720000000000fa, 7480000000000001ff715f729000000000ff04b8a10000000000fa)”] [checkpoint=438088323251306497] [error=“[CDC:ErrEventFeedEventError]not_leader:<region_id:1772637 leader:<id:2108428 store_id:6 > > “]
[2022/12/16 07:06:05.914 +00:00] [INFO] [region_range_lock.go:370] [“unlocked range”] [lockID=36] [regionID=1772637] [startKey=7480000000000001ff715f720000000000fa] [endKey=7480000000000001ff715f729000000000ff04b8a10000000000fa] [checkpointTs=438088323251306497]
[2022/12/16 07:06:05.914 +00:00] [INFO] [region_cache.go:842] [“switch region leader to specific leader due to kv return NotLeader”] [regionID=1772637] [currIdx=1] [leaderStoreID=6]
[2022/12/16 07:06:05.914 +00:00] [INFO] [region_range_lock.go:218] [“range locked”] [lockID=36] [regionID=1772637] [startKey=7480000000000001ff715f720000000000fa] [endKey=7480000000000001ff715f729000000000ff04b8a10000000000fa] [checkpointTs=438088323251306497]
[2022/12/16 07:06:05.914 +00:00] [INFO] [client.go:860] [“start new request”] [request=”{"header":{"cluster_id":6929472922985843426,"ticdc_version":"4.0.14"},"region_id":1772637,"region_epoch":{"conf_ver":38135,"version":380},"checkpoint_ts":438088323251306497,"start_key":"dIAAAAAAAAH/cV9yAAAAAAD6","end_key":"dIAAAAAAAAH/cV9ykAAAAAD/BLihAAAAAAD6","request_id":199,"extra_op":1}”] [addr=172.23.50.155:20160]
[2022/12/16 07:06:06.349 +00:00] [INFO] [client.go:1347] [“region state entry will be replaced because received message of newer requestID”] [regionID=1772637] [oldRequestID=181] [requestID=199] [addr=172.23.50.155:20160]
[2022/12/16 07:06:07.185 +00:00] [WARN] [client.go:1515] [“region not receiving event from tikv for too long time”] [regionID=633916] [span=“[7480000000000001ff715f72c000000000ff01f9ae0000000000fa, 7480000000000001ff715f730000000000fa)”] [duration=2m21.807617222s]
[2022/12/16 07:06:07.185 +00:00] [WARN] [client.go:1531] [“region not receiving resolved event from tikv or resolved ts is not pushing for too long time, try to resolve lock”] [regionID=633916] [span=“[7480000000000001ff715f72c000000000ff01f9ae0000000000fa, 7480000000000001ff715f730000000000fa)”] [duration=4m25.35s] [lastEvent=2m21.807617222s] [resolvedTs=438088263233437697]
[2022/12/16 07:06:07.187 +00:00] [INFO] [lock_resolver.go:122] [“resolve lock successfully”] [regionID=633916] [maxVersion=438088330171908096]
[2022/12/16 07:06:10.089 +00:00] [INFO] [client.go:1041] [“EventFeed disconnected”] [regionID=633916] [requestID=188] [span=“[7480000000000001ff715f72c000000000ff01f9ae0000000000fa, 7480000000000001ff715f730000000000fa)”] [checkpoint=438088263233437697] [error=“[CDC:ErrEventFeedEventError]not_leader:<region_id:633916 > “]
[2022/12/16 07:06:10.089 +00:00] [INFO] [region_range_lock.go:370] [“unlocked range”] [lockID=36] [regionID=633916] [startKey=7480000000000001ff715f72c000000000ff01f9ae0000000000fa] [endKey=7480000000000001ff715f730000000000fa] [checkpointTs=438088263233437697]
[2022/12/16 07:06:10.089 +00:00] [INFO] [region_cache.go:829] [“switch region peer to next due to NotLeader with NULL leader”] [currIdx=2] [regionID=633916]
[2022/12/16 07:06:10.089 +00:00] [INFO] [region_range_lock.go:218] [“range locked”] [lockID=36] [regionID=633916] [startKey=7480000000000001ff715f72c000000000ff01f9ae0000000000fa] [endKey=7480000000000001ff715f730000000000fa] [checkpointTs=438088263233437697]
[2022/12/16 07:06:10.090 +00:00] [INFO] [client.go:860] [“start new request”] [request=”{"header":{"cluster_id":6929472922985843426,"ticdc_version":"4.0.14"},"region_id":633916,"region_epoch":{"conf_ver":32654,"version":379},"checkpoint_ts":438088263233437697,"start_key":"dIAAAAAAAAH/cV9ywAAAAAD/AfmuAAAAAAD6","end_key":"dIAAAAAAAAH/cV9zAAAAAAD6","request_id":200,"extra_op":1}”] [addr=ip1:20160]
[2022/12/16 07:06:10.092 +00:00] [INFO] [client.go:1347] [“region state entry will be replaced because received message of newer requestID”] [regionID=633916] [oldRequestID=143] [requestID=200] [addr=ip1:20160]
[2022/12/16 07:06:10.092 +00:00] [INFO] [client.go:1041] [“EventFeed disconnected”] [regionID=633916] [requestID=200] [span=“[7480000000000001ff715f72c000000000ff01f9ae0000000000fa, 7480000000000001ff715f730000000000fa)”] [checkpoint=438088263233437697] [error=“[CDC:ErrEventFeedEventError]not_leader:<region_id:633916 > “]
[2022/12/16 07:06:10.092 +00:00] [INFO] [region_range_lock.go:370] [“unlocked range”] [lockID=36] [regionID=633916] [startKey=7480000000000001ff715f72c000000000ff01f9ae0000000000fa] [endKey=7480000000000001ff715f730000000000fa] [checkpointTs=438088263233437697]
[2022/12/16 07:06:10.092 +00:00] [INFO] [region_cache.go:829] [“switch region peer to next due to NotLeader with NULL leader”] [currIdx=3] [regionID=633916]
[2022/12/16 07:06:10.092 +00:00] [INFO] [region_range_lock.go:218] [“range locked”] [lockID=36] [regionID=633916] [startKey=7480000000000001ff715f72c000000000ff01f9ae0000000000fa] [endKey=7480000000000001ff715f730000000000fa] [checkpointTs=438088263233437697]
[2022/12/16 07:06:10.092 +00:00] [INFO] [client.go:860] [“start new request”] [request=”{"header":{"cluster_id":6929472922985843426,"ticdc_version":"4.0.14"},"region_id":633916,"region_epoch":{"conf_ver":32654,"version":379},"checkpoint_ts":438088263233437697,"start_key":"dIAAAAAAAAH/cV9ywAAAAAD/AfmuAAAAAAD6","end_key":"dIAAAAAAAAH/cV9zAAAAAAD6","request_id":201,"extra_op":1}”] [addr=172.23.50.156:20160]
[2022/12/16 07:06:10.095 +00:00] [INFO] [client.go:1347] [“region state entry will be replaced because received message of newer requestID”] [regionID=633916] [oldRequestID=176] [requestID=201] [addr=172.23.50.156:20160]
[2022/12/16 07:06:10.096 +00:00] [INFO] [client.go:1041] [“EventFeed disconnected”] [regionID=633916] [requestID=201] [span=“[7480000000000001ff715f72c000000000ff01f9ae0000000000fa, 7480000000000001ff715f730000000000fa)”] [checkpoint=438088263233437697] [error=“[CDC:ErrEventFeedEventError]not_leader:<region_id:633916 leader:<id:2108426 store_id:9 > > “]
[2022/12/16 07:06:10.096 +00:00] [INFO] [region_range_lock.go:370] [“unlocked range”] [lockID=36] [regionID=633916] [startKey=7480000000000001ff715f72c000000000ff01f9ae0000000000fa] [endKey=7480000000000001ff715f730000000000fa] [checkpointTs=438088263233437697]
[2022/12/16 07:06:10.096 +00:00] [INFO] [region_cache.go:842] [“switch region leader to specific leader due to kv return NotLeader”] [regionID=633916] [currIdx=0] [leaderStoreID=9]
[2022/12/16 07:06:10.096 +00:00] [INFO] [region_range_lock.go:218] [“range locked”] [lockID=36] [regionID=633916] [startKey=7480000000000001ff715f72c000000000ff01f9ae0000000000fa] [endKey=7480000000000001ff715f730000000000fa] [checkpointTs=438088263233437697]
[2022/12/16 07:06:10.096 +00:00] [INFO] [client.go:860] [“start new request”] [request=”{"header":{"cluster_id":6929472922985843426,"ticdc_version":"4.0.14"},"region_id":633916,"region_epoch":{"conf_ver":32654,"version":379},"checkpoint_ts":438088263233437697,"start_key":"dIAAAAAAAAH/cV9ywAAAAAD/AfmuAAAAAAD6","end_key":"dIAAAAAAAAH/cV9zAAAAAAD6","request_id":202,"extra_op":1}”] [addr=ip1:20160]
[2022/12/16 07:06:10.378 +00:00] [WARN] [client.go:1515] [“region not receiving event from tikv for too long time”] [regionID=1012586] [span=“[7480000000000001ff715f729000000000ff04b8a10000000000fa, 7480000000000001ff715f72c000000000ff01f9ae0000000000fa)”] [duration=1m45.00026666s]
[2022/12/16 07:06:10.379 +00:00] [WARN] [client.go:1531] [“region not receiving resolved event from tikv or resolved ts is not pushing for too long time, try to resolve lock”] [regionID=1012586] [span=“[7480000000000001ff715f729000000000ff04b8a10000000000fa, 7480000000000001ff715f72c000000000ff01f9ae0000000000fa)”] [duration=1m46.15s] [lastEvent=1m45.00026666s] [resolvedTs=438088306277482497]
[2022/12/16 07:06:10.380 +00:00] [INFO] [lock_resolver.go:122] [“resolve lock successfully”] [regionID=1012586] [maxVersion=438088331482628096]
[2022/12/16 07:06:10.458 +00:00] [INFO] [client.go:1347] [“region state entry will be replaced because received message of newer requestID”] [regionID=633916] [oldRequestID=200] [requestID=202] [addr=ip1:20160]
[2022/12/16 07:06:10.557 +00:00] [WARN] [client.go:1515] [“region not receiving event from tikv for too long time”] [regionID=1012517] [span=“[7480000000000001ffd25f720000000000fa, 7480000000000001ffd25f730000000000fa)”] [duration=31m18.419707105s]
[2022/12/16 07:06:10.558 +00:00] [WARN] [client.go:1531] [“region not receiving resolved event from tikv or resolved ts is not pushing for too long time, try to resolve lock”] [regionID=1012517] [span=“[7480000000000001ffd25f720000000000fa, 7480000000000001ffd25f730000000000fa)”] [duration=1h25m35.35s] [lastEvent=31m18.419707105s] [resolvedTs=438086987902877697]
[2022/12/16 07:06:10.560 +00:00] [INFO] [lock_resolver.go:122] [“resolve lock successfully”] [regionID=1012517] [maxVersion=438088331482628096]
[2022/12/16 07:06:15.378 +00:00] [WARN] [client.go:1515] [“region not receiving event from tikv for too long time”] [regionID=1012586] [span=“[7480000000000001ff715f729000000000ff04b8a10000000000fa, 7480000000000001ff715f72c000000000ff01f9ae0000000000fa)”] [duration=1m50.000243818s]
[2022/12/16 07:06:15.379 +00:00] [WARN] [client.go:1531] [“region not receiving resolved event from tikv or resolved ts is not pushing for too long time, try to resolve lock”] [regionID=1012586] [span=“[7480000000000001ff715f729000000000ff04b8a10000000000fa, 7480000000000001ff715f72c000000000ff01f9ae0000000000fa)”] [duration=1m51.15s] [lastEvent=1m50.000243818s] [resolvedTs=438088306277482497]
[2022/12/16 07:06:15.380 +00:00] [INFO] [lock_resolver.go:122] [“resolve lock successfully”] [regionID=1012586] [maxVersion=438088332793348096]
[2022/12/16 07:06:15.557 +00:00] [WARN] [client.go:1515] [“region not receiving event from tikv for too long time”] [regionID=1012517] [span=“[7480000000000001ffd25f720000000000fa, 7480000000000001ffd25f730000000000fa)”] [duration=31m23.419706366s]
[2022/12/16 07:06:15.558 +00:00] [WARN] [client.go:1531] [“region not receiving resolved event from tikv or resolved ts is not pushing for too long time, try to resolve lock”] [regionID=1012517] [span=“[7480000000000001ffd25f720000000000fa, 7480000000000001ffd25f730000000000fa)”] [duration=1h25m40.35s] [lastEvent=31m23.419706366s] [resolvedTs=438086987902877697]
[2022/12/16 07:06:15.561 +00:00] [INFO] [lock_resolver.go:122] [“resolve lock successfully”] [regionID=1012517] [maxVersion=438088332793348096]
[2022/12/16 07:06:20.378 +00:00] [WARN] [client.go:1515] [“region not receiving event from tikv for too long time”] [regionID=1012586] [span=“[7480000000000001ff715f729000000000ff04b8a10000000000fa, 7480000000000001ff715f72c000000000ff01f9ae0000000000fa)”] [duration=1m55.000258779s]
[2022/12/16 07:06:20.379 +00:00] [WARN] [client.go:1531] [“region not receiving resolved event from tikv or resolved ts is not pushing for too long time, try to resolve lock”] [regionID=1012586] [span=“[7480000000000001ff715f729000000000ff04b8a10000000000fa, 7480000000000001ff715f72c000000000ff01f9ae0000000000fa)”] [duration=1m56.15s] [lastEvent=1m55.000258779s] [resolvedTs=438088306277482497]
[2022/12/16 07:06:20.382 +00:00] [INFO] [lock_resolver.go:122] [“resolve lock successfully”] [regionID=1012586] [maxVersion=438088334104068096]
[2022/12/16 07:06:20.557 +00:00] [WARN] [client.go:1515] [“region not receiving event from tikv for too long time”] [regionID=1012517] [span=“[7480000000000001ffd25f720000000000fa, 7480000000000001ffd25f730000000000fa)”] [duration=31m28.419695846s]
[2022/12/16 07:06:20.558 +00:00] [WARN] [client.go:1531] [“region not receiving resolved event from tikv or resolved ts is not pushing for too long time, try to resolve lock”] [regionID=1012517] [span=“[7480000000000001ffd25f720000000000fa, 7480000000000001ffd25f730000000000fa)”] [duration=1h25m45.35s] [lastEvent=31m28.419695846s] [resolvedTs=438086987902877697]
[2022/12/16 07:06:20.560 +00:00] [INFO] [lock_resolver.go:122] [“resolve lock successfully”] [regionID=1012517] [maxVersion=438088334104068096]
[2022/12/16 07:06:25.378 +00:00] [WARN] [client.go:1515] [“region not receiving event from tikv for too long time”] [regionID=1012586] [span=“[7480000000000001ff715f729000000000ff04b8a10000000000fa, 7480000000000001ff715f72c000000000ff01f9ae0000000000fa)”] [duration=2m0.000184686s]
[2022/12/16 07:06:25.378 +00:00] [WARN] [client.go:1531] [“region not receiving resolved event from tikv or resolved ts is not pushing for too long time, try to resolve lock”] [regionID=1012586] [span=“[7480000000000001ff715f729000000000ff04b8a10000000000fa, 7480000000000001ff715f72c000000000ff01f9ae0000000000fa)”] [duration=2m1.15s] [lastEvent=2m0.000184686s] [resolvedTs=438088306277482497]
[2022/12/16 07:06:25.380 +00:00] [INFO] [lock_resolver.go:122] [“resolve lock successfully”] [regionID=1012586] [maxVersion=438088335414788096]
[2022/12/16 07:06:25.557 +00:00] [WARN] [client.go:1515] [“region not receiving event from tikv for too long time”] [regionID=1012517] [span=“[7480000000000001ffd25f720000000000fa, 7480000000000001ffd25f730000000000fa)”] [duration=31m33.419660729s]
[2022/12/16 07:06:25.558 +00:00] [WARN] [client.go:1531] [“region not receiving resolved event from tikv or resolved ts is not pushing for too long time, try to resolve lock”] [regionID=1012517] [span=“[7480000000000001ffd25f720000000000fa, 7480000000000001ffd25f730000000000fa)”] [duration=1h25m50.35s] [lastEvent=31m33.419660729s] [resolvedTs=438086987902877697]
[2022/12/16 07:06:25.561 +00:00] [INFO] [lock_resolver.go:122] [“resolve lock successfully”] [regionID=1012517] [maxVersion=438088335414788096]
[2022/12/16 07:06:30.378 +00:00] [WARN] [client.go:1515] [“region not receiving event from tikv for too long time”] [regionID=1012586] [span=“[7480000000000001ff715f729000000000ff04b8a10000000000fa, 7480000000000001ff715f72c000000000ff01f9ae0000000000fa)”] [duration=2m5.000319753s]
[2022/12/16 07:06:30.379 +00:00] [WARN] [client.go:1531] [“region not receiving resolved event from tikv or resolved ts is not pushing for too long time, try to resolve lock”] [regionID=1012586] [span=“[7480000000000001ff715f729000000000ff04b8a10000000000fa, 7480000000000001ff715f72c000000000ff01f9ae0000000000fa)”] [duration=2m6.15s] [lastEvent=2m5.000319753s] [resolvedTs=438088306277482497]
[2022/12/16 07:06:30.382 +00:00] [INFO] [lock_resolver.go:122] [“resolve lock successfully”] [regionID=1012586] [maxVersion=438088336725508096]
[2022/12/16 07:06:30.557 +00:00] [WARN] [client.go:1515] [“region not receiving event from tikv for too long time”] [regionID=1012517] [span=“[7480000000000001ffd25f720000000000fa, 7480000000000001ffd25f730000000000fa)”] [duration=31m38.419710231s]
[2022/12/16 07:06:30.558 +00:00] [WARN] [client.go:1531] [“region not receiving resolved event from tikv or resolved ts is not pushing for too long time, try to resolve lock”] [regionID=1012517] [span=“[7480000000000001ffd25f720000000000fa, 7480000000000001ffd25f730000000000fa)”] [duration=1h25m55.35s] [lastEvent=31m38.419710231s] [resolvedTs=438086987902877697]
[2022/12/16 07:06:30.560 +00:00] [INFO] [lock_resolver.go:122] [“resolve lock successfully”] [regionID=1012517] [maxVersion=438088336725508096]
[2022/12/16 07:06:34.193 +00:00] [INFO] [client.go:1041] [“EventFeed disconnected”] [regionID=1772637] [requestID=199] [span=“[7480000000000001ff715f720000000000fa, 7480000000000001ff715f729000000000ff04b8a10000000000fa)”] [checkpoint=438088340120272897] [error=“[CDC:ErrEventFeedEventError]not_leader:<region_id:1772637 > “]
[2022/12/16 07:06:34.194 +00:00] [INFO] [region_range_lock.go:370] [“unlocked range”] [lockID=36] [regionID=1772637] [startKey=7480000000000001ff715f720000000000fa] [endKey=7480000000000001ff715f729000000000ff04b8a10000000000fa] [checkpointTs=438088340120272897]
[2022/12/16 07:06:34.194 +00:00] [INFO] [region_cache.go:829] [“switch region peer to next due to NotLeader with NULL leader”] [currIdx=2] [regionID=1772637]
[2022/12/16 07:06:34.194 +00:00] [INFO] [region_range_lock.go:218] [“range locked”] [lockID=36] [regionID=1772637] [startKey=7480000000000001ff715f720000000000fa] [endKey=7480000000000001ff715f729000000000ff04b8a10000000000fa] [checkpointTs=438088340120272897]
[2022/12/16 07:06:34.194 +00:00] [INFO] [client.go:860] [“start new request”] [request=”{"header":{"cluster_id":6929472922985843426,"ticdc_version":"4.0.14"},"region_id":1772637,"region_epoch":{"conf_ver":38135,"version":380},"checkpoint_ts":438088340120272897,"start_key":"dIAAAAAAAAH/cV9yAAAAAAD6","end_key":"dIAAAAAAAAH/cV9ykAAAAAD/BLihAAAAAAD6","request_id":203,"extra_op":1}”] [addr=ip1:20160]
[2022/12/16 07:06:34.195 +00:00] [INFO] [client.go:1347] [“region state entry will be replaced because received message of newer requestID”] [regionID=1772637] [oldRequestID=165] [requestID=203] [addr=ip1:20160]
[2022/12/16 07:06:34.196 +00:00] [INFO] [client.go:1041] [“EventFeed disconnected”] [regionID=1772637] [requestID=203] [span=“[7480000000000001ff715f720000000000fa, 7480000000000001ff715f729000000000ff04b8a10000000000fa)”] [checkpoint=438088340120272897] [error=“[CDC:ErrEventFeedEventError]not_leader:<region_id:1772637 > “]
[2022/12/16 07:06:34.196 +00:00] [INFO] [region_range_lock.go:370] [“unlocked range”] [lockID=36] [regionID=1772637] [startKey=7480000000000001ff715f720000000000fa] [endKey=7480000000000001ff715f729000000000ff04b8a10000000000fa] [checkpointTs=438088340120272897]
[2022/12/16 07:06:34.196 +00:00] [INFO] [region_cache.go:829] [“switch region peer to next due to NotLeader with NULL leader”] [currIdx=3] [regionID=1772637]
[2022/12/16 07:06:34.196 +00:00] [INFO] [region_range_lock.go:218] [“range locked”] [lockID=36] [regionID=1772637] [startKey=7480000000000001ff715f720000000000fa] [endKey=7480000000000001ff715f729000000000ff04b8a10000000000fa] [checkpointTs=438088340120272897]
[2022/12/16 07:06:34.196 +00:00] [INFO] [client.go:860] [“start new request”] [request=”{"header":{"cluster_id":6929472922985843426,"ticdc_version":"4.0.14"},"region_id":1772637,"region_epoch":{"conf_ver":38135,"version":380},"checkpoint_ts":438088340120272897,"start_key":"dIAAAAAAAAH/cV9yAAAAAAD6","end_key":"dIAAAAAAAAH/cV9ykAAAAAD/BLihAAAAAAD6","request_id":204,"extra_op":1}”] [addr=172.23.50.156:20160]
[2022/12/16 07:06:34.198 +00:00] [INFO] [client.go:1347] [“region state entry will be replaced because received message of newer requestID”] [regionID=1772637] [oldRequestID=194] [requestID=204] [addr=172.23.50.156:20160]
[2022/12/16 07:06:34.198 +00:00] [INFO] [client.go:1041] [“EventFeed disconnected”] [regionID=1772637] [requestID=204] [span=“[7480000000000001ff715f720000000000fa, 7480000000000001ff715f729000000000ff04b8a10000000000fa)”] [checkpoint=438088340120272897] [error=“[CDC:ErrEventFeedEventError]not_leader:<region_id:1772637 leader:<id:2108430 store_id:9 > > “]
[2022/12/16 07:06:34.198 +00:00] [INFO] [region_range_lock.go:370] [“unlocked range”] [lockID=36] [regionID=1772637] [startKey=7480000000000001ff715f720000000000fa] [endKey=7480000000000001ff715f729000000000ff04b8a10000000000fa] [checkpointTs=438088340120272897]
[2022/12/16 07:06:34.198 +00:00] [INFO] [region_cache.go:842] [“switch region leader to specific leader due to kv return NotLeader”] [regionID=1772637] [currIdx=0] [leaderStoreID=9]
[2022/12/16 07:06:34.198 +00:00] [INFO] [region_range_lock.go:218] [“range locked”] [lockID=36] [regionID=1772637] [startKey=7480000000000001ff715f720000000000fa] [endKey=7480000000000001ff715f729000000000ff04b8a10000000000fa] [checkpointTs=438088340120272897]
[2022/12/16 07:06:34.198 +00:00] [INFO] [client.go:860] [“start new request”] [request=”{"header":{"cluster_id":6929472922985843426,"ticdc_version":"4.0.14"},"region_id":1772637,"region_epoch":{"conf_ver":38135,"version":380},"checkpoint_ts":438088340120272897,"start_key":"dIAAAAAAAAH/cV9yAAAAAAD6","end_key":"dIAAAAAAAAH/cV9ykAAAAAD/BLihAAAAAAD6","request_id":205,"extra_op":1}”] [addr=ip1:20160]
[2022/12/16 07:06:34.521 +00:00] [INFO] [client.go:1347] [“region state entry will be replaced because received message of newer requestID”] [regionID=1772637] [oldRequestID=203] [requestID=205] [addr=ip1:20160]
[2022/12/16 07:06:35.378 +00:00] [WARN] [client.go:1515] [“region not receiving event from tikv for too long time”] [regionID=1012586] [span=“[7480000000000001ff715f729000000000ff04b8a10000000000fa, 7480000000000001ff715f72c000000000ff01f9ae0000000000fa)”] [duration=2m10.000230906s]
[2022/12/16 07:06:35.379 +00:00] [WARN] [client.go:1531] [“region not receiving resolved event from tikv or resolved ts is not pushing for too long time, try to resolve lock”] [regionID=1012586] [span=“[7480000000000001ff715f729000000000ff04b8a10000000000fa, 7480000000000001ff715f72c000000000ff01f9ae0000000000fa)”] [duration=2m11.15s] [lastEvent=2m10.000230906s] [resolvedTs=438088306277482497]
[2022/12/16 07:06:35.382 +00:00] [INFO] [lock_resolver.go:122] [“resolve lock successfully”] [regionID=1012586] [maxVersion=438088338036228096]
[2022/12/16 07:06:35.557 +00:00] [WARN] [client.go:1515] [“region not receiving event from tikv for too long time”] [regionID=1012517] [span=“[7480000000000001ffd25f720000000000fa, 7480000000000001ffd25f730000000000fa)”] [duration=31m43.419696046s]
[2022/12/16 07:06:35.558 +00:00] [WARN] [client.go:1531] [“region not receiving resolved event from tikv or resolved ts is not pushing for too long time, try to resolve lock”] [regionID=1012517] [span=“[7480000000000001ffd25f720000000000fa, 7480000000000001ffd25f730000000000fa)”] [duration=1h26m0.35s] [lastEvent=31m43.419696046s] [resolvedTs=438086987902877697]

【资源配置】
【附件:截图/日志/监控】

你好
请按以下方法排查:

  1. 查看 TiCDC 日志有包含以下关键字告警,并且有打印具体 region id 和 resolvets 的时间,可以使用 pd-ctl tso 确认卡住的大概时间 : “The resolvedTs is fallen back in kvclient” “region not receiving resolved event from tikv or resolved ts is not pushing for too long time, try to resolve lock”

  2. 根据告警的 region id 查看所有 TiKV 日志,查看最后一次 register 和 deregister 的日志信息 grep * | grep register

grep 2256981 * | grep register

[2021/05/15 23:48:22.801 +08:00] [INFO] [endpoint.rs:484] [ "cdc register region" ] [downstream_id=DownstreamID(2020)] [req_id=87945] [conn_id=ConnID(98)] [region_id=2256981]

[2021/05/16 05:28:48.574 +08:00] [INFO] [endpoint.rs:351] [ "cdc deregister" ] [deregister= "Deregister { deregister: " region ", region_id: 2256981, observe_id: ObserveID(1887), err: Request(message: " peer is not leader for region 2256981, leader may None " not_leader { region_id: 2256981 }) }" ]

  1. 查看这个region id 最后一次 schedule resolver 的日志 grep * | grep ‘schedule resolver ready’

grep 2256981 * | grep 'schedule resolver ready'

[2021/05/15 23:48:21.777 +08:00] [INFO] [endpoint.rs:1228] [ "cdc resolver initialized and schedule resolver ready" ] [takes=176ms] [observe_id=ObserveID(1887)] [lock_count=0] [resolved_ts=0] [downstream_id=DownstreamID(1887)] [conn_id=ConnID(36)] [region_id=2256981]

  1. 如果步骤 2 最后一次 register 时间 > 步骤 3 schedule resolver 的时间,说明 TiKV 卡了

此话题已在最后回复的 60 天后被自动关闭。不再允许新回复。