ticdc异常报错

【 TiDB 使用环境】生产环境 /测试/ Poc
k8s部署的测试环境
【 TiDB 版本】
5.4.1
【复现路径】
将tidb的增量数据同步到mysql
【遇到的问题:问题现象及影响】
由于该tidb中数据表比较多(28万张表)开启ticdc同步后一直卡着不动,并且无法执行其他的操作

同步任务:
curl -X POST -H “‘Content-type’:‘application/json’” http://192.160.139.56:8301/api/v1/changefeeds -d ‘{“force_replicate”: true, “start_ts”:439384345192955912, “changefeed_id”:“wps-mysql”,“sink_uri”:“mysql://admin:dsflafvas8d@x.x.x.x:x.x.x./?time-zone=&worker-count=16&max-txn-row=256”}’

查询同步信息:
[root@vm10-32-1-164 ticdc]# curl -X GET http://192.160.139.56:8301/api/v1/changefeeds?state=all
{
“error_msg”: “context deadline exceeded”,
“error_code”: “”

【资源配置】
tidb、pd、tikv、ticdc:3副本、8核32G
【附件:截图/日志/监控】
ticdc日志:
[2023/02/13 13:08:36.060 +08:00] [INFO] [owner.go:280] [“add a table”] [tableId=2152242] [boundaryTs=439384345192955912]
[2023/02/13 13:08:36.071 +08:00] [INFO] [owner.go:280] [“add a table”] [tableId=2476130] [boundaryTs=439384345192955912]
[2023/02/13 13:08:36.083 +08:00] [INFO] [owner.go:280] [“add a table”] [tableId=2342500] [boundaryTs=439384345192955912]
[2023/02/13 13:08:36.095 +08:00] [INFO] [owner.go:280] [“add a table”] [tableId=2446730] [boundaryTs=439384345192955912]
[2023/02/13 13:08:36.107 +08:00] [INFO] [owner.go:280] [“add a table”] [tableId=2290403] [boundaryTs=439384345192955912]
[2023/02/13 13:08:36.113 +08:00] [WARN] [client.go:272] [“etcd client outCh blocking too long, the etcdWorker may be stuck”] [duration=42.000288667s] [role=owner]
[2023/02/13 13:08:36.119 +08:00] [INFO] [owner.go:280] [“add a table”] [tableId=2161778] [boundaryTs=439384345192955912]
[2023/02/13 13:08:36.131 +08:00] [INFO] [owner.go:280] [“add a table”] [tableId=2505686] [boundaryTs=439384345192955912]
[2023/02/13 13:08:36.142 +08:00] [INFO] [owner.go:280] [“add a table”] [tableId=2138490] [boundaryTs=439384345192955912]
[2023/02/13 13:08:36.154 +08:00] [INFO] [owner.go:280] [“add a table”] [tableId=2245614] [boundaryTs=439384345192955912]
[2023/02/13 13:08:36.166 +08:00] [INFO] [owner.go:280] [“add a table”] [tableId=2064686] [boundaryTs=439384345192955912]
[2023/02/13 13:08:36.178 +08:00] [INFO] [owner.go:280] [“add a table”] [tableId=2543234] [boundaryTs=439384345192955912]
[2023/02/13 13:08:36.189 +08:00] [INFO] [owner.go:280] [“add a table”] [tableId=2090784] [boundaryTs=439384345192955912]
[2023/02/13 13:08:36.201 +08:00] [INFO] [owner.go:280] [“add a table”] [tableId=2034058] [boundaryTs=439384345192955912]
[2023/02/13 13:08:36.216 +08:00] [INFO] [owner.go:280] [“add a table”] [tableId=2342768] [boundaryTs=439384345192955912]

ticdc监控:

https://github.com/pingcap/tiflow/issues/4987

有点像这个问题~

这个问题已经在 6.0 以后的版本被修复了。测试环境你可以尝试下新版本 6.5.0

1、上游tidb有28w个表,这个确实有点多,是在同一个库 还是分布在多个库?可否尝试分多个changefeed?
2、报错日志明显是PD-Server卡死不动,pd集群是怎么部署的?etcd对io的要求比较高。
3、可以看看etcd的负载情况,哪些到了瓶颈?

1、28万张表,分布在3000个库中,没个库大约100张表,目前计划分多个changefeed进行处理,但ticdc卡住不动了,操作不了无法删除当前的changefeed

2、pd集群在k8s中部署的,3副本,4核、内存8G,磁盘100G,磁盘都是ssd,pd最近24小时的监控图如下:

当前pd的leader日志如下:
[2023/02/14 09:00:57.713 +08:00] [INFO] [operator_controller.go:635] [“send schedule command”] [region-id=1202584] [step=“transfer leader from store 1 to store 5”] [source=create]
[2023/02/14 09:00:57.714 +08:00] [INFO] [region.go:543] [“leader changed”] [region-id=1202584] [from=1] [to=5]
[2023/02/14 09:00:57.715 +08:00] [INFO] [operator_controller.go:552] [“operator finish”] [region-id=1202584] [takes=1.684627ms] [operator=“"transfer-hot-read-leader {transfer leader: store 1 to 5} (kind:hot-region,leader, region:1202584(86,17), createAt:2023-02-14 09:00:57.712943653 +0800 CST m=+941027.276799437, startAt:2023-02-14 09:00:57.713414336 +0800 CST m=+941027.277270130, currentStep:1, steps:[transfer leader from store 1 to store 5]) finished"”] [additional-info=]
[2023/02/14 09:00:59.114 +08:00] [INFO] [operator_controller.go:437] [“add operator”] [region-id=1030033] [operator=“"balance-leader {transfer leader: store 5 to 1} (kind:leader, region:1030033(81,17), createAt:2023-02-14 09:00:59.114682035 +0800 CST m=+941028.678537814, startAt:0001-01-01 00:00:00 +0000 UTC, currentStep:0, steps:[transfer leader from store 5 to store 1])"”] [additional-info=“{"sourceScore":"236.00","targetScore":"234.00"}”]
[2023/02/14 09:00:59.114 +08:00] [INFO] [operator_controller.go:635] [“send schedule command”] [region-id=1030033] [step=“transfer leader from store 5 to store 1”] [source=create]
[2023/02/14 09:00:59.116 +08:00] [INFO] [region.go:543] [“leader changed”] [region-id=1030033] [from=5] [to=1]
[2023/02/14 09:00:59.116 +08:00] [INFO] [operator_controller.go:552] [“operator finish”] [region-id=1030033] [takes=1.435858ms] [operator=“"balance-leader {transfer leader: store 5 to 1} (kind:leader, region:1030033(81,17), createAt:2023-02-14 09:00:59.114682035 +0800 CST m=+941028.678537814, startAt:2023-02-14 09:00:59.114890233 +0800 CST m=+941028.678746021, currentStep:1, steps:[transfer leader from store 5 to store 1]) finished"”] [additional-info=“{"sourceScore":"236.00","targetScore":"234.00"}”]
[2023/02/14 09:01:37.726 +08:00] [INFO] [operator_controller.go:437] [“add operator”] [region-id=1202596] [operator=“"transfer-hot-read-leader {transfer leader: store 1 to 4} (kind:hot-region,leader, region:1202596(88,17), createAt:2023-02-14 09:01:37.726820516 +0800 CST m=+941067.290676272, startAt:0001-01-01 00:00:00 +0000 UTC, currentStep:0, steps:[transfer leader from store 1 to store 4])"”] [additional-info=]
[2023/02/14 09:01:37.727 +08:00] [INFO] [operator_controller.go:635] [“send schedule command”] [region-id=1202596] [step=“transfer leader from store 1 to store 4”] [source=create]
[2023/02/14 09:01:37.729 +08:00] [INFO] [region.go:543] [“leader changed”] [region-id=1202596] [from=1] [to=4]
[2023/02/14 09:01:37.729 +08:00] [INFO] [operator_controller.go:552] [“operator finish”] [region-id=1202596] [takes=2.227342ms] [operator=“"transfer-hot-read-leader {transfer leader: store 1 to 4} (kind:hot-region,leader, region:1202596(88,17), createAt:2023-02-14 09:01:37.726820516 +0800 CST m=+941067.290676272, startAt:2023-02-14 09:01:37.727068308 +0800 CST m=+941067.290924071, currentStep:1, steps:[transfer leader from store 1 to store 4]) finished"”] [additional-info=]
[2023/02/14 09:01:42.193 +08:00] [INFO] [operator_controller.go:437] [“add operator”] [region-id=1201876] [operator=“"balance-leader {transfer leader: store 5 to 1} (kind:leader, region:1201876(107,17), createAt:2023-02-14 09:01:42.193314649 +0800 CST m=+941071.757170398, startAt:0001-01-01 00:00:00 +0000 UTC, currentStep:0, steps:[transfer leader from store 5 to store 1])"”] [additional-info=“{"sourceScore":"235.00","targetScore":"234.00"}”]
[2023/02/14 09:01:42.193 +08:00] [INFO] [operator_controller.go:635] [“send schedule command”] [region-id=1201876] [step=“transfer leader from store 5 to store 1”] [source=create]
[2023/02/14 09:01:42.194 +08:00] [INFO] [region.go:543] [“leader changed”] [region-id=1201876] [from=5] [to=1]
[2023/02/14 09:01:42.195 +08:00] [INFO] [operator_controller.go:552] [“operator finish”] [region-id=1201876] [takes=1.671202ms] [operator=“"balance-leader {transfer leader: store 5 to 1} (kind:leader, region:1201876(107,17), createAt:2023-02-14 09:01:42.193314649 +0800 CST m=+941071.757170398, startAt:2023-02-14 09:01:42.193461837 +0800 CST m=+941071.757317621, currentStep:1, steps:[transfer leader from store 5 to store 1]) finished"”] [additional-info=“{"sourceScore":"235.00","targetScore":"234.00"}”]
[2023/02/14 09:01:54.413 +08:00] [INFO] [operator_controller.go:437] [“add operator”] [region-id=1202580] [operator=“"transfer-hot-write-leader {transfer leader: store 5 to 1} (kind:hot-region,leader, region:1202580(87,17), createAt:2023-02-14 09:01:54.413581571 +0800 CST m=+941083.977437355, startAt:0001-01-01 00:00:00 +0000 UTC, currentStep:0, steps:[transfer leader from store 5 to store 1])"”] [additional-info=]
[2023/02/14 09:01:54.414 +08:00] [INFO] [operator_controller.go:635] [“send schedule command”] [region-id=1202580] [step=“transfer leader from store 5 to store 1”] [source=create]
[2023/02/14 09:01:54.415 +08:00] [INFO] [region.go:543] [“leader changed”] [region-id=1202580] [from=5] [to=1]
[2023/02/14 09:01:54.415 +08:00] [INFO] [operator_controller.go:552] [“operator finish”] [region-id=1202580] [takes=1.654517ms] [operator=“"transfer-hot-write-leader {transfer leader: store 5 to 1} (kind:hot-region,leader, region:1202580(87,17), createAt:2023-02-14 09:01:54.413581571 +0800 CST m=+941083.977437355, startAt:2023-02-14 09:01:54.414144201 +0800 CST m=+941083.977999990, currentStep:1, steps:[transfer leader from store 5 to store 1]) finished"”] [additional-info=]
[2023/02/14 09:01:55.714 +08:00] [INFO] [operator_controller.go:437] [“add operator”] [region-id=1202620] [operator=“"transfer-hot-write-leader {transfer leader: store 5 to 4} (kind:hot-region,leader, region:1202620(88,17), createAt:2023-02-14 09:01:55.714092001 +0800 CST m=+941085.277947783, startAt:0001-01-01 00:00:00 +0000 UTC, currentStep:0, steps:[transfer leader from store 5 to store 4])"”] [additional-info=]
[2023/02/14 09:01:55.714 +08:00] [INFO] [operator_controller.go:635] [“send schedule command”] [region-id=1202620] [step=“transfer leader from store 5 to store 4”] [source=create]
[2023/02/14 09:01:55.716 +08:00] [INFO] [region.go:543] [“leader changed”] [region-id=1202620] [from=5] [to=4]
[2023/02/14 09:01:55.716 +08:00] [INFO] [operator_controller.go:552] [“operator finish”] [region-id=1202620] [takes=2.082143ms] [operator=“"transfer-hot-write-leader {transfer leader: store 5 to 4} (kind:hot-region,leader, region:1202620(88,17), createAt:2023-02-14 09:01:55.714092001 +0800 CST m=+941085.277947783, startAt:2023-02-14 09:01:55.714406114 +0800 CST m=+941085.278261902, currentStep:1, steps:[transfer leader from store 5 to store 4]) finished"”] [additional-info=]
[2023/02/14 09:04:54.814 +08:00] [INFO] [operator_controller.go:437] [“add operator”] [region-id=1202612] [operator=“"transfer-hot-write-leader {transfer leader: store 5 to 1} (kind:hot-region,leader, region:1202612(87,17), createAt:2023-02-14 09:04:54.813887497 +0800 CST m=+941264.377743282, startAt:0001-01-01 00:00:00 +0000 UTC, currentStep:0, steps:[transfer leader from store 5 to store 1])"”] [additional-info=]
[2023/02/14 09:04:54.814 +08:00] [INFO] [operator_controller.go:635] [“send schedule command”] [region-id=1202612] [step=“transfer leader from store 5 to store 1”] [source=create]
[2023/02/14 09:04:54.815 +08:00] [INFO] [region.go:543] [“leader changed”] [region-id=1202612] [from=5] [to=1]
[2023/02/14 09:04:54.816 +08:00] [INFO] [operator_controller.go:552] [“operator finish”] [region-id=1202612] [takes=1.80506ms] [operator=“"transfer-hot-write-leader {transfer leader: store 5 to 1} (kind:hot-region,leader, region:1202612(87,17), createAt:2023-02-14 09:04:54.813887497 +0800 CST m=+941264.377743282, startAt:2023-02-14 09:04:54.814324868 +0800 CST m=+941264.378180654, currentStep:1, steps:[transfer leader from store 5 to store 1]) finished"”] [additional-info=]
[2023/02/14 09:04:56.114 +08:00] [INFO] [operator_controller.go:437] [“add operator”] [region-id=1202624] [operator=“"transfer-hot-read-leader {transfer leader: store 5 to 4} (kind:hot-region,leader, region:1202624(88,17), createAt:2023-02-14 09:04:56.114255348 +0800 CST m=+941265.678111133, startAt:0001-01-01 00:00:00 +0000 UTC, currentStep:0, steps:[transfer leader from store 5 to store 4])"”] [additional-info=]
[2023/02/14 09:04:56.114 +08:00] [INFO] [operator_controller.go:635] [“send schedule command”] [region-id=1202624] [step=“transfer leader from store 5 to store 4”] [source=create]
[2023/02/14 09:04:56.116 +08:00] [INFO] [region.go:543] [“leader changed”] [region-id=1202624] [from=5] [to=4]
[2023/02/14 09:04:56.116 +08:00] [INFO] [operator_controller.go:552] [“operator finish”] [region-id=1202624] [takes=2.007555ms] [operator=“"transfer-hot-read-leader {transfer leader: store 5 to 4} (kind:hot-region,leader, region:1202624(88,17), createAt:2023-02-14 09:04:56.114255348 +0800 CST m=+941265.678111133, startAt:2023-02-14 09:04:56.114821758 +0800 CST m=+941265.678677547, currentStep:1, steps:[transfer leader from store 5 to store 4]) finished"”] [additional-info=]
[2023/02/14 09:07:55.136 +08:00] [INFO] [operator_controller.go:437] [“add operator”] [region-id=1202576] [operator=“"transfer-hot-read-leader {transfer leader: store 5 to 4} (kind:hot-region,leader, region:1202576(88,17), createAt:2023-02-14 09:07:55.13624988 +0800 CST m=+941444.700105662, startAt:0001-01-01 00:00:00 +0000 UTC, currentStep:0, steps:[transfer leader from store 5 to store 4])"”] [additional-info=]
[2023/02/14 09:07:55.136 +08:00] [INFO] [operator_controller.go:635] [“send schedule command”] [region-id=1202576] [step=“transfer leader from store 5 to store 4”] [source=create]
[2023/02/14 09:07:55.138 +08:00] [INFO] [region.go:543] [“leader changed”] [region-id=1202576] [from=5] [to=4]
[2023/02/14 09:07:55.139 +08:00] [INFO] [operator_controller.go:552] [“operator finish”] [region-id=1202576] [takes=2.390309ms] [operator=“"transfer-hot-read-leader {transfer leader: store 5 to 4} (kind:hot-region,leader, region:1202576(88,17), createAt:2023-02-14 09:07:55.13624988 +0800 CST m=+941444.700105662, startAt:2023-02-14 09:07:55.136876198 +0800 CST m=+941444.700731999, currentStep:1, steps:[transfer leader from store 5 to store 4]) finished"”] [additional-info=]
[2023/02/14 09:07:56.515 +08:00] [INFO] [operator_controller.go:437] [“add operator”] [region-id=1202564] [operator=“"transfer-hot-write-leader {transfer leader: store 5 to 1} (kind:hot-region,leader, region:1202564(87,17), createAt:2023-02-14 09:07:56.514716191 +0800 CST m=+941446.078571973, startAt:0001-01-01 00:00:00 +0000 UTC, currentStep:0, steps:[transfer leader from store 5 to store 1])"”] [additional-info=]
[2023/02/14 09:07:56.515 +08:00] [INFO] [operator_controller.go:635] [“send schedule command”] [region-id=1202564] [step=“transfer leader from store 5 to store 1”] [source=create]
[2023/02/14 09:07:56.517 +08:00] [INFO] [region.go:543] [“leader changed”] [region-id=1202564] [from=5] [to=1]
[2023/02/14 09:07:56.517 +08:00] [INFO] [operator_controller.go:552] [“operator finish”] [region-id=1202564] [takes=1.917522ms] [operator=“"transfer-hot-write-leader {transfer leader: store 5 to 1} (kind:hot-region,leader, region:1202564(87,17), createAt:2023-02-14 09:07:56.514716191 +0800 CST m=+941446.078571973, startAt:2023-02-14 09:07:56.515752957 +0800 CST m=+941446.079608752, currentStep:1, steps:[transfer leader from store 5 to store 1]) finished"”] [additional-info=]
[2023/02/14 09:15:10.515 +08:00] [INFO] [trace.go:145] [“trace[1611567014] put”] [detail=“{key:/tidb/server/minstartts/fc5f7000-6d43-4523-8270-905245b1c2c2; req_size:92; response_revision:7487222; }”] [duration=102.369641ms] [start=2023/02/14 09:15:10.413 +08:00] [end=2023/02/14 09:15:10.515 +08:00] [steps=“["trace[1611567014] ‘process raft request’ (duration: 102.007675ms)"]”]
[2023/02/14 09:15:10.714 +08:00] [WARN] [util.go:144] [“apply request took too long”] [took=102.086683ms] [expected-duration=100ms] [prefix=] [request=“header:<ID:6749072942850525650 > txn:<compare:<target:VALUE key:"/pd/7194634827558957893/leader" value_size:180 > success:<request_put:<key:"/pd/7194634827558957893/timestamp" value_size:8 >> failure:<>>”] [response=size:20] []
[2023/02/14 09:15:10.714 +08:00] [WARN] [tso.go:314] [“clock offset”] [jet-lag=202.134206ms] [prev-physical=2023/02/14 09:15:10.512 +08:00] [now=2023/02/14 09:15:10.714 +08:00] [update-physical-interval=50ms]
[2023/02/14 09:16:12.714 +08:00] [WARN] [stat_persist.go:20] [trace] [elapsed=196.816581ms] [rows=1] [sql=“INSERT INTO keyviz_axis (layer_num,time,axis) VALUES (0,"2023-02-14 09:16:12.513","")”]
[2023/02/14 09:29:38.015 +08:00] [WARN] [tso.go:314] [“clock offset”] [jet-lag=203.701776ms] [prev-physical=2023/02/14 09:29:37.811 +08:00] [now=2023/02/14 09:29:38.015 +08:00] [update-physical-interval=50ms]
[2023/02/14 09:37:21.249 +08:00] [INFO] [periodic.go:135] [“starting auto periodic compaction”] [revision=7485358] [compact-period=1h0m0s]
[2023/02/14 09:37:21.250 +08:00] [INFO] [index.go:189] [“compact tree index”] [revision=7485358]
[2023/02/14 09:37:21.250 +08:00] [INFO] [periodic.go:146] [“completed auto periodic compaction”] [revision=7485358] [compact-period=1h0m0s] [took=1h0m0.00881159s]
[2023/02/14 09:37:21.280 +08:00] [INFO] [kvstore_compaction.go:55] [“finished scheduled compaction”] [compact-revision=7485358] [took=29.434239ms]
[2023/02/14 09:44:08.512 +08:00] [WARN] [tso.go:314] [“clock offset”] [jet-lag=194.761498ms] [prev-physical=2023/02/14 09:44:08.317 +08:00] [now=2023/02/14 09:44:08.512 +08:00] [update-physical-interval=50ms]
[2023/02/14 09:45:10.111 +08:00] [WARN] [tso.go:314] [“clock offset”] [jet-lag=195.61578ms] [prev-physical=2023/02/14 09:45:09.916 +08:00] [now=2023/02/14 09:45:10.111 +08:00] [update-physical-interval=50ms]
[2023/02/14 09:45:10.512 +08:00] [INFO] [trace.go:145] [“trace[1405525310] put”] [detail=“{key:/tidb/server/minstartts/fc5f7000-6d43-4523-8270-905245b1c2c2; req_size:92; response_revision:7488498; }”] [duration=100.270714ms] [start=2023/02/14 09:45:10.412 +08:00] [end=2023/02/14 09:45:10.512 +08:00] [steps=“["trace[1405525310] ‘process raft request’ (duration: 99.569087ms)"]”]
[2023/02/14 09:48:16.612 +08:00] [WARN] [tso.go:314] [“clock offset”] [jet-lag=196.066851ms] [prev-physical=2023/02/14 09:48:16.415 +08:00] [now=2023/02/14 09:48:16.612 +08:00] [update-physical-interval=50ms]

[2023/02/14 10:11:01.114 +08:00] [WARN] [client.go:272] [“etcd client outCh blocking too long, the etcdWorker may be stuck”] [duration=21h3m7.000269012s] [role=owner]
[2023/02/14 10:11:02.114 +08:00] [WARN] [client.go:272] [“etcd client outCh blocking too long, the etcdWorker may be stuck”] [duration=21h3m8.000694212s] [role=owner]
[2023/02/14 10:11:03.113 +08:00] [WARN] [client.go:272] [“etcd client outCh blocking too long, the etcdWorker may be stuck”] [duration=21h3m8.999812767s] [role=owner]
[2023/02/14 10:11:03.465 +08:00] [INFO] [owner.go:280] [“add a table”] [tableId=2214488] [boundaryTs=439384345192955912]
[2023/02/14 10:11:04.113 +08:00] [WARN] [client.go:272] [“etcd client outCh blocking too long, the etcdWorker may be stuck”] [duration=21h3m10.000167683s] [role=owner]
[2023/02/14 10:11:04.153 +08:00] [INFO] [owner.go:280] [“add a table”] [tableId=2552138] [boundaryTs=439384345192955912]
[2023/02/14 10:11:04.657 +08:00] [INFO] [owner.go:280] [“add a table”] [tableId=2175188] [boundaryTs=439384345192955912]
[2023/02/14 10:11:05.113 +08:00] [WARN] [client.go:272] [“etcd client outCh blocking too long, the etcdWorker may be stuck”] [duration=21h3m11.000290323s] [role=owner]
[2023/02/14 10:11:05.290 +08:00] [INFO] [owner.go:280] [“add a table”] [tableId=2177820] [boundaryTs=439384345192955912]
[2023/02/14 10:11:05.930 +08:00] [INFO] [owner.go:280] [“add a table”] [tableId=2532038] [boundaryTs=439384345192955912]
[2023/02/14 10:11:06.113 +08:00] [WARN] [client.go:272] [“etcd client outCh blocking too long, the etcdWorker may be stuck”] [duration=21h3m12.000238722s] [role=owner]
[2023/02/14 10:11:06.473 +08:00] [INFO] [owner.go:280] [“add a table”] [tableId=2291716] [boundaryTs=439384345192955912]
[2023/02/14 10:11:07.062 +08:00] [INFO] [owner.go:280] [“add a table”] [tableId=2362074] [boundaryTs=439384345192955912]
[2023/02/14 10:11:07.113 +08:00] [WARN] [client.go:272] [“etcd client outCh blocking too long, the etcdWorker may be stuck”] [duration=21h3m12.999825308s] [role=owner]
[2023/02/14 10:11:07.772 +08:00] [INFO] [owner.go:280] [“add a table”] [tableId=2504164] [boundaryTs=439384345192955912]
[2023/02/14 10:11:08.113 +08:00] [WARN] [client.go:272] [“etcd client outCh blocking too long, the etcdWorker may be stuck”] [duration=21h3m13.999992284s] [role=owner]
[2023/02/14 10:11:08.279 +08:00] [INFO] [owner.go:280] [“add a table”] [tableId=2100966] [boundaryTs=439384345192955912]
[2023/02/14 10:11:08.785 +08:00] [INFO] [owner.go:280] [“add a table”] [tableId=2080592] [boundaryTs=439384345192955912]
[2023/02/14 10:11:09.113 +08:00] [WARN] [client.go:272] [“etcd client outCh blocking too long, the etcdWorker may be stuck”] [duration=21h3m14.999773675s] [role=owner]
[2023/02/14 10:11:09.296 +08:00] [INFO] [owner.go:280] [“add a table”] [tableId=2520294] [boundaryTs=439384345192955912]
[2023/02/14 10:11:09.800 +08:00] [INFO] [owner.go:280] [“add a table”] [tableId=2310130] [boundaryTs=439384345192955912]
[2023/02/14 10:11:10.113 +08:00] [WARN] [client.go:272] [“etcd client outCh blocking too long, the etcdWorker may be stuck”] [duration=21h3m15.999846184s] [role=owner]
[2023/02/14 10:11:10.304 +08:00] [INFO] [owner.go:280] [“add a table”] [tableId=2293200] [boundaryTs=439384345192955912]
[2023/02/14 10:11:10.809 +08:00] [INFO] [owner.go:280] [“add a table”] [tableId=2278806] [boundaryTs=439384345192955912]
[2023/02/14 10:11:11.113 +08:00] [WARN] [client.go:272] [“etcd client outCh blocking too long, the etcdWorker may be stuck”] [duration=21h3m16.999849849s] [role=owner]
[2023/02/14 10:11:11.330 +08:00] [INFO] [owner.go:280] [“add a table”] [tableId=2382734] [boundaryTs=439384345192955912]
[2023/02/14 10:11:11.837 +08:00] [INFO] [owner.go:280] [“add a table”] [tableId=2353290] [boundaryTs=439384345192955912]
[2023/02/14 10:11:12.113 +08:00] [WARN] [client.go:272] [“etcd client outCh blocking too long, the etcdWorker may be stuck”] [duration=21h3m17.999979099s] [role=owner]
[2023/02/14 10:11:12.343 +08:00] [INFO] [owner.go:280] [“add a table”] [tableId=2544294] [boundaryTs=439384345192955912]
[2023/02/14 10:11:12.849 +08:00] [INFO] [owner.go:280] [“add a table”] [tableId=2323784] [boundaryTs=439384345192955912]
[2023/02/14 10:11:13.113 +08:00] [WARN] [client.go:272] [“etcd client outCh blocking too long, the etcdWorker may be stuck”] [duration=21h3m19.000151791s] [role=owner]
[2023/02/14 10:11:13.354 +08:00] [INFO] [owner.go:280] [“add a table”] [tableId=2291294] [boundaryTs=439384345192955912]
[2023/02/14 10:11:13.857 +08:00] [INFO] [owner.go:280] [“add a table”] [tableId=2295454] [boundaryTs=439384345192955912]
[2023/02/14 10:11:14.113 +08:00] [WARN] [client.go:272] [“etcd client outCh blocking too long, the etcdWorker may be stuck”] [duration=21h3m20.000309385s] [role=owner]
[2023/02/14 10:11:14.364 +08:00] [INFO] [owner.go:280] [“add a table”] [tableId=2541532] [boundaryTs=439384345192955912]
[2023/02/14 10:11:14.870 +08:00] [INFO] [owner.go:280] [“add a table”] [tableId=2413333] [boundaryTs=439384345192955912]
[2023/02/14 10:11:15.113 +08:00] [WARN] [client.go:272] [“etcd client outCh blocking too long, the etcdWorker may be stuck”] [duration=21h3m21.000312337s] [role=owner]
[2023/02/14 10:11:15.376 +08:00] [INFO] [owner.go:280] [“add a table”] [tableId=2311964] [boundaryTs=439384345192955912]
[2023/02/14 10:11:15.881 +08:00] [INFO] [owner.go:280] [“add a table”] [tableId=2053422] [boundaryTs=439384345192955912]
[2023/02/14 10:11:16.113 +08:00] [WARN] [client.go:272] [“etcd client outCh blocking too long, the etcdWorker may be stuck”] [duration=21h3m21.999652355s] [role=owner]
[2023/02/14 10:11:16.390 +08:00] [INFO] [owner.go:280] [“add a table”] [tableId=2296884] [boundaryTs=439384345192955912]
[2023/02/14 10:11:16.899 +08:00] [INFO] [owner.go:280] [“add a table”] [tableId=2322164] [boundaryTs=439384345192955912]

以上内容是ticdc的日志,

1、“add a table”对应的是否是schedule的Add Table流程
2、怎么查看add table的进度
3、“etcd client outCh blocking too long, the etcdWorker may be stuck”]这个报错是否出发了这个bughttps://github.com/pingcap/tiflow/issues/4987


看issue是这样的

目前 TiCDC 仅支持同步 10 万表内的 TiDB 集群(包含最新的 6.5 版本)。可以在 changefeed 的配置中添加 [filter.rules] 过滤不必要的表,以减少同步表个数。

https://docs.pingcap.com/zh/tidb/stable/ticdc-changefeed-config

该参数已经配置了,当前每个changefeed同步大约3000张表,配置如下

{
“info”: {
“sink-uri”: “mysql://admin:ghj8d@10…180:21268/?time-zone=\u0026worker-count=16\u0026max-txn-row=5000”,
“opts”: {
“_changefeed_id”: “sink-verify”
},
“create-time”: “2023-02-14T17:42:29.961246216+08:00”,
“start-ts”: 439447996443131906,
“target-ts”: 0,
“admin-job-type”: 0,
“sort-engine”: “unified”,
“sort-dir”: “”,
“config”: {
“case-sensitive”: true,
“enable-old-value”: true,
“force-replicate”: true,
“check-gc-safe-point”: true,
“filter”: {
“rules”: [
“cwbaseshow1025.",
"cwbaseshow1026.
”,
“cwbaseshow1027.",
"cwbaseshow1028.
”,
“cwbaseshow1029.",
"cwbaseshow103.
”,
“cwbaseshow1030.",
"cwbaseshow1031.
”,
“cwbaseshow1032.",
"cwbaseshow1033.
”,
“cwbaseshow1034.",
"cwbaseshow1035.
”,
“cwbaseshow1036.",
"cwbaseshow1037.
”,
“cwbaseshow1038.",
"cwbaseshow1039.
”,
“cwbaseshow104.",
"cwbaseshow1040.
”,
“cwbaseshow1041.",
"cwbaseshow1042.
”,
“cwbaseshow1043.",
"cwbaseshow1044.
”,
“cwbaseshow1045.",
"cwbaseshow1046.
”,
“cwbaseshow1047.",
"cwbaseshow1048.
”,
“cwbaseshow1049.",
"cwbaseshow105.
”,
“cwbaseshow1050.",
"cwbaseshow1051.

],
“ignore-txn-start-ts”: null
},
“mounter”: {
“worker-num”: 16
},
“sink”: {
“dispatchers”: null,
“protocol”: “”,
“column-selectors”: null
},
“cyclic-replication”: {
“enable”: false,
“replica-id”: 0,
“filter-replica-ids”: null,
“id-buckets”: 0,
“sync-ddl”: false
},
“scheduler”: {
“type”: “table-number”,
“polling-time”: -1
},
“consistent”: {
“level”: “none”,
“max-log-size”: 64,
“flush-interval”: 1000,
“storage”: “”
}
},
“state”: “normal”,
“error”: null,
“sync-point-enabled”: false,
“sync-point-interval”: 600000000000,
“creator-version”: “v5.4.1”
},
“status”: {
“resolved-ts”: 439450174251597830,
“checkpoint-ts”: 439450173989191687,
“admin-job-type”: 0
},
“count”: 0,
“task-status”: [
{
“capture-id”: “40618b2a-e18b-47e2-958c-20be07da05a8”,
“status”: {
“tables”: {
“2007506”: {
“start-ts”: 439447996443131906,
“mark-table-id”: 0
},
“2007756”: {
“start-ts”: 439447996443131906,
“mark-table-id”: 0
},
“2007912”: {
“start-ts”: 439447996443131906,
“mark-table-id”: 0
},
“2008074”: {
“start-ts”: 439447996443131906,
“mark-table-id”: 0
},
“2008122”: {
“start-ts”: 439447996443131906,
“mark-table-id”: 0
},

changefeed创建非常慢,大约20min才能创建成功,ticdc日志中有大量的如下提示:

[2023/02/14 17:46:14.634 +08:00] [WARN] [client.go:272] [“etcd client outCh blocking too long, the etcdWorker may be stuck”] [duration=12.997394111s] [role=processor]
[2023/02/14 17:46:15.635 +08:00] [WARN] [client.go:272] [“etcd client outCh blocking too long, the etcdWorker may be stuck”] [duration=13.99796724s] [role=processor]
[2023/02/14 17:57:16.831 +08:00] [WARN] [client.go:272] [“etcd client outCh blocking too long, the etcdWorker may be stuck”] [duration=3m9.997947845s] [role=processor]
[2023/02/14 17:57:17.831 +08:00] [WARN] [client.go:272] [“etcd client outCh blocking too long, the etcdWorker may be stuck”] [duration=3m10.997595524s] [role=processor]
[2023/02/14 18:01:04.293 +08:00] [WARN] [client.go:272] [“etcd client outCh blocking too long, the etcdWorker may be stuck”] [duration=10.58260022s] [role=owner]
[2023/02/14 18:01:05.293 +08:00] [WARN] [client.go:272] [“etcd client outCh blocking too long, the etcdWorker may be stuck”] [duration=11.582521251s] [role=owner]

该类日志说明 TiCDC 已经接近同步极限,v5 的 TiCDC 同步的表数有限,建议使用 v6.1 LTS 版本,会稳定很多。

此话题已在最后回复的 60 天后被自动关闭。不再允许新回复。