生产cdc同步异常,提示CDC:ErrMySQLTxnError]Error 8021,在线等~。。

版本:5.0.3

早上出现cdc同步异常提示,Error 8021: inconsistent extra index PRIMARY, handle 14 not found in table
日志里面咩有体现库表信息,这要怎么处理?
求解。

查询官网该错误为:
当向 TiKV 读取的 key 不存在时将出现该错误,该错误用于内部使用,对外表现为读到的结果为空。

[2021/11/27 10:18:14.506 +08:00] [INFO] [client.go:868] [“start new request”] [request="{“header”:{“cluster_id”:6964287679291523214,“ticdc_version”:“5.0.3”},“region_id”:8802582,“region_epoch”:{“conf_ver”:77,“version”:18127},“checkpoint_ts”:429386496464650249,“start_key”:“dIAAAAAAAKf/K19ygAAAAAD/AhcxAAAAAAD6”,“end_key”:“dIAAAAAAAKf/K19ygAAAAAD/A7wfAAAAAAD6”,“request_id”:120049,“extra_op”:1,“Request”:null}"] [addr=172.19.16.199:20160]
[2021/11/27 10:18:14.553 +08:00] [INFO] [client.go:1355] [“region state entry will be replaced because received message of newer requestID”] [regionID=8827863] [oldRequestID=117315] [requestID=120047] [addr=172.19.16.199:20160]
[2021/11/27 10:18:14.605 +08:00] [INFO] [region_range_lock.go:370] [“unlocked range”] [lockID=4226] [regionID=9127449] [startKey=74800000000000c1ff095f720000000000fa] [endKey=74800000000000c1ff095f730000000000fa] [checkpointTs=429386495940362258]
[2021/11/27 10:18:14.605 +08:00] [INFO] [region_cache.go:958] [“switch region peer to next due to NotLeader with NULL leader”] [currIdx=1] [regionID=9127449]
[2021/11/27 10:18:14.605 +08:00] [INFO] [region_range_lock.go:218] [“range locked”] [lockID=4226] [regionID=9127449] [startKey=74800000000000c1ff095f720000000000fa] [endKey=74800000000000c1ff095f730000000000fa] [checkpointTs=429386495940362258]
[2021/11/27 10:18:14.605 +08:00] [INFO] [client.go:868] [“start new request”] [request="{“header”:{“cluster_id”:6964287679291523214,“ticdc_version”:“5.0.3”},“region_id”:9127449,“region_epoch”:{“conf_ver”:119,“version”:21489},“checkpoint_ts”:429386495940362258,“start_key”:“dIAAAAAAAMH/CV9yAAAAAAD6”,“end_key”:“dIAAAAAAAMH/CV9zAAAAAAD6”,“request_id”:120050,“extra_op”:1,“Request”:null}"] [addr=172.19.16.199:20160]
[2021/11/27 10:18:14.705 +08:00] [INFO] [region_range_lock.go:370] [“unlocked range”] [lockID=4255] [regionID=9127449] [startKey=74800000000000c1ff445f720000000000fa] [endKey=74800000000000c1ff445f728000000000ff0afd650000000000fa] [checkpointTs=429386495940362258]
[2021/11/27 10:18:14.705 +08:00] [INFO] [region_cache.go:958] [“switch region peer to next due to NotLeader with NULL leader”] [currIdx=1] [regionID=9127449]
[2021/11/27 10:18:14.705 +08:00] [INFO] [region_range_lock.go:218] [“range locked”] [lockID=4255] [regionID=9127449] [startKey=74800000000000c1ff445f720000000000fa] [endKey=74800000000000c1ff445f728000000000ff0afd650000000000fa] [checkpointTs=429386495940362258]
[2021/11/27 10:18:14.705 +08:00] [INFO] [client.go:868] [“start new request”] [request="{“header”:{“cluster_id”:6964287679291523214,“ticdc_version”:“5.0.3”},“region_id”:9127449,“region_epoch”:{“conf_ver”:119,“version”:21489},“checkpoint_ts”:429386495940362258,“start_key”:“dIAAAAAAAMH/RF9yAAAAAAD6”,“end_key”:“dIAAAAAAAMH/RF9ygAAAAAD/Cv1lAAAAAAD6”,“request_id”:120051,“extra_op”:1,“Request”:null}"] [addr=172.19.16.199:20160]
[2021/11/27 10:18:14.802 +08:00] [ERROR] [processor.go:1399] [“error on running processor”] [capture=172.19.16.189:8300] [changefeed=hk-to-idc-tidb] [processor=52dc1fd5-75a4-43dd-a095-3171dc2b3da1] [error="[CDC:ErrMySQLTxnError]Error 8021: inconsistent extra index PRIMARY, handle 14 not found in table"] [errorVerbose="[CDC:ErrMySQLTxnError]Error 8021: inconsistent extra index PRIMARY, handle 14 not found in table\ngithub.com/pingcap/errors.AddStack\ \tgithub.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/errors.go:174\ github.com/pingcap/errors.(*Error).GenWithStackByCause\ \tgithub.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/normalize.go:279\ github.com/pingcap/ticdc/pkg/errors.WrapError\ \tgithub.com/pingcap/ticdc@/pkg/errors/helper.go:28\ github.com/pingcap/ticdc/cdc/sink.(*mysqlSink).execDMLWithMaxRetries.func2.3\ \tgithub.com/pingcap/ticdc@/cdc/sink/mysql.go:905\ github.com/pingcap/ticdc/cdc/sink.(*Statistics).RecordBatchExecution\ \tgithub.com/pingcap/ticdc@/cdc/sink/statistics.go:99\ github.com/pingcap/ticdc/cdc/sink.(*mysqlSink).execDMLWithMaxRetries.func2\ \tgithub.com/pingcap/ticdc@/cdc/sink/mysql.go:893\ github.com/pingcap/ticdc/pkg/retry.Run.func1\ \tgithub.com/pingcap/ticdc@/pkg/retry/retry.go:32\ github.com/cenkalti/backoff.RetryNotify\ \tgithub.com/cenkalti/backoff@v2.2.1+incompatible/retry.go:37\ github.com/cenkalti/backoff.Retry\ \tgithub.com/cenkalti/backoff@v2.2.1+incompatible/retry.go:24\ github.com/pingcap/ticdc/pkg/retry.Run\ \tgithub.com/pingcap/ticdc@/pkg/retry/retry.go:31\ github.com/pingcap/ticdc/cdc/sink.(*mysqlSink).execDMLWithMaxRetries\ \tgithub.com/pingcap/ticdc@/cdc/sink/mysql.go:885\ github.com/pingcap/ticdc/cdc/sink.(*mysqlSink).execDMLs\ \tgithub.com/pingcap/ticdc@/cdc/sink/mysql.go:1044\ github.com/pingcap/ticdc/cdc/sink.(*mysqlSinkWorker).run.func3\ \tgithub.com/pingcap/ticdc@/cdc/sink/mysql.go:799\ github.com/pingcap/ticdc/cdc/sink.(*mysqlSinkWorker).run\ \tgithub.com/pingcap/ticdc@/cdc/sink/mysql.go:837\ github.com/pingcap/ticdc/cdc/sink.(*mysqlSink).createSinkWorkers.func1\ \tgithub.com/pingcap/ticdc@/cdc/sink/mysql.go:640\ runtime.goexit\ \truntime/asm_amd64.s:1357"]
[2021/11/27 10:18:14.905 +08:00] [INFO] [region_range_lock.go:370] [“unlocked range”] [lockID=3686] [regionID=9127449] [startKey=74800000000000c0ffd55f720000000000fa] [endKey=74800000000000c0ffd55f730000000000fa] [checkpointTs=429386495940362258]
[2021/11/27 10:18:14.905 +08:00] [INFO] [region_cache.go:958] [“switch region peer to next due to NotLeader with NULL leader”] [currIdx=1] [regionID=9127449]
[2021/11/27 10:18:14.905 +08:00] [INFO] [region_range_lock.go:218] [“range locked”] [lockID=3686] [regionID=9127449] [startKey=74800000000000c0ffd55f720000000000fa] [endKey=74800000000000c0ffd55f730000000000fa] [checkpointTs=429386495940362258]
[2021/11/27 10:18:14.905 +08:00] [INFO] [client.go:868] [“start new request”] [request="{“header”:{“cluster_id”:6964287679291523214,“ticdc_version”:“5.0.3”},“region_id”:9127449,“region_epoch”:{“conf_ver”:119,“version”:21489},“checkpoint_ts”:429386495940362258,“start_key”:“dIAAAAAAAMD/1V9yAAAAAAD6”,“end_key”:“dIAAAAAAAMD/1V9zAAAAAAD6”,“request_id”:120052,“extra_op”:1,“Request”:null}"] [addr=172.19.16.199:20160]
[2021/11/27 10:18:14.918 +08:00] [INFO] [owner.go:1065] [“handle admin job”] [changefeed=hk-to-idc-tidb] [type=“stop changefeed”]
[2021/11/27 10:18:14.963 +08:00] [INFO] [processor.go:1302] [“stop processor”] [id=52dc1fd5-75a4-43dd-a095-3171dc2b3da1] [capture=172.19.16.189:8300] [changefeed=hk-to-idc-tidb]
[2021/11/27 10:18:14.965 +08:00] [ERROR] [processor.go:1437] [“processor receives redundant error”] [error="[CDC:ErrAdminStopProcessor]stop processor by admin command"] [errorVerbose="[CDC:ErrAdminStopProcessor]stop processor by admin command\ngithub.com/pingcap/errors.AddStack\ \tgithub.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/errors.go:174\ github.com/pingcap/errors.(*Error).GenWithStackByArgs\ \tgithub.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/normalize.go:156\ github.com/pingcap/ticdc/cdc.(*oldProcessor).stop\ \tgithub.com/pingcap/ticdc@/cdc/processor.go:1321\ github.com/pingcap/ticdc/cdc.(*Capture).handleTaskEvent\ \tgithub.com/pingcap/ticdc@/cdc/capture.go:284\ github.com/pingcap/ticdc/cdc.(*Capture).Run\ \tgithub.com/pingcap/ticdc@/cdc/capture.go:212\ github.com/pingcap/ticdc/cdc.(*Server).run.func3\ \tgithub.com/pingcap/ticdc@/cdc/server.go:320\ golang.org/x/sync/errgroup.(*Group).Go.func1\ \tgolang.org/x/sync@v0.0.0-20201020160332-67f06af15bc9/errgroup/errgroup.go:57\ runtime.goexit\ \truntime/asm_amd64.s:1357"]
[2021/11/27 10:18:15.011 +08:00] [INFO] [client.go:1236] [“stream to store closed”] [addr=172.19.16.199:20160] [storeID=3248593]
[2021/11/27 10:18:15.036 +08:00] [INFO] [client.go:1236] [“stream to store closed”] [addr=172.19.16.199:20160] [storeID=3248593]
[2021/11/27 10:18:15.052 +08:00] [WARN] [mysql.go:882] [“execute DMLs with error, retry later”] [error="[CDC:ErrMySQLTxnError]sql: database is closed"] [errorVerbose="[CDC:ErrMySQLTxnError]sql: database is closed\ngithub.com/pingcap/errors.AddStack\ \tgithub.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/errors.go:174\ github.com/pingcap/errors.(*Error).GenWithStackByCause\ \tgithub.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/normalize.go:279\ github.com/pingcap/ticdc/pkg/errors.WrapError\ \tgithub.com/pingcap/ticdc@/pkg/errors/helper.go:28\ github.com/pingcap/ticdc/cdc/sink.(*mysqlSink).execDMLWithMaxRetries.func2.3\ \tgithub.com/pingcap/ticdc@/cdc/sink/mysql.go:896\ github.com/pingcap/ticdc/cdc/sink.(*Statistics).RecordBatchExecution\ \tgithub.com/pingcap/ticdc@/cdc/sink/statistics.go:99\ github.com/pingcap/ticdc/cdc/sink.(*mysqlSink).execDMLWithMaxRetries.func2\ \tgithub.com/pingcap/ticdc@/cdc/sink/mysql.go:893\ github.com/pingcap/ticdc/pkg/retry.Run.func1\ \tgithub.com/pingcap/ticdc@/pkg/retry/retry.go:32\ github.com/cenkalti/backoff.RetryNotify\ \tgithub.com/cenkalti/backoff@v2.2.1+incompatible/retry.go:37\ github.com/cenkalti/backoff.Retry\ \tgithub.com/cenkalti/backoff@v2.2.1+incompatible/retry.go:24\ github.com/pingcap/ticdc/pkg/retry.Run\ \tgithub.com/pingcap/ticdc@/pkg/retry/retry.go:31\ github.com/pingcap/ticdc/cdc/sink.(*mysqlSink).execDMLWithMaxRetries\ \tgithub.com/pingcap/ticdc@/cdc/sink/mysql.go:885\ github.com/pingcap/ticdc/cdc/sink.(*mysqlSink).execDMLs\ \tgithub.com/pingcap/ticdc@/cdc/sink/mysql.go:1044\ github.com/pingcap/ticdc/cdc/sink.(*mysqlSinkWorker).run.func3\ \tgithub.com/pingcap/ticdc@/cdc/sink/mysql.go:799\ github.com/pingcap/ticdc/cdc/sink.(*mysqlSinkWorker).run\ \tgithub.com/pingcap/ticdc@/cdc/sink/mysql.go:837\ github.com/pingcap/ticdc/cdc/sink.(*mysqlSink).createSinkWorkers.func1\ \tgithub.com/pingcap/ticdc@/cdc/sink/mysql.go:640\ runtime.goexit\ \truntime/asm_amd64.s:1357"]

1 个赞

curl [http://XXX.XXX.XXX.XXX:10080/regions/{region_id}] 可以查看很多库表,但是我如何确定报错的是哪个库?

通过 tikv 的 region 可以查到的
https://docs.pingcap.com/zh/tidb/stable/information-schema-tikv-region-status#tikv_region_status

可以试试

1 个赞

你好, 这个表里的数据和curl region_id获取到的数据一致,表的数量不少,我怎么确定那个表的索引和表数据有异常?

你用过region id 去获取表名,看看这个表的主键是什么?

日志上描述的是主键冲突呢

Error 8021: inconsistent extra index PRIMARY, handle 14 not found in table

然后确认下是否真有主键上的数据冲突,可以选择跳过,这个部分可以参考CDC 的运维操作

https://docs.pingcap.com/zh/tidb/stable/troubleshoot-ticdc#如何判断-ticdc-同步任务出现中断

1 个赞

不好意思哈,我用region_id 查出来的表有50张左右,我怎么判断是哪张表?

上图看看

“checkpoint_ts”:429386495940362258,“start_key”:“dIAAAAAAAMH/CV9yAAAAAAD6”,“end_key”:“dIAAAAAAAMH/CV9zAAAAAAD6”,

追加 start 和 end 进行过滤呢?

然后可以考虑通过 解码 key,找到对应的行记录

你参考下这个:
https://docs.pingcap.com/zh/tidb/stable/tidb-functions#tidb_decode_key

后面的只能靠你自己排查了…

有新的进度,可以更新帖子,我会关注的

1 个赞

使用start end 过滤没有相应的内容,。 前面我是用增量覆盖掉的, 可以正常同步一段时间, 后面还是会出现这个问题,没办法了, 我重头做次全量把。
给个建议, cdc同步失败,直接把同步的库表打印出来到message里面,也方便定位,好几次都i找不到库表靠猜,也不知道是不是我姿势不对

嗯,你的需求我会反馈给产研的大佬,看看后续有没有改进的计划,如果有计划回头我再来更新吧

这个确实很难受,不是你姿势不对(新的东西总会有迭代的一个过程,感谢你的反馈)

查一下报错的 changefeed 有没有 table id 信息
cdc cli processor query --pd=http://pd-ip:2379 --changefeed-id=hk-to-idc-tidb --capture-id=xxx

感谢,我今天删掉重新同步了,如果后续有碰到 我再看看,希望用不上:joy:

Error 8021 看起来是下游 TiDB 返回的报错,意思是向 TiKV 读取的 key 不存在,这个报错并不常见,可能下游集群因为一些 bug 导致事务写入的部分 Key 数据丢失。

可以对下游所有可能出问题的表进行 admin check table,确认数据索引的一致性;

如果有上游数据,可以通过某些手段如跟上游数据进行比对(sync-diff-inspector)修复数据;如果上下游数据一致,仅有数据索引不一致,可以使用 admin recover index 来修复索引数据

1 个赞

出现这个问题的时候 我已经将上下游 region_id 里面涉及的所有表 check过一次 都正常, 主要就是无法确认是哪个表,或者可以通过tso来确认事务涉及的表?

这个问题好了吗,最后怎么处理的啊

最后重新同步,没招了

哦哦哦,重新同步就好了是吧嘛,关于表和索引数据不一致的问题,记得 PE 课程,专门有一个课程做了相关介绍。(asktug 中应该也有)

这个还有当时所有组件的日志吗?想要进一步分析一下。

此话题已在最后回复的 1 分钟后被自动关闭。不再允许新回复。