v7.1: DDL appears to be stuck. Is this a bug?

[TiDB environment] Production
[TiDB version] v7.1.5
[Operating system]

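[Steps to reproduce] What operations led to the problem
A small table with only 54 rows; a routine ADD COLUMN DDL.
After quite a long time the job is still in the running state.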

The TiDB log keeps printing the following:

[2025/03/31 11:22:24.346 +08:00] [INFO] [syncer.go:362] ["[ddl] syncer check all versions, someone is not synced"] [info="instance ip 192.168.241.85, port 4000, id 7aa68da4-4a9b-464d-8ac8-79eed371e10f"] ["ddl job id"=2444] [ver=3328]
[2025/03/31 11:22:24.369 +08:00] [INFO] [syncer.go:362] ["[ddl] syncer check all versions, someone is not synced"] [info="instance ip 192.168.241.85, port 4000, id 7aa68da4-4a9b-464d-8ac8-79eed371e10f"] ["ddl job id"=2444] [ver=3328]
[2025/03/31 11:22:24.385 +08:00] [INFO] [coprocessor.go:1288] ["[TIME_COP_PROCESS] resp_time:523.545958ms txnStartTS:18446744073709551615 region_id:7993926 store_addr:192.168.241.76:20160 kv_process_ms:489 kv_wait_ms:0 kv_read_ms:0 processed_versions:15661 total_versions:15662 rocksdb_delete_skipped_count:296 rocksdb_key_skipped_count:31617 rocksdb_cache_hit_count:22 rocksdb_read_count:620 rocksdb_read_byte:9433977"]
[2025/03/31 11:22:24.385 +08:00] [INFO] [region_request.go:1467] ["throwing pseudo region error due to no replica available"] [req-ts=18446744073709551615] [req-type=Cop] [region="{ region id: 7993005, ver: 155490, confVer: 375821 }"] [region-is-valid=unknown] [retry-times=0] [replica-read-type=leader] [replica-selector-state=nil] [stale-read=false] [replica-status=] [total-backoff-ms=0] [total-backoff-times=0] [total-region-errors=]
[2025/03/31 11:22:24.391 +08:00] [INFO] [syncer.go:362] ["[ddl] syncer check all versions, someone is not synced"] [info="instance ip 192.168.241.85, port 4000, id 7aa68da4-4a9b-464d-8ac8-79eed371e10f"] ["ddl job id"=2444] [ver=3328]
[2025/03/31 11:22:24.414 +08:00] [INFO] [syncer.go:362] ["[ddl] syncer check all versions, someone is not synced"] [info="instance ip 192.168.241.85, port 4000, id 7aa68da4-4a9b-464d-8ac8-79eed371e10f"] ["ddl job id"=2444] [ver=3328]
[2025/03/31 11:22:24.436 +08:00] [INFO] [syncer.go:362] ["[ddl] syncer check all versions, someone is not synced"] [info="instance ip 192.168.241.85, port 4000, id 7aa68da4-4a9b-464d-8ac8-79eed371e10f"] ["ddl job id"=2444] [ver=3328]
[2025/03/31 11:22:24.459 +08:00] [INFO] [syncer.go:362] ["[ddl] syncer check all versions, someone is not synced"] [info="instance ip 192.168.241.85, port 4000, id 7aa68da4-4a9b-464d-8ac8-79eed371e10f"] ["ddl job id"=2444] [ver=3328]
[2025/03/31 11:22:24.482 +08:00] [INFO] [syncer.go:362] ["[ddl] syncer check all versions, someone is not synced"] [info="instance ip 192.168.241.85, port 4000, id 7aa68da4-4a9b-464d-8ac8-79eed371e10f"] ["ddl job id"=2444] [ver=3328]
[2025/03/31 11:22:24.505 +08:00] [INFO] [syncer.go:362] ["[ddl] syncer check all versions, someone is not synced"] [info="instance ip 192.168.241.85, port 4000, id 7aa68da4-4a9b-464d-8ac8-79eed371e10f"] ["ddl job id"=2444] [ver=3328]
[2025/03/31 11:22:24.527 +08:00] [INFO] [syncer.go:362] ["[ddl] syncer check all versions, someone is not synced"] [info="instance ip 192.168.241.85, port 4000, id 7aa68da4-4a9b-464d-8ac8-79eed371e10f"] ["ddl job id"=2444] [ver=3328]
[2025/03/31 11:22:24.550 +08:00] [INFO] [syncer.go:362] ["[ddl] syncer check all versions, someone is not synced"] [info="instance ip 192.168.241.85, port 4000, id 7aa68da4-4a9b-464d-8ac8-79eed371e10f"] ["ddl job id"=2444] [ver=3328]
[2025/03/31 11:22:24.573 +08:00] [INFO] [syncer.go:362] ["[ddl] syncer check all versions, someone is not synced"] [info="instance ip 192.168.241.85, port 4000, id 7aa68da4-4a9b-464d-8ac8-79eed371e10f"] ["ddl job id"=2444] [ver=3328]
[2025/03/31 11:22:24.595 +08:00] [INFO] [syncer.go:362] ["[ddl] syncer check all versions, someone is not synced"] [info="instance ip 192.168.241.85, port 4000, id 7aa68da4-4a9b-464d-8ac8-79eed371e10f"] ["ddl job id"=2444] [ver=3328]
[2025/03/31 11:22:24.617 +08:00] [INFO] [syncer.go:362] ["[ddl] syncer check all versions, someone is not synced"] [info="instance ip 192.168.241.85, port 4000, id 7aa68da4-4a9b-464d-8ac8-79eed371e10f"] ["ddl job id"=2444] [ver=3328]
[2025/03/31 11:22:24.639 +08:00] [INFO] [syncer.go:362] ["[ddl] syncer check all versions, someone is not synced"] [info="instance ip 192.168.241.85, port 4000, id 7aa68da4-4a9b-464d-8ac8-79eed371e10f"] ["ddl job id"=2444] [ver=3328]
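
When the log is flooded like this, it can help to collapse the repeated lines into a summary of which instance is reported as unsynced for which job. Below is a minimal sketch that only parses the message format shown above; the function name `summarize_unsynced` and the regular expression are my own illustration, not part of TiDB.

```python
import re
from collections import Counter

# Matches the "someone is not synced" message exactly as it appears in the log above.
UNSYNCED = re.compile(
    r'someone is not synced"\] '
    r'\[info="instance ip (?P<ip>[\d.]+), port (?P<port>\d+), id (?P<id>[0-9a-f-]+)"\] '
    r'\["ddl job id"=(?P<job>\d+)\] \[ver=(?P<ver>\d+)\]'
)

def summarize_unsynced(lines):
    """Count how often each (ip, port, job id, schema version) is reported as unsynced."""
    counts = Counter()
    for line in lines:
        m = UNSYNCED.search(line)
        if m:
            counts[(m["ip"], int(m["port"]), int(m["job"]), int(m["ver"]))] += 1
    return counts

# Two lines copied from the log above, plus one unrelated line that must be ignored.
sample = [
    '[2025/03/31 11:22:24.346 +08:00] [INFO] [syncer.go:362] ["[ddl] syncer check all versions, someone is not synced"] [info="instance ip 192.168.241.85, port 4000, id 7aa68da4-4a9b-464d-8ac8-79eed371e10f"] ["ddl job id"=2444] [ver=3328]',
    '[2025/03/31 11:22:24.369 +08:00] [INFO] [syncer.go:362] ["[ddl] syncer check all versions, someone is not synced"] [info="instance ip 192.168.241.85, port 4000, id 7aa68da4-4a9b-464d-8ac8-79eed371e10f"] ["ddl job id"=2444] [ver=3328]',
    '[2025/03/31 11:22:24.385 +08:00] [INFO] [coprocessor.go:1288] ["[TIME_COP_PROCESS] ..."]',
]

for key, n in summarize_unsynced(sample).items():
    print(key, n)
```

In the excerpt above every syncer line names the same instance (192.168.241.85:4000), job 2444, and version 3328, so that TiDB node is the one to look at first.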

After admin cancel, the job stays in the cancelling state.

I found a post about a similar problem, but it was on v6 and did not end with a working solution.

The error looks the same, and an issue was filed for it as well.

That doesn't look like the same problem.

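Is fast DDL enabled? Also run admin show ddl jobs and check how many rows have been processed. If fast DDL is on and there are far more versions of the data than expected, this kind of thing can happen.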

row_count stays at 0

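Then it is probably this problem. Turn fast DDL off first so that the rollback goes faster, then run explain analyze select count(1) from table_name and check whether total_keys is far larger than the real number of keys.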
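
To make "far larger" concrete, here is a small sketch that pulls `total_keys` out of the execution info text and divides it by the real row count. The sample string is hypothetical and only imitates the `scan_detail` part of the explain analyze output; `key_scan_ratio` is a name I made up for illustration.

```python
import re

def key_scan_ratio(execution_info, actual_rows):
    """Return total_keys / actual_rows, or None if total_keys is missing or actual_rows is not positive."""
    m = re.search(r"\btotal_keys:\s*(\d+)", execution_info)
    if m is None or actual_rows <= 0:
        return None
    return int(m.group(1)) / actual_rows

# Hypothetical execution info; the numbers are illustrative, not taken from this cluster.
sample_info = "scan_detail: {total_process_keys: 54, total_process_keys_size: 4320, total_keys: 5400}"

print(key_scan_ratio(sample_info, 54))
```

A ratio close to 1 would mean the scan touches roughly one key per live row; a ratio in the hundreds or more would suggest many old versions are being skipped on every scan.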

This table has only 54 rows in total, so I don't think it's a question of fast or slow?

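With only a few dozen rows this really shouldn't happen. If it is not the problem I described and restarting TiDB doesn't help either, then the developers will have to take a look.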

I haven't restarted yet; I wanted to locate the problem first.
If I do restart, do all TiDB nodes need to be restarted?

Look for connections with unclosed transactions. Kill them and it should be fine.
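
One possible way to find such connections is sketched below. It assumes the `INFORMATION_SCHEMA.CLUSTER_TIDB_TRX` system table with the columns named in the query; run the SQL from whatever client you use, then pass the rows to the helper. The helper `long_running`, the 10-minute threshold, and the sample rows are all hypothetical.

```python
from datetime import datetime, timedelta

# Assumed query against the CLUSTER_TIDB_TRX system table; run it from any SQL client.
LONG_TRX_SQL = """
SELECT INSTANCE, SESSION_ID, START_TIME, STATE
FROM INFORMATION_SCHEMA.CLUSTER_TIDB_TRX
ORDER BY START_TIME
"""

def long_running(rows, now, threshold=timedelta(minutes=10)):
    """Return (instance, session_id) pairs for transactions that started more than `threshold` ago."""
    return [
        (r["INSTANCE"], r["SESSION_ID"])
        for r in rows
        if now - r["START_TIME"] > threshold
    ]

# Hypothetical rows shaped like the query result above.
now = datetime(2025, 3, 31, 11, 22, 24)
sample_rows = [
    {"INSTANCE": "tidb-a:10080", "SESSION_ID": 1, "START_TIME": datetime(2025, 3, 31, 9, 0, 0), "STATE": "Idle"},
    {"INSTANCE": "tidb-b:10080", "SESSION_ID": 2, "START_TIME": datetime(2025, 3, 31, 11, 20, 0), "STATE": "Running"},
]

print(long_running(sample_rows, now))
```

The instance in each pair tells you which tidb-server holds the session, which matters when deciding where to issue the kill.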

No luck, it can't be killed.
Neither kill id; nor kill tidb id; works.

An unclosed long transaction like this will block the DDL. In that case the only option is to restart the TiDB node.

I killed it, and it just started over again.

I ran into a similar problem. Neither cancelling the job nor killing the session worked; in the end restarting the tidb server fixed it.

Just checked: it is still in the cancelling state, and the load on TiDB has gone up. Not sure whether that is related.

The only thing left is to restart and see.

Restarting only the TiDB node that is currently running the DDL is enough, right? We have 5 tidb-server nodes.

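Could there be a long transaction that is preventing the fast DDL from completing? Check for that. If nothing else works, you will have to restart the tidb server node.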

Doesn't the log say that this node has not synced the schema? Check whether this node is loading the schema normally.

Are there any related logs?

admin show ddl jobs shows this as the only running job. The table holds very little data, so in theory the job should finish quickly,
yet ROW_COUNT stays at 0.