ADD INDEX stuck in SCHEMA_STATE delete only

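[TiDB environment] Production / Test / PoC
Production
[TiDB version]
5.7.25-TiDB-v6.5.0
[Reproduction path] What operations led to the problem
An ordinary ADD INDEX: column type timestamp, index type normal btree
[Problem: symptoms and impact]
ADMIN SHOW DDL JOBS; shows the job
stuck in SCHEMA_STATE delete only.

I also tried ADMIN CANCEL DDL JOBS, but that hung as well; the job state ended up as cancelling.
[Resource configuration]
[Attachments: screenshots / logs / monitoring]
Corresponding TiDB log: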

[2023/01/09 13:46:49.584 +08:00] [INFO] [syncer.go:333] ["[ddl] syncer check all versions, someone is not synced"] [info="instance ip 172.18.150.111, port 4000, id 0705c04b-250f-44d8-9bd1-83a77063782b"] ["ddl id"=117697] [ver=121240]
t_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="ALTER TABLE `jinka_data`.`dy_video` \nADD INDEX `idx_update_time`(`update_time`) USING BTREE"]
[2023/01/09 13:45:07.253 +08:00] [INFO] [range_task.go:243] ["range task finished"] [name=resolve-locks-runner] [startKey=] [endKey=] ["cost time"=14.811660345s] ["completed regions"=63090]
[2023/01/09 13:45:07.253 +08:00] [INFO] [gc_worker.go:1095] ["[gc worker] finish resolve locks"] [uuid=615ea52f0dc0001] [safePoint=0] [try-resolve-locks-ts=438630558572740623] [regions=63090]
[2023/01/09 13:45:45.803 +08:00] [INFO] [domain.go:2298] ["refreshServerIDTTL succeed"] [serverID=2511748] ["lease id"=7065857a8c81f7fb]
[2023/01/09 13:45:52.439 +08:00] [INFO] [gc_worker.go:454] ["[gc worker] gc safepoint blocked by a running session"] [uuid=615ea52f0dc0001] [globalMinStartTS=438626847504990215] [globalMinStartAllowedTS=438626847504990214] [safePoint=438630495658180608]
[2023/01/09 13:45:52.442 +08:00] [INFO] [gc_worker.go:601] ["[gc worker] last safe point is later than current one.No need to gc.This might be caused by manually enlarging gc lifetime"] ["leaderTick on"=615ea52f0dc0001] ["last safe point"=2023/01/09 09:43:55.807 +08:00] ["current safe point"=2023/01/09 09:43:55.807 +08:00]
[2023/01/09 13:45:52.442 +08:00] [INFO] [gc_worker.go:1073] ["[gc worker] start resolve locks"] [uuid=615ea52f0dc0001] [safePoint=0] [try-resolve-locks-ts=438630574301380623] [concurrency=3]
[2023/01/09 13:45:52.442 +08:00] [INFO] [range_task.go:137] ["range task started"] [name=resolve-locks-runner] [startKey=] [endKey=] [concurrency=3]
[2023/01/09 13:45:59.207 +08:00] [WARN] [expensivequery.go:118] [expensive_query] [cost_time=11233.248725059s] [conn_id=5523392264086313751] [user=root] [database=jinka_data] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="ALTER TABLE `jinka_data`.`dy_video` \nADD INDEX `idx_update_time`(`update_time`) USING BTREE"]
[2023/01/09 13:46:07.161 +08:00] [INFO] [range_task.go:243] ["range task finished"] [name=resolve-locks-runner] [startKey=] [endKey=] ["cost time"=14.718393804s] ["completed regions"=63086]
[2023/01/09 13:46:07.161 +08:00] [INFO] [gc_worker.go:1095] ["[gc worker] finish resolve locks"] [uuid=615ea52f0dc0001] [safePoint=0] [try-resolve-locks-ts=438630574301380623] [regions=63086]
[2023/01/09 13:46:52.440 +08:00] [INFO] [gc_worker.go:454] ["[gc worker] gc safepoint blocked by a running session"] [uuid=615ea52f0dc0001] [globalMinStartTS=438626847504990215] [globalMinStartAllowedTS=438626847504990214] [safePoint=438630511386820608]
[2023/01/09 13:46:52.444 +08:00] [INFO] [gc_worker.go:601] ["[gc worker] last safe point is later than current one.No need to gc.This might be caused by manually enlarging gc lifetime"] ["leaderTick on"=615ea52f0dc0001] ["last safe point"=2023/01/09 09:43:55.807 +08:00] ["current safe point"=2023/01/09 09:43:55.807 +08:00]
[2023/01/09 13:46:52.444 +08:00] [INFO] [gc_worker.go:1073] ["[gc worker] start resolve locks"] [uuid=615ea52f0dc0001] [safePoint=0] [try-resolve-locks-ts=438630590030020623] [concurrency=3]
[2023/01/09 13:46:52.444 +08:00] [INFO] [range_task.go:137] ["range task started"] [name=resolve-locks-runner] [startKey=] [endKey=] [concurrency=3]
[2023/01/09 13:46:59.307 +08:00] [WARN] [expensivequery.go:118] [expensive_query] [cost_time=11293.348475744s] [conn_id=5523392264086313751] [user=root] [database=jinka_data] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="ALTER TABLE `x`.`x` \nADD INDEX `idx_update_time`(`update_time`) USING BTREE"]
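
The gc worker lines in the log above report "gc safepoint blocked by a running session" together with a `globalMinStartTS`. A minimal sketch of turning that value into wall-clock time, assuming the usual TSO layout (physical milliseconds in the high bits, an 18-bit logical counter in the low bits) and the +08:00 offset used by the log timestamps. The helper names are hypothetical, and the result only shows when the oldest session started; it does not by itself explain the stuck DDL.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: a TiDB TSO stores milliseconds since the Unix epoch in the
# high bits and an 18-bit logical counter in the low bits.
LOGICAL_BITS = 18

# Assumption: display in +08:00, matching the timestamps in the pasted log.
LOG_TZ = timezone(timedelta(hours=8))


def tso_to_datetime(tso, tz=LOG_TZ):
    """Convert a TSO to a timezone-aware datetime (the logical part is dropped)."""
    physical_ms = tso >> LOGICAL_BITS
    return datetime.fromtimestamp(physical_ms / 1000, tz=tz)


def min_start_ts(line):
    """Extract globalMinStartTS from a gc worker log line, or None if absent."""
    match = re.search(r"\[globalMinStartTS=(\d+)\]", line)
    return int(match.group(1)) if match else None


# Field copied from the gc worker line in the log above.
line = "[globalMinStartTS=438626847504990215] [safePoint=438630495658180608]"
ts = min_start_ts(line)
if ts is not None:
    print("oldest session started at", tso_to_datetime(ts).isoformat())
```

Comparing that time with the start of the ALTER TABLE statement (log time minus `cost_time`) shows whether the long-running session predates the DDL.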

Check whether the TiDB log contains the message "[ddl] wait latest schema version change(get the metadata lock if tidb_enable_lock is true)".
Reference: https://docs.pingcap.com/zh/tidb/dev/metadata-lock
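
A minimal sketch of that check, assuming the log is available as plain text. The function name and the `tidb.log` path are hypothetical, and the matching sample line is synthetic, not real TiDB output.

```python
# Message prefix quoted in the reply above.
KEYWORD = "[ddl] wait latest schema version change"


def find_mdl_waits(lines):
    """Return (line_number, text) pairs for lines that contain KEYWORD."""
    return [
        (number, text.rstrip("\n"))
        for number, text in enumerate(lines, start=1)
        if KEYWORD in text
    ]


# Real usage (the path is an assumption):
#   with open("tidb.log", encoding="utf-8", errors="replace") as f:
#       hits = find_mdl_waits(f)

# Synthetic sample: the first line is an excerpt from this thread, the
# second is made up only to demonstrate a match.
sample = [
    '[INFO] [syncer.go:333] ["[ddl] syncer check all versions, someone is not synced"]',
    "SYNTHETIC [ddl] wait latest schema version change(get the metadata lock if tidb_enable_lock is true)",
]
for number, text in find_mdl_waits(sample):
    print(number, text)
```

An empty result only means the message was not logged in the file that was searched; each TiDB instance writes its own log, so every node needs to be checked.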


No. I checked the logs from that time: they just keep repeating the ADD INDEX statement, and nothing else follows.

I don't quite follow. Do you mean the problem was caused by manually retrying the same DDL statement several times? Has it recovered now?

It recovered after the instance was restarted.
It was just an ordinary ADD INDEX, with no other operations.

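To investigate this, check whether the TiDB server node that holds the DDL owner role, and the DDL owner's working state, show any obvious errors. The logs provided so far are not enough to pin down the cause. Keep observing, and if it gets stuck again, capture a goroutine dump to help with troubleshooting. Thanks ~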
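
A rough sketch of collecting that information the next time a job hangs, assuming the TiDB status port is left at its default 10080, that the `/info` endpoint reports an `is_owner` field, and that the standard Go pprof goroutine endpoint is exposed on the same port. Please verify all three against your version; the helper names are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumption: default TiDB status port. This is not the SQL port 4000
# that appears in the log above.
DEFAULT_STATUS_PORT = 10080


def status_url(host, path, port=DEFAULT_STATUS_PORT):
    """Build a URL on the TiDB status port."""
    return f"http://{host}:{port}{path}"


def is_ddl_owner(host, port=DEFAULT_STATUS_PORT, timeout=5):
    """Ask one TiDB instance whether it currently holds the DDL owner role."""
    with urlopen(status_url(host, "/info", port), timeout=timeout) as resp:
        return bool(json.load(resp).get("is_owner"))


def goroutine_dump(host, port=DEFAULT_STATUS_PORT, timeout=30):
    """Fetch a full goroutine stack dump as text."""
    url = status_url(host, "/debug/pprof/goroutine?debug=2", port)
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Only prints the URL; the instance IP is the one reported in the log above.
print(status_url("172.18.150.111", "/debug/pprof/goroutine?debug=2"))
```

Taking the dump on the DDL owner node before restarting it matters, because a restart discards the goroutine state that would show where the job is blocked.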