ADD INDEX 使用索引加速功能卡住

【 TiDB 使用环境】测试
【 TiDB 版本】7.1.5
【复现路径】暂时没有复现
【遇到的问题:业务执行加索引操作卡住,表很小(10w)】
【资源配置】


【附件】
业务添加唯一索引,表大小只有 10w,ddl 卡住,执行了几个小时仍然没有完成
在关闭 tidb_ddl_enable_fast_reorg 索引加速功能后,正常执行,卡住时日志如下

报了大量如下的日志,持续数小时

[2024/12/10 18:02:37.417 +08:00] [INFO] [dispatcher.go:231] ["check task, subtasks aren't finished"] ["task ID"=7] [cnt=3]
[2024/12/10 18:02:37.917 +08:00] [INFO] [dispatcher.go:231] ["check task, subtasks aren't finished"] ["task ID"=7] [cnt=3]
[2024/12/10 18:02:38.418 +08:00] [INFO] [dispatcher.go:231] ["check task, subtasks aren't finished"] ["task ID"=7] [cnt=3]
[2024/12/10 18:02:38.917 +08:00] [INFO] [dispatcher.go:231] ["check task, subtasks aren't finished"] ["task ID"=7] [cnt=3]
[2024/12/10 18:02:39.417 +08:00] [INFO] [dispatcher.go:231] ["check task, subtasks aren't finished"] ["task ID"=7] [cnt=3]
[2024/12/10 18:02:39.917 +08:00] [INFO] [dispatcher.go:231] ["check task, subtasks aren't finished"] ["task ID"=7] [cnt=3]
[2024/12/10 18:02:40.418 +08:00] [INFO] [dispatcher.go:231] ["check task, subtasks aren't finished"] ["task ID"=7] [cnt=3]
[2024/12/10 18:02:40.917 +08:00] [INFO] [dispatcher.go:231] ["check task, subtasks aren't finished"] ["task ID"=7] [cnt=3]
[2024/12/10 18:02:41.417 +08:00] [INFO] [dispatcher.go:231] ["check task, subtasks aren't finished"] ["task ID"=7] [cnt=3]
[2024/12/10 18:02:41.916 +08:00] [INFO] [dispatcher.go:231] ["check task, subtasks aren't finished"] ["task ID"=7] [cnt=3]
[2024/12/10 18:02:42.417 +08:00] [INFO] [dispatcher.go:231] ["check task, subtasks aren't finished"] ["task ID"=7] [cnt=3]
[2024/12/10 18:02:42.765 +08:00] [INFO] [reorg.go:310] ["[ddl] run reorg job wait timeout"] ["wait time"=5s] ["total added row count"=1147]
[2024/12/10 18:02:42.768 +08:00] [INFO] [ddl_worker.go:1201] ["[ddl] schema version doesn't change"]
[2024/12/10 18:02:42.778 +08:00] [INFO] [ddl_worker.go:997] ["[ddl] run DDL job"] [worker="worker 2, tp add index"] [job="ID:777342, Type:add index, State:running, SchemaState:write reorganization, SchemaID:53, TableID:79, RowCount:1147, ArgLen:0, start time: 2024-12-10 18:02:14.762 +0800 HKT, Err:<nil>, ErrCount:0, SnapshotVersion:454511803776892948, UniqueWarnings:0"]
[2024/12/10 18:02:42.781 +08:00] [INFO] [index.go:852] ["[ddl] index backfill state running"] ["job ID"=777342] [table=usage] ["ingest mode"=true] [index=idx_usage_2]
[2024/12/10 18:02:42.918 +08:00] [INFO] [dispatcher.go:231] ["check task, subtasks aren't finished"] ["task ID"=7] [cnt=3]
[2024/12/10 18:02:43.418 +08:00] [INFO] [dispatcher.go:231] ["check task, subtasks aren't finished"] ["task ID"=7] [cnt=3]
[2024/12/10 18:02:43.917 +08:00] [INFO] [dispatcher.go:231] ["check task, subtasks aren't finished"] ["task ID"=7] [cnt=3]
[2024/12/10 18:02:44.418 +08:00] [INFO] [dispatcher.go:231] ["check task, subtasks aren't finished"] ["task ID"=7] [cnt=3]
[2024/12/10 18:02:44.918 +08:00] [INFO] [dispatcher.go:231] ["check task, subtasks aren't finished"] ["task ID"=7] [cnt=3]
[2024/12/10 18:02:45.417 +08:00] [INFO] [dispatcher.go:231] ["check task, subtasks aren't finished"] ["task ID"=7] [cnt=3]
[2024/12/10 18:02:45.917 +08:00] [INFO] [dispatcher.go:231] ["check task, subtasks aren't finished"] ["task ID"=7] [cnt=3]
[2024/12/10 18:02:46.417 +08:00] [INFO] [dispatcher.go:231] ["check task, subtasks aren't finished"] ["task ID"=7] [cnt=3]
[2024/12/10 18:02:46.918 +08:00] [INFO] [dispatcher.go:231] ["check task, subtasks aren't finished"] ["task ID"=7] [cnt=3]
[2024/12/10 18:02:47.417 +08:00] [INFO] [dispatcher.go:231] ["check task, subtasks aren't finished"] ["task ID"=7] [cnt=3]
[2024/12/10 18:02:47.918 +08:00] [INFO] [dispatcher.go:231] ["check task, subtasks aren't finished"] ["task ID"=7] [cnt=3]
[2024/12/10 18:02:48.160 +08:00] [INFO] [reorg.go:310] ["[ddl] run reorg job wait timeout"] ["wait time"=5s] ["total added row count"=1147]
[2024/12/10 18:02:48.162 +08:00] [INFO] [ddl_worker.go:1201] ["[ddl] schema version doesn't change"]
[2024/12/10 18:02:48.171 +08:00] [INFO] [ddl_worker.go:997] ["[ddl] run DDL job"] [worker="worker 2, tp add index"] [job="ID:777342, Type:add index, State:running, SchemaState:write reorganization, SchemaID:53, TableID:79, RowCount:1147, ArgLen:0, start time: 2024-12-10 18:02:14.762 +0800 HKT, Err:<nil>, ErrCount:0, SnapshotVersion:454511803776892948, UniqueWarnings:0"]
[2024/12/10 18:02:48.173 +08:00] [INFO] [index.go:852] ["[ddl] index backfill state running"] ["job ID"=777342] [table=usage] ["ingest mode"=true] [index=idx_usage_2]
[2024/12/10 18:02:48.416 +08:00] [INFO] [dispatcher.go:231] ["check task, subtasks aren't finished"] ["task ID"=7] [cnt=3]
[2024/12/10 18:02:48.918 +08:00] [INFO] [dispatcher.go:231] ["check task, subtasks aren't finished"] ["task ID"=7] [cnt=3]
[2024/12/10 18:02:49.418 +08:00] [INFO] [dispatcher.go:231] ["check task, subtasks aren't finished"] ["task ID"=7] [cnt=3]
[2024/12/10 18:02:49.917 +08:00] [INFO] [dispatcher.go:231] ["check task, subtasks aren't finished"] ["task ID"=7] [cnt=3]
[2024/12/10 18:02:50.418 +08:00] [INFO] [dispatcher.go:231] ["check task, subtasks aren't finished"] ["task ID"=7] [cnt=3]
[2024/12/10 18:02:50.917 +08:00] [INFO] [dispatcher.go:231] ["check task, subtasks aren't finished"] ["task ID"=7] [cnt=3]
[2024/12/10 18:02:51.426 +08:00] [INFO] [dispatcher.go:231] ["check task, subtasks aren't finished"] ["task ID"=7] [cnt=3]
[2024/12/10 18:02:51.917 +08:00] [INFO] [dispatcher.go:231] ["check task, subtasks aren't finished"] ["task ID"=7] [cnt=3]
[2024/12/10 18:02:52.418 +08:00] [INFO] [dispatcher.go:231] ["check task, subtasks aren't finished"] ["task ID"=7] [cnt=3]
[2024/12/10 18:02:52.918 +08:00] [INFO] [dispatcher.go:231] ["check task, subtasks aren't finished"] ["task ID"=7] [cnt=3]
[2024/12/10 18:02:53.418 +08:00] [INFO] [dispatcher.go:231] ["check task, subtasks aren't finished"] ["task ID"=7] [cnt=3]
[2024/12/10 18:02:53.631 +08:00] [INFO] [reorg.go:310] ["[ddl] run reorg job wait timeout"] ["wait time"=5s] ["total added row count"=1147]
[2024/12/10 18:02:53.633 +08:00] [INFO] [ddl_worker.go:1201] ["[ddl] schema version doesn't change"]
[2024/12/10 18:02:53.641 +08:00] [INFO] [ddl_worker.go:997] ["[ddl] run DDL job"] [worker="worker 2, tp add index"] [job="ID:777342, Type:add index, State:running, SchemaState:write reorganization, SchemaID:53, TableID:79, RowCount:1147, ArgLen:0, start time: 2024-12-10 18:02:14.762 +0800 HKT, Err:<nil>, ErrCount:0, SnapshotVersion:454511803776892948, UniqueWarnings:0"]
[2024/12/10 18:02:53.643 +08:00] [INFO] [index.go:852] ["[ddl] index backfill state running"] ["job ID"=777342] [table=usage] ["ingest mode"=true] [index=idx_usage_2]
[2024/12/10 18:02:53.917 +08:00] [INFO] [dispatcher.go:231] ["check task, subtasks aren't finished"] ["task ID"=7] [cnt=3]
[2024/12/10 18:02:54.417 +08:00] [INFO] [dispatcher.go:231] ["check task, subtasks aren't finished"] ["task ID"=7] [cnt=3]
[2024/12/10 18:02:54.918 +08:00] [INFO] [dispatcher.go:231] ["check task, subtasks aren't finished"] ["task ID"=7] [cnt=3]
[2024/12/10 18:02:55.417 +08:00] [INFO] [dispatcher.go:231] ["check task, subtasks aren't finished"] ["task ID"=7] [cnt=3]
[2024/12/10 18:02:55.916 +08:00] [INFO] [dispatcher.go:231] ["check task, subtasks aren't finished"] ["task ID"=7] [cnt=3]
[2024/12/10 18:02:56.418 +08:00] [INFO] [dispatcher.go:231] ["check task, subtasks aren't finished"] ["task ID"=7] [cnt=3]
[2024/12/10 18:02:56.918 +08:00] [INFO] [dispatcher.go:231] ["check task, subtasks aren't finished"] ["task ID"=7] [cnt=3]
[2024/12/10 18:02:57.417 +08:00] [INFO] [dispatcher.go:231] ["check task, subtasks aren't finished"] ["task ID"=7] [cnt=3]
[2024/12/10 18:02:57.919 +08:00] [INFO] [dispatcher.go:231] ["check task, subtasks aren't finished"] ["task ID"=7] [cnt=3]

看了下其他大佬类似的问题后,检查了 /tmp 路径的权限,排除了权限不足

发下dll leader tidb节点 12/10 18:02 之后的日志看看,你的日志是2024/10/09的

目录属组不对,是用tidb用户部署的吗,我看属组是dumbo,改成tidb再试试呢
chown -R tidb.tidb tmp_ddl-3306/

确认过不是 /tmp 权限的问题,我们 tidb 进程用的 dumbo 用户跑的

更新了下日志,现在的时间对的了

长事务卡住了???

执行的时候,是不是有大事物?

应该是有锁了。

1 个赞

admin show locks查查看

1 个赞

看下所有进程的状态

是不是和元数据锁有关。高版本默认打开了

是3306 还是4000 ?你安装tidb 把端口手动改成mysql默认端口了?

可以读下 专栏 - 天下武功唯快不破:TiDB 在线 DDL 性能提升 10 倍 | TiDB 社区

元数据锁?