DDL卡主

【TiDB 使用环境】/测试/
【TiDB 版本】
DDL卡主

执行
CREATE INDEX i_hse_check_record_item2 ON ehswork.hse_check_record_item (record_id, check_item_id, create_time);

mysql> admin show ddl\G
*************************** 1. row ***************************
SCHEMA_VER: 22288
OWNER_ID: ac05eb3d-aae8-4066-b8d2-c1f46cf78139
OWNER_ADDRESS: 10.60.169.123:4000
RUNNING_JOBS: ID:21320, Type:add index, State:running, SchemaState:write reorganization, SchemaID:1790, TableID:2218, RowCount:0, ArgLen:0, start time: 2025-05-19 15:47:25.287 +0800 CST, Err:, ErrCount:0, SnapshotVersion:458133561877987332, Version: v2, UniqueWarnings:0
SELF_ID: 50853b70-51f4-4fa0-8243-1fa890a58e90
QUERY: – 20250102
CREATE INDEX i_hse_check_record_item2 ON ehswork.hse_check_record_item (record_id, check_item_id, create_time)
1 row in set (0.02 sec)

mysql> show processlist;
±-----------±----------±--------------------±--------±--------±-------±-----------±-----------------------------------------------------------------------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
±-----------±----------±--------------------±--------±--------±-------±-----------±-----------------------------------------------------------------------------------------------------+
| 2093150116 | ehswork | 10.60.169.103:42246 | ehswork | Sleep | 2535 | autocommit | NULL |
| 2093150396 | epd | 10.60.178.128:58016 | EPD | Sleep | 400 | autocommit | NULL |
| 2093150402 | ehswork | 10.60.169.103:42486 | ehswork | Sleep | 371 | autocommit | NULL |
| 2093150390 | developer | 10.60.165.137:54736 | zhigate | Sleep | 414 | autocommit | NULL |
| 2093150300 | ssx | 10.60.148.120:55030 | ssx | Sleep | 1613 | autocommit | NULL |
| 2093150314 | ssx | 10.60.148.120:55180 | ssx | Sleep | 1353 | autocommit | NULL |
| 2093150290 | ssx | 10.60.148.120:54934 | ssx | Sleep | 2 | autocommit | NULL |
| 2093150352 | ssx | 10.60.148.120:55520 | ssx | Sleep | 781 | autocommit | NULL |
| 2093150388 | developer | 10.60.165.137:54734 | zhigate | Sleep | 414 | autocommit | NULL |
| 2093150442 | ehswork | 10.60.169.103:42518 | ehswork | Sleep | 14 | autocommit | NULL |
| 2093150410 | ehswork | 10.60.169.103:42494 | ehswork | Sleep | 294 | autocommit | NULL |
| 2093150364 | ssx | 10.60.148.120:55590 | ssx | Sleep | 672 | autocommit | NULL |
| 2093150432 | root | 10.60.143.92:48446 | NULL | Sleep | 17 | autocommit | NULL |
| 2093150438 | ssx | 10.60.148.120:55974 | ssx | Sleep | 19 | autocommit | NULL |
| 2093135554 | root | 10.60.86.249:56644 | ehswork | Query | 163190 | autocommit | – 20250102
CREATE INDEX i_hse_check_record_item2 ON ehswork.hse_check_record_item (record_id, chec |
| 2093150408 | ehswork | 10.60.169.103:42492 | ehswork | Sleep | 323 | autocommit | NULL |
| 2093149728 | root | 10.60.86.90:57103 | yswxapp | Sleep | 7337 | autocommit | NULL |
| 2093150434 | root | 10.60.143.92:48562 | NULL | Query | 0 | autocommit | show processlist |
| 2093150350 | ssx | 10.60.148.120:55502 | ssx | Sleep | 808 | autocommit | NULL |
| 2093150378 | ssx | 10.60.148.120:55718 | ssx | Sleep | 467 | autocommit | NULL |
| 2093150428 | ofo | 10.60.169.101:36006 | ofo | Sleep | 135 | autocommit | NULL |
| 2093150412 | ehswork | 10.60.169.103:42496 | ehswork | Sleep | 135 | autocommit | NULL |
| 2093150386 | developer | 10.60.165.137:54732 | rocboss | Sleep | 413 | autocommit | NULL |
| 2093150436 | ehswork | 10.60.169.103:42514 | ehswork | Sleep | 21 | autocommit | NULL |
| 2093150270 | ehswork | 10.60.169.103:42380 | ehswork | Sleep | 435 | autocommit | NULL |
| 2093150394 | ehswork | 10.60.169.103:42480 | ehswork | Sleep | 413 | autocommit | NULL |
| 2093150384 | developer | 10.60.165.137:54730 | rocboss | Sleep | 415 | autocommit | NULL |
| 2093143884 | root | 10.60.86.90:2605 | NULL | Query | 73556 | autocommit | alter table ehswork.sys_datadict MODIFY CODE VARCHAR(100) COMMENT ‘编号’ |
| 2093150362 | ssx | 10.60.148.120:55572 | ssx | Sleep | 707 | autocommit | NULL |
| 2093150222 | ehswork | 10.60.169.103:42336 | ehswork | Sleep | 2235 | autocommit | NULL |
| 2093150406 | ehswork | 10.60.169.103:42490 | ehswork | Sleep | 354 | autocommit | NULL |
| 2093150440 | ehswork | 10.60.169.103:42516 | ehswork | Sleep | 18 | autocommit | NULL |
| 2093149772 | root | 10.60.86.90:57158 | yswxapp | Sleep | 7507 | autocommit | NULL |
| 2093150422 | ssx | 10.60.148.120:55848 | ssx | Sleep | 231 | autocommit | NULL |
| 2093150252 | ehswork | 10.60.169.103:42362 | ehswork | Sleep | 2153 | autocommit | NULL |
| 2093150254 | ehswork | 10.60.169.103:42364 | ehswork | Sleep | 2153 | autocommit | NULL |
| 2093150382 | ehswork | 10.60.169.103:42476 | ehswork | Sleep | 1 | autocommit | NULL |
| 2093150416 | ehswork | 10.60.169.103:42500 | ehswork | Sleep | 257 | autocommit | NULL |
| 2093150244 | ehswork | 10.60.169.103:42354 | ehswork | Sleep | 2183 | autocommit | NULL |
| 2093135732 | root | 10.60.86.249:59262 | ehswork | Query | 161465 | autocommit | CREATE INDEX i_ext_task_history ON ehswork.ext_task_history (proc_id(191)) |
| 2093150392 | ehswork | 10.60.169.103:42478 | ehswork | Sleep | 413 | autocommit | NULL |
±-----------±----------±--------------------±--------±--------±-------±-----------±-----------------------------------------------------------------------------------------------------+
41 rows in set (0.01 sec)

mysql> admin show ddl jobs ;
±-------±--------±----------------------±--------------±---------------------±----------±---------±----------±---------------------------±---------------------------±---------------------------±---------±------------+
| JOB_ID | DB_NAME | TABLE_NAME | JOB_TYPE | SCHEMA_STATE | SCHEMA_ID | TABLE_ID | ROW_COUNT | CREATE_TIME | START_TIME | END_TIME | STATE | COMMENTS |
±-------±--------±----------------------±--------------±---------------------±----------±---------±----------±---------------------------±---------------------------±---------------------------±---------±------------+
| 21320 | ehswork | hse_check_record_item | add index | write reorganization | 1790 | 2218 | 0 | 2025-05-19 15:47:25.287000 | 2025-05-19 15:47:25.338000 | NULL | running | ingest, DXF |
| 21334 | ehswork | ext_task_history | add index | none | 1790 | 1931 | 0 | 2025-05-19 16:16:10.387000 | NULL | NULL | queueing | |
| 21589 | ehswork | sys_datadict | modify column | none | 1790 | 2777 | 0 | 2025-05-20 16:41:19.388000 | NULL | NULL | queueing | |
| 22258 | zhigate | wechat_used_log | create table | public | 110 | 22257 | 0 | 2025-05-21 11:17:46.587000 | 2025-05-21 11:17:46.587000 | 2025-05-21 11:17:46.687000 | synced | |
| 22256 | zhigate | wechat_used_log | drop table | none | 110 | 5042 | 0 | 2025-05-21 11:17:45.537000 | 2025-05-21 11:17:45.587000 | 2025-05-21 11:17:46.037000 | synced | |
| 22255 | zhigate | wechat_follow_log | create table | public | 110 | 22254 | 0 | 2025-05-21 11:17:44.937000 | 2025-05-21 11:17:44.988000 | 2025-05-21 11:17:45.037000 | synced | |
| 22253 | zhigate | wechat_follow_log | drop table | none | 110 | 5040 | 0 | 2025-05-21 11:17:44.437000 | 2025-05-21 11:17:44.437000 | 2025-05-21 11:17:44.637000 | synced | |
| 22252 | zhigate | sms_send_rec | create table | public | 110 | 22251 | 0 | 2025-05-21 11:17:43.887000 | 2025-05-21 11:17:43.937000 | 2025-05-21 11:17:43.987000 | synced | |
| 22250 | zhigate | sms_send_rec | drop table | none | 110 | 5038 | 0 | 2025-05-21 11:17:43.387000 | 2025-05-21 11:17:43.387000 | 2025-05-21 11:17:43.537000 | synced | |
| 22249 | zhigate | roc_user_summary_copy | create table | public | 110 | 22248 | 0 | 2025-05-21 11:17:42.687000 | 2025-05-21 11:17:42.687000 | 2025-05-21 11:17:42.787000 | synced | |
| 22247 | zhigate | roc_user_summary_copy | drop table | none | 110 | 5036 | 0 | 2025-05-21 11:17:42.187000 | 2025-05-21 11:17:42.187000 | 2025-05-21 11:17:42.338000 | synced | |
| 22246 | zhigate | roc_user_summary | create table | public | 110 | 22245 | 0 | 2025-05-21 11:17:41.437000 | 2025-05-21 11:17:41.437000 | 2025-05-21 11:17:41.493000 | synced | |
| 22244 | zhigate | roc_user_summary | drop table | none | 110 | 5034 | 0 | 2025-05-21 11:17:40.838000 | 2025-05-21 11:17:40.887000 | 2025-05-21 11:17:41.187000 | synced | |
±-------±--------±----------------------±--------------±---------------------±----------±---------±----------±---------------------------±---------------------------±---------------------------±---------±------------+
13 rows in set (0.12 sec)

使用tiup check的方式检查

[tidb@test-tidb-169121 ~]$ tiup cluster check tidbtest --cluster --enable-disk

  • Download necessary tools
    • Downloading check tools for linux/amd64 … Done
  • Collect basic system information
  • Collect basic system information
  • Collect basic system information
  • Collect basic system information
  • Collect basic system information
  • Collect basic system information
  • Collect basic system information
  • Collect basic system information
    • Getting system info of 10.60.169.121:22 … Done
    • Getting system info of 10.60.169.122:22 … Done
    • Getting system info of 10.60.169.123:22 … Done
    • Getting system info of 10.60.169.124:22 … Done
    • Getting system info of 10.60.169.125:22 … Done
    • Getting system info of 10.60.169.127:22 … Done
    • Getting system info of 10.60.169.126:22 … Done
  • Check time zone
    • Checking node 10.60.169.125 … Done
    • Checking node 10.60.169.127 … Done
    • Checking node 10.60.169.126 … Done
    • Checking node 10.60.169.121 … Done
    • Checking node 10.60.169.122 … Done
    • Checking node 10.60.169.123 … Done
    • Checking node 10.60.169.124 … Done
  • Check system requirements
  • Check system requirements
  • Check system requirements
  • Check system requirements
  • Check system requirements
  • Check system requirements
  • Check system requirements
  • Check system requirements
  • Check system requirements
  • Check system requirements
  • Check system requirements
  • Check system requirements
  • Check system requirements
  • Check system requirements
  • Check system requirements
    • Checking node 10.60.169.121 … Done
    • Checking node 10.60.169.122 … Done
    • Checking node 10.60.169.123 … Done
    • Checking node 10.60.169.124 … Done
    • Checking node 10.60.169.125 … Done
    • Checking node 10.60.169.127 … Done
    • Checking node 10.60.169.121 … Done
    • Checking node 10.60.169.122 … Done
    • Checking node 10.60.169.123 … Done
    • Checking node 10.60.169.126 … Done
    • Checking node 10.60.169.126 … Done
    • Checking node 10.60.169.126 … Done
    • Checking node 10.60.169.126 … Done
    • Checking node 10.60.169.125 … Done
    • Checking node 10.60.169.127 … Done
    • Checking node 10.60.169.126 … Done
    • Checking node 10.60.169.121 … Done
    • Checking node 10.60.169.122 … Done
    • Checking node 10.60.169.123 … Done
    • Checking node 10.60.169.124 … Done
  • Cleanup check files
    • Cleanup check files on 10.60.169.121:22 … Done
    • Cleanup check files on 10.60.169.122:22 … Done
    • Cleanup check files on 10.60.169.123:22 … Done
    • Cleanup check files on 10.60.169.124:22 … Done
    • Cleanup check files on 10.60.169.125:22 … Done
    • Cleanup check files on 10.60.169.127:22 … Done
    • Cleanup check files on 10.60.169.126:22 … Done
      Node Check Result Message

10.60.169.123 memory Pass memory size is 16384MB
10.60.169.123 cpu-cores Pass number of CPU cores / threads: 4
10.60.169.123 cpu-governor Warn Unable to determine current CPU frequency governor policy
10.60.169.123 disk Warn mount point / does not have ‘noatime’ option set
10.60.169.123 selinux Pass SELinux is disabled
10.60.169.123 timezone Pass time zone is the same as the first PD machine: Asia/Shanghai
10.60.169.123 permission Pass /tidb-deploy/pd-2379 is writable
10.60.169.123 permission Pass /tidb-deploy/tidb-4000 is writable
10.60.169.123 permission Pass /tidb-data/pd-2379 is writable
10.60.169.123 fio Pass IOPS of random read: 2575.999371
10.60.169.123 fio Pass IOPS of random read: 375.262404, write: 381.293526
10.60.169.123 fio Pass Latency of random read: 1073481.224649ns, write: 19642.118513ns
10.60.169.123 os-version Pass OS is Kylin Linux Advanced Server V10 (Tercel)
10.60.169.123 thp Pass THP is disabled
10.60.169.123 command Fail numactl not usable, bash: numactl: command not found
10.60.169.124 memory Pass memory size is 16384MB
10.60.169.124 thp Pass THP is disabled
10.60.169.124 permission Pass /tidb-deploy/tikv-20160 is writable
10.60.169.124 permission Pass /tidb-data/tikv-20160 is writable
10.60.169.124 cpu-cores Pass number of CPU cores / threads: 4
10.60.169.124 cpu-governor Warn Unable to determine current CPU frequency governor policy
10.60.169.124 disk Warn mount point / does not have ‘noatime’ option set
10.60.169.124 selinux Pass SELinux is disabled
10.60.169.124 command Fail numactl not usable, bash: numactl: command not found
10.60.169.124 timezone Pass time zone is the same as the first PD machine: Asia/Shanghai
10.60.169.124 fio Pass IOPS of random read: 2581.478710
10.60.169.124 fio Pass IOPS of random read: 380.037332, write: 386.503783
10.60.169.124 fio Pass Latency of random read: 1118542.401830ns, write: 26567.640921ns
10.60.169.124 os-version Pass OS is Kylin Linux Advanced Server V10 (Tercel)
10.60.169.125 command Fail numactl not usable, bash: numactl: command not found
10.60.169.125 timezone Pass time zone is the same as the first PD machine: Asia/Shanghai
10.60.169.125 fio Pass IOPS of random read: 2594.303585
10.60.169.125 fio Pass IOPS of random read: 374.845849, write: 380.911909
10.60.169.125 fio Pass Latency of random read: 1190314.405674ns, write: 26007.023202ns
10.60.169.125 os-version Pass OS is Kylin Linux Advanced Server V10 (Tercel)
10.60.169.125 cpu-governor Warn Unable to determine current CPU frequency governor policy
10.60.169.125 memory Pass memory size is 16384MB
10.60.169.125 disk Warn mount point / does not have ‘noatime’ option set
10.60.169.125 thp Pass THP is disabled
10.60.169.125 permission Pass /tidb-deploy/tikv-20160 is writable
10.60.169.125 permission Pass /tidb-data/tikv-20160 is writable
10.60.169.125 cpu-cores Pass number of CPU cores / threads: 4
10.60.169.125 selinux Pass SELinux is disabled
10.60.169.127 disk Warn mount point / does not have ‘noatime’ option set
10.60.169.127 selinux Pass SELinux is disabled
10.60.169.127 os-version Pass OS is Kylin Linux Advanced Server V10 (Tercel)
10.60.169.127 swap Warn swap is enabled, please disable it for best performance
10.60.169.127 command Fail numactl not usable, bash: numactl: command not found
10.60.169.127 fio Pass IOPS of random read: 2212.969998
10.60.169.127 fio Pass IOPS of random read: 224.185054, write: 231.717885
10.60.169.127 fio Pass Latency of random read: 930933.335143ns, write: 23068.167969ns
10.60.169.127 permission Pass /tidb-data/tikv-20160 is writable
10.60.169.127 permission Pass /tidb-deploy/tikv-20160 is writable
10.60.169.127 cpu-cores Pass number of CPU cores / threads: 4
10.60.169.127 sysctl Fail net.ipv4.tcp_syncookies = 1, should be 0
10.60.169.127 thp Fail THP is enabled, please disable it for best performance
10.60.169.127 timezone Pass time zone is the same as the first PD machine: Asia/Shanghai
10.60.169.127 cpu-governor Warn Unable to determine current CPU frequency governor policy
10.60.169.127 memory Pass memory size is 8192MB
10.60.169.126 timezone Pass time zone is the same as the first PD machine: Asia/Shanghai
10.60.169.126 disk Warn mount point / does not have ‘noatime’ option set
10.60.169.126 selinux Pass SELinux is disabled
10.60.169.126 fio Pass IOPS of random read: 269.212824
10.60.169.126 fio Pass Latency of random read: 3621764.810592ns, write: 30402.592274ns
10.60.169.126 fio Pass IOPS of random read: 266.084551
10.60.169.126 fio Pass IOPS of random read: 115.712119, write: 120.594558
10.60.169.126 fio Pass Latency of random read: 2393698.995448ns, write: 26611.461255ns
10.60.169.126 fio Pass IOPS of random read: 284.026331
10.60.169.126 fio Pass IOPS of random read: 111.775882, write: 117.092194
10.60.169.126 fio Pass IOPS of random read: 101.844360, write: 106.500669
10.60.169.126 fio Pass Latency of random read: 3018657.388580ns, write: 28131.433290ns
10.60.169.126 permission Pass /tidb-deploy/prometheus-9090 is writable
10.60.169.126 permission Pass /tidb-deploy/grafana-3000 is writable
10.60.169.126 permission Pass /tidb-deploy/alertmanager-9093 is writable
10.60.169.126 permission Pass /tidb-data/prometheus-9090 is writable
10.60.169.126 permission Pass /tidb-data/tiflash-9000 is writable
10.60.169.126 permission Pass /tidb-data/alertmanager-9093 is writable
10.60.169.126 permission Pass /tidb-deploy/tiflash-9000 is writable
10.60.169.126 os-version Pass OS is Kylin Linux Advanced Server V10 (Tercel)
10.60.169.126 cpu-cores Pass number of CPU cores / threads: 16
10.60.169.126 cpu-governor Warn Unable to determine current CPU frequency governor policy
10.60.169.126 memory Pass memory size is 32768MB
10.60.169.126 thp Pass THP is disabled
10.60.169.126 command Fail numactl not usable, bash: numactl: command not found
10.60.169.121 disk Warn mount point / does not have ‘noatime’ option set
10.60.169.121 fio Pass IOPS of random read: 2501.755993
10.60.169.121 fio Pass IOPS of random read: 371.196214, write: 377.328934
10.60.169.121 fio Pass Latency of random read: 1095013.974009ns, write: 30556.998962ns
10.60.169.121 memory Pass memory size is 8192MB
10.60.169.121 cpu-cores Pass number of CPU cores / threads: 4
10.60.169.121 cpu-governor Warn Unable to determine current CPU frequency governor policy
10.60.169.121 selinux Pass SELinux is disabled
10.60.169.121 thp Pass THP is disabled
10.60.169.121 command Fail numactl not usable, bash: numactl: command not found
10.60.169.121 permission Pass /tidb-data/pd-2379 is writable
10.60.169.121 permission Pass /tidb-deploy/pd-2379 is writable
10.60.169.121 permission Pass /tidb-deploy/tidb-4000 is writable
10.60.169.121 os-version Pass OS is Kylin Linux Advanced Server V10 (Tercel)
10.60.169.122 permission Pass /tidb-deploy/tidb-4000 is writable
10.60.169.122 permission Pass /tidb-data/pd-2379 is writable
10.60.169.122 permission Pass /tidb-deploy/pd-2379 is writable
10.60.169.122 fio Pass Latency of random read: 1287845.294875ns, write: 40337.087740ns
10.60.169.122 fio Pass IOPS of random read: 2199.416049
10.60.169.122 fio Pass IOPS of random read: 363.207626, write: 368.673655
10.60.169.122 cpu-governor Warn Unable to determine current CPU frequency governor policy
10.60.169.122 disk Warn mount point / does not have ‘noatime’ option set
10.60.169.122 selinux Pass SELinux is disabled
10.60.169.122 thp Pass THP is disabled
10.60.169.122 timezone Pass time zone is the same as the first PD machine: Asia/Shanghai
10.60.169.122 os-version Pass OS is Kylin Linux Advanced Server V10 (Tercel)
10.60.169.122 cpu-cores Pass number of CPU cores / threads: 2
10.60.169.122 memory Pass memory size is 4096MB
10.60.169.122 command Fail numactl not usable, bash: numactl: command not found
Checking region status of the cluster tidbtest…
All regions are healthy.

请问为什么有一个建立索引一直卡主。导致后面的建立索引和增加字段都无法进行?

检查下 磁盘的 IO 情况,可以通过 grafana 查阅指标

或者 登录到 tidb 节点上,直接查

IO情况已经贴了。2M/S

1 个赞

感觉不一定是前一个 ddl 卡住后续 ddl ,不同表间的 DDL 是可以一起执行的,先看下元数据锁情况,https://docs.pingcap.com/zh/tidb/stable/metadata-lock/#元数据锁的可观测性

1 个赞

常规解决办法是把所有tidb server组件重启下

这办法那里常规了?除了 v6.5 早期版本一个 bug 导致元数据锁残留外,没见过这样处理的


重启完,还在回退。实在不知道怎么办。

等会我试试索引只用一个字段行不行?

这是表结构。

CREATE INDEX i_hse_check_record_item2 ON ehswork.hse_check_record_item (record_id, check_item_id, create_time);

等下去掉几个列。一个个加

1 个赞

所有的tidb server都要重启,不能只重启一个

我在6版本和7.1版本上遇到多次了,除了这么处理确实也没啥好办法。不过后面我们把很多ddl都删除了没再遇到

1 个赞
  • rollback done:表示该操作执行失败,回滚完成。
  • rollingback:表示该操作执行失败,正在回滚。

https://docs.pingcap.com/zh/tidb/stable/sql-statement-admin-show-ddl/#admin-show-ddl-jobs

rollback done的意思是已经回滚完了。所以重启应该是有用的。

https://docs.pingcap.com/zh/tidb/stable/system-variables/#tidb_enable_metadata_lock-从-v630-版本开始引入

看看这个变量 tidb_enable_metadata_lock 是不是打开的。关闭掉元数据锁,再试试能不能加上索引。

另外还有一个有可能的点就是

image

我注意到这两个值设置的比较大。

thread对应系统变量 tidb_ddl_reorg_worker_cnt
batch_size对应系统变量 tidb_ddl_reorg_batch_size

有个专门的测试文档给出了这两个值比较大的情况下,碰到表有更新的情况下的tps和加索引时间。

https://docs.pingcap.com/zh/tidb/stable/online-workloads-and-add-index-operations/

这两个值,在有事务冲突的情况下,看上去并不是越大越好。

测试表是200w数据,有事务冲突的情况下,要900s才能建立好索引。

看一下 ddl owner 节点 10.60.169.123:4000 tidb 的日志,日志有时候有更详细的信息