DM同步了一个小表的DDL到tidb-server,tidb-server一直被卡着,使用ADMIN CANCEL DDL JOBS
取消DDL也一直失败,最终等待了近四个小时,才CANCEL成功。期间有大量的[2023/05/24 19:09:28.673 +08:00] [INFO] [syncer.go:333] ["[ddl] syncer check all versions, someone is not synced"] [info="instance ip {tidb-server-ip} port 4000, id 831b35bd-9d31-4ec4-a005-08ac7c6ead64"] ["ddl id"=47401] [ver=33590]
报错。
过滤后的日志为:
[2023/05/24 19:09:28.569 +08:00] [INFO] [ddl_worker.go:314] ["[ddl] add DDL jobs"] ["batch count"=1] [jobs="ID:47401, Type:add column, State:queueing, SchemaState:none, SchemaID:5747, TableID:7466, RowCount:0, ArgLen:4, start time: 2023-05-24 19:09:28.503 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0; "] [table=true]
[2023/05/24 19:09:28.569 +08:00] [INFO] [ddl.go:1011] ["[ddl] start DDL job"] [job="ID:47401, Type:add column, State:queueing, SchemaState:none, SchemaID:5747, TableID:7466, RowCount:0, ArgLen:4, start time: 2023-05-24 19:09:28.503 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0"] [query="ALTER TABLE `db1`.`tb1` ADD COLUMN `col1` VARCHAR(30) NULL COMMENT '收益止盈类型, FIXED:固定值, RANGE:范围值'"]
[2023/05/24 19:09:28.589 +08:00] [INFO] [ddl_worker.go:1168] ["[ddl] run DDL job"] [worker="worker 3, tp general"] [job="ID:47401, Type:add column, State:queueing, SchemaState:none, SchemaID:5747, TableID:7466, RowCount:0, ArgLen:0, start time: 2023-05-24 19:09:28.503 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0"]
[2023/05/24 19:09:28.592 +08:00] [INFO] [column.go:133] ["[ddl] run add column job"] [job="ID:47401, Type:add column, State:running, SchemaState:none, SchemaID:5747, TableID:7466, RowCount:0, ArgLen:4, start time: 2023-05-24 19:09:28.503 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0"] [columnInfo="{\"id\":45,\"name\":{\"O\":\"col1\",\"L\":\"col1\"},\"offset\":44,\"origin_default\":null,\"origin_default_bit\":null,\"default\":null,\"default_bit\":null,\"default_is_expr\":false,\"generated_expr_string\":\"\",\"generated_stored\":false,\"dependences\":null,\"type\":{},\"state\":0,\"comment\":\"收益止盈类型, FIXED:固定值, RANGE:范围值\",\"hidden\":false,\"change_state_info\":null,\"version\":2}"]
[2023/05/24 19:09:28.634 +08:00] [INFO] [domain.go:726] ["mdl gets lock, update to owner"] [jobID=47401] [version=33590]
[2023/05/24 19:09:28.673 +08:00] [INFO] [syncer.go:333] ["[ddl] syncer check all versions, someone is not synced"] [info="instance ip 192.168.1.221, port 4000, id 65867468-6375-4732-b2e7-1501df354479"] ["ddl id"=47401] [ver=33590]
[2023/05/24 19:09:28.673 +08:00] [INFO] [syncer.go:333] ["[ddl] syncer check all versions, someone is not synced"] [info="instance ip 192.168.1.245, port 4000, id 831b35bd-9d31-4ec4-a005-08ac7c6ead64"] ["ddl id"=47401] [ver=33590]
# 中间过滤了很多日志,基本上都是`syncer check all versions, someone is not synced`的报错
[2023/05/25 01:14:22.975 +08:00] [INFO] [ddl_worker.go:1377] ["[ddl] wait latest schema version changed(get the metadata lock if tidb_enable_metadata_lock is true)"] [ver=33590] ["take time"=6h4m54.355850116s] [job="ID:47401, Type:add column, State:running, SchemaState:delete only, SchemaID:5747, TableID:7466, RowCount:0, ArgLen:4, start time: 2023-05-24 19:09:28.503 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0"]
[2023/05/25 01:14:22.989 +08:00] [INFO] [ddl_worker.go:1168] ["[ddl] run DDL job"] [worker="worker 3, tp general"] [job="ID:47401, Type:add column, State:cancelling, SchemaState:delete only, SchemaID:5747, TableID:7466, RowCount:0, ArgLen:0, start time: 2023-05-24 19:09:28.503 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0"]
[2023/05/25 01:14:22.992 +08:00] [INFO] [rollingback.go:433] ["[ddl] the DDL job is cancelled normally"] [worker="worker 3, tp general"] [job="ID:47401, Type:add column, State:rollingback, SchemaState:delete only, SchemaID:5747, TableID:7466, RowCount:0, ArgLen:1, start time: 2023-05-24 19:09:28.503 +0800 CST, Err:[ddl:8214]Cancelled DDL job, ErrCount:1, SnapshotVersion:0"] [error="[ddl:8214]Cancelled DDL job"]
[2023/05/25 01:14:23.004 +08:00] [INFO] [ddl_worker.go:1168] ["[ddl] run DDL job"] [worker="worker 3, tp general"] [job="ID:47401, Type:add column, State:rollingback, SchemaState:delete only, SchemaID:5747, TableID:7466, RowCount:0, ArgLen:0, start time: 2023-05-24 19:09:28.503 +0800 CST, Err:[ddl:8214]Cancelled DDL job, ErrCount:1, SnapshotVersion:0"]
[2023/05/25 01:14:23.084 +08:00] [INFO] [domain.go:726] ["mdl gets lock, update to owner"] [jobID=47401] [version=33626]
[2023/05/25 01:15:21.269 +08:00] [INFO] [ddl_worker.go:1377] ["[ddl] wait latest schema version changed(get the metadata lock if tidb_enable_metadata_lock is true)"] [ver=33626] ["take time"=58.239228943s] [job="ID:47401, Type:add column, State:rollingback, SchemaState:delete reorganization, SchemaID:5747, TableID:7466, RowCount:0, ArgLen:1, start time: 2023-05-24 19:09:28.503 +0800 CST, Err:[ddl:8214]Cancelled DDL job, ErrCount:1, SnapshotVersion:0"]
[2023/05/25 01:15:21.280 +08:00] [INFO] [ddl_worker.go:1168] ["[ddl] run DDL job"] [worker="worker 3, tp general"] [job="ID:47401, Type:add column, State:rollingback, SchemaState:delete reorganization, SchemaID:5747, TableID:7466, RowCount:0, ArgLen:0, start time: 2023-05-24 19:09:28.503 +0800 CST, Err:[ddl:8214]Cancelled DDL job, ErrCount:1, SnapshotVersion:0"]
[2023/05/25 01:15:21.334 +08:00] [INFO] [domain.go:726] ["mdl gets lock, update to owner"] [jobID=47401] [version=33627]
[2023/05/25 01:15:21.379 +08:00] [INFO] [ddl_worker.go:1377] ["[ddl] wait latest schema version changed(get the metadata lock if tidb_enable_metadata_lock is true)"] [ver=33627] ["take time"=74.139844ms] [job="ID:47401, Type:add column, State:rollback done, SchemaState:none, SchemaID:5747, TableID:7466, RowCount:0, ArgLen:1, start time: 2023-05-24 19:09:28.503 +0800 CST, Err:[ddl:8214]Cancelled DDL job, ErrCount:1, SnapshotVersion:0"]
[2023/05/25 01:15:21.393 +08:00] [INFO] [ddl_worker.go:643] ["[ddl] finish DDL job"] [worker="worker 3, tp general"] [job="ID:47401, Type:add column, State:rollback done, SchemaState:none, SchemaID:5747, TableID:7466, RowCount:0, ArgLen:0, start time: 2023-05-24 19:09:28.503 +0800 CST, Err:[ddl:8214]Cancelled DDL job, ErrCount:1, SnapshotVersion:0"]
[2023/05/25 01:15:21.440 +08:00] [INFO] [ddl.go:1114] ["[ddl] DDL job is failed"] [jobID=47401]