DDL statements hang after upgrading TiDB from 5.1.2 to 6.5.3

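【TiDB environment】Production
【TiDB version】6.5.3
【Problem: symptoms and impact】
After upgrading the cluster, DDL statements hang in the queueing state. After running admin cancel ddl jobs, the job stays in the cancelling state indefinitely.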

Logs from that period:

[2023/07/02 10:02:58.510 +08:00] [INFO] [session.go:3730] ["CRUCIAL OPERATION"] [conn=5514371870693236863] [schemaVersion=1038] [cur_db=] [sql="RENAME TABLE `db_1`.`share_compare` TO `db_1`.`share_compare_bak_20230702`"] [user=core-dm-syncer@%]
[2023/07/02 10:02:58.511 +08:00] [INFO] [txn.go:55] ["Try to create a new txn inside a transaction auto commit"] [conn=5514371870693236863] [schemaVersion=1038] [txnStartTS=442568115084591117] [txnScope=global]
[2023/07/02 10:02:58.511 +08:00] [INFO] [tidb.go:270] ["rollbackTxn called due to ddl/autocommit failure"]
[2023/07/02 10:02:58.511 +08:00] [WARN] [session.go:2242] ["run statement failed"] [conn=5514371870693236863] [schemaVersion=1038] [error="[schema:1050]Table 'db_1.share_compare_bak_20230702' already exists"] [session="{\n  \"currDBName\": \"\",\n  \"id\": 5514371870693236863,\n  \"status\": 2,\n  \"strictMode\": false,\n  \"user\": {\n    \"Username\": \"core-dm-syncer\",\n    \"Hostname\": \"172.18.xxx.xxx\",\n    \"CurrentUser\": false,\n    \"AuthUsername\": \"core-dm-syncer\",\n    \"AuthHostname\": \"%\"\n  }\n}"]
[2023/07/02 10:02:58.511 +08:00] [INFO] [conn.go:1181] ["command dispatched failed"] [conn=5514371870693236863] [connInfo="id:5514371870693236863, addr:172.18.244.12:37298 status:10, collation:utf8mb4_general_ci, user:core-dm-syncer"] [command=Query] [status="inTxn:0, autocommit:1"] [sql="RENAME TABLE `db_1`.`share_compare` TO `db_1`.`share_compare_bak_20230702`"] [txn_mode=OPTIMISTIC] [timestamp=442568115084591124] [err="[schema:1050]Table 'db_1.share_compare_bak_20230702' already exists\ngithub.com/pingcap/errors.AddStack\n\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20220729040631-518f63d66278/errors.go:174\ngithub.com/pingcap/errors.(*Error).GenWithStackByArgs\n\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20220729040631-518f63d66278/normalize.go:164\ngithub.com/pingcap/tidb/ddl.ExtractTblInfos\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/ddl_api.go:6059\ngithub.com/pingcap/tidb/ddl.(*ddl).renameTable\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/ddl_api.go:5962\ngithub.com/pingcap/tidb/ddl.(*ddl).RenameTable\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/ddl_api.go:5944\ngithub.com/pingcap/tidb/executor.(*DDLExec).executeRenameTable\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/executor/ddl.go:243\ngithub.com/pingcap/tidb/executor.(*DDLExec).Next\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/executor/ddl.go:184\ngithub.com/pingcap/tidb/executor.Next\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/executor/executor.go:328\ngithub.com/pingcap/tidb/executor.(*ExecStmt).next\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/executor/adapter.go:1154\ngithub.com/pingcap/tidb/executor.(*ExecStmt).handleNoDelayExecutor\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/executor/adapter.go:927\ngithub.com/pingcap/tid
b/executor.(*ExecStmt).handleNoDelay\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/executor/adapter.go:776\ngithub.com/pingcap/tidb/executor.(*ExecStmt).Exec\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/executor/adapter.go:571\ngithub.com/pingcap/tidb/session.runStmt\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/session/session.go:2373\ngithub.com/pingcap/tidb/session.(*session).ExecuteStmt\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/session/session.go:2237\ngithub.com/pingcap/tidb/server.(*TiDBContext).ExecuteStmt\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/driver_tidb.go:252\ngithub.com/pingcap/tidb/server.(*clientConn).handleStmt\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/conn.go:2122\ngithub.com/pingcap/tidb/server.(*clientConn).handleQuery\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/conn.go:1972\ngithub.com/pingcap/tidb/server.(*clientConn).dispatch\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/conn.go:1403\ngithub.com/pingcap/tidb/server.(*clientConn).Run\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/conn.go:1152\ngithub.com/pingcap/tidb/server.(*Server).onConn\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/server.go:648\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594"]
[2023/07/02 10:02:58.517 +08:00] [WARN] [sysvar_cache.go:84] ["could not find key in global cache"] [name=timestamp]
[2023/07/02 10:02:58.517 +08:00] [INFO] [session.go:1489] ["sysvar not in cache yet. sysvar cache may be stale"] [name=timestamp]
[2023/07/02 10:03:16.539 +08:00] [INFO] [session.go:3730] ["CRUCIAL OPERATION"] [conn=5514371870693236863] [schemaVersion=1038] [cur_db=] [sql="CREATE TABLE IF NOT EXISTS `db_1`.`share_compare` LIKE `db_1`.`share_compare_bak_20230702`"] [user=core-dm-syncer@%]
[2023/07/02 10:03:16.539 +08:00] [INFO] [txn.go:55] ["Try to create a new txn inside a transaction auto commit"] [conn=5514371870693236863] [schemaVersion=1038] [txnStartTS=442568119816552460] [txnScope=global]
[2023/07/02 10:03:16.557 +08:00] [INFO] [ddl_worker.go:314] ["[ddl] add DDL jobs"] ["batch count"=1] [jobs="ID:1259, Type:create table, State:queueing, SchemaState:none, SchemaID:54, TableID:1258, RowCount:0, ArgLen:2, start time: 2023-07-02 10:03:16.517 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0; "] [table=true]
[2023/07/02 10:03:16.559 +08:00] [INFO] [ddl.go:1011] ["[ddl] start DDL job"] [job="ID:1259, Type:create table, State:queueing, SchemaState:none, SchemaID:54, TableID:1258, RowCount:0, ArgLen:2, start time: 2023-07-02 10:03:16.517 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0"] [query="CREATE TABLE IF NOT EXISTS `db_1`.`share_compare` LIKE `db_1`.`share_compare_bak_20230702`"]
[2023/07/02 10:03:46.261 +08:00] [INFO] [domain.go:2318] ["refreshServerIDTTL succeed"] [serverID=2507646] ["lease id"=57678907b3df78e1]
[2023/07/02 10:04:02.490 +08:00] [WARN] [pd.go:152] ["get timestamp too slow"] ["cost time"=40.292596ms]
[2023/07/02 10:04:02.490 +08:00] [WARN] [pd.go:152] ["get timestamp too slow"] ["cost time"=40.391746ms]
[2023/07/02 10:04:16.629 +08:00] [WARN] [expensivequery.go:118] [expensive_query] [cost_time=60.090551825s] [conn_id=5514371870693236863] [user=core-dm-syncer] [txn_start_ts=442568119816552460] [mem_max="0 Bytes (0 Bytes)"] [sql="CREATE TABLE IF NOT EXISTS `db_1`.`share_compare` LIKE `db_1`.`share_compare_bak_20230702`"]
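
To see at a glance which DDL jobs a log like the one above mentions and what state each was last reported in, a small parser can help. This is only a rough sketch: the regular expressions are assumptions based on the `ID:..., Type:..., State:...` layout visible in these particular entries, and `last_job_states` is a hypothetical helper name, not part of any TiDB tooling.

```python
import re

# Assumed layout, taken from the "[ddl] add DDL jobs" / "[ddl] start DDL job"
# entries above: "ID:1259, Type:create table, State:queueing, ..."
JOB_RE = re.compile(r"ID:(?P<id>\d+), Type:(?P<type>[^,]+), State:(?P<state>[^,]+)")
# The leading "[2023/07/02 10:03:16.559 +08:00]" timestamp field.
TS_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]")


def last_job_states(log_lines):
    """Return {job_id: (timestamp, job_type, state)} from the last entry seen per job."""
    states = {}
    for line in log_lines:
        job = JOB_RE.search(line)
        if job is None:
            continue  # not a DDL job entry
        ts = TS_RE.match(line)
        states[int(job["id"])] = (
            ts["ts"] if ts else None,
            job["type"],
            job["state"],
        )
    return states


if __name__ == "__main__":
    # Shortened copy of the "start DDL job" entry from the log above.
    sample = [
        '[2023/07/02 10:03:16.559 +08:00] [INFO] [ddl.go:1011] '
        '["[ddl] start DDL job"] [job="ID:1259, Type:create table, '
        'State:queueing, SchemaState:none, SchemaID:54, TableID:1258"]',
    ]
    for job_id, (ts, job_type, state) in last_job_states(sample).items():
        print(job_id, ts, job_type, state)
```

A job whose last reported state stays at queueing long after its start time, as with job 1259 here, is the kind of entry worth cross-checking against the output of admin show ddl jobs.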

Some additional screenshots:
Before cancelling the DDL:


After cancelling the DDL:

You need to restart all TiDB front-end nodes (tidb-server). The cause is that the DDL owner cannot be elected.

Another possible cause is that the execution time exceeds max_execution_time.
Check whether you have set a maximum execution time.


We did change the maximum execution time to 10s. So once max_execution_time is exceeded, the job can no longer be cancelled?

Right. Remove that setting.

Remove that setting, let the DDL job finish, then change it back to 10 seconds.

Was max_execution_time not set on the old version?

https://docs.pingcap.com/zh/tidb/stable/system-variables#max_execution_time

max_execution_time currently only controls the maximum execution time of read-only statements.

According to the documentation, max_execution_time only applies to read-only statements. It says nothing about DDL.

Found the root cause: during the TiDB version upgrade, one tidb-server was not reloaded. I just reloaded it again and DDL works normally now :upside_down_face:
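
For anyone who wants to rule out this kind of leftover instance after an upgrade, one possible check is to compare the versions reported per component. The sketch below assumes the rows come from a query such as `SELECT TYPE, INSTANCE, VERSION FROM information_schema.cluster_info`. The addresses in the sample are hypothetical, and `mixed_version_types` is an illustrative helper name, not an existing tool.

```python
from collections import defaultdict


def versions_by_type(rows):
    """Group instances by component type and reported version.

    rows: iterable of (type, instance, version) tuples.
    Returns {type: {version: [instance, ...]}}.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for typ, instance, version in rows:
        grouped[typ][version].append(instance)
    return {typ: dict(versions) for typ, versions in grouped.items()}


def mixed_version_types(rows):
    """Return only the component types that report more than one version."""
    return {
        typ: versions
        for typ, versions in versions_by_type(rows).items()
        if len(versions) > 1
    }


if __name__ == "__main__":
    # Hypothetical rows: one tidb-server still reports the pre-upgrade version.
    rows = [
        ("tidb", "10.0.0.1:4000", "6.5.3"),
        ("tidb", "10.0.0.2:4000", "5.1.2"),
        ("tikv", "10.0.0.3:20160", "6.5.3"),
    ]
    for typ, versions in mixed_version_types(rows).items():
        print(typ, versions)
```

Any component type that shows up in the result has instances on different versions and is worth reloading or restarting before retrying the DDL.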

Did you forget to reload one of them, or did the reload run but fail on one instance?
