Error when upgrading TiDB from 6.5.2 to 7.1.0

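Then back up that configuration file, edit the configuration again with tiup, and save. Check whether the file is regenerated automatically, and then check whether its contents are correct.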

tiup cluster edit-config <cluster-name>

If that doesn't work, try starting it directly with the default parameters.
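The back-up-then-edit step can be sketched in Python as follows. This is a minimal sketch, assuming you pass in the path of the generated file yourself (the path in the usage line below is hypothetical). It copies the file with a timestamp suffix and only builds the tiup commands as argument lists instead of running them. `tiup cluster edit-config` comes from the reply above, and `tiup cluster reload` comes from the generated file's own header.

```python
import shutil
import time
from pathlib import Path
from typing import List


def backup_config(path: str) -> Path:
    """Copy a generated config file next to itself with a timestamp suffix."""
    src = Path(path)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dst = src.with_name(f"{src.name}.bak-{stamp}")
    shutil.copy2(src, dst)  # copy2 also keeps the file's timestamps and mode bits
    return dst


def tiup_commands(cluster: str) -> List[List[str]]:
    """The tiup commands to run by hand after the backup; nothing is executed here."""
    return [
        # Opens the cluster topology in an editor; change server_configs there.
        ["tiup", "cluster", "edit-config", cluster],
        # Pushes the updated configs and restarts; -R tidb limits it to TiDB nodes.
        ["tiup", "cluster", "reload", cluster, "-R", "tidb"],
    ]
```

Usage, with a hypothetical deploy path: `backup_config("/tidb-deploy/tidb-3306/conf/tidb.toml")`, then run the two commands from `tiup_commands("<cluster-name>")` and diff the regenerated file against the backup.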

# WARNING: This file is auto-generated. Do not edit! All your modification will be overwritten!
# You can use 'tiup cluster edit-config' and 'tiup cluster reload' to update the configuration
# All configuration items you want to change can be added to:
# server_configs:
#   tidb:
#     aa.b1.c3: value
#     aa.b2.c4: value

deploy_dir: deploy
data_dir: data

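Several of the file paths in your configuration file are relative paths; change them to absolute paths. The error message says a path could not be found: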
env: bin/tidb-server: No such file or directory
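The relative deploy_dir: deploy and data_dir: data values in the excerpt above are consistent with this error, since a relative path is resolved against whatever the current working directory happens to be. Below is a small Python sketch that flags relative entries and proposes absolute replacements. The dict input and the base directory /tidb-deploy/tidb-3306 are assumptions for illustration; pick the base that matches where the cluster is actually deployed.

```python
import os
from typing import Dict


def absolutize(paths: Dict[str, str], base: str) -> Dict[str, str]:
    """Return {key: suggested absolute path} for every relative entry in paths."""
    fixes = {}
    for key, value in paths.items():
        if not os.path.isabs(value):
            # A relative value would otherwise be resolved against the process's
            # current working directory, which is not necessarily the deploy dir.
            fixes[key] = os.path.normpath(os.path.join(base, value))
    return fixes


# Values from the config excerpt above; the base directory is hypothetical.
print(absolutize({"deploy_dir": "deploy", "data_dir": "data"}, "/tidb-deploy/tidb-3306"))
```

Absolute entries are left out of the result, so an empty dict means there is nothing to change.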

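I have tried the following, and none of them got TiDB to start:

1. Scale out a new TiDB node
2. Change the TiDB port
3. Add the instance.tidb_enable_ddl=false setting and start it manually (systemctl start tidb-3306.service)
4. Clear the tidb.toml configuration and start it manually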

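Before upgrading, did you use admin show to check the status of DDL jobs? See https://docs.pingcap.com/zh/tidb/stable/upgrade-tidb-using-tiup#25-检查当前集群的-ddl-和-backup-情况 . It looks like TiDB is stuck waiting for schema version 130240 (version=130240). Could you post the complete TiDB logs from before and after the upgrade?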
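The pre-upgrade check in the linked doc boils down to making sure no DDL job is still in flight before anything is restarted. Here is a Python sketch over rows you have already fetched from ADMIN SHOW DDL JOBS. It assumes each row is a dict carrying the JOB_ID and STATE columns, and treating synced, cancelled and rollback done as "finished" is also an assumption based on the states TiDB commonly reports.

```python
# States assumed to mean "this job will not run any further".
FINISHED_STATES = {"synced", "cancelled", "rollback done"}


def pending_ddl_jobs(rows):
    """Return the JOB_IDs whose STATE is not one of the finished states."""
    return [r["JOB_ID"] for r in rows if r["STATE"].lower() not in FINISHED_STATES]


def safe_to_upgrade(rows) -> bool:
    """Only upgrade when no DDL job is still queued or running."""
    return not pending_ddl_jobs(rows)


# Hypothetical sample rows, shaped like ADMIN SHOW DDL JOBS output.
rows = [
    {"JOB_ID": 101, "JOB_TYPE": "add index", "STATE": "running"},
    {"JOB_ID": 100, "JOB_TYPE": "create table", "STATE": "synced"},
]
print(pending_ddl_jobs(rows), safe_to_upgrade(rows))  # [101] False
```

Anything this returns should be allowed to finish, or be cancelled, before the upgrade starts.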

I honestly didn't check that carefully, so I'm not sure whether this is the cause.
tidb.log.zip (10.6 MB)

Do you have the log from the other TiDB instance? It looks like the other TiDB may still have been running a DDL during the upgrade.

This is the log from the other TiDB instance.
tidb.log.zip (19.3 MB)

Do you need any other information?

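Let's do this: can you restore the environment to the state it was in right after the upgrade to v7.1.0? I see you've been swapping binaries and trying various other things, and none of them worked. Restore it to how it was just after the upgrade to 7.1.0, then stop the cluster, start it once more, and post the logs.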

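The log messages you posted aren't the important part. I noticed an odd message in the log file and want to see what error is reported when 7.1.0 is first started.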

Here are the startup logs of the two TiDB instances.
tidb.zip (12.0 KB)

Can anyone take a look at this issue today?

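Because of all the operations you've performed since, the TiDB logs provided above still differ from the original ones. Could you provide the logs from around the initial upgrade at 2023/06/05 22:57:43? Let's look at those first.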

These are the logs I backed up earlier.
tidb.log.zip (10.6 MB)
tidb.log.zip (19.3 MB)

The logs show that a DDL operation was running while the cluster was being upgraded, and the restart during the upgrade interrupted it:

[2023/06/05 10:38:39.850 +08:00] [INFO] [server.go:511] ["setting tidb-server to report unhealthy (shutting-down)"]
[2023/06/05 10:38:39.850 +08:00] [ERROR] [http_status.go:504] ["start status/rpc server error"] [error="accept tcp [::]:10080: use of closed network connection"]
[2023/06/05 10:38:39.850 +08:00] [ERROR] [http_status.go:499] ["http server error"] [error="http: Server closed"]
[2023/06/05 10:38:39.850 +08:00] [ERROR] [http_status.go:494] ["grpc server error"] [error="mux: server closed"]
[2023/06/05 10:38:39.869 +08:00] [INFO] [manager.go:247] ["failed to campaign"] ["owner info"="[autoid] tidb/autoid/leader ownerManager 10.10.5.28:10080"] [error="context canceled"]
[2023/06/05 10:38:39.869 +08:00] [INFO] [manager.go:219] ["etcd session is done, creates a new one"] ["owner info"="[autoid] tidb/autoid/leader ownerManager 10.10.5.28:10080"]
[2023/06/05 10:38:39.869 +08:00] [INFO] [manager.go:223] ["break campaign loop, NewSession failed"] ["owner info"="[autoid] tidb/autoid/leader ownerManager 10.10.5.28:10080"] [error="context canceled"]
[2023/06/05 10:38:39.872 +08:00] [INFO] [manager.go:272] ["revoke session"] ["owner info"="[autoid] tidb/autoid/leader ownerManager 10.10.5.28:10080"] []
[2023/06/05 10:38:39.872 +08:00] [INFO] [server.go:864] ["[server] graceful shutdown."]
[2023/06/05 10:38:39.874 +08:00] [WARN] [manager.go:261] ["is not the owner"] ["owner info"="[ddl] /tidb/ddl/fg/owner ownerManager dc79478b-608a-42ad-ac72-d41e98687f9e"]
[2023/06/05 10:38:39.874 +08:00] [INFO] [manager.go:219] ["etcd session is done, creates a new one"] ["owner info"="[ddl] /tidb/ddl/fg/owner ownerManager dc79478b-608a-42ad-ac72-d41e98687f9e"]
[2023/06/05 10:38:39.874 +08:00] [INFO] [manager.go:223] ["break campaign loop, NewSession failed"] ["owner info"="[ddl] /tidb/ddl/fg/owner ownerManager dc79478b-608a-42ad-ac72-d41e98687f9e"] [error="context canceled"]
[2023/06/05 10:38:39.881 +08:00] [INFO] [manager.go:272] ["revoke session"] ["owner info"="[ddl] /tidb/ddl/fg/owner ownerManager dc79478b-608a-42ad-ac72-d41e98687f9e"] []
[2023/06/05 10:38:39.883 +08:00] [INFO] [ddl_workerpool.go:82] ["[ddl] closing workerPool"]
[2023/06/05 10:38:39.883 +08:00] [INFO] [ddl_worker.go:165] ["[ddl] DDL worker closed"] [worker="worker 6, tp add index"] ["take time"=3.309µs]
[2023/06/05 10:38:39.883 +08:00] [INFO] [ddl_worker.go:165] ["[ddl] DDL worker closed"] [worker="worker 7, tp add index"] ["take time"=2.138µs]
[2023/06/05 10:38:39.883 +08:00] [INFO] [ddl_worker.go:165] ["[ddl] DDL worker closed"] [worker="worker 8, tp add index"] ["take time"=1.497µs]
[2023/06/05 10:38:39.883 +08:00] [INFO] [ddl_worker.go:165] ["[ddl] DDL worker closed"] [worker="worker 9, tp add index"] ["take time"=1.908µs]
[2023/06/05 10:38:39.883 +08:00] [INFO] [ddl_worker.go:165] ["[ddl] DDL worker closed"] [worker="worker 10, tp add index"] ["take time"=1.419µs]
[2023/06/05 10:38:39.883 +08:00] [INFO] [ddl_worker.go:165] ["[ddl] DDL worker closed"] [worker="worker 11, tp add index"] ["take time"=1.458µs]
[2023/06/05 10:38:39.883 +08:00] [INFO] [ddl_worker.go:165] ["[ddl] DDL worker closed"] [worker="worker 4, tp add index"] ["take time"=1.195µs]
[2023/06/05 10:38:39.883 +08:00] [INFO] [ddl_worker.go:165] ["[ddl] DDL worker closed"] [worker="worker 5, tp add index"] ["take time"=1.894µs]
[2023/06/05 10:38:39.883 +08:00] [INFO] [ddl_workerpool.go:82] ["[ddl] closing workerPool"]
[2023/06/05 10:38:39.883 +08:00] [INFO] [ddl_worker.go:165] ["[ddl] DDL worker closed"] [worker="worker 3, tp general"] ["take time"=1.541µs]
[2023/06/05 10:38:39.883 +08:00] [INFO] [ddl_worker.go:165] ["[ddl] DDL worker closed"] [worker="worker 1, tp general"] ["take time"=312ns]
[2023/06/05 10:38:39.883 +08:00] [INFO] [ddl_worker.go:165] ["[ddl] DDL worker closed"] [worker="worker 2, tp add index"] ["take time"=224ns]
[2023/06/05 10:38:39.883 +08:00] [INFO] [delete_range.go:148] ["[ddl] closing delRange"]
[2023/06/05 10:38:39.883 +08:00] [INFO] [session_pool.go:94] ["[ddl] closing sessionPool"]
[2023/06/05 10:38:39.884 +08:00] [INFO] [ddl.go:815] ["[ddl] DDL closed"] [ID=dc79478b-608a-42ad-ac72-d41e98687f9e] ["take time"=9.936703ms]
[2023/06/05 10:38:39.884 +08:00] [INFO] [ddl.go:639] ["[ddl] stop DDL"] [ID=dc79478b-608a-42ad-ac72-d41e98687f9e]
[2023/06/05 10:38:44.037 +08:00] [INFO] [ddl.go:752] ["[ddl] start DDL"] [ID=896e8af7-d24d-4015-8fd5-e41bea5aaa4b] [runWorker=true]
[2023/06/05 10:38:44.037 +08:00] [INFO] [ddl.go:715] ["[ddl] start delRangeManager OK"] ["is a emulator"=false]
[2023/06/05 10:38:44.041 +08:00] [INFO] [manager.go:178] ["start campaign owner"] [ownerInfo="[ddl] /tidb/ddl/fg/owner"]
[2023/06/05 10:38:44.042 +08:00] [INFO] [job_table.go:324] ["[ddl] get global state and global state change"] [oldState=false] [currState=false]
[2023/06/05 10:38:44.049 +08:00] [INFO] [env.go:90] ["[ddl-ingest] the ingest sorted directory"] ["data path:"=/tmp/tidb/tmp_ddl-3306]
[2023/06/05 10:38:44.050 +08:00] [WARN] [backend_mgr.go:59] ["[ddl-ingest] ingest backfill may not be available"] [error="the available disk space(24019136512) in /tmp/tidb/tmp_ddl-3306 should be greater than @@tidb_ddl_disk_quota(107374182400)"]
[2023/06/05 10:38:44.050 +08:00] [INFO] [env.go:68] ["[ddl-ingest] init global ingest backend environment finished"] ["memory limitation"=2147483648] ["disk usage info"="disk usage: 56484765696/80503902208, backend usage: 0"] ["max open file number"=1000000] ["lightning is initialized"=true]
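To pick the relevant lines out of a multi-megabyte tidb.log, here is a Python sketch that parses the [time] [LEVEL] [file.go:line] ["message"] layout seen in the excerpt above and converts the ingest disk warning into GiB. The regular expressions were written against these sample lines only, not against a published log-format specification, so treat them as an assumption.

```python
import re

# Layout of the excerpt above: [time] [LEVEL] [file.go:line] ["message"] [k=v]...
LINE_RE = re.compile(
    r'^\[(?P<time>[^\]]+)\] \[(?P<level>[A-Z]+)\] '
    r'\[(?P<source>[^\]]+)\] \["(?P<msg>[^"]*)"\]'
)
# Pulls the two byte counts out of the "ingest backfill may not be available" warning.
DISK_RE = re.compile(r'available disk space\((\d+)\).*@@tidb_ddl_disk_quota\((\d+)\)')
GIB = 1024 ** 3


def parse(line):
    """Return a dict with time/level/source/msg, or None if the line doesn't match."""
    m = LINE_RE.match(line)
    return m.groupdict() if m else None


def problems(lines):
    """Yield parsed WARN/ERROR entries, skipping lines that don't match."""
    for line in lines:
        entry = parse(line)
        if entry and entry["level"] in ("WARN", "ERROR"):
            yield entry


def disk_check(line):
    """Return (available_gib, quota_gib) if the line carries the disk warning."""
    m = DISK_RE.search(line)
    if not m:
        return None
    available, quota = (int(x) for x in m.groups())
    return round(available / GIB, 2), round(quota / GIB, 2)
```

Applied to the backend_mgr.go:59 warning, disk_check gives roughly 22.37 GiB available against a tidb_ddl_disk_quota of exactly 100 GiB (107374182400 bytes). That shortfall is the condition the log message cites when it says ingest backfill may not be available.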

However, the official documentation explicitly advises against running DDL operations during an upgrade.

What should I do now?

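A few days before the upgrade, I ran a DDL to add an index. The table only had a few thousand rows, but the statement ran for several minutes without finishing. I tried to kill the connection, but it couldn't be killed either, which is why I went ahead with the upgrade.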

A DDL job has to be cancelled with admin cancel (ADMIN CANCEL DDL JOBS)...
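For reference, the cancel statement is ADMIN CANCEL DDL JOBS followed by one or more job IDs, and it has to be sent over a SQL connection to a running tidb-server. Here is a tiny Python helper that builds it from job IDs taken from ADMIN SHOW DDL JOBS. The helper name is hypothetical, and it only produces the SQL text; executing it is left to whatever client you use.

```python
def cancel_ddl_sql(job_ids):
    """Build an ADMIN CANCEL DDL JOBS statement for the given job IDs."""
    # Coerce to integers so only numeric IDs end up in the SQL text.
    ids = [int(j) for j in job_ids]
    if not ids:
        raise ValueError("no DDL job IDs given")
    return "ADMIN CANCEL DDL JOBS " + ", ".join(str(j) for j in ids) + ";"


print(cancel_ddl_sql([101]))  # ADMIN CANCEL DDL JOBS 101;
```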

I haven't run into this problem before either... Next time, make sure to run the check before upgrading.

The newly scaled-out TiDB node reports the same error too, right?

I can't connect to TiDB anymore, so how would I run it?

Yes, the newly scaled-out TiDB node won't start either.