我是咖啡哥
June 6, 2023, 01:52
21
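Then back up that configuration file, edit the configuration again through tiup, and save it. Check whether the file is regenerated automatically, and then check whether its contents are correct.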
tiup cluster edit-config <cluster-name>
If that doesn't work, try starting it directly with the default parameters.
# WARNING: This file is auto-generated. Do not edit! All your modification will be overwritten!
# You can use 'tiup cluster edit-config' and 'tiup cluster reload' to update the configuration
# All configuration items you want to change can be added to:
# server_configs:
#   tidb:
#     aa.b1.c3: value
#     aa.b2.c4: value
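A minimal sketch of the backup step, assuming the file in question is the generated tidb.toml. The function name and the timestamp suffix are illustrative choices, not anything tiup does by itself.

```python
import shutil
import time
from pathlib import Path


def backup_config(path):
    """Copy a config file next to itself with a timestamp suffix; return the copy."""
    src = Path(path)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dst = src.with_name(f"{src.name}.{stamp}.bak")
    shutil.copy2(src, dst)  # copy2 also keeps the modification time and permissions
    return dst
```

Call it as `backup_config("<deploy_dir>/conf/tidb.toml")` (a placeholder path), then run `tiup cluster edit-config <cluster-name>` and `tiup cluster reload <cluster-name>` as the file header suggests, and compare the regenerated file against the backup.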
我是咖啡哥
June 6, 2023, 01:55
22
deploy_dir: deploy
data_dir: data
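Several file paths in your configuration file are relative paths; change them to absolute paths. The error message indicates that a path cannot be found.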
env: bin/tidb-server: No such file or directory
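To spot every relative path in one pass, something like the following could help. It is only a sketch: the function name is made up, the settings are passed in as a plain dict rather than read from the real topology file, and the `log_dir` value is a hypothetical example.

```python
from pathlib import PurePosixPath


def relative_path_settings(settings):
    """Return the settings whose value is not an absolute POSIX path."""
    return {
        key: value
        for key, value in settings.items()
        if not PurePosixPath(value).is_absolute()
    }


# The two values quoted in this post, plus one hypothetical absolute path.
example = {
    "deploy_dir": "deploy",
    "data_dir": "data",
    "log_dir": "/tidb-deploy/tidb-3306/log",  # hypothetical
}
print(relative_path_settings(example))
```

Any key it prints is a candidate to rewrite as an absolute path.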
Tao
(Tao)
June 6, 2023, 06:02
23
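I have tried the following, and none of them got TiDB to start normally:
1. Scaling out a new TiDB node
2. Changing the TiDB port
3. Adding the instance.tidb_enable_ddl=false setting and starting manually (systemctl start tidb-3306.service)
4. Emptying the tidb.toml configuration and starting manually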
qizheng
(qizheng)
June 6, 2023, 07:50
24
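Did you run admin show before the upgrade to check the DDL execution status? See https://docs.pingcap.com/zh/tidb/stable/upgrade-tidb-using-tiup#25-检查当前集群的-ddl-和-backup-情况 . It looks like it is stuck fetching the schema version version=130240. Could you post the complete tidb logs from before and after the upgrade?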
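The pre-upgrade check can be sketched as below. The statements in the comment are the TiDB ones; the rows are hypothetical and cut down to three columns, and the set of "finished" states is an assumption that should be checked against what your TiDB version reports.

```python
# Statements to run in a SQL client before upgrading:
#   ADMIN SHOW DDL;
#   ADMIN SHOW DDL JOBS;

# Assumption: these STATE values mean the job is no longer in progress.
FINISHED_STATES = {"synced", "cancelled", "rollback done"}


def unfinished_ddl_jobs(rows):
    """Return the rows from ADMIN SHOW DDL JOBS whose STATE is not finished."""
    return [row for row in rows if row["STATE"].lower() not in FINISHED_STATES]


# Hypothetical rows, reduced to three of the columns.
rows = [
    {"JOB_ID": 101, "JOB_TYPE": "add index", "STATE": "running"},
    {"JOB_ID": 100, "JOB_TYPE": "create table", "STATE": "synced"},
]
for job in unfinished_ddl_jobs(rows):
    print(job["JOB_ID"], job["JOB_TYPE"], job["STATE"])
```

If this prints anything, wait for those jobs to finish or cancel them before starting the upgrade.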
Tao
(Tao)
June 6, 2023, 08:04
25
I really didn't look at that carefully, so I'm not sure whether that is the problem.
tidb.log.zip (10.6 MB)
Is there a log from the other tidb? It looks like the other tidb may still have been running DDL during the upgrade.
Tao
(Tao)
June 6, 2023, 09:45
27
This is the log from the other tidb
tidb.log.zip (19.3 MB)
CuteRay
(Cherry🍒)
June 7, 2023, 06:30
30
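How about this: can you restore the environment to the state it was in right after the upgrade to v7.1.0? I see that earlier you replaced the bin and tried various other things, and none of it worked. Restore it to the point right after the upgrade to 7.1.0, then stop the cluster, start it once more, and post the logs.
The log information you posted is not the important part. I see this odd message in the log file, and I'd like to see what error was reported when 7.1.0 was first started.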
Tao
(Tao)
June 7, 2023, 06:39
31
These are the startup logs of the two tidb instances
tidb.zip (12.0 KB)
小王同学Plus
(小王同学 Plus)
June 9, 2023, 02:43
33
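Because of all the operations you have performed since, the tidb logs provided above differ from the original ones. Could you provide the logs from around the original upgrade at 2023/06/05 22:57:43? Let's look at those first.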
Tao
(Tao)
June 9, 2023, 02:45
34
These two are the logs I backed up earlier
tidb.log.zip (10.6 MB)
tidb.log.zip (19.3 MB)
小王同学Plus
(小王同学 Plus)
June 9, 2023, 06:46
35
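According to the logs, DDL operations were in progress while the cluster was being upgraded, and the restart during the upgrade interrupted the DDL.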
[2023/06/05 10:38:39.850 +08:00] [INFO] [server.go:511] ["setting tidb-server to report unhealthy (shutting-down)"]
[2023/06/05 10:38:39.850 +08:00] [ERROR] [http_status.go:504] ["start status/rpc server error"] [error="accept tcp [::]:10080: use of closed network connection"]
[2023/06/05 10:38:39.850 +08:00] [ERROR] [http_status.go:499] ["http server error"] [error="http: Server closed"]
[2023/06/05 10:38:39.850 +08:00] [ERROR] [http_status.go:494] ["grpc server error"] [error="mux: server closed"]
[2023/06/05 10:38:39.869 +08:00] [INFO] [manager.go:247] ["failed to campaign"] ["owner info"="[autoid] tidb/autoid/leader ownerManager 10.10.5.28:10080"] [error="context canceled"]
[2023/06/05 10:38:39.869 +08:00] [INFO] [manager.go:219] ["etcd session is done, creates a new one"] ["owner info"="[autoid] tidb/autoid/leader ownerManager 10.10.5.28:10080"]
[2023/06/05 10:38:39.869 +08:00] [INFO] [manager.go:223] ["break campaign loop, NewSession failed"] ["owner info"="[autoid] tidb/autoid/leader ownerManager 10.10.5.28:10080"] [error="context canceled"]
[2023/06/05 10:38:39.872 +08:00] [INFO] [manager.go:272] ["revoke session"] ["owner info"="[autoid] tidb/autoid/leader ownerManager 10.10.5.28:10080"] []
[2023/06/05 10:38:39.872 +08:00] [INFO] [server.go:864] ["[server] graceful shutdown."]
[2023/06/05 10:38:39.874 +08:00] [WARN] [manager.go:261] ["is not the owner"] ["owner info"="[ddl] /tidb/ddl/fg/owner ownerManager dc79478b-608a-42ad-ac72-d41e98687f9e"]
[2023/06/05 10:38:39.874 +08:00] [INFO] [manager.go:219] ["etcd session is done, creates a new one"] ["owner info"="[ddl] /tidb/ddl/fg/owner ownerManager dc79478b-608a-42ad-ac72-d41e98687f9e"]
[2023/06/05 10:38:39.874 +08:00] [INFO] [manager.go:223] ["break campaign loop, NewSession failed"] ["owner info"="[ddl] /tidb/ddl/fg/owner ownerManager dc79478b-608a-42ad-ac72-d41e98687f9e"] [error="context canceled"]
[2023/06/05 10:38:39.881 +08:00] [INFO] [manager.go:272] ["revoke session"] ["owner info"="[ddl] /tidb/ddl/fg/owner ownerManager dc79478b-608a-42ad-ac72-d41e98687f9e"] []
[2023/06/05 10:38:39.883 +08:00] [INFO] [ddl_workerpool.go:82] ["[ddl] closing workerPool"]
[2023/06/05 10:38:39.883 +08:00] [INFO] [ddl_worker.go:165] ["[ddl] DDL worker closed"] [worker="worker 6, tp add index"] ["take time"=3.309µs]
[2023/06/05 10:38:39.883 +08:00] [INFO] [ddl_worker.go:165] ["[ddl] DDL worker closed"] [worker="worker 7, tp add index"] ["take time"=2.138µs]
[2023/06/05 10:38:39.883 +08:00] [INFO] [ddl_worker.go:165] ["[ddl] DDL worker closed"] [worker="worker 8, tp add index"] ["take time"=1.497µs]
[2023/06/05 10:38:39.883 +08:00] [INFO] [ddl_worker.go:165] ["[ddl] DDL worker closed"] [worker="worker 9, tp add index"] ["take time"=1.908µs]
[2023/06/05 10:38:39.883 +08:00] [INFO] [ddl_worker.go:165] ["[ddl] DDL worker closed"] [worker="worker 10, tp add index"] ["take time"=1.419µs]
[2023/06/05 10:38:39.883 +08:00] [INFO] [ddl_worker.go:165] ["[ddl] DDL worker closed"] [worker="worker 11, tp add index"] ["take time"=1.458µs]
[2023/06/05 10:38:39.883 +08:00] [INFO] [ddl_worker.go:165] ["[ddl] DDL worker closed"] [worker="worker 4, tp add index"] ["take time"=1.195µs]
[2023/06/05 10:38:39.883 +08:00] [INFO] [ddl_worker.go:165] ["[ddl] DDL worker closed"] [worker="worker 5, tp add index"] ["take time"=1.894µs]
[2023/06/05 10:38:39.883 +08:00] [INFO] [ddl_workerpool.go:82] ["[ddl] closing workerPool"]
[2023/06/05 10:38:39.883 +08:00] [INFO] [ddl_worker.go:165] ["[ddl] DDL worker closed"] [worker="worker 3, tp general"] ["take time"=1.541µs]
[2023/06/05 10:38:39.883 +08:00] [INFO] [ddl_worker.go:165] ["[ddl] DDL worker closed"] [worker="worker 1, tp general"] ["take time"=312ns]
[2023/06/05 10:38:39.883 +08:00] [INFO] [ddl_worker.go:165] ["[ddl] DDL worker closed"] [worker="worker 2, tp add index"] ["take time"=224ns]
[2023/06/05 10:38:39.883 +08:00] [INFO] [delete_range.go:148] ["[ddl] closing delRange"]
[2023/06/05 10:38:39.883 +08:00] [INFO] [session_pool.go:94] ["[ddl] closing sessionPool"]
[2023/06/05 10:38:39.884 +08:00] [INFO] [ddl.go:815] ["[ddl] DDL closed"] [ID=dc79478b-608a-42ad-ac72-d41e98687f9e] ["take time"=9.936703ms]
[2023/06/05 10:38:39.884 +08:00] [INFO] [ddl.go:639] ["[ddl] stop DDL"] [ID=dc79478b-608a-42ad-ac72-d41e98687f9e]
[2023/06/05 10:38:44.037 +08:00] [INFO] [ddl.go:752] ["[ddl] start DDL"] [ID=896e8af7-d24d-4015-8fd5-e41bea5aaa4b] [runWorker=true]
[2023/06/05 10:38:44.037 +08:00] [INFO] [ddl.go:715] ["[ddl] start delRangeManager OK"] ["is a emulator"=false]
[2023/06/05 10:38:44.041 +08:00] [INFO] [manager.go:178] ["start campaign owner"] [ownerInfo="[ddl] /tidb/ddl/fg/owner"]
[2023/06/05 10:38:44.042 +08:00] [INFO] [job_table.go:324] ["[ddl] get global state and global state change"] [oldState=false] [currState=false]
[2023/06/05 10:38:44.049 +08:00] [INFO] [env.go:90] ["[ddl-ingest] the ingest sorted directory"] ["data path:"=/tmp/tidb/tmp_ddl-3306]
[2023/06/05 10:38:44.050 +08:00] [WARN] [backend_mgr.go:59] ["[ddl-ingest] ingest backfill may not be available"] [error="the available disk space(24019136512) in /tmp/tidb/tmp_ddl-3306 should be greater than @@tidb_ddl_disk_quota(107374182400)"]
[2023/06/05 10:38:44.050 +08:00] [INFO] [env.go:68] ["[ddl-ingest] init global ingest backend environment finished"] ["memory limitation"=2147483648] ["disk usage info"="disk usage: 56484765696/80503902208, backend usage: 0"] ["max open file number"=1000000] ["lightning is initialized"=true]
However, the official documentation explicitly says that running DDL operations during an upgrade is not recommended.
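For anyone who wants to pull the same lines out of a large tidb.log, here is a rough sketch. The regular expressions are written against the log lines quoted above and are an assumption about the format, not an official parser; the function name is made up.

```python
import re

# [time] [LEVEL] [file:line] ["message"] ...
LOG_LINE = re.compile(
    r'^\[(?P<time>[^\]]+)\] \[(?P<level>[A-Z]+)\] \[(?P<source>[^\]]+)\] '
    r'\["(?P<message>(?:[^"\\]|\\.)*)"\]'
)
WORKER = re.compile(r'\[worker="(?P<worker>[^"]*)"\]')


def ddl_shutdown_events(lines):
    """Yield (time, message, worker) for shutdown and DDL-close log lines."""
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue
        message = m.group("message")
        if "shutting-down" in message or message.startswith("[ddl] DDL"):
            w = WORKER.search(line)
            yield m.group("time"), message, (w.group("worker") if w else None)


# Four lines copied from the log excerpt in this post.
sample = [
    '[2023/06/05 10:38:39.850 +08:00] [INFO] [server.go:511] ["setting tidb-server to report unhealthy (shutting-down)"]',
    '[2023/06/05 10:38:39.883 +08:00] [INFO] [ddl_worker.go:165] ["[ddl] DDL worker closed"] [worker="worker 1, tp general"] ["take time"=312ns]',
    '[2023/06/05 10:38:39.883 +08:00] [INFO] [ddl_worker.go:165] ["[ddl] DDL worker closed"] [worker="worker 2, tp add index"] ["take time"=224ns]',
    '[2023/06/05 10:38:44.041 +08:00] [INFO] [manager.go:178] ["start campaign owner"] [ownerInfo="[ddl] /tidb/ddl/fg/owner"]',
]
for event in ddl_shutdown_events(sample):
    print(event)
```

It only shows when the server shut down and which DDL workers were closed; whether a DDL job was actually running at that moment still has to be confirmed from the job history.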
Tao
(Tao)
June 9, 2023, 06:49
37
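A few days before the upgrade there was a DDL that added an index. The table only held a few thousand rows, yet the statement had not finished after several minutes, so I tried to kill the connection. That didn't kill it either, which is why I went ahead and tried the upgrade.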
小王同学Plus
(小王同学 Plus)
June 9, 2023, 07:11
38
A DDL has to be stopped with admin cancel...
I haven't run into this problem myself either... Next time, remember to run the checks before upgrading.
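For reference, the admin cancel mentioned here is the `ADMIN CANCEL DDL JOBS` statement, which takes job IDs as shown by `ADMIN SHOW DDL JOBS`. A small sketch that builds the statement; the helper name and the job ID 101 are hypothetical.

```python
def cancel_ddl_statement(job_ids):
    """Build an ADMIN CANCEL DDL JOBS statement for the given job IDs."""
    ids = [int(job_id) for job_id in job_ids]  # rejects non-numeric input
    if not ids:
        raise ValueError("at least one DDL job ID is required")
    return "ADMIN CANCEL DDL JOBS " + ", ".join(str(i) for i in ids) + ";"


# 101 is a hypothetical JOB_ID taken from ADMIN SHOW DDL JOBS output.
print(cancel_ddl_statement([101]))
```

Killing the client connection does not stop the job, so a stuck add-index job is cancelled by its job ID instead.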