TiDB server fails to start after upgrading TiDB from 7.1 to 7.5

【TiDB environment】Production
【TiDB version】
【Reproduction steps】What operations led to the problem
【Problem: symptoms and impact】


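【Resource configuration】

【Attachments: screenshots/logs/monitoring】
I followed this document and created the two tables, but it still does not work.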

Can you describe from the beginning what you did and what problem you ran into?

tiup cluster upgrade tidb-test v7.5.0
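This was a rolling upgrade of TiDB. It ran into a problem when it reached the TiDB server and was interrupted, and the TiDB server will not start either.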

If a table is missing, please post the TiDB log.

Please post the log of the failed TiDB instance, from tidb-3306/log.

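@pyuh-北京 Check whether the other TiDB nodes are reporting errors. Use admin show ddl jobs to see which TiDB node is the owner, then check whether the owner is reporting errors.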

Filtered for ERROR:
puacct.usage_sys: open /sys/fs/cgroup/cpu,cpuacct/system.slice/tidb-3306.service/cpuacct.usage_sys: no such file or directory"]
[2023/12/26 02:42:46.254 +08:00] [ERROR] [cpu.go:65] [GetCgroupCPU] [error="error when reading cpu system time from cgroup v1 at /sys/fs/cgroup/cpu,cpuacct/system.slice/tidb-3306.service/cpuacct.usage_sys: open /sys/fs/cgroup/cpu,cpuacct/system.slice/tidb-3306.service/cpuacct.usage_sys: no such file or directory"]
[2023/12/26 02:44:04.261 +08:00] [ERROR] [tso_dispatcher.go:493] ["[tso] getTS error"] [dc-location=global] [stream-addr=http://10.1.148.248:2379] [error="[PD:client:ErrClientGetTSO]get TSO failed, after processing requests"]
[2023/12/26 02:44:04.261 +08:00] [ERROR] [pd.go:236] ["updateTS error"] [txnScope=global] [error="rpc error: code = Unknown desc = [PD:tso:ErrGenerateTimestamp]generate timestamp failed, requested pd is not leader of cluster"]
[2023/12/26 02:44:04.364 +08:00] [ERROR] [pd_service_discovery.go:257] ["[pd] failed to update member"] [urls="[http://10.1.148.245:2379,http://10.1.148.246:2379,http://10.1.148.248:2379]"] [error="[PD:client:ErrClientGetMember]get member failed"]
[2023/12/26 02:44:04.364 +08:00] [ERROR] [tso_dispatcher.go:493] ["[tso] getTS error"] [dc-location=global] [stream-addr=http://10.1.148.248:2379] [error="[PD:client:ErrClientGetTSO]get TSO failed, after processing requests"]
[2023/12/26 02:44:04.565 +08:00] [ERROR] [pd_service_discovery.go:257] ["[pd] failed to update member"] [urls="[http://10.1.148.245:2379,http://10.1.148.246:2379,http://10.1.148.248:2379]"] [error="[PD:client:ErrClientGetMember]get member failed"]
[2023/12/26 02:44:04.566 +08:00] [ERROR] [tso_dispatcher.go:493] ["[tso] getTS error"] [dc-location=global] [stream-addr=http://10.1.148.248:2379] [error="[PD:client:ErrClientGetTSO]get TSO failed, after processing requests"]
[2023/12/26 02:44:04.967 +08:00] [ERROR] [pd_service_discovery.go:257] ["[pd] failed to update member"] [urls="[http://10.1.148.245:2379,http://10.1.148.246:2379,http://10.1.148.248:2379]"] [error="[PD:client:ErrClientGetMember]get member failed"]
[2023/12/26 02:44:04.967 +08:00] [ERROR] [tso_dispatcher.go:493] ["[tso] getTS error"] [dc-location=global] [stream-addr=http://10.1.148.248:2379] [error="[PD:client:ErrClientGetTSO]get TSO failed, after processing requests"]
[2023/12/26 02:44:05.769 +08:00] [ERROR] [pd_service_discovery.go:257] ["[pd] failed to update member"] [urls="[http://10.1.148.245:2379,http://10.1.148.246:2379,http://10.1.148.248:2379]"] [error="[PD:client:ErrClientGetMember]get member failed"]
[2023/12/26 02:44:05.769 +08:00] [ERROR] [tso_dispatcher.go:493] ["[tso] getTS error"] [dc-location=global] [stream-addr=http://10.1.148.248:2379] [error="[PD:client:ErrClientGetTSO]get TSO failed, after processing requests"]
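
With a filtered log like the one above, it can help to count ERROR lines per source location so that repeated PD errors do not bury a one-off error. This is only a minimal sketch: the function name summarize_errors and the LOG_FILE path are assumptions for illustration, not part of TiDB or tiup.

```shell
# Count ERROR lines per source location (e.g. "tso_dispatcher.go:493").
# summarize_errors is a hypothetical helper; it takes the log path as $1.
summarize_errors() {
  grep -F '[ERROR]' "$1" \
    | awk -F'[][]' '{ print $6 }' \
    | sort | uniq -c | sort -rn
}

# LOG_FILE is an assumed path; point it at the real tidb.log.
LOG_FILE="${LOG_FILE:-tidb.log}"
if [ -f "$LOG_FILE" ]; then
  summarize_errors "$LOG_FILE"
fi
```

The awk step splits each line on square brackets, so the sixth field is the file:line entry that follows the [ERROR] level.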


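Running admin show ddl jobs; shows that DDL jobs keep getting cancelled.
Running admin show ddl; identifies the DDL owner node, and that TiDB owner node is reporting errors.
I suggest creating the directory manually with mkdir and then restarting.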
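
The two checks above can be run from a shell with the mysql client against a TiDB node that is still up. This is a rough sketch under assumptions: the host, user, and the helper names run_sql and check_ddl are hypothetical, and port 3306 is inferred from the tidb-3306 instance name in this thread.

```shell
# Hypothetical connection settings; adjust to a TiDB node that is still running.
TIDB_HOST="${TIDB_HOST:-127.0.0.1}"
TIDB_PORT="${TIDB_PORT:-3306}"
TIDB_USER="${TIDB_USER:-root}"

# Run one SQL statement through the mysql client.
# Add -p (or use a client config file) if the account has a password.
run_sql() {
  mysql -h "$TIDB_HOST" -P "$TIDB_PORT" -u "$TIDB_USER" -e "$1"
}

check_ddl() {
  # OWNER_ID and OWNER_ADDRESS in the result identify the DDL owner node.
  run_sql "ADMIN SHOW DDL;"
  # Recent DDL jobs; the STATE column shows which ones were cancelled.
  run_sql "ADMIN SHOW DDL JOBS;"
}
```

Calling check_ddl prints both result sets; the owner address from the first one tells you which node's log to read.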

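The root cause is that the upgrade needs to execute several DDL statements. A directory on the DDL owner node did not exist, so the DDL owner could not create the directory it needed and the DDL statements could not complete, which made the upgrade fail.
Fixing the problem on the DDL owner node so that the DDL statements run successfully allows the upgrade to complete.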


Could you keep one 7.5 instance running for now, and re-upgrade the remaining three 7.1 instances by scaling them in and then out?

Where did your log screenshot come from? Is the disk out of space?

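Missing permissions on /tmp/tidb/ is a long-standing problem. You need to create the directory by hand and give the tidb user read and write permissions on it.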
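
A minimal sketch of that manual fix, to be run on the DDL owner host as root or via sudo. The helper name ensure_dir_for_user is hypothetical, and the assumption that the TiDB process runs as the OS user tidb comes from the reply above rather than from any checked configuration.

```shell
# Create a directory if it is missing and hand it to the given OS user.
# ensure_dir_for_user is a hypothetical helper: $1 = directory, $2 = user.
ensure_dir_for_user() {
  dir="$1"
  user="$2"
  mkdir -p "$dir" \
    && chown -R "$user" "$dir" \
    && chmod -R u+rwX "$dir"
}

# Example for the case in this thread (assumed values):
#   ensure_dir_for_user /tmp/tidb tidb
# Then, from the tiup control machine, start the TiDB instances again:
#   tiup cluster start tidb-test -R tidb
```

Once the TiDB servers are up and the pending DDL jobs finish, the upgrade command from earlier in the thread can be retried.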

It looks like there is a problem with the connection to PD.
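
One way to test that suspicion is to ask each PD member which node it considers the leader, using PD's HTTP API. This is a sketch under assumptions: the endpoints are copied from the urls field in the log earlier in the thread, the helper name show_pd_leader is hypothetical, and the /pd/api/v1/leader path should be checked against the PD version in use.

```shell
# PD endpoints taken from the urls=[...] field in the log above.
PD_ENDPOINTS="http://10.1.148.245:2379 http://10.1.148.246:2379 http://10.1.148.248:2379"

# Ask each PD member for its view of the current leader.
show_pd_leader() {
  for ep in $PD_ENDPOINTS; do
    echo "== $ep"
    curl -s --max-time 5 "$ep/pd/api/v1/leader" || echo "(no response)"
    echo
  done
}
```

If the members disagree, or none of them answers, that points at PD rather than at TiDB; if they all agree, the "requested pd is not leader" errors were probably a transient leader change during the rolling upgrade.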