After running ANALYZE TABLE, tidb-server hit OOM and can no longer start

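[TiDB environment] Production
[TiDB version] The latest version: v1.16.1
Local installed version: v1.14.1
Cluster version: v6.5.0
[Reproduction path] In a cluster environment, I manually ran analyze table xxxx;, which caused the tidb-server executing it to OOM. Running it on another TiDB machine caused an OOM there as well. The table has 66,886,675 rows and is partitioned by day from 2024 through 2028, several hundred partitions in total, but the partitions after December 2024 contain no data.
[Problem: symptoms and impact] Newly added servers now cannot join the cluster, and the old tidb-server instances fail to start as well.
[Resource configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachments: screenshots/logs/monitoring]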
tidb_stderr.log (1.9 MB)



tidb.log (10.0 MB)

How much free memory is left when tidb-server is not running?

48 GB in total, 44 GB free.

Try limiting the memory and see if that helps:
https://docs.pingcap.com/zh/tidb/v6.5/statistics#统计信息收集的内存限制
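
A minimal sketch of the memory-limit suggestion above. It assumes the variable described in the linked section is `tidb_mem_quota_analyze` (value in bytes) and that at least one tidb-server still accepts connections. The 4 GiB quota and the host in the usage line are placeholders, not values recommended in this thread.

```shell
#!/bin/sh
# Print the SQL that caps memory used by statistics collection.
# Assumption: tidb_mem_quota_analyze is the variable the linked doc section
# describes; the value is in bytes, and -1 means no limit.
analyze_mem_quota_sql() {
  gib="${1:-4}"                              # quota in GiB (placeholder default)
  bytes=$((gib * 1024 * 1024 * 1024))
  printf 'SET GLOBAL tidb_mem_quota_analyze = %s;\n' "$bytes"
  printf "SHOW VARIABLES LIKE 'tidb_mem_quota_analyze';\n"
}

analyze_mem_quota_sql 4
# Usage (hypothetical host/user):
#   analyze_mem_quota_sql 4 | mysql -h 127.0.0.1 -P 4000 -u root -p
```

The function only prints SQL, so the statements can be reviewed before they are piped into a client.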

You can also follow this guide to troubleshoot:
https://docs.pingcap.com/zh/tidb/v6.5/troubleshoot-tidb-oom#收集和加载统计信息的过程中消耗太多内存

The startup failed with an error. Check the error entries highlighted in red in the log to see what specifically caused it.

The log files have been uploaded. Please take a look, thank you very much.

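This value is set to the default of -1, which means no limit. What should I do now? I have uploaded the tidb startup log; could you take another look? Thanks a lot.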


First log in to a tidb-server that is still healthy and kill the ANALYZE statement, so that the other nodes do not OOM as well.

You can also limit the concurrency of ANALYZE TABLE as described at the link below:
https://docs.pingcap.com/zh/tidb/stable/statistics#控制-analyze-并发度
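
A rough sketch of lowering ANALYZE concurrency for one session. It assumes `tidb_build_stats_concurrency` and `tidb_distsql_scan_concurrency` are the relevant variables from the linked section; the value 1 and the table name are placeholders to adjust.

```shell
#!/bin/sh
# Print session-level settings followed by the ANALYZE statement, so that the
# lowered concurrency applies only to the session that runs the ANALYZE.
# Assumption: these two variables are among those covered by the linked section.
analyze_low_concurrency_sql() {
  table="$1"                                  # e.g. mydb.mytable (placeholder)
  printf 'SET SESSION tidb_build_stats_concurrency = 1;\n'
  printf 'SET SESSION tidb_distsql_scan_concurrency = 1;\n'
  printf 'ANALYZE TABLE %s;\n' "$table"
}

analyze_low_concurrency_sql mydb.mytable
# Usage (hypothetical host/user):
#   analyze_low_concurrency_sql mydb.mytable | mysql -h 127.0.0.1 -P 4000 -u root -p
```

All three statements must run in the same connection, which is why they are emitted together.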

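Also, judging from the error log, the failure is caused by loading partition statistics. You could try dropping the statistics of the partitioned table that ANALYZE TABLE was running on, and see whether the server then starts normally: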
https://docs.pingcap.com/zh/tidb/stable/sql-statement-drop-stats
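
A minimal sketch of the DROP STATS suggestion, assuming a reachable tidb-server. The database and table names are placeholders; the `SHOW STATS_META` lines are only there to compare the statistics metadata before and after.

```shell
#!/bin/sh
# Print SQL that inspects, drops, and re-inspects statistics for one table.
# The db/table arguments are placeholders for the affected partitioned table.
drop_stats_sql() {
  db="$1"
  tbl="$2"
  printf "SHOW STATS_META WHERE db_name = '%s' AND table_name = '%s';\n" "$db" "$tbl"
  printf 'DROP STATS %s.%s;\n' "$db" "$tbl"
  printf "SHOW STATS_META WHERE db_name = '%s' AND table_name = '%s';\n" "$db" "$tbl"
}

drop_stats_sql mydb mytable
# Usage (hypothetical host/user):
#   drop_stats_sql mydb mytable | mysql -h 127.0.0.1 -P 4000 -u root -p
```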


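On older versions, if KILL does not work, you can find the owner node and run the kill there, or restart it.
1. Find the DDL owner node:
curl http://{TiDBIP}:10080/info/all returns the current owner of the cluster.
2. If no owner exists, try triggering an owner election manually:
curl -X POST http://{TiDBIP}:10080/ddl/owner/resign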
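
The two steps above could be scripted roughly as follows. This sketch assumes the `/info/all` response is JSON with a top-level `owner_id` string field; the sample payload is hypothetical and much shorter than a real response.

```shell
#!/bin/sh
# Read the /info/all JSON on stdin and print the owner_id value.
# Prints an empty line if owner_id is an empty string, nothing if it is absent.
# Assumption: the response carries a top-level "owner_id" string field.
owner_id_from_info() {
  tr -d '\n' | sed -n 's/.*"owner_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Hypothetical sample response for illustration only.
sample='{"servers_num": 2, "owner_id": "29a65ec0-d931", "all_servers_info": {}}'
printf '%s\n' "$sample" | owner_id_from_info

# Usage against a live node (TIDB_IP is a placeholder):
#   owner=$(curl -s "http://${TIDB_IP}:10080/info/all" | owner_id_from_info)
#   [ -n "$owner" ] || curl -X POST "http://${TIDB_IP}:10080/ddl/owner/resign"
```

A JSON-aware tool would be more robust than `sed`; the text-based extraction is used here only to avoid extra dependencies.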


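Also set the auto analyze time window so that it only runs during the early-morning off-peak hours. That way no ANALYZE TABLE will be triggered automatically while you are still working on the problem.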
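
A sketch of restricting the auto analyze window, assuming `tidb_auto_analyze_start_time` and `tidb_auto_analyze_end_time` are the variables meant here. The 01:00 to 05:00 window and the +0800 offset are placeholders for whatever counts as off-peak locally.

```shell
#!/bin/sh
# Print SQL that confines auto analyze to an off-peak window.
# Assumption: these two global variables control the window; the default
# times below are illustrative, not taken from this thread.
auto_analyze_window_sql() {
  start="${1:-01:00 +0800}"
  end="${2:-05:00 +0800}"
  printf "SET GLOBAL tidb_auto_analyze_start_time = '%s';\n" "$start"
  printf "SET GLOBAL tidb_auto_analyze_end_time = '%s';\n" "$end"
}

auto_analyze_window_sql
# Usage (hypothetical host/user):
#   auto_analyze_window_sql '02:00 +0800' '04:00 +0800' | mysql -h 127.0.0.1 -P 4000 -u root -p
```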

Judging from the stack trace, this is a known bug: tidb can not restart after create global binding · Issue #40368 · pingcap/tidb · GitHub

First check whether there are any bindings on partitioned tables.
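
One possible way to run that check, assuming a tidb-server that still accepts connections: list the global bindings, list the partitioned tables, and compare the two by eye. The matching step is manual in this sketch.

```shell
#!/bin/sh
# Print SQL that lists global bindings and all partitioned tables, so bindings
# that reference a partitioned table can be spotted by comparing the outputs.
binding_check_sql() {
  cat <<'SQL'
SHOW GLOBAL BINDINGS;
SELECT DISTINCT table_schema, table_name
FROM information_schema.partitions
WHERE partition_name IS NOT NULL;
SQL
}

binding_check_sql
# Usage (hypothetical host/user):
#   binding_check_sql | mysql -h 127.0.0.1 -P 4000 -u root -p
# A suspect binding could then be removed with DROP GLOBAL BINDING FOR <original statement>.
```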

goroutine 1 [running]:
github.com/pingcap/tidb/executor.(*Compiler).Compile.func1()
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/executor/compiler.go:72 +0x445
panic({0x4318e40, 0x6ec6870})
/usr/local/go/src/runtime/panic.go:884 +0x212
github.com/pingcap/tidb/statistics/handle.(*Handle).GetPartitionStats(0xc00011b400?, 0x4f8da00?, 0x4f72898?, {0x0?, 0xc001cdab70?, 0x18?})
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/statistics/handle/handle.go:997 +0x2e
github.com/pingcap/tidb/statistics/handle.(*Handle).GetTableStats(...)
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/statistics/handle/handle.go:992
github.com/pingcap/tidb/planner/core.(*PlanBuilder).buildDataSource(0xc0072ff1e0, {0x4fafbb0, 0xc0062bbdd0}, 0xc0079f1ad0, 0xc008935d30)
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/planner/core/logical_plan_builder.go:4456 +0x9ce
github.com/pingcap/tidb/planner/core.(*PlanBuilder).buildResultSetNode(0xc0072ff1e0, {0x4fafbb0?, 0xc0062bbdd0?}, {0x4fc96b0?, 0xc008935ce0?}, 0x0?)
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/planner/core/logical_plan_builder.go:380 +0x19d
github.com/pingcap/tidb/planner/core.(*PlanBuilder).buildJoin(0xc00765af50?, {0x4fafbb0?, 0xc0062bbdd0?}, 0x0?)
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/planner/core/logical_plan_builder.go:720 +0x71d
github.com/pingcap/tidb/planner/core.(*PlanBuilder).buildResultSetNode(0x0?, {0x4fafbb0?, 0xc0062bbdd0?}, {0x4fc8948?, 0xc002798870?}, 0x0?)
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/planner/core/logical_plan_builder.go:367 +0x271
github.com/pingcap/tidb/planner/core.(*PlanBuilder).buildTableRefs(0xc0072ff1e0?, {0x4fafbb0?, 0xc0062bbdd0?}, 0x393cf94?)
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/planner/core/logical_plan_builder.go:359 +0x85
github.com/pingcap/tidb/planner/core.(*PlanBuilder).buildSelect(0xc0072ff1e0, {0x4fafbb0, 0xc0062bbdd0}, 0xc00747d0e0)
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/planner/core/logical_plan_builder.go:3916 +0x6c7
github.com/pingcap/tidb/planner/core.(*PlanBuilder).Build(0xc0072ff1e0, {0x4fafbb0, 0xc0062bbdd0}, {0x4fc4080?, 0xc00747d0e0?})
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/planner/core/planbuilder.go:804 +0x745
github.com/pingcap/tidb/planner.buildLogicalPlan({0x4fafbb0, 0xc0062bbdd0}, {0x501e818?, 0xc00011b400}, {0x4fc4080, 0xc00747d0e0}, 0xc0072ff1e0)
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/planner/optimize.go:461 +0x12f
github.com/pingcap/tidb/planner.optimize({0x4fafbb0, 0xc0062bbdd0}, {0x501e818?, 0xc00011b400}, {0x4fc4080?, 0xc00747d0e0?}, {0x4fe5b50, 0xc0062bbe60})
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/planner/optimize.go:382 +0x473
github.com/pingcap/tidb/planner.Optimize({0x4fafbb0, 0xc0062bbdd0}, {0x501e818, 0xc00011b400}, {0x4fc4080, 0xc00747d0e0}, {0x4fe5b50, 0xc0062bbe60})
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/planner/optimize.go:245 +0xf11
github.com/pingcap/tidb/planner/core.(*PlanBuilder).buildExplain(0xc0072ff040, {0x4fafbb0, 0xc0062bbdd0}, 0xc008935dc0)
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/planner/core/planbuilder.go:4783 +0xd9
github.com/pingcap/tidb/planner/core.(*PlanBuilder).Build(0xc0072ff040, {0x4fafbb0, 0xc0062bbdd0}, {0x4fc2c80?, 0xc008935dc0?})
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/planner/core/planbuilder.go:779 +0x432
github.com/pingcap/tidb/planner.buildLogicalPlan({0x4fafbb0, 0xc0062bbdd0}, {0x501e818?, 0xc00011b400}, {0x4fc2c80, 0xc008935dc0}, 0xc0072ff040)
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/planner/optimize.go:461 +0x12f
github.com/pingcap/tidb/planner.optimize({0x4fafbb0, 0xc0062bbdd0}, {0x501e818?, 0xc00011b400}, {0x4fc2c80?, 0xc008935dc0?}, {0x4fe5b50, 0xc0062bbe60})
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/planner/optimize.go:382 +0x473
github.com/pingcap/tidb/planner.Optimize({0x4fafbb0, 0xc0062bbdd0}, {0x501e818, 0xc00011b400}, {0x4fc2c80, 0xc008935dc0}, {0x4fe5b50, 0xc0062bbe60})
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/planner/optimize.go:245 +0xf11
github.com/pingcap/tidb/executor.(*Compiler).Compile(0xc00765cfc8, {0x4fafbb0, 0xc0062bbdd0}, {0x4fc8580, 0xc008935dc0?})
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/executor/compiler.go:116 +0x6f8
github.com/pingcap/tidb/session.(*session).ExecuteStmt(0xc00011b400, {0x4fafbb0, 0xc0062bbdd0}, {0x4fc8580?, 0xc008935dc0})
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/session/session.go:2171 +0x54e
github.com/pingcap/tidb/session.(*session).ExecuteInternal(0xc00011b400, {0x4fafbb0, 0xc0062bbdd0}, {0xc0004e5100, 0x614}, {0x0, 0x0, 0x0})
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/session/session.go:1674 +0x3f2
github.com/pingcap/tidb/bindinfo.getHintsForSQL({0x501e818, 0xc00011b400}, {0xc000495480, 0x5fe})
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/bindinfo/handle.go:951 +0x177
github.com/pingcap/tidb/bindinfo.(*BindRecord).prepareHints(0xc007411100, {0x501e818, 0xc00011b400})
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/bindinfo/bind_record.go:178 +0x1e7
github.com/pingcap/tidb/bindinfo.(*BindHandle).newBindRecord(0xc007303b00, {0xc0099e0140?, 0x40?})
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/bindinfo/handle.go:723 +0xbcf
github.com/pingcap/tidb/bindinfo.(*BindHandle).Update(0xc007303b00, 0x1)
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/bindinfo/handle.go:173 +0x6c5
github.com/pingcap/tidb/domain.(*Domain).LoadBindInfoLoop(0xc000cdec00, {0x501e818, 0xc00011b400}, {0x501e818, 0xc000123b80})
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/domain/domain.go:1444 +0xe5
github.com/pingcap/tidb/session.BootstrapSession({0x4fd95f0, 0xc000d1a960})
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/session/session.go:3301 +0x648
main.createStoreAndDomain()
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/tidb-server/main.go:314 +0x1cb
main.main()
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/tidb-server/main.go:214 +0x2ca

This has been fixed in v6.5.1.


Could you try this: run SHOW PROCESSLIST; to find the SQL that is executing, then run KILL TIDB xxx; to stop the query first.
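
A sketch of that find-then-kill flow. It assumes `information_schema.cluster_processlist` is available, so sessions on every tidb-server are visible rather than only the current node; the connection ID passed to the kill helper is whatever the first query returns.

```shell
#!/bin/sh
# Print SQL that finds running ANALYZE statements across all tidb-server nodes.
find_analyze_sql() {
  cat <<'SQL'
SELECT instance, id, time, info
FROM information_schema.cluster_processlist
WHERE LOWER(info) LIKE 'analyze table%';
SQL
}

# Print a KILL TIDB statement for one connection ID; reject non-numeric input.
kill_sql() {
  case "$1" in
    ''|*[!0-9]*) echo "connection id must be numeric" >&2; return 1 ;;
  esac
  printf 'KILL TIDB %s;\n' "$1"
}

find_analyze_sql
kill_sql 42
# Usage (hypothetical host/user): run the first query, note the id and instance,
# then connect to that instance and execute the output of: kill_sql <id>
```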

Also, since you have monitoring in place, you can check Top SQL as well and stop the SQL statements with high resource usage.

Have you ever added any optimizer bindings? v6.5.0 has a bug where bindings cannot be read after a restart, which causes a nil pointer dereference.

Try upgrading to 6.5.11 (released 2024-09-20) first.

From the log it looks like memory was not allocated, which feels like a bug in the system. Try upgrading to a newer version.

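The panic: runtime error: invalid memory address or nil pointer dereference in the log shows that TiDB hit a nil pointer dereference during execution, which may be caused by an internal bug or a resource race. You could try starting the tidb-4000.service service via systemctl.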