TiDB server OOM problem

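【TiDB environment】Production
【TiDB version】v7.5.4
【Operating system】UOS Server 20
【Deployment】Cloud (which cloud) / physical machines (machine specs, disk type)
【Cluster data size】
【Number of cluster nodes】
【Steps to reproduce】What operations led to the problem
【Problem: symptoms and impact】One tidb server node hit OOM and then restarted
【Copy and paste the ERROR log】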
Aug 29 09:45:05 cxbdzxcxdb-8-64 systemd[1]: Started Session 924751 of user shsnc.
Aug 29 09:45:05 cxbdzxcxdb-8-64 systemd-logind[6576]: Session 924751 logged out. Waiting for processes to exit.
Aug 29 09:45:05 cxbdzxcxdb-8-64 systemd[1]: session-924751.scope: Succeeded.
Aug 29 09:45:05 cxbdzxcxdb-8-64 systemd-logind[6576]: Removed session 924751.
Aug 29 09:45:26 cxbdzxcxdb-8-64 kernel: tidb-server invoked oom-killer: gfp_mask=0x6280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, oom_score_adj=0
Aug 29 09:45:26 cxbdzxcxdb-8-64 kernel: CPU: 123 PID: 186383 Comm: tidb-server Kdump: loaded Not tainted 4.19.90-2305.1.0.0199.78.uel20.x86_64 #1
Aug 29 09:45:26 cxbdzxcxdb-8-64 kernel: Hardware name: Suma R6240H0/62DB32, BIOS CXYH051039 06/24/2024
Aug 29 09:45:26 cxbdzxcxdb-8-64 kernel: Call Trace:
Aug 29 09:45:26 cxbdzxcxdb-8-64 kernel: dump_stack+0x66/0x8b
Aug 29 09:45:26 cxbdzxcxdb-8-64 kernel: dump_header+0x4a/0x1ec
Aug 29 09:45:26 cxbdzxcxdb-8-64 kernel: oom_kill_process+0x24f/0x270
Aug 29 09:45:26 cxbdzxcxdb-8-64 kernel: ? oom_badness+0x25/0x140
Aug 29 09:45:26 cxbdzxcxdb-8-64 kernel: out_of_memory+0x141/0x570
Aug 29 09:45:26 cxbdzxcxdb-8-64 kernel: __alloc_pages_slowpath+0x9f5/0xde0
Aug 29 09:45:26 cxbdzxcxdb-8-64 kernel: __alloc_pages_nodemask+0x276/0x2b0
Aug 29 09:45:26 cxbdzxcxdb-8-64 kernel: alloc_pages_vma+0x7c/0x1f0
Aug 29 09:45:26 cxbdzxcxdb-8-64 kernel: do_anonymous_page+0x91/0x3a0
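To see how often this happens on a node, the syslog can be scanned for the "invoked oom-killer" line. The following is a minimal sketch, not an official tool: the regex and the `find_oom_events` helper are my own, written against the line format shown above. Note that this line names the task whose allocation triggered the OOM killer; the actual victim is reported in a later "Killed process" line, which is not included in the excerpt here.

```python
import re

# Matches the syslog line the kernel writes when the OOM killer is triggered,
# e.g. "Aug 29 09:45:26 host kernel: tidb-server invoked oom-killer: ..."
OOM_RE = re.compile(
    r"^(?P<time>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\skernel:\s"
    r"(?P<comm>\S+) invoked oom-killer:.*oom_score_adj=(?P<adj>-?\d+)"
)


def find_oom_events(lines):
    """Return one dict per 'invoked oom-killer' line found in `lines`."""
    events = []
    for line in lines:
        m = OOM_RE.match(line)
        if m:
            events.append(m.groupdict())
    return events


# Two lines taken from the log in this post.
sample = [
    "Aug 29 09:45:05 cxbdzxcxdb-8-64 systemd[1]: session-924751.scope: Succeeded.",
    "Aug 29 09:45:26 cxbdzxcxdb-8-64 kernel: tidb-server invoked oom-killer: "
    "gfp_mask=0x6280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, oom_score_adj=0",
]
events = find_oom_events(sample)
print(events)
```

In practice `lines` would come from the system log file or from `journalctl -k` output saved to a file.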

Is there any effective way to avoid OOM?

[2025/08/29 09:45:52.486 +08:00] [INFO] [meminfo.go:196] ["use physical memory hook"] [cgroupMemorySize=9223372036854771712] [physicalMemorySize=540381315072]
[2025/08/29 09:45:52.487 +08:00] [INFO] [cpuprofile.go:113] ["parallel cpu profiler started"]
[2025/08/29 09:45:52.502 +08:00] [INFO] [pd_service_discovery.go:631] ["[pd] switch leader"] [new-leader=http://xxxxxx.xx :2379] [old-leader=]
[2025/08/29 09:45:52.502 +08:00] [INFO] [pd_service_discovery.go:197] ["[pd] init cluster id"] [cluster-id=7373980106657405475]
[2025/08/29 09:45:52.502 +08:00] [INFO] [client.go:600] ["[pd] changing service mode"] [old-mode=UNKNOWN_SVC_MODE] [new-mode=PD_SVC_MODE]
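The first startup line already says something about the node's memory. Here is a small sketch that reads it; the `parse_meminfo_line` helper is my own. It relies on the fact that 9223372036854771712 is 2**63 - 4096, the value cgroup v1 typically reports when no memory limit is set, so the process is presumably bounded by physical memory only.

```python
import re

# Value cgroup v1 reports for memory.limit_in_bytes when no limit is set
# on a 64-bit kernel with 4 KiB pages (2**63 rounded down to a page boundary).
CGROUP_V1_UNLIMITED = 2**63 - 4096


def parse_meminfo_line(line):
    """Extract the two sizes (in bytes) from a 'use physical memory hook' line."""
    fields = dict(re.findall(r"\[(\w+)=(\d+)\]", line))
    return int(fields["cgroupMemorySize"]), int(fields["physicalMemorySize"])


# The startup line from the log in this thread.
line = ('[2025/08/29 09:45:52.486 +08:00] [INFO] [meminfo.go:196] '
        '["use physical memory hook"] [cgroupMemorySize=9223372036854771712] '
        '[physicalMemorySize=540381315072]')
cgroup_bytes, physical_bytes = parse_meminfo_line(line)
cgroup_limited = cgroup_bytes < CGROUP_V1_UNLIMITED
physical_gib = physical_bytes / 2**30
print(f"cgroup limit set: {cgroup_limited}")
print(f"physical memory: {physical_gib:.1f} GiB")
```

For this log it reports no cgroup limit and roughly 503 GiB of physical memory, so the node itself is not small; the question is what consumed that memory.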

OOM should be rare. Take a look at the memory configuration described here: https://docs.pingcap.com/zh/tidb/stable/configure-memory-usage/

Take a look at this as well:
https://docs.pingcap.com/zh/tidb/stable/troubleshoot-tidb-oom/#tidb-oom-故障排查

Is the memory on the tidb server node too small, or are other services also running on that node? You can cap it with the parameter that limits total memory usage.
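As a rough illustration of such a cap, the sketch below builds a `SET GLOBAL tidb_server_memory_limit` statement that leaves room for other services on the node. The assumption that `tidb_server_memory_limit` is the parameter meant here is mine, the `memory_limit_statement` helper and the 64 GiB reserve are hypothetical, and the accepted value formats should be checked against the documentation for your version.

```python
def memory_limit_statement(physical_bytes, reserved_bytes):
    """Build a SET GLOBAL statement that leaves `reserved_bytes` for other services."""
    limit = physical_bytes - reserved_bytes
    if limit <= 0:
        raise ValueError("reserved memory must be smaller than physical memory")
    return f"SET GLOBAL tidb_server_memory_limit = '{limit}';"


GIB = 2**30
# physicalMemorySize as logged by tidb-server in this thread; the 64 GiB
# reserve for co-located services is a hypothetical figure for illustration.
stmt = memory_limit_statement(540381315072, 64 * GIB)
print(stmt)
```

The limit applies to each tidb-server instance, so if several instances share one machine, the reserve has to account for all of them.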

Did the session exit?

Were there any large queries, large writes, data imports or exports, or something similar?
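One way to look for such statements is to rank slow-log entries by their `Mem_max` field. This is a minimal sketch with a helper of my own (`top_memory_statements`), assuming the usual slow-log layout of `# Key: value` lines followed by the SQL text on one line; the sample entries are made up for illustration. A statement that was still running when the kernel killed the process may never have been written to the slow log, so this only shows heavy statements that finished earlier.

```python
def top_memory_statements(slow_log_text, n=3):
    """Return the n (mem_max_bytes, sql) pairs with the highest Mem_max."""
    entries = []
    mem_max = None
    for line in slow_log_text.splitlines():
        if line.startswith("# Time:"):
            mem_max = None  # a new slow-log entry begins
        elif line.startswith("# Mem_max:"):
            mem_max = int(line.split(":", 1)[1])
        elif line.lower().startswith("use "):
            continue  # database switch line, not the statement itself
        elif line and not line.startswith("#") and mem_max is not None:
            entries.append((mem_max, line.strip()))
            mem_max = None
    return sorted(entries, reverse=True)[:n]


# Made-up slow-log excerpt, for illustration only.
sample_log = """\
# Time: 2025-08-29T09:40:01.000000+08:00
# Query_time: 12.5
# Mem_max: 8589934592
SELECT * FROM orders o JOIN items i ON o.id = i.order_id;
# Time: 2025-08-29T09:41:10.000000+08:00
# Query_time: 0.4
# Mem_max: 1048576
SELECT 1;
"""
top = top_memory_statements(sample_log)
for mem, sql in top:
    print(f"{mem / 2**20:.0f} MiB  {sql}")
```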

There must have been, otherwise why would it OOM? :yum: