Frequent memory GC in TiDB causes high CPU load

TiDBer_F44AmLol · October 23, 2024, 03:53

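[TiDB environment] Production
[TiDB version] 7.5.0
[Reproduction path] What operations led to the problem
[Problem: symptoms and impact]
A three-node cluster with 32 GB of memory, deployed in co-located (mixed) mode. On one of the three TiDB instances, CPU usage rises periodically, once every minute. We traced this to the instance's memory exceeding tidb_server_memory_limit * tidb_server_memory_limit_gc_trigger, which triggers a GC and drives the CPU up. We have already stopped routing SQL requests to this TiDB instance, but it still triggers a GC every minute and the CPU still spikes. Adjusting tidb_server_memory_limit_gc_trigger and tidb_server_memory_limit would avoid the GC for now, but once SQL requests are routed to the node again, its memory usage will again exceed tidb_server_memory_limit * tidb_server_memory_limit_gc_trigger and GC will keep being triggered.
Questions: 1. Doesn't memory GC on a TiDB instance bring memory usage down? Why does memory GC keep being triggered even when there are no SQL requests?
2. Even if we tune the parameters or add memory, higher traffic will still trigger GC and raise the CPU load. Is there any other way to avoid frequent memory GC?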
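
For reference, the threshold described above can be worked out with a few lines of Python. This is only a sketch: it assumes the documented defaults of tidb_server_memory_limit = 80% of total memory and tidb_server_memory_limit_gc_trigger = 0.7, and that the full 32 GB is visible to tidb-server. The thread does not state the values actually configured, and in a co-located deployment the limit is likely set lower. The function name gc_trigger_threshold is made up for illustration.

```python
GIB = 1024 ** 3


def gc_trigger_threshold(total_memory_bytes, memory_limit="80%", gc_trigger=0.7):
    """Return the memory usage (bytes) above which TiDB proactively tries a Go GC.

    memory_limit mirrors tidb_server_memory_limit: either a percentage of
    total memory such as "80%", or an absolute number of bytes.
    gc_trigger mirrors tidb_server_memory_limit_gc_trigger.
    Both defaults here are assumptions, not values taken from the thread.
    """
    if isinstance(memory_limit, str) and memory_limit.endswith("%"):
        limit_bytes = total_memory_bytes * float(memory_limit[:-1]) / 100
    else:
        limit_bytes = float(memory_limit)
    return limit_bytes * gc_trigger


threshold = gc_trigger_threshold(32 * GIB)
print(f"{threshold / GIB:.2f} GiB")  # 32 * 0.8 * 0.7 -> 17.92 GiB
```

If memory that stays resident (statistics, schema cache, and so on) already sits above this value, a GC cannot bring usage back under the threshold, so the trigger condition remains true even with no SQL traffic. The TiDB documentation describes this proactive GC as happening at most once per minute, which would be consistent with the one-minute cycle reported above.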

[Resource configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page

Saved profile in /root/pprof/pprof.tidb-server.samples.cpu.004.pb.gz
File: tidb-server
Build ID: 06544e279eb4143741ccbe837cfc00f33a0f042d
Type: cpu
Time: Oct 23, 2024 at 10:57am (CST)
Duration: 9.07s, Total samples = 24.86s (274.13%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top30
Showing nodes accounting for 23.19s, 93.28% of 24.86s total
Dropped 363 nodes (cum <= 0.12s)
Showing top 30 nodes out of 109
      flat  flat%   sum%        cum   cum%
     4.14s 16.65% 16.65%      4.41s 17.74%  runtime.findObject
     4.08s 16.41% 33.07%     12.78s 51.41%  runtime.scanobject
     1.68s  6.76% 39.82%      1.76s  7.08%  runtime.(*gcBits).bitp (inline)
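
To put a number on how much of this profile is garbage collection, the flat times of the GC-related frames can be summed. The sketch below uses only the three rows pasted above and the 24.86s total. Treating runtime.findObject, runtime.scanobject and runtime.(*gcBits).bitp as GC mark-phase functions is an assumption based on the Go runtime, and the helper names are hypothetical.

```python
# The three rows from the pprof "top30" output quoted in this post.
PPROF_TOP = """\
4.14s 16.65% 16.65% 4.41s 17.74% runtime.findObject
4.08s 16.41% 33.07% 12.78s 51.41% runtime.scanobject
1.68s 6.76% 39.82% 1.76s 7.08% runtime.(*gcBits).bitp (inline)
"""
TOTAL_SAMPLES_S = 24.86  # "Total samples = 24.86s" from the profile header

# Assumption: these Go runtime functions belong to the GC mark phase.
GC_FUNCS = ("runtime.findObject", "runtime.scanobject", "runtime.(*gcBits).bitp")


def parse_seconds(value):
    """Convert a pprof duration such as '4.14s' or '120ms' to seconds."""
    if value.endswith("ms"):
        return float(value[:-2]) / 1000
    if value.endswith("s"):
        return float(value[:-1])
    raise ValueError(f"unexpected duration: {value}")


def gc_flat_seconds(top_output):
    """Sum the 'flat' column over rows whose function is in GC_FUNCS."""
    total = 0.0
    for line in top_output.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # not a data row
        flat, name = fields[0], fields[5]
        if name in GC_FUNCS:
            total += parse_seconds(flat)
    return total


gc_seconds = gc_flat_seconds(PPROF_TOP)
share = gc_seconds / TOTAL_SAMPLES_S
print(f"{gc_seconds:.2f}s of {TOTAL_SAMPLES_S}s ({share:.1%})")
```

On these three rows alone this gives roughly 9.90s, about 39.8% of the sampled CPU time, matching the sum% column of the third row. The cum value of runtime.scanobject (51.41%) suggests the real GC share is higher once its callees are included.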

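哈喽沃德 · October 24, 2024, 01:00

Use monitoring tools: with tools such as Prometheus and Grafana, monitor the TiDB instance's memory usage and GC frequency, and identify the peak periods of memory usage and the specific SQL requests behind them.
Analyze slow queries: use TiDB's slow query log to find long-running queries, which may be causing excessive memory usage.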
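
As one way to act on the slow-query suggestion, the slow log can be ranked by the Mem_max field that TiDB writes for each statement. The following is a minimal sketch, assuming the usual slow log layout in which each entry starts with a "# Time:" line followed by "# Field: value" lines and then the SQL text. The sample log, table names, and the function top_memory_queries are made up for illustration and are not taken from this cluster.

```python
# Made-up sample in the TiDB slow log layout; real entries carry many more fields.
SAMPLE_LOG = """\
# Time: 2024-10-23T10:00:01.000000000+08:00
# Query_time: 2.5
# Mem_max: 1048576
select * from t1 join t2 on t1.id = t2.id;
# Time: 2024-10-23T10:00:05.000000000+08:00
# Query_time: 0.8
# Mem_max: 536870912
select * from big_table order by c1;
"""


def top_memory_queries(log_text, n=5):
    """Return the n slow log entries with the largest Mem_max (bytes)."""
    entries = []
    current = None
    for line in log_text.splitlines():
        if line.startswith("# Time:"):
            # A "# Time:" line opens a new slow log entry.
            current = {"mem_max": 0, "sql": ""}
            entries.append(current)
        elif current is None:
            continue  # skip anything before the first entry
        elif line.startswith("# Mem_max:"):
            current["mem_max"] = int(line.split(":", 1)[1])
        elif not line.startswith("#"):
            # Non-comment lines are the SQL text (possibly multi-line).
            current["sql"] = (current["sql"] + " " + line).strip()
    return sorted(entries, key=lambda e: e["mem_max"], reverse=True)[:n]


for entry in top_memory_queries(SAMPLE_LOG, n=1):
    print(entry["mem_max"], entry["sql"])
```

Only statements slower than the slow log threshold are recorded, so a ranking like this may miss fast queries that still use a lot of memory.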

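小龙虾爱大龙虾 · October 24, 2024, 02:32

The machine spec is too low. In TiDB Server, statistics, schema information, computation, and so on all take up memory, so a spec this low is naturally not enough. https://docs.pingcap.com/zh/tidb/stable/hardware-and-software-requirements#开发及测试环境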

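TiDBer_xTvoCh2f · October 24, 2024, 04:01

Upgrade the hardware. That said, the automatic GC does seem to have some issues: whenever there is a large data import, it keeps running GC nonstop.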

xiaohaozifeifeifei · October 24, 2024, 05:58

It looks like your spec is too low. Check the monitoring to see whether you have hit a resource bottleneck.

chris-zhang · October 24, 2024, 06:19

The spec does seem a little low. Even a single standalone machine comes with this much memory these days.