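One TiKV machine ran out of memory (OOM) and restarted. Monitoring shows that, of the 3 TiKV machines, one (the one that restarted) had much higher memory usage than the other 2. How can I track down the cause?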


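Hello:
1. How many regions are there currently? Check the monitoring to see whether the region count and leader count are balanced across the nodes.
2. Check the hotspots and see whether they are concentrated on this machine.
3. Run top on each machine to see whether the memory is all being used by TiDB, and rule out other applications using it.
Thanks.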

1. Current regions:
{
  "count": 3,
  "stores": [
    {
      "store": { "id": 1, "address": "10.103.6.35:20160", "version": "3.0.0", "state_name": "Up" },
      "status": {
        "capacity": "1000 GiB", "available": "743 GiB",
        "leader_count": 6398, "leader_weight": 1, "leader_score": 413837, "leader_size": 413837,
        "region_count": 20240, "region_weight": 1, "region_score": 1240961, "region_size": 1240961,
        "start_ts": "2019-08-15T20:48:14+08:00",
        "last_heartbeat_ts": "2019-11-13T11:56:13.452877341+08:00",
        "uptime": "2151h7m59.452877341s"
      }
    },
    {
      "store": { "id": 4, "address": "10.103.6.32:20160", "version": "3.0.0", "state_name": "Up" },
      "status": {
        "capacity": "1000 GiB", "available": "742 GiB",
        "leader_count": 6771, "leader_weight": 1, "leader_score": 413258, "leader_size": 413258,
        "region_count": 20240, "region_weight": 1, "region_score": 1240961, "region_size": 1240961,
        "start_ts": "2019-11-13T11:04:44+08:00",
        "last_heartbeat_ts": "2019-11-13T11:56:14.417547047+08:00",
        "uptime": "51m30.417547047s"
      }
    },
    {
      "store": { "id": 7, "address": "10.103.6.38:20160", "version": "3.0.0", "state_name": "Up" },
      "status": {
        "capacity": "1000 GiB", "available": "744 GiB",
        "leader_count": 7071, "leader_weight": 1, "leader_score": 413866, "leader_size": 413866,
        "region_count": 20240, "region_weight": 1, "region_score": 1240961, "region_size": 1240961,
        "start_ts": "2019-08-15T20:51:55+08:00",
        "last_heartbeat_ts": "2019-11-13T11:56:16.515865671+08:00",
        "uptime": "2151h4m21.515865671s"
      }
    }
  ]
}
2.

3. All 3 machines run only TiKV; there are no other services on them.
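
The balance question from item 1 can be checked mechanically against the store listing. The following is a minimal sketch, assuming the listing is available as JSON text with plain ASCII quotes; `SAMPLE` is trimmed to the fields used here, and `summarize_balance` is an illustrative name, not part of any TiDB tool.

```python
import json

# Trimmed copy of the store listing posted above (only the fields used below).
SAMPLE = """
{"count": 3, "stores": [
  {"store": {"id": 1, "address": "10.103.6.35:20160"},
   "status": {"leader_count": 6398, "region_count": 20240}},
  {"store": {"id": 4, "address": "10.103.6.32:20160"},
   "status": {"leader_count": 6771, "region_count": 20240}},
  {"store": {"id": 7, "address": "10.103.6.38:20160"},
   "status": {"leader_count": 7071, "region_count": 20240}}
]}
"""

def summarize_balance(store_json: str):
    """Return per-store counts and the leader spread relative to the mean."""
    data = json.loads(store_json)
    rows = [
        {
            "id": item["store"]["id"],
            "address": item["store"]["address"],
            "leader_count": item["status"]["leader_count"],
            "region_count": item["status"]["region_count"],
        }
        for item in data["stores"]
    ]
    leaders = [r["leader_count"] for r in rows]
    mean = sum(leaders) / len(leaders)
    # (max - min) / mean: 0.0 means leaders are spread perfectly evenly.
    spread = (max(leaders) - min(leaders)) / mean
    return rows, spread

if __name__ == "__main__":
    rows, spread = summarize_balance(SAMPLE)
    for r in rows:
        print(r)
    print(f"leader spread: {spread:.1%}")
```

On the posted numbers, every store holds the same 20240 regions and the leader counts differ by about 10% of the mean, with the restarted store (id 4) in the middle, so these counts alone do not single it out.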

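Hello: please run top and press M (Shift+m) to sort by memory usage, find the processes using the most memory, and note their process IDs. Then run cat /proc/{pid}/status to see how much memory each one is using.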
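
The memory figures in /proc/{pid}/status are the lines whose names start with `Vm` (for example `VmRSS`, the resident set size), reported in kB. Below is a minimal sketch for pulling them out, assuming a Linux host; `parse_status` and `read_status` are illustrative names.

```python
from pathlib import Path

def parse_status(text: str) -> dict:
    """Extract the Vm* fields (values in kB) from /proc/<pid>/status text."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if not key.startswith("Vm"):
            continue
        parts = value.split()
        # Expected shape: "VmRSS:\t  123456 kB"
        if len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            fields[key] = int(parts[0])
    return fields

def read_status(pid: int) -> dict:
    """Read and parse the status file of a live process (Linux only)."""
    return parse_status(Path(f"/proc/{pid}/status").read_text())
```

For a process ID noted from top, `read_status(pid).get("VmRSS")` would give its resident memory in kB.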

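Hello, since the server has already been restarted, this server (10.103.6.32) now has the lowest memory usage of the three. Is it still worth doing this? Could the problem be analyzed from another angle?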

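Hello: let's keep observing for now. The next time this happens, first check which process is responsible, then look at how memory is allocated inside that process. Thanks.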
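
Because a restart wipes the evidence, one option while observing is to record the largest processes periodically so the data survives the next OOM. This is only a sketch under assumptions: a Linux host, and hypothetical helper names (`top_rss`, `log_forever`) and log path chosen for illustration.

```python
import time
from pathlib import Path

def top_rss(proc_root="/proc", n=5):
    """Return up to n (rss_kb, pid, name) tuples, largest resident set first."""
    rows = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/self
        try:
            text = (entry / "status").read_text()
        except OSError:
            continue  # process exited, or its status file is not readable
        name, rss_kb = "?", None
        for line in text.splitlines():
            if line.startswith("Name:"):
                name = line.split(":", 1)[1].strip()
            elif line.startswith("VmRSS:"):
                rss_kb = int(line.split()[1])
        if rss_kb is not None:  # kernel threads have no VmRSS line
            rows.append((rss_kb, int(entry.name), name))
    return sorted(rows, reverse=True)[:n]

def log_forever(path, interval_s=60):
    """Append one timestamped sample per interval until interrupted."""
    while True:
        stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
        with open(path, "a") as f:
            f.write(f"{stamp} {top_rss()}\n")
        time.sleep(interval_s)
```

Running `log_forever("/tmp/rss_samples.log")` (a hypothetical path) on each TiKV host would leave a trail showing which process was growing before the restart.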