TiFlash 7.5.1: unbalanced CPU load

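foxchan · April 30, 2024 02:59

[TiDB environment] Production
TiFlash version: 7.5.1

All 10 TiFlash nodes have the same weight, yet the green node in the chart (64) always shows abnormally high CPU. Is there a parameter or configuration that spreads the load more evenly across the TiFlash nodes?
[Attachments: screenshots/logs/monitoring]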

TiDBer_QYr0vohO · April 30, 2024 03:53

Are all the machines configured the same?

foxchan · April 30, 2024 05:39

Yes, the configuration is identical.

yiduoyunQ · April 30, 2024 06:05

Is the replica count 2 for every table?

SELECT * FROM information_schema.tiflash_replica;

foxchan · April 30, 2024 08:25

Yes, all of them.

yiduoyunQ · April 30, 2024 08:45

Check the TiFlash – Task Scheduler – Active and Waiting Queries Count panel. Are the tasks evenly distributed across nodes?

洪七表哥 · May 2, 2024 06:16

Is there anything unusual about your workload?

有猫万事足 · May 5, 2024 15:33

https://docs.pingcap.com/zh/tidb/stable/top-sql

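To analyze a case where a single node stands out like this, I recommend using the Top SQL page to see which SQL statements that node is executing.

Also, TiDB DDL is executed by a single node, the DDL owner, so that could be the cause.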

foxchan · May 6, 2024 02:27

Top SQL only covers TiDB and TiKV, but the node with uneven CPU here is TiFlash. What does that have to do with TiDB DDL?

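有猫万事足 · May 6, 2024 13:29

OK, I didn't read the question carefully.

I've thought it over, and for TiFlash I honestly can't think of a particularly good approach. Let's see what the experts say.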

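Lloyd-Pottiger · May 7, 2024 05:59

Find the table that appears most often in the slow query log, say db_x.t_x.
Then run the SQL below to check whether its data is evenly distributed across the TiFlash nodes.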

select TABLE_ID, p.STORE_ID, ADDRESS, count(p.REGION_ID) 
from information_schema.tikv_region_status r, information_schema.tikv_region_peers p, information_schema.tikv_store_status s
where r.db_name = 'db_x' and r.table_name = 't_x'
and r.region_id = p.region_id and p.store_id = s.store_id and json_extract(s.label, "$[0].value") = "tiflash" 
group by TABLE_ID, p.STORE_ID, ADDRESS;
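Once the query returns one row per TiFlash store, a simple way to judge "evenly distributed" is to compare the largest Region count with the mean. The following is a minimal Python sketch that assumes you copy the query's rows in by hand. The `StoreRegions` type, the 1.2 threshold, and the sample addresses are hypothetical and not part of TiDB. Also note that Region count is only a proxy: equal counts do not guarantee equal data size or equal query load.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class StoreRegions:
    store_id: int
    address: str
    region_count: int  # count(p.REGION_ID) from the query above


def imbalance_ratio(rows: list[StoreRegions]) -> float:
    """Return max / mean Region count across TiFlash stores (1.0 means perfectly even)."""
    if not rows:
        raise ValueError("no rows: the table may have no TiFlash replica")
    counts = [r.region_count for r in rows]
    avg = mean(counts)
    if avg == 0:
        return 1.0
    return max(counts) / avg


def overloaded_stores(rows: list[StoreRegions], threshold: float = 1.2) -> list[str]:
    """Addresses whose Region count exceeds threshold x mean (threshold is an arbitrary assumption)."""
    if not rows:
        return []
    counts = [r.region_count for r in rows]
    avg = mean(counts)
    return [r.address for r in rows if avg > 0 and r.region_count > threshold * avg]


if __name__ == "__main__":
    # Hypothetical sample rows, NOT real output from this cluster.
    sample = [
        StoreRegions(1, "10.0.0.1:3930", 100),
        StoreRegions(2, "10.0.0.2:3930", 100),
        StoreRegions(3, "10.0.0.3:3930", 160),
    ]
    print(f"max/mean = {imbalance_ratio(sample):.2f}")
    print("overloaded:", overloaded_stores(sample))
```

A ratio close to 1.0 suggests the Regions are spread evenly, so the CPU skew probably has another cause. A store far above the mean points toward the rebalancing options below.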

If you run multiple instances per machine, set location-label to "host".

If you run a single instance per machine, you can use a tool such as GitHub - Lloyd-Pottiger/tiflash-replica-table-data-balancer: A tool helps to balance the table data of TiFlash replicas between multiple TiFlash instances to manually schedule the Region distribution of TiFlash replicas.