tikv-server process CPU running high, with a large volume of error logs

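【TiDB Environment】Production
【TiDB Version】8.1.0
【Reproduction Path】Performed a single-node reload as well as scale-out and scale-in, but not sure whether these caused the problem
【Problem: Symptoms and Impact】
【Copied ERROR Log Output】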
[2025/01/15 15:03:48.962 +08:00] [INFO] [scheduler.rs:769] [“get snapshot failed”] [err=“Error(Request(message: "peer is not leader for region 146413201, leader may Some(id: 146413206 store_id: 8)" not_leader { region_id: 146413201 leader { id: 146413206 store_id: 8 } }))”] [cid=14100788] [thread_id=44]
[2025/01/15 15:03:48.962 +08:00] [INFO] [scheduler.rs:769] [“get snapshot failed”] [err=“Error(Request(message: "peer is not leader for region 133015870, leader may Some(id: 146676613 store_id: 8)" not_leader { region_id: 133015870 leader { id: 146676613 store_id: 8 } }))”] [cid=14100791] [thread_id=47]
[2025/01/15 15:03:48.963 +08:00] [INFO] [scheduler.rs:769] [“get snapshot failed”] [err=“Error(Request(message: "peer is not leader for region 133015870, leader may Some(id: 146676613 store_id: 8)" not_leader { region_id: 133015870 leader { id: 146676613 store_id: 8 } }))”] [cid=14100790] [thread_id=47]
[2025/01/15 15:03:48.963 +08:00] [INFO] [scheduler.rs:769] [“get snapshot failed”] [err=“Error(Request(message: "peer is not leader for region 156945076, leader may Some(id: 161313310 store_id: 8)" not_leader { region_id: 156945076 leader { id: 161313310 store_id: 8 } }))”] [cid=14100792] [thread_id=43]
[2025/01/15 15:03:48.963 +08:00] [INFO] [scheduler.rs:769] [“get snapshot failed”] [err=“Error(Request(message: "peer is not leader for region 156945076, leader may Some(id: 161313310 store_id: 8)" not_leader { region_id: 156945076 leader { id: 161313310 store_id: 8 } }))”] [cid=14100793] [thread_id=44]
[2025/01/15 15:03:48.963 +08:00] [INFO] [scheduler.rs:769] [“get snapshot failed”] [err=“Error(Request(message: "peer is not leader for region 133015870, leader may Some(id: 146676613 store_id: 8)" not_leader { region_id: 133015870 leader { id: 146676613 store_id: 8 } }))”] [cid=14100794] [thread_id=47]
[2025/01/15 15:03:48.963 +08:00] [INFO] [scheduler.rs:769] [“get snapshot failed”] [err=“Error(Request(message: "peer is not leader for region 133015870, leader may Some(id: 146676613 store_id: 8)" not_leader { region_id: 133015870 leader { id: 146676613 store_id: 8 } }))”] [cid=14100795] [thread_id=47]
[2025/01/15 15:03:48.964 +08:00] [INFO] [scheduler.rs:769] [“get snapshot failed”] [err=“Error(Request(message: "peer is not leader for region 146813449, leader may Some(id: 146813451 store_id: 8)" not_leader { region_id: 146813449 leader { id: 146813451 store_id: 8 } }))”] [cid=14100796] [thread_id=43]
[2025/01/15 15:03:48.964 +08:00] [INFO] [scheduler.rs:769] [“get snapshot failed”] [err=“Error(Request(message: "peer is not leader for region 156945076, leader may Some(id: 161313310 store_id: 8)" not_leader { region_id: 156945076 leader { id: 161313310 store_id: 8 } }))”] [cid=14100797] [thread_id=44]
[2025/01/15 15:03:48.966 +08:00] [INFO] [scheduler.rs:769] [“get snapshot failed”] [err=“Error(Request(message: "peer is not leader for region 146813449, leader may Some(id: 146813451 store_id: 8)" not_leader { region_id: 146813449 leader { id: 146813451 store_id: 8 } }))”] [cid=14100798] [thread_id=44]
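To see whether these `not_leader` messages concentrate on a few regions or on a single leader store, the lines can be tallied with a short script. This is only a minimal sketch that assumes the log format shown above; the function name `summarize_not_leader` is hypothetical and not part of any TiKV tooling.

```python
import re
from collections import Counter

# Matches the not_leader payload as it appears in the log lines above.
NOT_LEADER = re.compile(
    r"not_leader \{ region_id: (\d+) leader \{ id: (\d+) store_id: (\d+) \}"
)


def summarize_not_leader(lines):
    """Count not_leader errors per region and per reported leader store.

    `lines` is any iterable of log lines (a list, or an open file object).
    Lines that do not contain a not_leader payload are skipped.
    """
    by_region = Counter()
    by_store = Counter()
    for line in lines:
        match = NOT_LEADER.search(line)
        if match is None:
            continue
        region_id, _peer_id, store_id = match.groups()
        by_region[int(region_id)] += 1
        by_store[int(store_id)] += 1
    return by_region, by_store
```

Passing an open log file to the function returns two counters; calling `most_common()` on them lists the busiest regions and the stores reported as leader. In the ten lines pasted above, every message points to store 8 as the leader.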

These logs are at INFO level. You can refer to the official troubleshooting documentation.

https://docs.pingcap.com/zh/tidb/stable/tidb-troubleshooting-map#72-tikv

How much did the CPU rise, from what value to what value? In the TiKV monitoring, can you see the CPU usage of each module's threads?

These logs alone don't tell us much.

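High TiKV CPU is most likely not caused by the logs you pasted.

Suggestions:
1. Check the TiKV panels in Grafana to confirm whether CPU is high on one or a few TiKV nodes or on all of them; also look at the memory, network bandwidth, and disk I/O metrics.
2. Use the heatmap in TiDB Dashboard to confirm whether there is a hotspot.
3. If it is a hotspot issue, follow the guidance in the official documentation to resolve it.
4. If large SQL statements are running in the cluster, they usually show up as slow queries or as expensive query entries in tidb.log; optimize those statements accordingly.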
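The first check in the list (one or a few nodes versus all nodes) can be written down as a small helper once the per-node CPU values have been read off the Grafana TiKV panels. This is only a rough sketch: the function name `classify_cpu`, the instance names, and the 80% threshold are all assumptions for illustration, not values from TiDB documentation.

```python
def classify_cpu(cpu_by_node, high=80.0):
    """Classify per-node CPU readings as 'none', 'some', or 'all' high.

    cpu_by_node maps a TiKV instance name to its CPU usage in percent.
    `high` is an assumed threshold; adjust it to the cluster's baseline.
    """
    hot = sorted(node for node, cpu in cpu_by_node.items() if cpu >= high)
    if not hot:
        return "none", []
    if len(hot) == len(cpu_by_node):
        return "all", hot
    return "some", hot


# Hypothetical readings copied by hand from the Grafana panels.
verdict, hot_nodes = classify_cpu({"tikv-1": 95.0, "tikv-2": 30.0, "tikv-3": 25.0})
```

A result of `"some"` would point toward the hotspot checks in steps 2 and 3, while `"all"` would make cluster-wide load, such as the large SQL statements in step 4, the more likely place to look.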

I think your scale-out configuration was written incorrectly.

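Check the TiKV panels in Grafana to confirm whether CPU is high on one or a few TiKV nodes or on all of them; also look at the memory, network bandwidth, and disk I/O metrics.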