TiKV anomaly

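【TiDB environment】Production
【TiDB version】5.4.0
【Steps to reproduce】No abnormal activity was found in the production environment
【Problem: symptoms and impact】All queries failed, and the tikv-service process saturated the server's CPU
【Resource configuration】64 GB × 3, 3 TiKV nodes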

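Shortly afterwards the data platform started reporting errors. Every API timed out, and an arbitrary SQL query I ran also timed out.

Checking server resources: of the three TiKV nodes, only one had its CPU maxed out, while the other two were normal. On that server the CPU was saturated by tikv-service, so I checked the TiKV error log: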
[ERROR] [kv.rs:1167] [“KvService response batch commands fail”] [err=“"SendError(…)"”]

I then checked the queries currently running on TiDB:
SHOW PROCESSLIST;
and killed the TiDB processes, but this brought no improvement.
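
For anyone repeating this step, here is a minimal sketch of turning SHOW PROCESSLIST output into kill statements for long-running queries only. The function name `kill_statements` and the 60-second threshold are made up for illustration; the row keys follow the standard SHOW PROCESSLIST columns (Id, Command, Time, Info). As far as I know, on 5.4 `KILL TIDB <id>` has to be issued on the same tidb-server instance that owns the connection, so treat this as a starting point rather than a tested procedure.

```python
def kill_statements(rows, min_seconds=60):
    """Build KILL TIDB statements for queries running at least min_seconds.

    rows: iterable of dicts keyed by SHOW PROCESSLIST column names
          (Id, Command, Time, Info). The threshold is an assumed default.
    """
    stmts = []
    for row in rows:
        # Skip idle connections and anything that is not an active query.
        if row["Command"] != "Query" or not row.get("Info"):
            continue
        if int(row["Time"]) >= min_seconds:
            stmts.append(f"KILL TIDB {int(row['Id'])};")
    return stmts


if __name__ == "__main__":
    sample = [
        {"Id": 1, "Command": "Query", "Time": 300, "Info": "select ..."},
        {"Id": 2, "Command": "Sleep", "Time": 900, "Info": None},
    ]
    for stmt in kill_statements(sample):
        print(stmt)
```

Review the generated statements before running them; the sketch only filters by command type and elapsed time.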

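As an emergency fix, I restarted the TiKV component on that single node.
After the restart, the errors stopped.

The log is fairly large; the errors are just these two lines repeated over and over: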
[2023/04/26 15:26:53.404 +08:00] [ERROR] [kv.rs:1167] [“KvService response batch commands fail”] [err=“"SendError(…)"”]
[2023/04/26 15:26:53.404 +08:00] [WARN] [endpoint.rs:606] [error-response] [err=“Coprocessor task canceled due to exceeding max pending tasks”]
[2023/04/26 15:26:53.404 +08:00] [ERROR] [kv.rs:1167] [“KvService response batch commands fail”] [err=“"SendError(…)"”]
[2023/04/26 15:26:53.404 +08:00] [WARN] [endpoint.rs:606] [error-response] [err=“Coprocessor task canceled due to exceeding max pending tasks”]
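
With a log this size it may help to count how often each message occurs instead of reading it by eye. The following is a rough sketch, assuming every line starts with the `[time] [LEVEL] [file:line] [message]` layout seen above. The names `summarize` and `summarize_file` are hypothetical helpers, and messages that themselves contain `]` would be cut short by this simple pattern.

```python
import re
from collections import Counter

# Leading fields of a log line: [timestamp] [LEVEL] [file:line] [message]
LINE_RE = re.compile(
    r'^\[(?P<ts>[^\]]+)\] \[(?P<level>[A-Z]+)\] \[(?P<src>[^\]]+)\] \[(?P<msg>[^\]]+)\]'
)


def summarize(lines):
    """Count log lines grouped by (level, source location, message)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            # Strip straight or curly quotes that may wrap the message.
            msg = m.group("msg").strip('"\u201c\u201d')
            counts[(m.group("level"), m.group("src"), msg)] += 1
    return counts


def summarize_file(path, top=10):
    """Return the `top` most frequent entries of a log file."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return summarize(f).most_common(top)


if __name__ == "__main__":
    # "tikv.log" is the attachment name from this post; adjust the path as needed.
    for (level, src, msg), n in summarize_file("tikv.log"):
        print(n, level, src, msg)
```

If the counts confirm that only these two messages dominate, the interesting part is probably what happened just before they started.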

tikv.log (46.9 MB)

Stop all the TiDB servers.

Your cluster was brought down by slow SQL.

Then restart TiDB, connect right away with a MySQL client, and analyze the SQL with:
SELECT FLOOR(UNIX_TIMESTAMP(MIN(summary_begin_time))) AS agg_begin_time, FLOOR(UNIX_TIMESTAMP(MAX(summary_end_time))) AS agg_end_time, ANY_VALUE(digest_text) AS agg_digest_text, ANY_VALUE(digest) AS agg_digest, SUM(exec_count) AS agg_exec_count, SUM(sum_latency) AS agg_sum_latency, MAX(max_latency) AS agg_max_latency, MIN(min_latency) AS agg_min_latency, CAST( SUM(exec_count * avg_latency) / SUM(exec_count) AS SIGNED ) AS agg_avg_latency, CAST( SUM(exec_count * avg_mem) / SUM(exec_count) AS SIGNED ) AS agg_avg_mem, MAX(max_mem) AS agg_max_mem, ANY_VALUE(schema_name) AS agg_schema_name, ANY_VALUE(plan_digest) AS agg_plan_digest, query_sample_text, index_names FROM INFORMATION_SCHEMA.CLUSTER_STATEMENTS_SUMMARY_HISTORY WHERE index_names IS NULL AND query_sample_text > '' GROUP BY schema_name, digest ORDER BY agg_sum_latency DESC LIMIT 1;

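That will identify the slowest table among statements that did not use an index.
Alternatively, add an index.
I have a tool that adds indexes for TiDB automatically.

The query returned no rows.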