SQL 偶发主键查询速度特别慢

【 TiDB 使用环境】生产环境 3 tidb , 3 pd , 7 tikv
【 TiDB 版本】v5.0.4
【复现路径】主键查询偶发性特别慢,需要十几分钟才能查询出结果
【遇到的问题:问题现象及影响】
select from where 主键= ; 偶发性特慢,需要十几分钟

【附件:截图/日志/监控】
SQL 语句,folder_id 为 主键

SELECT
  dmsfolderd0_.folder_id AS folder1_58_0_,
  dmsfolderd0_.instance_id AS instance2_58_0_,
  dmsfolderd0_.area_id AS area3_58_0_,
  dmsfolderd0_.folder_parentFolderId AS folder4_58_0_,
  dmsfolderd0_.folder_guid AS folder5_58_0_,
  dmsfolderd0_.folder_code AS folder6_58_0_,
  dmsfolderd0_.folder_sortOrder AS folder7_58_0_,
  dmsfolderd0_.folder_name AS folder8_58_0_,
  dmsfolderd0_.folder_path AS folder9_58_0_,
  dmsfolderd0_.folder_size AS folder10_58_0_,
  dmsfolderd0_.folder_maxFolderSize AS folder11_58_0_,
  dmsfolderd0_.folder_alertSize AS folder12_58_0_,
  dmsfolderd0_.folder_maxFileSize AS folder13_58_0_,
  dmsfolderd0_.folder_forbiddenFileExtensions AS folder14_58_0_,
  dmsfolderd0_.folder_childFoldersCount AS folder15_58_0_,
  dmsfolderd0_.folder_childFilesCount AS folder16_58_0_,
  dmsfolderd0_.securityLevel_id AS securitylevel17_58_0_,
  dmsfolderd0_.folder_lockCount AS folder18_58_0_,
  dmsfolderd0_.folder_permStrategy AS folder19_58_0_,
  dmsfolderd0_.folder_verStrategy AS folder20_58_0_,
  dmsfolderd0_.folder_defaultViewId AS folder21_58_0_,
  dmsfolderd0_.folder_state AS folder22_58_0_,
  dmsfolderd0_.folder_ownerId AS folder23_58_0_,
  dmsfolderd0_.folder_createType AS folder24_58_0_,
  dmsfolderd0_.folder_createTime AS folder25_58_0_,
  dmsfolderd0_.folder_createOperator AS folder26_58_0_,
  dmsfolderd0_.folder_modifyTime AS folder27_58_0_,
  dmsfolderd0_.folder_modifyOperator AS folder28_58_0_,
  dmsfolderd0_.folder_type AS folder29_58_0_,
  dmsfolderd0_.folder_offlinePermission AS folder30_58_0_,
  dmsfolderd0_.folder_ExtensionVerifyType AS folder31_58_0_,
  dmsfolderd0_.folder_remark AS folder32_58_0_,
  dmsfolderd0_.folder_modifyOperatorName AS folder33_58_0_,
  dmsfolderd0_.folder_createOperatorName AS folder34_58_0_,
  dmsfolderd0_.folder_deleteTime AS folder35_58_0_,
  dmsfolderd0_.folder_deleteOperator AS folder36_58_0_,
  dmsfolderd0_.folder_purgeTime AS folder37_58_0_,
  dmsfolderd0_.folder_purgeOperator AS folder38_58_0_,
  dmsfolderd0_.folder_isDeleted AS folder39_58_0_,
  dmsfolderd0_.folder_isCascadeDelete AS folder40_58_0_,
  dmsfolderd0_.folder_deleteName AS folder41_58_0_,
  dmsfolderd0_.folder_deleteOperatorName AS folder42_58_0_,
  dmsfolderd0_.folder_purgeOperatorName AS folder43_58_0_,
  dmsfolderd0_.folder_rootFolderId AS folder44_58_0_
FROM
  dms_folder dmsfolderd0_
WHERE
  dmsfolderd0_.folder_id = 28145829;

慢速发生时的分析

	id         	task	estRows	operator info                    	actRows	execution info                                                                                                                                   	memory	disk
	Point_Get_1	root	1      	table:dms_folder, handle:28145829	0      	time:11m21.7s, loops:1, Get:{num_rpc:32, total_time:10m40s},tikvRPC_backoff:{num:31, total_time:41.6s},regionMiss_backoff:{num:1, total_time:2ms}	N/A   	N/A

正常时候的分析

系统改成7.5了 tidb tikv忙 rpctime会拉很长。你这个11分钟应该是排队的原因 你看看是否可以优化

表定义也发一下?(脱敏后的)

能不能截图看下慢查询执行时间具体是卡在哪个阶段?

24年考虑升级7.5吧


表定义关系应该不大。
不好意思,无法复制,只能这样截图拼接看了

意思是说,升级 7.5 有针对这个 rpctime 排队时间长的优化, 可能会解决这种情况是吗?

看起来。。。。 遇到 Region 调度了,不过需要这么长的时间 不是很合理啊 :thinking:

1 个赞

慢的这些SQL 条件都是一样的吗? 放到白天执行什么样

肯定不是执行计划的问题,backoff很多,集群整体是不是有啥问题啊

3 个赞

:thinking:表健康度下降导致没有走主键?表读写热点?

这位大佬是官方的

肯定不是执行计划的问题,backoff很多,集群整体是不是有啥问题啊 我估计他们烂sql贼多。都backoff了

慢的这些sql 条件除了 where 具体的主键值不同,sql 语句都是一样的。这个 sql 是我们程序上凌晨才执行的定时任务里边的,这个程序只在凌晨执行,所以在慢速记录上只有凌晨的记录,不过凌晨也不是一直都慢,也是时快时慢。我们白天执行这个语句非常快。

是的。这个主键查询时间看起来太久了。而且我有一个定时计划任务,每天凌晨1点会 analyze 所有健康度低于80的表,并且会生成 analyze 日志。查看了日志,每天的健康状态更新的计划任务都有在执行,并且在凌晨1点30分就会执行完成。 这个具体原因我可以在哪里排查一下吗? 或者我想了解下,咱们 7.5 版本有针对性的优化,是否有可能升级后这种情况就可能消失呢。

对应时间,资源整体利用率如何?

感觉多少是有什么操作占用了cpu,io,带宽。不然点查10分钟是有点难以理解。
现在给的信息还是太少。

backoff
的含义是 我kv忙不过来了 你们等等吧。监控出现backoff 就说明你烂sql太多了

烂sql 太多 kv先返回backoff 给你 可能和你们1点做表数据统计有关。你12点的时候表健康度很差了

还是先看看cpu io 情况

慢sql太多了吧