select * from CLUSTER_SLOW_QUERY order by Time desc limit 1查询出来的为什么不是集群最新的一条慢日志

jiyf · 2022 年5 月 18 日 08:55

mysql> explain analyze select * from CLUSTER_SLOW_QUERY order by Time desc limit 1;
+----------------------------+---------+---------+-----------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------+-------------------------------------+---------+------+
| id                         | estRows | actRows | task      | access object            | execution info                                                                                                                   | operator info                       | memory  | disk |
+----------------------------+---------+---------+-----------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------+-------------------------------------+---------+------+
| Limit_11                   | 1.00    | 1       | root      |                          | time:174.7ms, loops:2                                                                                                            | offset:0, count:1                   | N/A     | N/A  |
| └─TableReader_21           | 1.00    | 1       | root      |                          | time:174.7ms, loops:1, cop_task: {num: 1, max: 174.6ms, proc_keys: 0, rpc_num: 1, rpc_time: 174.5ms, copr_cache_hit_ratio: 0.00} | data:Limit_20                       | 3.01 KB | N/A  |
|   └─Limit_20               | 1.00    | 0       | cop[tidb] |                          |                                                                                                                                  | offset:0, count:1                   | N/A     | N/A  |
|     └─TableFullScan_19     | 1.00    | 0       | cop[tidb] | table:CLUSTER_SLOW_QUERY |                                                                                                                                  | keep order:true, desc, stats:pseudo | N/A     | N/A  |
+----------------------------+---------+---------+-----------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------+-------------------------------------+---------+------+
4 rows in set (0.17 sec)

从执行计划看，它使用的 limit 算子，并且 limit 算子下推到各个 tidb server 节点，那么它的执行逻辑是：

接收到 cop[tidb] 处理请求的 tidb server，根据下推的 limit 算子 ”order by Time desc limit 1“ 返回 1 条记录
上层的 limit 算子调用 next 函数，读取到 1 条记录就满足 limit 1 要求了，这个时候执行就结束

也就是说，这里实际执行结果逻辑上是：只在单个 tidb server 日志上的 order by Time desc limit 1，并不是 cluster 的。

mysql> explain analyze select * from CLUSTER_SLOW_QUERY order by Time desc limit 10000, 1;
+--------------------------+----------+---------+-----------+--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+----------+------+
| id                       | estRows  | actRows | task      | access object            | execution info                                                                                                                                    | operator info                                                          | memory   | disk |
+--------------------------+----------+---------+-----------+--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+----------+------+
| TopN_9                   | 1.00     | 1       | root      |                          | time:4.75s, loops:2                                                                                                                               | information_schema.cluster_slow_query.time:desc, offset:10000, count:1 | 183.1 MB | N/A  |
| └─TableReader_18         | 10000.00 | 85415   | root      |                          | time:4.3s, loops:85, cop_task: {num: 5, max: 4.52s, min: 101.6ms, avg: 1.8s, p95: 4.52s, rpc_num: 5, rpc_time: 9.02s, copr_cache_hit_ratio: 0.00} | data:TableFullScan_17                                                  | 308.8 MB | N/A  |
|   └─TableFullScan_17     | 10000.00 | 0       | cop[tidb] | table:CLUSTER_SLOW_QUERY |                                                                                                                                                   | keep order:false, stats:pseudo                                         | N/A      | N/A  |
+--------------------------+----------+---------+-----------+--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+----------+------+
3 rows in set (4.76 sec)

查询条件改为 order by Time desc limit 10000, 1; 执行计划变为了 topN，这取出了所有的慢日志进行了全局 topN 排序，选出 1 条，这个是真正意义上的 cluster。

所以从实际执行计划看，从 CLUSTER_SLOW_QUERY 查询出来的结果，有时候并不是真正的 cluster 意义的结果。