tpch第五条查询执行时间接近1个小时,不确定时间消耗在哪个阶段

为提高效率,请提供以下信息,问题描述清晰能够更快得到解决:
【 TiDB 使用环境】
tidbv4.0.15

【概述】 场景 + 问题概述
tpch第五条查询执行时间接近1个小时,不确定时间消耗在哪个阶段
【背景】 做过哪些操作

【现象】 业务和数据库现象

【问题】 当前遇到的问题

【业务影响】

【TiDB 版本】
v4.0.15
【应用软件及版本】
慢日志:

# Time: 2022-01-05T17:52:17.80957878+08:00
# Txn_start_ts: 430276048001433605
# User@Host: root[root] @ 100.73.36.117 [100.73.36.117]
# Conn_ID: 150882
# Query_time: 3473.662378462
# Parse_time: 0.002966899
# Compile_time: 0.007719799
# Rewrite_time: 0.004006152
# Optimize_time: 0.003514227
# Wait_TS: 0.000015495
# Cop_time: 160.476881742 Process_time: 87.076 Wait_time: 12.42 Request_count: 253477 Total_keys: 31996061 Process_keys: 29101484
# DB: tpch_10
# Is_internal: false
# Digest: deb9eb197db2e9aac54011358d29e1249565c487c8e3ef0ae790b7cd106f51d2
# Stats: orders:430275890505318401,lineitem:430275865247744019,region:pseudo,nation:pseudo,supplier:430274624978944004,customer:430274609879449608
# Num_cop_tasks: 253477
# Cop_proc_avg: 0.000343526 Cop_proc_p90: 0.001 Cop_proc_max: 0.441 Cop_proc_addr: 100.73.36.128:40160
# Cop_wait_avg: 4.8998e-05 Cop_wait_p90: 0 Cop_wait_max: 0.004 Cop_wait_addr: 100.73.36.127:40160
# Mem_max: 132454326418
# Prepared: false
# Plan_from_cache: false
# Plan_from_binding: false
# Has_more_results: false
# KV_total: 228.846164192
# PD_total: 0.000015425
# Backoff_total: 0
# Write_sql_response_total: 0.000006484
# Succ: true

【附件】 相关日志及配置信息

  • TiUP Cluster Display 信息
  • TiUP CLuster Edit config 信息

监控(https://metricstool.pingcap.com/)

  • TiDB-Overview Grafana监控
  • TiDB Grafana 监控
  • TiKV Grafana 监控
  • PD Grafana 监控
  • 对应模块日志(包含问题前后 1 小时日志)

若提问为性能优化、故障排查类问题,请下载脚本运行。终端输出的打印结果,请务必全选并复制粘贴上传。

2赞

是tpch22个场景中的第五个吗?可以用单独的sql + explain analyze 查看详细的执行计划
SQL语句:https://github.com/pingcap/tidb-bench/tree/master/tpch/queries

select
n_name,
sum(l_extendedprice * (1 - l_discount)) as revenue
from
customer,
orders,
lineitem,
supplier,
nation,
region
where
c_custkey = o_custkey
and l_orderkey = o_orderkey
and l_suppkey = s_suppkey
and c_nationkey = s_nationkey
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = ‘MIDDLE EAST’
and o_orderdate >= ‘1994-01-01’
and o_orderdate < date_add(‘1994-01-01’, interval ‘1’ year)
group by
n_name
order by
revenue desc;

2赞

MySQL [tpch_10]> explain analyze SELECT n_name, sum(l_extendedprice * (1 - l_discount)) AS revenue FROM customer, orders, lineitem, supplier, nation, region WHERE c_custkey = o_custkey AND l_orderkey = o_orderkey AND l_suppkey = s_suppkey AND c_nationkey = s_nationkey AND s_nationkey = n_nationkey AND n_regionkey = r_regionkey AND r_name = ‘MIDDLE EAST’ AND o_orderdate >= ‘1994-01-01’ AND o_orderdate < date_add(‘1994-01-01’, INTERVAL ‘1’ year) GROUP BY n_name ORDER BY revenue DESC;
±-----------------------------------------------------±------------±-----------±----------±--------------------------------------------------------±-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------±------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------±----------±--------+
| id | estRows | actRows | task | access object | execution info | operator info | memory | disk |
±-----------------------------------------------------±------------±-----------±----------±--------------------------------------------------------±-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------±------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------±----------±--------+
| Sort_23 | 1.00 | 5 | root | | time:1h14m53.7s, loops:2 | Column#54:desc | 9.39 KB | 0 Bytes |
| └─Projection_25 | 1.00 | 5 | root | | time:1h14m53.7s, loops:6, Concurrency:OFF | tpch_10.nation.n_name, Column#54 | 1.88 KB | N/A |
| └─HashAgg_26 | 1.00 | 5 | root | | time:1h14m53.7s, loops:6, PartialConcurrency:32, FinalConcurrency:32 | group by:Column#60, funcs:sum(Column#58)->Column#54, funcs:firstrow(Column#59)->tpch_10.nation.n_name | 4.09 MB | N/A |
| └─Projection_65 | 9351572.05 | 72985 | root | | time:1h14m53.7s, loops:97, Concurrency:32 | mul(tpch_10.lineitem.l_extendedprice, minus(1, tpch_10.lineitem.l_discount))->Column#58, tpch_10.nation.n_name, tpch_10.nation.n_name | 2.68 MB | N/A |
| └─IndexHashJoin_32 | 9351572.05 | 72985 | root | | time:1h14m53.7s, loops:97, inner:{total:38h44m20.2s, concurrency:32, task:71124, construct:31m22.3s, fetch:38h11m8.4s, build:9m32.7s, join:1m39.7s} | inner join, inner:IndexLookUp_29, outer key:tpch_10.orders.o_orderkey, inner key:tpch_10.lineitem.l_orderkey, equal cond:eq(tpch_10.orders.o_orderkey, tpch_10.lineitem.l_orderkey), eq(tpch_10.supplier.s_suppkey, tpch_10.lineitem.l_suppkey) | 89.0 GB | N/A |
| ├─HashJoin_36(Build) | 2319210.31 | 1820575546 | root | | time:3m23.5s, loops:1777923, build_hash_table:{total:2m11.1s, fetch:11s, build:2m0s}, probe:{concurrency:32, total:39h43m44.3s, max:1h14m48s, probe:38h33m49.3s, fetch:1h9m54.9s} | inner join, equal:[eq(tpch_10.customer.c_custkey, tpch_10.orders.o_custkey)] | 33.2 GB | 0 Bytes |
| │ ├─HashJoin_38(Build) | 1500000.00 | 1201884359 | root | | time:5.9s, loops:1173736, build_hash_table:{total:59.4ms, fetch:57.3ms, build:2.1ms}, probe:{concurrency:32, total:1h9m38.8s, max:2m11.1s, probe:1h9m33s, fetch:5.8s} | inner join, equal:[eq(tpch_10.supplier.s_nationkey, tpch_10.customer.c_nationkey)] | 639.5 KB | 0 Bytes |
| │ │ ├─HashJoin_40(Build) | 25.00 | 20042 | root | | time:56.8ms, loops:33, build_hash_table:{total:784.5µs, fetch:781.1µs, build:3.36µs}, probe:{concurrency:32, total:1.82s, max:57.2ms, probe:64.6ms, fetch:1.76s} | inner join, equal:[eq(tpch_10.nation.n_nationkey, tpch_10.supplier.s_nationkey)] | 868 Bytes | 0 Bytes |
| │ │ │ ├─HashJoin_49(Build) | 0.01 | 5 | root | | time:696.2µs, loops:2, build_hash_table:{total:536.9µs, fetch:132.5µs, build:404.4µs}, probe:{concurrency:32, total:19ms, max:627.5µs, probe:24.4µs, fetch:19ms} | inner join, equal:[eq(tpch_10.region.r_regionkey, tpch_10.nation.n_regionkey)] | 20.5 KB | 0 Bytes |
| │ │ │ │ ├─TableReader_54(Build) | 0.01 | 1 | root | | time:43.6µs, loops:2, cop_task: {num: 1, max: 879µs, proc_keys: 5, rpc_num: 1, rpc_time: 823.3µs, copr_cache: disabled} | data:Selection_53 | 236 Bytes | N/A |
| │ │ │ │ │ └─Selection_53 | 0.01 | 1 | cop[tikv] | | tikv_task:{time:0s, loops:1}, scan_detail: {total_process_keys: 5, total_keys: 6} | eq(tpch_10.region.r_name, “MIDDLE EAST”) | N/A | N/A |
| │ │ │ │ │ └─TableFullScan_52 | 5.00 | 5 | cop[tikv] | table:region | tikv_task:{time:0s, loops:1} | keep order:false, stats:pseudo | N/A | N/A |
| │ │ │ │ └─TableReader_51(Probe) | 25.00 | 25 | root | | time:77.5µs, loops:2, cop_task: {num: 1, max: 1.22ms, proc_keys: 25, rpc_num: 1, rpc_time: 1.11ms, copr_cache: disabled} | data:TableFullScan_50 | 993 Bytes | N/A |
| │ │ │ │ └─TableFullScan_50 | 25.00 | 25 | cop[tikv] | table:nation | tikv_task:{time:0s, loops:1}, scan_detail: {total_process_keys: 25, total_keys: 26} | keep order:false, stats:pseudo | N/A | N/A |
| │ │ │ └─TableReader_56(Probe) | 100000.00 | 100000 | root | | time:54.9ms, loops:99, cop_task: {num: 1, max: 56.2ms, proc_keys: 100000, tot_proc: 51ms, tot_wait: 1ms, rpc_num: 1, rpc_time: 56.1ms, copr_cache: disabled} | data:TableFullScan_55 | 1.53 MB | N/A |
| │ │ │ └─TableFullScan_55 | 100000.00 | 100000 | cop[tikv] | table:supplier | tikv_task:{time:48ms, loops:102}, scan_detail: {total_process_keys: 100000, total_keys: 100001} | keep order:false | N/A | N/A |
| │ │ └─TableReader_58(Probe) | 1500000.00 | 1500000 | root | | time:195.4ms, loops:1467, cop_task: {num: 4, max: 726.4ms, min: 181.6ms, avg: 359.5ms, p95: 726.4ms, max_proc_keys: 434849, p95_proc_keys: 434849, tot_proc: 903ms, rpc_num: 4, rpc_time: 1.44s, copr_cache: disabled} | data:TableFullScan_57 | 22.9 MB | N/A |
| │ │ └─TableFullScan_57 | 1500000.00 | 1500000 | cop[tikv] | table:customer | tikv_task:{proc max:271ms, min:153ms, p80:271ms, p95:271ms, iters:1482, tasks:4}, scan_detail: {total_process_keys: 1500000, total_keys: 1500004} | keep order:false | N/A | N/A |
| │ └─TableReader_61(Probe) | 2307583.33 | 2275919 | root | | time:857.4ms, loops:2217, cop_task: {num: 27, max: 966ms, min: 181.3ms, avg: 588.3ms, p95: 966ms, max_proc_keys: 680574, p95_proc_keys: 559877, tot_proc: 12s, rpc_num: 27, rpc_time: 15.9s, copr_cache: disabled} | data:Selection_60 | 52.5 MB | N/A |
| │ └─Selection_60 | 2307583.33 | 2275919 | cop[tikv] | | tikv_task:{proc max:699ms, min:174ms, p80:530ms, p95:681ms, iters:14767, tasks:27}, scan_detail: {total_process_keys: 15000000, total_keys: 15000027} | ge(tpch_10.orders.o_orderdate, 1994-01-01 00:00:00.000000), lt(tpch_10.orders.o_orderdate, 1995-01-01) | N/A | N/A |
| │ └─TableFullScan_59 | 15000000.00 | 15000000 | cop[tikv] | table:orders | tikv_task:{proc max:678ms, min:160ms, p80:512ms, p95:663ms, iters:14767, tasks:27} | keep order:false | N/A | N/A |
| └─IndexLookUp_29(Probe) | 4.03 | 8831654 | root | | time:38h8m42.5s, loops:142248, index_task: {total_time: 5m48.6s, fetch_handle: 5m45.5s, build: 1.71s, wait: 1.43s}, table_task: {total_time: 23h36m24.8s, num: 71124, concurrency: 2275968} | | 11.6 KB | N/A |
| ├─IndexRangeScan_27(Build) | 4.03 | 8831654 | cop[tikv] | table:lineitem, index:PRIMARY(L_ORDERKEY, L_LINENUMBER) | time:5m45.3s, loops:213372, cop_task: {num: 150037, max: 4.22s, min: 196.7µs, avg: 2.18ms, p95: 1.24ms, max_proc_keys: 196, p95_proc_keys: 126, tot_proc: 55.7s, tot_wait: 8.19s, rpc_num: 150037, rpc_time: 4m52.2s, copr_cache: disabled}, tikv_task:{proc max:3ms, min:0s, p80:1ms, p95:1ms, iters:287327, tasks:150037}, scan_detail: {total_process_keys: 8831654, total_keys: 11041338} | range: decided by [eq(tpch_10.lineitem.l_orderkey, tpch_10.orders.o_orderkey)], keep order:false | N/A | N/A |
| └─TableRowIDScan_28(Probe) | 4.03 | 8831654 | cop[tikv] | table:lineitem | time:11m2.4s, loops:142248, cop_task: {num: 222438, max: 8.04s, min: 250.3µs, avg: 4.18ms, p95: 1.28ms, max_proc_keys: 131, p95_proc_keys: 79, tot_proc: 52.4s, tot_wait: 10.2s, rpc_num: 222438, rpc_time: 12m25.3s, copr_cache: disabled}, tikv_task:{proc max:3ms, min:0s, p80:1ms, p95:1ms, iters:353373, tasks:222438}, scan_detail: {total_process_keys: 8831654, total_keys: 10715382} | keep order:false | N/A | N/A |
±-----------------------------------------------------±------------±-----------±----------±--------------------------------------------------------±-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------±------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------±----------±--------+

1赞

image

image
健康度,没有问题,但是到了tidb侧实际执行的行数远大于预估

1赞

有预估的执行计划吗,另外表结构如何,磁盘性能?
附件是v5.3上的执行计划,可以参考下(磁盘是NVMe)
明天再在普通SSD上跑下看看
tidb_v5.3_Q5.txt (24.8 KB)

磁盘是nvme,我的是4.0.15版本,看了执行计划,两个analyze执行计划不一样;tidb_v5.3_4.0.15_Q5.txt (52.9 KB)

image
表结构应该没问题

1赞


看了一下我的执行计划,走了indexHashJoin
image
占用89G tidb内存,不太清楚为什么会走indexHashJoin

1赞

方便贴下表lineitem的表结构吗?我看表lineitem走了二级索引(indexlookup,有个回表动作,和预期的不一致)
这是我这边的表结构:

1赞

image

1赞

没看不来什么不同

image

1赞

你测试的数据量是多少仓的?我这边弄个v4.0.15版本试试

10G

把集群升级到v4.0.16,再把查询内存阈值调到10G(set tidb_mem_quota_query = 10737418240;)

我这边的测试结果:
v4.0.15,把查询内存调大到10G,10分钟出不来结果;
将集群升级到v4.0.16,把查询内存阈值调大到10G,9秒出结果
So,我想这可能是v4.0.15优化器不足的地方:joy:

https://docs.pingcap.com/zh/tidb/v4.0/release-4.0.16

如果内存够、不想每次都去设置的话,可以调tidb-server参数 mem-quota-query


5.3我也试了一下,也会出现很慢的情况

1赞

这个参数我之前调过

不改查询内存试试?我在5.3上不改直接查,10秒;加到10G,1分钟出不来。。。:joy (这是BUG?)


有时候有用,有时候很慢,很玄学