【求助】left join 查询耗时过久

【 TiDB 使用环境】业务测试
【 TiDB 版本】v5.4.1
【遇到的问题】left join 查询耗时过长
【复现路径】首次执行耗时 18s,查看慢日志发现统计信息过期,analyze 之后还是耗时 18s
【问题现象及影响】
驱动表数据量为 1200w,被驱动表数据量为 70w
TiKV 查询耗时 18s 左右,TiFlash 耗时 30s
查看执行计划发现,耗时都在 HashJoin 算子的 probe
查看系统资源,TiKV CPU 占用率不高,TiDB server CPU 使用率飙升至 2000% 以上
请问这个查询速度正常吗?

【附件】

请提供各个组件的 version 信息,如 cdc/tikv,可通过执行 cdc version/tikv-server --version 获取。

TiKV 执行计划如下

Projection_8 2304.50 6377 root time:28.3s, loops:9, Concurrency:8 yd_result.yd_material_summary.appid, yd_result.yd_material_summary.mid, yd_result.yd_material_summary.eid, yd_result.yd_material_summary.mtype, yd_result.yd_material_summary.valid_recdate, Column#168, Column#169, Column#170, Column#171, Column#172, Column#173, Column#174, Column#175, Column#176, Column#177 2.24 MB N/A
└─HashAgg_9 2304.50 6377 root time:28.3s, loops:9, partial_worker:{wall_time:28.26943855s, concurrency:8, task_num:96317, tot_wait:1m17.229505248s, tot_exec:2m28.810792245s, tot_time:3m46.142556324s, max:28.269358296s, p95:28.269358296s}, final_worker:{wall_time:28.291671228s, concurrency:8, task_num:64, tot_wait:3m46.140120196s, tot_exec:169.672801ms, tot_time:3m46.309897564s, max:28.291580971s, p95:28.291580971s} group by:Column#230, Column#231, Column#232, funcs:sum(Column#215)->Column#168, funcs:sum(Column#216)->Column#169, funcs:sum(Column#217)->Column#170, funcs:sum(Column#218)->Column#171, funcs:sum(Column#219)->Column#172, funcs:sum(Column#220)->Column#173, funcs:sum(Column#221)->Column#174, funcs:sum(Column#222)->Column#175, funcs:sum(Column#223)->Column#176, funcs:sum(Column#224)->Column#177, funcs:firstrow(Column#225)->yd_result.yd_material_summary.mid, funcs:firstrow(Column#226)->yd_result.yd_material_summary.eid, funcs:firstrow(Column#227)->yd_result.yd_material_summary.mtype, funcs:firstrow(Column#228)->yd_result.yd_material_summary.valid_recdate, funcs:firstrow(Column#229)->yd_result.yd_material_summary.appid 28.6 MB N/A
└─Projection_58 9788468.14 98624017 root time:27.9s, loops:96318, Concurrency:8 yd_result.yd_material_summary.payment, cast(yd_result.yd_material_summary.viewcount, decimal(10,0) BINARY)->Column#216, cast(yd_result.yd_material_summary.clickcount, decimal(10,0) BINARY)->Column#217, cast(yd_result.yd_material_summary.newlogincount, decimal(10,0) BINARY)->Column#218, cast(yd_result.yd_material_summary.regdevcount, decimal(10,0) BINARY)->Column#219, cast(yd_result.yd_material_summary.reglogincount, decimal(10,0) BINARY)->Column#220, yd_result.yd_material_summary.newuserincome_after, cast(yd_result.yd_material_summary.newuserpaypeople, decimal(10,0) BINARY)->Column#222, yd_result.yd_material_summary_ltv.ltv3income_after, yd_result.yd_material_summary_ltv.ltv7income_after, yd_result.yd_material_summary.mid, yd_result.yd_material_summary.eid, yd_result.yd_material_summary.mtype, yd_result.yd_material_summary.valid_recdate, yd_result.yd_material_summary.appid, yd_result.yd_material_summary.mid, yd_result.yd_material_summary.eid, yd_result.yd_material_summary.mtype 5.81 MB N/A
└─HashJoin_16 9788468.14 98624017 root time:27.6s, loops:96318, build_hash_table:{total:325.6ms, fetch:167.5ms, build:158.1ms}, probe:{concurrency:8, total:3m41.4s, max:28.3s, probe:3m38.8s, fetch:2.63s} inner join, equal:[eq(yd_result.yd_material_summary.mid, yd_result.yd_material_summary_ltv.mid) eq(yd_result.yd_material_summary.eid, yd_result.yd_material_summary_ltv.eid)] 62.5 MB 0 Bytes
├─IndexLookUp_35(Build) 446206.71 348412 root time:264.8ms, loops:342, index_task: {total_time: 76.6ms, fetch_handle: 74.2ms, build: 142.4µs, wait: 2.26ms}, table_task: {total_time: 618.8ms, num: 27, concurrency: 8} 62.1 MB N/A
│ ├─Selection_33(Build) 557179.65 471667 cop[tikv] time:8.65ms, loops:520, cop_task: {num: 2, max: 3.94ms, min: 3.74ms, avg: 3.84ms, p95: 3.94ms, rpc_num: 2, rpc_time: 7.59ms, copr_cache_hit_ratio: 1.00}, tikv_task:{proc max:1.37s, min:58ms, p80:1.37s, p95:1.37s, iters:733, tasks:2} or(or(eq(yd_result.yd_material_summary.appid, 301), eq(yd_result.yd_material_summary.appid, 180)), or(eq(yd_result.yd_material_summary.appid, 190), or(eq(yd_result.yd_material_summary.appid, 283), and(eq(yd_result.yd_material_summary.appid, 293), in(yd_result.yd_material_summary.gameid, 1639021042620310, 1639446704207520, 1639019434662730))))) N/A N/A
│ │ └─IndexRangeScan_31 703862.74 741026 cop[tikv] table:a, index:PRIMARY(recdate, gameid, osid, cid, aid, oid, mid, eid, site_set, pt, appid) tikv_task:{proc max:1.19s, min:51ms, p80:1.19s, p95:1.19s, iters:733, tasks:2} range:[2022-06-21,2022-07-21], keep order:false N/A N/A
│ └─Selection_34(Probe) 446206.71 348412 cop[tikv] time:152.7ms, loops:431, cop_task: {num: 111, max: 6.27ms, min: 955µs, avg: 1.91ms, p95: 3.76ms, max_proc_keys: 616, p95_proc_keys: 318, tot_proc: 19ms, tot_wait: 16ms, rpc_num: 111, rpc_time: 207.9ms, copr_cache_hit_ratio: 0.90}, tikv_task:{proc max:131ms, min:1ms, p80:37ms, p95:73ms, iters:945, tasks:111}, scan_detail: {total_process_keys: 2853, total_process_keys_size: 951723, total_keys: 2862, rocksdb: {delete_skipped_count: 0, key_skipped_count: 5546, block: {cache_hit_count: 110, read_count: 0, read_byte: 0 Bytes}}} eq(yd_result.yd_material_summary.mtype, 2) N/A N/A
│ └─TableRowIDScan_32 557179.65 471667 cop[tikv] table:a tikv_task:{proc max:130ms, min:1ms, p80:36ms, p95:71ms, iters:945, tasks:111} keep order:false N/A N/A
└─TableReader_41(Probe) 449258.69 449187 root time:59.5ms, loops:444, cop_task: {num: 8, max: 149.9ms, min: 50.5ms, avg: 118.5ms, p95: 149.9ms, rpc_num: 8, rpc_time: 947.8ms, copr_cache_hit_ratio: 0.00} data:Selection_40 44.6 MB N/A
└─Selection_40 449258.69 449187 cop[tiflash] tiflash_task:{proc max:33.5ms, min:21.8ms, p80:33.3ms, p95:33.5ms, iters:30, tasks:8, threads:8} eq(2, yd_result.yd_material_summary_ltv.mtype) N/A N/A
└─TableFullScan_39 714973.00 656836 cop[tiflash] table:b tiflash_task:{proc max:29.3ms, min:21.8ms, p80:29ms, p95:29.3ms, iters:30, tasks:8, threads:8} keep order:false N/A N/A

b表从TiFlash查完是449187行,a表从TiKV查完是348412行,JOIN完是98624017行,所以需要检查JOIN的字段是否存在多对多的情况导致笛卡尔积,尽管你最终的HashAgg把重复数据聚合掉了

感谢解答,之前没有看仔细,以为是900W,这个应该是开发的 SQL 有问题了,谢谢!