TPC-H Q5 executes too slowly due to an optimizer problem

My setup is a single virtual machine with TiKV, PD, and TiDB co-deployed on one node; the OS has 62 GB of memory and 12 CPUs in total.
From your execution plan, it looks like you adjusted the tidb_mem_quota_query parameter, because at the default 1 GB the query spills to disk (see my plan for comparison), and once it spills TiDB becomes very slow; see my earlier posts "HashAgg operator still OOMs in non-parallel mode" and "Sort operator spilling to disk is too slow, hoping for product-level optimization". On the other hand, your three nodes are somewhat better provisioned, so TiKV also reads data concurrently more efficiently; my feeling is that if you ran the manually tuned execution plan it would be even faster.
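For reference, a minimal sketch of the session-level tuning I mean (the 8 GiB value is just an illustrative assumption, not something taken from either of our plans; size it to your workload):

-- tidb_mem_quota_query defaults to 1 GB in v6.5; raising it keeps HashAgg/Sort
-- in memory instead of spilling to disk. 8589934592 bytes = 8 GiB (assumed value).
SET SESSION tidb_mem_quota_query = 8589934592;
-- Verify the effective value before re-running TPC-H Q5:
SHOW VARIABLES LIKE 'tidb_mem_quota_query';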
All of my conclusions here are based on running each test repeatedly.
I also disabled the coprocessor cache by setting tikv-client.copr-cache.capacity-mb = 0, to avoid caching Region query results; in production a statement like this is rarely re-executed within a short window, so it would be unlikely to hit the cache anyway.
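If you want to double-check this on your side: it is a TiDB config item (changed in the config file or via tiup cluster edit-config plus a reload, not via a session variable), and assuming a v6.5 cluster its live value can be inspected from SQL, roughly like this:

-- Confirm the coprocessor cache is disabled on the tidb instance (capacity-mb = 0).
-- SHOW CONFIG reads the live configuration of each component.
SHOW CONFIG WHERE type = 'tidb' AND name LIKE 'tikv-client.copr-cache%';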
My machine and cluster configuration:

[tidb@host0 ~]$ tiup cluster display tidb-test
tiup is checking updates for component cluster ...
A new version of cluster is available:
   The latest version:         v1.12.3
   Local installed version:    v1.11.0
   Update current component:   tiup update cluster
   Update all components:      tiup update --all

Starting component `cluster`: /home/tidb/.tiup/components/cluster/v1.11.0/tiup-cluster display tidb-test
Cluster type:       tidb
Cluster name:       tidb-test
Cluster version:    v6.5.2
Deploy user:        tidb
SSH type:           builtin
Dashboard URL:      http://192.168.31.201:2379/dashboard
Grafana URL:        http://192.168.31.201:3000
ID                    Role          Host            Ports                            OS/Arch       Status   Data Dir                           Deploy Dir
--                    ----          ----            -----                            -------       ------   --------                           ----------
192.168.31.201:9093   alertmanager  192.168.31.201  9093/9094                        linux/x86_64  Up       /data/tidb-data/alertmanager-9093  /data/tidb-deploy/alertmanager-9093
192.168.31.201:3000   grafana       192.168.31.201  3000                             linux/x86_64  Up       -                                  /data/tidb-deploy/grafana-3000
192.168.31.201:2379   pd            192.168.31.201  2379/2380                        linux/x86_64  Up|L|UI  /data/tidb-data/pd-2379            /data/tidb-deploy/pd-2379
192.168.31.201:9090   prometheus    192.168.31.201  9090/12020                       linux/x86_64  Up       /data/tidb-data/prometheus-9090    /data/tidb-deploy/prometheus-9090
192.168.31.201:4000   tidb          192.168.31.201  4000/10080                       linux/x86_64  Up       -                                  /data/tidb-deploy/tidb-4000
192.168.31.201:9000   tiflash       192.168.31.201  9000/8123/3930/20170/20292/8234  linux/x86_64  Up       /data/tidb-data/tiflash-9000       /data/tidb-deploy/tiflash-9000
192.168.31.201:20160  tikv          192.168.31.201  20160/20180                      linux/x86_64  Up       /data/tidb-data/tikv-20160         /data/tidb-deploy/tikv-20160
Total nodes: 7
[tidb@host0 ~]$ tiup cluster exec tidb-test --command "free -g && lscpu" 2>/dev/null
+ [ Serial ] - SSHKeySet: privateKey=/home/tidb/.tiup/storage/cluster/clusters/tidb-test/ssh/id_rsa, publicKey=/home/tidb/.tiup/storage/cluster/clusters/tidb-test/ssh/id_rsa.pub
+ [Parallel] - UserSSH: user=tidb, host=192.168.31.201
+ [Parallel] - UserSSH: user=tidb, host=192.168.31.201
+ [Parallel] - UserSSH: user=tidb, host=192.168.31.201
+ [Parallel] - UserSSH: user=tidb, host=192.168.31.201
+ [Parallel] - UserSSH: user=tidb, host=192.168.31.201
+ [Parallel] - UserSSH: user=tidb, host=192.168.31.201
+ [Parallel] - UserSSH: user=tidb, host=192.168.31.201
+ [ Serial ] - Shell: host=192.168.31.201, sudo=false, command=`free -g && lscpu`
Run command on 192.168.31.201(sudo:false): free -g && lscpu
Outputs of free -g && lscpu on 192.168.31.201:22:
stdout:
              total        used        free      shared  buff/cache   available
Mem:             62          28          16           0          18          33
Swap:             1           1           0
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              12
On-line CPU(s) list: 0-11
Thread(s) per core:  1
Core(s) per socket:  2
Socket(s):           6
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               151
Model name:          12th Gen Intel(R) Core(TM) i7-12700
Stepping:            2
CPU MHz:             2112.001
BogoMIPS:            4224.00
Hypervisor vendor:   VMware
Virtualization type: full
L1d cache:           48K
L1i cache:           32K
L2 cache:            1280K
L3 cache:            25600K
NUMA node0 CPU(s):   0-11
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves avx_vnni arat umip pku ospke gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize flush_l1d arch_capabilities

In addition, I disabled TiFlash for these runs, since the point is to examine TiDB's own compute capability; after all, not every cluster has TiFlash deployed, and this statement is not that complex.
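For completeness, this is roughly how TiFlash can be kept out of the picture at the session level (a sketch; use GLOBAL scope instead if you want it cluster-wide):

-- Restrict the optimizer to TiKV (and TiDB-internal tables) so no TiFlash plan
-- is considered in this session; the default value also includes 'tiflash'.
SET SESSION tidb_isolation_read_engines = 'tidb,tikv';
-- Re-run Q5 and confirm via EXPLAIN ANALYZE that only TiKV reads appear in the plan.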