TiSpark效率问题

为提高效率,请提供以下信息,问题描述清晰能够更快得到解决:

【概述】 场景 + 问题概述

Spark 用TiSpark查询单条数据快,查询多条慢

   // 像这种,速度就很快
spark.sql(
  """
    |select * from dim4 where pemail ='lian.chongchong@jklmnopq.app'
    |""".stripMargin).show()
//速度慢
spark.sql(
  """
    |select * from dim4 where pemail in ('lian.chongchong@jklmnopq.app','murong.chongyu@8abcd.io')
    |""".stripMargin).show()

其中 dim4表,数据量2000万,pemail 建立了索引。TiFlash已开启

两条查询语句,效率差异很大,第一条查询语句,在3s左右,第二条查询语句,在执行了6分钟后,仍然未出结果

【 TiDB 版本】

TiDB v5.0.0
TiFlash v5.0.0
TiSpark 2.4.1
Spark V2.4.8


若提问为性能优化、故障排查类问题,请下载脚本运行。终端输出的打印结果,请务必全选并复制粘贴上传。

能麻烦先反馈下在 tidb 执行这两个 sql 的执行计划吗? 多谢。
explain anlayze sql

mysql> explain analyze select * from dim4 where pemail ='lian.chongchong@jklmnopq.app';
+--------------------------------+---------+---------+-----------+-------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+---------+------+
| id                             | estRows | actRows | task      | access object                             | execution info                                                                                                                                                                                                                                                                                                                                                                                                                                | operator info                                                                           | memory  | disk |
+--------------------------------+---------+---------+-----------+-------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+---------+------+
| IndexLookUp_13                 | 1.00    | 2       | root      |                                           | time:4.54ms, loops:2, index_task: {total_time: 1.2ms, fetch_handle: 1.19ms, build: 1.38µs, wait: 9.95µs}, table_task: {total_time: 9.39ms, num: 1, concurrency: 5}                                                                                                                                                                                                                                                                            |                                                                                         | 17.8 KB | N/A  |
| ├─IndexRangeScan_11(Build)     | 1.00    | 2       | cop[tikv] | table:dim4, index:idx_hello_email(pemail) | time:1.18ms, loops:3, cop_task: {num: 1, max: 1.17ms, proc_keys: 2, rpc_num: 1, rpc_time: 1.13ms, copr_cache_hit_ratio: 0.00}, tikv_task:{time:0s, loops:1}, scan_detail: {total_process_keys: 2, total_keys: 3, rocksdb: {delete_skipped_count: 0, key_skipped_count: 2, block: {cache_hit_count: 4, read_count: 0, read_byte: 0 Bytes}}}                                                                                                    | range:["lian.chongchong@jklmnopq.app","lian.chongchong@jklmnopq.app"], keep order:false | N/A     | N/A  |
| └─TableRowIDScan_12(Probe)     | 1.00    | 2       | cop[tikv] | table:dim4                                | time:3.02ms, loops:2, cop_task: {num: 2, max: 2.86ms, min: 1.16ms, avg: 2.01ms, p95: 2.86ms, max_proc_keys: 1, p95_proc_keys: 1, rpc_num: 2, rpc_time: 3.94ms, copr_cache_hit_ratio: 0.00}, tikv_task:{proc max:0s, min:0s, p80:0s, p95:0s, iters:2, tasks:2}, scan_detail: {total_process_keys: 2, total_keys: 2, rocksdb: {delete_skipped_count: 0, key_skipped_count: 0, block: {cache_hit_count: 12, read_count: 0, read_byte: 0 Bytes}}} | keep order:false                                                                        | N/A     | N/A  |
+--------------------------------+---------+---------+-----------+-------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+---------+------+
mysql> explain analyze select * from dim4 where pemail in ('lian.chongchong@jklmnopq.app','murong.chongyu@8abcd.io');
+--------------------------------+---------+---------+-----------+-------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+
| id                             | estRows | actRows | task      | access object                             | execution info                                                                                                                                                                                                                                                                                                                                                                                                                                                   | operator info                                                                                                                                  | memory  | disk |
+--------------------------------+---------+---------+-----------+-------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+
| IndexLookUp_13                 | 2.00    | 3       | root      |                                           | time:3ms, loops:2, index_task: {total_time: 1.07ms, fetch_handle: 1.06ms, build: 1.73µs, wait: 6.53µs}, table_task: {total_time: 7.39ms, num: 1, concurrency: 5}                                                                                                                                                                                                                                                                                                 |                                                                                                                                                | 17.8 KB | N/A  |
| ├─IndexRangeScan_11(Build)     | 2.00    | 3       | cop[tikv] | table:dim4, index:idx_hello_email(pemail) | time:1.06ms, loops:3, cop_task: {num: 2, max: 1.01ms, min: 596.6µs, avg: 805.4µs, p95: 1.01ms, max_proc_keys: 2, p95_proc_keys: 2, rpc_num: 2, rpc_time: 1.54ms, copr_cache_hit_ratio: 0.00}, tikv_task:{proc max:0s, min:0s, p80:0s, p95:0s, iters:2, tasks:2}, scan_detail: {total_process_keys: 3, total_keys: 5, rocksdb: {delete_skipped_count: 0, key_skipped_count: 3, block: {cache_hit_count: 6, read_count: 0, read_byte: 0 Bytes}}}                   | range:["lian.chongchong@jklmnopq.app","lian.chongchong@jklmnopq.app"], ["murong.chongyu@8abcd.io","murong.chongyu@8abcd.io"], keep order:false | N/A     | N/A  |
| └─TableRowIDScan_12(Probe)     | 2.00    | 3       | cop[tikv] | table:dim4                                | time:1.63ms, loops:2, cop_task: {num: 2, max: 1.57ms, min: 642.6µs, avg: 1.1ms, p95: 1.57ms, max_proc_keys: 2, p95_proc_keys: 2, tot_proc: 1ms, rpc_num: 2, rpc_time: 2.14ms, copr_cache_hit_ratio: 0.00}, tikv_task:{proc max:1ms, min:0s, p80:1ms, p95:1ms, iters:2, tasks:2}, scan_detail: {total_process_keys: 3, total_keys: 3, rocksdb: {delete_skipped_count: 0, key_skipped_count: 0, block: {cache_hit_count: 15, read_count: 0, read_byte: 0 Bytes}}}  | keep order:false                                                                                                                               | N/A     | N/A  |
+--------------------------------+---------+---------+-----------+-------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+
3 rows in set (0.01 sec)

如上,感谢:pray:

这个表是否已经有tiflash 副本,麻烦强制指定走 tiflash 看看执行计划,多谢。

select /*+ read_from_storage(tiflash[table_name]) */ ... from table_name;

https://docs.pingcap.com/zh/tidb/stable/use-tiflash#手工-hint

在TiDB里,两条SQL都能快速执行,但用TiSpark,其中一条,就一直无法出结果

    val spark = SparkSession.builder()
      .appName("test")
      .config("spark.sql.extensions", "org.apache.spark.sql.TiExtensions")
      .config("spark.tispark.pd.addresses", "192.168.1.1:2379")
      .getOrCreate()

    spark.sql("use test")

    spark.sql(
      """
        |select * from dim4
        |""".stripMargin).show

    spark.sql(
      """
        |select * from t1 where pemail in ('lian.chongchong@jklmnopq.app' ,'murong.chongyu@8abcd.io')
        |""".stripMargin).show


    spark.stop()

pom文件

<dependencies>
    <dependency>
      <groupId>com.pingcap.tispark</groupId>
      <artifactId>tispark-assembly</artifactId>
      <version>2.4.1</version>
    </dependency>
</dependencies>
mysql> explain analyze select /*+ read_from_storage(tiflash[dim4]) */ * from dim4 where pemail ='lian.chongchong@jklmnopq.app';
+--------------------------------+---------+---------+-----------+-------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+---------+------+
| id                             | estRows | actRows | task      | access object                             | execution info                                                                                                                                                                                                                                                                                                                                                                                                                                  | operator info                                                                           | memory  | disk |
+--------------------------------+---------+---------+-----------+-------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+---------+------+
| IndexLookUp_13                 | 1.00    | 2       | root      |                                           | time:2.89ms, loops:2, index_task: {total_time: 769.9µs, fetch_handle: 760.9µs, build: 1.36µs, wait: 7.69µs}, table_task: {total_time: 6.07ms, num: 1, concurrency: 5}                                                                                                                                                                                                                                                                           |                                                                                         | 17.8 KB | N/A  |
| ├─IndexRangeScan_11(Build)     | 1.00    | 2       | cop[tikv] | table:dim4, index:idx_hello_email(pemail) | time:751.7µs, loops:3, cop_task: {num: 1, max: 694µs, proc_keys: 2, rpc_num: 1, rpc_time: 650.1µs, copr_cache_hit_ratio: 0.00}, tikv_task:{time:0s, loops:1}, scan_detail: {total_process_keys: 2, total_keys: 3, rocksdb: {delete_skipped_count: 0, key_skipped_count: 2, block: {cache_hit_count: 4, read_count: 0, read_byte: 0 Bytes}}}                                                                                                     | range:["lian.chongchong@jklmnopq.app","lian.chongchong@jklmnopq.app"], keep order:false | N/A     | N/A  |
| └─TableRowIDScan_12(Probe)     | 1.00    | 2       | cop[tikv] | table:dim4                                | time:1.81ms, loops:2, cop_task: {num: 2, max: 1.65ms, min: 682.3µs, avg: 1.17ms, p95: 1.65ms, max_proc_keys: 1, p95_proc_keys: 1, rpc_num: 2, rpc_time: 2.27ms, copr_cache_hit_ratio: 0.00}, tikv_task:{proc max:0s, min:0s, p80:0s, p95:0s, iters:2, tasks:2}, scan_detail: {total_process_keys: 2, total_keys: 2, rocksdb: {delete_skipped_count: 0, key_skipped_count: 0, block: {cache_hit_count: 12, read_count: 0, read_byte: 0 Bytes}}}  | keep order:false                                                                        | N/A     | N/A  |
+--------------------------------+---------+---------+-----------+-------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+---------+------+
3 rows in set (0.01 sec)

mysql> explain analyze select /*+ read_from_storage(tiflash[dim4]) */ * from dim4 where pemail in ('lian.chongchong@jklmnopq.app','murong.chongyu@8abcd.io');
+--------------------------------+---------+---------+-----------+-------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+
| id                             | estRows | actRows | task      | access object                             | execution info                                                                                                                                                                                                                                                                                                                                                                                                                               | operator info                                                                                                                                  | memory  | disk |
+--------------------------------+---------+---------+-----------+-------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+
| IndexLookUp_13                 | 2.00    | 3       | root      |                                           | time:5.84ms, loops:2, index_task: {total_time: 2.09ms, fetch_handle: 2.08ms, build: 1.11µs, wait: 6.21µs}, table_task: {total_time: 14.3ms, num: 1, concurrency: 5}                                                                                                                                                                                                                                                                          |                                                                                                                                                | 17.8 KB | N/A  |
| ├─IndexRangeScan_11(Build)     | 2.00    | 3       | cop[tikv] | table:dim4, index:idx_hello_email(pemail) | time:2.07ms, loops:3, cop_task: {num: 2, max: 2.02ms, min: 1.65ms, avg: 1.83ms, p95: 2.02ms, max_proc_keys: 2, p95_proc_keys: 2, rpc_num: 2, rpc_time: 3.6ms, copr_cache_hit_ratio: 0.00}, tikv_task:{proc max:0s, min:0s, p80:0s, p95:0s, iters:2, tasks:2}, scan_detail: {total_process_keys: 3, total_keys: 5, rocksdb: {delete_skipped_count: 0, key_skipped_count: 3, block: {cache_hit_count: 6, read_count: 0, read_byte: 0 Bytes}}}  | range:["lian.chongchong@jklmnopq.app","lian.chongchong@jklmnopq.app"], ["murong.chongyu@8abcd.io","murong.chongyu@8abcd.io"], keep order:false | N/A     | N/A  |
| └─TableRowIDScan_12(Probe)     | 2.00    | 3       | cop[tikv] | table:dim4                                | time:3.48ms, loops:2, cop_task: {num: 2, max: 3.33ms, min: 1.04ms, avg: 2.18ms, p95: 3.33ms, max_proc_keys: 2, p95_proc_keys: 2, rpc_num: 2, rpc_time: 4.3ms, copr_cache_hit_ratio: 0.00}, tikv_task:{proc max:0s, min:0s, p80:0s, p95:0s, iters:2, tasks:2}, scan_detail: {total_process_keys: 3, total_keys: 3, rocksdb: {delete_skipped_count: 0, key_skipped_count: 0, block: {cache_hit_count: 15, read_count: 0, read_byte: 0 Bytes}}} | keep order:false                                                                                                                               | N/A     | N/A  |
+--------------------------------+---------+---------+-----------+-------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+
3 rows in set (0.00 sec)
  1. 您好,目前 TiSpark 不支持下推 in 操作,所以查询会比较慢。可以看到执行使用 tidb-server 查询时间都很短。
  2. 可以在 建议-反馈 版块 提交需求,多谢。
  3. 或者能否改写sql,使用 union+ where 条件,速度应该会快很多

好的,谢谢答疑