聚簇索引在分区表上未生效

【 TiDB 使用环境】生产
【 TiDB 版本】v6.5.0
【复现路径】
使用如下两张表进行测试

CREATE TABLE `test_nonclustered` (
  `day` date NOT NULL,
  `partner_org_id` varchar(60) NOT NULL,
  `partner_name` text DEFAULT NULL,
  `customer_org_id` varchar(60) NOT NULL,
  `license_id` varchar(255) NOT NULL,
  PRIMARY KEY (`day`,`partner_org_id`,`customer_org_id`,`license_id`) /*T![clustered_index] NONCLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
PARTITION BY RANGE (WEEKDAY(`day`))
(PARTITION `p0` VALUES LESS THAN (1),
 PARTITION `p1` VALUES LESS THAN (2),
 PARTITION `p2` VALUES LESS THAN (3),
 PARTITION `p3` VALUES LESS THAN (4),
 PARTITION `p4` VALUES LESS THAN (5),
 PARTITION `p5` VALUES LESS THAN (6),
 PARTITION `p6` VALUES LESS THAN (7));

CREATE TABLE `test_clustered` (
  `day` date NOT NULL,
  `partner_org_id` varchar(60) NOT NULL,
  `partner_name` text DEFAULT NULL,
  `customer_org_id` varchar(60) NOT NULL,
  `license_id` varchar(255) NOT NULL,
  PRIMARY KEY (`day`,`partner_org_id`,`customer_org_id`,`license_id`) /*T![clustered_index] CLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
PARTITION BY RANGE (WEEKDAY(`day`))
(PARTITION `p0` VALUES LESS THAN (1),
 PARTITION `p1` VALUES LESS THAN (2),
 PARTITION `p2` VALUES LESS THAN (3),
 PARTITION `p3` VALUES LESS THAN (4),
 PARTITION `p4` VALUES LESS THAN (5),
 PARTITION `p5` VALUES LESS THAN (6),
 PARTITION `p6` VALUES LESS THAN (7));

INSERT into test_nonclustered (day,partner_org_id,partner_name,customer_org_id,license_id) VALUES
("2020-01-01","1","partner1","customer1","license1")
...
INSERT into test_nonclustered (day,partner_org_id,partner_name,customer_org_id,license_id) VALUES
("2020-01-01","1","partner1","customer1","license1")
...
(数据插入语句见附件)

Data (31.8 KB)
【遇到的问题:问题现象及影响】
在分区表上,如果不使用/*+ USE_INDEX(tn, PRIMARY) */来做指定索引,非聚簇索引表也不会使用索引,而对于聚簇索引表,无论是否指定,都无法走索引,直接扫描全表
这个在数据量较大时很麻烦(问题来源),所以想问一下是否有优化方法?

explain select /*+ USE_INDEX(tn, PRIMARY) */ * from test_nonclustered tn order by day limit 10;
TopN_8	10.00	root		test.test_nonclustered.day, offset:0, count:10
└─IndexLookUp_15	10.00	root	partition:all	
  ├─Limit_14(Build)	10.00	cop[tikv]		offset:0, count:10
  │ └─IndexFullScan_12	1000.00	cop[tikv]	table:tn, index:PRIMARY(day, partner_org_id, customer_org_id, license_id)	keep order:false
  └─TableRowIDScan_13(Probe)	10.00	cop[tikv]	table:tn	keep order:false
explain select /*+ USE_INDEX(tn, PRIMARY) */ * from test_clustered order by day limit 10;
TopN_7	10.00	root		test.test_clustered.day, offset:0, count:10
└─TableReader_14	10.00	root	partition:all	data:Limit_13
  └─Limit_13	10.00	cop[tikv]		offset:0, count:10
    └─TableFullScan_12	1000.00	cop[tikv]	table:test_clustered	keep order:false

表创建完成之后,使用show create table 表名 确认一下自己创建的表具体是什么情况,然后再做判断 ,所以你把这两个表的show语句发出来,大家好做判断

show create table展示的语句和上面一样的,看起来创建中没有发生变化

CREATE TABLE `test_clustered` (
  `day` date NOT NULL,
  `partner_org_id` varchar(60) NOT NULL,
  `partner_name` text DEFAULT NULL,
  `customer_org_id` varchar(60) NOT NULL,
  `license_id` varchar(255) NOT NULL,
  PRIMARY KEY (`day`,`partner_org_id`,`customer_org_id`,`license_id`) /*T![clustered_index] CLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
PARTITION BY RANGE (WEEKDAY(`day`))
(PARTITION `p0` VALUES LESS THAN (1),
 PARTITION `p1` VALUES LESS THAN (2),
 PARTITION `p2` VALUES LESS THAN (3),
 PARTITION `p3` VALUES LESS THAN (4),
 PARTITION `p4` VALUES LESS THAN (5),
 PARTITION `p5` VALUES LESS THAN (6),
 PARTITION `p6` VALUES LESS THAN (7))

CREATE TABLE `test_nonclustered` (
  `day` date NOT NULL,
  `partner_org_id` varchar(60) NOT NULL,
  `partner_name` text DEFAULT NULL,
  `customer_org_id` varchar(60) NOT NULL,
  `license_id` varchar(255) NOT NULL,
  PRIMARY KEY (`day`,`partner_org_id`,`customer_org_id`,`license_id`) /*T![clustered_index] NONCLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
PARTITION BY RANGE (WEEKDAY(`day`))
(PARTITION `p0` VALUES LESS THAN (1),
 PARTITION `p1` VALUES LESS THAN (2),
 PARTITION `p2` VALUES LESS THAN (3),
 PARTITION `p3` VALUES LESS THAN (4),
 PARTITION `p4` VALUES LESS THAN (5),
 PARTITION `p5` VALUES LESS THAN (6),
 PARTITION `p6` VALUES LESS THAN (7))

1、非聚簇表不走索引时优化器的选择,如果你觉得走主键回表是最优的,你可以把两个执行计划发出来看下。
2、聚簇表不走主键索引的原因,是因为聚簇表本身就是按照主键组织数据的,表就是主键索引,主键索引就是表,正所谓“胡万就是麻匪,麻匪就是胡万”,所以聚簇表表扫描就是主键索引扫描。

1 个赞

生产数据量有多少呢,从当前测试数据看,执行计划式对的,表总共有5个列,4个都在主键,查找数据后还是要回表取全列数据哪就没有意义了,本身聚簇表的全表扫描也是根据主键,你可以查部分列式可以走索引的,还有我记得tidb的分区表都是本地索引(这个没找到资料,忘记在哪里看的了,希望大佬更正),你的日期是按日范围的那么在每个分区里的时间其实是无序的,索引索引的意义就没有

1 个赞

感谢解答,生产环境的数据量是3亿条左右,具体情况在这个问题中,这张表在order by查询时非常慢,基本无法查询,查看执行计划是扫描了全表,我理解因为聚簇索引的特性,所以TablFullScan就表示IndexFullScan,所以出现扫描全表实际上就表示走了索引

但我不太理解的是:索引应该表示数据是有序的,所以可以看到在v4.0.11中的查询计划中从7各分区中各自取了最大的前五个,然后在TiDB中进行比较

你的日期是按日范围的那么在每个分区里的时间其实是无序的,索引索引的意义就没有

v6.5.0中却直接扫描了全部的记录,那这个是否表示聚簇索引在分区内不再有序了?我理解即便是按照WEEKDAY分区,分区内还是可以保持有序的,如果不在有序的话应该需要特别说明,但是我没有看到聚簇索引的文档中提到过这一点

TopN_24 5.00  root    analytics.day:desc, offset:0, count:5
└─PartitionUnion_28 35.00 root    
  ├─Projection_44 5.00  root    analytics.day, analytics.partner_org_id, analytics.partner_name, analytics.customer_org_id, analytics.customer_name, analytics.offer_code, analytics.external_subscription_id, analytics.subscription_start_date, analytics.subscription_end_date, analytics.is_trial, analytics.is_auto_renew, analytics.available_licenses, analytics.assigned_licenses, analytics.actively_used_licenses, analytics.capacity, analytics.trial_duration, analytics.license_status, analytics.license_id, analytics.workload, analytics.daily_actively_used_licenses, analytics.weekly_actively_used_licenses, analytics.is_sa_display, analytics.decrypted_customer_name
  │ └─IndexLookUp_43  5.00  root    limit embedded(offset:0, count:5)
  │   ├─Limit_42(Build) 5.00  cop[tikv]   offset:0, count:5
  │   │ └─IndexFullScan_40  5.00  cop[tikv] table:subscription_analytics, partition:p0, index:PRIMARY(day, partner_org_id, customer_org_id, offer_code, external_subscription_id, license_id) keep order:true, desc
  │   └─TableRowIDScan_41(Probe)  5.00  cop[tikv] table:subscription_analytics, partition:p0  keep order:false, stats:pseudo
  ├─Projection_64 5.00  root    analytics.day, analytics.partner_org_id, analytics.partner_name, analytics.customer_org_id, analytics.customer_name, analytics.offer_code, analytics.external_subscription_id, analytics.subscription_start_date, analytics.subscription_end_date, analytics.is_trial, analytics.is_auto_renew, analytics.available_licenses, analytics.assigned_licenses, analytics.actively_used_licenses, analytics.capacity, analytics.trial_duration, analytics.license_status, analytics.license_id, analytics.workload, analytics.daily_actively_used_licenses, analytics.weekly_actively_used_licenses, analytics.is_sa_display, analytics.decrypted_customer_name
  │ └─IndexLookUp_63  5.00  root    limit embedded(offset:0, count:5)
  │   ├─Limit_62(Build) 5.00  cop[tikv]   offset:0, count:5
  │   │ └─IndexFullScan_60  5.00  cop[tikv] table:subscription_analytics, partition:p1, index:PRIMARY(day, partner_org_id, customer_org_id, offer_code, external_subscription_id, license_id) keep order:true, desc
  │   └─TableRowIDScan_61(Probe)  5.00  cop[tikv] table:subscription_analytics, partition:p1  keep order:false, stats:pseudo
  ├─Projection_84 5.00  root    analytics.day, analytics.partner_org_id, analytics.partner_name, analytics.customer_org_id, analytics.customer_name, analytics.offer_code, analytics.external_subscription_id, analytics.subscription_start_date, analytics.subscription_end_date, analytics.is_trial, analytics.is_auto_renew, analytics.available_licenses, analytics.assigned_licenses, analytics.actively_used_licenses, analytics.capacity, analytics.trial_duration, analytics.license_status, analytics.license_id, analytics.workload, analytics.daily_actively_used_licenses, analytics.weekly_actively_used_licenses, analytics.is_sa_display, analytics.decrypted_customer_name
  │ └─IndexLookUp_83  5.00  root    limit embedded(offset:0, count:5)
  │   ├─Limit_82(Build) 5.00  cop[tikv]   offset:0, count:5
  │   │ └─IndexFullScan_80  5.00  cop[tikv] table:subscription_analytics, partition:p2, index:PRIMARY(day, partner_org_id, customer_org_id, offer_code, external_subscription_id, license_id) keep order:true, desc
  │   └─TableRowIDScan_81(Probe)  5.00  cop[tikv] table:subscription_analytics, partition:p2  keep order:false, stats:pseudo
  ├─Projection_104  5.00  root    analytics.day, analytics.partner_org_id, analytics.partner_name, analytics.customer_org_id, analytics.customer_name, analytics.offer_code, analytics.external_subscription_id, analytics.subscription_start_date, analytics.subscription_end_date, analytics.is_trial, analytics.is_auto_renew, analytics.available_licenses, analytics.assigned_licenses, analytics.actively_used_licenses, analytics.capacity, analytics.trial_duration, analytics.license_status, analytics.license_id, analytics.workload, analytics.daily_actively_used_licenses, analytics.weekly_actively_used_licenses, analytics.is_sa_display, analytics.decrypted_customer_name
  │ └─IndexLookUp_103 5.00  root    limit embedded(offset:0, count:5)
  │   ├─Limit_102(Build)  5.00  cop[tikv]   offset:0, count:5
  │   │ └─IndexFullScan_100 5.00  cop[tikv] table:subscription_analytics, partition:p3, index:PRIMARY(day, partner_org_id, customer_org_id, offer_code, external_subscription_id, license_id) keep order:true, desc
  │   └─TableRowIDScan_101(Probe) 5.00  cop[tikv] table:subscription_analytics, partition:p3  keep order:false, stats:pseudo
  ├─Projection_124  5.00  root    analytics.day, analytics.partner_org_id, analytics.partner_name, analytics.customer_org_id, analytics.customer_name, analytics.offer_code, analytics.external_subscription_id, analytics.subscription_start_date, analytics.subscription_end_date, analytics.is_trial, analytics.is_auto_renew, analytics.available_licenses, analytics.assigned_licenses, analytics.actively_used_licenses, analytics.capacity, analytics.trial_duration, analytics.license_status, analytics.license_id, analytics.workload, analytics.daily_actively_used_licenses, analytics.weekly_actively_used_licenses, analytics.is_sa_display, analytics.decrypted_customer_name
  │ └─IndexLookUp_123 5.00  root    limit embedded(offset:0, count:5)
  │   ├─Limit_122(Build)  5.00  cop[tikv]   offset:0, count:5
  │   │ └─IndexFullScan_120 5.00  cop[tikv] table:subscription_analytics, partition:p4, index:PRIMARY(day, partner_org_id, customer_org_id, offer_code, external_subscription_id, license_id) keep order:true, desc
  │   └─TableRowIDScan_121(Probe) 5.00  cop[tikv] table:subscription_analytics, partition:p4  keep order:false, stats:pseudo
  ├─Projection_144  5.00  root    analytics.day, analytics.partner_org_id, analytics.partner_name, analytics.customer_org_id, analytics.customer_name, analytics.offer_code, analytics.external_subscription_id, analytics.subscription_start_date, analytics.subscription_end_date, analytics.is_trial, analytics.is_auto_renew, analytics.available_licenses, analytics.assigned_licenses, analytics.actively_used_licenses, analytics.capacity, analytics.trial_duration, analytics.license_status, analytics.license_id, analytics.workload, analytics.daily_actively_used_licenses, analytics.weekly_actively_used_licenses, analytics.is_sa_display, analytics.decrypted_customer_name
  │ └─IndexLookUp_143 5.00  root    limit embedded(offset:0, count:5)
  │   ├─Limit_142(Build)  5.00  cop[tikv]   offset:0, count:5
  │   │ └─IndexFullScan_140 5.00  cop[tikv] table:subscription_analytics, partition:p5, index:PRIMARY(day, partner_org_id, customer_org_id, offer_code, external_subscription_id, license_id) keep order:true, desc
  │   └─TableRowIDScan_141(Probe) 5.00  cop[tikv] table:subscription_analytics, partition:p5  keep order:false, stats:pseudo
  └─Projection_164  5.00  root    analytics.day, analytics.partner_org_id, analytics.partner_name, analytics.customer_org_id, analytics.customer_name, analytics.offer_code, analytics.external_subscription_id, analytics.subscription_start_date, analytics.subscription_end_date, analytics.is_trial, analytics.is_auto_renew, analytics.available_licenses, analytics.assigned_licenses, analytics.actively_used_licenses, analytics.capacity, analytics.trial_duration, analytics.license_status, analytics.license_id, analytics.workload, analytics.daily_actively_used_licenses, analytics.weekly_actively_used_licenses, analytics.is_sa_display, analytics.decrypted_customer_name
    └─IndexLookUp_163 5.00  root    limit embedded(offset:0, count:5)
      ├─Limit_162(Build)  5.00  cop[tikv]   offset:0, count:5
      │ └─IndexFullScan_160 5.00  cop[tikv] table:subscription_analytics, partition:p6, index:PRIMARY(day, partner_org_id, customer_org_id, offer_code, external_subscription_id, license_id) keep order:true, desc
      └─TableRowIDScan_161(Probe) 5.00  cop[tikv] table:subscription_analytics, partition:p6  keep order:false, stats:pseudo

是的,聚簇索引的TableFullScan就等于IndexFullScan,这个我现在理解了
但是我不理解的是在这个问题中,同样的分区条件下,v4.0.11可以从7个分区中获取最大的5个来比较快速输出结果,但v6.5.0就扫描了全部的记录

我理解聚簇索引也应该是有序的,这样同样可以在7个分区内获取最大的5个来进行比较,但目前来看这个失效了,是因为聚簇索引在分区内并非有序吗?

聚簇索引是有序的,由于你新集群上跑半小时都跑不完,你可以跑下SQL,然后用explain for connection看下执行计划情况,看能有啥信息不。

感谢回复,我在 索引失效 中做了一个测试,看起来聚簇索引在分区内确实是失效了,actRows和全量一致,确实是扫描了全表,而非取出前几个交给上游合并

这个是否说明聚簇索引在分区内失效了?否则如果聚簇索引在分区内有序,那actRows应该和非聚簇索引一致才对

你要用索引必须带 周的条件 比如week=3

这么对比的话我感觉执行计划确实有点问题,我用你的测试表跟数据测试了下,当查询列只要包含(partner_name)没有在主键的列,就会走全表扫描,estrow就是全分区数据,不太科学,主键就是day的排序,可以走上主键索引才对