varchar类型聚簇主键与region的start_key怎么换算

【 TiDB 使用环境】poc

【遇到的问题:问题现象及影响】
现有系统的主键都是uuid,发现最大的字母是f,最小的数字是0,我按最小值000 … 000,最大值fff … fff对表做split region,发现与region的start_key对不上。

【资源配置】

【附件:截图/日志/监控】



还有一个隐藏的table id和那个对应

聚簇主键表没有_tidb_rowid的,隐藏主键列式bigint,与region的key是一致的

你这个是varchar呀

聚簇索引表,主键作为key值的一部分,主键会按照从大到小的顺序进行排列吧, 感觉可以把uuid 提取出来按照从小到大的顺序排列起来,看看和region 的 start _key 是否一致

主键字段id 是varchar

参考文档:
https://docs.pingcap.com/zh/tidb/stable/sql-statement-split-region#split-table-region

均匀切分

由于 row_id 是整数,所以根据指定的 lower_valueupper_value 以及 region_num,可以推算出需要切分的 key。TiDB 先计算 step (step = (upper_value - lower_value)/region_num),然后在 lower_valueupper_value 之间每隔 step 区间切一次,最终切出 region_num 个 Region。

例如,对于表 t,如果想要从 minInt64~maxInt64 之间均匀切割出 16 个 Region,可以用以下语句:

SPLIT TABLE t BETWEEN (-9223372036854775808) AND (9223372036854775807) REGIONS 16;

该语句会把表 t 从 minInt64 到 maxInt64 之间均匀切割出 16 个 Region。如果已知主键的范围没有这么大,比如只会在 0~1000000000 之间,那可以用 0 和 1000000000 分别代替上面的 minInt64 和 maxInt64 来切分 Region。

SPLIT TABLE t BETWEEN (0) AND (1000000000) REGIONS 16;

不均匀切分

如果已知数据不是均匀分布的,比如想要 -inf ~ 10000 切一个 Region,10000 ~ 90000 切一个 Region,90000 ~ +inf 切一个 Region,可以通过手动指定点来切分 Region,示例如下:

SPLIT TABLE t BY (10000), (90000);

大家一直在强调varchar,我还以为varchar不能拆分呢。特意测试了下,可以的。

root@127.0.0.1:4000[hqh]>show create table t1;
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                                                                                                           |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| t1    | CREATE TABLE `t1` (
  `id` varchar(64) NOT NULL,
  `intkey` int(11) NOT NULL,
  `pad1` varbinary(1024) DEFAULT NULL,
  KEY `intkey` (`intkey`),
  PRIMARY KEY (`id`) /*T![clustered_index] NONCLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

root@127.0.0.1:4000[hqh]>
root@127.0.0.1:4000[hqh]>show table t1 regions;
+-----------+-----------+---------+-----------+-----------------+-------+------------+---------------+------------+----------------------+------------------+
| REGION_ID | START_KEY | END_KEY | LEADER_ID | LEADER_STORE_ID | PEERS | SCATTERING | WRITTEN_BYTES | READ_BYTES | APPROXIMATE_SIZE(MB) | APPROXIMATE_KEYS |
+-----------+-----------+---------+-----------+-----------------+-------+------------+---------------+------------+----------------------+------------------+
|      1003 | t_110_    |         |      1004 |               1 | 1004  |          0 |          2700 |       1355 |                    1 |                0 |
+-----------+-----------+---------+-----------+-----------------+-------+------------+---------------+------------+----------------------+------------------+
1 row in set (0.03 sec)
root@127.0.0.1:4000[hqh]>
root@127.0.0.1:4000[hqh]>SPLIT TABLE t1 BY (10000), (90000);
+--------------------+----------------------+
| TOTAL_SPLIT_REGION | SCATTER_FINISH_RATIO |
+--------------------+----------------------+
|                  2 |                    1 |
+--------------------+----------------------+
1 row in set (0.01 sec)

root@127.0.0.1:4000[hqh]>
root@127.0.0.1:4000[hqh]>show table t1 regions;
+-----------+---------------+---------------+-----------+-----------------+--------+------------+---------------+------------+----------------------+------------------+
| REGION_ID | START_KEY     | END_KEY       | LEADER_ID | LEADER_STORE_ID | PEERS  | SCATTERING | WRITTEN_BYTES | READ_BYTES | APPROXIMATE_SIZE(MB) | APPROXIMATE_KEYS |
+-----------+---------------+---------------+-----------+-----------------+--------+------------+---------------+------------+----------------------+------------------+
|    108003 | t_110_        | t_110_r_10000 |    108004 |               1 | 108004 |          0 |             0 |          0 |                    1 |                0 |
|    108005 | t_110_r_10000 | t_110_r_90000 |    108006 |               1 | 108006 |          0 |             0 |          0 |                    1 |                0 |
|      1003 | t_110_r_90000 |               |      1004 |               1 | 1004   |          0 |             0 |          0 |                    1 |                0 |
+-----------+---------------+---------------+-----------+-----------------+--------+------------+---------------+------------+----------------------+------------------+
3 rows in set (0.01 sec)

看他的表结构 主键是聚簇索引,如果是聚簇索引的话应该是没有row_id 的吧

可以split

root@1:4000[tidbdemo]>show create table hot_word\G
*************************** 1. row ***************************
       Table: hot_word
Create Table: CREATE TABLE `hot_word` (
  `id` varchar(64) NOT NULL,
  `word` varchar(1024) NOT NULL DEFAULT '',
  `create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `update_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
1 row in set (0.01 sec)

root@1:4000[tidbdemo]>show table hot_word regions;
+-----------+-----------+---------+-----------+-----------------+-------+------------+---------------+------------+----------------------+------------------+
| REGION_ID | START_KEY | END_KEY | LEADER_ID | LEADER_STORE_ID | PEERS | SCATTERING | WRITTEN_BYTES | READ_BYTES | APPROXIMATE_SIZE(MB) | APPROXIMATE_KEYS |
+-----------+-----------+---------+-----------+-----------------+-------+------------+---------------+------------+----------------------+------------------+
|         2 | t_188_    |         |         3 |               1 | 3     |          0 |             0 |          0 |                    1 |                0 |
+-----------+-----------+---------+-----------+-----------------+-------+------------+---------------+------------+----------------------+------------------+
1 row in set (0.02 sec)

root@1:4000[tidbdemo]>SPLIT TABLE hot_word BETWEEN (-9223372036854775808) AND (9223372036854775807) REGIONS 4;

+--------------------+----------------------+
| TOTAL_SPLIT_REGION | SCATTER_FINISH_RATIO |
+--------------------+----------------------+
|                  3 |                    1 |
+--------------------+----------------------+
1 row in set (2.18 sec)

root@1:4000[tidbdemo]>
root@1:4000[tidbdemo]>show table hot_word regions;
+-----------+----------------------------+----------------------------+-----------+-----------------+---------+------------+---------------+------------+----------------------+------------------+
| REGION_ID | START_KEY                  | END_KEY                    | LEADER_ID | LEADER_STORE_ID | PEERS   | SCATTERING | WRITTEN_BYTES | READ_BYTES | APPROXIMATE_SIZE(MB) | APPROXIMATE_KEYS |
+-----------+----------------------------+----------------------------+-----------+-----------------+---------+------------+---------------+------------+----------------------+------------------+
|   2556003 | t_188_                     | t_188_r_0130377232733435f1 |   2556004 |               1 | 2556004 |          0 |             0 |          0 |                    1 |                0 |
|   2556005 | t_188_r_0130377232733435f1 | t_188_r_013335b232b33534b0 |   2556006 |               1 | 2556006 |          0 |             0 |          0 |                    1 |                0 |
|   2556007 | t_188_r_013335b232b33534b0 | t_188_r_013633f232f336336f |   2556008 |               1 | 2556008 |          0 |             0 |          0 |                    1 |                0 |
|         2 | t_188_r_013633f232f336336f |                            |         3 |               1 | 3       |          0 |             0 |          0 |                    1 |                0 |
+-----------+----------------------------+----------------------------+-----------+-----------------+---------+------------+---------------+------------+----------------------+------------------+
4 rows in set (0.10 sec)

是的,没有_tidb_rowid隐藏列

你的有_tidb_rowid列吗

CLUSTERED 表是没有_tidb_rowid的

明白了 谢谢咖啡哥

https://docs.pingcap.com/zh/tidb/stable/sql-statement-split-region#均匀切分-1


应该是跟索引regin split原理差不多