varchar类型聚簇主键与region的start_key怎么换算

参考文档:
https://docs.pingcap.com/zh/tidb/stable/sql-statement-split-region#split-table-region

均匀切分

由于 row_id 是整数,所以根据指定的 lower_valueupper_value 以及 region_num,可以推算出需要切分的 key。TiDB 先计算 step (step = (upper_value - lower_value)/region_num),然后在 lower_valueupper_value 之间每隔 step 区间切一次,最终切出 region_num 个 Region。

例如,对于表 t,如果想要从 minInt64~maxInt64 之间均匀切割出 16 个 Region,可以用以下语句:

SPLIT TABLE t BETWEEN (-9223372036854775808) AND (9223372036854775807) REGIONS 16;

该语句会把表 t 从 minInt64 到 maxInt64 之间均匀切割出 16 个 Region。如果已知主键的范围没有这么大,比如只会在 0~1000000000 之间,那可以用 0 和 1000000000 分别代替上面的 minInt64 和 maxInt64 来切分 Region。

SPLIT TABLE t BETWEEN (0) AND (1000000000) REGIONS 16;

不均匀切分

如果已知数据不是均匀分布的,比如想要 -inf ~ 10000 切一个 Region,10000 ~ 90000 切一个 Region,90000 ~ +inf 切一个 Region,可以通过手动指定点来切分 Region,示例如下:

SPLIT TABLE t BY (10000), (90000);

大家一直在强调varchar,我还以为varchar不能拆分呢。特意测试了下,可以的。

root@127.0.0.1:4000[hqh]>show create table t1;
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                                                                                                           |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| t1    | CREATE TABLE `t1` (
  `id` varchar(64) NOT NULL,
  `intkey` int(11) NOT NULL,
  `pad1` varbinary(1024) DEFAULT NULL,
  KEY `intkey` (`intkey`),
  PRIMARY KEY (`id`) /*T![clustered_index] NONCLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

root@127.0.0.1:4000[hqh]>
root@127.0.0.1:4000[hqh]>show table t1 regions;
+-----------+-----------+---------+-----------+-----------------+-------+------------+---------------+------------+----------------------+------------------+
| REGION_ID | START_KEY | END_KEY | LEADER_ID | LEADER_STORE_ID | PEERS | SCATTERING | WRITTEN_BYTES | READ_BYTES | APPROXIMATE_SIZE(MB) | APPROXIMATE_KEYS |
+-----------+-----------+---------+-----------+-----------------+-------+------------+---------------+------------+----------------------+------------------+
|      1003 | t_110_    |         |      1004 |               1 | 1004  |          0 |          2700 |       1355 |                    1 |                0 |
+-----------+-----------+---------+-----------+-----------------+-------+------------+---------------+------------+----------------------+------------------+
1 row in set (0.03 sec)
root@127.0.0.1:4000[hqh]>
root@127.0.0.1:4000[hqh]>SPLIT TABLE t1 BY (10000), (90000);
+--------------------+----------------------+
| TOTAL_SPLIT_REGION | SCATTER_FINISH_RATIO |
+--------------------+----------------------+
|                  2 |                    1 |
+--------------------+----------------------+
1 row in set (0.01 sec)

root@127.0.0.1:4000[hqh]>
root@127.0.0.1:4000[hqh]>show table t1 regions;
+-----------+---------------+---------------+-----------+-----------------+--------+------------+---------------+------------+----------------------+------------------+
| REGION_ID | START_KEY     | END_KEY       | LEADER_ID | LEADER_STORE_ID | PEERS  | SCATTERING | WRITTEN_BYTES | READ_BYTES | APPROXIMATE_SIZE(MB) | APPROXIMATE_KEYS |
+-----------+---------------+---------------+-----------+-----------------+--------+------------+---------------+------------+----------------------+------------------+
|    108003 | t_110_        | t_110_r_10000 |    108004 |               1 | 108004 |          0 |             0 |          0 |                    1 |                0 |
|    108005 | t_110_r_10000 | t_110_r_90000 |    108006 |               1 | 108006 |          0 |             0 |          0 |                    1 |                0 |
|      1003 | t_110_r_90000 |               |      1004 |               1 | 1004   |          0 |             0 |          0 |                    1 |                0 |
+-----------+---------------+---------------+-----------+-----------------+--------+------------+---------------+------------+----------------------+------------------+
3 rows in set (0.01 sec)