How do I use shard_row_id_bits together with pre_split_regions to distribute Region storage size evenly?

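Here is my problem. I followed the hint in the official documentation: when using a table with shard_row_id_bits, if you want the Regions to be split evenly at table creation time, consider using it together with pre_split_regions.

1. I created the table with SHARD_ROW_ID_BITS=5 PRE_SPLIT_REGIONS=5, that is, split into 16 row-data Regions, with SHARD_ROW_ID_BITS=5 scattering the row IDs randomly. (Note that my table has no primary key and has two unique indexes.)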
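To make the arithmetic behind these two options concrete, here is a minimal sketch of how the row-ID space is divided. It relies on the documented behavior that PRE_SPLIT_REGIONS=n pre-splits a table into 2^n row-data Regions, so a value of 5 corresponds to 32 Regions (16 would correspond to a value of 4). The function names are hypothetical, and the shard value is passed in as a plain argument here, whereas TiDB chooses it internally.

```python
from __future__ import annotations


def pre_split_boundaries(shard_row_id_bits: int, pre_split_regions: int) -> list[int]:
    """Row-ID split points that yield 2**pre_split_regions row-data Regions."""
    if not 0 <= pre_split_regions <= shard_row_id_bits:
        raise ValueError("PRE_SPLIT_REGIONS must be between 0 and SHARD_ROW_ID_BITS")
    # Bit 63 is the sign bit, so the usable row-ID space has 63 bits.
    step = 1 << (63 - pre_split_regions)
    return [i * step for i in range(1, 1 << pre_split_regions)]


def compose_row_id(shard: int, seq: int, shard_row_id_bits: int) -> int:
    """Put the shard just below the sign bit and the sequence number in the low bits.

    Assumption: this models the layout of the implicit row ID; it is not TiDB code.
    """
    low_bits = 63 - shard_row_id_bits
    if not 0 <= shard < (1 << shard_row_id_bits):
        raise ValueError("shard does not fit in SHARD_ROW_ID_BITS")
    if not 0 <= seq < (1 << low_bits):
        raise ValueError("sequence number does not fit in the remaining bits")
    return (shard << low_bits) | seq


def region_index(row_id: int, pre_split_regions: int) -> int:
    """Index of the pre-split Region that a non-negative row ID falls into."""
    return row_id >> (63 - pre_split_regions)


if __name__ == "__main__":
    bounds = pre_split_boundaries(shard_row_id_bits=5, pre_split_regions=5)
    print("row-data Regions:", len(bounds) + 1)  # 32
    rid = compose_row_id(shard=3, seq=100, shard_row_id_bits=5)
    print("shard 3 lands in Region index", region_index(rid, 5))  # 3
```

With both options set to 5, each shard value maps to exactly one pre-split Region, so how evenly the Regions fill up depends on how evenly the shard values are chosen across inserts.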

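2. I then inserted 500,000 rows into this table and checked how much data each Region stores. The first Region holds noticeably more data than the others. How can the data be distributed evenly? The Regions look like this:

region id: 17385 cf default region size: 71.733 MB cf write region size: 28.358 MB cf lock region size: 0 B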

region id: 17293 cf default region size: 80.891 MB cf write region size: 828920 B cf lock region size: 0 B

region id: 17381 cf default region size: 113.025 MB cf write region size: 1.105 MB cf lock region size: 0 B

region id: 17297 cf default region size: 75.509 MB cf write region size: 773766 B cf lock region size: 0 B

region id: 17397 cf default region size: 95.067 MB cf write region size: 974188 B cf lock region size: 0 B

region id: 17301 cf default region size: 57.557 MB cf write region size: 589812 B cf lock region size: 0 B

region id: 17373 cf default region size: 104.045 MB cf write region size: 1.017 MB cf lock region size: 0 B

region id: 17305 cf default region size: 48.579 MB cf write region size: 497812 B cf lock region size: 0 B

region id: 17401 cf default region size: 95.067 MB cf write region size: 974188 B cf lock region size: 0 B

region id: 17365 cf default region size: 71.823 MB cf write region size: 736000 B cf lock region size: 0 B

region id: 17309 cf default region size: 48.579 MB cf write region size: 497812 B cf lock region size: 0 B

region id: 17313 cf default region size: 125.690 MB cf write region size: 1.228 MB cf lock region size: 0 B

region id: 17317 cf default region size: 107.735 MB cf write region size: 1.053 MB cf lock region size: 0 B

region id: 17369 cf default region size: 122.001 MB cf write region size: 1.192 MB cf lock region size: 0 B

region id: 17321 cf default region size: 75.518 MB cf write region size: 773858 B cf lock region size: 0 B

region id: 17389 cf default region size: 113.023 MB cf write region size: 1.105 MB cf lock region size: 0 B

region id: 17325 cf default region size: 66.535 MB cf write region size: 681812 B cf lock region size: 0 B

region id: 17333 cf default region size: 71.823 MB cf write region size: 736000 B cf lock region size: 0 B

region id: 17337 cf default region size: 107.735 MB cf write region size: 1.053 MB cf lock region size: 0 B

region id: 17393 cf default region size: 95.067 MB cf write region size: 974188 B cf lock region size: 0 B

region id: 17341 cf default region size: 48.579 MB cf write region size: 497812 B cf lock region size: 0 B

region id: 17405 cf default region size: 95.067 MB cf write region size: 974188 B cf lock region size: 0 B

region id: 17345 cf default region size: 48.579 MB cf write region size: 497812 B cf lock region size: 0 B

region id: 17349 cf default region size: 107.733 MB cf write region size: 1.053 MB cf lock region size: 0 B

region id: 17377 cf default region size: 113.024 MB cf write region size: 1.105 MB cf lock region size: 0 B

region id: 17161 cf default region size: 84.486 MB cf write region size: 865766 B cf lock region size: 0 B
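To put a number on how uneven the listing above is, the following sketch summarizes the cf default sizes. The values are copied from the Region listing in this post; the function name and the choice of statistics are assumptions made for illustration, and the sketch covers cf default only (the cf write size of 28.358 MB on Region 17385 is the other visible outlier).

```python
from statistics import mean, pstdev

# cf default sizes in MB, in the order the Regions are listed in the post.
DEFAULT_CF_MB = [
    71.733, 80.891, 113.025, 75.509, 95.067, 57.557, 104.045, 48.579,
    95.067, 71.823, 48.579, 125.690, 107.735, 122.001, 75.518, 113.023,
    66.535, 71.823, 107.735, 95.067, 48.579, 95.067, 48.579, 107.733,
    113.024, 84.486,
]


def summarize(sizes):
    """Return simple spread statistics for a list of Region sizes."""
    avg = mean(sizes)
    return {
        "count": len(sizes),
        "min": min(sizes),
        "max": max(sizes),
        "mean": avg,
        "max_over_min": max(sizes) / min(sizes),
        # Coefficient of variation: population standard deviation over mean.
        "cv": pstdev(sizes) / avg,
    }


if __name__ == "__main__":
    for name, value in summarize(DEFAULT_CF_MB).items():
        print(f"{name}: {value:.3f}")
```

Running the same summary again after a manual split or after PD has had time to rebalance gives a simple before-and-after comparison.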

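This is probably caused by the data itself not being evenly distributed. Take a look at the following document and split the Regions in a way that suits your data: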

https://pingcap.com/docs-cn/stable/reference/sql/statements/split-region/#split-region-使用文档

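Question 1: We have read this document many times. What still puzzles us is this: why is the Region distribution different on each of the three TiKV nodes?

Question 2: If I create a table and specify 8 or 16 Regions, how are those Regions assigned to the three TiKV nodes? Is there a parameter that places them on specific TiKV nodes, or are they assigned to the three nodes at random?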

Region scheduling is currently handled by PD, which places Regions according to its scheduling policies:

https://pingcap.com/docs-cn/stable/reference/best-practices/pd-scheduling/

If you need to schedule Regions manually, you can do so with pd-ctl:

https://pingcap.com/docs-cn/stable/reference/tools/pd-control/#scheduler-show--add--remove