TiDB Partition Table -ERROR 8216 (HY000): Invalid auto random: column id is not the integer primary key, or table is created with alter-primary-key enabled

【TiDB 版本】: v4.0.10

【问题描述】: 由於表格數據量大,欲遷移原本表最近14天數據到Partitioned Table, 創建分區表時出現
ERROR 8216 (HY000): Invalid auto random: column id is not the integer primary key, or table is created with alter-primary-key enabled

1. 是否分區表不能使用型態是BIGINT UNSIGNED AUTO_RANDOM欄位?
(我們有嘗試不使用AUTO_RANDOM 即可建立成功)

2. 如果是,我們如何從原始表搬移最近14天數據到Partitioned table ? 且仍需要此欄位有AUTO_RANDOM特性

原始表schema:
CREATE TABLE match_outcome_odds (
id bigint(20) unsigned NOT NULL /*T![auto_rand] AUTO_RANDOM(5) */ COMMENT ‘TIDB自動亂數產生不重複key’,
source varchar(20) NOT NULL ,
market_line_id varchar(200) NOT NULL ,
selection_id varchar(200) NOT NULL ,
match_id bigint(20) unsigned NOT NULL,
market_id smallint(5) unsigned NOT NULL’,
outcome_id mediumint(8) unsigned NOT NULL,
odds decimal(6,3) DEFAULT NULL ,
probability decimal(8,7) DEFAULT NULL ,
source_status tinyint(4) NOT NULL DEFAULT ‘-99’,
status tinyint(4) NOT NULL DEFAULT ‘-99’,
is_open tinyint(4) NOT NULL DEFAULT ‘0’,
source_created_at datetime DEFAULT CURRENT_TIMESTAMP COMMENT ‘source資料產生時間點’,
source_updated_at datetime DEFAULT CURRENT_TIMESTAMP COMMENT ‘source資料更新時間點’,
created_at datetime(6) DEFAULT CURRENT_TIMESTAMP(6) COMMENT ‘資料產生時間點’,
updated_at datetime(6) DEFAULT CURRENT_TIMESTAMP(6) ON UPDATE CURRENT_TIMESTAMP(6) COMMENT ‘資料最後修改時點’,
PRIMARY KEY (id),
KEY idx_match_id (match_id),
KEY idx_market_line_id (market_line_id),
UNIQUE KEY uk_selection_id_source (selection_id,source),
KEY idx_updated_market_id (source_updated_at,match_id,market_id,market_line_id,source)
);

分區表schema:
CREATE TABLE match_outcome_odds_history2 (
id bigint(20) unsigned NOT NULL AUTO_RANDOM COMMENT ‘TIDB自動亂數產生不重複key’,
source varchar(20) NOT NULL ,
selection_id varchar(200) NOT NULL,
match_id bigint(20) unsigned NOT NULL,
market_id smallint(5) unsigned NOT NULL,
market_line_id varchar(200) NOT NULL,
outcome_id mediumint(8) unsigned NOT NULL ,
odds decimal(6,3) DEFAULT NULL,
source_status varchar(20) NOT NULL ,
source_created_at datetime DEFAULT CURRENT_TIMESTAMP COMMENT ‘來源資料產生時間點’,
created_at datetime(6) DEFAULT CURRENT_TIMESTAMP(6) COMMENT ‘資料產生時間點’,
PRIMARY KEY (id,created_at),
KEY idx_match_id (match_id),
KEY idx_created_at (created_at)
) PARTITION BY RANGE (to_days(created_at) ) (
PARTITION p0 VALUES LESS THAN (to_days(‘2021-02-03’)),
PARTITION p1 VALUES LESS THAN (to_days(‘2021-02-04’)),
PARTITION p2 VALUES LESS THAN (to_days(‘2021-02-05’)),
PARTITION p3 VALUES LESS THAN (to_days(‘2021-02-06’)),
PARTITION p4 VALUES LESS THAN (to_days(‘2021-02-07’)),
PARTITION p5 VALUES LESS THAN (to_days(‘2021-02-08’)),
PARTITION p6 VALUES LESS THAN (to_days(‘2021-02-09’)),
PARTITION p7 VALUES LESS THAN (to_days(‘2021-02-10’)),
PARTITION p8 VALUES LESS THAN (to_days(‘2021-02-11’)),
PARTITION p9 VALUES LESS THAN (to_days(‘2021-02-12’)),
PARTITION p10 VALUES LESS THAN (to_days(‘2021-02-13’)),
PARTITION p11 VALUES LESS THAN (to_days(‘2021-02-14’)),
PARTITION p12 VALUES LESS THAN (to_days(‘2021-02-15’)),
PARTITION p13 VALUES LESS THAN (to_days(‘2021-02-16’)),
PARTITION p14 VALUES LESS THAN (to_days(‘2021-02-17’)),
PARTITION p15 VALUES LESS THAN (to_days(‘2021-02-18’)),
PARTITION p16 VALUES LESS THAN (to_days(‘2021-02-19’)),
PARTITION p17 VALUES LESS THAN (to_days(‘2021-02-20’)),
PARTITION p18 VALUES LESS THAN (to_days(‘2021-02-21’)),
PARTITION p19 VALUES LESS THAN (to_days(‘2021-02-22’)),
PARTITION p20 VALUES LESS THAN (to_days(‘2021-02-23’)),
PARTITION p21 VALUES LESS THAN (to_days(‘2021-02-24’)),
PARTITION p22 VALUES LESS THAN (to_days(‘2021-02-25’)),
PARTITION p23 VALUES LESS THAN (to_days(‘2021-02-26’)),
PARTITION p24 VALUES LESS THAN (to_days(‘2021-02-27’)),
PARTITION p25 VALUES LESS THAN (to_days(‘2021-02-28’)),
PARTITION p26 VALUES LESS THAN (MAXVALUE)
) ;

分区表可以使用 auto-random 功能,上面的报错原因应该是不满足 auto-random 的使用限制 ,表的主键是复合索引且包含了非整数列,可以参考官方文档的使用限制说明:
https://docs.pingcap.com/zh/tidb/stable/auto-random#使用限制

顧問好,

若我將主鍵中的created_at拿掉會不符合創建Partition table限制
而出現ERROR 1503 (HY000): A PRIMARY KEY must include all columns in the table’s partitioning function


在如下TiDB官方文件,若要切分Partition, 是的確要將要切割的欄位created_at放進PK做複合主鍵的

目前在分区表上使用 auto-random 确实有限制,这个还在优化中;使用 auto-random 本身也是为了打散热点,可以先考虑使用 SHARD_ROW_ID_BITS 和 PRE_SPLIT_REGIONS 来替代下。

1赞

顧問 你好,

感謝你的回覆!

由於上述提及AUTO_RANDOM與分區表限制互相牴觸關係

目前有想到兩個方案 如下:
想再請教一下顧問以下關於 表設計上是否產生熱點效能上的問題:

  1. 使用id(AUTO_INCREMENT)搭配 SHARD_ROW_ID_BITS = 4創建分區表
    => 若AUTO_INCREMENT與SHARD_ROW_ID_BITS = 4混用 則會有熱點產生嗎?

CREATE TABLE IF NOT EXISTS match_outcome_odds_history (
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
source varchar(20) NOT NULL,
selection_id varchar(200) NOT NULL,
match_id BIGINT UNSIGNED NOT NULL,
market_id SMALLINT UNSIGNED NOT NULL,
market_line_id varchar(200) NOT NULL,
outcome_id VARCHAR(50) NOT NULL,
odds DECIMAL(6,3),
source_status tinyint(4) NOT NULL DEFAULT -99,
status TINYINT NOT NULL DEFAULT -99,
is_open TINYINT NOT NULL DEFAULT 0,
source_created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
created_at DATETIME(6) DEFAULT CURRENT_TIMESTAMP(6),
PRIMARY KEY(id,created_at),
KEY idx_match_id (match_id),
KEY idx_market_line_id (market_line_id),
KEY idx_created_market_line_id (source_created_at, match_id, market_id, market_line_id, source)
) SHARD_ROW_ID_BITS = 4
PARTITION BY RANGE (to_days(created_at) ) (
PARTITION p0 VALUES LESS THAN (to_days(‘2021-01-20’)),
PARTITION p1 VALUES LESS THAN (to_days(‘2021-01-21’)),
PARTITION p2 VALUES LESS THAN (MAXVALUE)
) ;

  1. 大表不建立AUTO_INCREMENT PK,僅用SHARD_ROW_ID_BITS=4
    (1)這樣確認資料會分散 但沒有PK情況下 會對TiDB效能上造成影響嗎?(若查尋條件都會加index)
    (2)TiDB一定需要使用PK嗎?

CREATE TABLE IF NOT EXISTS match_outcome_odds_history (
source varchar(20) NOT NULL,
selection_id varchar(200) NOT NULL,
match_id BIGINT UNSIGNED NOT NULL,
market_id SMALLINT UNSIGNED NOT NULL,
market_line_id varchar(200) NOT NULL,
outcome_id VARCHAR(50) NOT NULL,
odds DECIMAL(6,3) ,
source_status tinyint(4) NOT NULL DEFAULT -99,
status TINYINT NOT NULL DEFAULT -99 ,
is_open TINYINT NOT NULL DEFAULT 0 ,
source_created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
created_at DATETIME(6) DEFAULT CURRENT_TIMESTAMP(6) ,
KEY idx_match_id (match_id),
KEY idx_market_line_id (market_line_id),
KEY idx_created_market_line_id (source_created_at, match_id, market_id, market_line_id, source)
) SHARD_ROW_ID_BITS = 4
PARTITION BY RANGE (to_days(created_at) ) (
PARTITION p0 VALUES LESS THAN (to_days(‘2021-01-20’)),
PARTITION p1 VALUES LESS THAN (to_days(‘2021-01-21’)),
PARTITION p2 VALUES LESS THAN (MAXVALUE)
) ;

@Ricklee
顧問好,

為了分析Partition表熱點狀況 我們創建了以下兩個表做測試
分別塞入100萬筆數據

(1) 使用id(AUTO_INCREMENT)搭配 SHARD_ROW_ID_BITS = 4創建分區表
CREATE TABLE t
(
id BIGINT UNSIGNED AUTO_INCREMENT ,
created_at datetime(6) DEFAULT CURRENT_TIMESTAMP(6) ,
PRIMARY KEY(id,created_at)
) SHARD_ROW_ID_BITS = 4
PARTITION BY RANGE (to_days(created_at) ) (
PARTITION p0 VALUES LESS THAN (to_days(‘2021-02-19’)),
PARTITION p1 VALUES LESS THAN (to_days(‘2021-02-20’)),
PARTITION p2 VALUES LESS THAN (to_days(‘2021-02-21’)),
PARTITION p3 VALUES LESS THAN (to_days(‘2021-02-22’)),
PARTITION p4 VALUES LESS THAN (to_days(‘2021-02-23’)),
PARTITION p5 VALUES LESS THAN (to_days(‘2021-02-24’)),
PARTITION p6 VALUES LESS THAN (to_days(‘2021-02-25’)),
PARTITION p7 VALUES LESS THAN (to_days(‘2021-02-26’)),
PARTITION p8 VALUES LESS THAN (to_days(‘2021-02-27’)),
PARTITION p9 VALUES LESS THAN (to_days(‘2021-02-28’)),
PARTITION p10 VALUES LESS THAN (MAXVALUE)
) ;
(2) 使用id(AUTO_INCREMENT) ,『不』使用SHARD_ROW_ID_BITS = 4,創建分區表
CREATE TABLE t2
(
id BIGINT UNSIGNED AUTO_INCREMENT ,
created_at datetime(6) DEFAULT CURRENT_TIMESTAMP(6) ,
PRIMARY KEY(id,created_at)
)
PARTITION BY RANGE (to_days(created_at) ) (
PARTITION p0 VALUES LESS THAN (to_days(‘2021-02-19’)),
PARTITION p1 VALUES LESS THAN (to_days(‘2021-02-20’)),
PARTITION p2 VALUES LESS THAN (to_days(‘2021-02-21’)),
PARTITION p3 VALUES LESS THAN (to_days(‘2021-02-22’)),
PARTITION p4 VALUES LESS THAN (to_days(‘2021-02-23’)),
PARTITION p5 VALUES LESS THAN (to_days(‘2021-02-24’)),
PARTITION p6 VALUES LESS THAN (to_days(‘2021-02-25’)),
PARTITION p7 VALUES LESS THAN (to_days(‘2021-02-26’)),
PARTITION p8 VALUES LESS THAN (to_days(‘2021-02-27’)),
PARTITION p9 VALUES LESS THAN (to_days(‘2021-02-28’)),
PARTITION p10 VALUES LESS THAN (MAXVALUE)
) ;

查看熱力圖,如下:


有以下疑問想請教顧問:
(1) 熱力圖上的table_48、table_61 我們怎分辨是分別對應到原的t表 還是 t2表 ?
(2) 原本預期上述第二種情況會產生『梯形』熱點圖,但卻『沒有』,是因為有切割Partition的原故嗎?
(3) 是否有指令可查出TiDB有熱點狀況的『表格清單』?

(1) 正常情况下选中热力图的一个区域后会显示数据库名和表名,看上面显示的并不是你创建的表名,麻烦刷新下试下;
(2)新建分区表后会为每个分区分配一个 region ,初始化灌数据时由于 region 来不及分裂出现热点这个是符合预期的,可以在建表时使用 PRE_SPLIT_REGIONS 将 region 打散,参考下:https://docs.pingcap.com/zh/tidb/stable/high-concurrency-best-practices#更复杂的热点问题

1赞

感謝顧問!

我們已經採取顧問建議,在建立表時先宣告SHARD_ROW_ID_BITS = 4 PRE_SPLIT_REGIONS=4
預計讓數據寫一陣子後,觀察『熱點』情況,再回報上來。

好的,有问题可以再反馈下。

我理解重点在于 PRIMARY KEY( id , created_at ) 这种主键,row_id的Shard是可以生效的吗?

引用原文如下:

如果表没有主键或者主键不是整数类型,而且用户也不想自己生成一个随机分布的主键 ID 的话,TiDB 内部有一个隐式的 _tidb_rowid 列作为行 ID。

@Ricklee 请教一下~

上面这种主键的表使用 SHARD_ROW_ID_BITS 可以打散 rowid ;但如果主键就是 primary key(id) 且 id 为整数类型,默认情况下 tidb 会把主键值作为 rowid ,这时就无法使用 SHARD_ROW_ID_BITS 了。