跨分区group by 有时会出现 Internal error: UnionExec chunk column count mismatch, req: 3, result: 4

为提高效率,请提供以下信息,问题描述清晰能够更快得到解决:
【 TiDB 使用环境】
tidb 5.1.2

【概述】 场景 + 问题概述

【背景】 做过哪些操作

【现象】 业务和数据库现象

按天分区的表 。 跨天查询的话会报错 执行计划走的 tikv ,强制tiflash 也一样报错 。
SELECT a.buttonDisplayName,COUNT(1) AS num FROM (
SELECT buttonDisplayName,userId,COUNT(1) cnt FROM tbl_new_buttonclick_202111 b WHERE createTime > ‘2021-11-10’ AND createTime < ‘2021-11-12’ AND appId=1072
GROUP BY buttonDisplayName,userId) a GROUP BY a.buttonDisplayName
错误代码: 1105
Internal error: UnionExec chunk column count mismatch, req: 3, result: 4

如果限定 一天的话不报错 。
SELECT a.appVersionName,COUNT(1) AS num FROM (
SELECT appVersionName,userId,COUNT(1) cnt FROM tbl_new_pageopen_202111 b WHERE createTime > ‘2021-11-10’ AND createTime < ‘2021-11-11’ AND appId=1072
GROUP BY appVersionName,userId) a GROUP BY a.appVersionName

【问题】 当前遇到的问题

【业务影响】

【 TiDB 版本】
tidb 5.1.2
【附件】 相关日志及监控(https://metricstool.pingcap.com/)


若提问为性能优化、故障排查类问题,请下载脚本运行。终端输出的打印结果,请务必全选并复制粘贴上传。

tidb日志
[2021/11/11 16:13:26.658 +08:00] [INFO] [conn.go:877] [“command dispatched failed”] [conn=845179] [connInfo=“id:845179, addr:192.168.1.254:13545 status:10, collation:utf8_general_ci, user:heming”] [command=Query] [status=“inTxn:0, autocommit:1”] [sql=“SELECT a.buttonDisplayName,COUNT(1) AS num FROM (\nSELECT buttonDisplayName,userId,COUNT(1) cnt FROM tbl_new_buttonclick_202111 b WHERE createTime > ‘2021-11-10’ AND createTime < ‘2021-11-12’ AND appId=1072 \n GROUP BY buttonDisplayName,userId) a GROUP BY a.buttonDisplayName\n LIMIT 0, 1000”] [txn_mode=PESSIMISTIC] [err=“Internal error: UnionExec chunk column count mismatch, req: 3, result: 4\ngithub.com/pingcap/tidb/executor.(*UnionExec).Next\n\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/executor/executor.go:1620\ngithub.com/pingcap/tidb/executor.Next\n\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/executor/executor.go:286\ngithub.com/pingcap/tidb/executor.(*HashAggExec).fetchChildData\n\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/executor/aggregate.go:743\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1371”]

收到,我们看一下

这个是一个 bug,已经在 v5.2.1 进行了修复(不过建议升级的话,可以考虑 v5.2.2)

好的 谢谢 ,5.1.3 会修复这个bug吗?

已经在 v5.1.2 就修复了

https://github.com/pingcap/tidb/pull/27798

描述有误, 出问题的就是5.1.2

我们确认一下

@foxchan @heming 帮提供一下 这个 SQL的 explain analyze。和表结构吧

SQL:
SELECT
a.buttonDisplayName,
COUNT(1) AS num
FROM
(SELECT
buttonDisplayName,
userId,
COUNT(1) cnt
FROM
tbl_new_buttonclick_202111 b
WHERE createTime > ‘2021-11-11’
AND createTime < ‘2021-11-13’
AND appId = 1072
GROUP BY buttonDisplayName,
userId) a
GROUP BY a.buttonDisplayName
explain analyze 报错

表结构

CREATE TABLE tbl_new_buttonclick_202111 (
id bigint(20) unsigned NOT NULL AUTO_INCREMENT,
asId varchar(64) DEFAULT NULL,
userId bigint(20) DEFAULT NULL,
appVersionCode int(11) DEFAULT NULL COMMENT ‘APP内部版本号’,
appVersionName varchar(64) DEFAULT NULL COMMENT ‘版本名称’,
appPlatform int(11) DEFAULT NULL COMMENT ‘1 android 2ios’,
appChannel varchar(32) DEFAULT NULL COMMENT ‘推广渠道’,
registerChannel varchar(32) DEFAULT NULL COMMENT ‘用户注册的channel’,
timeStamp timestamp NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP COMMENT ‘utc时间,事件发生的时间戳,毫秒的long类型’,
eventPage varchar(64) DEFAULT NULL COMMENT ‘事件发生页面’,
itemId varchar(500) DEFAULT NULL,
itemName varchar(500) DEFAULT NULL,
packageName varchar(128) DEFAULT NULL COMMENT ‘项目包名’,
appId int(11) DEFAULT NULL COMMENT ‘项目APPID’,
createTime datetime NOT NULL,
pageAsId varchar(64) DEFAULT NULL COMMENT ‘page_open和page_close用这个id串联,代表是一对事件uuid’,
eventNewPage varchar(128) DEFAULT NULL COMMENT ‘事件发生,产生的新页面名称’,
viewName varchar(128) DEFAULT NULL COMMENT ‘该页面的子模块名称,或者弹窗名称,例如主页面的种植模块’,
viewDisplayName varchar(128) DEFAULT NULL COMMENT ‘模块的中文名称’,
buttonId varchar(64) DEFAULT NULL COMMENT ‘button的唯一命名,例如 tixian’,
buttonDisplayName varchar(128) DEFAULT NULL COMMENT ‘button上的显示文案,例如 提现’,
userGold bigint(20) DEFAULT NULL COMMENT ‘用户当前的金币’,
userLevel int(11) DEFAULT NULL COMMENT ‘用户当前等级’,
userVit bigint(20) DEFAULT NULL COMMENT ‘用户当前体力值’,
userMoney varchar(32) DEFAULT NULL COMMENT ‘用户当前的红包券数量’,
awardTypeId bigint(20) DEFAULT NULL COMMENT ‘事件如果发生奖励id,没有填空’,
awardTypeName varchar(128) DEFAULT NULL COMMENT ’ 事件如果发生奖励,奖励类型,红包券,金币,体力,狗粮,化肥,种子’,
awardNumber varchar(64) DEFAULT NULL COMMENT ‘事件如果发生奖励,奖励额度’,
planId varchar(64) DEFAULT NULL,
createTimeStr varchar(32) DEFAULT NULL,
extra varchar(128) DEFAULT NULL,
first varchar(50) DEFAULT NULL,
second varchar(50) DEFAULT NULL,
third varchar(50) DEFAULT NULL,
promotion tinyint(2) DEFAULT NULL COMMENT ‘推广商: 0 头条 , 1 快手 ,2广点通’,
isNewUser int(2) DEFAULT NULL COMMENT ‘区别新老用户: 1 新 ,0 老’,
PRIMARY KEY (id,createTime) /*T![clustered_index] NONCLUSTERED */,
KEY index_cacps (createTime,appId,promotion,appChannel,second)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=131610001
PARTITION BY RANGE COLUMNS(createtime) (
PARTITION p20211101 VALUES LESS THAN (“20211102”),
PARTITION p20211102 VALUES LESS THAN (“20211103”),
PARTITION p20211103 VALUES LESS THAN (“20211104”),
PARTITION p20211104 VALUES LESS THAN (“20211105”),
PARTITION p20211105 VALUES LESS THAN (“20211106”),
PARTITION p20211106 VALUES LESS THAN (“20211107”),
PARTITION p20211107 VALUES LESS THAN (“20211108”),
PARTITION p20211108 VALUES LESS THAN (“20211109”),
PARTITION p20211109 VALUES LESS THAN (“20211110”),
PARTITION p20211110 VALUES LESS THAN (“20211111”),
PARTITION p20211111 VALUES LESS THAN (“20211112”),
PARTITION p20211112 VALUES LESS THAN (“20211113”),
PARTITION p20211113 VALUES LESS THAN (“20211114”),
PARTITION p20211114 VALUES LESS THAN (“20211115”),
PARTITION p20211115 VALUES LESS THAN (“20211116”),
PARTITION p20211116 VALUES LESS THAN (“20211117”),
PARTITION p20211117 VALUES LESS THAN (“20211118”),
PARTITION p20211118 VALUES LESS THAN (“20211119”),
PARTITION p20211119 VALUES LESS THAN (“20211120”),
PARTITION p20211120 VALUES LESS THAN (“20211121”),
PARTITION p20211121 VALUES LESS THAN (“20211122”),
PARTITION p20211122 VALUES LESS THAN (“20211123”),
PARTITION p20211123 VALUES LESS THAN (“20211124”),
PARTITION p20211124 VALUES LESS THAN (“20211125”),
PARTITION p20211125 VALUES LESS THAN (“20211126”),
PARTITION p20211126 VALUES LESS THAN (“20211127”),
PARTITION p20211127 VALUES LESS THAN (“20211128”),
PARTITION p20211128 VALUES LESS THAN (“20211129”),
PARTITION p20211129 VALUES LESS THAN (“20211130”),
PARTITION p20211130 VALUES LESS THAN (“20211201”)
)

哪给我一下 explain 的不带 analyze的结果吧

id estRows task access object operator info
Projection_13 2.00 root cbd_user_behavior_analysis.tbl_new_buttonclick_202111.buttondisplayname, Column#43
└─HashAgg_14 2.00 root group by:cbd_user_behavior_analysis.tbl_new_buttonclick_202111.buttondisplayname, funcs:count(1)->Column#43, funcs:firstrow(cbd_user_behavior_analysis.tbl_new_buttonclick_202111.buttondisplayname)->cbd_user_behavior_analysis.tbl_new_buttonclick_202111.buttondisplayname
└─HashAgg_19 2.00 root group by:cbd_user_behavior_analysis.tbl_new_buttonclick_202111.buttondisplayname, cbd_user_behavior_analysis.tbl_new_buttonclick_202111.userid, funcs:firstrow(Column#45)->cbd_user_behavior_analysis.tbl_new_buttonclick_202111.buttondisplayname, funcs:count(1)->Column#47
└─PartitionUnion_24 2.00 root
├─HashAgg_28 1.00 root group by:cbd_user_behavior_analysis.tbl_new_buttonclick_202111.buttondisplayname, cbd_user_behavior_analysis.tbl_new_buttonclick_202111.userid, funcs:firstrow(cbd_user_behavior_analysis.tbl_new_buttonclick_202111.buttondisplayname)->Column#45, funcs:firstrow(cbd_user_behavior_analysis.tbl_new_buttonclick_202111.buttondisplayname)->cbd_user_behavior_analysis.tbl_new_buttonclick_202111.buttondisplayname, funcs:firstrow(cbd_user_behavior_analysis.tbl_new_buttonclick_202111.userid)->cbd_user_behavior_analysis.tbl_new_buttonclick_202111.userid, funcs:count(1)->Column#48
│ └─IndexLookUp_52 0.25 root
│ ├─Selection_51(Build) 0.25 cop[tikv] eq(cbd_user_behavior_analysis.tbl_new_buttonclick_202111.appid, 1072)
│ │ └─IndexRangeScan_49 250.00 cop[tikv] table:b, partition:p20211111, index:index_cacps(createTime, appId, promotion, appChannel, second) range:(2021-11-11 00:00:00,2021-11-13 00:00:00), keep order:false, stats:pseudo
│ └─TableRowIDScan_50(Probe) 0.25 cop[tikv] table:b, partition:p20211111 keep order:false, stats:pseudo
└─HashAgg_57 1.00 root group by:cbd_user_behavior_analysis.tbl_new_buttonclick_202111.buttondisplayname, cbd_user_behavior_analysis.tbl_new_buttonclick_202111.userid, funcs:firstrow(cbd_user_behavior_analysis.tbl_new_buttonclick_202111.buttondisplayname)->Column#45, funcs:firstrow(cbd_user_behavior_analysis.tbl_new_buttonclick_202111.buttondisplayname)->cbd_user_behavior_analysis.tbl_new_buttonclick_202111.buttondisplayname, funcs:firstrow(cbd_user_behavior_analysis.tbl_new_buttonclick_202111.userid)->cbd_user_behavior_analysis.tbl_new_buttonclick_202111.userid, funcs:count(1)->Column#49
└─IndexLookUp_81 0.25 root
├─Selection_80(Build) 0.25 cop[tikv] eq(cbd_user_behavior_analysis.tbl_new_buttonclick_202111.appid, 1072)
│ └─IndexRangeScan_78 250.00 cop[tikv] table:b, partition:p20211112, index:index_cacps(createTime, appId, promotion, appChannel, second) range:(2021-11-11 00:00:00,2021-11-13 00:00:00), keep order:false, stats:pseudo
└─TableRowIDScan_79(Probe) 0.25 cop[tikv] table:b, partition:p20211112 keep order:false, stats:pseudo

确认原因了,提交 issue了: https://github.com/pingcap/tidb/issues/29705,预计下个版本修复:5.1.3/5.2.3

好的,谢谢

:ok_hand::ok_hand: