数据库事务删除错误

Bug 反馈
数据库由5.0升级到5.2.1时出现了问题,以前版本没有这个问题。
stat_data 是一个以date字段按照月分区的分区表,应用程序根据machine_id和date删除数据时发现放在事务里无法删除,错误见下面截图,不在事务里是可以删除的。
【 Bug 的影响】

【可能的问题复现步骤】

【看到的非预期行为】

【期望看到的行为】

【相关组件及具体版本】

【其他背景信息或者截图】
image

1赞

收到,感谢反馈

1赞

@Hacker_j9odkS8a 能提供一下表结构不,或者我看看咱们的分区键是啥样的,我想模拟一下

2赞

REATE TABLE stat_data (
machine_id int(11) DEFAULT NULL COMMENT ‘所属设备Id’,
date datetime DEFAULT NULL COMMENT ‘日期’,
code int(11) DEFAULT NULL COMMENT ‘统计类型代码’,
name varchar(100) DEFAULT NULL COMMENT ‘统计类型名称’,
time datetime DEFAULT NULL COMMENT ‘统计时间’,
value decimal(20,3) DEFAULT NULL COMMENT ‘统计值’,
KEY stat_data_index1 (machine_id,date,code)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin COMMENT=‘统计数据表’
PARTITION BY RANGE ( TO_DAYS(date) ) (
PARTITION p201901 VALUES LESS THAN (737456),
PARTITION p201902 VALUES LESS THAN (737484),
PARTITION p201903 VALUES LESS THAN (737515),
PARTITION p201904 VALUES LESS THAN (737545),
PARTITION p201905 VALUES LESS THAN (737576),
PARTITION p201906 VALUES LESS THAN (737606),
PARTITION p201907 VALUES LESS THAN (737637),
PARTITION p201908 VALUES LESS THAN (737668),
PARTITION p201909 VALUES LESS THAN (737698),
PARTITION p201910 VALUES LESS THAN (737729),
PARTITION p201911 VALUES LESS THAN (737759),
PARTITION p201912 VALUES LESS THAN (737790),
PARTITION p202001 VALUES LESS THAN (737821),
PARTITION p202002 VALUES LESS THAN (737850),
PARTITION p202003 VALUES LESS THAN (737881),
PARTITION p202004 VALUES LESS THAN (737911),
PARTITION p202005 VALUES LESS THAN (737942),
PARTITION p202006 VALUES LESS THAN (737972),
PARTITION p202007 VALUES LESS THAN (738003),
PARTITION p202008 VALUES LESS THAN (738034),
PARTITION p202009 VALUES LESS THAN (738064),
PARTITION p202010 VALUES LESS THAN (738095),
PARTITION p202011 VALUES LESS THAN (738125),
PARTITION p202012 VALUES LESS THAN (738156),
PARTITION p202101 VALUES LESS THAN (738187),
PARTITION p202102 VALUES LESS THAN (738215),
PARTITION p202103 VALUES LESS THAN (738246),
PARTITION p202104 VALUES LESS THAN (738276),
PARTITION p202105 VALUES LESS THAN (738307),
PARTITION p202106 VALUES LESS THAN (738337),
PARTITION p202107 VALUES LESS THAN (738368),
PARTITION p202108 VALUES LESS THAN (738399),
PARTITION p202109 VALUES LESS THAN (738429),
PARTITION p202110 VALUES LESS THAN (738460),
PARTITION p202111 VALUES LESS THAN (738490),
PARTITION p202112 VALUES LESS THAN (738521),
PARTITION p202201 VALUES LESS THAN (738552),
PARTITION p202202 VALUES LESS THAN (738580),
PARTITION p202203 VALUES LESS THAN (738611),
PARTITION p202204 VALUES LESS THAN (738641),
PARTITION p202205 VALUES LESS THAN (738672),
PARTITION p202206 VALUES LESS THAN (738702),
PARTITION p202207 VALUES LESS THAN (738733),
PARTITION p202208 VALUES LESS THAN (738764),
PARTITION p202209 VALUES LESS THAN (738794),
PARTITION p202210 VALUES LESS THAN (738825),
PARTITION p202211 VALUES LESS THAN (738855),
PARTITION p202212 VALUES LESS THAN (738886),
PARTITION p202301 VALUES LESS THAN (738917),
PARTITION p202302 VALUES LESS THAN (738945),
PARTITION p202303 VALUES LESS THAN (738976),
PARTITION p202304 VALUES LESS THAN (739006),
PARTITION p202305 VALUES LESS THAN (739037),
PARTITION p202306 VALUES LESS THAN (739067),
PARTITION p202307 VALUES LESS THAN (739098),
PARTITION p202308 VALUES LESS THAN (739129),
PARTITION p202309 VALUES LESS THAN (739159),
PARTITION p202310 VALUES LESS THAN (739190),
PARTITION p202311 VALUES LESS THAN (739220),
PARTITION p202312 VALUES LESS THAN (739251)
);

1赞

数据库升级后能回退不,主要现在是在线系统,日常使用中。或者是快速的解决方案

1赞

不能回退,我先看看啥原因吧

1赞

数据不存在时删除是不会错的,当满足删除条件才会出现错误

1赞

写入,查询都没有问题

稍等,我们先看看有没有临时解决办法,先解决问题

  1. 是否有在加列或者删除列之类的 ddl 操作在执行?

  2. 执行计划是什么样子的?

begin;
explain delete from stat_data where machine_id = 256 and date = ‘2021-09-17 00:00:00’
rollback;

  1. 收集一下 tidb log 里面报错有没有完整的错误栈信息

没有ddl操作在执行

日志的报错?
直接找找文件,或者去 dashboard 里面查找过滤下 error

[2021/09/23 11:46:26.718 +08:00] [INFO] [conn.go:995] [“command dispatched failed”] [conn=196291] [connInfo=“id:196291, addr:192.168.6.191:57850 status:1, collation:utf8_general_ci,”] [command=Query] [status=“inTxn:1, autocommit:0”] [sql=“DELETE FROM history_data WHERE ( ( machine_id = 375 )\n \n \n \n and ( tertime = ‘2021-09-23 11:45:00.0’\n \n \n \n or tertime = ‘2021-09-23 11:40:00.0’ ) )”] [txn_mode=PESSIMISTIC] [err=“EncodeRow error: data and columnID count not match 56 vs 55\ngithub.com/pingcap/tidb/tablecodec.EncodeOldRow\n\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/tablecodec/tablecodec.go:305\ngithub.com/pingcap/tidb/table/tables.(*TableCommon).addDeleteBinlog\n\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/table/tables/tables.go:1146\ngithub.com/pingcap/tidb/table/tables.(*TableCommon).RemoveRecord\n\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/table/tables/tables.go:1091\ngithub.com/pingcap/tidb/table/tables.(*partitionedTable).RemoveRecord\n\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/table/tables/partition.go:1161\ngithub.com/pingcap/tidb/executor.(*DeleteExec).removeRow\n\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/executor/delete.go:207\ngithub.com/pingcap/tidb/executor.(*DeleteExec).deleteOneRow\n\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/executor/delete.go:62\ngithub.com/pingcap/tidb/executor.(*DeleteExec).deleteSingleTableByChunk\n\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/executor/delete.go:112\ngithub.com/pingcap/tidb/executor.(*DeleteExec).Next\n\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/executor/delete.go:50\ngithub.com/pingcap/tidb/executor.Next\n\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/executor/executor.go:285\ngithub.com/pingcap/tidb/executor.(*ExecStmt).handleNoDelayExecutor\n\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/executor/adapter.go:590\ngithub.com/pingcap/tidb/executor.(*ExecStmt).handlePessimisticDML\n\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/executor/adapter.go:609\ngithub.com/pingcap/tidb/executor.(*ExecStmt).handleNoDelay\n\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/executor/adapter.go:469\ngithub.com/pingcap/tidb/executor.(*ExecStmt).Exec\n\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/executor/adapter.go:420\ngithub.com/pingcap/tidb/session.runStmt\n\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/session/session.go:1786\ngithub.com/pingcap/tidb/session.(*session).ExecuteStmt\n\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/session/session.go:1680\ngithub.com/pingcap/tidb/server.(*TiDBContext).ExecuteStmt\n\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/driver_tidb.go:218\ngithub.com/pingcap/tidb/server.(*clientConn).handleStmt\n\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:1818\ngithub.com/pingcap/tidb/server.(*clientConn).handleQuery\n\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:1690\ngithub.com/pingcap/tidb/server.(*clientConn).dispatch\n\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:1215\ngithub.com/pingcap/tidb/server.(*clientConn).Run\n\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:978\ngithub.com/pingcap/tidb/server.(*Server).onConn\n\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/server.go:501\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1371”]

乐观事务删除是没有问题,悲观事务删除有问题,没有分区表在事务里删除数据也没有问题,由事务,有分区删除就出现问题

我大概明白一个可能的范围了,我去构造一下case看是不是跟我想的一致的
如果是生产环境,可能需要版本回退或者暂停掉 binlog (这个错误是从写 binlog 的时候抛出来的)

https://github.com/pingcap/tidb/pull/28297
问题已查明,我们近期会尽快修复掉的
稳妥点的做法还是先回滚到旧版本