tiflash 类型转换错误 无法启动

【 Bug 的影响】
tiflash 无法启动

【可能的问题复现步骤】
修改表字段类型
无符号 unsigned int(11)改成有有符号int(11)

【看到的非预期行为】
tiflash无限报错,无法启动

【期望看到的行为】
正常启动

【相关组件及具体版本】
v5.2.0

tiflash日志

[2021/09/09 14:29:06.964 +08:00] [ERROR] [<unknown>] ["DB::EngineStoreApplyRes DB::HandleWriteRaftCmd(const DB::EngineStoreServerWrap*, DB::WriteCmdsView, DB::RaftCmdHeader): Code: 48, e.displayText() = DB::Exception: Reading mismatch data type pack. Cast from UInt32 to Int32 is NOT supported!: (while reading from DTFile: /data/tidb-data/tiflash-9000/data/t_87694/stable/dmf_3840), e.what() = DB::Exception, Stack trace:\n\n0. bin/tiflash/tiflash(StackTrace::StackTrace()+0x16) [0x3921fe6]\n1. bin/tiflash/tiflash(DB::DM::convertColumnByColumnDefineIfNeed(std::shared_ptr<DB::IDataType const> const&, COWPtr<DB::IColumn>::immutable_ptr<DB::IColumn>&&, DB::DM::ColumnDefine const&)+0x1ed) [0x7e2b4cd]\n2. bin/tiflash/tiflash(DB::DM::DMFileReader::read()+0x37d) [0x794450d]\n3. bin/tiflash/tiflash(DB::DM::DMFileBlockInputStream::read()+0x15) [0x78ec0b5]\n4. bin/tiflash/tiflash(DB::DM::ConcatSkippableBlockInputStream::read()+0xa8) [0x790f668]\n5. bin/tiflash/tiflash(DB::DM::DeltaMergeBlockInputStream<DB::DM::DeltaValueReader, DB::DM::DTCompactedEntries<55ul, 20ul, 3ul>::Iterator, false>::fillStableBlockIfNeeded()+0x90) [0x7903990]\n6. bin/tiflash/tiflash(DB::DM::DeltaMergeBlockInputStream<DB::DM::DeltaValueReader, DB::DM::DTCompactedEntries<55ul, 20ul, 3ul>::Iterator, false>::read()+0x680) [0x7904380]\n7. bin/tiflash/tiflash(DB::DM::DMRowKeyFilterBlockInputStream<true>::read()+0x5b) [0x78f476b]\n8. bin/tiflash/tiflash(DB::DM::PKSquashingBlockInputStream<false>::read()+0x6f4) [0x7901954]\n9. bin/tiflash/tiflash(DB::DM::DMVersionFilterBlockInputStream<1>::initNextBlock()+0x31) [0x7e1ee51]\n10. bin/tiflash/tiflash(DB::DM::DMVersionFilterBlockInputStream<1>::read(DB::PODArray<unsigned char, 4096ul, Allocator<false>, 15ul, 16ul>*&, bool)+0x1bb8) [0x7e21028]\n11. bin/tiflash/tiflash(DB::DM::DMVersionFilterBlockInputStream<1>::read()+0x18) [0x78eb7e8]\n12. bin/tiflash/tiflash(DB::DM::writeIntoNewDMFile(DB::DM::DMContext&, std::shared_ptr<std::vector<DB::DM::ColumnDefine, std::allocator<DB::DM::ColumnDefine> > > const&, std::shared_ptr<DB::IBlockInputStream> const&, unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, DB::DM::DMFileWriter::Flags)+0x1ab) [0x78d9fdb]\n13. bin/tiflash/tiflash(DB::DM::createNewStable(DB::DM::DMContext&, std::shared_ptr<std::vector<DB::DM::ColumnDefine, std::allocator<DB::DM::ColumnDefine> > > const&, std::shared_ptr<DB::IBlockInputStream> const&, unsigned long, DB::DM::WriteBatches&)+0xbc) [0x78da6dc]\n14. bin/tiflash/tiflash(DB::DM::Segment::prepareMergeDelta(DB::DM::DMContext&, std::shared_ptr<std::vector<DB::DM::ColumnDefine, std::allocator<DB::DM::ColumnDefine> > > const&, std::shared_ptr<DB::DM::SegmentSnapshot> const&, DB::DM::WriteBatches&) const+0xd0) [0x78e45c0]\n15. bin/tiflash/tiflash(DB::DM::DeltaMergeStore::segmentMergeDelta(DB::DM::DMContext&, std::shared_ptr<DB::DM::Segment> const&, DB::DM::DeltaMergeStore::TaskRunThread)+0x4af) [0x78b1c1f]\n16. bin/tiflash/tiflash(DB::DM::DeltaMergeStore::checkSegmentUpdate(std::shared_ptr<DB::DM::DMContext> const&, std::shared_ptr<DB::DM::Segment> const&, DB::DM::DeltaMergeStore::ThreadType)+0xb59) [0x78b3509]\n17. bin/tiflash/tiflash(DB::DM::DeltaMergeStore::waitForWrite(std::shared_ptr<DB::DM::DMContext> const&, std::shared_ptr<DB::DM::Segment> const&)+0x193) [0x78b51f3]\n18. bin/tiflash/tiflash(DB::DM::DeltaMergeStore::write(DB::Context const&, DB::Settings const&, DB::Block&&)+0x731) [0x78b5ac1]\n19. bin/tiflash/tiflash() [0x7a7a36e]\n20. bin/tiflash/tiflash() [0x7a7ae37]\n21. bin/tiflash/tiflash(DB::RegionTable::writeBlockByRegion(DB::Context&, DB::RegionPtrWithBlock const&, std::vector<std::tuple<DB::RawTiDBPK, unsigned char, unsigned long, std::shared_ptr<DB::StringObject<false> const> >, std::allocator<std::tuple<DB::RawTiDBPK, unsigned char, unsigned long, std::shared_ptr<DB::StringObject<false> const> > > >&, Poco::Logger*, bool)+0x84) [0x7a7b164]\n22. bin/tiflash/tiflash(DB::Region::handleWriteRaftCmd(DB::WriteCmdsView const&, unsigned long, unsigned long, DB::TMTContext&)+0x30a) [0x7a88a5a]\n23. bin/tiflash/tiflash(DB::KVStore::handleWriteRaftCmd(DB::WriteCmdsView const&, unsigned long, unsigned long, unsigned long, DB::TMTContext&)+0x5b) [0x7a7320b]\n24. bin/tiflash/tiflash(HandleWriteRaftCmd+0x31) [0x7a7ea31]\n25. /data/tidb-deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0x1389469) [0x7f8e721d5469]\n26. /data/tidb-deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0x138f564) [0x7f8e721db564]\n27. /data/tidb-deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0x1364e2f) [0x7f8e721b0e2f]\n28. /data/tidb-deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0xd3968b) [0x7f8e71b8568b]\n29. /data/tidb-deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0xd7f642) [0x7f8e71bcb642]\n30. /data/tidb-deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0xdad9f7) [0x7f8e71bf99f7]\n31. /data/tidb-deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0x14ef91b) [0x7f8e7233b91b]\n"] [thread_id=45]

麻烦说明清楚一下:
版本
出现的问题
可复现操作
脚本
日志

又完善了下

如果是 bug,建议你提供一下复现步骤、包括完整表结构、DDL 语句,以及 TiFlash 副本设置情况。另外麻烦提供完整的 TiFlash 的日志,以上信息麻烦提供文本信息,重要信息可脱敏。

看你上面的描述简单测试了下,我这边测试了一下是正常的。

MySQL [test]> CREATE TABLE t ( a INT UNSIGNED primary key, b INT UNSIGNED );
Query OK, 0 rows affected (0.12 sec)

MySQL [test]> alter table t SET TIFLASH REPLICA 1;
Query OK, 0 rows affected (0.10 sec)

MySQL [test]> 
MySQL [test]> SELECT * FROM information_schema.tiflash_replica WHERE TABLE_SCHEMA = 'test';
+--------------+------------+----------+---------------+-----------------+-----------+----------+
| TABLE_SCHEMA | TABLE_NAME | TABLE_ID | REPLICA_COUNT | LOCATION_LABELS | AVAILABLE | PROGRESS |
+--------------+------------+----------+---------------+-----------------+-----------+----------+
| test         | t          |       53 |             1 |                 |         1 |        1 |
+--------------+------------+----------+---------------+-----------------+-----------+----------+
1 row in set (0.01 sec)

MySQL [test]> alter table t modify b int; 
Query OK, 0 rows affected (2.86 sec)

MySQL [test]> select * from t;           
+---+------+
| a | b    |
+---+------+
| 1 |    1 |
| 2 |    2 |
+---+------+
2 rows in set (0.01 sec)

MySQL [test]> explain select * from t;  
+------------------------+---------+--------------+---------------+------------------------------------------------+
| id                     | estRows | task         | access object | operator info                                  |
+------------------------+---------+--------------+---------------+------------------------------------------------+
| TableReader_7          | 2.00    | root         |               | data:TableRangeScan_6                          |
| └─TableRangeScan_6     | 2.00    | cop[tiflash] | table:t       | range:[0,+inf], keep order:false, stats:pseudo |
+------------------------+---------+--------------+---------------+------------------------------------------------+
2 rows in set (0.00 sec)

MySQL [test]> show create table t ;
+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                                       |
+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| t     | CREATE TABLE `t` (
  `a` int(10) unsigned NOT NULL,
  `b` int(11) DEFAULT NULL,
  PRIMARY KEY (`a`) /*T![clustered_index] CLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin |
+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

是你测试的这个逻辑。

106240	data_test		   create table	queueing	87461	106239	0	2021-09-09 13:49:54	2021-09-09 13:49:55	cancelled
106238	data_test	dy_daily	modify column	public	87461	87694	0	2021-09-09 13:01:27	2021-09-09 13:49:55	synced
106237	data_test	dy_auto	create table	public	87461	106236	0	2021-09-09 11:04:38	2021-09-09 13:49:54	synced
106235	data_test	dy_daily	modify column	public	87461	87694	243936702	2021-09-09 10:07:04	2021-09-09 13:49:54	synced

这个是DDL JOBS。
job id 106235 就是从无符号修改成有符号的语句

ALTER TABLE `dy_daily` 
MODIFY COLUMN `new_digg_count` int(11) NOT NULL DEFAULT 0

106237 是排在后面的建表语句

106238 106240 好像是系统执行的。

这个问题有点奇怪 但是很确定是有符号无符号引起的。 因为报错里明确说的是new_digg_count字段无法转换。

我现在解决方案是 先把副本数值为0 ,然后进入对应tifalsh机器把/data/tidb-data/tiflash-9000/data/t_87694/stable/dmf_3840目录把报错文件给删了。 暂时能让tiflash起来了。 但是不知道会不会有什么问题。 目前我自己也无法复现。

请问下 dy_daily 表设置了几个 tiflash 副本数?
另外可以拿一下 curl http://{TiDBIP}:10080/ddl/history ,过滤下 dy_daily 的相关 ddl 操作。
再拿一下完整的 tiflash 日志,包括错误日志等。

因我这里测试没复现问题,并且您这边环境中也没复现,不一定能准确定位问题,可以把相关内容提供下,我们排查看看。

如果日志信息已删除,就没办法定位了,后续再有问题可以再开新帖。

副本数是两个
ddl执行完下面的sql开始报错,然后就把副本数设置成0,物理删除掉了tiflash报错的表,在就没有dy_daily相关的ddl了

ALTER TABLE `dy_daily` 
MODIFY COLUMN `new_digg_count` int(11) NOT NULL DEFAULT 0

error 日志。
logs-tiflash_172_18_150_166_3930.zip (14.7 KB) logs-tiflash_172_18_150_166_3930.zip

原表结构也需要拿一下。还是按照上面说的提供内容吧。多谢

1赞

尽最大可能提供了信息。

我用阿里云盘分享了「ddl.txt」,你可以不限速下载:rocket:
复制这段内容打开「阿里云盘」App 即可获取
链接:https://alywp.net/UOkHk

我用阿里云盘分享了「tiflash.log」,你可以不限速下载:rocket:
复制这段内容打开「阿里云盘」App 即可获取
链接:https://alywp.net/4LJKcD

CREATE TABLE `dy_daily` (
  `video_id` bigint(20) unsigned NOT NULL,
  `date` date NOT NULL,
  `new_digg_count` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`video_id`,`date`) /*T![clustered_index] NONCLUSTERED */,
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

好的,辛苦
另外如果是线上环境,建议升级到 5.2.1,修复了 5.2.0 的几个 bug,可以看下官网 release note 信息 https://docs.pingcap.com/zh/tidb/stable/release-5.2.1

1赞