TiFlash reports an error while syncing replicas and fails to start

Let me confirm the scenario with you:

  1. Upstream MySQL data is replicated to TiDB via DM.
  2. Was Lightning used to import data while the DM replication was running?
  3. At which stage was the TiFlash replica set: during the Lightning import, or after it finished?
  4. Which Lightning import mode, i.e. which backend, was used? Do the tables imported by Lightning conflict with the tables replicated by DM?
  5. Have you already scaled the TiFlash node in and back out? Does the problem persist?

  1. Upstream MySQL data is replicated to TiDB via DM.
  2. Lightning was used for the full import; after it completed, DM handles the incremental replication.
  3. The TiFlash replica was set after the Lightning import completed, while DM was replicating (the statement used is of the form shown below).
  4. task-mode: incremental
  5. After scaling in and out again, the table that causes TiFlash to crash still cannot be replicated.
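For context, "set replica" here is the TiDB statement that creates TiFlash replicas; a minimal sketch, with db.big_table as a placeholder table name:

ALTER TABLE db.big_table SET TIFLASH REPLICA 1;
-- watch replication progress; AVAILABLE becomes 1 once the replica is fully built:
SELECT AVAILABLE, PROGRESS FROM information_schema.tiflash_replica
  WHERE TABLE_SCHEMA = 'db' AND TABLE_NAME = 'big_table';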

Before scaling in and out, did you remove the replicas first? That is, set all TiFlash replicas to 0, and only then scale in and out.
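Concretely, something like the following for every table that still has a replica (db.big_table is a placeholder):

ALTER TABLE db.big_table SET TIFLASH REPLICA 0;
-- information_schema.tiflash_replica should return no rows once all replicas are removed:
SELECT * FROM information_schema.tiflash_replica;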

Yes, the replicas were already removed.

Marking this thread. I hit a similar error: while the first table was syncing to TiFlash, it failed and TiFlash kept restarting.

[ERROR] [<unknown>] ["DB::EngineStoreApplyRes DB::HandleWriteRaftCmd(const DB::EngineStoreServerWrap*, DB::WriteCmdsView, DB::RaftCmdHeader): Code: 9008, e.displayText() = DB::Exception: Raw TiDB PK: 8000000007B4571A, Prewrite ts: 424615159928979479 can not found in default cf for key: 7480000000000010FF5D5F728000000007FFB4571A0000000000FAFA1B76C2C53FFFFE, e.what() = DB::Exception, Stack trace:

0. bin/tiflash/tiflash(StackTrace::StackTrace()+0x15) [0x367c835]
1. bin/tiflash/tiflash(DB::Exception::Exception(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)+0x25) [0x36733c5]
2. bin/tiflash/tiflash(DB::RegionData::readDataByWriteIt(std::_Rb_tree_const_iterator<std::pair<std::pair<DB::RawTiDBPK, unsigned long> const, std::tuple<std::shared_ptr<DB::StringObject<true> const>, std::shared_ptr<DB::StringObject<false> const>, DB::RecordKVFormat::InnerDecodedWriteCFValue> > > const&, bool) const+0x50c) [0x734ef3c]
3. bin/tiflash/tiflash(DB::Region::readDataByWriteIt(std::_Rb_tree_const_iterator<std::pair<std::pair<DB::RawTiDBPK, unsigned long> const, std::tuple<std::shared_ptr<DB::StringObject<true> const>, std::shared_ptr<DB::StringObject<false> const>, DB::RecordKVFormat::InnerDecodedWriteCFValue> > > const&, bool) const+0x17) [0x732ddd7]
4. bin/tiflash/tiflash(DB::ReadRegionCommitCache(std::shared_ptr<DB::Region> const&, bool)+0x1c1) [0x7322fa1]
5. bin/tiflash/tiflash(DB::RegionTable::writeBlockByRegion(DB::Context&, DB::RegionPtrWithBlock const&, std::vector<std::tuple<DB::RawTiDBPK, unsigned char, unsigned long, std::shared_ptr<DB::StringObject<false> const> >, std::allocator<std::tuple<DB::RawTiDBPK, unsigned char, unsigned long, std::shared_ptr<DB::StringObject<false> const> > > >&, Poco::Logger*, bool)+0x111) [0x7326051]
6. bin/tiflash/tiflash(DB::Region::handleWriteRaftCmd(DB::WriteCmdsView const&, unsigned long, unsigned long, DB::TMTContext&)+0x2c4) [0x7330ce4]
7. bin/tiflash/tiflash(DB::KVStore::handleWriteRaftCmd(DB::WriteCmdsView const&, unsigned long, unsigned long, unsigned long, DB::TMTContext&)+0x5a) [0x73201aa]
8. bin/tiflash/tiflash(DB::HandleWriteRaftCmd(DB::EngineStoreServerWrap const*, DB::WriteCmdsView, DB::RaftCmdHeader)+0x30) [0x7327a50]
9. /tidb/tidb-deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0xbe31f7) [0x7f55dbb251f7]
10. /tidb/tidb-deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0xbe9e9a) [0x7f55dbb2be9a]
11. /tidb/tidb-deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0xbbd541) [0x7f55dbaff541]
12. /tidb/tidb-deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0x51d0d6) [0x7f55db45f0d6]
13. /tidb/tidb-deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0x5632dd) [0x7f55db4a52dd]
14. /tidb/tidb-deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0x596f3f) [0x7f55db4d8f3f]
15. /tidb/tidb-deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0x156ea8f) [0x7f55dc4b0a8f]
16. /lib64/libpthread.so.0(+0x7dd4) [0x7f55da707dd4]
17. /lib64/libc.so.6(clone+0x6c) [0x7f55da12f02c]
"] [thread_id=10]

Hello ~ has this problem of yours been solved?

Not solved. TiFlash keeps restarting, and the logs it prints are, strangely, full of DM replication metadata; the table I added to TiFlash is not even among the tables in the logs below.

[2021/05/12 21:13:45.650 +08:00] [DEBUG] [<unknown>] ["SchemaBuilder: Table dm_meta(56).xxxxxxxx_syncer_checkpoint(1849) synced during sync all schemas"] [thread_id=1]
[2021/05/12 21:13:45.650 +08:00] [DEBUG] [<unknown>] ["SchemaBuilder: Table dm_meta(56).xxxxxxxx_loader_checkpoint(1938) syncing during sync all schemas"] [thread_id=1]
[2021/05/12 21:13:45.650 +08:00] [DEBUG] [<unknown>] ["SchemaBuilder: Table dm_meta(56).xxxxxxxx_loader_checkpoint(1938) name identical, not renaming."] [thread_id=1]
[2021/05/12 21:13:45.650 +08:00] [INFO] [<unknown>] ["SchemaBuilder: Altering table dm_meta(56).xxxxxxxx_loader_checkpoint(1938)"] [thread_id=1]
[2021/05/12 21:13:45.650 +08:00] [INFO] [<unknown>] ["SchemaBuilder: No schema change detected for table dm_meta(56).xxxxxxxx_loader_checkpoint(1938), not altering"] [thread_id=1]
[2021/05/12 21:13:45.650 +08:00] [DEBUG] [<unknown>] ["SchemaBuilder: Table dm_meta(56).xxxxxxxx_loader_checkpoint(1938) synced during sync all schemas"] [thread_id=1]
[2021/05/12 21:13:45.650 +08:00] [DEBUG] [<unknown>] ["SchemaBuilder: Table dm_meta(56).xxxxxxxx_syncer_checkpoint(1940) syncing during sync all schemas"] [thread_id

Take a look at the dmesg log. Was the process OOM-killed?

No, there was no OOM.

Please help confirm whether too many tables are being synced. In principle, the dm_meta schema does not need to be synced to TiFlash.
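Which tables actually have TiFlash replicas configured (dm_meta should not show up here) can be checked with a query along these lines:

SELECT TABLE_SCHEMA, TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS
  FROM information_schema.tiflash_replica;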

My DM task replicates a few dozen upstream tables.

I tried redeploying TiFlash and first added a small table as a TiFlash replica; that worked fine.
After I then added a large table to TiFlash, TiFlash went down.

2021.05.15 09:57:22.362169 [ 1 ] <Warning> Application: The configuration "path" is deprecated. Check [storage] section for new style.
2021.05.15 09:57:25.696505 [ 21 ] <Warning> DMFile: Existing dmfile, removed :/tidb/tidb-data/tiflash-9000/data/t_4189/stable/.tmp.dmf_396
2021.05.15 09:57:25.910633 [ 24 ] <Error> DB::EngineStoreApplyRes DB::HandleWriteRaftCmd(const DB::EngineStoreServerWrap*, DB::WriteCmdsView, DB::RaftCmdHeader): Code: 9008, e.displayText() = DB::Exception: Raw TiDB PK: 8000000008FE945C, Prewrite ts: 424906226029821955 can not found in default cf for key: 7480000000000010FF5D5F728000000008FFFE945C0000000000FAFA1A6E09E6A3FFF6, e.what() = DB::Exception, Stack trace:

0. bin/tiflash/tiflash(StackTrace::StackTrace()+0x15) [0x367c835]
1. bin/tiflash/tiflash(DB::Exception::Exception(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)+0x25) [0x36733c5]
2. bin/tiflash/tiflash(DB::RegionData::readDataByWriteIt(std::_Rb_tree_const_iterator<std::pair<std::pair<DB::RawTiDBPK, unsigned long> const, std::tuple<std::shared_ptr<DB::StringObject<true> const>, std::shared_ptr<DB::StringObject<false> const>, DB::RecordKVFormat::InnerDecodedWriteCFValue> > > const&, bool) const+0x50c) [0x734ef3c]
3. bin/tiflash/tiflash(DB::Region::readDataByWriteIt(std::_Rb_tree_const_iterator<std::pair<std::pair<DB::RawTiDBPK, unsigned long> const, std::tuple<std::shared_ptr<DB::StringObject<true> const>, std::shared_ptr<DB::StringObject<false> const>, DB::RecordKVFormat::InnerDecodedWriteCFValue> > > const&, bool) const+0x17) [0x732ddd7]
4. bin/tiflash/tiflash(DB::ReadRegionCommitCache(std::shared_ptr<DB::Region> const&, bool)+0x1c1) [0x7322fa1]
5. bin/tiflash/tiflash(DB::RegionTable::writeBlockByRegion(DB::Context&, DB::RegionPtrWithBlock const&, std::vector<std::tuple<DB::RawTiDBPK, unsigned char, unsigned long, std::shared_ptr<DB::StringObject<false> const> >, std::allocator<std::tuple<DB::RawTiDBPK, unsigned char, unsigned long, std::shared_ptr<DB::StringObject<false> const> > > >&, Poco::Logger*, bool)+0x111) [0x7326051]
6. bin/tiflash/tiflash(DB::Region::handleWriteRaftCmd(DB::WriteCmdsView const&, unsigned long, unsigned long, DB::TMTContext&)+0x2c4) [0x7330ce4]
7. bin/tiflash/tiflash(DB::KVStore::handleWriteRaftCmd(DB::WriteCmdsView const&, unsigned long, unsigned long, unsigned long, DB::TMTContext&)+0x5a) [0x73201aa]
8. bin/tiflash/tiflash(DB::HandleWriteRaftCmd(DB::EngineStoreServerWrap const*, DB::WriteCmdsView, DB::RaftCmdHeader)+0x30) [0x7327a50]
9. /tidb/tidb-deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0xbe31f7) [0x7f105d0581f7]
10. /tidb/tidb-deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0xbe9e9a) [0x7f105d05ee9a]
11. /tidb/tidb-deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0xbbd541) [0x7f105d032541]
12. /tidb/tidb-deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0x51d0d6) [0x7f105c9920d6]
13. /tidb/tidb-deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0x5632dd) [0x7f105c9d82dd]
14. /tidb/tidb-deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0x596f3f) [0x7f105ca0bf3f]
15. /tidb/tidb-deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0x156ea8f) [0x7f105d9e3a8f]
16. /lib64/libpthread.so.0(+0x7dd4) [0x7f105bc3add4]
17. /lib64/libc.so.6(clone+0x6c) [0x7f105b66202c]

What kind of query logic runs against the large table? Please post the cluster configuration, the resource situation, and the size of the large table so that we can evaluate the resources.

The large table was still syncing to TiFlash when TiFlash crashed; it never got as far as running any queries.

@dockerfile I suggest opening a new thread with a complete description of your problem. If you only ran DM replication and never imported data with Lightning, it may not be the same issue as the one in this thread.
Please provide the following information:

  1. Version (e.g. via the query shown below)
  2. TiFlash logs
  3. Whether data was imported with Lightning before the DM replication, and if so, how it was imported
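For item 1, the component versions can be collected in one query; a sketch using information_schema:

SELECT TYPE, INSTANCE, VERSION FROM information_schema.cluster_info;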

Will this bug be fixed in the next release? If so, roughly when will that release be published?

This bug will be fixed in the next release.

This problem is a compatibility issue between TiFlash and compaction-filter, a new feature that TiKV now enables by default; a fix is expected in 5.0.2. For TiFlash nodes that have hit this problem, some dirty data has already been persisted, so the node must be scaled in and then brought back online.
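A minimal sketch of that remediation, with db.big_table standing in for each table that had a replica on the affected node (the node itself is removed and re-added with the usual tiup cluster scale-in / scale-out flow):

ALTER TABLE db.big_table SET TIFLASH REPLICA 0;
-- scale the affected TiFlash node in, wait for it to reach Tombstone, scale a new node out,
-- then re-create the replica:
ALTER TABLE db.big_table SET TIFLASH REPLICA 1;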