Let me confirm the scenario with you:
- Upstream MySQL is replicated to TiDB via DM
- While DM replication was running, did you also use Lightning to import data?
- At which stage was the TiFlash replica set: during the Lightning import, or after it finished?
- Which Lightning import mode, i.e. which backend? Do the tables imported by Lightning overlap with the tables DM is replicating?
- Have you already scaled TiFlash in and back out? Does the problem persist?
One more thing to confirm: before scaling in and back out, did you remove the replicas first, i.e. set all TiFlash replicas to 0, and only then scale?
Yes, they were removed.
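For reference, removing replicas before a scale-in is done per table; a minimal sketch, assuming a hypothetical table db.big_table and a TiDB server on the default port:

# Set the TiFlash replica count to 0 for each table that has one (hypothetical table name)
mysql -h 127.0.0.1 -P 4000 -u root -e "ALTER TABLE db.big_table SET TIFLASH REPLICA 0;"
# An empty result here means no TiFlash replicas remain and it is safe to scale in
mysql -h 127.0.0.1 -P 4000 -u root -e "SELECT TABLE_SCHEMA, TABLE_NAME FROM information_schema.tiflash_replica;"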
Marking this thread. I'm hitting a similar error: while the first table was syncing to TiFlash, it failed and TiFlash kept restarting.
[ERROR] [<unknown>] ["DB::EngineStoreApplyRes DB::HandleWriteRaftCmd(const DB::EngineStoreServerWrap*, DB::WriteCmdsView, DB::RaftCmdHeader): Code: 9008, e.displayText() = DB::Exception: Raw TiDB PK: 8000000007B4571A, Prewrite ts: 424615159928979479 can not found in default cf for key: 7480000000000010FF5D5F728000000007FFB4571A0000000000FAFA1B76C2C53FFFFE, e.what() = DB::Exception, Stack trace:

0. bin/tiflash/tiflash(StackTrace::StackTrace()+0x15) [0x367c835]
1. bin/tiflash/tiflash(DB::Exception::Exception(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)+0x25) [0x36733c5]
2. bin/tiflash/tiflash(DB::RegionData::readDataByWriteIt(std::_Rb_tree_const_iterator<std::pair<std::pair<DB::RawTiDBPK, unsigned long> const, std::tuple<std::shared_ptr<DB::StringObject<true> const>, std::shared_ptr<DB::StringObject<false> const>, DB::RecordKVFormat::InnerDecodedWriteCFValue> > > const&, bool) const+0x50c) [0x734ef3c]
3. bin/tiflash/tiflash(DB::Region::readDataByWriteIt(std::_Rb_tree_const_iterator<std::pair<std::pair<DB::RawTiDBPK, unsigned long> const, std::tuple<std::shared_ptr<DB::StringObject<true> const>, std::shared_ptr<DB::StringObject<false> const>, DB::RecordKVFormat::InnerDecodedWriteCFValue> > > const&, bool) const+0x17) [0x732ddd7]
4. bin/tiflash/tiflash(DB::ReadRegionCommitCache(std::shared_ptr<DB::Region> const&, bool)+0x1c1) [0x7322fa1]
5. bin/tiflash/tiflash(DB::RegionTable::writeBlockByRegion(DB::Context&, DB::RegionPtrWithBlock const&, std::vector<std::tuple<DB::RawTiDBPK, unsigned char, unsigned long, std::shared_ptr<DB::StringObject<false> const> >, std::allocator<std::tuple<DB::RawTiDBPK, unsigned char, unsigned long, std::shared_ptr<DB::StringObject<false> const> > > >&, Poco::Logger*, bool)+0x111) [0x7326051]
6. bin/tiflash/tiflash(DB::Region::handleWriteRaftCmd(DB::WriteCmdsView const&, unsigned long, unsigned long, DB::TMTContext&)+0x2c4) [0x7330ce4]
7. bin/tiflash/tiflash(DB::KVStore::handleWriteRaftCmd(DB::WriteCmdsView const&, unsigned long, unsigned long, unsigned long, DB::TMTContext&)+0x5a) [0x73201aa]
8. bin/tiflash/tiflash(DB::HandleWriteRaftCmd(DB::EngineStoreServerWrap const*, DB::WriteCmdsView, DB::RaftCmdHeader)+0x30) [0x7327a50]
9. /tidb/tidb-deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0xbe31f7) [0x7f55dbb251f7]
10. /tidb/tidb-deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0xbe9e9a) [0x7f55dbb2be9a]
11. /tidb/tidb-deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0xbbd541) [0x7f55dbaff541]
12. /tidb/tidb-deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0x51d0d6) [0x7f55db45f0d6]
13. /tidb/tidb-deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0x5632dd) [0x7f55db4a52dd]
14. /tidb/tidb-deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0x596f3f) [0x7f55db4d8f3f]
15. /tidb/tidb-deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0x156ea8f) [0x7f55dc4b0a8f]
16. /lib64/libpthread.so.0(+0x7dd4) [0x7f55da707dd4]
17. /lib64/libc.so.6(clone+0x6c) [0x7f55da12f02c]
"] [thread_id=10]
Hello ~ has your problem been solved?
Not yet. TiFlash keeps restarting, and the log is oddly full of DM replication messages, yet the table I added to TiFlash isn't even among the tables in the log below.
[2021/05/12 21:13:45.650 +08:00] [DEBUG] [<unknown>] ["SchemaBuilder: Table dm_meta(56).xxxxxxxx_syncer_checkpoint(1849) synced during sync all schemas"] [thread_id=1]
[2021/05/12 21:13:45.650 +08:00] [DEBUG] [<unknown>] ["SchemaBuilder: Table dm_meta(56).xxxxxxxx_loader_checkpoint(1938) syncing during sync all schemas"] [thread_id=1]
[2021/05/12 21:13:45.650 +08:00] [DEBUG] [<unknown>] ["SchemaBuilder: Table dm_meta(56).xxxxxxxx_loader_checkpoint(1938) name identical, not renaming."] [thread_id=1]
[2021/05/12 21:13:45.650 +08:00] [INFO] [<unknown>] ["SchemaBuilder: Altering table dm_meta(56).xxxxxxxx_loader_checkpoint(1938)"] [thread_id=1]
[2021/05/12 21:13:45.650 +08:00] [INFO] [<unknown>] ["SchemaBuilder: No schema change detected for table dm_meta(56).xxxxxxxx_loader_checkpoint(1938), not altering"] [thread_id=1]
[2021/05/12 21:13:45.650 +08:00] [DEBUG] [<unknown>] ["SchemaBuilder: Table dm_meta(56).xxxxxxxx_loader_checkpoint(1938) synced during sync all schemas"] [thread_id=1]
[2021/05/12 21:13:45.650 +08:00] [DEBUG] [<unknown>] ["SchemaBuilder: Table dm_meta(56).xxxxxxxx_syncer_checkpoint(1940) syncing during sync all schemas"] [thread_id=1]
Check the dmesg log. Was it OOM-killed?
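For example, a minimal check on the TiFlash host (assuming access to the kernel log):

# Search kernel messages for OOM-killer activity around the restart times
dmesg -T | grep -i -E "out of memory|oom"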
Please also confirm whether too many tables are being synced; in principle dm_meta does not need to be synced to TiFlash.
I'm replicating a few dozen upstream tables with DM.
I tried redeploying TiFlash and then added a small table as a TiFlash replica; that worked fine.
After I then added a large table to TiFlash, TiFlash went down.
2021.05.15 09:57:22.362169 [ 1 ] <Warning> Application: The configuration "path" is deprecated. Check [storage] section for new style.
2021.05.15 09:57:25.696505 [ 21 ] <Warning> DMFile: Existing dmfile, removed :/tidb/tidb-data/tiflash-9000/data/t_4189/stable/.tmp.dmf_396
2021.05.15 09:57:25.910633 [ 24 ] <Error> DB::EngineStoreApplyRes DB::HandleWriteRaftCmd(const DB::EngineStoreServerWrap*, DB::WriteCmdsView, DB::RaftCmdHeader): Code: 9008, e.displayText() = DB::Exception: Raw TiDB PK: 8000000008FE945C, Prewrite ts: 424906226029821955 can not found in default cf for key: 7480000000000010FF5D5F728000000008FFFE945C0000000000FAFA1A6E09E6A3FFF6, e.what() = DB::Exception, Stack trace:
0. bin/tiflash/tiflash(StackTrace::StackTrace()+0x15) [0x367c835]
1. bin/tiflash/tiflash(DB::Exception::Exception(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)+0x25) [0x36733c5]
2. bin/tiflash/tiflash(DB::RegionData::readDataByWriteIt(std::_Rb_tree_const_iterator<std::pair<std::pair<DB::RawTiDBPK, unsigned long> const, std::tuple<std::shared_ptr<DB::StringObject<true> const>, std::shared_ptr<DB::StringObject<false> const>, DB::RecordKVFormat::InnerDecodedWriteCFValue> > > const&, bool) const+0x50c) [0x734ef3c]
3. bin/tiflash/tiflash(DB::Region::readDataByWriteIt(std::_Rb_tree_const_iterator<std::pair<std::pair<DB::RawTiDBPK, unsigned long> const, std::tuple<std::shared_ptr<DB::StringObject<true> const>, std::shared_ptr<DB::StringObject<false> const>, DB::RecordKVFormat::InnerDecodedWriteCFValue> > > const&, bool) const+0x17) [0x732ddd7]
4. bin/tiflash/tiflash(DB::ReadRegionCommitCache(std::shared_ptr<DB::Region> const&, bool)+0x1c1) [0x7322fa1]
5. bin/tiflash/tiflash(DB::RegionTable::writeBlockByRegion(DB::Context&, DB::RegionPtrWithBlock const&, std::vector<std::tuple<DB::RawTiDBPK, unsigned char, unsigned long, std::shared_ptr<DB::StringObject<false> const> >, std::allocator<std::tuple<DB::RawTiDBPK, unsigned char, unsigned long, std::shared_ptr<DB::StringObject<false> const> > > >&, Poco::Logger*, bool)+0x111) [0x7326051]
6. bin/tiflash/tiflash(DB::Region::handleWriteRaftCmd(DB::WriteCmdsView const&, unsigned long, unsigned long, DB::TMTContext&)+0x2c4) [0x7330ce4]
7. bin/tiflash/tiflash(DB::KVStore::handleWriteRaftCmd(DB::WriteCmdsView const&, unsigned long, unsigned long, unsigned long, DB::TMTContext&)+0x5a) [0x73201aa]
8. bin/tiflash/tiflash(DB::HandleWriteRaftCmd(DB::EngineStoreServerWrap const*, DB::WriteCmdsView, DB::RaftCmdHeader)+0x30) [0x7327a50]
9. /tidb/tidb-deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0xbe31f7) [0x7f105d0581f7]
10. /tidb/tidb-deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0xbe9e9a) [0x7f105d05ee9a]
11. /tidb/tidb-deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0xbbd541) [0x7f105d032541]
12. /tidb/tidb-deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0x51d0d6) [0x7f105c9920d6]
13. /tidb/tidb-deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0x5632dd) [0x7f105c9d82dd]
14. /tidb/tidb-deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0x596f3f) [0x7f105ca0bf3f]
15. /tidb/tidb-deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0x156ea8f) [0x7f105d9e3a8f]
16. /lib64/libpthread.so.0(+0x7dd4) [0x7f105bc3add4]
17. /lib64/libc.so.6(clone+0x6c) [0x7f105b66202c]
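For reference, replica sync progress can be watched from TiDB while a table is loading into TiFlash; a minimal sketch, assuming a TiDB server on the default port:

# PROGRESS climbs from 0 to 1 as the replica syncs; AVAILABLE becomes 1 when done
mysql -h 127.0.0.1 -P 4000 -u root -e "SELECT TABLE_SCHEMA, TABLE_NAME, PROGRESS, AVAILABLE FROM information_schema.tiflash_replica;"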
What kind of query logic runs against the large table? Please share the cluster configuration, the resource situation, and the size of the large table so we can evaluate the resources.
The large table hadn't even finished syncing to TiFlash when TiFlash crashed; it never got to the point of running any queries.
@dockerfile I suggest opening a new topic with a complete description of your problem. If you only replicated with DM and never imported data with Lightning, it may not be the same issue as the one in this thread.
Please provide the following information:
Will this bug be fixed in the next release? If so, roughly when will it be published?
Yes, this bug will be fixed in the next release.
The problem is a compatibility issue between TiFlash and compaction-filter, a new feature enabled by default in TiKV; a fix is expected in 5.0.2. For TiFlash nodes that have hit this problem, some dirty data has already been persisted, so the node needs to be scaled in and then brought back online.
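For reference, that node replacement can be done with tiup; a minimal sketch, assuming a hypothetical cluster named mycluster and a TiFlash node at 10.0.1.5:

# Scale in the affected TiFlash node (all TiFlash replicas must already be 0)
tiup cluster scale-in mycluster -N 10.0.1.5:9000
# Wait until the store is tombstoned and removed; check with:
tiup cluster display mycluster
# Then bring the node back with a scale-out topology file describing it
tiup cluster scale-out mycluster scale-out-tiflash.yaml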