Drainer process exits abnormally with error errorVerbose="not found table id"


  • [TiDB version]: 3.0.9
  • [Problem description]:

3 pump nodes
2 drainer nodes
drainer1: replicates the table id-mapping.t_uid_relation to the downstream Kafka topic t_uid_relation
drainer2: replicates qianqian_test.qianqian_t_uid_relation to the downstream Kafka topic test-tidb-qianqian

With this pump + drainer configuration, replication works: data written to the upstream TiDB is replicated to the corresponding downstream topics in real time.
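
For reference, the registration and state of the pump and drainer nodes can also be checked with the binlogctl tool; the PD address and binary path below are placeholders for this cluster:

./binlogctl -pd-urls=http://pd-host:2379 -cmd pumps
./binlogctl -pd-urls=http://pd-host:2379 -cmd drainers

Each registered node should be reported with state online.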

After replication had been running for a while, a new table qianqian_t_uid_relation_copy1 was created in qianqian_test for business reasons, and data was written to it. At that point both drainer processes crashed with the following error:

[2020/07/14 13:35:57.627 +08:00] [INFO] [async_producer.go:717] ["[sarama] producer/broker/4 input chan closed\n"]
[2020/07/14 13:35:57.627 +08:00] [INFO] [async_producer.go:801] ["[sarama] producer/broker/4 shut down\n"]
[2020/07/14 13:35:57.627 +08:00] [INFO] [broker.go:253] ["[sarama] Closed connection to broker kafka:9092\n"]
[2020/07/14 13:35:57.627 +08:00] [ERROR] [server.go:279] ["syncer exited abnormal"] [error="filterTable failed: not found table id: 4608"] [errorVerbose="not found table id: 4608
github.com/pingcap/tidb-binlog/drainer.filterTable
	/home/jenkins/agent/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb-binlog/drainer/syncer.go:446
github.com/pingcap/tidb-binlog/drainer.(*Syncer).run
	/home/jenkins/agent/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb-binlog/drainer/syncer.go:353
github.com/pingcap/tidb-binlog/drainer.(*Syncer).Start
	/home/jenkins/agent/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb-binlog/drainer/syncer.go:123
github.com/pingcap/tidb-binlog/drainer.(*Server).Start.func4
	/home/jenkins/agent/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb-binlog/drainer/server.go:278
github.com/pingcap/tidb-binlog/drainer.(*taskGroup).start.func1
	/home/jenkins/agent/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb-binlog/drainer/util.go:69
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1357
filterTable failed"]
[2020/07/14 13:35:57.627 +08:00] [INFO] [util.go:66] [Exit] [name=syncer]
[2020/07/14 13:35:57.627 +08:00] [INFO] [server.go:415] ["begin to close drainer server"]
[2020/07/14 13:35:57.627 +08:00] [INFO] [broker.go:253] ["[sarama] Closed connection to broker kafka:9092\n"]

Table id 4608 is the table that was just created.
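
To double-check which table a given id maps to, the schema can be queried through the TiDB status port (10080 by default; the hostname below is a placeholder):

curl http://tidb-host:10080/schema?table_id=4608

The returned table definition should show the table name, which should confirm that the id belongs to the newly created qianqian_t_uid_relation_copy1.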

pump configuration:

cat /data/pump/conf/pump.toml 
# pump Configuration

gc = 7
heartbeat-interval = 2

[security]
ssl-ca = ""
ssl-cert = ""
ssl-key = ""

[storage]

drainer1 configuration:

# drainer Configuration.

# the interval time (in seconds) of detect pumps' status
detect-interval = 10

# Use the specified compressor algorithm to compress payload between pump and drainer
# compressor = "gzip"

# syncer Configuration.
[syncer]
# Assume the upstream sql-mode.
# If this is setted , drainer will use the sql-mode to parse DDL statment
# sql-mode = "STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION

# disable sync these schema
ignore-schemas = "INFORMATION_SCHEMA,PERFORMANCE_SCHEMA,mysql"

# number of binlog events in a transaction batch
txn-batch = 20

# work count to execute binlogs
# if the latency between drainer and downstream(mysql or tidb) are too high, you might want to increase this
# to get higher throughput by higher concurrent write to the downstream
worker-count = 16

# whether to disable the SQL feature of splitting a single binlog event.
# If it is set to "true", binlog events are restored to a single transaction for synchronization based on the order of binlogs.
# If the downstream service is MySQL, set it to "False".
disable-dispatch = false

# safe mode will split update to delete and insert
safe-mode = false

# downstream storage, equal to --dest-db-type
# valid values are "mysql", "file", "tidb", "flash", "kafka"
db-type = "kafka"

# ignore syncing the txn with specified commit ts to downstream
ignore-txn-commit-ts = []

# replicate-do-db priority over replicate-do-table if have same db name
# and we support regex expression , start with '~' declare use regex expression.
#replicate-do-db = ["qianqian_test"]
[[syncer.replicate-do-table]]
db-name ="id-mapping"
tbl-name = "t_id_mapping"

[[syncer.replicate-do-table]]
db-name ="id-mapping"
tbl-name = "t_uid_relation"


# disable sync these table
# [[syncer.ignore-table]]
# db-name = "test"
# tbl-name = "log"

# the downstream mysql protocol database
#[syncer.to]
#host = "127.0.0.1"
#user = "root"
#password = ""
#port = 3306

# Uncomment this if you want to use file as db-type.
# [syncer.to]
# dir = "data.drainer"

# when db-type is kafka, you can uncomment this to config the down stream kafka, it will be the globle config kafka default
[syncer.to]
# only need config one of zookeeper-addrs and kafka-addrs, will get kafka address if zookeeper-addrs is configed.
zookeeper-addrs = "zk1:2181,zk2:2181,zk3:2181"
kafka-addrs = "kafka:9092"
kafka-version = "1.0.1"
kafka-max-messages = 1024

# the topic name drainer will push msg, the default name is <cluster-id>_obinlog
# be careful don't use the same name if run multi drainer instances
topic-name = "topic t_uid_relation"

drainer2 configuration:

# drainer Configuration.

# the interval time (in seconds) of detect pumps' status
detect-interval = 10

# Use the specified compressor algorithm to compress payload between pump and drainer
# compressor = "gzip"

# syncer Configuration.
[syncer]
# Assume the upstream sql-mode.
# If this is setted , drainer will use the sql-mode to parse DDL statment
# sql-mode = "STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION

# disable sync these schema
ignore-schemas = "INFORMATION_SCHEMA,PERFORMANCE_SCHEMA,mysql"

# number of binlog events in a transaction batch
txn-batch = 20

# work count to execute binlogs
# if the latency between drainer and downstream(mysql or tidb) are too high, you might want to increase this
# to get higher throughput by higher concurrent write to the downstream
worker-count = 16

# whether to disable the SQL feature of splitting a single binlog event.
# If it is set to "true", binlog events are restored to a single transaction for synchronization based on the order of binlogs.
# If the downstream service is MySQL, set it to "False".
disable-dispatch = false

# safe mode will split update to delete and insert
safe-mode = false

# downstream storage, equal to --dest-db-type
# valid values are "mysql", "file", "tidb", "flash", "kafka"
db-type = "kafka"

# ignore syncing the txn with specified commit ts to downstream
ignore-txn-commit-ts = []

# replicate-do-db priority over replicate-do-table if have same db name
# and we support regex expression , start with '~' declare use regex expression.
#replicate-do-db = ["qianqian_test"]
[[syncer.replicate-do-table]]
db-name ="qianqian_test"
tbl-name = "qianqian_t_uid_relation"

#[[syncer.replicate-do-table]]
#db-name ="id-mapping"
#tbl-name = "t_uid_relation"


# disable sync these table
# [[syncer.ignore-table]]
# db-name = "test"
# tbl-name = "log"

# the downstream mysql protocol database
#[syncer.to]
#host = "127.0.0.1"
#user = "root"
#password = ""
#port = 3306

# Uncomment this if you want to use file as db-type.
# [syncer.to]
# dir = "data.drainer"

# when db-type is kafka, you can uncomment this to config the down stream kafka, it will be the globle config kafka default
[syncer.to]
# only need config one of zookeeper-addrs and kafka-addrs, will get kafka address if zookeeper-addrs is configed.
zookeeper-addrs = "zk1:2181,zk2:2181,zk3:2181"
kafka-addrs = "kafka:9092"
kafka-version = "1.0.1"
kafka-max-messages = 1024


# the topic name drainer will push msg, the default name is <cluster-id>_obinlog
# be careful don't use the same name if run multi drainer instances
topic-name = "test-tidb-qianqian"


Thanks for the report. We are looking into it and will get back to you as soon as possible.

Could you first confirm whether any of your TiDB servers do not have binlog enabled? If so, the DDL statement may have been executed without writing a binlog.
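
One way to check this is through the TiDB status port: the /info/all endpoint should report the binlog status of each TiDB instance (the hostname below is a placeholder):

curl http://tidb-host:10080/info/all

Instances started without binlog should show a binlog status other than "On". Alternatively, check whether enable = true is set under the [binlog] section in each tidb-server configuration file.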

Our TiDB setup is as follows:
5 tidb-server nodes
2 of them do not have binlog enabled
the other 3 have binlog enabled and are load-balanced through HAProxy

All of the tests above were done through the HAProxy connection to TiDB.

That can still cause this problem, because a TiDB cluster has only one DDL owner, and it is this owner that executes the DDL and writes the corresponding binlog.

In your scenario the DDL owner may have been assigned to a TiDB server without binlog enabled, so no binlog was produced for the DDL, and drainer then fails on every binlog that touches this table.

  1. Check whether the DDL owner is currently on a TiDB server that does not have binlog enabled (see the example below)
  2. Try restarting the Drainer
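
A quick way to confirm the current DDL owner is to run ADMIN SHOW DDL from any TiDB connection, for example through the mysql client (host, port, and credentials below are placeholders):

mysql -h tidb-host -P 4000 -u root -p -e 'ADMIN SHOW DDL\G'

The owner fields in the output should show which tidb-server instance currently holds the DDL owner role; that instance can then be compared against the list of servers that have binlog enabled.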

Is the DDL owner elected internally by TiDB itself? So any running tidb-server could be elected as the DDL owner?

We will enable binlog on all of them and then do another round of data verification. Thank you very much! If there is any documentation on how tidb-server elects the DDL owner internally, could you share it with me? Thanks again.

Yes, every TiDB instance can become the DDL owner. You can take a look at this blog post: https://pingcap.com/blog-cn/tidb-source-code-reading-17/#ddl-in-tidb
