TiDB Binlog keeps firing the tidb_TiDB_binlog_error_total alert, and Drainer goes down (tidb_Drainer_server_is_down)

Drainer error log:
[2021/04/03 08:46:27.641 +08:00] [ERROR] [main.go:69] ["start drainer server failed"] [error="filterTable failed: not found table id: 570233"] [errorVerbose="not found table id: 570233\ngithub.com/pingcap/tidb-binlog/drainer.filterTable\n\t/home/jenkins/agent/workspace/uild_binlog_multi_branch_v4.0.10/go/src/github.com/pingcap/tidb-binlog/drainer/syncer.go:523\ngithub.com/pingcap/tidb-binlog/drainer.(*Syncer).run\n\t/home/jenkins/agent/workspace/uild_binlog_multi_branch_v4.0.10/go/src/github.com/pingcap/tidb-binlog/drainer/syncer.go:377\ngithub.com/pingcap/tidb-binlog/drainer.(*Syncer).Start\n\t/home/jenkins/agent/workspace/uild_binlog_multi_branch_v4.0.10/go/src/github.com/pingcap/tidb-binlog/drainer/syncer.go:132\ngithub.com/pingcap/tidb-binlog/drainer.(*Server).Start.func4\n\t/home/jenkins/agent/workspace/uild_binlog_multi_branch_v4.0.10/go/src/github.com/pingcap/tidb-binlog/drainer/server.go:290\ngithub.com/pingcap/tidb-binlog/drainer.(*taskGroup).start.func1\n\t/home/jenkins/agent/workspace/uild_binlog_multi_branch_v4.0.10/go/src/github.com/pingcap/tidb-binlog/drainer/util.go:76\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357\nfilterTable failed"]
[2021/04/03 08:46:44.206 +08:00] [INFO] [version.go:50] ["Welcome to Drainer"] ["Release Version"=v4.0.10] ["Git Commit Hash"=e28b75cac81bea82c2a89ad024d1a37bf3c9bee9] ["Build TS"="2021-01-15 02:55:24"] ["Go Version"=go1.13] ["Go OS/Arch"=linux/amd64]
[2021/04/03 08:46:44.206 +08:00] [INFO] [main.go:46] ["start drainer..."] [config="{"log-level":"info","node-id":"172.22.88.220:8249","addr":"http://172.22.88.220:8249","advertise-addr":"http://172.22.88.220:8249","data-dir":"/data/tidb-data/drainer-8249","detect-interval":5,"pd-urls":"http://172.22.88.200:2379,http://172.22.88.100:2379,http://172.22.88.220:2379","log-file":"/home/tidb/tidb-deploy/drainer-8249/log/drainer.log","initial-commit-ts":0,"sycner":{"sql-mode":null,"ignore-txn-commit-ts":null,"ignore-schemas":"INFORMATION_SCHEMA,PERFORMANCE_SCHEMA,mysql,METRICS_SCHEMA,cmc_ssbi_dashboard,cmc_ssbi_db,cmc_ssbi_metadata,cmc_ssbi_rt","ignore-table":null,"txn-batch":20,"loopback-control":false,"sync-ddl":true,"channel-id":0,"worker-count":1,"to":{"host":"","user":"","password":"","security":{"ssl-ca":"","ssl-cert":"","ssl-key":"","cert-allowed-cn":null},"encrypted_password":"","sync-mode":0,"port":0,"checkpoint":{"type":"","schema":"","host":"","user":"","password":"","encrypted_password":"","port":0,"security":{"ssl-ca":"","ssl-cert":"","ssl-key":"","cert-allowed-cn":null}},"dir":"/data/tidb-binlog","retention-time":0,"params":null,"merge":false,"zookeeper-addrs":"","kafka-addrs":"","kafka-version":"","kafka-max-messages":0,"kafka-client-id":"","topic-name":""},"replicate-do-table":null,"replicate-do-db":null,"db-type":"file","relay":{"log-dir":"","max-file-size":10485760},"disable-dispatch-flag":null,"enable-dispatch-flag":null,"disable-dispatch":null,"enable-dispatch":null,"safe-mode":false,"disable-detect-flag":null,"enable-detect-flag":null,"disable-detect":null,"enable-detect":null},"security":{"ssl-ca":"","ssl-cert":"","ssl-key":"","cert-allowed-cn":null},"synced-check-time":5,"compressor":"","EtcdTimeout":5000000000,"MetricsAddr":"","MetricsInterval":15}"]
[2021/04/03 08:46:44.207 +08:00] [INFO] [client.go:193] ["[pd] create pd client with endpoints"] [pd-address="[http://172.22.88.200:2379,http://172.22.88.100:2379,http://172.22.88.220:2379]"]
[2021/04/03 08:46:44.210 +08:00] [INFO] [base_client.go:308] ["[pd] switch leader"] [new-leader=http://172.22.88.100:2379] [old-leader=]

[2021/04/03 08:49:07.684 +08:00] [ERROR] [server.go:291] ["syncer exited abnormal"] [error="filterTable failed: not found table id: 570289"] [errorVerbose="not found table id: 570289\ngithub.com/pingcap/tidb-binlog/drainer.filterTable\n\t/home/jenkins/agent/workspace/uild_binlog_multi_branch_v4.0.10/go/src/github.com/pingcap/tidb-binlog/drainer/syncer.go:523\ngithub.com/pingcap/tidb-binlog/drainer.(*Syncer).run\n\t/home/jenkins/agent/workspace/uild_binlog_multi_branch_v4.0.10/go/src/github.com/pingcap/tidb-binlog/drainer/syncer.go:377\ngithub.com/pingcap/tidb-binlog/drainer.(*Syncer).Start\n\t/home/jenkins/agent/workspace/uild_binlog_multi_branch_v4.0.10/go/src/github.com/pingcap/tidb-binlog/drainer/syncer.go:132\ngithub.com/pingcap/tidb-binlog/drainer.(*Server).Start.func4\n\t/home/jenkins/agent/workspace/uild_binlog_multi_branch_v4.0.10/go/src/github.com/pingcap/tidb-binlog/drainer/server.go:290\ngithub.com/pingcap/tidb-binlog/drainer.(*taskGroup).start.func1\n\t/home/jenkins/agent/workspace/uild_binlog_multi_branch_v4.0.10/go/src/github.com/pingcap/tidb-binlog/drainer/util.go:76\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357\nfilterTable failed"]
[2021/04/03 08:49:07.684 +08:00] [INFO] [util.go:73] [Exit] [name=syncer]
[2021/04/03 08:49:07.684 +08:00] [INFO] [server.go:465] ["begin to close drainer server"]
[2021/04/03 08:49:07.686 +08:00] [INFO] [server.go:430] ["has already update status"] [id=172.22.88.220:8249]
[2021/04/03 08:49:07.686 +08:00] [INFO] [server.go:469] ["commit status done"]
[2021/04/03 08:49:07.686 +08:00] [INFO] [pump.go:77] ["pump is closing"] [id=172.22.88.200:8250]
[2021/04/03 08:49:07.686 +08:00] [INFO] [collector.go:135] ["publishBinlogs quit"]
[2021/04/03 08:49:07.686 +08:00] [INFO] [pump.go:77] ["pump is closing"] [id=172.22.88.100:8250]
[2021/04/03 08:49:07.686 +08:00] [INFO] [pump.go:77] ["pump is closing"] [id=172.22.88.220:8250]
[2021/04/03 08:49:07.686 +08:00] [INFO] [util.go:73] [Exit] [name=collect]
[2021/04/03 08:49:07.686 +08:00] [INFO] [util.go:73] [Exit] [name=heartbeat]
[2021/04/03 08:49:07.686 +08:00] [INFO] [server.go:484] ["drainer exit"]
[2021/04/03 08:49:07.686 +08:00] [INFO] [server.go:325] ["drainer http server stopped"] [error="mux: listener closed"]
[2021/04/03 08:49:07.686 +08:00] [ERROR] [main.go:69] ["start drainer server failed"] [error="filterTable failed: not found table id: 570289"] [errorVerbose="not found table id: 570289\ngithub.com/pingcap/tidb-binlog/drainer.filterTable\n\t/home/jenkins/agent/workspace/uild_binlog_multi_branch_v4.0.10/go/src/github.com/pingcap/tidb-binlog/drainer/syncer.go:523\ngithub.com/pingcap/tidb-binlog/drainer.(*Syncer).run\n\t/home/jenkins/agent/workspace/uild_binlog_multi_branch_v4.0.10/go/src/github.com/pingcap/tidb-binlog/drainer/syncer.go:377\ngithub.com/pingcap/tidb-binlog/drainer.(*Syncer).Start\n\t/home/jenkins/agent/workspace/uild_binlog_multi_branch_v4.0.10/go/src/github.com/pingcap/tidb-binlog/drainer/syncer.go:132\ngithub.com/pingcap/tidb-binlog/drainer.(*Server).Start.func4\n\t/home/jenkins/agent/workspace/uild_binlog_multi_branch_v4.0.10/go/src/github.com/pingcap/tidb-binlog/drainer/server.go:290\ngithub.com/pingcap/tidb-binlog/drainer.(*taskGroup).start.func1\n\t/home/jenkins/agent/workspace/uild_binlog_multi_branch_v4.0.10/go/src/github.com/pingcap/tidb-binlog/drainer/util.go:76\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357\nfilterTable failed"]

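Hello, could you please provide some more information? Thanks.

1. The type and version of the upstream and downstream databases
2. The tidb-binlog version and the Drainer configuration file
3. In the upstream database, which table has table_id 570233? If the downstream is also TiDB, please check there as well.

Query: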

select * from information_schema.tables where tidb_table_id='570233' \G
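Because the error recurs with different IDs (570233, then 570289), this lookup can be scripted. The Go sketch below is a minimal illustration that assumes only that the Drainer error line contains the fragment not found table id: <N>, as in the logs above; extractTableID and lookupQuery are hypothetical helper names, not part of tidb-binlog.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// tableIDPattern matches the "not found table id: <N>" fragment of the
// Drainer filterTable error (assumed log format, taken from the logs above).
var tableIDPattern = regexp.MustCompile(`not found table id: (\d+)`)

// extractTableID returns the first table ID found in a Drainer log line.
// The bool is false when the line contains no such fragment.
func extractTableID(line string) (int64, bool) {
	m := tableIDPattern.FindStringSubmatch(line)
	if m == nil {
		return 0, false
	}
	n, err := strconv.ParseInt(m[1], 10, 64)
	if err != nil {
		return 0, false
	}
	return n, true
}

// lookupQuery builds the information_schema query suggested above.
// It ends with \G only, without a trailing semicolon.
func lookupQuery(id int64) string {
	return fmt.Sprintf("select * from information_schema.tables where tidb_table_id='%d' \\G", id)
}

func main() {
	// Shortened copy of one of the error lines in the Drainer log.
	line := `[ERROR] [main.go:69] ["start drainer server failed"] [error="filterTable failed: not found table id: 570233"]`
	if id, ok := extractTableID(line); ok {
		fmt.Println(lookupQuery(id))
	}
}
```

The printed statement can be pasted into the mysql client as-is. Ending it with \G; instead of \G makes the client treat the trailing ; as a separate, empty statement, which is what prints ERROR: No query specified.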

1. TiDB v4.0.10 (TiDB Server). There is no downstream; Drainer is only used to record binlog.
2. How do I check the tidb-binlog version? Drainer configuration:
drainer_servers:
  - host: xxxxx
    config:
      syncer.db-type: "file"
      syncer.to.dir: "/data/tidb-binlog"
      syncer.ignore-schemas: INFORMATION_SCHEMA,PERFORMANCE_SCHEMA,mysql,METRICS_SCHEMA,cmc_ssbi_dashboard,xebest_db,yitao_rt
3. mysql> select * from information_schema.tables where tidb_table_id='570233' \G;
Empty set (0.33 sec)

ERROR:
No query specified

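OK. Please follow the steps in this post to check whether binlog is enabled on every tidb-server.
How was the cluster deployed? If it was deployed with TiUP, the tidb-binlog version is the same as the TiDB cluster version. If it was deployed with Ansible or from binaries, go to the corresponding bin directory and run the binary with -V to see its version.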
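Independently of the deployment method, a running Drainer also prints its version at startup in the Welcome to Drainer log line (Release Version=v4.0.10 in the log at the top of this thread). A small Go sketch that pulls the version out of such a line, assuming that log format; drainerVersion is a hypothetical helper, and the pattern accepts both straight and curly quotes because logs pasted through a forum or editor are often re-quoted.

```go
package main

import (
	"fmt"
	"regexp"
)

// releaseVersionPattern matches the "Release Version"=vX.Y.Z field of the
// Drainer startup line (assumed format). Straight or curly quotes are accepted.
var releaseVersionPattern = regexp.MustCompile(`["“]Release Version["”]=(v[^\]\s]+)`)

// drainerVersion extracts the release version from a Drainer log line.
func drainerVersion(line string) (string, bool) {
	m := releaseVersionPattern.FindStringSubmatch(line)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	line := `[INFO] [version.go:50] ["Welcome to Drainer"] ["Release Version"=v4.0.10] ["Go Version"=go1.13]`
	if v, ok := drainerVersion(line); ok {
		fmt.Println("drainer version:", v)
	}
}
```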

Deployed with TiUP:
1 Drainer node
3 Pump nodes
2 TiDB nodes
Pump configuration:
pump_servers:
  - host:
    config:
      storage.sync-log: false
      storage.kv_chan_cap: 10485760
      gc: 7
  - host:
    config:
      storage.sync-log: false
      storage.kv_chan_cap: 10485760
      gc: 7
  - host:
    config:
      storage.sync-log: false
      storage.kv_chan_cap: 10485760
      gc: 7

Drainer configuration:
drainer_servers:
  - host:
    config:
      syncer.db-type: "file"
      syncer.to.dir: "/data/tidb-binlog"
      syncer.ignore-schemas: INFORMATION_SCHEMA,PERFORMANCE_SCHEMA,mysql,METRICS_SCHEMA,db_name

TiDB binlog configuration:
binlog.enable: true
binlog.ignore-error: true

[tidb@hcmc-bi-tidb00 log]$ curl http://127.0.0.1:10080/info/all
{
  "servers_num": 2,
  "owner_id": "f2fa5c87-e1e7-40d3-91dd-89783af2ba9a",
  "is_all_server_version_consistent": true,
  "all_servers_info": {
    "9e3a0448-9c5f-466c-9231-c18d1f00e8f4": {
      "version": "5.7.25-TiDB-v4.0.10",
      "git_hash": "dbade8cda4c5a329037746e171449e0a1dfdb8b3",
      "ddl_id": "9e3a0448-9c5f-466c-9231-c18d1f00e8f4",
      "ip": "xxx1",
      "listening_port": 3306,
      "status_port": 10080,
      "lease": "45s",
      "binlog_status": "Skipping",
      "start_timestamp": 1617098056
    },
    "f2fa5c87-e1e7-40d3-91dd-89783af2ba9a": {
      "version": "5.7.25-TiDB-v4.0.10",
      "git_hash": "dbade8cda4c5a329037746e171449e0a1dfdb8b3",
      "ddl_id": "f2fa5c87-e1e7-40d3-91dd-89783af2ba9a",
      "ip": "xxx2",
      "listening_port": 3306,
      "status_port": 10080,
      "lease": "45s",
      "binlog_status": "Skipping",
      "start_timestamp": 1617098056
    }
  }
}
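The same check can be automated against this status endpoint. A minimal Go sketch, assuming the /info/all response shape shown above and treating any binlog_status other than On as a tidb-server that is not writing binlog; serversNotWritingBinlog is a hypothetical helper. In practice the body would come from an HTTP GET of http://<tidb-host>:10080/info/all on any tidb-server; here it is a trimmed copy of the output above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// serverInfo holds the /info/all fields this check needs.
type serverInfo struct {
	IP           string `json:"ip"`
	BinlogStatus string `json:"binlog_status"`
}

// allServersInfo mirrors the top level of the /info/all response.
type allServersInfo struct {
	Servers map[string]serverInfo `json:"all_servers_info"`
}

// serversNotWritingBinlog lists every tidb-server whose binlog_status is not
// "On" (for example "Skipping", as in the output above), sorted for stable output.
func serversNotWritingBinlog(body []byte) ([]string, error) {
	var info allServersInfo
	if err := json.Unmarshal(body, &info); err != nil {
		return nil, err
	}
	var bad []string
	for id, s := range info.Servers {
		if s.BinlogStatus != "On" {
			bad = append(bad, fmt.Sprintf("%s (%s): %s", id, s.IP, s.BinlogStatus))
		}
	}
	sort.Strings(bad)
	return bad, nil
}

func main() {
	// Trimmed copy of the response above; IPs are the redacted placeholders.
	body := []byte(`{"servers_num": 2, "all_servers_info": {
		"9e3a0448-9c5f-466c-9231-c18d1f00e8f4": {"ip": "xxx1", "binlog_status": "Skipping"},
		"f2fa5c87-e1e7-40d3-91dd-89783af2ba9a": {"ip": "xxx2", "binlog_status": "Skipping"}}}`)
	bad, err := serversNotWritingBinlog(body)
	if err != nil {
		panic(err)
	}
	for _, line := range bad {
		fmt.Println(line)
	}
}
```

Both instances in this cluster are reported, since both are stuck at Skipping.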

For the fix, see the official documentation:
https://docs.pingcap.com/zh/tidb/stable/tidb-binlog-faq#主从同步开启-ignore-error-触发-critical-error-后如何重新部署

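This still doesn't address the root cause. What exactly is causing it? Do I have to redeploy like this every time? It has already happened several times.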

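The cause is that ignore-error is enabled in the TiDB configuration. With ignore-error enabled, once TiDB fails to write binlog it raises a critical error alert and stops writing binlog altogether, so binlog_status stays stuck at Skipping.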
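To make this behaviour concrete, here is a toy Go model of it. It is not TiDB's implementation: binlogWriter, its fields and the send callback (standing in for the write to Pump) are hypothetical. It shows why a single failed write is enough: once the status becomes Skipping, later writes are dropped without any retry, so the status never recovers on its own.

```go
package main

import (
	"errors"
	"fmt"
)

// Status strings named after the binlog_status values TiDB reports.
const (
	statusOn       = "On"
	statusSkipping = "Skipping"
)

// errPumpUnavailable is a hypothetical failure used to drive the model.
var errPumpUnavailable = errors.New("pump unavailable")

// binlogWriter is a toy model of a tidb-server writing binlog; it is not
// TiDB's implementation.
type binlogWriter struct {
	ignoreError    bool
	status         string
	criticalErrors int
	send           func(payload string) error // stands in for the write to Pump
}

// write sends one binlog unless the writer has already switched to Skipping.
func (w *binlogWriter) write(payload string) error {
	if w.status == statusSkipping {
		return nil // dropped silently: no retry, no recovery
	}
	if err := w.send(payload); err != nil {
		if !w.ignoreError {
			return err // without ignore-error the write itself fails
		}
		w.criticalErrors++ // reported as a critical error
		w.status = statusSkipping
	}
	return nil
}

func main() {
	calls := 0
	w := &binlogWriter{ignoreError: true, status: statusOn}
	w.send = func(string) error {
		calls++
		if calls == 2 {
			return errPumpUnavailable // only the second write fails
		}
		return nil
	}
	for i := 0; i < 4; i++ {
		_ = w.write(fmt.Sprintf("txn-%d", i))
	}
	fmt.Println(w.status, w.criticalErrors, calls) // prints: Skipping 1 2
}
```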

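The write failed because the table could not be found, and the upstream currently has no record of this table either, so it cannot be traced any further. If the problem keeps recurring, we suggest re-syncing and then enabling debug logging on Drainer.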

Also, for 4.0, TiCDC is the recommended tool.

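The business does run DROP TABLE. I'm only recording binlog and there is no downstream replication. How can I ignore this kind of error?

So a table not being found triggers a critical error alert, which stops binlog from being written?

DROP operations happen at other times too, and they don't always cause the error. It's still unclear what exactly causes it, so the actual problem remains unsolved. I can't redeploy by following the docs every time, can I?

I understand your concern. We suggest using TiCDC, or redeploying with Drainer's debug logging enabled. If the problem happens again, please share the debug logs and we'll help look into it.

To enable debug logging, do I set log-level to debug?

Yes, set Drainer's log level to debug.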

drainer_servers:
  - host:
    ssh_port: 22
    port: 8249
    deploy_dir: /home/tidb/tidb-deploy/drainer-8249
    data_dir: /home/tidb/tidb-data/drainer-8249
    config:
      log_level: debug
      syncer.db-type: file
      syncer.to.dir: /home/tidb/tidb-binlog
    arch: amd64
    os: linux

Is log_level: debug supposed to go here? When I add it here and restart the cluster, I get an error...

After changing the configuration with TiUP, you need to run a reload.

In what situations is the table not found, as in: "syncer exited abnormal"] [error="filterTable failed: not found table id: 570289"] [errorVerbose="not found table id:

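One known cause: when there are multiple tidb-server instances and some of them do not have binlog enabled, the DDL owner may be assigned to a TiDB instance without binlog. In that case no binlog is produced for the DDL, so Drainer reports this error whenever it encounters a binlog for that table.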
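The mechanism can be sketched as a toy Go model. It is not tidb-binlog's code; schemaTracker, applyCreateTable, the table name app.t1 and the ID 1001 are hypothetical. It assumes, as described above, that Drainer can only resolve a table ID it has learned from a DDL binlog, so a DML binlog for a table whose CREATE TABLE ran on a binlog-disabled DDL owner fails the lookup with the same error text seen in the logs.

```go
package main

import "fmt"

// schemaTracker is a toy stand-in for the table-ID lookup Drainer performs
// when filtering binlogs; it is not tidb-binlog's implementation.
type schemaTracker struct {
	tables map[int64]string // table ID -> "db.table", learned from DDL binlogs
}

func newSchemaTracker() *schemaTracker {
	return &schemaTracker{tables: make(map[int64]string)}
}

// applyCreateTable records a CREATE TABLE received as a DDL binlog.
func (s *schemaTracker) applyCreateTable(id int64, name string) {
	s.tables[id] = name
}

// filterTable resolves the table a DML binlog refers to, failing with the
// same message text as the Drainer log when the ID was never learned.
func (s *schemaTracker) filterTable(id int64) (string, error) {
	name, ok := s.tables[id]
	if !ok {
		return "", fmt.Errorf("filterTable failed: not found table id: %d", id)
	}
	return name, nil
}

func main() {
	s := newSchemaTracker()
	// CREATE TABLE run by a DDL owner with binlog enabled: Drainer sees it.
	s.applyCreateTable(1001, "app.t1")
	// CREATE TABLE for 570289 run by a DDL owner without binlog: no DDL binlog
	// is produced, but DML binlogs for the table still arrive from the
	// tidb-servers that do have binlog enabled.
	for _, id := range []int64{1001, 570289} {
		if name, err := s.filterTable(id); err != nil {
			fmt.Println("error:", err)
		} else {
			fmt.Println("ok:", name)
		}
	}
}
```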

Other cases need to be analyzed individually.

Binlog is enabled on all of our tidb-server instances.