After a power-loss restart, drainer fails to start with "binlogger: content is corruption"

To speed things up, please provide the following information. A clearly described problem gets resolved faster:

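【TiDB version】
Both TiDB and TiDB Binlog are v4.0.4.
【Problem description】
After our test TiDB cluster lost power and was restarted, every other component came back up normally, but drainer stays in the Down state. drainer.log shows it retrying startup over and over, and it keeps printing the following: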

```
[2021/02/01 10:49:55.789 +08:00] [INFO] [version.go:50] ["Welcome to Drainer"] ["Release Version"=v4.0.10] ["Git Commit Hash"=e28b75cac81bea82c2a89ad024d1a37bf3c9bee9] ["Build TS"="2021-01-15 02:55:24"] ["Go Version"=go1.13] ["Go OS/Arch"=linux/amd64]
[2021/02/01 10:49:55.789 +08:00] [INFO] [main.go:46] ["start drainer..."] [config="{"log-level":"info","node-id":"192.168.68.129:8249","addr":"http://192.168.68.129:8249","advertise-addr":"http://192.168.68.129:8249","data-dir":"/data/tidb-data/drainer-8249","detect-interval":5,"pd-urls":"http://192.168.68.129:2379,http://192.168.68.128:2379","log-file":"/data/tidb-deploy/drainer-8249/log/drainer.log","initial-commit-ts":0,"sycner":{"sql-mode":null,"ignore-txn-commit-ts":null,"ignore-schemas":"INFORMATION_SCHEMA,PERFORMANCE_SCHEMA,mysql","ignore-table":null,"txn-batch":20,"loopback-control":false,"sync-ddl":true,"channel-id":0,"worker-count":1,"to":{"host":"","user":"","password":"","security":{"ssl-ca":"","ssl-cert":"","ssl-key":"","cert-allowed-cn":null},"encrypted_password":"","sync-mode":0,"port":0,"checkpoint":{"type":"","schema":"","host":"","user":"","password":"","encrypted_password":"","port":0,"security":{"ssl-ca":"","ssl-cert":"","ssl-key":"","cert-allowed-cn":null}},"dir":"/data/tidb-data/drainer-8249","retention-time":0,"params":null,"merge":false,"zookeeper-addrs":"","kafka-addrs":"","kafka-version":"","kafka-max-messages":0,"kafka-client-id":"","topic-name":""},"replicate-do-table":null,"replicate-do-db":null,"db-type":"file","relay":{"log-dir":"","max-file-size":10485760},"disable-dispatch-flag":null,"enable-dispatch-flag":null,"disable-dispatch":null,"enable-dispatch":null,"safe-mode":false,"disable-detect-flag":null,"enable-detect-flag":null,"disable-detect":null,"enable-detect":null},"security":{"ssl-ca":"","ssl-cert":"","ssl-key":"","cert-allowed-cn":null},"synced-check-time":5,"compressor":"","EtcdTimeout":5000000000,"MetricsAddr":"","MetricsInterval":15}"]
[2021/02/01 10:49:55.789 +08:00] [INFO] [client.go:193] ["[pd] create pd client with endpoints"] [pd-address="[http://192.168.68.128:2379,http://192.168.68.129:2379]"]
[2021/02/01 10:49:55.795 +08:00] [INFO] [base_client.go:308] ["[pd] switch leader"] [new-leader=http://192.168.68.129:2379] [old-leader=]
[2021/02/01 10:49:55.795 +08:00] [INFO] [base_client.go:112] ["[pd] init cluster id"] [cluster-id=6898588239712132522]
[2021/02/01 10:49:55.795 +08:00] [INFO] [server.go:121] ["get cluster id from pd"] [id=6898588239712132522]
[2021/02/01 10:49:55.795 +08:00] [INFO] [client.go:577] ["[pd] tso dispatcher is not ready, wait for a while"]
[2021/02/01 10:49:55.846 +08:00] [WARN] [ts.go:50] ["get timestamp too slow"] [take=50.985928ms]
[2021/02/01 10:49:55.846 +08:00] [INFO] [checkpoint.go:67] ["initialize checkpoint"] [type=file] [checkpoint=422570521589186562] [version=0] [cfg="{"CheckpointType":"file","Db":null,"Schema":"","Table":"","ClusterID":6898588239712132522,"InitialCommitTS":0,"dir":"/data/tidb-data/drainer-8249/savepoint"}"]
[2021/02/01 10:49:55.846 +08:00] [INFO] [store.go:68] ["new store"] [path="tikv://192.168.68.128:2379,192.168.68.129:2379?disableGC=true"]
[2021/02/01 10:49:55.846 +08:00] [INFO] [client.go:193] ["[pd] create pd client with endpoints"] [pd-address="[192.168.68.128:2379,192.168.68.129:2379]"]
[2021/02/01 10:49:55.853 +08:00] [INFO] [base_client.go:308] ["[pd] switch leader"] [new-leader=http://192.168.68.129:2379] [old-leader=]
[2021/02/01 10:49:55.853 +08:00] [INFO] [base_client.go:112] ["[pd] init cluster id"] [cluster-id=6898588239712132522]
[2021/02/01 10:49:55.853 +08:00] [INFO] [client.go:577] ["[pd] tso dispatcher is not ready, wait for a while"]
[2021/02/01 10:49:55.904 +08:00] [WARN] [pd.go:130] ["get timestamp too slow"] ["cost time"=50.639386ms]
[2021/02/01 10:49:55.904 +08:00] [INFO] [store.go:74] ["new store with retry success"]
[2021/02/01 10:49:56.085 +08:00] [INFO] [binlogger.go:93] ["open binlogger"] [directory=/data/tidb-data/drainer-8249]
[2021/02/01 10:49:56.086 +08:00] [INFO] [file.go:146] ["ignored file in binlog dir"] [name=savepoint]
[2021/02/01 10:49:56.086 +08:00] [FATAL] [main.go:50] ["create drainer server failed"] [error="fail to create pb dsyncer: binlogger: content is corruption"] [errorVerbose="binlogger: content is corruption
github.com/pingcap/tidb-binlog/pkg/binlogfile.init
\t/home/jenkins/agent/workspace/uild_binlog_multi_branch_v4.0.10/go/src/github.com/pingcap/tidb-binlog/pkg/binlogfile/binlogger.go:35
runtime.doInit
\t/usr/local/go/src/runtime/proc.go:5222
runtime.doInit
\t/usr/local/go/src/runtime/proc.go:5217
runtime.doInit
\t/usr/local/go/src/runtime/proc.go:5217
runtime.doInit
\t/usr/local/go/src/runtime/proc.go:5217
runtime.doInit
\t/usr/local/go/src/runtime/proc.go:5217
runtime.main
\t/usr/local/go/src/runtime/proc.go:190
runtime.goexit
\t/usr/local/go/src/runtime/asm_amd64.s:1357
fail to create pb dsyncer"] [stack="main.main
\t/home/jenkins/agent/workspace/uild_binlog_multi_branch_v4.0.10/go/src/github.com/pingcap/tidb-binlog/cmd/drainer/main.go:50
runtime.main
\t/usr/local/go/src/runtime/proc.go:203"]
```


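If your question is about performance tuning or troubleshooting, please download and run the script, then select all of the terminal output and paste it here.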

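If convenient, please share the PD leader's logs for this issue, covering the period before and after the failure.

Also, for v4.0 we recommend TiCDC. TiDB Binlog will not receive new features going forward.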

pd1.log (213.5 KB) pd2.log (213.0 KB)

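Thanks!

Attached are the logs from the two PD nodes: 200 lines before and 1000 lines after the power-loss restart. According to the logs, drainer has stayed Down ever since the cluster was started at 2021/01/28 16:01.

We use binlog so that a full backup plus binlog lets us restore the cluster to any point in time. Our usual requirement is being able to restore to any point within the last seven days. TiCDC doesn't seem suited to that scenario, does it?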

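1. Judging from the PD logs, you have deployed two PD nodes. If this is a test environment and high availability is not a concern, you can run a single PD node instead: PD leader election also requires a majority, so two PD nodes give the same availability as one.

2. Drainer appears to be writing to a file. Please go to the directory /data/tidb-data/drainer-8249 and run `ls` to check which files are there.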
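The majority rule behind point 1 comes down to simple arithmetic. PD embeds etcd, which elects its leader by Raft majority, so n voting members need floor(n/2)+1 of them to agree and can lose only the rest. A minimal Go sketch (the helper names are illustrative, not part of PD):

```go
package main

import "fmt"

// quorum returns how many of n voting members must agree for a Raft-style
// majority, e.g. for the etcd cluster embedded in PD.
func quorum(n int) int {
	return n/2 + 1
}

// faultTolerance returns how many members can fail while a majority survives.
func faultTolerance(n int) int {
	return n - quorum(n)
}

func main() {
	for n := 1; n <= 5; n++ {
		fmt.Printf("PD nodes=%d quorum=%d tolerated failures=%d\n",
			n, quorum(n), faultTolerance(n))
	}
}
```

With n=2 the quorum is 2 and no failure is tolerated, exactly as with n=1; three nodes is the smallest size that survives losing one.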

For this requirement, a PITR (point-in-time recovery) feature is planned, built on BR and TiCDC. For details, see the following issue:

https://github.com/pingcap/br/issues/325

Yes, it writes to a file. The files under `/data/tidb-data/drainer-8249` are shown below:
[Screenshot: file listing of `/data/tidb-data/drainer-8249`]

Looking forward to a production-ready PITR feature.

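1. Was the file in the red box created manually or generated automatically?
2. I suggest moving this file (`mv`) to another directory and then trying to start drainer again.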

:scream::+1::+1::+1::+1::+1::+1::+1::+1::+1:
This file was created manually to test binlog archiving. After moving it out, drainer started successfully. Thank you so much!

:handshake::handshake::handshake:
