各位大神,drainer异常退出,无法启动了

  • 【TiDB 版本】:4.0

  • 【问题描述】:drainer异常退出,无法启动,报错如下:

          [2020/03/10 09:49:51.965 +08:00] [INFO] [version.go:50] ["Welcome to Drainer"] ["Release Version"=v3.0.4-40-g27030dd] ["Git Commit Hash"=27030ddae9ec0e42f371694435245e9533e64dec] ["Build TS"="2019-12-19 01:42:51"] ["Go Version"=go1.13] ["Go OS/Arch"=linux/amd64]
      [2020/03/10 09:49:51.965 +08:00] [INFO] [main.go:46] ["start drainer..."] [config="{\"log-level\":\"info\",\"node-id\":\"\",\"addr\":\"http://192.168.240.108:8249\",\"advertise-addr\":\"http://192.168.240.108:8249\",\"data-dir\":\"/home/tidb/deploy/data.drainer\",\"detect-interval\":10,\"pd-urls\":\"http://192.168.240.108:2379\",\"log-file\":\"/home/tidb/deploy/log/drainer.log\",\"initial-commit-ts\":415163446916808717,\"sycner\":{\"sql-mode\":null,\"ignore-txn-commit-ts\":[],\"ignore-schemas\":\"INFORMATION_SCHEMA,PERFORMANCE_SCHEMA,mysql,UserRadarSTD\",\"ignore-table\":null,\"txn-batch\":20,\"worker-count\":1,\"to\":{\"host\":\"\",\"user\":\"\",\"password\":\"\",\"port\":0,\"checkpoint\":{\"type\":\"\",\"schema\":\"\",\"host\":\"\",\"user\":\"\",\"password\":\"\",\"port\":0},\"dir\":\"\",\"zookeeper-addrs\":\"192.168.42.137:2181\",\"kafka-addrs\":\"192.168.42.137:9092\",\"kafka-version\":\"0.11.0.0\",\"kafka-max-messages\":1024,\"topic-name\":\"radartags\"},\"replicate-do-table\":null,\"replicate-do-db\":[\"userradartags\"],\"db-type\":\"kafka\",\"relay-log-dir\":\"\",\"relay-log-size\":10485760,\"enable-dispatch\":false,\"safe-mode\":false,\"enable-detect\":false},\"security\":{\"ssl-ca\":\"\",\"ssl-cert\":\"\",\"ssl-key\":\"\"},\"synced-check-time\":5,\"compressor\":\"\",\"EtcdTimeout\":5000000000,\"MetricsAddr\":\"\",\"MetricsInterval\":15}"]
      [2020/03/10 09:49:51.966 +08:00] [INFO] [client.go:147] ["[pd] create pd client with endpoints"] [pd-address="[http://192.168.240.108:2379]"]
      [2020/03/10 09:49:51.968 +08:00] [INFO] [client.go:255] ["[pd] switch leader"] [new-leader=http://192.168.240.108:2379] [old-leader=]
      [2020/03/10 09:49:51.969 +08:00] [INFO] [client.go:166] ["[pd] init cluster id"] [cluster-id=6773524898251248932]
      [2020/03/10 09:49:51.969 +08:00] [INFO] [server.go:111] ["get cluster id from pd"] [id=6773524898251248932]
      [2020/03/10 09:49:51.969 +08:00] [INFO] [checkpoint.go:61] ["initialize checkpoint"] [type=file] [checkpoint=415183991815274497] [cfg="{\"CheckpointType\":\"file\",\"Db\":null,\"Schema\":\"\",\"Table\":\"\",\"ClusterID\":6773524898251248932,\"InitialCommitTS\":415163446916808717,\"dir\":\"/home/tidb/deploy/data.drainer/savepoint\"}"]
      [2020/03/10 09:49:51.969 +08:00] [INFO] [store.go:68] ["new store"] [path="tikv://192.168.240.108:2379?disableGC=true"]
      [2020/03/10 09:49:51.969 +08:00] [INFO] [client.go:147] ["[pd] create pd client with endpoints"] [pd-address="[192.168.240.108:2379]"]
      [2020/03/10 09:49:51.971 +08:00] [INFO] [client.go:255] ["[pd] switch leader"] [new-leader=http://192.168.240.108:2379] [old-leader=]
      [2020/03/10 09:49:51.971 +08:00] [INFO] [client.go:166] ["[pd] init cluster id"] [cluster-id=6773524898251248932]
      [2020/03/10 09:49:51.972 +08:00] [INFO] [store.go:74] ["new store with retry success"]
      [2020/03/10 09:49:52.082 +08:00] [INFO] [client.go:127] ["[sarama] Initializing new client"]
      [2020/03/10 09:49:52.082 +08:00] [INFO] [client.go:174] ["[sarama] Successfully initialized new client"]
      [2020/03/10 09:49:52.082 +08:00] [INFO] [store.go:68] ["new store"] [path="tikv://192.168.240.108:2379?disableGC=true"]
      [2020/03/10 09:49:52.082 +08:00] [INFO] [client.go:147] ["[pd] create pd client with endpoints"] [pd-address="[192.168.240.108:2379]"]
      [2020/03/10 09:49:52.084 +08:00] [INFO] [client.go:255] ["[pd] switch leader"] [new-leader=http://192.168.240.108:2379] [old-leader=]
      [2020/03/10 09:49:52.085 +08:00] [INFO] [client.go:166] ["[pd] init cluster id"] [cluster-id=6773524898251248932]
      [2020/03/10 09:49:52.087 +08:00] [INFO] [store.go:74] ["new store with retry success"]
      [2020/03/10 09:49:52.089 +08:00] [INFO] [server.go:255] ["register success"] ["drainer node id"=tidb-108:8249]
      [2020/03/10 09:49:52.090 +08:00] [INFO] [server.go:319] ["start to server request"] [addr=http://192.168.240.108:8249]
      [2020/03/10 09:49:52.090 +08:00] [INFO] [merge.go:222] ["merger add source"] ["source id"=tidb-108:8250]
      [2020/03/10 09:49:52.090 +08:00] [INFO] [merge.go:222] ["merger add source"] ["source id"=tidb-109:8250]
      [2020/03/10 09:49:52.090 +08:00] [INFO] [merge.go:222] ["merger add source"] ["source id"=tidb-110:8250]
      [2020/03/10 09:49:52.090 +08:00] [INFO] [merge.go:222] ["merger add source"] ["source id"=tidb-111:8250]
      [2020/03/10 09:49:52.090 +08:00] [INFO] [pump.go:134] ["pump create pull binlogs client"] [id=tidb-109:8250]
      [2020/03/10 09:49:52.090 +08:00] [INFO] [pump.go:134] ["pump create pull binlogs client"] [id=tidb-110:8250]
      [2020/03/10 09:49:52.090 +08:00] [INFO] [pump.go:134] ["pump create pull binlogs client"] [id=tidb-108:8250]
      [2020/03/10 09:49:52.090 +08:00] [INFO] [pump.go:134] ["pump create pull binlogs client"] [id=tidb-111:8250]
      [2020/03/10 09:49:53.091 +08:00] [INFO] [collector.go:284] ["get ddl job"] [job="ID:507, Type:modify schema charset and collate, State:synced, SchemaState:public, SchemaID:153, TableID:0, RowCount:0, ArgLen:0, start time: 2020-03-10 08:47:18.359 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0"]
      [2020/03/10 09:49:53.092 +08:00] [WARN] [schema.go:243] ["ddl job schema version is less than current version, skip this ddl job"] [job="ID:508, Type:modify schema charset and collate, State:synced, SchemaState:public, SchemaID:153, TableID:0, RowCount:0, ArgLen:0, start time: 2020-03-10 08:47:18.459 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0"] [currentVersion=0]
      [2020/03/10 09:49:53.092 +08:00] [WARN] [schema.go:243] ["ddl job schema version is less than current version, skip this ddl job"] [job="ID:510, Type:modify schema charset and collate, State:synced, SchemaState:public, SchemaID:153, TableID:0, RowCount:0, ArgLen:0, start time: 2020-03-10 08:47:24.209 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0"] [currentVersion=0]
      [2020/03/10 09:49:53.092 +08:00] [WARN] [schema.go:243] ["ddl job schema version is less than current version, skip this ddl job"] [job="ID:514, Type:modify schema charset and collate, State:synced, SchemaState:public, SchemaID:153, TableID:0, RowCount:0, ArgLen:0, start time: 2020-03-10 09:31:45.509 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0"] [currentVersion=0]
      [2020/03/10 09:49:53.092 +08:00] [WARN] [schema.go:243] ["ddl job schema version is less than current version, skip this ddl job"] [job="ID:509, Type:modify schema charset and collate, State:synced, SchemaState:public, SchemaID:153, TableID:0, RowCount:0, ArgLen:0, start time: 2020-03-10 08:47:24.209 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0"] [currentVersion=0]
      [2020/03/10 09:49:53.092 +08:00] [INFO] [schema.go:436] ["Finished dropping column"] [job="ID:266, Type:drop column, State:synced, SchemaState:none, SchemaID:153, TableID:258, RowCount:0, ArgLen:0, start time: 2020-02-13 13:18:02.104 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0"]
      [2020/03/10 09:49:53.092 +08:00] [INFO] [schema.go:436] ["Finished dropping column"] [job="ID:267, Type:drop column, State:synced, SchemaState:none, SchemaID:153, TableID:258, RowCount:0, ArgLen:0, start time: 2020-02-13 13:18:07.754 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0"]
      [2020/03/10 09:49:53.092 +08:00] [INFO] [schema.go:436] ["Finished dropping column"] [job="ID:402, Type:drop column, State:synced, SchemaState:none, SchemaID:153, TableID:161, RowCount:0, ArgLen:0, start time: 2020-03-03 13:44:13.248 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0"]
      [2020/03/10 09:49:53.093 +08:00] [INFO] [syncer.go:403] ["skip ddl"] [schema=UserRadarSTD] [table=] [sql="ALTER DATABASE UserRadarSTD\r\n    COLLATE utf8_general_ci"] ["commit ts"=415183991828381708]
      [2020/03/10 09:49:53.093 +08:00] [INFO] [collector.go:284] ["get ddl job"] [job="ID:508, Type:modify schema charset and collate, State:synced, SchemaState:public, SchemaID:153, TableID:0, RowCount:0, ArgLen:0, start time: 2020-03-10 08:47:18.459 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0"]
      [2020/03/10 09:49:53.093 +08:00] [INFO] [async_producer.go:1015] ["[sarama] Producer shutting down."]
      [2020/03/10 09:49:53.093 +08:00] [INFO] [client.go:227] ["[sarama] Closing Client"]
      [2020/03/10 09:49:53.093 +08:00] [INFO] [syncer.go:246] ["handleSuccess quit"]
      [2020/03/10 09:49:53.093 +08:00] [ERROR] [server.go:279] ["syncer exited abnormal"] [error="version: 0 not found"] [errorVerbose="version: 0 not found\ngithub.com/pingcap/errors.NotFoundf\n\t/home/jenkins/agent/workspace/build_tidb_binlog_master/go/pkg/mod/github.com/pingcap/errors@v0.11.4/juju_adaptor.go:117\ngithub.com/pingcap/tidb-binlog/drainer.(*Schema).getSchemaTableAndDelete\n\t/home/jenkins/agent/workspace/build_tidb_binlog_master/go/src/github.com/pingcap/tidb-binlog/drainer/schema.go:458\ngithub.com/pingcap/tidb-binlog/drainer.(*Syncer).run\n\t/home/jenkins/agent/workspace/build_tidb_binlog_master/go/src/github.com/pingcap/tidb-binlog/drainer/syncer.go:396\ngithub.com/pingcap/tidb-binlog/drainer.(*Syncer).Start\n\t/home/jenkins/agent/workspace/build_tidb_binlog_master/go/src/github.com/pingcap/tidb-binlog/drainer/syncer.go:126\ngithub.com/pingcap/tidb-binlog/drainer.(*Server).Start.func4\n\t/home/jenkins/agent/workspace/build_tidb_binlog_master/go/src/github.com/pingcap/tidb-binlog/drainer/server.go:278\ngithub.com/pingcap/tidb-binlog/drainer.(*taskGroup).start.func1\n\t/home/jenkins/agent/workspace/build_tidb_binlog_master/go/src/github.com/pingcap/tidb-binlog/drainer/util.go:69\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357"]
      [2020/03/10 09:49:53.093 +08:00] [INFO] [util.go:66] [Exit] [name=syncer]
      [2020/03/10 09:49:53.093 +08:00] [INFO] [server.go:444] ["begin to close drainer server"]
      [2020/03/10 09:49:53.094 +08:00] [INFO] [collector.go:284] ["get ddl job"] [job="ID:509, Type:modify schema charset and collate, State:synced, SchemaState:public, SchemaID:153, TableID:0, RowCount:0, ArgLen:0, start time: 2020-03-10 08:47:24.209 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0"]
      [2020/03/10 09:49:53.094 +08:00] [INFO] [server.go:409] ["has already update status"] [id=tidb-108:8249]
      [2020/03/10 09:49:53.094 +08:00] [INFO] [server.go:448] ["commit status done"]
      [2020/03/10 09:49:53.094 +08:00] [INFO] [pump.go:73] ["pump is closing"] [id=tidb-108:8250]
      [2020/03/10 09:49:53.094 +08:00] [INFO] [pump.go:73] ["pump is closing"] [id=tidb-109:8250]
      [2020/03/10 09:49:53.094 +08:00] [INFO] [pump.go:73] ["pump is closing"] [id=tidb-110:8250]
      [2020/03/10 09:49:53.094 +08:00] [INFO] [pump.go:73] ["pump is closing"] [id=tidb-111:8250]
      [2020/03/10 09:49:53.094 +08:00] [INFO] [util.go:66] [Exit] [name=heartbeat]
      [2020/03/10 09:49:53.095 +08:00] [INFO] [collector.go:284] ["get ddl job"] [job="ID:510, Type:modify schema charset and collate, State:synced, SchemaState:public, SchemaID:153, TableID:0, RowCount:0, ArgLen:0, start time: 2020-03-10 08:47:24.209 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0"]
      [2020/03/10 09:49:53.095 +08:00] [INFO] [collector.go:130] ["publishBinlogs quit"]
      [2020/03/10 09:49:53.095 +08:00] [INFO] [util.go:66] [Exit] [name=collect]
      [2020/03/10 09:49:53.095 +08:00] [INFO] [merge.go:245] ["Merger is closed successfully"]
      [2020/03/10 09:49:53.095 +08:00] [INFO] [main.go:73] ["drainer exit"]
      [2020/03/10 09:49:53.095 +08:00] [INFO] [server.go:463] ["drainer exit"]

您好,

  1. binlogctl 命令查看 Pump 状态是否正常,以及是否全部非 offline 状态的 Pump 都在正常运行。
  2. 提供下 {tidb-ansible}/conf/drainer.toml 文件
    3.提供下完整的 drainer 日志

感谢

pump都正常:drainer.toml (2.6 KB) drainer.log (11.2 KB)

> resources/bin/binlogctl -pd-urls=http://192.168.240.108:2379 -cmd pumps
[2020/03/10 11:07:29.249 +08:00] [INFO] [nodes.go:52] ["query node"] [type=pump] [node="{NodeID: tidb-109:8250, Addr: 192.168.240.109:8250, State: online, MaxCommitTS: 415186195987562497, UpdateTime: 2020-03-10 11:07:28 +0800 CST}"]
[2020/03/10 11:07:29.249 +08:00] [INFO] [nodes.go:52] ["query node"] [type=pump] [node="{NodeID: tidb-110:8250, Addr: 192.168.240.110:8250, State: online, MaxCommitTS: 415186195581239300, UpdateTime: 2020-03-10 11:07:27 +0800 CST}"]
[2020/03/10 11:07:29.249 +08:00] [INFO] [nodes.go:52] ["query node"] [type=pump] [node="{NodeID: tidb-111:8250, Addr: 192.168.240.111:8250, State: online, MaxCommitTS: 415186195659882497, UpdateTime: 2020-03-10 11:07:27 +0800 CST}"]
[2020/03/10 11:07:29.249 +08:00] [INFO] [nodes.go:52] ["query node"] [type=pump] [node="{NodeID: tidb-108:8250, Addr: 192.168.240.108:8250, State: online, MaxCommitTS: 415186196144848900, UpdateTime: 2020-03-10 11:07:27 +0800 CST}"]

麻烦查看一下历史的 DDL ,有哪个 DDL 的 version:0 返回一下结果。

curl 127.0.0.1:10080/ddl/history

version = 0的 是一个创建视图的语句,但是这个视图现在是正常的: version_0_ddl.json (10.7 KB)

我看了后面的ddl,create View的很多都是verision = 0

可以开 Debug 日志,上传一下吗?我看下日志

drainer的日志吗? 请问开启debug在哪里配置?

drain config 里面有 log-level 字段

你好,这个是开启debug后的drainer日志drainer_debug.log (135.5 KB)

如果不好解决,有什么办法我可以配置一下,跳过所有的历史数据从最新的数据开始同步吗?因为现在一直起不来,谢谢

请问下 tidb 使用的是什么版本,pump 和 drainer 最好使用相同的版本

TiDB是4.0版,我是ansible安装的,版本应该是兼容匹配的

可以尝试跳过下:

> 1. 清理掉下游数据包括 checkpoint 表tidb_binlog.checkpoint

你好,下游是kafka的checkpoint存放在哪里?怎么清理?

tidb 具体是哪个版本的呢,我看了下 drainer 的版本号,应该不是 4.0 版本的

[tidb@tidb-108 bin]$ ./tidb-server -V
Release Version: v4.0.0-alpha-1219-g510c36d4c
Git Commit Hash: 510c36d4cb776ea47034b9ef2985beeb91093481
Git Branch: master
UTC Build Time: 2019-12-22 01:17:12
GoVersion: go1.13
Race Enabled: false
TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
Check Table Before Drop: false

好的,已配置跳过历史的数据,在data.drainer/savepoint里改的配置,现在已经正常下发了,谢谢。

感谢您的回复,如果问题已经解决,麻烦将关键的回复设置为最佳解决方案。
新问题麻烦另开新帖哦。