AndyHoo
(AndyHoo)
December 11, 2019, 02:19
1
Our TiDB cluster has two drainers enabled, replicating data to downstream RDS instances as a backup. One of the drainers hit an error after a new column was added to a large table in the upstream TiDB database it replicates:
```
[2019/12/10 14:56:56.090 +08:00] [ERROR] [load.go:557] ["exec failed"] [sql="ALTER TABLE `tc_payment_sync_order` ADD COLUMN `payment_status` tinyint(4) NOT NULL DEFAULT '0' COMMENT '原始订单状态:0未成功,1成功' after `type`"] [error="invalid connection"] [errorVerbose="invalid connection
github.com/pingcap/errors.AddStack
	/home/jenkins/workspace/release_tidb_3.0/go/pkg/mod/github.com/pingcap/errors@v0.11.4/errors.go:174
github.com/pingcap/errors.Trace
	/home/jenkins/workspace/release_tidb_3.0/go/pkg/mod/github.com/pingcap/errors@v0.11.4/juju_adaptor.go:15
github.com/pingcap/tidb-binlog/pkg/util.RetryOnError
	/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb-binlog/pkg/util/util.go:157
github.com/pingcap/tidb-binlog/pkg/loader.(*loaderImpl).execDDL
	/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb-binlog/pkg/loader/load.go:278
github.com/pingcap/tidb-binlog/pkg/loader.(*batchManager).execDDL
	/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb-binlog/pkg/loader/load.go:555
github.com/pingcap/tidb-binlog/pkg/loader.(*batchManager).put
	/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb-binlog/pkg/loader/load.go:578
github.com/pingcap/tidb-binlog/pkg/loader.(*loaderImpl).Run
	/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb-binlog/pkg/loader/load.go:441
github.com/pingcap/tidb-binlog/drainer/sync.(*MysqlSyncer).run
	/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb-binlog/drainer/sync/mysql.go:117
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1337"]
```
After that, no more "write save checkpoint" log entries were produced and this drainer's replication stopped, while the other drainer kept running normally. Later, after receiving a replication-lag alert for the stalled drainer, I restarted it. After the restart, the abnormal drainer simply skipped the data it had missed and jumped straight to the commit point that the other, healthy drainer had already reached, so a stretch of data in between never made it downstream. Is there any way to recover that data?
Are the two drainers replicating the upstream data to two different downstream RDS instances?
If convenient, please also share the drainer configuration file and the command you used to restart the drainer.
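For reference, a drainer configuration that replicates to a MySQL-protocol downstream such as RDS usually looks roughly like the sketch below; every address, credential, and tuning value here is a placeholder for illustration, not this cluster's actual setting.

```toml
# drainer.toml -- illustrative sketch only; all values are placeholders
addr = "11.12.13.148:8249"            # address this drainer listens on / advertises
pd-urls = "http://127.0.0.1:2379"     # PD endpoints of the upstream TiDB cluster
data-dir = "data.drainer"             # local state, including the savepoint/checkpoint
detect-interval = 10                  # seconds between pump status checks

[syncer]
db-type = "mysql"                     # downstream is MySQL-compatible (RDS)
worker-count = 16                     # concurrent workers applying DML
txn-batch = 20                        # transactions per batch
ignore-schemas = "INFORMATION_SCHEMA,PERFORMANCE_SCHEMA,mysql"
safe-mode = false

[syncer.to]                           # downstream connection (placeholders)
host = "10.1.66.238"
port = 3306
user = "drainer_user"
password = "******"
```

Each drainer has its own file of this shape; the `[syncer.to]` section is what points it at a particular downstream instance.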
AndyHoo
(AndyHoo)
December 11, 2019, 02:39
3
When restarting, I first ran stop_drainer.sh under scripts/. When that failed to stop the process, I used kill -9 directly, and then ran start_drainer.sh.
Please check the status of the pump and drainer nodes with the following commands:
```
bin/binlogctl -pd-urls=http://127.0.0.1:2379 -cmd pumps
bin/binlogctl -pd-urls=http://127.0.0.1:2379 -cmd drainers
```
AndyHoo
(AndyHoo)
December 11, 2019, 02:47
5
pumps:
```
[2019/12/11 10:44:31.942 +08:00] [INFO] [nodes.go:47] ["query node"] [type=pump] [node="{NodeID: TIDB006:8250, Addr: 11.12.13.147:8250, State: online, MaxCommitTS: 413147403803951149, UpdateTime: 2019-12-11 10:44:31 +0800 CST}"]
[2019/12/11 10:44:31.942 +08:00] [INFO] [nodes.go:47] ["query node"] [type=pump] [node="{NodeID: TIDB007:8250, Addr: 11.12.13.148:8250, State: online, MaxCommitTS: 413147403397627941, UpdateTime: 2019-12-11 10:44:29 +0800 CST}"]
```
drainer:
```
[2019/12/11 10:45:41.159 +08:00] [INFO] [nodes.go:47] ["query node"] [type=drainer] [node="{NodeID: TIDB006:8249, Addr: 11.12.13.147:8249, State: online, MaxCommitTS: 413147401706799155, UpdateTime: 2019-12-11 10:45:39 +0800 CST}"]
[2019/12/11 10:45:41.159 +08:00] [INFO] [nodes.go:47] ["query node"] [type=drainer] [node="{NodeID: TIDB007:8249, Addr: 11.12.13.148:8249, State: online, MaxCommitTS: 413147421302063179, UpdateTime: 2019-12-11 10:45:39 +0800 CST}"]
```
AndyHoo
(AndyHoo)
December 11, 2019, 02:49
6
After the restart both nodes show a normal status; the problem is that a stretch of data was never replicated to the downstream RDS.
Then please provide the complete logs of the pump and drainer nodes, and I will go through them to see what happened.
AndyHoo
(AndyHoo)
December 11, 2019, 02:59
11
OK. Going forward, how can we avoid this kind of problem when a DDL is executed on a large table?
AndyHoo
(AndyHoo)
December 11, 2019, 03:02
12
DDL on a large table tends to make the drainer crash or hang. Last time, adding an index to a large table caused the same problem; a restart fixed it. The error was the same as the one above, and there were no other abnormal log entries.
OK. Please provide the pump and drainer logs and I will follow up on this. Thanks.
AndyHoo
(AndyHoo)
December 11, 2019, 03:13
14
I checked: there are no errors in the pump log. The drainer error log contains the following:
```
[mysql] 2019/10/24 12:02:41 packets.go:36: read tcp 11.12.13.148:41368->10.1.66.238:3306: i/o timeout
[mysql] 2019/10/24 12:03:42 packets.go:36: read tcp 11.12.13.148:38886->10.1.66.238:3306: i/o timeout
[mysql] 2019/10/24 12:04:43 packets.go:36: read tcp 11.12.13.148:38896->10.1.66.238:3306: i/o timeout
[mysql] 2019/10/24 12:05:44 packets.go:36: read tcp 11.12.13.148:38940->10.1.66.238:3306: i/o timeout
[mysql] 2019/10/24 12:06:45 packets.go:36: read tcp 11.12.13.148:38950->10.1.66.238:3306: i/o timeout
[mysql] 2019/12/10 14:52:51 packets.go:36: read tcp 11.12.13.148:42670->10.1.66.238:3306: i/o timeout
[mysql] 2019/12/10 14:53:52 packets.go:36: read tcp 11.12.13.148:45318->10.1.66.238:3306: i/o timeout
[mysql] 2019/12/10 14:54:53 packets.go:36: read tcp 11.12.13.148:45320->10.1.66.238:3306: i/o timeout
[mysql] 2019/12/10 14:55:54 packets.go:36: read tcp 11.12.13.148:45322->10.1.66.238:3306: i/o timeout
[mysql] 2019/12/10 14:56:55 packets.go:36: read tcp 11.12.13.148:45356->10.1.66.238:3306: i/o timeout
```
And the error message in drainer.log:
```
[2019/12/10 14:56:56.090 +08:00] [ERROR] [load.go:557] ["exec failed"] [sql="ALTER TABLE `tc_payment_sync_order` ADD COLUMN `payment_status` tinyint(4) NOT NULL DEFAULT '0' COMMENT '原始订单状态:0未成功,1成功' after `type`"] [error="invalid connection"] [errorVerbose="invalid connection
github.com/pingcap/errors.AddStack
	/home/jenkins/workspace/release_tidb_3.0/go/pkg/mod/github.com/pingcap/errors@v0.11.4/errors.go:174
github.com/pingcap/errors.Trace
	/home/jenkins/workspace/release_tidb_3.0/go/pkg/mod/github.com/pingcap/errors@v0.11.4/juju_adaptor.go:15
github.com/pingcap/tidb-binlog/pkg/util.RetryOnError
	/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb-binlog/pkg/util/util.go:157
github.com/pingcap/tidb-binlog/pkg/loader.(*loaderImpl).execDDL
	/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb-binlog/pkg/loader/load.go:278
github.com/pingcap/tidb-binlog/pkg/loader.(*batchManager).execDDL
	/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb-binlog/pkg/loader/load.go:555
github.com/pingcap/tidb-binlog/pkg/loader.(*batchManager).put
	/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb-binlog/pkg/loader/load.go:578
github.com/pingcap/tidb-binlog/pkg/loader.(*loaderImpl).Run
	/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb-binlog/pkg/loader/load.go:441
github.com/pingcap/tidb-binlog/drainer/sync.(*MysqlSyncer).run
	/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb-binlog/drainer/sync/mysql.go:117
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1337"]
```
AndyHoo
(AndyHoo)
December 11, 2019, 03:37
15
Could I use the Reparo tool to extract the data for the specified time range from TiDB and then restore it to the downstream RDS?
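What I have in mind is roughly the sketch below; the data directory, time range, and connection values are all placeholders rather than real settings.

```toml
# reparo.toml -- illustrative sketch only; all values are placeholders
data-dir = "/data/drainer_binlog"      # directory holding the incremental binlog files to replay
log-level = "info"

# replay only the missing window (placeholder datetimes)
start-datetime = "2019-12-10 14:56:00"
stop-datetime  = "2019-12-11 10:45:00"

dest-type = "mysql"                    # write the replayed data to the downstream RDS

[dest-db]
host = "10.1.66.238"
port = 3306
user = "reparo_user"
password = "******"
```

It would then be run with something like `./bin/reparo -config reparo.toml`.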
QBin
(Bin)
December 11, 2019, 08:50
16
AndyHoo
(AndyHoo)
December 11, 2019, 09:29
17
But the prerequisite for using it is that the drainer has already pulled the data from pump. In my case the drainer missed part of the data, and what I want now is to backfill that missing part.