dm增量同步阿里rds数据报 invlid connection

【dm版本】
Release Version: v1.0.5
Git Commit Hash: a8e9f53f91e29756b09a22cdc37a6a6efcdfe55b
Git Branch: release-1.0
UTC Build Time: 2020-04-27 06:56:31
Go Version: go version go1.13 linux/amd64

【tidb版本】
5.7.25-TiDB-v4.0.0

最近在用dm工具同步阿里云rds的增量数据的时候,日志看时不时会出现error日志,如下:

[2020/06/15 14:04:01.549 +08:00] [INFO] [syncer.go:2076] [“binlog replication progress”] [task=assets_sync] [unit=“binlog replication”] [“total binlog size”=401846361] [“last binlog size”=400846382] [“cost time”=30] [bytes/Second=33332] [“unsynced binlog size”=0] [“estimate time to catch up”=0]
[2020/06/15 14:04:01.549 +08:00] [INFO] [syncer.go:2101] [“binlog replication status”] [task=assets_sync] [unit=“binlog replication”] [total_events=850779] [total_tps=28359] [tps=8] [master_position=“(mysql-bin.000304, 7598663)”] [master_gtid=0b0cd93b-76b0-11ea-a9a7-00163e023ad7:1-22540278] [checkpoint=“(mysql-bin|000001.000304, 7598663)(flushed (mysql-bin|000001.000304, 6606968))”]
[2020/06/15 14:04:04.772 +08:00] [INFO] [syncer.go:893] [“flushed checkpoint”] [task=assets_sync] [unit=“binlog replication”] [checkpoint=“(mysql-bin|000001.000304, 7598663)(flushed (mysql-bin|000001.000304, 7598663))”]
[2020/06/15 14:04:04.772 +08:00] [INFO] [relay.go:113] [“current earliest active relay log”] [task=assets_sync] [unit=“binlog replication”] [“active relay log”=0b0cd93b-76b0-11ea-a9a7-00163e023ad7.000001/mysql-bin.000304]
[2020/06/15 14:04:05.518 +08:00] [INFO] [relay.go:568] [“flush meta finished”] [component=“relay log”] [meta=“master-uuid = 0b0cd93b-76b0-11ea-a9a7-00163e023ad7.000001, relay-binlog = (mysql-bin.000304, 7600993), relay-binlog-gtid = “]
[2020/06/15 14:04:17.641 +08:00] [INFO] [syncer.go:1631] [task=assets_sync] [unit=“binlog replication”] [event=query] [statement=”/* rds internal mark / CREATE TABLE IF NOT EXISTS mysql.ha_health_check (
id BIGINT DEFAULT 0,
type CHAR(1) DEFAULT ‘0’,
PRIMARY KEY (type)
)
ENGINE = InnoDB"] [schema=] [“last position”=“(mysql-bin|000001.000304, 7617766)”] [position=“(mysql-bin|000001.000304, 7618071)”] [“gtid set”=NULL]
[2020/06/15 14:04:17.642 +08:00] [INFO] [syncer.go:1647] [“resolve sql”] [task=assets_sync] [unit=“binlog replication”] [event=query] [“raw statement”="/
rds internal mark / CREATE TABLE IF NOT EXISTS mysql.ha_health_check (
id BIGINT DEFAULT 0,
type CHAR(1) DEFAULT ‘0’,
PRIMARY KEY (type)
)
ENGINE = InnoDB"] [statements=“["CREATE TABLE IF NOT EXISTS mysql.ha_health_check (id BIGINT DEFAULT 0,type CHAR(1) DEFAULT ‘0’,PRIMARY KEY(type)) ENGINE = InnoDB"]”] [schema=] [“last position”=“(mysql-bin|000001.000304, 7618071)”] [position=“(mysql-bin|000001.000304, 7618071)”] [“gtid set”=NULL]
[2020/06/15 14:04:17.642 +08:00] [WARN] [syncer.go:1680] [“skip event”] [task=assets_sync] [unit=“binlog replication”] [event=query] [statement=“CREATE TABLE IF NOT EXISTS mysql.ha_health_check (id BIGINT DEFAULT 0,type CHAR(1) DEFAULT ‘0’,PRIMARY KEY(type)) ENGINE = InnoDB”] [schema=]
[2020/06/15 14:04:17.642 +08:00] [INFO] [syncer.go:1734] [“prepare to handle ddls”] [task=assets_sync] [unit=“binlog replication”] [event=query] [ddls=“[]”] [“raw statement”="/
rds internal mark / CREATE TABLE IF NOT EXISTS mysql.ha_health_check (
id BIGINT DEFAULT 0,
type CHAR(1) DEFAULT ‘0’,
PRIMARY KEY (type)
)
ENGINE = InnoDB"] [position=“(mysql-bin|000001.000304, 7618071)”]
[2020/06/15 14:04:17.642 +08:00] [INFO] [syncer.go:1736] [“skip event, need handled ddls is empty”] [task=assets_sync] [unit=“binlog replication”] [event=query] [“raw statement”="/
rds internal mark / CREATE TABLE IF NOT EXISTS mysql.ha_health_check (
id BIGINT DEFAULT 0,
type CHAR(1) DEFAULT ‘0’,
PRIMARY KEY (type)
)
ENGINE = InnoDB"] [position=“(mysql-bin|000001.000304, 7618071)”]
[2020/06/15 14:04:31.549 +08:00] [INFO] [syncer.go:2076] [“binlog replication progress”] [task=assets_sync] [unit=“binlog replication”] [“total binlog size”=401884064] [“last binlog size”=401846361] [“cost time”=30] [bytes/Second=1256] [“unsynced binlog size”=0] [“estimate time to catch up”=0]
[2020/06/15 14:04:31.550 +08:00] [INFO] [syncer.go:2101] [“binlog replication status”] [task=assets_sync] [unit=“binlog replication”] [total_events=850795] [total_tps=14179] [tps=0] [master_position=“(mysql-bin.000304, 7631844)”] [master_gtid=0b0cd93b-76b0-11ea-a9a7-00163e023ad7:1-22540325] [checkpoint=“(mysql-bin|000001.000304, 7631844)(flushed (mysql-bin|000001.000304, 7598663))”]
[2020/06/15 14:04:33.664 +08:00] [INFO] [server.go:252] [request=QueryStatus] [payload="name:"assets_sync" “]
[2020/06/15 14:04:35.518 +08:00] [INFO] [relay.go:568] [“flush meta finished”] [component=“relay log”] [meta=“master-uuid = 0b0cd93b-76b0-11ea-a9a7-00163e023ad7.000001, relay-binlog = (mysql-bin.000304, 7637395), relay-binlog-gtid = “]
[2020/06/15 14:04:36.279 +08:00] [INFO] [syncer.go:893] [“flushed checkpoint”] [task=assets_sync] [unit=“binlog replication”] [checkpoint=”(mysql-bin|000001.000304, 7637395)(flushed (mysql-bin|000001.000304, 7637395))”]
[2020/06/15 14:04:36.279 +08:00] [INFO] [relay.go:113] [“current earliest active relay log”] [task=assets_sync] [unit=“binlog replication”] [“active relay log”=0b0cd93b-76b0-11ea-a9a7-00163e023ad7.000001/mysql-bin.000304]
[2020/06/15 14:04:36.695 +08:00] [INFO] [server.go:252] [request=QueryStatus] [payload=]
[2020/06/15 14:04:40.259 +08:00] [INFO] [syncer.go:1631] [task=assets_sync] [unit=“binlog replication”] [event=query] [statement=”/
rds internal mark / CREATE TABLE IF NOT EXISTS mysql.ha_health_check (
id BIGINT DEFAULT 0,
type CHAR(1) DEFAULT ‘0’,
PRIMARY KEY (type)
)
ENGINE = InnoDB"] [schema=] [“last position”=“(mysql-bin|000001.000304, 7642069)”] [position=“(mysql-bin|000001.000304, 7642374)”] [“gtid set”=NULL]
[2020/06/15 14:04:40.259 +08:00] [INFO] [syncer.go:1647] [“resolve sql”] [task=assets_sync] [unit=“binlog replication”] [event=query] [“raw statement”="/
rds internal mark / CREATE TABLE IF NOT EXISTS mysql.ha_health_check (
id BIGINT DEFAULT 0,
type CHAR(1) DEFAULT ‘0’,
PRIMARY KEY (type)
)
ENGINE = InnoDB"] [statements=“["CREATE TABLE IF NOT EXISTS mysql.ha_health_check (id BIGINT DEFAULT 0,type CHAR(1) DEFAULT ‘0’,PRIMARY KEY(type)) ENGINE = InnoDB"]”] [schema=] [“last position”=“(mysql-bin|000001.000304, 7642374)”] [position=“(mysql-bin|000001.000304, 7642374)”] [“gtid set”=NULL]
[2020/06/15 14:04:40.259 +08:00] [WARN] [syncer.go:1680] [“skip event”] [task=assets_sync] [unit=“binlog replication”] [event=query] [statement=“CREATE TABLE IF NOT EXISTS mysql.ha_health_check (id BIGINT DEFAULT 0,type CHAR(1) DEFAULT ‘0’,PRIMARY KEY(type)) ENGINE = InnoDB”] [schema=]
[2020/06/15 14:04:40.259 +08:00] [INFO] [syncer.go:1734] [“prepare to handle ddls”] [task=assets_sync] [unit=“binlog replication”] [event=query] [ddls=“[]”] [“raw statement”="/
rds internal mark / CREATE TABLE IF NOT EXISTS mysql.ha_health_check (
id BIGINT DEFAULT 0,
type CHAR(1) DEFAULT ‘0’,
PRIMARY KEY (type)
)
ENGINE = InnoDB"] [position=“(mysql-bin|000001.000304, 7642374)”]
[2020/06/15 14:04:40.260 +08:00] [INFO] [syncer.go:1736] [“skip event, need handled ddls is empty”] [task=assets_sync] [unit=“binlog replication”] [event=query] [“raw statement”="/
rds internal mark */ CREATE TABLE IF NOT EXISTS mysql.ha_health_check (
id BIGINT DEFAULT 0,
type CHAR(1) DEFAULT ‘0’,
PRIMARY KEY (type)
)
ENGINE = InnoDB”] [position=“(mysql-bin|000001.000304, 7642374)”]
[2020/06/15 14:04:44.871 +08:00] [ERROR] [db.go:295] [“execute statements failed after retry”] [task=assets_sync] [unit=“binlog replication”] [queries=“[DELETE FROM assets.contact_info WHERE id = ? LIMIT 1 REPLACE INTO assets.contact_info (id,contract_info_id,name,phone,phone_type,relationship,source,repay_label,result,result_detail,next_contact_time,add_time,update_time,status) VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?)]”] [arguments=“[[259182] [259182 63262 xxx 1370sd59552 1 10 1 0 2400 2401,2403 2020-04-07 00:00:00 2020-06-15 14:04:44 1]]”] [error=“[code=10006:class=database:scope=not-set:level=high] execute statement failed: begin: invalid connection”]
[2020/06/15 14:04:44.872 +08:00] [INFO] [syncer.go:2045] [“print status routine exits”] [task=assets_sync] [unit=“binlog replication”] [error=“context canceled”]
[2020/06/15 14:04:44.872 +08:00] [INFO] [syncer.go:1252] [“binlog replication main routine quit(context canceled)!”] [task=assets_sync] [unit=“binlog replication”] [“last position”=“(mysql-bin|000001.000304, 7644764)”]
[2020/06/15 14:04:44.872 +08:00] [INFO] [mode.go:99] [“change count”] [task=assets_sync] [unit=“binlog replication”] [“previous count”=1] [“new count”=0]
[2020/06/15 14:04:44.872 +08:00] [WARN] [syncer.go:868] [“error detected when executing SQL job, skip flush checkpoint”] [task=assets_sync] [unit=“binlog replication”] [checkpoint=“(mysql-bin|000001.000304, 7644764)(flushed (mysql-bin|000001.000304, 7637395))”]
[2020/06/15 14:04:44.872 +08:00] [INFO] [reader.go:408] [“binlog reader closing”] [task=assets_sync] [unit=“binlog replication”] [component=“binlog reader”]
[2020/06/15 14:04:44.872 +08:00] [WARN] [reader.go:137] [“parse relay finished”] [task=assets_sync] [unit=“binlog replication”] [component=“binlog reader”] [error=“parse relay log file mysql-bin.000304 from offset 6606968 in dir /data4/dm_worker-assets/relay_log/0b0cd93b-76b0-11ea-a9a7-00163e023ad7.000001: parse relay log file mysql-bin.000304 from offset 0 in dir /data4/dm_worker-assets/relay_log/0b0cd93b-76b0-11ea-a9a7-00163e023ad7.000001: context canceled”]
[2020/06/15 14:04:44.872 +08:00] [INFO] [reader.go:413] [“binlog reader closed”] [task=assets_sync] [unit=“binlog replication”] [component=“binlog reader”]
[2020/06/15 14:04:44.872 +08:00] [INFO] [checkpoint.go:464] [“rollback checkpoint”] [task=assets_sync] [unit=“binlog replication”] [component=“remote checkpoint”] [checkpoint=“(mysql-bin|000001.000304, 7505092)(flushed (mysql-bin|000001.000304, 7505092))”] [schema=assets] [table=public_repay]
[2020/06/15 14:04:44.872 +08:00] [INFO] [checkpoint.go:464] [“rollback checkpoint”] [task=assets_sync] [unit=“binlog replication”] [component=“remote checkpoint”] [checkpoint=“(mysql-bin|000001.000303, 852068)(flushed (mysql-bin|000001.000303, 852068))”] [schema=assets] [table=penalty_interest_info]
[2020/06/15 14:04:44.872 +08:00] [INFO] [checkpoint.go:464] [“rollback checkpoint”] [task=assets_sync] [unit=“binlog replication”] [component=“remote checkpoint”] [checkpoint=“(mysql-bin|000001.000304, 7562157)(flushed (mysql-bin|000001.000304, 7562157))”] [schema=assets] [table=customer_info]
[2020/06/15 14:04:44.872 +08:00] [INFO] [checkpoint.go:464] [“rollback checkpoint”] [task=assets_sync] [unit=“binlog replication”] [component=“remote checkpoint”] [checkpoint=“(mysql-bin|000001.000304, 6629458)(flushed (mysql-bin|000001.000304, 6629458))”] [schema=assets] [table=derate_detail_record]
[2020/06/15 14:04:44.872 +08:00] [INFO] [checkpoint.go:464] [“rollback checkpoint”] [task=assets_sync] [unit=“binlog replication”] [component=“remote checkpoint”] [checkpoint=“(mysql-bin|000001.000304, 7507178)(flushed (mysql-bin|000001.000304, 7507178))”] [schema=assets] [table=repay_plan_record]
[2020/06/15 14:04:44.872 +08:00] [INFO] [checkpoint.go:464] [“rollback checkpoint”] [task=assets_sync] [unit=“binlog replication”] [component=“remote checkpoint”] [checkpoint=“(mysql-bin|000001.000304, 7644733)(flushed (mysql-bin|000001.000304, 7634248))”] [schema=assets] [table=contact_info]
[2020/06/15 14:04:44.872 +08:00] [INFO] [checkpoint.go:464] [“rollback checkpoint”] [task=assets_sync] [unit=“binlog replication”] [component=“remote checkpoint”] [checkpoint=“(mysql-bin|000001.000304, 6567989)(flushed (mysql-bin|000001.000304, 6567989))”] [schema=assets] [table=contract_info_record]
[2020/06/15 14:04:44.872 +08:00] [INFO] [checkpoint.go:464] [“rollback checkpoint”] [task=assets_sync] [unit=“binlog replication”] [component=“remote checkpoint”] [checkpoint=“(mysql-bin|000001.000304, 7506179)(flushed (mysql-bin|000001.000304, 7506179))”] [schema=assets] [table=public_balance]
[2020/06/15 14:04:44.872 +08:00] [INFO] [checkpoint.go:464] [“rollback checkpoint”] [task=assets_sync] [unit=“binlog replication”] [component=“remote checkpoint”] [checkpoint=“(mysql-bin|000001.000304, 7644269)(flushed (mysql-bin|000001.000304, 7633852))”] [schema=assets] [table=track_record]
[2020/06/15 14:04:44.872 +08:00] [INFO] [checkpoint.go:464] [“rollback checkpoint”] [task=assets_sync] [unit=“binlog replication”] [component=“remote checkpoint”] [checkpoint=“(mysql-bin|000001.000304, 6154230)(flushed (mysql-bin|000001.000304, 6154230))”] [schema=assets] [table=remit_detail_record]
[2020/06/15 14:04:44.872 +08:00] [INFO] [checkpoint.go:464] [“rollback checkpoint”] [task=assets_sync] [unit=“binlog replication”] [component=“remote checkpoint”] [checkpoint=“(mysql-bin|000001.000304, 7616957)(flushed (mysql-bin|000001.000304, 7616957))”] [schema=assets] [table=repay_detail_record]
[2020/06/15 14:04:44.872 +08:00] [INFO] [checkpoint.go:464] [“rollback checkpoint”] [task=assets_sync] [unit=“binlog replication”] [component=“remote checkpoint”] [checkpoint=“(mysql-bin|000001.000304, 6253237)(flushed (mysql-bin|000001.000304, 6253237))”] [schema=assets] [table=remit_audit]
[2020/06/15 14:04:44.872 +08:00] [INFO] [checkpoint.go:464] [“rollback checkpoint”] [task=assets_sync] [unit=“binlog replication”] [component=“remote checkpoint”] [checkpoint=“(mysql-bin|000001.000304, 6318434)(flushed (mysql-bin|000001.000304, 6318434))”] [schema=assets] [table=task_info_record]
[2020/06/15 14:04:44.872 +08:00] [INFO] [checkpoint.go:464] [“rollback checkpoint”] [task=assets_sync] [unit=“binlog replication”] [component=“remote checkpoint”] [checkpoint=“(mysql-bin|000001.000304, 1975036)(flushed (mysql-bin|000001.000304, 1975036))”] [schema=assets] [table=pay_off_detail_record]
[2020/06/15 14:04:44.872 +08:00] [WARN] [syncer.go:632] [“something wrong with rollback global checkpoint”] [task=assets_sync] [unit=“binlog replication”] [“previous position”=“(mysql-bin|000001.000304, 7644764)”] [“current position”=“(mysql-bin|000001.000304, 7637395)”]
[2020/06/15 14:04:44.873 +08:00] [INFO] [subtask.go:266] [“unit process returned”] [subtask=assets_sync] [unit=Sync] [stage=Paused] [status=“{"totalEvents":850804,"totalTps":14179,"masterBinlog":"(mysql-bin.000304, 7644764)","masterBinlogGtid":"0b0cd93b-76b0-11ea-a9a7-00163e023ad7:1-22540343","syncerBinlog":"(mysql-bin|000001.000304, 7637395)"}”]
[2020/06/15 14:04:44.873 +08:00] [ERROR] [subtask.go:285] [“unit process error”] [subtask=assets_sync] [unit=Sync] [“error information”=“{"Type":1,"msg":"[code=10006:class=database:scope=not-set:level=high] execute statement failed: begin: invalid connection\
github.com/pingcap/dm/pkg/terror.(*Error).Delegate\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/terror/terror.go:267\
github.com/pingcap/dm/pkg/conn.(*BaseConn).ExecuteSQLWithIgnoreError\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/conn/baseconn.go:149\
github.com/pingcap/dm/syncer.(*DBConn).executeSQLWithIgnore.func3\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/db.go:280\
github.com/pingcap/dm/pkg/retry.(*FiniteRetryStrategy).Apply\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/retry/strategy.go:71\
github.com/pingcap/dm/pkg/conn.(*BaseConn).ApplyRetryStrategy\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/conn/baseconn.go:216\
github.com/pingcap/dm/syncer.(*DBConn).executeSQLWithIgnore\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/db.go:275\
github.com/pingcap/dm/syncer.(*DBConn).executeSQL\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/db.go:305\
github.com/pingcap/dm/syncer.(*Syncer).sync.func3\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/syncer.go:1002\
github.com/pingcap/dm/syncer.(*Syncer).sync\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/syncer.go:1045\
github.com/pingcap/dm/syncer.(*Syncer).Run.func2\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/syncer.go:1114\
runtime.goexit\
\t/usr/local/go/src/runtime/asm_amd64.s:1357","error":{"ErrCode":10006,"ErrClass":1,"ErrLevel":3,"Message":"execute statement failed: begin: invalid connection","RawCause":"invalid connection"}}”]
[2020/06/15 14:04:45.503 +08:00] [WARN] [task_checker.go:354] [“backoff skip auto resume task”] [component=“task checker”] [task=assets_sync] [latestResumeTime=2020/06/15 14:03:30.503 +08:00] [duration=5m0s]
[2020/06/15 14:04:50.503 +08:00] [WARN] [task_checker.go:354] [“backoff skip auto resume task”] [component=“task checker”] [task=assets_sync] [latestResumeTime=2020/06/15 14:03:30.503 +08:00] [duration=5m0s]
[2020/06/15 14:04:55.503 +08:00] [WARN] [task_checker.go:354] [“backoff skip auto resume task”] [component=“task checker”] [task=assets_sync] [latestResumeTime=2020/06/15 14:03:30.503 +08:00] [duration=5m0s]

任务随后进入paused状态,然后自动恢复。

最后用 sync-diff-inspector 比对几个表后,发现数据存在丢失。
查看任务status如下:
“Type”: “ExecSQL”,
“msg”: “[code=10006:class=database:scope=not-set:level=high] execute statement failed: begin: invalid connection
github.com/pingcap/dm/pkg/terror.(*Error).Delegate
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/terror/terror.go:267
github.com/pingcap/dm/pkg/conn.(*BaseConn).ExecuteSQLWithIgnoreError
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/conn/baseconn.go:149
github.com/pingcap/dm/syncer.(*DBConn).executeSQLWithIgnore.func3
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/db.go:280
github.com/pingcap/dm/pkg/retry.(*FiniteRetryStrategy).Apply
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/retry/strategy.go:71
github.com/pingcap/dm/pkg/conn.(*BaseConn).ApplyRetryStrategy
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/conn/baseconn.go:216
github.com/pingcap/dm/syncer.(*DBConn).executeSQLWithIgnore
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/db.go:275
github.com/pingcap/dm/syncer.(*DBConn).executeSQL
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/db.go:305
github.com/pingcap/dm/syncer.(*Syncer).sync.func3
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/syncer.go:1002
github.com/pingcap/dm/syncer.(*Syncer).sync
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/syncer.go:1045
github.com/pingcap/dm/syncer.(*Syncer).Run.func2
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/syncer.go:1114
runtime.goexit
\t/usr/local/go/src/runtime/asm_amd64.s:1357”,
“error”: null
},
{
“Type”: “ExecSQL”,
“msg”: “[code=10006:class=database:scope=not-set:level=high] execute statement failed: begin: invalid connection
github.com/pingcap/dm/pkg/terror.(*Error).Delegate
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/terror/terror.go:267
github.com/pingcap/dm/pkg/conn.(*BaseConn).ExecuteSQLWithIgnoreError
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/conn/baseconn.go:149
github.com/pingcap/dm/syncer.(*DBConn).executeSQLWithIgnore.func3
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/db.go:280
github.com/pingcap/dm/pkg/retry.(*FiniteRetryStrategy).Apply
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/retry/strategy.go:71
github.com/pingcap/dm/pkg/conn.(*BaseConn).ApplyRetryStrategy
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/conn/baseconn.go:216
github.com/pingcap/dm/syncer.(*DBConn).executeSQLWithIgnore
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/db.go:275
github.com/pingcap/dm/syncer.(*DBConn).executeSQL
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/db.go:305
github.com/pingcap/dm/syncer.(*Syncer).sync.func3
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/syncer.go:1002
github.com/pingcap/dm/syncer.(*Syncer).sync
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/syncer.go:1045
github.com/pingcap/dm/syncer.(*Syncer).Run.func2
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/syncer.go:1114
runtime.goexit
\t/usr/local/go/src/runtime/asm_amd64.s:1357”,
“error”: null
}

binlog的监控也是段段续续:
image
官方异常表里面有提到这个异常,说是下游资源紧张或者重启,但是看了监控并不是很忙。有大使给看下吗。在线等!!
登记了微信目前还没回复。

  1. 请问,是在query-status恢复正常后,查看 sync-diff-inspector 结果,还是当前query-status是invalid情况下?

  2. 在使用 sync-diff-inspector对比时,上下游都有dml操作吗?是如何对比的?

首先,感谢您的回复!

task恢复正常 running时候进行的比对,是有存在dml操作的可能,但是我这个表基本都是新增数据,而且我指定了range:
range = “add_time < ‘2020-06-15’”, 就是截止到昨天的数据。
通过diff出来的结果,单条查看后,确是存在数据缺失的。
这个无效连接的原因到底是啥?

ok,目前还有什么问题吗?

这个error异常导致任务时不时的挂起,数据也存在丢失的情况,invlid connection 到底是啥导致的?

发生 invalid connection 错误时,通常表示 DM 到下游 TiDB 的数据库连接出现了异常(如网络故障、TiDB 重启、TiKV busy 等)且当前请求已有部分数据发送到了 TiDB。由于 DM 中存在同步任务并发向下游复制数据的特性,因此在任务中断时可能同时包含多个错误(可通过 query-statusquery-error 查询当前错误):

  • 如果错误中仅包含 invalid connection 类型的错误,且当前处于增量复制阶段,则 DM 会自动进行重试。
  • 如果 DM 由于版本问题(v1.0.0-rc.1 后引入自动重试)等未自动进行重试或自动重试未能成功,则可尝试先使用 stop-task 停止任务,然后再使用 start-task 重启任务。
  • tidb重启:
    tidb在早上9点多做过重启,这个现在任务还一直会报这个异常,应该没有关系吧;
  • 网络故障:
    我看了机器的network, 监控没看出异常:
  • tikv busy:
    看监控最近一段时间资源也不是很高:

所以这个错误提示最终原因还是没搞明白?

  1. 可以反馈下 query-status 的信息,看是什么错误。
  2. 反馈下 dm-worker 的完整信息。
  3. 反馈下 tidb.log 日志,看下是否有相关报错。

辛苦上传下

问题反馈.tar (941.5 KB)
日志已上传,辛苦看下

看了下 tidb.log 中 info 信息,tidb 和 tikv 之间的 wait 也是比较高 1800+ ms,确认下目前网络是否为万兆网卡,tikv.log 中可能存在一些信息,可以反馈看下。

dm-worker 中有过多次的重试,将 dm-worker 拉起来,目测还是网络的问题

[2020/06/15 18:00:09.690 +08:00] [INFO] [coprocessor.go:871] ["[TIME_COP_WAIT] resp_time:1.976441546s txnStartTS:417389663398395911 region_id:44 store_addr:10.100.81.99:20160 kv_wait_ms:1976"]

此话题已在最后回复的 1 分钟后被自动关闭。不再允许新回复。