为提高效率,提问时请提供以下信息,问题描述清晰可优先响应。
- 【TiDB 版本】:4.0.8 w/ DM 2.0.0
- 【问题描述】:
当前我们正在线下环境测试用DM从阿里云RDS MySQL实时导入数据(共50多张表,原库表直接迁移,没有分库分表情况)。任务正常运行一段时间之后出错停止了,但未及时处理,出错2个小时之后,用dmctl query-status
命令查看:
{
"result": true,
"msg": "",
"sources": [
{
"result": true,
"msg": "",
"sourceStatus": {
"source": "dm_mysql_src_test",
"worker": "dm-10.0.46.191-8262",
"result": null,
"relayStatus": null
},
"subTaskStatus": [
{
"name": "dm_mysql_task_test",
"stage": "Paused",
"unit": "Sync",
"result": {
"isCanceled": false,
"errors": [
{
"ErrCode": 10006,
"ErrClass": "database",
"ErrScope": "not-set",
"ErrLevel": "high",
"Message": "startLocation: [position: (mysql-bin.000011, 214170385), gtid-set: 4e104f70-2cdb-11eb-9206-ec0d9acf00fe:1-1688134,d2c18b61-da9f-11e9-b478-7cd30a518102:1-828785122], endLocation: [position: (mysql-bin.000011, 214170532), gtid-set: 4e104f70-2cdb-11eb-9206-ec0d9acf00fe:1-1688134,d2c18b61-da9f-11e9-b478-7cd30a518102:1-828785122]: execute statement failed: commit",
"RawCause": "Error 1062: Duplicate entry 'oujYRxHPp_kmBMNCmu' for key 'openid'",
"Workaround": ""
}
],
"detail": null
},
"unresolvedDDLLockID": "",
"sync": {
"totalEvents": "1225708",
"totalTps": "69",
"recentTps": "48",
"masterBinlog": "(mysql-bin.000011, 421297478)",
"masterBinlogGtid": "d2c18b61-da9f-11e9-b478-7cd30a518102:1-828785122,4e104f70-2cdb-11eb-9206-ec0d9acf00fe:1-1951208",
"syncerBinlog": "(mysql-bin.000011, 213984816)",
"syncerBinlogGtid": "d2c18b61-da9f-11e9-b478-7cd30a518102:1-828785122,4e104f70-2cdb-11eb-9206-ec0d9acf00fe:1-1687811",
"blockingDDLs": [
],
"unresolvedGroups": [
],
"synced": false,
"binlogType": "remote"
}
}
]
}
]
}
在执行该任务的dm-worker.log中发现:
[2020/11/23 20:03:02.701 +08:00] [ERROR] [db.go:273] ["execute statements failed after retry"] [task=dm_mysql_task_test] [unit="binlog replication"] [queries="[INSERT INTO `test_agb`.`userwacc` (`userid`,`unionid`,`openid`,`avatar`,`nickname`,`gender`,`pinyin`,`subscribed`,`appopenid`,`weappopenid`,`publicopenid`,`joinweappopenid`,`rbtweappopenid`,`third_id`) VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?)]"] [arguments="[[110034603 oRo5hu5-mkmg oujYRxHPp_kmBMNCmu 测试1 0 0 oujYRxHPp_kmBMNCmu <nil> <nil> <nil> <nil> ]]"] [error="[code=10006:class=database:scope=not-set:level=high], Message: execute statement failed: commit, RawCause: Error 1062: Duplicate entry 'oujYRxHPp_kmBMNCmu' for key 'openid'"]
[2020/11/23 20:03:02.712 +08:00] [ERROR] [subtask.go:300] ["unit process error"] [subtask=dm_mysql_task_test] [unit=Sync] ["error information"="{\"ErrCode\":10006,\"ErrClass\":\"database\",\"ErrScope\":\"not-set\",\"ErrLevel\":\"high\",\"Message\":\"startLocation: [position: (mysql-bin.000011, 214170385), gtid-set: 4e104f70-2cdb-11eb-9206-ec0d9acf00fe:1-1688134,d2c18b61-da9f-11e9-b478-7cd30a518102:1-828785122], endLocation: [position: (mysql-bin.000011, 214170532), gtid-set: 4e104f70-2cdb-11eb-9206-ec0d9acf00fe:1-1688134,d2c18b61-da9f-11e9-b478-7cd30a518102:1-828785122]: execute statement failed: commit\",\"RawCause\":\"Error 1062: Duplicate entry 'oujYRxHPp_kmBMNCmu' for key 'openid'\"}"]
该表的openid
一列具有UNIQUE KEY约束,该INSERT语句插入了相同的openid
,故无法同步。但是RDS MySQL库竟然执行成功了(i.e. 与TiDB中相比,该条数据的PRIMARY KEY列userid
已经变成了110034603),按道理这是完全不应该发生的,怀疑是RDS本身的问题。
接下来试图用mysqlbinlog工具查看源库对应位点的binlog,结果无法解析,输出Row event for unknown table
:
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 214170385
#201123 17:31:45 server id 1819313190 end_log_pos 0 CRC32 0x76a5d501 Start: binlog v 4, server v 5.6.16-log created 201123 17:31:45
# at 214170385
#201123 20:03:02 server id 1819313190 end_log_pos 214170532 CRC32 0xaad43c0f Write_rows: table id 260 flags: STMT_END_F
### Row event for unknown table #260DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
于是我们决定暂时不去查找原因,而是直接在TiDB中执行了DELETE+INSERT语句,手动处理了出错的那一行。然后手动修改任务配置文件,在mysql-instances
一节加入meta
,指定出错语句之后的消费位点:
mysql-instances:
- source-id: "dm_mysql_src_test"
meta:
binlog-name: "mysql-bin.000011"
binlog-pos: 214170532
binlog-gtid: "4e104f70-2cdb-11eb-9206-ec0d9acf00fe:1-1688134,d2c18b61-da9f-11e9-b478-7cd30a518102:1-828785122"
......
执行dmctl start-task
命令,很久没有反应,然后输出:
{
"result": true,
"msg": "",
"sources": [
{
"result": false,
"msg": "[code=38032:class=dm-master:scope=internal:level=high], Message: some error occurs in dm-worker: ErrCode:10001 ErrClass:\"database\" ErrScope:\"downstream\" ErrLevel:\"high\" Message:\"fail to initial unit Sync of subtask dm_mysql_task_test : database driver error\" RawCause:\"context deadline exceeded\" Workaround:\"Please check the database connection and the database config in configuration file.\" , Workaround: Please execute `query-status` to check status.",
"source": "dm_mysql_src_test",
"worker": "dm-10.0.46.191-8262"
}
]
}
而dmctl query-status
显示(反应时间也很长,大概几十秒):
{
"result": true,
"msg": "",
"sources": [
{
"result": false,
"msg": "[code=38008:class=dm-master:scope=internal:level=high], Message: grpc request error, RawCause: rpc error: code = DeadlineExceeded desc = context deadline exceeded",
"sourceStatus": {
"source": "dm_mysql_src_test",
"worker": "",
"result": null,
"relayStatus": null
},
"subTaskStatus": [
]
}
]
}
将任务配置中加入的meta信息删掉,重启任务,错误仍然如旧。期间多次检查MySQL数据源的配置,均是正常状态。
请问如何解决上述问题?Many thanks~