TiDB DM is replicating data from Alibaba Cloud RDS to a TiDB 7.5.3 cluster and fails with "ERROR 1236 (HY000): The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires", even though the binlogs still exist on the Alibaba Cloud side. How can this be fixed?

[TiDB environment] Production
[TiDB version] 7.5.3

The DM task details are as follows:

[tidb@devops-tiup-01-vm ~ Wed Oct 16 09:19:44]$  tiup dmctl --master-addr 192.168.3.156:8261 query-status rds-to-tidb-for-b1
tiup is checking updates for component dmctl ...
A new version of dmctl is available:
   The latest version:         v8.3.0
   Local installed version:    v8.2.0
   Update current component:   tiup update dmctl
   Update all components:      tiup update --all

Starting component `dmctl`: /home/tidb/.tiup/components/dmctl/v8.2.0/dmctl/dmctl --master-addr 192.168.3.156:8261 query-status rds-to-tidb-for-b1
{
    "result": true,
    "msg": "",
    "sources": [
        {
            "result": true,
            "msg": "",
            "sourceStatus": {
                "source": "rds-product_b1",
                "worker": "dm-192.168.1.24-8262",
                "result": null,
                "relayStatus": {
                    "masterBinlog": "(mysql-bin.062310, 70132113)",
                    "masterBinlogGtid": "72366f5c-0609-11ec-b896-0c42a1b7b2a6:1-8447207357,742a767c-0609-11ec-9112-0c42a1f03bde:1-7227,75cda62d-3487-11ec-8b90-043f72e57d9a:1-983075710",
                    "relaySubDir": "75cda62d-3487-11ec-8b90-043f72e57d9a.000001",
                    "relayBinlog": "(mysql-bin.062309, 4)",
                    "relayBinlogGtid": "",
                    "relayCatchUpMaster": false,
                    "stage": "Paused",
                    "result": {
                        "isCanceled": false,
                        "errors": [
                            {
                                "ErrCode": 30015,
                                "ErrClass": "relay-unit",
                                "ErrScope": "upstream",
                                "ErrLevel": "high",
                                "Message": "TCPReader get relay event with error",
                                "RawCause": "ERROR 1236 (HY000): The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires. Replicate the missing transactions from elsewhere, or provision a new slave from backup. Consider increasing the master's binary log expiration period. The GTID sets and the missing purged transactions are too long to print in this message. For more information, please see the master's error log or the manual for GTID_SUBTRACT.",
                                "Workaround": ""
                            }
                        ],
                        "detail": null
                    }
                }
            },
            "subTaskStatus": [
                {
                    "name": "rds-to-tidb-for-b1",
                    "stage": "Running",
                    "unit": "Sync",
                    "result": null,
                    "unresolvedDDLLockID": "",
                    "sync": {
                        "totalEvents": "65147593",
                        "totalTps": "257",
                        "recentTps": "0",
                        "masterBinlog": "(mysql-bin.062310, 70132113)",
                        "masterBinlogGtid": "72366f5c-0609-11ec-b896-0c42a1b7b2a6:1-8447207357,742a767c-0609-11ec-9112-0c42a1f03bde:1-7227,75cda62d-3487-11ec-8b90-043f72e57d9a:1-983075710",
                        "syncerBinlog": "(mysql-bin|000001.062308, 524291076)",
                        "syncerBinlogGtid": "72366f5c-0609-11ec-b896-0c42a1b7b2a6:1-8447207357,742a767c-0609-11ec-9112-0c42a1f03bde:1-7227,75cda62d-3487-11ec-8b90-043f72e57d9a:1-982898836",
                        "blockingDDLs": [
                        ],
                        "unresolvedGroups": [
                        ],
                        "synced": false,
                        "binlogType": "local",
                        "secondsBehindMaster": "0",
                        "blockDDLOwner": "",
                        "conflictMsg": "",
                        "totalRows": "65147593",
                        "totalRps": "257",
                        "recentRps": "0"
                    },
                    "validation": null
                }
            ]
        }
    ]
}

On the Alibaba Cloud RDS side, the binlogs all still appear to be present.

However, the DM replication task suddenly failed with this error:

ERROR 1236 (HY000): The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires. Replicate the missing transactions from elsewhere, or provision a new slave from backup. Consider increasing the master’s binary log expiration period. The GTID sets and the missing purged transactions are too long to print in this message. For more information, please see the master’s error log or the manual for GTID_SUBTRACT

What is causing this, and how can it be fixed?

I suggest using DMS for the replication. The replication broke because the primary and replica nodes switched over.

But the service availability page in the Alibaba Cloud console shows no switchover record.

Please post the value of the RDS gtid_purged variable. Did DM originally replicate from the standby, and are you now trying to point it at the primary?

| gtid_purged   | 72366f5c-0609-11ec-b896-0c42a1b7b2a6:1-8447207357,
742a767c-0609-11ec-9112-0c42a1f03bde:1-7227,
75cda62d-3487-11ec-8b90-043f72e57d9a:1-914788327 |
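One way to read these values: MySQL raises error 1236 when gtid_purged on the source contains transactions that the connecting replica has not reported as received. The following is a minimal Python sketch of that check, loosely mirroring what GTID_SUBTRACT computes. The helper names `parse_gtid_set` and `gtid_subtract` are made up for illustration and are not part of DM or MySQL; the GTID strings are copied from the output in this thread.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-5:7,uuid2:3' into {uuid: [(start, end), ...]} (inclusive ranges)."""
    result = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        ranges = result.setdefault(uuid.lower(), [])
        for interval in intervals:
            start, _, end = interval.partition("-")
            ranges.append((int(start), int(end or start)))
    return result


def gtid_subtract(a, b):
    """Return the transactions that are in a but not in b."""
    missing = {}
    for uuid, a_ranges in a.items():
        remaining = list(a_ranges)
        for b_start, b_end in b.get(uuid, []):
            next_remaining = []
            for start, end in remaining:
                if b_end < start or b_start > end:
                    # No overlap: keep the whole range.
                    next_remaining.append((start, end))
                    continue
                if start < b_start:
                    next_remaining.append((start, b_start - 1))
                if end > b_end:
                    next_remaining.append((b_end + 1, end))
            remaining = next_remaining
        if remaining:
            missing[uuid] = remaining
    return missing


# gtid_purged as reported by RDS in this thread.
purged = parse_gtid_set(
    "72366f5c-0609-11ec-b896-0c42a1b7b2a6:1-8447207357,"
    "742a767c-0609-11ec-9112-0c42a1f03bde:1-7227,"
    "75cda62d-3487-11ec-8b90-043f72e57d9a:1-914788327"
)
# syncerBinlogGtid from the query-status output.
syncer = parse_gtid_set(
    "72366f5c-0609-11ec-b896-0c42a1b7b2a6:1-8447207357,"
    "742a767c-0609-11ec-9112-0c42a1f03bde:1-7227,"
    "75cda62d-3487-11ec-8b90-043f72e57d9a:1-982898836"
)
# relayBinlogGtid from the query-status output is an empty string.
relay = parse_gtid_set("")

purged_but_needed_by_syncer = gtid_subtract(purged, syncer)
purged_but_needed_by_relay = gtid_subtract(purged, relay)

print("missing for syncer position:", purged_but_needed_by_syncer)
print("missing for empty relay position:", purged_but_needed_by_relay)
```

Under these assumptions, the syncer position already covers everything that was purged, while an empty GTID set covers nothing, so a connection that starts from an empty set would ask for transactions the source no longer has.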

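No. DM was configured to replicate from the RDS primary from the start. The binlog replication had been caught up for more than a day, and this morning it suddenly failed with this error.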

Please post the GTIDs again so we can see whether the incrementing ID is 914788327 or 8447207357. The previous primary was 75cda62d-3487-11ec-8b90-043f72e57d9a. Is it now 72366f5c-0609-11ec-b896-0c42a1b7b2a6?

The "primary" is still 043f72e57d9a:

*************************** 1. row ***************************
             File: mysql-bin.062318
         Position: 326157980
     Binlog_Do_DB:
 Binlog_Ignore_DB:
Executed_Gtid_Set: 72366f5c-0609-11ec-b896-0c42a1b7b2a6:1-8447207357,
742a767c-0609-11ec-9112-0c42a1f03bde:1-7227,
75cda62d-3487-11ec-8b90-043f72e57d9a:1-984205805
1 row in set (0.00 sec)


+---------------+--------------------------------------+
| Variable_name | Value                                |
+---------------+--------------------------------------+
| server_uuid   | 75cda62d-3487-11ec-8b90-043f72e57d9a |
+---------------+--------------------------------------+
1 row in set (0.01 sec)
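To answer the "which ID is incrementing" question from the numbers alone, two GTID snapshots can be compared. The sketch below is only an illustration: `upper_bounds` and `advancing_uuids` are hypothetical helper names, the earlier snapshot is the masterBinlogGtid from the query-status output, and the later one is the Executed_Gtid_Set shown above.

```python
def upper_bounds(gtid_set):
    """Map each source UUID to the highest transaction number listed for it."""
    bounds = {}
    for part in gtid_set.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        bounds[uuid] = max(int(interval.split("-")[-1]) for interval in intervals)
    return bounds


def advancing_uuids(earlier, later):
    """Return {uuid: increase} for every UUID whose upper bound grew between snapshots."""
    before, after = upper_bounds(earlier), upper_bounds(later)
    return {
        uuid: after[uuid] - before.get(uuid, 0)
        for uuid in after
        if after[uuid] > before.get(uuid, 0)
    }


# masterBinlogGtid from the earlier query-status output.
earlier_snapshot = (
    "72366f5c-0609-11ec-b896-0c42a1b7b2a6:1-8447207357,"
    "742a767c-0609-11ec-9112-0c42a1f03bde:1-7227,"
    "75cda62d-3487-11ec-8b90-043f72e57d9a:1-983075710"
)
# Executed_Gtid_Set from the later master status output.
later_snapshot = (
    "72366f5c-0609-11ec-b896-0c42a1b7b2a6:1-8447207357,"
    "742a767c-0609-11ec-9112-0c42a1f03bde:1-7227,"
    "75cda62d-3487-11ec-8b90-043f72e57d9a:1-984205805"
)

advancing = advancing_uuids(earlier_snapshot, later_snapshot)
print(advancing)
```

Only the UUID ending in 043f72e57d9a grows between the two snapshots, which matches the server_uuid reported above and is consistent with no switchover having taken place.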

You have relay log enabled. In this mode, DM pulls the upstream binlogs to local disk and from then on reads the local relay log to carry out the replication.

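The problem is that your relay log does not look healthy. Although GTID is configured above, the status shows:

"relayBinlogGtid": "",

Under normal conditions, this should contain the same value as masterBinlogGtid.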
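The check described here can be automated against the JSON that query-status prints. Below is a minimal sketch: the field names are taken from the output earlier in this thread, `find_relay_gtid_gaps` is a hypothetical helper name, and the code assumes the JSON body has already been separated from the tiup banner lines that precede it. An empty relayBinlogGtid is treated only as a hint worth investigating, not as proof of a fault.

```python
import json


def find_relay_gtid_gaps(status):
    """List sources whose relay GTID set is empty while the upstream GTID set is not."""
    findings = []
    for src in status.get("sources", []):
        source_status = src.get("sourceStatus") or {}
        relay = source_status.get("relayStatus") or {}
        if not relay:
            continue  # Relay log is not enabled for this source.
        master_gtid = relay.get("masterBinlogGtid", "")
        relay_gtid = relay.get("relayBinlogGtid", "")
        if master_gtid and not relay_gtid:
            findings.append(
                {
                    "source": source_status.get("source"),
                    "stage": relay.get("stage"),
                    "relayBinlog": relay.get("relayBinlog"),
                }
            )
    return findings


# Trimmed copy of the query-status output shown in this thread.
sample = json.loads(
    """
    {
      "result": true,
      "sources": [
        {
          "sourceStatus": {
            "source": "rds-product_b1",
            "worker": "dm-192.168.1.24-8262",
            "relayStatus": {
              "masterBinlog": "(mysql-bin.062310, 70132113)",
              "masterBinlogGtid": "75cda62d-3487-11ec-8b90-043f72e57d9a:1-983075710",
              "relayBinlog": "(mysql-bin.062309, 4)",
              "relayBinlogGtid": "",
              "stage": "Paused"
            }
          }
        }
      ]
    }
    """
)

findings = find_relay_gtid_gaps(sample)
for item in findings:
    print(item)
```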

See this section of the documentation for reference.

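In the end, I fixed it by deleting the data source, recreating it with the same configuration, and then rerunning the DM task (with the task mode changed from full + incremental to incremental only). After that, the task ran normally.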

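So I still suspect the problem is related to Alibaba Cloud itself, even though the RDS instance really did not go through a primary/standby switchover.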