DM fails to start a task, reporting "failed to open DSN"

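To help us respond efficiently, please give as much background as you can when asking a question; clearly described problems are answered first. Please try to provide the following: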

OS version & kernel version:

CentOS 7

TiDB version:

Latest

Disk model:

Cluster node layout:

Data volume & number of Regions & number of replicas:

Cluster QPS, .999 Duration, read/write ratio:

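Problem description (what I did):

  • Starting the task fails
  • start-task task-test.yaml
  • The upstream MySQL to be replicated is an Alibaba Cloud RDS instance
  • The error is as follows: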
{
    "result": false,
    "msg": "[code=26002:class=dm-master:scope=upstream:level=high] fail to initial checker: failed to open DSN :***@:0
github.com/pingcap/dm/pkg/terror.(*Error).Generate
	/home/jenkins/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/terror/terror.go:232
github.com/pingcap/dm/checker.(*Checker).Init
	/home/jenkins/workspace/build_dm_master/go/src/github.com/pingcap/dm/checker/checker.go:129
github.com/pingcap/dm/checker.CheckSyncConfig
	/home/jenkins/workspace/build_dm_master/go/src/github.com/pingcap/dm/checker/cmd.go:50
github.com/pingcap/dm/dm/master.(*Server).generateSubTask
	/home/jenkins/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/master/server.go:1847
github.com/pingcap/dm/dm/master.(*Server).StartTask
	/home/jenkins/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/master/server.go:220
github.com/pingcap/dm/dm/pb._Master_StartTask_Handler
	/home/jenkins/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/pb/dmmaster.pb.go:1530
google.golang.org/grpc.(*Server).processUnaryRPC
	/go/pkg/mod/google.golang.org/grpc@v1.17.0/server.go:966
google.golang.org/grpc.(*Server).handleStream
	/go/pkg/mod/google.golang.org/grpc@v1.17.0/server.go:1245
google.golang.org/grpc.(*Server).serveStreams.func1.1
	/go/pkg/mod/google.golang.org/grpc@v1.17.0/server.go:685
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1337",
    "workers": [
    ]
}
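
The `failed to open DSN :***@:0` part of the message is worth reading closely. Assuming the checker prints the upstream connection as `user:***@host:port` (an assumption based on the shape of the message, not something confirmed in this thread), the user and host are empty and the port is 0. That would suggest DM-master had no upstream connection settings for this source, rather than a wrong password. A minimal Python sketch that pulls those fields out of such a message; `parse_masked_dsn` is a hypothetical helper, not part of DM:

```python
import re

# Assumed message shape: "failed to open DSN <user>:***@<host>:<port>".
_DSN_RE = re.compile(
    r"failed to open DSN (?P<user>[^:\s]*):\*\*\*@(?P<host>[^:\s]*):(?P<port>\d+)"
)


def parse_masked_dsn(message):
    """Return user, host and port from a masked DSN in an error message, or None."""
    match = _DSN_RE.search(message)
    if match is None:
        return None
    return {
        "user": match.group("user"),
        "host": match.group("host"),
        "port": int(match.group("port")),
    }


if __name__ == "__main__":
    msg = "fail to initial checker: failed to open DSN :***@:0"
    print(parse_masked_dsn(msg))
```

If all three fields come back empty or zero, it may be more useful to check how the upstream source is configured on the DM side before looking at the network.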

Could you please upload your installation/deployment configuration and the task YAML?

What do you mean by the installation/deployment configuration?

The contents of task.yml are as follows:

# Task name. Tasks running at the same time must have different names.
name: "test"
# Full + incremental (all) replication mode.
task-mode: "all"
# Configuration of the downstream TiDB.
target-database:
  host: "10.168.1.39"
  port: 4000
  user: "root"
  password: ""

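# Configuration of all upstream MySQL instances required by this replication task.
mysql-instances:
-
  # ID of the upstream instance or replication group; see `source_id` in `inventory.ini` or `source-id` in `dm-master.toml`.
  source-id: "mysql-replica-01"
  # Name of the black/white list config for the schemas or tables to replicate. It references the global config under `black-white-list` below.
  black-white-list: "global"
  # Name of the mydumper config. It references the global mydumper config.
  mydumper-config-name: "global"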

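# Global black/white list config; each instance references it by config name.
black-white-list:
  global:
    do-tables:                        # White list of upstream tables to replicate.
    - db-name: "ertc_test"              # Schema name of the table to replicate.
      tbl-name: "test"          # Name of the table to replicate.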


# Global mydumper config; each instance references it by config name.
mydumpers:
  global:
    mydumper-path: "/home/tidb/dm-ansible/resources/bin/mydumper"   # Path to the mydumper binary.
    extra-args: "-B eric_test -T test"  # Export only the `test` table in the `eric_test` schema; any mydumper argument can be set here.

Could you run query-status and check the current relayStatus? Also, when you deployed DM, was the upstream MySQL password configured in inventory.ini encrypted with dmctl?

relayStatus reports an error, as shown below:

 query-status
{
    "result": true,
    "msg": "",
    "workers": [
        {
            "result": true,
            "worker": "10.168.1.43:8262",
            "msg": "no sub task started",
            "subTaskStatus": [
            ],
            "relayStatus": {
                "masterBinlog": "(mysql-bin.000019, 2837)",
                "masterBinlogGtid": "",
                "relaySubDir": "15564e70-6f24-11e9-bd30-00163e081084.000001",
                "relayBinlog": "(, 4)",
                "relayBinlogGtid": "",
                "relayCatchUpMaster": false,
                "stage": "Paused",
                "result": {
                    "isCanceled": false,
                    "errors": [
                        {
                            "Type": "UnknownError",
                            "msg": "[code=30015:class=relay-unit:scope=upstream:level=high] TCPReader get event: ERROR 1236 (HY000): Could not open log file
github.com/pingcap/dm/pkg/terror.(*Error).Delegate
	/home/jenkins/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/terror/terror.go:267
github.com/pingcap/dm/pkg/binlog/reader.(*TCPReader).GetEvent
	/home/jenkins/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/binlog/reader/tcp.go:151
github.com/pingcap/dm/relay/reader.(*reader).GetEvent
	/home/jenkins/workspace/build_dm_master/go/src/github.com/pingcap/dm/relay/reader/reader.go:144
github.com/pingcap/dm/relay.(*Relay).handleEvents
	/home/jenkins/workspace/build_dm_master/go/src/github.com/pingcap/dm/relay/relay.go:346
github.com/pingcap/dm/relay.(*Relay).process
	/home/jenkins/workspace/build_dm_master/go/src/github.com/pingcap/dm/relay/relay.go:280
github.com/pingcap/dm/relay.(*Relay).Process
	/home/jenkins/workspace/build_dm_master/go/src/github.com/pingcap/dm/relay/relay.go:189
github.com/pingcap/dm/dm/worker.(*realRelayHolder).run
	/home/jenkins/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/relay.go:156
github.com/pingcap/dm/dm/worker.(*realRelayHolder).Start.func1
	/home/jenkins/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/relay.go:132
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1337"
                        }
                    ],
                    "detail": null
                }
            },
            "sourceID": "mysql-replica-01"
        }
    ]
}
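
When reading output like this, the fields that matter for the relay unit are `stage`, `relayBinlog`, and the first line of each error `msg`. A small Python sketch, assuming the query-status output uses the field names shown above; `summarize_relay` is a hypothetical helper and the sample is abbreviated from the output above:

```python
import json


def summarize_relay(status):
    """Collect stage, relay binlog position and first error lines per worker."""
    summary = []
    for worker in status.get("workers", []):
        relay = worker.get("relayStatus") or {}
        result = relay.get("result") or {}
        errors = result.get("errors") or []
        summary.append({
            "worker": worker.get("worker", ""),
            "stage": relay.get("stage", ""),
            "relayBinlog": relay.get("relayBinlog", ""),
            # Keep only the first line of each message; the rest is a stack trace.
            "errors": [
                e.get("msg", "").splitlines()[0] if e.get("msg") else ""
                for e in errors
            ],
        })
    return summary


if __name__ == "__main__":
    # The raw output has line breaks inside the "msg" strings, so
    # strict=False is needed for json.loads to accept it.
    raw = '''{"workers": [{"worker": "10.168.1.43:8262", "relayStatus": {
        "relayBinlog": "(, 4)", "stage": "Paused", "result": {"errors": [
        {"msg": "TCPReader get event: ERROR 1236 (HY000): Could not open log file
github.com/pingcap/dm/pkg/terror.(*Error).Delegate"}]}}}]}'''
    for row in summarize_relay(json.loads(raw, strict=False)):
        print(row)
```

Here the `Paused` stage together with an empty binlog name in `relayBinlog` (`(, 4)`) suggests the relay unit has not yet written any event from the upstream.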

Judging from this relay error, there are two separate problems here:

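  1. The relay error most likely means that the binlog position or GTID sets specified when deploying/starting DM-worker are wrong. For details, search for the MySQL error contained in the message above: “ERROR 1236 (HY000): Could not open log file”.
  2. As for the failed to open DSN error above, have you confirmed that the machine running DM-master can actually reach the upstream MySQL?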
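
For the second point, one quick way to check reachability from the DM-master host is a plain TCP connection to the upstream MySQL port. The sketch below uses only the Python standard library; `can_reach` is a hypothetical helper, and the host name in the example is a placeholder rather than a real endpoint. It only tests network reachability (which, for a cloud-hosted database, may also depend on the instance's access whitelist), not the user name or password.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


if __name__ == "__main__":
    # Placeholder endpoint: replace with the address and port of the upstream MySQL.
    print(can_reach("rds-endpoint.example.com", 3306))
```

Run it on the machine where DM-master is deployed; a `False` result points to a network or access-control problem rather than a DM configuration problem.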