DM replication fails with "create FetchDDLInfo stream" error

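To help us respond efficiently, please provide the following information when asking a question. Clearly described issues are answered first.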

  • [TiDB version]: 3.0
  • [Problem description]: DM is replicating from MySQL to TiDB, but the task stays in the New state. The dm-master log shows the following errors:
[2019/12/26 15:36:42.669 +08:00] [INFO] [server.go:190] ["listening gRPC API and status request"] [address=:8261]
[2019/12/26 15:36:45.673 +08:00] [INFO] [server.go:1224] ["update workers of task"] [workers="{\"qtt_hivemeta_dm_task_1\":[\"172.16.184.190:8262\"]}"]
[2019/12/26 15:37:56.769 +08:00] [INFO] [server.go:497] [payload="name:\"qtt_hivemeta_dm_task_1\" "] [request=QueryStatus]
[2019/12/26 15:38:02.584 +08:00] [INFO] [server.go:553] [payload="name:\"qtt_hivemeta_dm_task_1\" "] [request=QueryError]
[2019/12/26 15:38:02.590 +08:00] [ERROR] [server.go:1325] ["receive ddl info"] [worker=172.16.184.190:8262] [error="rpc error: code = Unavailable desc = transport is closing"]
[2019/12/26 15:38:07.590 +08:00] [ERROR] [server.go:1308] ["create FetchDDLInfo stream"] [worker=172.16.184.190:8262] [error="[code=38008:class=dm-master:scope=internal:level=high] grpc request error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 172.16.184.190:8262: connect: connection refused\""]
[2019/12/26 15:38:12.590 +08:00] [ERROR] [server.go:1308] ["create FetchDDLInfo stream"] [worker=172.16.184.190:8262] [error="[code=38008:class=dm-master:scope=internal:level=high] grpc request error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 172.16.184.190:8262: connect: connection refused\""]
[2019/12/26 15:38:17.590 +08:00] [ERROR] [server.go:1308] ["create FetchDDLInfo stream"] [worker=172.16.184.190:8262] [error="[code=38008:class=dm-master:scope=internal:level=high] grpc request error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 172.16.184.190:8262: connect: connection refused\""]
[2019/12/26 15:41:27.399 +08:00] [INFO] [server.go:553] [payload="name:\"qtt_hivemeta_dm_task_1\" "] [request=QueryError]
[2019/12/26 15:41:27.404 +08:00] [ERROR] [server.go:1325] ["receive ddl info"] [worker=172.16.184.190:8262] [error="rpc error: code = Unavailable desc = transport is closing"]
[2019/12/26 15:41:32.404 +08:00] [ERROR] [server.go:1308] ["create FetchDDLInfo stream"] [worker=172.16.184.190:8262] [error="[code=38008:class=dm-master:scope=internal:level=high] grpc request error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 172.16.184.190:8262: connect: connection refused\""]
[2019/12/26 15:41:37.404 +08:00] [ERROR] [server.go:1308] ["create FetchDDLInfo stream"] [worker=172.16.184.190:8262] [error="[code=38008:class=dm-master:scope=internal:level=high] grpc request error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 172.16.184.190:8262: connect: connection refused\""]
[2019/12/26 15:41:42.404 +08:00] [ERROR] [server.go:1308] ["create FetchDDLInfo stream"] [worker=172.16.184.190:8262] [error="[code=38008:class=dm-master:scope=internal:level=high] grpc request error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 172.16.184.190:8262: connect: connection refused\""]

For performance tuning or troubleshooting questions, please download and run the script, then select all of the terminal output and paste it here.

Please check whether this IP and port are reachable: 172.16.184.190:8262

Telnet connects successfully.

First run check-task and see whether it reports any errors.

It passes.

Please start the task again, then check its status with query-status. Also, please upload the dm-master and dm-worker logs. Thanks.

dm-master.log (2.5 MB) dm-worker.log (3.9 MB)

Steps performed:

» start-task /opt/apps/tidb/dm-ansible-v1.0.2/conf/task_hive_meta.yaml
{
    "result": true,
    "msg": "",
    "workers": [
        {
            "result": false,
            "worker": "172.16.184.190:8262",
            "msg": "[code=38033:class=dm-master:scope=internal:level=high] request is timeout, but request may be successful, please execute `query-status` to check status\ngithub.com/pingcap/dm/pkg/terror.(*Error).Generate\n\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/terror/terror.go:232\ngithub.com/pingcap/dm/dm/master.(*Server).waitOperationOk\n\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/master/server.go:1923\ngithub.com/pingcap/dm/dm/master.(*Server).handleOperationResult\n\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/master/server.go:1946\ngithub.com/pingcap/dm/dm/master.(*Server).StartTask.func1\n\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/master/server.go:276\ngithub.com/pingcap/dm/dm/master.(*AgentPool).Emit\n\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/master/agent_pool.go:117\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1337"
        }
    ]
}
» query-status qtt_hivemeta_dm_task_1
{
    "result": true,
    "msg": "",
    "workers": [
        {
            "result": true,
            "worker": "172.16.184.190:8262",
            "msg": "",
            "subTaskStatus": [
                {
                    "name": "qtt_hivemeta_dm_task_1",
                    "stage": "New",
                    "unit": "InvalidUnit",
                    "result": null,
                    "unresolvedDDLLockID": ""
                }
            ],
            "relayStatus": {
                "masterBinlog": "(mysql-bin.000021, 206321368)",
                "masterBinlogGtid": "0f590104-10d9-11ea-960d-98039b07362c:1-66143303,df1a6d40-254b-11ea-9b64-b8599f3ec032:1-4185263",
                "relaySubDir": "df1a6d40-254b-11ea-9b64-b8599f3ec032.000001",
                "relayBinlog": "(mysql-bin.000021, 206321368)",
                "relayBinlogGtid": "",
                "relayCatchUpMaster": true,
                "stage": "Running",
                "result": null
            },
            "sourceID": "mysql-replica-01"
        }
    ]
}
» query-error qtt_hivemeta_dm_task_1
{
    "result": true,
    "msg": "",
    "workers": [
        {
            "result": false,
            "worker": "172.16.184.190:8262",
            "msg": "[code=38008:class=dm-master:scope=internal:level=high] grpc request error: rpc error: code = Unavailable desc = transport is closing\ngithub.com/pingcap/dm/pkg/terror.(*Error).Delegate\n\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/terror/terror.go:267\ngithub.com/pingcap/dm/dm/master/workerrpc.callRPC\n\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/master/workerrpc/rawgrpc.go:124\ngithub.com/pingcap/dm/dm/master/workerrpc.(*GRPCClient).SendRequest\n\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/master/workerrpc/rawgrpc.go:64\ngithub.com/pingcap/dm/dm/master.(*Server).getErrorFromWorkers.func2\n\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/master/server.go:1190\ngithub.com/pingcap/dm/dm/master.(*AgentPool).Emit\n\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/master/agent_pool.go:117\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1337",
            "subTaskError": [
            ],
            "RelayError": null
        }
    ]
}

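Hi, which MySQL version is the upstream? What are the exact versions of the downstream TiDB and of DM? Please also send inventory.ini and the dm-master, dm-worker, and task configuration files. Thanks.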