Replication task created by DM stays in the New state

To improve efficiency, please provide the following information when asking a question. Clearly described problems get priority.

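  • [TiDB version]: 4.0
  • [DM version]: 1.0.5
  • [Operating system]: CentOS 7.7
  • [Upstream MySQL]: Aurora MySQL 5.7
  • [Problem description]:
    After creating the task, the result reports that the connection to dm-worker timed out.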

» start-task ./conf/task.yaml
{
"result": true,
"msg": "",
"workers": [
{
"result": false,
"worker": "10.0.0.207:8262",
"msg": "[code=38033:class=dm-master:scope=internal:level=high] request to dm-worker 10.0.0.207:8262 is timeout, but request may be successful, please execute query-status to check status
github.com/pingcap/dm/pkg/terror.(*Error).Generate
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/terror/terror.go:232
github.com/pingcap/dm/dm/master.(*Server).waitOperationOk
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/master/server.go:1923
github.com/pingcap/dm/dm/master.(*Server).handleOperationResult
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/master/server.go:1946
github.com/pingcap/dm/dm/master.(*Server).StartTask.func1
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/master/server.go:276
github.com/pingcap/dm/dm/master.(*AgentPool).Emit
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/master/agent_pool.go:117
runtime.goexit
\t/usr/local/go/src/runtime/asm_amd64.s:1357"
}
]
}

Checking the task status with query-status test:

» query-status test
{
"result": true,
"msg": "",
"workers": [
{
"result": true,
"worker": "10.0.0.207:8262",
"msg": "",
"subTaskStatus": [
{
"name": "test",
"stage": "New",
"unit": "InvalidUnit",
"result": null,
"unresolvedDDLLockID": ""
}
],
"relayStatus": {
"masterBinlog": "(mysql-bin-changelog.000002, 1427946)",
"masterBinlogGtid": "",
"relaySubDir": "d2eda1e0-3aa2-37cd-b1f9-4e6ef452b597.000001",
"relayBinlog": "(mysql-bin-changelog.000002, 1427946)",
"relayBinlogGtid": "",
"relayCatchUpMaster": true,
"stage": "Running",
"result": null
},
"sourceID": "mysql-replica-01"
}
]
}

» query-error test
{
"result": true,
"msg": "",
"workers": [
{
"result": true,
"worker": "10.0.0.207:8262",
"msg": "",
"subTaskError": [
{
"name": "test",
"stage": "New",
"unit": "InvalidUnit"
}
],
"RelayError": {
"msg": ""
}
}
]
}

dm-worker and dm-master run on the same machine, and both ports are listening normally, so a network problem can be ruled out.

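Neither the dm-worker log nor the dm-master log shows any error. The only thing of note is that, while the task is being created, dm-master prints a large number of log entries like the following: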

[2020/06/02 03:09:04.785 +00:00] [INFO] [server.go:1913] ["wait op log result"] [task=test] [worker=10.0.0.207:8262] ["operation log ID"=13] [result="meta:<result:true > log:<id:13 task:<op:Start name:"test" task:"is-sharding = true\
online-ddl-scheme = \"\"\
case-sensitive = false\
name = \"test\"\
mode = \"incremental\"\
ignore-checking-items = [\"dump_privilege\", \"replication_privilege\"]\
source-id = \"mysql-replica-01\"\
server-id = 101\
flavor = \"mysql\"\
meta-schema = \"dm_meta\"\
remove-meta = false\
disable-heartbeat = true\
heartbeat-update-interval = 1\
heartbeat-report-interval = 10\
enable-heartbeat = false\
timezone = \"\"\
binlog-type = \"local\"\
relay-dir = \"./relay_log\"\
route-rules = []\
filter-rules = []\
mapping-rule = []\
mydumper-path = \"./bin/mydumper\"\
threads = 4\
chunk-filesize = 64\
skip-tz-utc = true\
extra-args = \"\"\
pool-size = 16\
dir = \"./dumped_data.test\"\
meta-file = \"\"\
worker-count = 16\
batch = 100\
queue-size = 1024\
checkpoint-flush-interval = 30\
max-retry = 0\
auto-fix-gtid = false\
enable-gtid = false\
disable-detect = false\
safe-mode = false\
enable-ansi-quotes = false\
log-level = \"debug\"\
log-file = \"dm-worker.log\"\
log-rotate = \"\"\
pprof-addr = \"\"\
status-addr = \"\"\
\
[meta]\
BinLogName = \"mysql-bin-changelog.000002\"\
BinLogPos = 1\
\
[from]\
host = \"db-aurora-mysql-cs-dev.cluster-cynf6n5ffymx.ap-northeast-1.rds.amazonaws.com\"\
port = 3306\
user = \"admin\"\
password = \"txAyDtT7yPIqW+8jXL2hvkBC6pmn7Wsoiq6Y\"\
max-allowed-packet = 67108864\
\
[to]\
host = \"10.0.0.207\"\
port = 4000\
user = \"root\"\
password = \"\"\
max-allowed-packet = 67108864\
\
[black-white-list]\
do-dbs = [\"test\"]\
\
[[black-white-list.do-tables]]\
db-name = \"test\"\
tbl-name = \"~.+\"\
" > ts:1591067315758839682 > "]

I also tried changing task-mode to all, with exactly the same result.

dm-master.doml (977 bytes) dm-worker.doml (1002 bytes) task.yaml (584 bytes)

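If your question concerns performance tuning or troubleshooting, please download and run the script, then select all of the terminal output and paste it into your post.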

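New state:
The initial stage when a subtask is created. No other stage can transition into this state. If initialization completes without errors, the subtask moves to the Running state.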
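To make the description above concrete, here is a minimal Python sketch that scans a query-status result for subtasks still sitting in the New stage. It assumes the output has been saved as valid JSON with straight quotes. The helper name stuck_subtasks is hypothetical and not part of DM or dmctl. The field names (workers, subTaskStatus, stage) are the ones that appear in the output posted in this thread.

```python
import json


def stuck_subtasks(status: dict) -> list:
    """Return (worker, subtask name) pairs whose stage is still "New".

    Hypothetical helper, not part of DM; it only reads the fields
    visible in the query-status output shown in this thread.
    """
    stuck = []
    for worker in status.get("workers", []):
        # subTaskStatus may be missing or null when a worker has no subtask
        for sub in worker.get("subTaskStatus") or []:
            if sub.get("stage") == "New":
                stuck.append((worker.get("worker"), sub.get("name")))
    return stuck


# Trimmed copy of the query-status output from this thread.
sample = json.loads("""
{
  "result": true,
  "workers": [
    {
      "result": true,
      "worker": "10.0.0.207:8262",
      "subTaskStatus": [
        {"name": "test", "stage": "New", "unit": "InvalidUnit", "result": null}
      ],
      "relayStatus": {"stage": "Running", "relayCatchUpMaster": true}
    }
  ]
}
""")

print(stuck_subtasks(sample))  # [('10.0.0.207:8262', 'test')]
```

Here the relay unit reports Running while the subtask itself is still New, which matches the symptom reported in this thread.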

Normally this stage passes very quickly. Please upload the dm-worker.log covering the corresponding time period.

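  1. At a glance, the password looks wrong. Please generate it as described in the documentation and set it in the configuration.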

The New state just persists. It never changes to Running or Paused.

Also, the binlog information has already been pulled down, so it shouldn't be a password problem, should it?

I ran start-task once more. See the attachment for the dm-worker log.

» start-task ./conf/task.yaml

{
"result": true,
"msg": "",
"workers": [
{
"result": false,
"worker": "10.0.0.207:8262",
"msg": "[code=38033:class=dm-master:scope=internal:level=high] request to dm-worker 10.0.0.207:8262 is timeout, but request may be successful, please execute query-status to check status
github.com/pingcap/dm/pkg/terror.(*Error).Generate
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/terror/terror.go:232
github.com/pingcap/dm/dm/master.(*Server).waitOperationOk
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/master/server.go:1923
github.com/pingcap/dm/dm/master.(*Server).handleOperationResult
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/master/server.go:1946
github.com/pingcap/dm/dm/master.(*Server).StartTask.func1
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/master/server.go:276
github.com/pingcap/dm/dm/master.(*AgentPool).Emit
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/master/agent_pool.go:117
runtime.goexit
\t/usr/local/go/src/runtime/asm_amd64.s:1357"
}
]
}

dm-worker.log (15.8 KB)

I also tried stop-task, and it times out in the same way. Very frustrating!!

» stop-task test

{
"op": "Stop",
"result": true,
"msg": "",
"workers": [
{
"meta": {
"result": false,
"worker": "10.0.0.207:8262",
"msg": "[code=38033:class=dm-master:scope=internal:level=high] request to dm-worker 10.0.0.207:8262 is timeout, but request may be successful, please execute query-status to check status
github.com/pingcap/dm/pkg/terror.(*Error).Generate
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/terror/terror.go:232
github.com/pingcap/dm/dm/master.(*Server).waitOperationOk
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/master/server.go:1923
github.com/pingcap/dm/dm/master.(*Server).handleOperationResult
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/master/server.go:1946
github.com/pingcap/dm/dm/master.(*Server).OperateTask.func2
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/master/server.go:368
github.com/pingcap/dm/dm/master.(*AgentPool).Emit
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/master/agent_pool.go:117
runtime.goexit
\t/usr/local/go/src/runtime/asm_amd64.s:1357"
},
"op": "Stop",
"logID": "16"
}
]
}

Hello,

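  1. In the current configuration file, is-sharding: true can be removed unless you actually need multiple DM-worker instances running the same task to merge several upstream shards into a single downstream table.
  2. Set task-mode to all and configure test.tbl explicitly, without a regular expression for now, and see whether the task starts normally.
  3. Was DM deployed with Ansible? If so, check the mysql_password setting in the inventory file. At a glance, the current password does not look like it was generated by dmctl.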
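Suggestions 1 and 2 can be sketched as a small transformation on the task settings. This is only an illustration in Python: the flat dict layout below is a simplified stand-in and not the real structure of task.yaml (which was attached but is not shown in this thread), the function name simplify_for_debugging is hypothetical, and "tbl" is the placeholder table name from the reply. The one DM behaviour relied on is that a black-white-list name beginning with "~" is treated as a regular expression.

```python
import copy


def simplify_for_debugging(task: dict, table: str) -> dict:
    """Return an adjusted copy of `task`; the input is left untouched.

    Hypothetical helper operating on an assumed, simplified layout.
    """
    cfg = copy.deepcopy(task)
    cfg["is-sharding"] = False  # suggestion 1: no shard merging involved
    cfg["task-mode"] = "all"    # suggestion 2: full plus incremental
    for rule in cfg.get("do-tables", []):
        # A leading "~" marks a regular expression in DM's black-white-list;
        # swap it for one explicit table name while debugging.
        if rule["tbl-name"].startswith("~"):
            rule["tbl-name"] = table
    return cfg


# Values taken from the sub-task configuration logged earlier in this thread.
current = {
    "is-sharding": True,
    "task-mode": "incremental",
    "do-tables": [{"db-name": "test", "tbl-name": "~.+"}],
}

debug_cfg = simplify_for_debugging(current, table="tbl")
print(debug_cfg)
```

Working on a copy keeps the original settings available, so the regular expression can be restored once the task is known to start.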
