DM同步卡在load 0.00% (tidb3.0.15 dm1.0.5)

这是一个分享

tidb 3.0.15
dm 1.0.5

问题现象在dm-worker日志里会看到如下信息

[2020/06/29 13:18:23.862 +08:00] [INFO] [loader.go:122] ["closed !!!"] [task=你的任务名] [unit=load] ["worker ID"=58]
[2020/06/29 13:18:23.862 +08:00] [INFO] [loader.go:122] ["closed !!!"] [task=你的任务名] [unit=load] ["worker ID"=62]
[2020/06/29 13:18:23.862 +08:00] [INFO] [loader.go:122] ["closed !!!"] [task=你的任务名] [unit=load] ["worker ID"=4]
[2020/06/29 13:18:23.862 +08:00] [INFO] [loader.go:119] ["start to close..."] [task=你的任务名] [unit=load] ["worker ID"=27]
[2020/06/29 13:18:23.862 +08:00] [INFO] [loader.go:122] ["closed !!!"] [task=你的任务名] [unit=load] ["worker ID"=27]
[2020/06/29 13:18:23.862 +08:00] [INFO] [loader.go:639] ["stop importing data process"] [task=你的任务名] [unit=load]
[2020/06/29 13:18:28.822 +08:00] [INFO] [status.go:75] ["progress status of load"] [task=你的任务名] [unit=load] [finished_bytes=0] [total_bytes=5230641967] [total_file_co
unt=7441] [progress="0.00 %"]
[2020/06/29 13:18:33.822 +08:00] [INFO] [status.go:75] ["progress status of load"] [task=你的任务名] [unit=load] [finished_bytes=0] [total_bytes=5230641967] [total_file_co
unt=7441] [progress="0.00 %"]
[2020/06/29 13:18:38.666 +08:00] [INFO] [relay.go:568] ["flush meta finished"] [component="relay log"] [meta="master-uuid = 7e35e4a1-9119-11ea-a7e3-b8599f489558.000001, rela
y-binlog = (mysql-bin.000473, 367330236), relay-binlog-gtid = "]
[2020/06/29 13:18:38.822 +08:00] [INFO] [status.go:75] ["progress status of load"] [task=你的任务名] [unit=load] [finished_bytes=0] [total_bytes=5230641967] [total_file_co
unt=7441] [progress="0.00 %"]
[2020/06/29 13:18:43.822 +08:00] [INFO] [status.go:75] ["progress status of load"] [task=你的任务名] [unit=load] [finished_bytes=0] [total_bytes=5230641967] [total_file_co
unt=7441] [progress="0.00 %"]

在dm日志中几乎就一直这样卡,而且是在不断的刷出0.00 %。

而且在dmctl中查看任务状态,也在一直卡在0,并且也没有明显的错误

{
    "result": true,
    "worker": "***:8262",
    "msg": "",
    "subTaskStatus": [
        {
            "name": "任务名",
            "stage": "Running",
            "unit": "Load",
            "result": null,
            "unresolvedDDLLockID": "",
            "load": {
                "finishedBytes": "0",
                "totalBytes": "5245290674",
                "progress": "0.00 %",
                "metaBinlog": "(mysql-bin.000483, 139269287)"
            }
        }
    ],
    "relayStatus": {
        "masterBinlog": "(mysql-bin.000483, 161515134)",
        "masterBinlogGtid": "7e35e4a1-9119-11ea-a7e3-b8599f489558:1-410675134",
        "relaySubDir": "7e35e4a1-9119-11ea-a7e3-b8599f489558.000001",
        "relayBinlog": "(mysql-bin.000483, 161515134)",
        "relayBinlogGtid": "",
        "relayCatchUpMaster": true,
        "stage": "Running",
        "result": null
    },
    "sourceID": "***"
},

这个时候的思路是一直觉得数据量太大了,卡在没有导进去,或者是dm哪出问题了。就反复去找dm的日志。

最后在论坛里看到一个提示,需要去看看tidb层面的日志。

于是我就找到了如下错误

[2020/06/30 20:09:29.696 +08:00] [ERROR] [conn.go:728] ["dispatch error"] [conn=28568] [connInfo="id:28568, addr:ip:43164 status:11, collation:utf8_general_ci, user:sys_sync"] [command=Query] [status="inTxn:1, autocommit:1"
] [sql="USE `数据库名*`;"] [err="[schema:1049]Unknown database '数据库名*'\ngithub.com/pingcap/errors.AddStack\n\t/Users/cdp/iceblue/go/pkg/mod/github.com/pingcap/errors@v0.11.4/errors.go:174\ngithub.com/pingcap/parse
r/terror.(*Error).GenWithStackByArgs\n\t/Users/cdp/iceblue/go/pkg/mod/github.com/pingcap/parser@v3.0.13-0.20200507065234-2f9b5a8acf14+incompatible/terror/terror.go:238\ngithub.com/pingcap/tidb/executor.(*SimpleExec).executeUse\n\t/Us
ers/cdp/iceblue/go/src/pingcap/tidb/executor/simple.go:525\ngithub.com/pingcap/tidb/executor.(*SimpleExec).Next\n\t/Users/cdp/iceblue/go/src/pingcap/tidb/executor/simple.go:109\ngithub.com/pingcap/tidb/executor.Next\n\t/Users/cdp/ice
blue/go/src/pingcap/tidb/executor/executor.go:191\ngithub.com/pingcap/tidb/executor.(*ExecStmt).handleNoDelayExecutor\n\t/Users/cdp/iceblue/go/src/pingcap/tidb/executor/adapter.go:449\ngithub.com/pingcap/tidb/executor.(*ExecStmt).han
dlePessimisticDML\n\t/Users/cdp/iceblue/go/src/pingcap/tidb/executor/adapter.go:468\ngithub.com/pingcap/tidb/executor.(*ExecStmt).Exec\n\t/Users/cdp/iceblue/go/src/pingcap/tidb/executor/adapter.go:314\ngithub.com/pingcap/tidb/session
.runStmt\n\t/Users/cdp/iceblue/go/src/pingcap/tidb/session/tidb.go:264\ngithub.com/pingcap/tidb/session.(*session).executeStatement\n\t/Users/cdp/iceblue/go/src/pingcap/tidb/session/session.go:1031\ngithub.com/pingcap/tidb/session.(*
session).execute\n\t/Users/cdp/iceblue/go/src/pingcap/tidb/session/session.go:1145\ngithub.com/pingcap/tidb/session.(*session).Execute\n\t/Users/cdp/iceblue/go/src/pingcap/tidb/session/session.go:1069\ngithub.com/pingcap/tidb/server.
(*TiDBContext).Execute\n\t/Users/cdp/iceblue/go/src/pingcap/tidb/server/driver_tidb.go:246\ngithub.com/pingcap/tidb/server.(*clientConn).handleQuery\n\t/Users/cdp/iceblue/go/src/pingcap/tidb/server/conn.go:1256\ngithub.com/pingcap/ti
db/server.(*clientConn).dispatch\n\t/Users/cdp/iceblue/go/src/pingcap/tidb/server/conn.go:966\ngithub.com/pingcap/tidb/server.(*clientConn).Run\n\t/Users/cdp/iceblue/go/src/pingcap/tidb/server/conn.go:716\ngithub.com/pingcap/tidb/ser
ver.(*Server).onConn\n\t/Users/cdp/iceblue/go/src/pingcap/tidb/server/server.go:401\nruntime.goexit\n\t/Users/cdp/rink/go1.13.1/src/runtime/asm_amd64.s:1357"]

主要错误在那个Unknown database上。

原因是在配置文件里,配置了一个错误的库名,导致tidb在导入前没找到这个库名,然后就像错误那样一直在等待或者重复的去找。dm日志层面就是一直卡着load 0.00。

错误就像这样,目标库名后面多写了个*

...



  route-rule-6:
    schema-pattern: "库名*"
    table-pattern: "表名*"
    target-schema: "数据库名*"
    target-table:  "表名"


...

总结

先看tidb日志
再检查任务的配置文件,看看是不是库名表名写错了,写了一个未知或不存在的,导致一直卡着不动。

顺便吐槽,这个地方如果不去手动解决会一直卡在load,应该给一个超时时间的。如果load一直是0达到多少阈值就报个错给个提示,不要一直这么等下去。

1赞

棒棒棒。

是的,当前问题我们已经向研发同学反馈,此问题也在 dm v1.0.6 得到了解决,可以升级体验下哦~

PS:是否说明下版本和当前环境可能更加清晰哦~