Query-error无法获取同步异常的binlog位置,执行sql-skip没报错也没效果

为提高效率,提问时请提供以下信息,问题描述清晰可优先响应。

  • 【TiDB 版本】:3.0.1 , dm版本为v1.0.1
  • 【问题描述】:myql源库修改数据库字符集后,DM同步任务暂停,query-error 无法获取错误的binlog位置

» query-error stock_to_tidb { “result”: true, “msg”: “”, “workers”: [ { “result”: true, “worker”: “192.168.17.154:8375”, “msg”: “”, “subTaskError”: [ { “name”: “stock_to_tidb”, “stage”: “Paused”, “unit”: “Sync”, “sync”: { “errors”: [ ] } } ], “RelayError”: { “msg”: “” } } ] }

» query-status stock_to_tidb { “result”: true, “msg”: “”, “workers”: [ { “result”: true, “worker”: “192.168.17.154:8375”, “msg”: “”, “subTaskStatus”: [ { “name”: “stock_to_tidb”, “stage”: “Paused”, “unit”: “Sync”, “result”: { “isCanceled”: false, “errors”: [ { “Type”: “UnknownError”, “msg”: “[code=11007:class=functional:scope=internal:level=high] unknown type ddl \u0026{ddlNode:{stmtNode:{node:{text:alter DATABASE pt_goods_stock DEFAULT CHARACTER SET utf8mb4}}} Name:pt_goods_stock AlterDefaultDatabase:false Options:[0xc0001b8960]}\ngithub.com/pingcap/dm/pkg/terror.(*Error).Generate\n\t/home/jenkins/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/terror/terror.go:232\ngithub.com/pingcap/dm/pkg/parser.SplitDDL\n\t/home/jenkins/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/parser/common.go:290\ngithub.com/pingcap/dm/syncer.(*Syncer).resolveDDLSQL\n\t/home/jenkins/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/ddl.go:132\ngithub.com/pingcap/dm/syncer.(*Syncer).handleQueryEvent\n\t/home/jenkins/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/syncer.go:1574\ngithub.com/pingcap/dm/syncer.(*Syncer).Run\n\t/home/jenkins/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/syncer.go:1275\ngithub.com/pingcap/dm/syncer.(*Syncer).Process\n\t/home/jenkins/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/syncer.go:578\ngithub.com/pingcap/dm/syncer.(*Syncer).Resume\n\t/home/jenkins/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/syncer.go:2250\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1337” } ], “detail”: null }, “unresolvedDDLLockID”: “”, “sync”: { “totalEvents”: “1549942848”, “totalTps”: “726”, “recentTps”: “1089”, “masterBinlog”: “(mysql-bin.001157, 407632106)”, “masterBinlogGtid”: “195cdb0f-ec45-11e9-b89b-b8599f3794dc:1-14982,19ba5019-ec45-11e9-bff8-506b4bdda558:1-774994366”, “syncerBinlog”: “(mysql-bin|000001.001105, 130989085)”, “syncerBinlogGtid”: “”, “blockingDDLs”: [ ], “unresolvedGroups”: [ ], “synced”: false } } ], “relayStatus”: { “masterBinlog”: “(mysql-bin.001157, 407632106)”, “masterBinlogGtid”: “195cdb0f-ec45-11e9-b89b-b8599f3794dc:1-14982,19ba5019-ec45-11e9-bff8-506b4bdda558:1-774994366”, “relaySubDir”: “19ba5019-ec45-11e9-bff8-506b4bdda558.000001”, “relayBinlog”: “(mysql-bin.001157, 407632106)”, “relayBinlogGtid”: “195cdb0f-ec45-11e9-b89b-b8599f3794dc:1-14982,19ba5019-ec45-11e9-bff8-506b4bdda558:1-774994366”, “relayCatchUpMaster”: true, “stage”: “Running”, “result”: null }, “sourceID”: “mysql-stock-rds” } ] }

» sql-skip --worker=192.168.17.154:8375 --binlog-pos=mysql-bin|000001.001105:130989085 stock_to_tidb { “result”: true, “msg”: “”, “workers”: [ { “result”: true, “worker”: “”, “msg”: “” } ] }

» resume-task --worker=192.168.17.154:8375 stock_to_tidb { “op”: “Resume”, “result”: true, “msg”: “”, “workers”: [ { “meta”: { “result”: true, “worker”: “192.168.17.154:8375”, “msg”: “” }, “op”: “Resume”, “logID”: “1053” } ] }

» query-error stock_to_tidb { “result”: true, “msg”: “”, “workers”: [ { “result”: true, “worker”: “192.168.17.154:8375”, “msg”: “”, “subTaskError”: [ { “name”: “stock_to_tidb”, “stage”: “Paused”, “unit”: “Sync”, “sync”: { “errors”: [ ] } } ], “RelayError”: { “msg”: “” } } ] } »

执行sql-skip和resume-task后,任务没有任何状态的改变,和没执行一摸一样

求大佬们指导

上游 binlog 文件名是 mysql-bin|000001.001105 吗?

跳过操作可以参考一下这个:

在执行完 resume-task 操作之后可以在 worker 的日志中查看一下是否跳过成功: **

**

谢谢回答

1: 上游的binlog文件名是: mysql-bin.001105 , mysql-bin.001106 这种格式

在tidb里,对应的应该是 mysql-bin|000001.001105 这种格式了

2: » resume-task --worker=192.168.17.154:8375 stock_to_tidb

{

"op": "Resume",
"result": true,
"msg": "",
"workers": [
    {
        "meta": {
            "result": true,
            "worker": "192.168.17.154:8375",
            "msg": ""
        },
        "op": "Resume",
        "logID": "1091"
    }
]

}

看上去没出现什么异常,但任务依然是paused状态

» query-error stock_to_tidb

{ “result”: true,

“msg”: “”,

"workers": [
    {
        "result": true,
        "worker": "192.168.17.154:8375",
        "msg": "",
        "subTaskError": [
            {
                "name": "stock_to_tidb",
                "stage": "Paused",
                "unit": "Sync",
                "sync": {
                    "errors": [
                    ]
                }
            }
        ],
        "RelayError": {
            "msg": ""
        }
    }
]

}

3: dm-worker.log 里的日志,看上去没有skip成功

[2019/11/29 14:23:07.700 +08:00] [ERROR] [subtask.go:253] [“unit process error”] [subtask=stock_to_tidb] [unit=Sync] [“error information”="{“msg”:"[code=11007:class=functional:scope=internal:level=high] unknown type ddl \u0026{ddlNode:{stmtNode:{node:{text:alter DATABASE pt_goods_stock DEFAULT CHARACTER SET utf8mb4}}} Name:pt_goods_stock AlterDefaultDatabase:false Options:[0xc046c3e560]}\ngithub.com/pingcap/dm/pkg/terror.(*Error).Generate\n\t/home/jenkins/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/terror/terror.go:232\ngithub.com/pingcap/dm/pkg/parser.SplitDDL\n\t/home/jenkins/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/parser/common.go:290\ngithub.com/pingcap/dm/syncer.(*Syncer).resolveDDLSQL\n\t/home/jenkins/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/ddl.go:132\ngithub.com/pingcap/dm/syncer.(*Syncer).handleQueryEvent\n\t/home/jenkins/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/syncer.go:1574\ngithub.com/pingcap/dm/syncer.(*Syncer).Run\n\t/home/jenkins/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/syncer.go:1275\ngithub.com/pingcap/dm/syncer.(*Syncer).Process\n\t/home/jenkins/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/syncer.go:578\ngithub.com/pingcap/dm/syncer.(*Syncer).Resume\n\t/home/jenkins/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/syncer.go:2250\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1337"}"]

4: sql-pattern 与 binlog-pos 2种方法进行skip操作,效果是一样的

sql-skip --worker=192.168.17.154:8375 --binlog-pos=mysql-bin|000001.001105:130989085 stock_to_tidb resume-task --worker=192.168.17.154:8375 stock_to_tidb

sql-skip --worker=192.168.17.154:8375 --sql-pattern=~(?i)alter\s+\s+DATABASE\s+pt_goods_stock\s+DEFAULT\s+CHARACTER\s+SET\s+utf8mb4 stock_to_tidb resume-task --worker=192.168.17.154:8375 stock_to_tidb

我这个写法有什么问题吗 ? 求指点

在 task 配置文件中配置 sql-pattern 试下

filters:
  filter-partition-rule:
    schema-pattern: "*"
    sql-pattern: ["ALTER\\s+DATABASE\\s+pt_goods_stock\\s+DEFAULT\\s+CHARACTER\\s+SET\\s+utf8mb4"]
    action: Ignore

以上方式试下,按照官网步骤操作即可。

按照说明修改yaml配置文件,执行以下操作

» update-task /home/tidb/dm-ansible-v1.0.1/conf/goodsstock_to_tidb.yaml

» resume-task stock_to_tidb

查看dm-worker.log 里的日志,看上去没有skip成功,报上面同样的错误

query-error 和 query-status 和之前状态一样,没有发生改变

[2019/11/29 15:28:22.703 +08:00] [ERROR] [subtask.go:253] [“unit process error”] [subtask=stock_to_tidb] [unit=Sync] [“error information”="{“msg”:"[code=11007:class=functional:scope=internal:level=high] unknown type ddl \u0026{ddlNode:{stmtNode:{node:{text:alter DATABASE pt_goods_stock DEFAULT CHARACTER SET utf8mb4}}} Name:pt_goods_stock AlterDefaultDatabase:false Options:[0xc005ed45c0]}\ngithub.com/pingcap/dm/pkg/terror.(*Error).Generate\n\t/home/jenkins/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/terror/terror.go:232\ngithub.com/pingcap/dm/pkg/parser.SplitDDL\n\t/home/jenkins/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/parser/common.go:290\ngithub.com/pingcap/dm/syncer.(*Syncer).resolveDDLSQL\n\t/home/jenkins/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/ddl.go:132\ngithub.com/pingcap/dm/syncer.(*Syncer).handleQueryEvent\n\t/home/jenkins/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/syncer.go:1574\ngithub.com/pingcap/dm/syncer.(*Syncer).Run\n\t/home/jenkins/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/syncer.go:1275\ngithub.com/pingcap/dm/syncer.(*Syncer).Process\n\t/home/jenkins/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/syncer.go:578\ngithub.com/pingcap/dm/syncer.(*Syncer).Resume\n\t/home/jenkins/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/syncer.go:2250\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1337"}"]

谢谢支持

task 配置文件辛苦发一下吧。

tidb (1.1 KB)

请见附件 谢谢

哦,你这边是新添加的 filter 过滤规则,需要在 上面 mysql-instances 引用一下,类似于 black-white-list。修改下再试试呢 ?

谢谢大佬耐心解答 , 引用了filter-rules, 貌似还是没生效 ,麻烦看下附件有什么问题

filters:
  filter-procedure-rule:
    schema-pattern: "pt_goods_stock"
    sql-pattern: ["^alter\\s+DATABASE"]
    action: Ignore

这样试下呢,如果还是不行,辛苦提供下 dm-worker 的日志,日志时间可以从当前报错到修改完成后的时间段。

tidb.log (17.0 KB)

dm-worker 日志见附件,感谢大佬

好的,我们看下。

您好,看到这边 filter 的名字有变化(前面 task 中配置到的 filter-partition-rule,后面 task 中配置的 filter-procedure-rule),再确认下 mysql-instances 下面引用的 filter 的名字是否和 filter 的名字一致?另外 filter 下面加上 table-pattern * 试试看是否可以。

filters:
  filter-procedure-rule:
    schema-pattern: "pt_goods_stock"
    table-pattern: "*"
    sql-pattern: ["^alter\\s+DATABASE"]
    action: Ignore

注意这两个 filter 的名字,我直接从官网贴的配置,两次选的位置可能不一样。不要有误导/

这个有注意到 ,目前还是存在问题,请看附件,谢谢

tidb.log (20.6 KB)

你执行 stop-task,然后再 start-task 试下,filter 改成以下方式(主要是更改了 schema-pattern 这块)。

filter-alter-rule:
    schema-pattern: "*"
    sql-pattern: ["^alter\\s+DATABASE"]
    action: Ignore

stop-task然后执行start-task ,所有已同步的表是不是都要删除,重新从0开始重新同步一次呢

谢谢

该过滤规则只是匹配所有库的 sql alter database语句过滤,不用重删除表同步

stop-task然后执行start-task ,现象是数据数据重新dump,然后load。再dump之前所有数据需要删除,不然load的时候会报错。见附件截图 ,谢谢

重新导入数据的原因是因为 remove-meta 在配置文件中设置了 true,关于这个参数,详细可以看下这个帖子。因位点信息被清理掉了,只能删除下游数据重新同步了。

好的 ,谢谢 实践中