数据表量过多,dm迁移至TiDB时仅能同步10万张表

【 TiDB 使用环境】Poc
【遇到的问题:使用DM工具,由MySQL同步数据至TiDB时,上游库数据表数量过多(约30万张表),tidb速度越来越低,最终写入速度归零(同步十万张表的时候基本同步不动了)

task 配置和 log 发下。

30w 张表是分库分表么?

3000个DB下有约30w张表;

task配置:
name: wps
task-mode: all

target-database:
host: “10.0.0.0”
port: 000000
user: “Ksc_TiDB”
password: “sdfsafas” # 如果密码不为空,则推荐使用经过 dmctl 加密的密文

mydumpers: # dump 处理单元的运行配置参数
global: # 配置名称
threads: 4 # dump 处理单元从上游数据库实例导出数据的线程数量,默认值为 4
chunk-filesize: 64 # dump 处理单元生成的数据文件大小,默认值为 64,单位为 MB
extra-args: “–consistency none” # dump 处理单元的其他参数,不需要在 extra-args 中配置 table-list,DM 会自动生成

loaders: # load 处理单元的运行配置参数
global:
pool-size: 16
import-mode: “sql”
on-duplicate: “replace”
dir: /data/wps/data

syncers: # sync 处理单元的运行配置参数
global: # 配置名称
worker-count: 16 # 应用已传输到本地的 binlog 的并发线程数量,默认值为 16。调整此参数不会影响上游拉取日志的并发,但会对下游产生显著压力。
batch: 100 # sync 迁移到下游数据库的一个事务批次 SQL 语句数,默认值为 100,建议一般不超过 500。
enable-ansi-quotes: true # 若 session 中设置 sql-mode: "ANSI_QUOTES",则需开启此项

# 设置为 true,则将来自上游的 `INSERT` 改写为 `REPLACE`,将 `UPDATE` 改写为 `DELETE` 与 `REPLACE`,保证在表结构中存在主键或唯一索引的条件下迁移数据时可以重复导入 DML。
safe-mode: true
# 设置为 true,DM 会在不增加延迟的情况下,尽可能地将上游对同一条数据的多次操作压缩成一次操作。
# 如 INSERT INTO tb(a,b) VALUES(1,1); UPDATE tb SET b=11 WHERE a=1; 会被压缩成 INSERT INTO tb(a,b) VALUES(1,11); 其中 a 为主键
# 如 UPDATE tb SET b=1 WHERE a=1; UPDATE tb(a,b) SET b=2 WHERE a=1; 会被压缩成 UPDATE tb(a,b) SET b=2 WHERE a=1; 其中 a 为主键
# 如 DELETE FROM tb WHERE a=1; INSERT INTO tb(a,b) VALUES(1,1); 会被压缩成 REPLACE INTO tb(a,b) VALUES(1,1); 其中 a 为主键
compact: false
# 设置为 true,DM 会尽可能地将多条同类型的语句合并到一条语句中,生成一条带多行数据的 SQL 语句。
# 如 INSERT INTO tb(a,b) VALUES(1,1); INSERT INTO tb(a,b) VALUES(2,2); 会变成 INSERT INTO tb(a,b) VALUES(1,1),(2,2);
# 如 UPDATE tb SET b=11 WHERE a=1; UPDATE tb(a,b) set b=22 WHERE a=2; 会变成 INSERT INTO tb(a,b) VALUES(1,11),(2,22) ON DUPLICATE KEY UPDATE a=VALUES(a), b=VALUES(b); 其中 a 为主键
# 如 DELETE FROM tb WHERE a=1; DELETE FROM tb WHERE a=2 会变成 DELETE FROM tb WHERE (a) IN (1),(2);其中 a 为主键
multiple-rows: false

mysql-instances:

  • source-id: “source01”
    block-allow-list: “ba-rule1”
    loader-thread: 64
    loader-config-name: “global”
    syncer-config-name: “global”
    mydumper-config-name: “global”

block-allow-list:
ba-rule1:
do-dbs: [“cwbaseshow*”]

目前用 dmctl query-statu 查询任务状态,是什么

这么来看的话你的任务卡在了全量导入阶段,可以传一下 DM worker 的日志看看卡在哪里了吗

传一下 DM worker 的日志,看看日志里面有没有什么信息

大概是在2023/02/03 22:23:47.002 +08:00,progress就一直是0%了
[2023/02/03 22:23:43.479 +08:00] [INFO] [collector.go:255] [“backup success summary”] [task=wps] [unit=dump] [total-ranges=567000] [ranges-succeed=567000] [ranges-failed=0] [total-take=2h51m2.991960503s] [total-kv-size=1.426GB] [average-speed=138.9kB/s] [total-rows=9368995]
[2023/02/03 22:23:43.479 +08:00] [INFO] [dumpling.go:202] [“dump data finished”] [task=wps] [unit=dump] [“cost time”=2h52m8.82914405s]
[2023/02/03 22:23:43.479 +08:00] [INFO] [dumpling.go:289] [“progress status of dumpling”] [task=wps] [unit=dump] [total_tables=282000] [finished_tables=282000] [estimated_total_rows=9095077] [finished_rows=9368995] [estimated_progress=100.00%] [“new progress”=“100.00 %”] [bps=88084]
[2023/02/03 22:23:43.479 +08:00] [INFO] [subtask.go:348] [“unit process returned”] [subtask=wps] [unit=Dump] [stage=Finished] [status=“{"totalTables":282000,"completedTables":282000,"finishedBytes":1425501087,"finishedRows":9368995,"estimateTotalRows":9095077,"bps":88084,"progress":"100.00 %"}”]
[2023/02/03 22:23:43.479 +08:00] [INFO] [subtask.go:359] [“switching to next unit”] [subtask=wps] [unit=Dump]
[2023/02/03 22:23:43.479 +08:00] [INFO] [subtask.go:248] [“start to run”] [subtask=wps] [unit=Load]
[2023/02/03 22:23:43.479 +08:00] [INFO] [lightning.go:383] [“lightning load start”] [task=wps] [unit=lightning-load]
[2023/02/03 22:23:43.480 +08:00] [INFO] [util.go:200] [“put load worker in etcd”] [task=wps] [source=source01] [worker=worker01]
[2023/02/03 22:23:43.552 +08:00] [INFO] [checkpoint.go:575] [“initial checkpoint record”] [task=wps] [unit=lightning-load] [component=“lightning checkpoint database list”] [task=wps] [source=source01]
[2023/02/03 22:23:43.648 +08:00] [INFO] [checkpoint.go:595] [“update lightning loader status”] [task=wps] [unit=lightning-load] [component=“lightning checkpoint database list”] [task=wps] [source=source01] [status=running]
[2023/02/03 22:23:43.663 +08:00] [INFO] [info.go:49] [“Welcome to TiDB-Lightning”] [release-version=v6.0.0-master] [git-hash=None] [git-branch=None] [go-version=go1.19.3] [utc-build-time=None] [race-enabled=false]
[2023/02/03 22:23:43.663 +08:00] [INFO] [lightning.go:383] [cfg] [task=wps] [unit=lightning-load] [cfg=“{"id":1675434223663581706,"lightning":{"table-concurrency":64,"index-concurrency":64,"region-concurrency":64,"io-concurrency":5,"check-requirements":true,"meta-schema-name":"","max-error":{"type":0},"task-info-schema-name":"lightning_task_info"},"tidb":{"host":"10.0.5.6","port":31378,"user":"Ksc_TiDB","status-port":10080,"pd-addr":"","sql-mode":"NO_ENGINE_SUBSTITUTION,IGNORE_SPACE,NO_AUTO_VALUE_ON_ZERO,ALLOW_INVALID_DATES","tls":"false","security":{"ca-path":"","cert-path":"","key-path":"","redact-info-log":false},"max-allowed-packet":67108864,"distsql-scan-concurrency":15,"build-stats-concurrency":20,"index-serial-scan-concurrency":20,"checksum-table-concurrency":2,"vars":{"foreign_key_checks":"0","tidb_txn_mode":"optimistic","time_zone":"+08:00"}},"checkpoint":{"schema":"tidb_lightning_checkpoint","driver":"file","enable":true,"keep-after-success":"origin"},"mydumper":{"read-block-size":65536,"batch-size":0,"batch-import-ratio":0,"source-id":"source01","data-source-dir":"file:///data/wps/data.wps","character-set":"auto","csv":{"separator":",","delimiter":"\"","terminator":"","null":"\\N","header":true,"trim-last-separator":false,"not-null":false,"backslash-escape":true},"max-region-size":268435456,"filter":[".","!mysql.","!sys.","!INFORMATION_SCHEMA.","!PERFORMANCE_SCHEMA.","!METRICS_SCHEMA.","!INSPECTION_SCHEMA."],"files":null,"no-schema":false,"case-sensitive":false,"strict-format":false,"default-file-rules":true,"ignore-data-columns":null,"data-character-set":"binary","data-invalid-char-replace":"�"},"tikv-importer":{"addr":"","backend":"tidb","on-duplicate":"replace","max-kv-pairs":4096,"send-kv-pairs":32768,"region-split-size":0,"region-split-keys":0,"sorted-kv-dir":"","disk-quota":9223372036854775807,"range-concurrency":0,"duplicate-resolution":"none","incremental-import":false,"engine-mem-cache-size":536870912,"local-writer-mem-cache-size":134217728,"store-write-bwlimit":0},"post-restore":{"checksum":"off","analyze":"off","level-1-compact":false,"post-process-at-last":true,"compact":false},"cron":{"switch-mode":"5m0s","log-progress":"5m0s","check-disk-quota":"1m0s"},"routes":[],"security":{"ca-path":"","cert-path":"","key-path":"","redact-info-log":false},"black-white-list":{"do-tables":null,"do-dbs":null,"ignore-tables":null,"ignore-dbs":null}}”]
[2023/02/03 22:23:44.124 +08:00] [INFO] [lightning.go:495] [“load data source start”] [task=wps] [unit=lightning-load]
[2023/02/03 22:23:47.002 +08:00] [INFO] [lightning.go:494] [“progress status of lightning”] [task=wps] [unit=lightning-load] [finished_bytes=0] [total_bytes=0] [progress=“0.00 %”] [“current speed (bytes / seconds)”=0]
[2023/02/03 22:23:47.202 +08:00] [INFO] [relay.go:639] [“ignore event by transformer”] [component=“relay log”] [header=“{"Timestamp":0,"EventType":27,"ServerID":259202690,"EventSize":36,"LogPos":368,"Flags":0}”] [reason=“heartbeat event”]
[2023/02/03 22:23:49.422 +08:00] [INFO] [loader.go:450] [“[loader] file is filtered by file router”] [task=wps] [unit=lightning-load] [path=metadata]
[2023/02/03 22:23:50.055 +08:00] [INFO] [lightning.go:498] [“load data source completed”] [task=wps] [unit=lightning-load] [takeTime=5.930994944s] []
[2023/02/03 22:23:50.055 +08:00] [INFO] [checkpoints.go:1014] [“open checkpoint file failed, going to create a new one”] [task=wps] [unit=lightning-load] [path=/data/wps/data.wps/tidb_lightning_checkpoint.pb] []
[2023/02/03 22:23:50.100 +08:00] [INFO] [restore.go:461] [“the whole procedure start”] [task=wps] [unit=lightning-load]
[2023/02/03 22:23:50.100 +08:00] [INFO] [restore.go:769] [“restore all schema start”] [task=wps] [unit=lightning-load]
[2023/02/03 22:24:17.000 +08:00] [INFO] [lightning.go:494] [“progress status of lightning”] [task=wps] [unit=lightning-load] [finished_bytes=0] [total_bytes=0] [progress=“0.00 %”] [“current speed (bytes / seconds)”=0]
[2023/02/03 22:24:17.202 +08:00] [INFO] [relay.go:639] [“ignore event by transformer”] [component=“relay log”] [header=“{"Timestamp":0,"EventType":27,"ServerID":259202690,"EventSize":36,"LogPos":368,"Flags":0}”] [reason=“heartbeat event”]
[2023/02/03 22:24:47.001 +08:00] [INFO] [lightning.go:494] [“progress status of lightning”] [task=wps] [unit=lightning-load] [finished_bytes=0] [total_bytes=0] [progress=“0.00 %”] [“current speed (bytes / seconds)”=0]
[2023/02/03 22:24:47.201 +08:00] [INFO] [relay.go:639] [“ignore event by transformer”] [component=“relay log”] [header=“{"Timestamp":0,"EventType":27,"ServerID":259202690,"EventSize":36,"LogPos":368,"Flags":0}”] [reason=“heartbeat event”]
[2023/02/03 22:25:17.000 +08:00] [INFO] [lightning.go:494] [“progress status of lightning”] [task=wps] [unit=lightning-load] [finished_bytes=0] [total_bytes=0] [progress=“0.00 %”] [“current speed (bytes / seconds)”=0]
[2023/02/03 22:25:17.205 +08:00] [INFO] [relay.go:639] [“ignore event by transformer”] [component=“relay log”] [header=“{"Timestamp":0,"EventType":27,"ServerID":259202690,"EventSize":36,"LogPos":368,"Flags":0}”] [reason=“heartbeat event”]
[2023/02/03 22:25:47.002 +08:00] [INFO] [lightning.go:494] [“progress status of lightning”] [task=wps] [unit=lightning-load] [finished_bytes=0] [total_bytes=0] [progress=“0.00 %”] [“current speed (bytes / seconds)”=0]
[2023/02/03 22:25:47.205 +08:00] [INFO] [relay.go:639] [“ignore event by transformer”] [component=“relay log”] [header=“{"Timestamp":0,"EventType":27,"ServerID":259202690,"EventSize":36,"LogPos":368,"Flags":0}”] [reason=“heartbeat event”]
[2023/02/03 22:26:17.006 +08:00] [INFO] [lightning.go:494] [“progress status of lightning”] [task=wps] [unit=lightning-load] [finished_bytes=0] [total_bytes=0] [progress=“0.00 %”] [“current speed (bytes / seconds)”=0]
[2023/02/03 22:26:17.204 +08:00] [INFO] [relay.go:639] [“ignore event by transformer”] [component=“relay log”] [header=“{"Timestamp":0,"EventType":27,"ServerID":259202690,"EventSize":36,"LogPos":368,"Flags":0}”] [reason=“heartbeat event”]
[2023/02/03 22:26:47.000 +08:00] [INFO] [lightning.go:494] [“progress status of lightning”] [task=wps] [unit=lightning-load] [finished_bytes=0] [total_bytes=0] [progress=“0.00 %”] [“current speed (bytes / seconds)”=0]

这里看不出什么,麻烦你贴一下tidb的日志,在tidb层应该会有一些报错

tidb层无报错,目前建立dm任务数,能够突破到14W张表了,并且表数量还在上涨;dm在同步任务方面是否存在类似的限制?

你说的太笼统了,目前对上下游版本有限制,语法,字符集有限制
最大同步任务数量 600
每个 Task 处理的表数量 无限制

按道理tidb同步过程中是没有限制的,之前有同步过最多70w的table,开了多个dm任务去同步,是不是dm资源太少导致?