Problems importing with TiDB Lightning in local (physical) mode

[TiDB environment] Production
[TiDB version] v5.2.3
[Deployment method] Deployed on machines
[Machine specs] 32C/128GB
[Cluster nodes] 3 PD / 1 TiDB / 8 TiKV
[Problem: symptoms and impact]

I have been trying to import a large table in local mode, so I have run into quite a few problems. During a local (physical-mode) import, reading the source files is very fast at first, but the subsequent write phase appears to stall.

[Copy-paste of the ERROR logs]

1. Progress log entries during the import

# cat lightning.log.2026-01-31T23.48.28+0800 |grep "progres" |more
[2026/02/01 00:00:28.105 +08:00] [INFO] [restore.go:1151] [progress] [total=1.1%] [tables="0/1 (0.0%)"] [chunks="110/8225 (1.3%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=96.16987539478136] [state=writing] [remaining=7h41m39s]
[2026/02/01 00:05:28.103 +08:00] [INFO] [restore.go:1151] [progress] [total=2.1%] [tables="0/1 (0.0%)"] [chunks="220/8225 (2.7%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=92.26964019480674] [state=writing] [remaining=7h37m0s]
[2026/02/01 00:06:14.333 +08:00] [INFO] [lightning.go:594] ["progress paused"]
[2026/02/01 00:10:28.101 +08:00] [INFO] [restore.go:1151] [progress] [total=2.3%] [tables="0/1 (0.0%)"] [chunks="239/8225 (2.9%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=65.95648294334976] [state=writing] [remaining=10h29m58s]
[2026/02/01 00:13:10.754 +08:00] [INFO] [lightning.go:610] ["progress resumed"]
[2026/02/01 00:15:28.110 +08:00] [INFO] [restore.go:1151] [progress] [total=2.8%] [tables="0/1 (0.0%)"] [chunks="293/8225 (3.6%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=60.552017707636786] [state=writing] [remaining=11h21m33s]
[2026/02/01 00:20:28.102 +08:00] [INFO] [restore.go:1151] [progress] [total=3.9%] [tables="0/1 (0.0%)"] [chunks="400/8225 (4.9%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=66.00804757706014] [state=writing] [remaining=10h17m24s]
[2026/02/01 00:25:28.105 +08:00] [INFO] [restore.go:1151] [progress] [total=5.0%] [tables="0/1 (0.0%)"] [chunks="516/8225 (6.3%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=70.4196673834599] [state=writing] [remaining=9h27m37s]
[2026/02/01 00:30:28.104 +08:00] [INFO] [restore.go:1151] [progress] [total=6.1%] [tables="0/1 (0.0%)"] [chunks="629/8225 (7.6%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=73.57986317440354] [state=writing] [remaining=8h56m59s]
[2026/02/01 00:35:28.107 +08:00] [INFO] [restore.go:1151] [progress] [total=7.2%] [tables="0/1 (0.0%)"] [chunks="738/8225 (9.0%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=75.31403576088131] [state=writing] [remaining=8h37m9s]
[2026/02/01 00:40:28.102 +08:00] [INFO] [restore.go:1151] [progress] [total=8.2%] [tables="0/1 (0.0%)"] [chunks="847/8225 (10.3%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=76.858247272891] [state=writing] [remaining=8h21m9s]
[2026/02/01 00:45:28.104 +08:00] [INFO] [restore.go:1151] [progress] [total=9.3%] [tables="0/1 (0.0%)"] [chunks="960/8225 (11.7%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=78.4307891813146] [state=writing] [remaining=8h5m25s]
[2026/02/01 00:50:28.119 +08:00] [INFO] [restore.go:1151] [progress] [total=10.4%] [tables="0/1 (0.0%)"] [chunks="1070/8225 (13.0%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=79.46483855953782] [state=writing] [remaining=7h53m25s]
[2026/02/01 00:55:28.109 +08:00] [INFO] [restore.go:1151] [progress] [total=11.4%] [tables="0/1 (0.0%)"] [chunks="1175/8225 (14.3%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=80.28180609655676] [state=writing] [remaining=7h44m57s]
[2026/02/01 01:00:28.106 +08:00] [INFO] [restore.go:1151] [progress] [total=12.6%] [tables="0/1 (0.0%)"] [chunks="1292/8225 (15.7%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=81.18661597402352] [state=writing] [remaining=7h32m12s]
[2026/02/01 01:05:28.110 +08:00] [INFO] [restore.go:1151] [progress] [total=13.6%] [tables="0/1 (0.0%)"] [chunks="1403/8225 (17.1%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=82.01286412550553] [state=writing] [remaining=7h22m55s]
[2026/02/01 01:10:28.109 +08:00] [INFO] [restore.go:1151] [progress] [total=14.7%] [tables="0/1 (0.0%)"] [chunks="1509/8225 (18.3%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=82.37054126860961] [state=writing] [remaining=7h15m57s]
[2026/02/01 01:15:28.103 +08:00] [INFO] [restore.go:1151] [progress] [total=15.8%] [tables="0/1 (0.0%)"] [chunks="1620/8225 (19.7%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=82.89390091121828] [state=writing] [remaining=7h7m41s]
[2026/02/01 01:20:28.140 +08:00] [INFO] [restore.go:1151] [progress] [total=16.8%] [tables="0/1 (0.0%)"] [chunks="1731/8225 (21.0%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=83.40874664334672] [state=writing] [remaining=6h59m49s]
[2026/02/01 01:25:28.125 +08:00] [INFO] [restore.go:1151] [progress] [total=17.9%] [tables="0/1 (0.0%)"] [chunks="1842/8225 (22.4%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=83.7247627644406] [state=writing] [remaining=6h52m19s]
[2026/02/01 01:30:28.116 +08:00] [INFO] [restore.go:1151] [progress] [total=19.0%] [tables="0/1 (0.0%)"] [chunks="1950/8225 (23.7%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=84.03836018891411] [state=writing] [remaining=6h45m51s]
[2026/02/01 01:35:28.118 +08:00] [INFO] [restore.go:1151] [progress] [total=20.1%] [tables="0/1 (0.0%)"] [chunks="2063/8225 (25.1%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=84.40828231331868] [state=writing] [remaining=6h38m20s]
[2026/02/01 01:40:28.103 +08:00] [INFO] [restore.go:1151] [progress] [total=21.1%] [tables="0/1 (0.0%)"] [chunks="2173/8225 (26.4%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=84.74497729495086] [state=writing] [remaining=6h31m46s]
......
[2026/02/01 05:25:28.112 +08:00] [INFO] [restore.go:1151] [progress] [total=68.2%] [tables="0/1 (0.0%)"] [chunks="6993/8225 (85.0%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=88.74007101049644] [state=writing] [remaining=2h34m5s]
[2026/02/01 05:30:28.128 +08:00] [INFO] [restore.go:1151] [progress] [total=69.2%] [tables="0/1 (0.0%)"] [chunks="7098/8225 (86.3%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=88.78762054286342] [state=writing] [remaining=2h29m8s]
[2026/02/01 05:35:28.109 +08:00] [INFO] [restore.go:1151] [progress] [total=70.2%] [tables="0/1 (0.0%)"] [chunks="7204/8225 (87.6%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=88.81164543378871] [state=writing] [remaining=2h24m7s]
[2026/02/01 05:40:28.113 +08:00] [INFO] [restore.go:1151] [progress] [total=71.3%] [tables="0/1 (0.0%)"] [chunks="7315/8225 (88.9%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=88.88832377199822] [state=writing] [remaining=2h18m47s]
[2026/02/01 05:45:28.115 +08:00] [INFO] [restore.go:1151] [progress] [total=72.2%] [tables="0/1 (0.0%)"] [chunks="7409/8225 (90.1%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=88.77728189705823] [state=writing] [remaining=2h14m33s]
[2026/02/01 05:50:28.109 +08:00] [INFO] [restore.go:1151] [progress] [total=73.4%] [tables="0/1 (0.0%)"] [chunks="7529/8225 (91.5%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=88.8998556103901] [state=writing] [remaining=2h8m38s]
[2026/02/01 05:55:28.104 +08:00] [INFO] [restore.go:1151] [progress] [total=74.5%] [tables="0/1 (0.0%)"] [chunks="7638/8225 (92.9%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=88.9770229544155] [state=writing] [remaining=2h3m26s]
[2026/02/01 06:00:28.121 +08:00] [INFO] [restore.go:1151] [progress] [total=75.7%] [tables="0/1 (0.0%)"] [chunks="7761/8225 (94.4%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=89.19281984434798] [state=writing] [remaining=1h57m23s]
[2026/02/01 06:05:28.119 +08:00] [INFO] [restore.go:1151] [progress] [total=76.8%] [tables="0/1 (0.0%)"] [chunks="7872/8225 (95.7%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=89.25861796867525] [state=writing] [remaining=1h52m5s]
[2026/02/01 06:10:28.103 +08:00] [INFO] [restore.go:1151] [progress] [total=77.9%] [tables="0/1 (0.0%)"] [chunks="7985/8225 (97.1%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=89.38096600895241] [state=writing] [remaining=1h46m40s]
[2026/02/01 06:15:28.117 +08:00] [INFO] [restore.go:1151] [progress] [total=79.0%] [tables="0/1 (0.0%)"] [chunks="8099/8225 (98.5%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=89.49232331720603] [state=writing] [remaining=1h41m13s]
[2026/02/01 06:20:28.102 +08:00] [INFO] [restore.go:1151] [progress] [total=80.0%] [tables="0/1 (0.0%)"] [chunks="8210/8225 (99.8%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=89.54933247980293] [state=writing] [remaining=1h35m57s]
It started slowing down from here:
[2026/02/01 06:25:28.101 +08:00] [INFO] [restore.go:1151] [progress] [total=80.2%] [tables="0/1 (0.0%)"] [chunks="8225/8225 (100.0%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=88.51329746848972] [state=importing] [remaining=1h36m17s]
[2026/02/01 06:30:28.101 +08:00] [INFO] [restore.go:1151] [progress] [total=80.2%] [tables="0/1 (0.0%)"] [chunks="8225/8225 (100.0%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=87.39285568513115] [state=importing] [remaining=1h37m30s]
[2026/02/01 06:35:28.102 +08:00] [INFO] [restore.go:1151] [progress] [total=80.2%] [tables="0/1 (0.0%)"] [chunks="8225/8225 (100.0%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=86.30042352946468] [state=importing] [remaining=1h38m42s]
[2026/02/01 06:40:28.102 +08:00] [INFO] [restore.go:1151] [progress] [total=80.2%] [tables="0/1 (0.0%)"] [chunks="8225/8225 (100.0%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=85.23496552355982] [state=importing] [remaining=1h39m56s]
[2026/02/01 06:45:28.101 +08:00] [INFO] [restore.go:1151] [progress] [total=80.2%] [tables="0/1 (0.0%)"] [chunks="8225/8225 (100.0%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=84.19549929410525] [state=importing] [remaining=1h41m8s]
[2026/02/01 06:50:28.101 +08:00] [INFO] [restore.go:1151] [progress] [total=80.2%] [tables="0/1 (0.0%)"] [chunks="8225/8225 (100.0%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=83.18107765549748] [state=importing] [remaining=1h42m21s]
[2026/02/01 06:55:28.101 +08:00] [INFO] [restore.go:1151] [progress] [total=80.2%] [tables="0/1 (0.0%)"] [chunks="8225/8225 (100.0%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=82.19080909130246] [state=importing] [remaining=1h43m33s]
[2026/02/01 07:00:28.102 +08:00] [INFO] [restore.go:1151] [progress] [total=80.2%] [tables="0/1 (0.0%)"] [chunks="8225/8225 (100.0%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=81.22383975411647] [state=importing] [remaining=1h44m46s]
[2026/02/01 07:05:28.102 +08:00] [INFO] [restore.go:1151] [progress] [total=80.2%] [tables="0/1 (0.0%)"] [chunks="8225/8225 (100.0%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=80.27935959277879] [state=importing] [remaining=1h45m58s]
[2026/02/01 07:10:28.101 +08:00] [INFO] [restore.go:1151] [progress] [total=80.2%] [tables="0/1 (0.0%)"] [chunks="8225/8225 (100.0%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=79.35659545572123] [state=importing] [remaining=1h47m11s]
[2026/02/01 07:15:28.101 +08:00] [INFO] [restore.go:1151] [progress] [total=80.2%] [tables="0/1 (0.0%)"] [chunks="8225/8225 (100.0%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=78.45480149119756] [state=importing] [remaining=1h48m23s]
[2026/02/01 07:20:28.102 +08:00] [INFO] [restore.go:1151] [progress] [total=80.2%] [tables="0/1 (0.0%)"] [chunks="8225/8225 (100.0%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=77.57327062438948] [state=importing] [remaining=1h49m36s]
[2026/02/01 07:25:28.102 +08:00] [INFO] [restore.go:1151] [progress] [total=80.2%] [tables="0/1 (0.0%)"] [chunks="8225/8225 (100.0%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=76.711331345497] [state=importing] [remaining=1h50m48s]
[2026/02/01 07:30:28.102 +08:00] [INFO] [restore.go:1151] [progress] [total=80.2%] [tables="0/1 (0.0%)"] [chunks="8225/8225 (100.0%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=75.86833663985037] [state=importing] [remaining=1h52m0s]
......
[2026/02/02 09:40:28.101 +08:00] [INFO] [restore.go:1151] [progress] [total=81.4%] [tables="0/1 (0.0%)"] [chunks="8225/8225 (100.0%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=17.046746547859986] [state=importing] [remaining=7h41m43s]
[2026/02/02 09:45:28.101 +08:00] [INFO] [restore.go:1151] [progress] [total=81.4%] [tables="0/1 (0.0%)"] [chunks="8225/8225 (100.0%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=17.004759361830427] [state=importing] [remaining=7h42m44s]
[2026/02/02 09:50:28.101 +08:00] [INFO] [restore.go:1151] [progress] [total=81.4%] [tables="0/1 (0.0%)"] [chunks="8225/8225 (100.0%)"] [engines="0/22 (0.0%)"] [speed(MiB/s)=16.962978455040822] [state=importing] [remaining=7h43m45s]

2. I then checked the log for anomalies:

# cat lightning.log.2026-01-31T23.48.28+0800 |grep "WARN" |grep -v "too slow" |more
[2026/02/01 03:50:28.602 +08:00] [WARN] [localhelper.go:228] ["split regions"] [error="batch split regions failed: split region failed: err=message:\"EpochNotMatch [region 3882656] 4190035 epoch changed conf_ver: 1973 version: 49935 
!= conf_ver: 1973 version: 48855, retry later\" epoch_not_match:<current_regions:<id:3882656 start_key:\"t\\200\\000\\000\\000\\000\\000\\003\\377\\216_r\\200\\000\\000\\021&\\377\\233C\\007\\000\\000\\000\\000\\000\\372\" region_epo
ch:<conf_ver:1973 version:49935 > peers:<id:4190035 store_id:2 > peers:<id:4190179 store_id:6 > peers:<id:4191292 store_id:7 > > > : [BR:Restore:ErrRestoreSplitFailed]fail to split region"] ["retry time"=2] [region_id=3882656]
[2026/02/01 04:06:25.816 +08:00] [WARN] [local.go:1827] ["write to tikv failed"] [error="rpc error: code = Unavailable desc = keepalive watchdog timeout"] [retry=0]
[2026/02/01 04:07:57.901 +08:00] [WARN] [local.go:1827] ["write to tikv failed"] [error=EOF] [retry=0]
[2026/02/01 04:15:29.984 +08:00] [WARN] [local.go:1827] ["write to tikv failed"] [error=EOF] [retry=0]
[2026/02/01 05:26:35.653 +08:00] [WARN] [local.go:1827] ["write to tikv failed"] [error=EOF] [retry=0]
[2026/02/01 06:37:46.816 +08:00] [WARN] [local.go:1827] ["write to tikv failed"] [error="rpc error: code = Unavailable desc = keepalive watchdog timeout"] [retry=0]
......
[2026/02/02 08:01:49.509 +08:00] [WARN] [local.go:1827] ["write to tikv failed"] [error="rpc error: code = Unavailable desc = keepalive watchdog timeout"] [retry=0]

3. The task configuration:

[lightning]
server-mode = true
status-addr = ':8289'
level = "info"
file = "tidb-lightning.log"

[tikv-importer]
backend = "local"
sorted-kv-dir = "/backup/tidb/sorted-kv-dir"

[mydumper]
data-source-dir = "/backup/tidb/ote_userquestion"


[tidb]
host = "172.29.*.*"
port = 4000
user = "root"
password = "******"
status-port = 10080
pd-addr = "172.29.*.*:2379"

Has anyone run into a similar problem? Is this a concurrency conflict, or is there some timeout setting involved?


The target TiDB cluster is in a healthy state.


Try changing the lightning and tikv-importer sections of the config file as follows:
[lightning]
server-mode = true
status-addr = ':8289'
level = "warn" # large-scale imports generate a lot of logs, which hurts I/O
file = "tidb-lightning.log"

[tikv-importer]
backend = "local"
sorted-kv-dir = "/backup/tidb/sorted-kv-dir"
region-concurrency = 16 # number of Regions written to concurrently; set to 1-2x the CPU core count
batch-size = 1000 # batch commit size; adjust to the data size


1. Adding the following under [tikv-importer]:
batch-size = 1000
makes the task fail on submission with an invalid/unsupported parameter error.

2. Adding the following under [tikv-importer]:
region-concurrency = 16
produces region-split conflict errors at runtime (I actually saw the same error with my original concurrency of 4):

[2026/02/02 11:38:46.707 +08:00] [ERROR] [split_client.go:285] ["fail to split region"] [region="{ID=3882656,startKey=7480000000000003FF8E5F728000001126FF9B43070000000000FA,endKey=epoch=\"conf_ver:1973 version:49935 \",peers=\"id:419
0035 store_id:2 ,id:4190179 store_id:6 ,id:4191292 store_id:7 \"}"] [regionErr="message:\"EpochNotMatch [region 3882656] 4190035 epoch changed conf_ver: 1973 version: 51015 != conf_ver: 1973 version: 49935, retry later\" epoch_not_ma
tch:<current_regions:<id:3882656 start_key:\"t\\200\\000\\000\\000\\000\\000\\003\\377\\216_r\\200\\000\\000\\0227\\377^\\312\\247\\000\\000\\000\\000\\000\\372\" region_epoch:<conf_ver:1973 version:51015 > peers:<id:4190035 store_id
:2 > peers:<id:4190179 store_id:6 > peers:<id:4191292 store_id:7 > > > "] [stack="github.com/pingcap/tidb/br/pkg/restore.(*pdClient).sendSplitRegionRequest\n\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/git
hub.com/pingcap/br/br/pkg/restore/split_client.go:285\ngithub.com/pingcap/tidb/br/pkg/restore.(*pdClient).BatchSplitRegionsWithOrigin\n\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/br/br/
pkg/restore/split_client.go:334\ngithub.com/pingcap/tidb/br/pkg/lightning/backend/local.(*local).BatchSplitRegions\n\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/br/br/pkg/lightning/backe
nd/local/localhelper.go:360\ngithub.com/pingcap/tidb/br/pkg/lightning/backend/local.(*local).SplitAndScatterRegionByRanges.func2\n\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/br/br/pkg/l
ightning/backend/local/localhelper.go:214\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/nfs/cache/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57"]
[2026/02/02 11:38:46.711 +08:00] [WARN] [localhelper.go:228] ["split regions"] [error="batch split regions failed: split region failed: err=message:\"EpochNotMatch [region 3882656] 4190035 epoch changed conf_ver: 1973 version: 51015 
!= conf_ver: 1973 version: 49935, retry later\" epoch_not_match:<current_regions:<id:3882656 start_key:\"t\\200\\000\\000\\000\\000\\000\\003\\377\\216_r\\200\\000\\000\\0227\\377^\\312\\247\\000\\000\\000\\000\\000\\372\" region_epo
ch:<conf_ver:1973 version:51015 > peers:<id:4190035 store_id:2 > peers:<id:4190179 store_id:6 > peers:<id:4191292 store_id:7 > > > : [BR:Restore:ErrRestoreSplitFailed]fail to split region"] ["retry time"=0] [region_id=3882656]
[2026/02/02 11:38:46.711 +08:00] [INFO] [localhelper.go:87] ["split and scatter region"] [minKey=7480000000000003FF8E5F728000001237FFB0B3EE0000000000FA] [maxKey=7480000000000003FF8E5F7280000012BFFF96C6D30000000000FA] [retry=1]
[2026/02/02 11:38:47.792 +08:00] [INFO] [localhelper.go:106] ["paginate scan regions"] [count=1] [start=7480000000000003FF8E5F728000001237FFB0B3EE0000000000FA] [end=7480000000000003FF8E5F7280000012BFFF96C6D30000000000FA]
[2026/02/02 11:38:47.792 +08:00] [INFO] [localhelper.go:114] ["paginate scan region finished"] [minKey=7480000000000003FF8E5F728000001237FFB0B3EE0000000000FA] [maxKey=7480000000000003FF8E5F7280000012BFFF96C6D30000000000FA] [regions=1
]

So I will first try range-concurrency = 1. The current config is:

[lightning]
server-mode = true
status-addr = ':8289'
level = "warn"
file = "tidb-lightning.log"

[tikv-importer]
backend = "local"
sorted-kv-dir = "/backup/tidb/sorted-kv-dir"
range-concurrency = 1

[mydumper]
data-source-dir = "/backup/tidb/ote_userquestion"

[tidb]
host = "172.29.*.*"
port = 4000
user = "root"
password = "******"
status-port = 10080
pd-addr = "172.29.*.*:2379"

The task now runs without any ERRORs, but it needs watching: the progress has stayed at 0.

[2026/02/02 11:50:53.360 +08:00] [INFO] [restore.go:1151] [progress] [total=0.0%] [tables="0/1 (0.0%)"] [chunks="0/0 (0.0%)"] [engines="0/22 (0.0%)"] [] [state=preparing] []
[2026/02/02 11:50:53.430 +08:00] [INFO] [pd.go:433] ["pause scheduler(configs)"] [name="[balance-leader-scheduler,balance-region-scheduler,balance-hot-region-scheduler]"] [cfg="{\"enable-location-replacement\":\"false\",\"leader-sche
dule-limit\":32,\"max-merge-region-keys\":0,\"max-merge-region-size\":0,\"max-pending-peer-count\":2147483647,\"max-snapshot-count\":24,\"region-schedule-limit\":40}"]

Did adding the parameters help? Please share once you have it solved.

[2026/02/02 13:29:13.421 +08:00] [INFO] [pd.go:433] ["pause scheduler(configs)"] [name="[balance-leader-scheduler,balance-region-scheduler,balance-hot-region-scheduler]"] [cfg="{\"enable-location-replacement\":\"false\",\"leader-sche
dule-limit\":32,\"max-merge-region-keys\":0,\"max-merge-region-size\":0,\"max-pending-peer-count\":2147483647,\"max-snapshot-count\":24,\"region-schedule-limit\":40}"]
[2026/02/02 13:29:21.404 +08:00] [WARN] [pd.go:150] ["get timestamp too slow"] ["cost time"=32.067451ms]
[2026/02/02 13:29:29.405 +08:00] [WARN] [pd.go:150] ["get timestamp too slow"] ["cost time"=32.871973ms]
[2026/02/02 13:30:05.416 +08:00] [WARN] [pd.go:150] ["get timestamp too slow"] ["cost time"=43.94711ms]
[2026/02/02 13:30:17.405 +08:00] [WARN] [pd.go:150] ["get timestamp too slow"] ["cost time"=33.058087ms]
[2026/02/02 13:30:27.408 +08:00] [WARN] [pd.go:150] ["get timestamp too slow"] ["cost time"=35.853806ms]
[2026/02/02 13:30:33.407 +08:00] [WARN] [pd.go:150] ["get timestamp too slow"] ["cost time"=34.993617ms]
[2026/02/02 13:30:53.360 +08:00] [INFO] [restore.go:1151] [progress] [total=0.0%] [tables="0/1 (0.0%)"] [chunks="0/0 (0.0%)"] [engines="0/22 (0.0%)"] [] [state=preparing] []
[2026/02/02 13:30:53.439 +08:00] [INFO] [pd.go:433] ["pause scheduler(configs)"] [name="[balance-leader-scheduler,balance-region-scheduler,balance-hot-region-scheduler]"] [cfg="{\"enable-location-replacement\":\"false\",\"leader-sche
dule-limit\":32,\"max-merge-region-keys\":0,\"max-merge-region-size\":0,\"max-pending-peer-count\":2147483647,\"max-snapshot-count\":24,\"region-schedule-limit\":40}"]
[2026/02/02 13:30:55.408 +08:00] [WARN] [pd.go:150] ["get timestamp too slow"] ["cost time"=37.007602ms]
[2026/02/02 13:31:09.416 +08:00] [WARN] [pd.go:150] ["get timestamp too slow"] ["cost time"=44.181371ms]
[2026/02/02 13:31:17.406 +08:00] [WARN] [pd.go:150] ["get timestamp too slow"] ["cost time"=34.614674ms]
[2026/02/02 13:32:03.408 +08:00] [WARN] [pd.go:150] ["get timestamp too slow"] ["cost time"=36.682119ms]
[2026/02/02 13:32:17.408 +08:00] [WARN] [pd.go:150] ["get timestamp too slow"] ["cost time"=35.657676ms]
[2026/02/02 13:32:21.414 +08:00] [WARN] [pd.go:150] ["get timestamp too slow"] ["cost time"=41.303917ms]
[2026/02/02 13:32:33.434 +08:00] [INFO] [pd.go:433] ["pause scheduler(configs)"] [name="[balance-leader-scheduler,balance-region-scheduler,balance-hot-region-scheduler]"] [cfg="{\"enable-location-replacement\":\"false\",\"leader-sche
dule-limit\":32,\"max-merge-region-keys\":0,\"max-merge-region-size\":0,\"max-pending-peer-count\":2147483647,\"max-snapshot-count\":24,\"region-schedule-limit\":40}"]
[2026/02/02 13:33:03.403 +08:00] [WARN] [pd.go:150] ["get timestamp too slow"] ["cost time"=31.669588ms]
[2026/02/02 13:33:29.406 +08:00] [WARN] [pd.go:150] ["get timestamp too slow"] ["cost time"=33.770911ms]
[2026/02/02 13:33:33.405 +08:00] [WARN] [pd.go:150] ["get timestamp too slow"] ["cost time"=33.307084ms]
[2026/02/02 13:33:43.402 +08:00] [WARN] [pd.go:150] ["get timestamp too slow"] ["cost time"=31.042287ms]
[2026/02/02 13:34:05.404 +08:00] [WARN] [pd.go:150] ["get timestamp too slow"] ["cost time"=32.754026ms]
[2026/02/02 13:34:13.406 +08:00] [WARN] [pd.go:150] ["get timestamp too slow"] ["cost time"=34.026606ms]
[2026/02/02 13:34:13.447 +08:00] [INFO] [pd.go:433] ["pause scheduler(configs)"] [name="[balance-leader-scheduler,balance-region-scheduler,balance-hot-region-scheduler]"] [cfg="{\"enable-location-replacement\":\"false\",\"leader-sche
dule-limit\":32,\"max-merge-region-keys\":0,\"max-merge-region-size\":0,\"max-pending-peer-count\":2147483647,\"max-snapshot-count\":24,\"region-schedule-limit\":40}"]
[2026/02/02 13:34:15.411 +08:00] [WARN] [pd.go:150] ["get timestamp too slow"] ["cost time"=38.925229ms]
[2026/02/02 13:34:23.408 +08:00] [WARN] [pd.go:150] ["get timestamp too slow"] ["cost time"=36.013457ms]
[2026/02/02 13:34:27.410 +08:00] [WARN] [pd.go:150] ["get timestamp too slow"] ["cost time"=38.760121ms]
[2026/02/02 13:34:39.408 +08:00] [WARN] [pd.go:150] ["get timestamp too slow"] ["cost time"=36.10114ms]
[2026/02/02 13:34:41.411 +08:00] [WARN] [pd.go:150] ["get timestamp too slow"] ["cost time"=38.772392ms]
[2026/02/02 13:34:45.409 +08:00] [WARN] [pd.go:150] ["get timestamp too slow"] ["cost time"=37.363659ms]
[2026/02/02 13:34:53.408 +08:00] [WARN] [pd.go:150] ["get timestamp too slow"] ["cost time"=36.139677ms]

It has been stuck in state=preparing the whole time, and the progress is still 0.


Could this be related to the export files? They were exported with dumpling as SQL files filtered by SQL conditions, with one subdirectory per condition, for example:

Root directory:
/backup/tidb/ote_userquestion/
Subdirectories:
/backup/tidb/ote_userquestion/010017b6-d3db-45de-b5f2-724b242999af
/backup/tidb/ote_userquestion/015a4c2a-2a7f-4580-a07b-063b328fb9d4
/backup/tidb/ote_userquestion/018096a2-ac42-46a2-bd32-a92ff1283c71
/backup/tidb/ote_userquestion/01dfe997-c99d-43f3-85f9-a46035696ac9

The subdirectories then contain SQL files with identical names:

[tidb@bigdata-prod-tidb-ansible01 ote_userquestion]$ ll 01* |more
010017b6-d3db-45de-b5f2-724b242999af:
total 10055324
-rw-r--r-- 1 tidb tidb       146 Jan 28 01:04 metadata
-rw-r--r-- 1 tidb tidb 268435521 Jan 27 19:36 ote_userquestion.000000000.sql
-rw-r--r-- 1 tidb tidb 268435717 Jan 27 19:37 ote_userquestion.000000001.sql
-rw-r--r-- 1 tidb tidb 268435606 Jan 27 19:38 ote_userquestion.000000002.sql

015a4c2a-2a7f-4580-a07b-063b328fb9d4:
total 300468
-rw-r--r-- 1 tidb tidb       146 Jan 27 12:59 metadata
-rw-r--r-- 1 tidb tidb 268435687 Jan 27 12:59 ote_userquestion.000000000.sql
-rw-r--r-- 1 tidb tidb  39225667 Jan 27 12:59 ote_userquestion.000000001.sql

018096a2-ac42-46a2-bd32-a92ff1283c71:
total 4281580
-rw-r--r-- 1 tidb tidb       146 Jan 27 19:34 metadata
-rw-r--r-- 1 tidb tidb 268435962 Jan 27 19:23 ote_userquestion.000000000.sql
-rw-r--r-- 1 tidb tidb 268435858 Jan 27 19:24 ote_userquestion.000000001.sql
-rw-r--r-- 1 tidb tidb 268436074 Jan 27 19:26 ote_userquestion.000000002.sql
-rw-r--r-- 1 tidb tidb 268435577 Jan 27 19:26 ote_userquestion.000000003.sql
-rw-r--r-- 1 tidb tidb 268435974 Jan 27 19:28 ote_userquestion.000000004.sql

01dfe997-c99d-43f3-85f9-a46035696ac9:
total 415268
-rw-r--r-- 1 tidb tidb       146 Jan 27 14:04 metadata
-rw-r--r-- 1 tidb tidb 268436292 Jan 27 14:03 ote_userquestion.000000000.sql
-rw-r--r-- 1 tidb tidb 156779163 Jan 27 14:04 ote_userquestion.000000001.sql

Would the identical file names affect the Lightning import?


That is a region split failure error.

The split errors may be caused by my export containing multiple subdirectories with identically named files. When I import just a single subdirectory, the split errors do not occur, but the import is extremely slow:
4.1 GB of data takes 43 min
13 GB of data takes 2 h 25 min

  • First, locate the bottleneck via the logs and the Dashboard (TiKV compaction / disk / configuration issues);
  • Prioritize tuning the Lightning concurrency parameters and the TiKV compaction settings; this is the key to resolving the write stall;
  • For large tables, import in batches and load data before building indexes, to reduce the write pressure on TiKV;
  • If problems persist, consider upgrading to v5.2.7 for bug fixes, or temporarily adjust the cluster's replica count.
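
The concurrency tuning suggested above could start from a sketch like the following (the section placement follows the v5.x Lightning configuration layout; the specific values are illustrative assumptions to be adjusted to the actual hardware):

```toml
[lightning]
# Overall worker pools; lowering these reduces pressure on TiKV.
region-concurrency = 16  # defaults to the number of CPU cores
index-concurrency = 2
table-concurrency = 6

[tikv-importer]
backend = "local"
sorted-kv-dir = "/backup/tidb/sorted-kv-dir"
# Number of key ranges written and ingested in parallel during the import
# phase; lowering it reduces Region split/scatter conflicts.
range-concurrency = 8
```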

This directory layout prevents TiDB Lightning from correctly telling the data files apart.

Lightning cannot tell that these are "different shards of the data"; it treats them as the same data set. You can lower the concurrency to reduce the chance of Region conflicts.
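
If restructuring or renaming the export is not an option, Lightning's custom file routing might let it treat same-named files in different subdirectories as distinct chunks. A minimal sketch, assuming the v5.x `[[mydumper.files]]` routing syntax; `my_schema` and `my_table` are placeholders for the real target schema and table:

```toml
[mydumper]
data-source-dir = "/backup/tidb/ote_userquestion"

# Route every `ote_userquestion.<N>.sql` under any subdirectory to the target
# table, using the subdirectory name ($1) plus the numeric suffix ($2) as a
# unique sort key, so identically named files no longer collide.
[[mydumper.files]]
pattern = '(?i)^(?:[^/]*/)*([^/]+)/ote_userquestion\.([0-9]+)\.sql$'
schema = 'my_schema'   # placeholder: replace with the real schema name
table = 'my_table'     # placeholder: replace with the real table name
type = 'sql'
key = '$1_$2'
```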