通过lightning导入CSV大文件(800G)到tidb,数据非常慢,请问啥原因...

导入方式为: Importer-backend
tidb版本和部署节点:v4.0.0,3个kv,1个tidb
lightning和Importer部署在单独的机器,CSV文件没有切分
//----------------------------------------------------------------------------
lightning配置文件如下:
[lightning]
index-concurrency = 16
table-concurrency = 32
io-concurrency = 32
[tikv-importer]
addr = “10.204.132.30:8287”
[mydumper]
data-source-dir = “/data3/tidb-lightning/”
no-schema = true
[mydumper.csv]
separator = ‘|’
delimiter = ‘’
header = false
not-null = false
null = ‘\N’
backslash-escape = false
trim-last-separator = false
[tidb]
host = “10.204.132.31”
port = 4000
user = “root”
password = “”
status-port = 10080
pd-addr = “10.204.132.31:2379”
[post-restore]
analyze = false
[checkpoint]
enable = true
driver = “file”
dsn = “/data3/tidb_lightning_checkpoint.pb”
//----------------------------------------------------------------------------
Importer配置文件如下:
[server]
addr = “10.204.132.30:8287”
grpc-concurrency = 40
[metric]
job = “tikv-importer”
interval = “15s”
address = “”
[rocksdb]
max-backgroud-jobs = 80
[import]
import-dir = “/data4/tidb-lightning/”
num-threads = 60
num-import-jobs = 48
//------------------------------------------------------------------------------------
以下为lightning的日志文件:
[2020/08/26 16:45:59.576 +08:00] [INFO] [restore.go:503] [progress] [files=“0/1 (0.0%)”] [tables=“0/4 (0.0%)”] [speed(MiB/s)=1.4964601098558015] [state=writing] []
[2020/08/26 16:50:59.578 +08:00] [INFO] [restore.go:503] [progress] [files=“0/1 (0.0%)”] [tables=“0/4 (0.0%)”] [speed(MiB/s)=1.5042661400857038] [state=writing] []
[2020/08/26 16:55:59.575 +08:00] [INFO] [restore.go:503] [progress] [files=“0/1 (0.0%)”] [tables=“0/4 (0.0%)”] [speed(MiB/s)=1.505839553994275] [state=writing] []
[2020/08/26 17:00:59.575 +08:00] [INFO] [restore.go:503] [progress] [files=“0/1 (0.0%)”] [tables=“0/4 (0.0%)”] [speed(MiB/s)=1.5044362875713686] [state=writing] []
[2020/08/26 17:05:59.576 +08:00] [INFO] [restore.go:503] [progress] [files=“0/1 (0.0%)”] [tables=“0/4 (0.0%)”] [speed(MiB/s)=1.5036977085971008] [state=writing] []
[2020/08/26 17:10:59.580 +08:00] [INFO] [restore.go:503] [progress] [files=“0/1 (0.0%)”] [tables=“0/4 (0.0%)”] [speed(MiB/s)=1.5036721959182329] [state=writing] []
[2020/08/26 17:15:59.576 +08:00] [INFO] [restore.go:503] [progress] [files=“0/1 (0.0%)”] [tables=“0/4 (0.0%)”] [speed(MiB/s)=1.505215708223738] [state=writing] []
[2020/08/26 17:20:59.581 +08:00] [INFO] [restore.go:503] [progress] [files=“0/1 (0.0%)”] [tables=“0/4 (0.0%)”] [speed(MiB/s)=1.5068626460937251] [state=writing] []
[2020/08/26 17:25:59.577 +08:00] [INFO] [restore.go:503] [progress] [files=“0/1 (0.0%)”] [tables=“0/4 (0.0%)”] [speed(MiB/s)=1.5058983703285909] [state=writing] []
[2020/08/26 17:30:59.576 +08:00] [INFO] [restore.go:503] [progress] [files=“0/1 (0.0%)”] [tables=“0/4 (0.0%)”] [speed(MiB/s)=1.5053531529648816] [state=writing] []
[2020/08/26 17:35:59.575 +08:00] [INFO] [restore.go:503] [progress] [files=“0/1 (0.0%)”] [tables=“0/4 (0.0%)”] [speed(MiB/s)=1.504923652230361] [state=writing] []
[2020/08/26 17:40:59.575 +08:00] [INFO] [restore.go:503] [progress] [files=“0/1 (0.0%)”] [tables=“0/4 (0.0%)”] [speed(MiB/s)=1.5060802788389893] [state=writing] []
[2020/08/26 17:45:59.575 +08:00] [INFO] [restore.go:503] [progress] [files=“0/1 (0.0%)”] [tables=“0/4 (0.0%)”] [speed(MiB/s)=1.5068116746153726] [state=writing] []
[2020/08/26 17:50:59.577 +08:00] [INFO] [restore.go:503] [progress] [files=“0/1 (0.0%)”] [tables=“0/4 (0.0%)”] [speed(MiB/s)=1.5074971618249378] [state=writing] []
[2020/08/26 17:55:59.576 +08:00] [INFO] [restore.go:503] [progress] [files=“0/1 (0.0%)”] [tables=“0/4 (0.0%)”] [speed(MiB/s)=1.5081152173204107] [state=writing] []
[2020/08/26 18:00:59.577 +08:00] [INFO] [restore.go:503] [progress] [files=“0/1 (0.0%)”] [tables=“0/4 (0.0%)”] [speed(MiB/s)=1.5085849203305408] [state=writing] []

1 请问 csv 文件有几个呢?
2. 看您是 4.0.0 ,可以尝试下 Local-backend
https://docs.pingcap.com/zh/tidb/dev/tidb-lightning-backends

你好,csv只有一个文件800G,好像lightning的4.0.3才支持local模式导入,目前换成Local模式导入了lightning为v4.0.4,tidb为4.0.0,也还是一样慢,以下为我的配置文件:
tidb-lightning.toml (1.4 KB)

//日志如下------------------------------------------------------------------------------------
[2020/08/26 19:02:10.450 +08:00] [WARN] [config.go:358] [“currently only per-task configuration can be applied, global configuration changes can only be made on startup”] [“global config changes”=“[lightning.level,lightning.file]”]
[2020/08/26 19:02:10.450 +08:00] [INFO] [version.go:48] [“Welcome to lightning”] [“Release Version”=v4.0.4] [“Git Commit Hash”=26a0f7195bf463950d400857db49586cb9b471f0] [“Git Branch”=heads/refs/tags/v4.0.4] [“UTC Build Time”=“2020-07-31 08:04:53”] [“Go Version”=“go version go1.13 linux/amd64”]
[2020/08/26 19:02:10.450 +08:00] [INFO] [lightning.go:177] [cfg] [cfg=“{"id":1598439730450707595,"lightning":{"table-concurrency":32,"index-concurrency":16,"region-concurrency":40,"io-concurrency":32,"check-requirements":true},"tidb":{"host":"10.204.132.31","port":4000,"user":"root","status-port":10080,"pd-addr":"10.204.132.31:2379","sql-mode":"ONLY_FULL_GROUP_BY,NO_AUTO_CREATE_USER","tls":"false","security":{"ca-path":"","cert-path":"","key-path":""},"max-allowed-packet":67108864,"distsql-scan-concurrency":100,"build-stats-concurrency":20,"index-serial-scan-concurrency":20,"checksum-table-concurrency":16},"checkpoint":{"enable":true,"schema":"tidb_lightning_checkpoint","driver":"file","keep-after-success":false},"mydumper":{"read-block-size":65536,"batch-size":10737418240,"batch-import-ratio":0,"data-source-dir":"/data3/tidb-lightning/","no-schema":true,"character-set":"auto","csv":{"separator":"|","delimiter":"","header":false,"trim-last-separator":false,"not-null":false,"null":"\\N","backslash-escape":false},"case-sensitive":false,"strict-format":false,"max-region-size":268435456,"filter":["."]},"tikv-importer":{"addr":"","backend":"local","on-duplicate":"replace","max-kv-pairs":32,"send-kv-pairs":32768,"region-split-size":100663296,"sorted-kv-dir":"/data4/tidb-lightning","range-concurrency":16},"post-restore":{"level-1-compact":false,"compact":false,"checksum":true,"analyze":false},"cron":{"switch-mode":"5m0s","log-progress":"5m0s"},"routes":null,"security":{"ca-path":"","cert-path":"","key-path":""},"black-white-list":{"do-tables":null,"do-dbs":null,"ignore-tables":null,"ignore-dbs":null}}”]
[2020/08/26 19:02:10.451 +08:00] [INFO] [lightning.go:208] [“load data source start”]
[2020/08/26 19:02:10.451 +08:00] [INFO] [lightning.go:211] [“load data source completed”] [takeTime=186.736µs] []
[2020/08/26 19:02:10.467 +08:00] [INFO] [restore.go:257] [“the whole procedure start”]
[2020/08/26 19:02:10.493 +08:00] [INFO] [restore.go:580] [“restore all tables data start”]
[2020/08/26 19:02:10.497 +08:00] [INFO] [restore.go:601] [“restore table start”] [table=test.lineorder]
[2020/08/26 19:02:10.497 +08:00] [INFO] [restore.go:1320] [“load engines and files start”] [table=test.lineorder]
[2020/08/26 19:02:10.497 +08:00] [WARN] [region.go:193] [“file is too big to be processed efficiently; we suggest splitting it at 256 MB each”] [file=/data3/tidb-lightning/test.lineorder.csv] [size=722442779162]
[2020/08/26 19:02:10.498 +08:00] [INFO] [restore.go:1349] [“load engines and files completed”] [table=test.lineorder] [enginesCnt=2] [filesCnt=1] [takeTime=118.492µs] []
[2020/08/26 19:02:10.501 +08:00] [INFO] [backend.go:227] [“open engine”] [engineTag=test.lineorder:-1] [engineUUID=24fdd721-8e88-5b56-aa29-8d3cc96f645d]
[2020/08/26 19:02:10.501 +08:00] [INFO] [restore.go:811] [“import whole table start”] [table=test.lineorder]
[2020/08/26 19:02:10.501 +08:00] [INFO] [restore.go:841] [“restore engine start”] [table=test.lineorder] [engineNumber=0]
[2020/08/26 19:02:10.501 +08:00] [INFO] [restore.go:927] [“encode kv data and write start”] [table=test.lineorder] [engineNumber=0]
[2020/08/26 19:02:10.503 +08:00] [INFO] [backend.go:227] [“open engine”] [engineTag=test.lineorder:0] [engineUUID=d09d8f3c-a5c2-599e-92ad-b40fb487efc0]
[2020/08/26 19:02:10.503 +08:00] [INFO] [restore.go:1807] [“restore file start”] [table=test.lineorder] [engineNumber=0] [fileIndex=0] [path=/data3/tidb-lightning/test.lineorder.csv:0]
[2020/08/26 19:07:10.497 +08:00] [INFO] [restore.go:508] [progress] [files=“0/1 (0.0%)”] [tables=“0/4 (0.0%)”] [speed(MiB/s)=1.444701777689281] [state=writing] []
[2020/08/26 19:12:10.497 +08:00] [INFO] [restore.go:508] [progress] [files=“0/1 (0.0%)”] [tables=“0/4 (0.0%)”] [speed(MiB/s)=1.4487522372207469] [state=writing] []
[2020/08/26 19:17:10.501 +08:00] [INFO] [restore.go:508] [progress] [files=“0/1 (0.0%)”] [tables=“0/4 (0.0%)”] [speed(MiB/s)=1.452970132854969] [state=writing] []
[2020/08/26 19:22:10.497 +08:00] [INFO] [restore.go:508] [progress] [files=“0/1 (0.0%)”] [tables=“0/4 (0.0%)”] [speed(MiB/s)=1.453530625854114] [state=writing] []
[2020/08/26 19:27:10.502 +08:00] [INFO] [restore.go:508] [progress] [files=“0/1 (0.0%)”] [tables=“0/4 (0.0%)”] [speed(MiB/s)=1.4531946241913452] [state=writing] []
[2020/08/26 19:32:10.497 +08:00] [INFO] [restore.go:508] [progress] [files=“0/1 (0.0%)”] [tables=“0/4 (0.0%)”] [speed(MiB/s)=1.4536319618823836] [state=writing] []
[2020/08/26 19:37:10.498 +08:00] [INFO] [restore.go:508] [progress] [files=“0/1 (0.0%)”] [tables=“0/4 (0.0%)”] [speed(MiB/s)=1.45366760339067] [state=writing] []
[2020/08/26 19:42:10.497 +08:00] [INFO] [restore.go:508] [progress] [files=“0/1 (0.0%)”] [tables=“0/4 (0.0%)”] [speed(MiB/s)=1.4534126468109685] [state=writing] []
[2020/08/26 19:47:10.497 +08:00] [INFO] [restore.go:508] [progress] [files=“0/1 (0.0%)”] [tables=“0/4 (0.0%)”] [speed(MiB/s)=1.4534504445459828] [state=writing] []
[2020/08/26 19:52:10.497 +08:00] [INFO] [restore.go:508] [progress] [files=“0/1 (0.0%)”] [tables=“0/4 (0.0%)”] [speed(MiB/s)=1.453446317221705] [state=writing] []

可以看一下日志,建议切分为256M,只要一个大文件没法快

已切分成多个256M的csv,但是速度也只要10MB/s,数据导入如到同一张表,CPU有40core,消耗在70%左右…

辛苦将 lightning log 截止到目前完整的文本,以附件的形式上传下,这边需要检索下里面的内容,方便查看,感谢配合

当前表没有索引,只有按月分区,1992~1998年680G的60亿行数据,配置文件和日志如下:
tidb-lightning.log (1.2 MB) tidb-lightning.toml (1.4 KB)

从配置文件中看;
region-concurrency 没有手动指定值,默认 cpu 75% ,如果没有其他服务共用服务器的话,如果存在其他服务,建议手动指定下该值。
no-schema = true,可以提供下表结构,看下索引数量如何,评估下是否可以将索引先去掉,导入数据成功后在添加索引,tibd 是 online ddl
磁盘方面,可以看下磁盘的监控,看下是否已经到达极限
网络方面,可以看下 network 的流量,看下是否到达极限。

[2020/08/27 10:24:39.784 +08:00] [INFO] [restore.go:1816] [“restore file completed”] [table=test.lineorder] [engineNumber=26] [fileIndex=6] [path=/data3/tidb-lightning/test.lineorder.1073.csv:0] [readDur=27.167225835s] [encodeDur=17m5.264277975s] [deliverDur=4.373281885s] [checksum="{cksum=14663760837896479190,size=292684013,kvs=2228534}"] [takeTime=17m37.9049595s] []
///----------------------------------------------------------------------------------------------------------------------------------------------
从以上的每行日志看,readDur和deliverDur很快,encodeDur会耗时比较长
目前表是没有加索引的
region-concurrency是默认,目前只有该服务,所以没有指定

感谢反馈,readDur 值得是读取 csv 文件的时间,如果csv 大小为 256M,可以看下当前磁盘的 io unit,主要方向是磁盘。

万兆网卡流量正常,cpu40核使用率70%,读写文件分在不同的ssd磁盘且io都正常,目前怀疑可能是以下原因:
1、文件切分粒度不够小,是否可以切分到64M
2、速度起不来是否是因为单表并行度不够
后续会减小文件粒度再进行测试,感谢回答

可否截图看看磁盘相关的监控,read write latency 、io unit 信息呢。包括 lightning 和 tikv 的磁盘,因为 lightning local 模式直接操作的 tikv ,对磁盘的要求也是蛮高的。

lighnting 的监控需要也可以发下。

文件拆分这边官方给的建议 256 就可以了,否则频繁的读取磁盘也不是很好的方向,

以下为lightning的磁盘情况,sdd为csv源所在的磁盘,看起来刚好是10M/s,sde是解析排序后存放的磁盘:


以下为3kv+3flash(混部,跟lightning不同机器)其中一台kv的磁盘情况,kv和flash在同一磁盘sdc:

指的是 tikv 和 tiflash 混合部署在相同磁盘上?该服务器的硬件环境如何。

使用以下方式,提供下 node_exporter 完整监控和 overview 监控看下。


打开 grafana 监控,先按 d 再按 shift+e 可以打开所有监控项。

(1)、chrome 安装这个插件https://chrome.google.com/webstore/detail/full-page-screen-capture/fdpohaocaechififmbbbbbknoalclacl

(2)、鼠标焦点置于 Dashboard 上,按 ?可显示所有快捷键,先按 d 再按 E 可将所有 Rows 的 Panels 打开,需等待一段时间待页面加载完成。

(3)、使用这个 full-page-screen-capture 插件进行截屏保存

hi, 你好,目前看起来执行比较缓慢的一个点是在第一阶段读取源文件这里,麻烦确认一下 lightning 的源文件所在磁盘的 io 性能怎么样,感觉这块貌似是瓶颈所在? 因为 lightning 对 cpu 和 io 要求都比较高,如果条件允许的话,最好使用 ssd 磁盘, 否则如果瓶颈在 io 这块的话,应该没有什么优化手段可以提升性能了

另外,其实 lightning 是支持对单个 csv 文件进行并发导入的,但是依赖 csv 文件需要是标准格式。 上面的配置里面 , 可以吧「mydumper.strict-format」设置为 true (如果 csv 中不包含将为转移的 \r 或 \ 作为字符串内容的一部分的话), 这样单个文件即可并行处理,无需额外处理