lighting csv 出错 cannot have consecutive fields without separator

配置:
[lightning]

日志

level = “info”
file = “csv-tidb-lightning.log”

Prometheus

pprof-port = 8289

并发度 (最好使用默认设置)

#region-concurrency = 16
[checkpoint]

断点存储

enable = false
schema = “tidb_lightning_checkpoint”

断点默认存储在本地的文件系统,这样更高效。但你也可以

选择将断点存储在目标数据库中,设置如下:

driver = “mysql”

[tikv-importer]

使用 TiDB-backend

backend = “tidb”
[mydumper]

数据源目录

data-source-dir = “/data2/backup”
no-schema = true
[mydumper.csv]
separator = ‘,’
delimiter = ‘"’
header = true
not-null = false
null = ‘\N’
backslash-escape = true
trim-last-separator = false
[tidb]

TiDB 连接参数

host = “127.0.0.1”
port = 4000
status-port = 10080 # <- 必须有的参数
user = “root”
password = “123456”
#sql-mode = “”

异常:
[2020/08/03 14:09:55.901 +08:00] [ERROR] [restore.go:996] [“encode kv data and write failed”] [table=test.YT_RETURNREPORT_202006] [engineNumber=0] [takeTime=647.37
7µs] [error=“in file /data2/backup/test.YT_RETURNREPORT_202006.csv:0 at offset 3: syntax error: cannot have consecutive fields without separator”]
[2020/08/03 14:09:55.901 +08:00] [ERROR] [restore.go:843] [“restore engine failed”] [table=test.YT_RETURNREPORT_202006] [engineNumber=0] [takeTime=701.088µs] [erro
r=“in file /data2/backup/test.YT_RETURNREPORT_202006.csv:0 at offset 3: syntax error: cannot have consecutive fields without separator”]
[2020/08/03 14:09:55.901 +08:00] [ERROR] [restore.go:865] [“import whole table failed”] [table=test.YT_RETURNREPORT_202006] [takeTime=778.443µs] [error=“in file /d
ata2/backup/test.YT_RETURNREPORT_202006.csv:0 at offset 3: syntax error: cannot have consecutive fields without separator”]
[2020/08/03 14:09:55.901 +08:00] [ERROR] [restore.go:605] [“restore table failed”] [table=test.YT_RETURNREPORT_202006] [takeTime=956.04µs] [error=“restore table t est.YT_RETURNREPORT_202006 failed: in file /data2/backup/test.YT_RETURNREPORT_202006.csv:0 at offset 3: syntax error: cannot have consecutive fields without separat
or”]
[2020/08/03 14:09:55.901 +08:00] [ERROR] [restore.go:722] [“restore all tables data failed”] [takeTime=1.159676ms] [error=“restore table test.YT_RETURNREPORT_202006 failed: in file /data2/backup/test.YT_RETURNREPORT_202006.csv:0 at offset 3: syntax error: cannot have consecutive fields without separator”]
[2020/08/03 14:09:55.901 +08:00] [ERROR] [restore.go:272] [“run failed”] [step=2] [error=“restore table test.YT_RETURNREPORT_202006 failed: in file /data2/backup/t
est.YT_RETURNREPORT_202006.csv:0 at offset 3: syntax error: cannot have consecutive fields without separator”]

麻烦参考下这个帖子, 是否相同? tidb v4.0.2版 lightning处理包含转义字符 \ 的字段有bug 如果不同,麻烦反馈下具体导入的值

navicat 导入能成功,用lighting 就不好使。还是这个错误。
[2020/08/03 15:22:56.645 +08:00] [ERROR] [restore.go:996] [“encode kv data and write failed”] [table=test.YT_RETURNREPORT_202006] [engineNumber=0] [takeTime=287.43
1µs] [error=“in file /data2/aa/test.YT_RETURNREPORT_202006.csv:0 at offset 3: syntax error: cannot have consecutive fields without separator”]
[2020/08/03 15:22:56.645 +08:00] [ERROR] [restore.go:843] [“restore engine failed”] [table=test.YT_RETURNREPORT_202006] [engineNumber=0] [takeTime=357.552µs] [erro
r=“in file /data2/aa/test.YT_RETURNREPORT_202006.csv:0 at offset 3: syntax error: cannot have consecutive fields without separator”]
[2020/08/03 15:22:56.645 +08:00] [ERROR] [restore.go:865] [“import whole table failed”] [table=test.YT_RETURNREPORT_202006] [takeTime=449.202µs] [error=“in file /d
ata2/aa/test.YT_RETURNREPORT_202006.csv:0 at offset 3: syntax error: cannot have consecutive fields without separator”]
[2020/08/03 15:22:56.645 +08:00] [ERROR] [restore.go:605] [“restore table failed”] [table=test.YT_RETURNREPORT_202006] [takeTime=664.168µs] [error=“restore table test.YT_RETURNREPORT_202006 failed: in file /data2/aa/test.YT_RETURNREPORT_202006.csv:0 at offset 3: syntax error: cannot have consecutive fields without separator”
]
[2020/08/03 15:22:56.645 +08:00] [ERROR] [restore.go:722] [“restore all tables data failed”] [takeTime=918.253µs] [error=“restore table test.YT_RETURNREPORT_202006
failed: in file /data2/aa/test.YT_RETURNREPORT_202006.csv:0 at offset 3: syntax error: cannot have consecutive fields without separator”]
[2020/08/03 15:22:56.645 +08:00] [ERROR] [restore.go:272] [“run failed”] [step=2] [error=“restore table test.YT_RETURNREPORT_202006 failed: in file /data2/aa/test.
YT_RETURNREPORT_202006.csv:0 at offset 3: syntax error: cannot have consecutive fields without separator”]
[2020/08/03 15:22:56.645 +08:00] [INFO] [restore.go:477] [“everything imported, stopping periodic actions”]
[2020/08/03 15:22:56.645 +08:00] [ERROR] [restore.go:278] [“the whole procedure failed”] [takeTime=41.991541ms] [error=“restore table test.YT_RETURNREPORT_202006 f
ailed: in file /data2/aa/test.YT_RETURNREPORT_202006.csv:0 at offset 3: syntax error: cannot have consecutive fields without separator”]
[2020/08/03 15:22:56.645 +08:00] [ERROR] [restore.go:107] [“tables failed to be imported”] [count=1]
[2020/08/03 15:22:56.645 +08:00] [ERROR] [restore.go:109] [-] [table=test.YT_RETURNREPORT_202006] [status=written] [error=“in file /data2/aa/test.YT_RETURNREPORT_2
02006.csv:0 at offset 3: syntax error: cannot have consecutive fields without separator”]
[2020/08/03 15:22:56.645 +08:00] [ERROR] [main.go:82] [“tidb lightning encountered error stack info”] [error=“restore table test.YT_RETURNREPORT_202006 failed: in
file /data2/aa/test.YT_RETURNREPORT_202006.csv:0 at offset 3: syntax error: cannot have consecutive fields without separator”] [errorVerbose=“syntax error: cannot have
consecutive fields without separator\ in file /data2/aa/test.YT_RETURNREPORT_202006.csv:0 at offset 3\ restore table test.YT_RETURNREPORT_202006 failed”]
[2020/08/03 15:22:56.645 +08:00] [ERROR] [main.go:83] [“tidb lightning encountered error”] [error=“restore table test.YT_RETURNREPORT_202006 failed: in file /data2
/aa/test.YT_RETURNREPORT_202006.csv:0 at offset 3: syntax error: cannot have consecutive fields without separator”]

正在查看,有结论后会尽快答复,多谢。

麻烦提供一下完整日志,以及lightning 的版本

lightning 4.0.3

完整log
[2020/08/03 16:47:26.405 +08:00] [WARN] [config.go:358] [“currently only per-task configuration can be applied, global configuration changes can only be made on startu
p”] [“global config changes”="[lightning.level,lightning.file,lightning.pprof-port]"]
[2020/08/03 16:47:26.405 +08:00] [INFO] [version.go:48] [“Welcome to lightning”] [“Release Version”=v4.0.3] [“Git Commit Hash”=26a0f7195bf463950d400857db49586cb9b471f0
] [“Git Branch”=heads/refs/tags/v4.0.3] [“UTC Build Time”=“2020-07-24 01:18:08”] [“Go Version”=“go version go1.13 linux/amd64”]
[2020/08/03 16:47:26.405 +08:00] [INFO] [lightning.go:177] [cfg] [cfg="{“id”:1596444446405308558,“lightning”:{“table-concurrency”:16,“index-concurrency”:16,“r
egion-concurrency”:16,“io-concurrency”:5,“check-requirements”:true},“tidb”:{“host”:“127.0.0.1”,“port”:4000,“user”:“root”,“status-port”:10080,“pd-ad
dr”:"",“sql-mode”:“ONLY_FULL_GROUP_BY,NO_AUTO_CREATE_USER”,“tls”:“false”,“security”:{“ca-path”:"",“cert-path”:"",“key-path”:""},“max-allowed-
packet”:67108864,“distsql-scan-concurrency”:100,“build-stats-concurrency”:20,“index-serial-scan-concurrency”:20,“checksum-table-concurrency”:16},“checkpoint
“:{“enable”:false,“schema”:“tidb_lightning_checkpoint”,“driver”:“file”,“keep-after-success”:false},“mydumper”:{“read-block-size”:65536,“batch-size”:1
07374182400,“batch-import-ratio”:0,“data-source-dir”:”/data2/backup”,“no-schema”:true,“character-set”:“auto”,“csv”:{“separator”:",",“delimiter”:"
\"",“header”:true,“trim-last-separator”:false,“not-null”:false,“null”:"\\N",“backslash-escape”:false},“case-sensitive”:false,“strict-format”:false,
“max-region-size”:268435456,“filter”:["."]},“tikv-importer”:{“addr”:"",“backend”:“tidb”,“on-duplicate”:“replace”,“max-kv-pairs”:32,“send-kv-pa
irs”:32768,“region-split-size”:100663296,“sorted-kv-dir”:"",“range-concurrency”:0},“post-restore”:{“level-1-compact”:false,“compact”:false,“checksum”:
true,“analyze”:true},“cron”:{“switch-mode”:“5m0s”,“log-progress”:“5m0s”},“routes”:null,“security”:{“ca-path”:"",“cert-path”:"",“key-path”:"
“},“black-white-list”:{“do-tables”:null,“do-dbs”:null,“ignore-tables”:null,“ignore-dbs”:null}}”]
[2020/08/03 16:47:26.406 +08:00] [INFO] [lightning.go:208] [“load data source start”]
[2020/08/03 16:47:26.406 +08:00] [INFO] [lightning.go:211] [“load data source completed”] [takeTime=97.35µs] []
[2020/08/03 16:47:26.413 +08:00] [INFO] [restore.go:257] [“the whole procedure start”]
[2020/08/03 16:47:26.413 +08:00] [INFO] [tidb.go:253] [“skipping check requirements for tidb backend”]
[2020/08/03 16:47:26.456 +08:00] [INFO] [restore.go:580] [“restore all tables data start”]
[2020/08/03 16:47:26.456 +08:00] [INFO] [restore.go:601] [“restore table start”] [table=test.YT_RETURNREPORT_202006]
[2020/08/03 16:47:26.456 +08:00] [INFO] [restore.go:1320] [“load engines and files start”] [table=test.YT_RETURNREPORT_202006]
[2020/08/03 16:47:26.456 +08:00] [WARN] [region.go:193] [“file is too big to be processed efficiently; we suggest splitting it at 256 MB each”] [file=/data2/backup/tes
t.YT_RETURNREPORT_202006.csv] [size=65614166615]
[2020/08/03 16:47:26.456 +08:00] [INFO] [restore.go:1349] [“load engines and files completed”] [table=test.YT_RETURNREPORT_202006] [enginesCnt=2] [filesCnt=1] [tak
eTime=73.148µs] []
[2020/08/03 16:47:26.456 +08:00] [INFO] [backend.go:227] [“open engine”] [engineTag=test.YT_RETURNREPORT_202006:-1] [engineUUID=1ba3b717-5607-53c9-a165-c73b723bd66
a]
[2020/08/03 16:47:26.456 +08:00] [INFO] [restore.go:811] [“import whole table start”] [table=test.YT_RETURNREPORT_202006]
[2020/08/03 16:47:26.456 +08:00] [INFO] [restore.go:841] [“restore engine start”] [table=test.YT_RETURNREPORT_202006] [engineNumber=0]
[2020/08/03 16:47:26.456 +08:00] [INFO] [restore.go:927] [“encode kv data and write start”] [table=test.YT_RETURNREPORT_202006] [engineNumber=0]
[2020/08/03 16:47:26.456 +08:00] [INFO] [backend.go:227] [“open engine”] [engineTag=test.YT_RETURNREPORT_202006:0] [engineUUID=8f8e9ecc-89bd-5f88-924e-810f5ae3a005
]
[2020/08/03 16:47:26.456 +08:00] [INFO] [restore.go:1807] [“restore file start”] [table=test.YT_RETURNREPORT_202006] [engineNumber=0] [fileIndex=0] [path=/data2/ba
ckup/test.YT_RETURNREPORT_202006.csv:0]
[2020/08/03 16:47:26.457 +08:00] [ERROR] [parser.go:162] [“syntax error”] [pos=3] [content="“ID”,“CID”,“CODE”,“CORPORATEID”,“USERID”,“MOBILE”,“SEQID”,“R
ETURNTIME”,“RECEIVETIME”,“STATUS”,“ERROR”,“MMDD”,“GATEWAY”,“STATUSJ”,“OPERAID”,“GATEWAYTAKING”,“TOTALTAKING”\r\ "11708299585","34601000006332565
",“3538”,“902533”,“220194”,“13575941770”,“695356002368”]
[2020/08/03 16:47:26.457 +08:00] [ERROR] [restore.go:996] [“encode kv data and write failed”] [table=test.YT_RETURNREPORT_202006] [engineNumber=0] [takeTime=716.86
5µs] [error=“in file /data2/backup/test.YT_RETURNREPORT_202006.csv:0 at offset 3: syntax error: cannot have consecutive fields without separator”]
[2020/08/03 16:47:26.457 +08:00] [ERROR] [restore.go:843] [“restore engine failed”] [table=test.YT_RETURNREPORT_202006] [engineNumber=0] [takeTime=781.699µs] [erro
r=“in file /data2/backup/test.YT_RETURNREPORT_202006.csv:0 at offset 3: syntax error: cannot have consecutive fields without separator”]
[2020/08/03 16:47:26.457 +08:00] [ERROR] [restore.go:865] [“import whole table failed”] [table=test.YT_RETURNREPORT_202006] [takeTime=868.496µs] [error=“in file /d
ata2/backup/test.YT_RETURNREPORT_202006.csv:0 at offset 3: syntax error: cannot have consecutive fields without separator”]
[2020/08/03 16:47:26.457 +08:00] [ERROR] [restore.go:605] [“restore table failed”] [table=test.YT_RETURNREPORT_202006] [takeTime=1.068036ms] [error=“restore table
test.YT_RETURNREPORT_202006 failed: in file /data2/backup/test.YT_RETURNREPORT_202006.csv:0 at offset 3: syntax error: cannot have consecutive fields without separ
ator”]
[2020/08/03 16:47:26.457 +08:00] [ERROR] [restore.go:722] [“restore all tables data failed”] [takeTime=1.245852ms] [error=“restore table test.YT_RETURNREPORT_202006 failed: in file /data2/backup/test.YT_RETURNREPORT_202006.csv:0 at offset 3: syntax error: cannot have consecutive fields without separator”]
[2020/08/03 16:47:26.457 +08:00] [ERROR] [restore.go:272] [“run failed”] [step=2] [error=“restore table test.YT_RETURNREPORT_202006 failed: in file /data2/backup/t
est.YT_RETURNREPORT_202006.csv:0 at offset 3: syntax error: cannot have consecutive fields without separator”]
[2020/08/03 16:47:26.457 +08:00] [ERROR] [restore.go:278] [“the whole procedure failed”] [takeTime=44.26953ms] [error=“restore table test.YT_RETURNREPORT_202006 fa
iled: in file /data2/backup/test.YT_RETURNREPORT_202006.csv:0 at offset 3: syntax error: cannot have consecutive fields without separator”]
[2020/08/03 16:47:26.457 +08:00] [ERROR] [restore.go:107] [“tables failed to be imported”] [count=1]
[2020/08/03 16:47:26.457 +08:00] [ERROR] [restore.go:109] [-] [table=test.YT_RETURNREPORT_202006] [status=written] [error=“in file /data2/backup/test.YT_RETURNREPO
RT_202006.csv:0 at offset 3: syntax error: cannot have consecutive fields without separator”]
[2020/08/03 16:47:26.457 +08:00] [INFO] [restore.go:477] [“everything imported, stopping periodic actions”]
[2020/08/03 16:47:26.457 +08:00] [ERROR] [main.go:82] [“tidb lightning encountered error stack info”] [error=“restore table test.YT_RETURNREPORT_202006 failed: in
file /data2/backup/test.YT_RETURNREPORT_202006.csv:0 at offset 3: syntax error: cannot have consecutive fields without separator”] [errorVerbose=“syntax error: cannot
have consecutive fields without separator\ in file /data2/backup/test.YT_RETURNREPORT_202006.csv:0 at offset 3\ restore table test.YT_RETURNREPORT_202006 failed”]
[2020/08/03 16:47:26.457 +08:00] [ERROR] [main.go:83] [“tidb lightning encountered error”] [error=“restore table test.YT_RETURNREPORT_202006 failed: in file /data2
/backup/test.YT_RETURNREPORT_202006.csv:0 at offset 3: syntax error: cannot have consecutive fields without separator”]

麻烦提供一下这只张表的schema,我们本地复现看看

根据这一行日志,发现其中有全角引号,麻烦确认一下 csv 中是否含有全角引号

没有,粘贴的时候被转了。[2020/08/03 16:47:26.457 +08:00] [ERROR] [parser.go:162] [“syntax error”] [pos=3] [content="“ID”,“CID”,“CODE”,“CORPORATEID”,“USERID”,“MOBILE”,“SEQID”,“R
ETURNTIME”,“RECEIVETIME”,“STATUS”,“ERROR”,“MMDD”,“GATEWAY”,“STATUSJ”,“OPERAID”,“GATEWAYTAKING”,“TOTALTAKING”\r\ "11708299585","34601000006332565
",“3538”,“902533”,“220194”,“13575941770”,“695356002368”]

你发这个 csv 确实是没有的。为了避免转码,麻烦你把 lightning的配置也以文件的方式传上来吧。

tidb-csv-lightning.toml (872 字节)

用这个配置导入navicat从mysql中导出的csv没有问题。这个csv是从oracle导出的。

使用这个mysql 命令也能够成功导入。
LOAD DATA LOCAL INFILE ‘/data2/aa/test.YT_RETURNREPORT_202006.csv’ INTO TABLE YT_RETURNREPORT_202006 character set utf8 FIELDS TERMINATED BY ‘,’ OPTIONALLY ENCLOSED BY ‘"’ lines terminated by ‘\r\ ’ ignore 1 lines (id, cid, code, corporateid, userid, mobile, seqid, returntime, receivetime, status, error, mmdd, gateway, statusj, operaid, gatewaytaking, totaltaking);

我这边尝试本地复现一下问题

OK,麻烦了,辛苦

:handshake:

加班弄呢

看了下,这个应该是 lightning 在处理 csv 的时候,如果文件有 header, header 必须是第一行。我看报错的这个文件,应该第一行是空行,导致 lightning 把实际 header 那一行作为 data 解析。 这个 bug 我们会尽快 fix, 感谢反馈。目前的 workaround, 可以手动移除这个空行

我excel打开文件文件没看到空行啊?意思是把把表头删了就OK吗?