TiDB v4.0.2: Lightning has a bug when handling fields that contain the escape character \


  • 【TiDB version】: v4.0.2
  • 【Problem description】: In TiDB v4.0.2, Lightning mishandles fields that contain the escape character \.
    Excerpt from the source data:
    "退(离)休人员";"";"\";"";
    Relevant part of tidb-lightning.toml:
    [mydumper.csv]
    separator = ';'
    delimiter = '"'
    header = false
    not-null = false
    null = ''
    backslash-escape = true
    trim-last-separator = false
    The import fails with the following errors:
    [2020/07/27 09:31:34.864 +08:00] [ERROR] [parser.go:162] ["syntax error"] [pos=264546868] [content="15001234510";"fyhxxx";"2";"子";"11234512345";"hpq云jj29-1 1-6-1";"";"";"";"";"";"29-1 1-6-1";"";"";"";"";"9-1 1-6-1";"";"";"";"";"";"";"必填项";"29-1 1-6-1";"";"";"";"";"";"";"";"和\ufffd"]
    [2020/07/27 09:31:34.865 +08:00] [ERROR] [restore.go:1001] ["encode kv data and write failed"] [table=test.hai_aprnot_info] [engineNumber=0] [takeTime=12.53145859s] [error="in file /data2/tmpdata/test.hai_aprnot_info.csv:0 at offset 264546868: syntax error: cannot have consecutive fields without separator"]
    [2020/07/27 09:31:34.865 +08:00] [ERROR] [restore.go:848] ["restore engine failed"] [table=test.hai_aprnot_info] [engineNumber=0] [takeTime=12.53151277s] [error="in file /data2/tmpdata/test.hai_aprnot_info.csv:0 at offset 264546868: syntax error: cannot have consecutive fields without separator"]
    [2020/07/27 09:31:34.865 +08:00] [ERROR] [restore.go:870] ["import whole table failed"] [table=test.hai_aprnot_info] [takeTime=12.531569055s] [error="in file /data2/tmpdata/test.hai_aprnot_info.csv:0 at offset 264546868: syntax error: cannot have consecutive fields without separator"]
    [2020/07/27 09:31:34.865 +08:00] [ERROR] [restore.go:610] ["restore table failed"] [table=test.hai_aprnot_info] [takeTime=12.574043636s] [error="restore table test.hai_aprnot_info failed: in file /data2/tmpdata/test.hai_aprnot_info.csv:0 at offset 264546868: syntax error: cannot have consecutive fields without separator"]

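According to the TiDB documentation, when backslash-escape = true, "in other cases (such as \") the backslash is removed, and only the character after it (") is kept in the field". In other words, the \ is dropped and the " that follows it is kept. However, when the " that follows the \ is actually the closing delimiter, that " is not recognized as a delimiter, and parsing fails.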

After I set backslash-escape = false, the file is parsed correctly. Could you check whether this is a bug, or whether I have misunderstood the behavior?


OK, please give us a moment to confirm this.

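Hello, the description here should read: only the character after the backslash (") is kept in the field, as an ordinary character. In other words, an escaped " never acts as a delimiter. We will improve the documentation shortly. Thanks.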
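
For readers who want to see the effect on the data from the question, here is a minimal Python sketch of the rule described above. It is not Lightning's actual parser: the function name parse_line and its parameters are made up for illustration, the error text is only borrowed from the log for readability, and doubled delimiters ("") inside a quoted field are deliberately not handled.

```python
def parse_line(line, separator=";", delimiter='"', backslash_escape=True):
    """Split one CSV line into fields.

    Illustrative only (hypothetical helper, not Lightning's parser).
    Simplification: a doubled delimiter inside a quoted field is not supported.
    """
    fields = []
    i, n = 0, len(line)
    while i < n:
        if line[i] == delimiter:
            # Quoted field: read up to the closing delimiter.
            i += 1
            buf = []
            closed = False
            while i < n:
                ch = line[i]
                if backslash_escape and ch == "\\" and i + 1 < n:
                    # Drop the backslash and keep the next character as an
                    # ordinary character, even if it is the delimiter.
                    buf.append(line[i + 1])
                    i += 2
                elif ch == delimiter:
                    closed = True
                    i += 1
                    break
                else:
                    buf.append(ch)
                    i += 1
            if not closed:
                raise ValueError("syntax error: unterminated quoted field")
            if i < n and line[i] != separator:
                raise ValueError(
                    "syntax error: cannot have consecutive fields without separator"
                )
            fields.append("".join(buf))
        else:
            # Unquoted field: read up to the next separator (or end of line).
            j = line.find(separator, i)
            if j == -1:
                j = n
            fields.append(line[i:j])
            i = j
        # Skip the separator that ends this field.
        if i < n and line[i] == separator:
            i += 1
            if i == n:
                # A trailing separator means one more (empty) field,
                # as with trim-last-separator = false.
                fields.append("")
    return fields


# The excerpt from the question: the third field is a single backslash.
sample = r'"退(离)休人员";"";"\";"";'
```

With backslash_escape=False, parse_line(sample, backslash_escape=False) returns the third field as a lone backslash. With backslash_escape=True, the \" in the third field is read as an ordinary ", so the field does not end where the author of the data intended and the sketch raises an error. If backslash escaping has to stay enabled, a literal backslash in the data would need to be written as \\ under this rule.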
