load data时,列值为json并且包含\"",时导入的值被截断,值读取错误

为提高效率,提问时请提供以下信息,问题描述清晰可优先响应。

  • 【TiDB 版本】:

Release Version: v4.0.5
Edition: Community
Git Commit Hash: 9c1c21270001a0084dcb5d842c50e1fbea5d1fa8
Git Branch: heads/refs/tags/v4.0.5
UTC Build Time: 2020-08-28 10:07:42
GoVersion: go1.13
Race Enabled: false
TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
Check Table Before Drop: false

  • 【问题描述】:
    使用load data将csv导入的tidb时,如果列值中为json字符串并且包含"",时,值被截断

load_json.xml文件内容,文档格式unix(换行符
)

"aaa"
"{""itemRangeType"":0,""itemContainType"":0,""shopRangeType"":0}"
"{""itemRangeType"":0,""itemContainType"":0,""shopRangeType"":1,""shopJson"":""[{\""id\"":\""A1234\"",\""shopName\"":\""AAAAAA\""}]""}"

导入到mysql数据库:

mysql> select version();
+-----------+
| version() |
+-----------+
| 5.7.31    |
+-----------+
1 row in set (0.07 sec)

mysql> drop table if exists load_json;
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TABLE `load_json` ( `aaa` VARCHAR ( 2000 ) );
Query OK, 0 rows affected (0.08 sec)

mysql> load data local infile 'E:/load_json.csv' ignore  into table load_json columns terminated by ',' enclosed by '"' escaped by '' ignore 1 lines (`aaa`);
Query OK, 2 rows affected (0.07 sec)
Records: 2  Deleted: 0  Skipped: 0  Warnings: 0

mysql> select * from load_json;
+---------------------------------------------------------------------------------------------------------------------+
| aaa                                                                                                                 |
+---------------------------------------------------------------------------------------------------------------------+
| {"itemRangeType":0,"itemContainType":0,"shopRangeType":0}                                                           |
| {"itemRangeType":0,"itemContainType":0,"shopRangeType":1,"shopJson":"[{\"id\":\"A1234\",\"shopName\":\"AAAAAA\"}]"} |
+---------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.09 sec)

导入到TIDB

mysql> select tidb_version();
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tidb_version()                                                                                                                                                                                                                                                                                             |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Release Version: v4.0.5
Edition: Community
Git Commit Hash: 9c1c21270001a0084dcb5d842c50e1fbea5d1fa8
Git Branch: heads/refs/tags/v4.0.5
UTC Build Time: 2020-08-28 10:07:42
GoVersion: go1.13
Race Enabled: false
TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
Check Table Before Drop: false |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.10 sec)

mysql> drop table if exists load_json;
Query OK, 0 rows affected (0.19 sec)

mysql> CREATE TABLE `load_json` ( `aaa` VARCHAR ( 2000 ) );
Query OK, 0 rows affected (0.07 sec)

mysql> load data local infile 'E:/load_json.csv' ignore  into table load_json columns terminated by ',' enclosed by '"' escaped by '' ignore 1 lines (`aaa`);
Query OK, 2 rows affected (0.01 sec)
Records: 2  Deleted: 0  Skipped: 0  Warnings: 0

mysql> select * from load_json; -- showJson值A1234"后面的内容丢失了,注意shopJson的引号,这里也是错的
+----------------------------------------------------------------------------------------+
| aaa                                                                                    |
+----------------------------------------------------------------------------------------+
| {"itemRangeType":0,"itemContainType":0,"shopRangeType":0}                              |
| {"itemRangeType":0,"itemContainType":0,"shopRangeType":1,"shopJson":"[{""id"":""A1234" | 
+----------------------------------------------------------------------------------------+
2 rows in set (0.10 sec)

感谢反馈。本地能重现这个问题,我们先内部排查一下。