lightning 导入数据出现符号替换

组件以及版本

组件 版本
dumpling 4.0.10
lightning 4.0.10
lightning 3.0.5
tidb-server 4.0.9

MySQL 表结构以及数据

create database lightningtest;
 
use lightningtest;
 
CREATE TABLE `xxxx` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT 'xx',
  `mc` varchar(16) COLLATE utf8_bin NOT NULL COMMENT 'xx',
  `title` varchar(500) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL DEFAULT ' ' COMMENT 'xx',
  `notify_cursor` bigint(20) NOT NULL COMMENT 'xx',
  `context` varchar(1000) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL DEFAULT ' ' COMMENT 'xx',
  `accepter` int(11) NOT NULL COMMENT 'xx',
  `publisher` int(11) DEFAULT '0' COMMENT 'xx',
  `type_id` tinyint(4) NOT NULL COMMENT 'xx',
  `source_id` tinyint(4) NOT NULL COMMENT 'xx',
  `ext_info` varchar(500) COLLATE utf8_bin DEFAULT ' ' COMMENT 'xxx)',
  `ctime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
  `mtime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '最后修改时间',
  `deleted` tinyint(4) NOT NULL DEFAULT '0' COMMENT 'xx',
  `notify_type` tinyint(4) NOT NULL DEFAULT '0' COMMENT 'xx,xx',
  `res_id` varchar(64) COLLATE utf8_bin NOT NULL DEFAULT '' COMMENT 'xx, xx, xx',
  `card_type` tinyint(4) NOT NULL DEFAULT '0' COMMENT 'type',
  `card_brief` varchar(1024) COLLATE utf8_bin NOT NULL DEFAULT ' ' COMMENT 'xx',
  `card_msg_brief` varchar(1024) COLLATE utf8_bin NOT NULL DEFAULT '' COMMENT 'xx',
  `card_cover` varchar(1024) COLLATE utf8_bin NOT NULL DEFAULT '' COMMENT 'xx',
  `card_story_title` varchar(512) COLLATE utf8_bin NOT NULL DEFAULT '' COMMENT 'xx',
  `card_link` varchar(512) COLLATE utf8_bin NOT NULL DEFAULT '' COMMENT 'xx',
  `hash` bigint(32) unsigned NOT NULL DEFAULT '0' COMMENT 'xx',
  PRIMARY KEY (`id`),
  KEY `ix_notify_cursor` (`notify_cursor`),
  KEY `ix_mtime` (`mtime`),
  KEY `ix_res_id_mc_notify_type` (`res_id`,`mc`,`notify_type`),
  KEY `ix_accepter_type_del_csr` (`accepter`,`type_id`,`deleted`,`notify_cursor`),
  KEY `ix_hash` (`hash`)
) ENGINE=InnoDB AUTO_INCREMENT=8055905 DEFAULT CHARSET=utf8 COLLATE=utf8_bin COMMENT='user notifications';
 
 
INSERT INTO `xxxx` VALUES
(8025904,'1_1_3','#{xx的}{\"https://xxvideo/xx2/#rxx\"}',1609362798578231000,'xx',27606509,169050759,3,1,'{ \"cm_new_url\": { \"title\": \"xxx:\\/\\/comment\\/detail\\/1\\/670851582\\/3887205093\\/?subType=0&anchor=3887790799&showEnter=1&extraIntentId=0\", \"content\": \"xxxx:\\/\\/comment\\/detail\\/1\\/670851582\\/3887205093\\/?subType=0&anchor=3887790799&showEnter=1&extraIntentId=0\" } }','2020-12-31 05:13:18','2020-12-31 05:13:18',0,0,'',0,'','','','','',14956610699933);
 
 
MariaDB [lightningtest]> select * from xxxx\G
*************************** 1. row ***************************
              id: 8025904
              mc: 1_1_3
           title: #{xx的}{"https://xxvideo/xx2/#rxx"}
   notify_cursor: 1609362798578231000
         context: xx
        accepter: 27606509
       publisher: 169050759
         type_id: 3
       source_id: 1
        ext_info: { "cm_new_url": { "title": "xxx:\/\/comment\/detail\/1\/670851582\/3887205093\/?subType=0&anchor=3887790799&showEnter=1&extraIntentId=0", "content": "xxxx:\/\/comment\/detail\/1\/670851582\/3887205093\/?subType=0&anchor=3887790799&showEnter=1&extraIntentId=0" } }
           ctime: 2020-12-31 05:13:18
           mtime: 2020-12-31 05:13:18
         deleted: 0
     notify_type: 0
          res_id:
       card_type: 0
      card_brief:
  card_msg_brief:
      card_cover:
card_story_title:
       card_link:
            hash: 14956610699933

二进制文件以及配置

wget https://download.pingcap.org/tidb-toolkit-v3.0.5-linux-amd64.tar.gz
wget https://download.pingcap.org/tidb-toolkit-v4.0.10-linux-amd64.tar.gz
 
tar -xvf tidb-toolkit-v3.0.5-linux-amd64.tar.gz
tar -xvf tidb-toolkit-v4.0.10-linux-amd64.tar.gz
 
 
mkdir tidb-toolkit-v3.0.5-linux-amd64/conf
mkdir tidb-toolkit-v4.0.10-linux-amd64/conf

tidb-toolkit-v3.0.5-linux-amd64/conf/lightning.toml
tidb-toolkit-v4.0.10-linux-amd64/conf/lightning.toml
配置如下:

[lightning]
level = "info"
file = "./tidb-lightning.log"

[checkpoint]
enable = true
driver = "file"
dsn = "./chk.pb"

[tikv-importer]
backend = "tidb"

[mydumper]
# mydumper 或者dumpling 导出的数据目录
data-source-dir = "/data/bak/dumplingout/"


[tidb]
# TiDB 连接参数
host = "172.19.0.202"
port = 4000
status-port = 10080  # <- 必须有的参数
user = "root"
password = ""
sql-mode = ""


#[[routes]]
#schema-pattern = ""
#table-pattern = ""
#target-schema = ""
#target-table = ""

数据导出

## 数据导出
./tidb-toolkit-v4.0.10-linux-amd64/bin/dumpling --host 172.16.39.127 --port 3306 --filetype sql   -F 500 -o /data/bak/dumplingout/  -B lightningtest   -u  root -p xxxxxxxxxxxxx
 

lightning 3.0.5 导入,复现:

 
cd tidb-toolkit-v3.0.5-linux-amd64
 
./bin/tidb-lightning -config ./conf/lightning.toml

检查数据以及日志


!注意! 此处 ext_info字段 与MySQL数据的对比~

mysql> select * from lightningtest.xxxx\G
*************************** 1. row ***************************
              id: 8025904
              mc: 1_1_3
           title: #{xx的}{"https://xxvideo/xx2/#rxx"}
   notify_cursor: 1609362798578231000
         context: xx
        accepter: 27606509
       publisher: 169050759
         type_id: 3
       source_id: 1
        ext_info: { "cm_new_url": { "title": "xxx:\/\/comment\/detail\/1\/670851582\/3887205093\/?subType=0anchor=3887790799showEnter=1extraIntentId=0", "content": "xxxx:\/\/comment\/detail\/1\/670851582\/3887205093\/?subType=0anchor=3887790799showEnter=1extraIntentId=0" } }
           ctime: 2020-12-31 05:13:18
           mtime: 2020-12-31 05:13:18
         deleted: 0
     notify_type: 0
          res_id:
       card_type: 0
      card_brief:
  card_msg_brief:
      card_cover:
card_story_title:
       card_link:
            hash: 14956610699933
1 row in set (0.01 sec)
 

 
## 查看 tidb_general_log


[2021/05/20 09:10:19.657 +00:00] [INFO] [session.go:2287] [GENERAL_LOG] [conn=38] [user=root@172.19.0.202] [schemaVersion=249] [txnStartTS=0] [forUpdateTS=0] [isReadConsistency=false] [current_db=] [txn_mode=PESSIMISTIC] [sql="REPLACE INTO `lightningtest`.`xxxx` VALUES(8025904,'1_1_3','#{xx的}{\"https://xxvideo/xx2/#rxx\"}',1609362798578231000,'xx',27606509,169050759,3,1,'{ \"cm_new_url\": { \"title\": \"xxx:\\\\/\\\\/comment\\\\/detail\\\\/1\\\\/670851582\\\\/3887205093\\\\/?subType=0\\Zanchor=3887790799\\ZshowEnter=1\\ZextraIntentId=0\", \"content\": \"xxxx:\\\\/\\\\/comment\\\\/detail\\\\/1\\\\/670851582\\\\/3887205093\\\\/?subType=0\\Zanchor=3887790799\\ZshowEnter=1\\ZextraIntentId=0\" } }','2020-12-31 05:13:18','2020-12-31 05:13:18',0,0,'',0,'','','','','',14956610699933)"]
 
** 发现  &  被替换为 \\Z **

4.0.10 lightning 导入

drop database lightningtest;
 
cd tidb-toolkit-v4.0.10-linux-amd64
 
./bin/tidb-lightning -config ./conf/lightning.toml
 
mysql> select * from lightningtest.xxxx\G
*************************** 1. row ***************************
              id: 8025904
              mc: 1_1_3
           title: #{xx的}{"https://xxvideo/xx2/#rxx"}
   notify_cursor: 1609362798578231000
         context: xx
        accepter: 27606509
       publisher: 169050759
         type_id: 3
       source_id: 1
        ext_info: { "cm_new_url": { "title": "xxx:\/\/comment\/detail\/1\/670851582\/3887205093\/?subType=0&anchor=3887790799&showEnter=1&extraIntentId=0", "content": "xxxx:\/\/comment\/detail\/1\/670851582\/3887205093\/?subType=0&anchor=3887790799&showEnter=1&extraIntentId=0" } }
           ctime: 2020-12-31 05:13:18
           mtime: 2020-12-31 05:13:18
         deleted: 0
     notify_type: 0
          res_id:
       card_type: 0
      card_brief:
  card_msg_brief:
      card_cover:
card_story_title:
       card_link:
            hash: 14956610699933
1 row in set (0.01 sec)
 
 

总结

发现 lightning-3.0.5 偶尔出现 & 替换为 \\Z 的问题,怀疑是某些特殊的字符影响了lightning识别数据。

:call_me_hand: 问一下,看 lightning 的版本为什么会比较低,高版本的有没有试过 (其实建议 lightning版本 最好和集群版本保持一致)

因为一直用的3.0.5低版本lightning导入,也没出现问题,就没有变动。
大佬,帖子中4.0.10的lightning是已经测试描述了, 在这条记录导入的时候没有出现问题。
:smiling_imp:

:full_moon_with_face:感谢分享,看来高版本应该是修复了这个问题的