都是空格惹的祸

背景

通过DM同步Mysql库到Tidb集群,在一个DM-Worker上启动两个task,一个task同步到3.0.14的集群,另一个task同步到4.0.4的集群,两个集群分别给不同的业务部门使用

什么是DM

问题

4.0集群上的开发人员反馈有一个查询SQLselect id,wo_id,delivery_code,vehicle_order_id FROM insurance_wo where vehicle_order_id = '184742761197489224';在Tidb上查到的数据不对,少数据了,这个少数据指的是和源头Mysql上的查询结果对比,在Tidb上的查询结果少一条数据。

复现

  • 我用上面SQL在源Mysql上查询结果有两条数据:
dba@ma.e.com:3306 [xx_swo] : Fri Sep 18 08:17:43 2020 >select id,wo_id,delivery_code,vehicle_order_id FROM insurance_wo where  vehicle_order_id = '184742761197489224';
+-------+--------------------+-----------------------+---------------------+
| id    | wo_id              | delivery_code         | vehicle_order_id    |
+-------+--------------------+-----------------------+---------------------+
|  3068 | 476295277372793241 | D00620191015122055373 | 184742761197489224  |
| 15556 | 476959932151783620 | D00620191015122055373 | 184742761197489224  |
+-------+--------------------+-----------------------+---------------------+
2 rows in set (0.00 sec)

  • 我用上面SQL在4.0的Tidb上查询确实只查到了一条数据
MySQL [xx_swo]> select id,wo_id,delivery_code,vehicle_order_id FROM insurance_wo where  vehicle_order_id = '184742761197489224';
+------+--------------------+-----------------------+--------------------+
| id   | wo_id              | delivery_code         | vehicle_order_id   |
+------+--------------------+-----------------------+--------------------+
| 3068 | 476295277372793241 | D00620191015122055373 | 184742761197489224 |
+------+--------------------+-----------------------+--------------------+
1 row in set (0.01 sec)
  • 然后我又用上面SQL在3.0的Tidb上查询发现也是只有一条数据
MySQL [xx_swo]> select id,wo_id,delivery_code,vehicle_order_id FROM insurance_wo where  vehicle_order_id = '184742761197489224';
+------+--------------------+-----------------------+--------------------+
| id   | wo_id              | delivery_code         | vehicle_order_id   |
+------+--------------------+-----------------------+--------------------+
| 3068 | 476295277372793241 | D00620191015122055373 | 184742761197489224 |
+------+--------------------+-----------------------+--------------------+
1 row in set (0.01 sec)

排查

  • 第一个想到的是不是经过DM同步中间数据丢了呢,然后我把这个表从Mysql上通过Mysqldump导出来,导入到Tidb用上面SQL查询也是少一条数据,那么排除了DM的问题
  • 我把上面信息反馈给PingCAP的同学继续排查,首先通过use index对比查询走索引和表查询的情况,发现走索引查询和表查询结果都一样,都是有问题的

强制走索引的用法

  • 至此PingCAP的技术支持小伙伴需要后端研发人员介入了,又收集了一下Tidb集群的排序规则
    E9C73F96-DB94-4215-BA64-6D1339D60DBF
  • 用Mysql客户端登录Tidb的时候增加两个参数 --column-type-info--comments执行下面SQL:
MySQL [test]> select id,wo_id,delivery_code,vehicle_order_id, vehicle_order_id = '184742761197489224'  FROM insurance_wo where  wo_id = '476959932151783620' ;
Field   1:  `id`
Catalog:    `def`
Database:   `test`
Table:      `insurance_wo`
Org_table:  `insurance_wo`
Type:       LONG
Collation:  binary (63)
Length:     11
Max_length: 5
Decimals:   0
Flags:      NOT_NULL PRI_KEY UNSIGNED AUTO_INCREMENT NUM 

Field   2:  `wo_id`
Catalog:    `def`
Database:   `test`
Table:      `insurance_wo`
Org_table:  `insurance_wo`
Type:       VAR_STRING
Collation:  utf8_bin (83)
Length:     135
Max_length: 18
Decimals:   0
Flags:      NOT_NULL MULTIPLE_KEY NO_DEFAULT_VALUE 

Field   3:  `delivery_code`
Catalog:    `def`
Database:   `test`
Table:      `insurance_wo`
Org_table:  `insurance_wo`
Type:       VAR_STRING
Collation:  utf8_bin (83)
Length:     96
Max_length: 21
Decimals:   0
Flags:      MULTIPLE_KEY 

Field   4:  `vehicle_order_id`
Catalog:    `def`
Database:   `test`
Table:      `insurance_wo`
Org_table:  `insurance_wo`
Type:       VAR_STRING
Collation:  utf8_bin (83)
Length:     96
Max_length: 19
Decimals:   0
Flags:      NOT_NULL MULTIPLE_KEY NO_DEFAULT_VALUE 

Field   5:  `vehicle_order_id = '184742761197489224'`
Catalog:    `def`
Database:   ``
Table:      ``
Org_table:  ``
Type:       LONGLONG
Collation:  binary (63)
Length:     1
Max_length: 1
Decimals:   0
Flags:      NOT_NULL BINARY NUM 


+-------+--------------------+-----------------------+---------------------+-----------------------------------------+
| id    | wo_id              | delivery_code         | vehicle_order_id    | vehicle_order_id = '184742761197489224' |
+-------+--------------------+-----------------------+---------------------+-----------------------------------------+
| 15556 | 476959932151783620 | D00620191015122055373 | 184742761197489224  |                                       0 |
+-------+--------------------+-----------------------+---------------------+-----------------------------------------+
1 row in set (0.01 sec)

–column-type-info 在结果中显示元数据信息
–comments 是否在发送到服务器的语句中剥离或保留注释,默认值为跳过注释;在连接Tidb的时候最好加上这个参数,不然有注释的SQL,比如强制走索引,设置会不生效

  • 通过下面SQL确认where条件vehicle_order_id这个字段值的长度,发现有长度是19位的值,大部分都是18位的,说明19位的数据有点不正常
dba@ma.e.com:3306 [xx_swo]  : Fri Sep 18 14:17:08 2020 >select id,wo_id,delivery_code,vehicle_order_id, vehicle_order_id = '184742761197489224', length(vehicle_order_id)  FROM insurance_wo where  wo_id = 476959932151783620 ;
+-------+--------------------+-----------------------+---------------------+-----------------------------------------+--------------------------+
| id    | wo_id              | delivery_code         | vehicle_order_id    | vehicle_order_id = '184742761197489224' | length(vehicle_order_id) |
+-------+--------------------+-----------------------+---------------------+-----------------------------------------+--------------------------+
| 15556 | 476959932151783620 | D00620191015122055373 | 184742761197489224  |                                       1 |                       19 |
| 15558 | 476959932151783624 | D00520191016155810548 | 184742744017614672  |                                       0 |                       18 |
+-------+--------------------+-----------------------+---------------------+-----------------------------------------+--------------------------+
2 rows in set, 2 warnings (0.02 sec)
  • 在Tidb上通过hex函数查看vehicle_order_id的十六进制,看看上面两条数据有啥区别,发现在Tidb上少的这条数据的十六进制的值最后多了一个20,然后在Mysql上查询结果和Tidb上是是一样的。

dba@ma.e.com:3306 [xx_swo]  : Fri Sep 18 14:40:53 2020 >select id,wo_id,delivery_code,vehicle_order_id, hex(vehicle_order_id), hex('184742744017614672')  FROM insurance_wo where  wo_id = '476959932151783624';
+-------+--------------------+-----------------------+--------------------+--------------------------------------+--------------------------------------+
| id    | wo_id              | delivery_code         | vehicle_order_id   | hex(vehicle_order_id)                | hex('184742744017614672')            |
+-------+--------------------+-----------------------+--------------------+--------------------------------------+--------------------------------------+
| 15558 | 476959932151783624 | D00520191016155810548 | 184742744017614672 | 313834373432373434303137363134363732 | 313834373432373434303137363134363732 |
+-------+--------------------+-----------------------+--------------------+--------------------------------------+--------------------------------------+
1 row in set (0.00 sec)

dba@ma.e.com:3306 [xx_swo]  : Fri Sep 18 14:40:55 2020 >select id,wo_id,delivery_code,vehicle_order_id, hex(vehicle_order_id), hex('18474              )  FROM insurance_wo where  wo_id = '476959932151783620';
+-------+--------------------+-----------------------+---------------------+----------------------------------------+--------------------------------------+
| id    | wo_id              | delivery_code         | vehicle_order_id    | hex(vehicle_order_id)                  | hex('184742761197489224')            |
+-------+--------------------+-----------------------+---------------------+----------------------------------------+--------------------------------------+
| 15556 | 476959932151783620 | D00620191015122055373 | 184742761197489224  | 31383437343237363131393734383932323420 | 313834373432373631313937343839323234 |
+-------+--------------------+-----------------------+---------------------+----------------------------------------+--------------------------------------+
  • 通过上面的验证,PingCAP的后端技术人员基本推断这条Tidb上查询不到的数据vehicle_order_id这个字段内容里面应该是有一个空格,然后通过下面方法验证空格的16进制确实是20
MySQL [test]> select hex('');
+----------+
| hex(' ') |
+----------+
| 20       |
+----------+
1 row in set (0.001 sec)
  • OK,至此已经定位到问题的原因是数据内容里面有空格,Tidb和Mysql对空格的处理逻辑不一样,那么如何解决问题呢,有两个方案:
  1. 数据里面有空格是不正常的,从Mysql源头定位到问题并修正,清洗库里面现在的数据把空格去掉,清洗SQL:
UPDATE `insurance_wo`
SET `vehicle_order_id` = REPLACE (
    `vehicle_order_id`,
    ' ',
    ''
);
  1. 新建4.0的Tidb集群并开启新框架的排序规则new_collations_enabled_on_first_bootstrap,重新通过DM同步数据到新集群,业务切割到新集群。

新框架的排序规则目前只有在新建集群的时候才能开启,已经跑了业务的集群无法开启,官方后面会支持老集群开启这个功能
新框架下的排序规则文档

更多学习和交流可以关注我公众号:

%E5%85%AC%E4%BC%97%E5%8F%B7%E5%90%8D%E7%89%87

1赞

感谢您的经验分享,:+1:

客气哈,:smiley: