tidb 数据查询数据不相等,但筛选结果相等

版本: 4.0 5.0
说明:
可见的展示字段值不相等,使用sql筛选相等

数据:

CREATE TABLE `t_test_decimal`  (
  `order_id` bigint(19) NOT NULL,
  `bill_number` varchar(60) NULL,
  PRIMARY KEY (`order_id`)
);

insert into t_test_decimal
values
(766953373240004608,'766953373240004610'),
(766953682666393600,'766953682666393602'),
(766953819945963520,'766953819945963522'),
(766953934056198144,'766953934056198146');

select 
*
from t_test_decimal 
where order_id = bill_number;

结果:

不过这个在mysql上执行也是同样的结果

您好,

这个问题可以看作是一个 MySQL 的已知缺陷,但 MySQL 没有计划修复,请见:https://bugs.mysql.com/bug.php?id=78821。我在 MySQL 5.7.28 和 8.0.18 版本都可以复现。从兼容性上考虑,TiDB 与 MySQL 保持一致。

这个问题的原因可以从 MySQL 的官方文档查看(https://dev.mysql.com/doc/refman/5.7/en/type-conversion.html):

When conversions from string to floating-point and from integer to floating-point occur, they do not necessarily occur the same way. The integer may be converted to floating-point by the CPU, whereas the string is converted digit by digit in an operation that involves floating-point multiplications. Also, results can be affected by factors such as computer architecture or the compiler version or optimization level. One way to avoid such problems is to use CAST() so that a value is not converted implicitly to a float-point number.

也就是说,bigint 与 varchar 比较的时候默认会被转换为浮点,其中转换过程中对精度的处理导致了比较结果相等。如果这个缺陷造成了问题,建议使用显式 cast 来规避,例如:

tidb> select  * from t_test_decimal  where order_id = cast(bill_number as unsigned);
Empty set (0.01 sec)
4 个赞

好的,谢谢。使用cast函数转换会导致索引失效吗?

这种不同类型的比较,由于需要 cast(无论是显式的 cast as unsigned,还是隐式的浮点转换),都会导致索引失效。

同样是上面的 MySQL 文档,您可以看到:

For comparisons of a string column with a number, MySQL cannot use an index on the column to look up the value quickly. If str_col is an indexed string column, the index cannot be used when performing the lookup in the following statement:

SELECT * FROM tbl_name WHERE str_col=1;

The reason for this is that there are many different strings that may convert to the value 1 , such as '1' , ' 1' , or '1a' .

对应到 TiDB 中,即便我们在 bill_number 上添加一个索引,比如叫做 b_idx,然后执行查询 “select * from t_test_decimal use index(b_idx) where bill_number = 766953682666393600;”,您可以看到如下的 explain 结果:

tidb> explain select  * from t_test_decimal use index(b_idx) where bill_number = 766953682666393600;
+-------------------------+---------+-----------+------------------------------------------------+------------------------------------------------------------------+
| id                      | estRows | task      | access object                                  | operator info                                                    |
+-------------------------+---------+-----------+------------------------------------------------+------------------------------------------------------------------+
| IndexReader_7           | 3.20    | root      |                                                | index:Selection_6                                                |
| └─Selection_6           | 3.20    | cop[tikv] |                                                | eq(cast(test.t_test_decimal.bill_number), 7.669536826663936e+17) |
|   └─IndexFullScan_5     | 4.00    | cop[tikv] | table:t_test_decimal, index:b_idx(bill_number) | keep order:false, stats:pseudo                                   |
+-------------------------+---------+-----------+------------------------------------------------+------------------------------------------------------------------+
3 rows in set (0.00 sec)

如上,在强制使用 b_idx 的情况下,TiDB 仍然要做一次 Full Index Scan,然后执行 cast bill_number 与 766953682666393600 的比较过滤。可以认为索引并没有起到作用。

1 个赞

嗯嗯,明白了,谢谢。

这个可能从设计开始就需要注意了,感谢分享

此话题已在最后回复的 1 分钟后被自动关闭。不再允许新回复。