JSON查询异常,部分数据查询不到

【 TiDB 使用环境】生产环境
【 TiDB 版本】8.1.0
【遇到的问题:问题现象及影响】

select *
from test
where json_extract(trackid, '$[0]') = 9;

select *
from test
where json_extract(trackid, '$[0]') = 9
order by id;

上面的两条SQL查询的表和条件完全一样,不加order的情况下查询出来的数据不全。第一个sql查询出来的结果是100多条,第二个SQL查询的结果是300多条。其中trackid字段为JSON格式,值为整数数组。

1 个赞

你能都count下吗?因为只是多了个排序不应该,所以我想看看数据差多少

select
json_extract(trackid, ‘$[0]’)
from
test
where json_extract(trackid, ‘$[0]’) = 9;

两个字段都查下,然后两个的执行计划都发下

1 个赞

我把json中的第一个值提取出来加了一个虚拟列(我们也只存了一个值),现在这两个查询一致了,原来不加排序是113,加了排序是306.

查询计划如下:

explain select *
from test
where json_extract(trackid, '$[0]') = 9 ;

TableReader_12	17586.40	root		MppVersion: 2, data:ExchangeSender_11
└─ExchangeSender_11	17586.40	mpp[tiflash]		ExchangeType: PassThrough
  └─Selection_10	17586.40	mpp[tiflash]		eq(json_extract(test.test.trackid, "$[0]"), cast(9, json BINARY))
    └─TableFullScan_9	21983.00	mpp[tiflash]	table:test	pushed down filter:empty, keep order:false

explain select *
from test
where json_extract(trackid, '$[0]') = 9 order by id;

TableReader_21	17586.40	root		data:Selection_20
└─Selection_20	17586.40	cop[tikv]		eq(json_extract(test.test.trackid, "$[0]"), cast(9, json BINARY))
  └─TableFullScan_19	21983.00	cop[tikv]	table:test	keep order:true

explain select json_extract(trackid, '$[0]')
from test
where json_extract(trackid, '$[0]') = 9;

Projection_4	17586.40	root		json_extract(test.test.trackid, $[0])->Column#51
└─TableReader_12	17586.40	root		MppVersion: 2, data:ExchangeSender_11
  └─ExchangeSender_11	17586.40	mpp[tiflash]		ExchangeType: PassThrough
    └─Selection_10	17586.40	mpp[tiflash]		eq(json_extract(test.test.trackid, "$[0]"), cast(9, json BINARY))
      └─TableFullScan_9	21983.00	mpp[tiflash]	table:test	pushed down filter:empty, keep order:false

explain select json_extract(trackid, '$[0]')
from test
where json_extract(trackid, '$[0]') = 9 order by id;

Projection_6	17586.40	root		Column#51->Column#52
└─Projection_23	17586.40	root		json_extract(test.test.trackid, $[0])->Column#51, test.test.id
  └─TableReader_27	17586.40	root		data:Selection_26
    └─Selection_26	17586.40	cop[tikv]		eq(json_extract(test.test.trackid, "$[0]"), cast(9, json BINARY))
      └─TableFullScan_25	21983.00	cop[tikv]	table:test	keep order:true

看eq是一样的。就差个order理论上应该相同啊

走的存储引擎不一样,一个 tikv 一个 tiflash

恩,select算子的表达式相同,我觉得不太需要关注走了哪个引擎,结果应该是相同的。。

就这不一样,你不关注 :flushed:

1 个赞

不加order by走了tiflash,加了order by走的tikv。
应该是其中一个存储引擎的计算结果有问题。不过不清楚那个是正确的。

我感觉是个bug。

无论哪个引擎,应该可以保证数据一致,返回结果一致吧

没理由不一致的,不是工具问题就是BUG了

复制出来一张表,把后来加的虚拟列去掉之后问题复现:

select *
from test
where json_extract(trackid, '$[0]') = 9;

返回113条数据,执行计划如下:

TableReader_16	17588.00	root		MppVersion: 2, data:ExchangeSender_15
└─ExchangeSender_15	17588.00	mpp[tiflash]		ExchangeType: PassThrough
  └─Selection_14	17588.00	mpp[tiflash]		eq(json_extract(mbase.test.trackid, "$[0]"), cast(9, json BINARY))
    └─TableFullScan_13	21985.00	mpp[tiflash]	table:test	pushed down filter:empty, keep order:false, stats:pseudo

在未开启TiFlash副本的时候此SQL执行正常。

加了order by之后的查询:

select *
from test
where json_extract(trackid, '$[0]') = 9 order by id;

返回306条数据,执行计划如下:

TableReader_24	17588.00	root		data:Selection_23
└─Selection_23	17588.00	cop[tikv]		eq(json_extract(mbase.test.trackid, "$[0]"), cast(9, json BINARY))
  └─TableFullScan_22	21985.00	cop[tikv]	table:test	keep order:true, stats:pseudo

另外原来的表添加了虚拟列之后貌似TiFlash副本实际增加了此列,从而恢复了查询结果正常:

添加虚拟列SQL:
alter table test add column track_id INT(64) AS (JSON_EXTRACT(trackid, '$[0]')) after trackid;

select *
from test
where json_extract(trackid, '$[0]') = 9;

返回306条数据,执行计划如下:
TableReader_12	17586.40	root		MppVersion: 2, data:ExchangeSender_11
└─ExchangeSender_11	17586.40	mpp[tiflash]		ExchangeType: PassThrough
  └─Selection_10	17586.40	mpp[tiflash]		eq(json_extract(test.test.trackid, "$[0]"), cast(9, json BINARY))
    └─TableFullScan_9	21983.00	mpp[tiflash]	table:test	pushed down filter:empty, keep order:false


select *
from test
where track_id = 9;

返回306条数据,执行计划如下:

Selection_12	0.00	root		eq(test.test.track_id, 9)
└─TableReader_11	21983.00	root		MppVersion: 2, data:ExchangeSender_10
  └─ExchangeSender_10	21983.00	mpp[tiflash]		ExchangeType: PassThrough
    └─TableFullScan_9	21983.00	mpp[tiflash]	table:test	keep order:false

附原始建表语句参考:

CREATE TABLE `test` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT COMMENT 'ID',
  `trackid` json NOT NULL COMMENT '赛道ID',
  `name` varchar(45) NOT NULL COMMENT '名称',
  `createtime` datetime NOT NULL COMMENT '创建时间',
  `updatetime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '更新时间',
  PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */,
  UNIQUE KEY `uniq_name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin /*T![auto_id_cache] AUTO_ID_CACHE=1 */ COMMENT='信息表' /*T![placement] PLACEMENT POLICY=`storageonnvme` */

让两条语句都在只使用tikv引擎的情况下执行下,看看执行计划和执行结果是怎样的。用如下命令设置下使用引擎的会话参数值,然后在同一个会话执行你的2条语句。

set @@session.tidb_isolation_read_engines='tikv';

有道理,感觉直接加hint也可以

可以导出几条有113和306数据的包含的记录,从建表开始可以重现的场景吗?

看看执行的引擎一样吗

虽然一个走kv,一个走flash,但结果应该是一样的才对,感觉是bug?

tiflash 有延迟吗? 是不是还没同步到tiflash,

TiFlash副本的Available和Progress都是1