4.0.8版本BUG,不兼容 not in 语法

为提高效率,提问时请提供以下信息,问题描述清晰可优先响应。

  • 【TiDB 版本】:4.0.8
  • 【问题描述】:该问题在3.0.3版本不存在。升级到4.0.8之后出现。

查询语句:

select JSON_EXTRACT(all_json,‘$.“properties”.“video_id”’) as video_id,
sum(if(JSON_EXTRACT(all_json,‘$.“properties”.“action”’)=168,1,0)) as play_count,
sum(if(JSON_EXTRACT(all_json,‘$.“properties”.“action”’)=216,1,0)) as played_count,
sum(if(JSON_EXTRACT(all_json,‘$.“properties”.“action”’)=221,1,0)) as played95_count,
sum(if(JSON_EXTRACT(all_json,‘$.“properties”.“action”’)=229,1,0)) as originality_count,
sum(if(JSON_EXTRACT(all_json,‘$.“properties”.“action”’)=223,1,0)) as fav_count,
sum(if(JSON_EXTRACT(all_json,‘$.“properties”.“action”’)=242,1,0)) as download_count,
sum(if(JSON_EXTRACT(all_json,‘$.“properties”.“action”’)=168 and JSON_EXTRACT(all_json,‘$.“properties”.“recommend”’)>0,1,0)) as recommand_count,
sum(case when JSON_EXTRACT(all_json,‘$.“properties”.“action”’)=168 and JSON_EXTRACT(all_json,‘$.“properties”.“author_vip_id”’)=2 then 0.5 when JSON_EXTRACT(all_json,‘$.“properties”.“action”’)=168 and JSON_EXTRACT(all_json,‘$.“properties”.“author_vip_id”’)=3 then 1 else 0 end) as play_count_pro,
sum(case when JSON_EXTRACT(all_json,‘$.“properties”.“action”’)=216 and JSON_EXTRACT(all_json,‘$.“properties”.“author_vip_id”’)=2 then 0.5 when JSON_EXTRACT(all_json,‘$.“properties”.“action”’)=168 and JSON_EXTRACT(all_json,‘$.“properties”.“author_vip_id”’)=3 then 1 else 0 end) as played_count_pro,
sum(case when JSON_EXTRACT(all_json,‘$.“properties”.“action”’)=221 and JSON_EXTRACT(all_json,‘$.“properties”.“author_vip_id”’)=2 then 0.5 when JSON_EXTRACT(all_json,‘$.“properties”.“action”’)=168 and JSON_EXTRACT(all_json,‘$.“properties”.“author_vip_id”’)=3 then 1 else 0 end) as played95_count_pro
from tvcbook where event=‘videoPlay’ and date in (“2020-12-14”,“2020-12-15”,“2020-12-16”,“2020-12-17”,“2020-12-18”,“2020-12-19”,“2020-12-20”) and remark=‘production’ and

JSON_EXTRACT(all_json,‘$.“properties”.“author_user_id”’) not in (‘17255d5e7ae38559’,‘1d6e526952b43de9’,‘7b0abb9b6fc5d193’)

group by video_id

其中斜体字部分,如果括号内只保留一个,则sql执行正常,如果多于一个,则报错

1105 - Illegal Json text: Error(“trailing characters”, line: 1, column: 6)

还有一个问题比较奇怪,但不是这次bug的重点,mysql查询都是 date>= and date<= 这样用性能高于 date in ()。但是在tidb上同样的表结构,就是date in ()性能更好。

临时解决方案:
该问题目前把 not in 换成 多个!= 用and链接跑了,如下方的sql。但是在v3.0.3版本是可以支持not in 多个的,是升级v4.0.8后出现的问题。

select JSON_EXTRACT(all_json,‘$.“properties”.“video_id”’) as video_id,
sum(if(JSON_EXTRACT(all_json,‘$.“properties”.“action”’)=168,1,0)) as play_count,
sum(if(JSON_EXTRACT(all_json,‘$.“properties”.“action”’)=216,1,0)) as played_count,
sum(if(JSON_EXTRACT(all_json,‘$.“properties”.“action”’)=221,1,0)) as played95_count,
sum(if(JSON_EXTRACT(all_json,‘$.“properties”.“action”’)=229,1,0)) as originality_count,
sum(if(JSON_EXTRACT(all_json,‘$.“properties”.“action”’)=223,1,0)) as fav_count,
sum(if(JSON_EXTRACT(all_json,‘$.“properties”.“action”’)=242,1,0)) as download_count,
sum(if(JSON_EXTRACT(all_json,‘$.“properties”.“action”’)=168 and JSON_EXTRACT(all_json,‘$.“properties”.“recommend”’)>0,1,0)) as recommand_count,
sum(case when JSON_EXTRACT(all_json,‘$.“properties”.“action”’)=168 and JSON_EXTRACT(all_json,‘$.“properties”.“author_vip_id”’)=2 then 0.5 when JSON_EXTRACT(all_json,‘$.“properties”.“action”’)=168 and JSON_EXTRACT(all_json,‘$.“properties”.“author_vip_id”’)=3 then 1 else 0 end) as play_count_pro,
sum(case when JSON_EXTRACT(all_json,‘$.“properties”.“action”’)=216 and JSON_EXTRACT(all_json,‘$.“properties”.“author_vip_id”’)=2 then 0.5 when JSON_EXTRACT(all_json,‘$.“properties”.“action”’)=168 and JSON_EXTRACT(all_json,‘$.“properties”.“author_vip_id”’)=3 then 1 else 0 end) as played_count_pro,
sum(case when JSON_EXTRACT(all_json,‘$.“properties”.“action”’)=221 and JSON_EXTRACT(all_json,‘$.“properties”.“author_vip_id”’)=2 then 0.5 when JSON_EXTRACT(all_json,‘$.“properties”.“action”’)=168 and JSON_EXTRACT(all_json,‘$.“properties”.“author_vip_id”’)=3 then 1 else 0 end) as played95_count_pro
from tvcbook where event=‘videoPlay’ and date in (“2020-12-14”,“2020-12-15”,“2020-12-16”,“2020-12-17”,“2020-12-18”,“2020-12-19”,“2020-12-20”) and remark=‘production’
and JSON_EXTRACT(all_json,‘$.“properties”.“author_user_id”’) !=‘17255d5e7ae38559’
and JSON_EXTRACT(all_json,‘$.“properties”.“author_user_id”’) !=‘1d6e526952b43de9’
and JSON_EXTRACT(all_json,‘$.“properties”.“author_user_id”’) !=‘7b0abb9b6fc5d193’
group by video_id

若提问为性能优化、故障排查类问题,请下载脚本运行。终端输出打印结果,请务必全选并复制粘贴上传。

你好,是否方便发一下表结构 show create table tvcbook,我测一下 v4.0.8 版本

感谢。表结构如下:

CREATE TABLE tvcbook (
track_id bigint(17) DEFAULT NULL,
distinct_id varchar(64) DEFAULT NULL,
lib varchar(255) DEFAULT NULL,
event varchar(255) DEFAULT NULL,
type varchar(255) DEFAULT NULL,
all_json json DEFAULT NULL,
host varchar(255) DEFAULT NULL,
user_agent varchar(2048) DEFAULT NULL,
ua_platform varchar(1024) DEFAULT NULL,
ua_browser varchar(1024) DEFAULT NULL,
ua_version varchar(1024) DEFAULT NULL,
ua_language varchar(1024) DEFAULT NULL,
connection varchar(255) DEFAULT NULL,
pragma varchar(255) DEFAULT NULL,
cache_control varchar(255) DEFAULT NULL,
accept varchar(255) DEFAULT NULL,
accept_encoding varchar(255) DEFAULT NULL,
accept_language varchar(255) DEFAULT NULL,
ip varchar(512) DEFAULT NULL,
ip_city json DEFAULT NULL,
ip_asn json DEFAULT NULL,
url text DEFAULT NULL,
referrer varchar(2048) DEFAULT NULL,
remark varchar(255) DEFAULT NULL,
created_at int(11) DEFAULT NULL,
date date DEFAULT NULL,
hour int(2) DEFAULT NULL,
KEY date (date),
KEY distinct_id (distinct_id),
KEY event (event),
KEY date_hour (date,hour),
KEY event_date (event,date),
KEY event_remark_date (event,remark,date)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

建空表跑上面的两个查询 4.0.8 都没有报错,可能和具体数据有关,可以对比下执行计划有没有差异,尝试将 not in 换成其他数值跑一下试试