多个or 之后 in 会有笛卡儿积

为提高效率,请提供以下信息,问题描述清晰能够更快得到解决:
【 TiDB 使用环境】
v6.1.0(v5.4.1)

【概述】 场景 + 问题概述
CREATE TABLE dashboard_dataset (
dataset_id bigint(20) NOT NULL AUTO_INCREMENT,
user_id varchar(100) NOT NULL,
category_name varchar(100) DEFAULT NULL,
dataset_name varchar(100) DEFAULT NULL,
data_json text DEFAULT NULL,
create_time timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
update_time datetime DEFAULT NULL,
type varchar(100) DEFAULT NULL,
jobUUid varchar(100) DEFAULT NULL,
PRIMARY KEY (dataset_id) /*T![clustered_index] CLUSTERED */,
KEY user_id (user_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=71472179902385

CREATE TABLE dashboard_user (
user_id varchar(50) NOT NULL,
login_name varchar(100) DEFAULT NULL,
user_name varchar(100) DEFAULT NULL,
user_password varchar(100) DEFAULT NULL,
user_status varchar(100) DEFAULT NULL,
PRIMARY KEY (user_id) /*T![clustered_index] NONCLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin

CREATE TABLE dashboard_user_role (
user_role_id bigint(20) NOT NULL AUTO_INCREMENT,
user_id varchar(100) DEFAULT NULL,
role_id varchar(100) DEFAULT NULL,
PRIMARY KEY (user_role_id) /*T![clustered_index] CLUSTERED */,
KEY user_id (user_id),
KEY role_id (role_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=70447245620785

CREATE TABLE dashboard_role_res (
role_res_id bigint(20) NOT NULL AUTO_INCREMENT,
role_id varchar(100) DEFAULT NULL,
res_type varchar(100) DEFAULT NULL,
res_id bigint(20) DEFAULT NULL,
permission varchar(20) DEFAULT NULL,
PRIMARY KEY (role_res_id) /*T![clustered_index] CLUSTERED */,
KEY role_id (role_id),
KEY res_id (res_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=70471468087217

explain SELECT dataset_id AS id, s.user_id AS userId, dataset_name AS name, data_json AS data, category_name AS categoryName, type AS type, jobUUid AS jobUUid, u.user_name AS userName, u.login_name AS loginName, create_time AS createTime, update_time AS updateTime FROM dashboard_dataset s LEFT JOIN dashboard_user u ON u.user_id = s.user_id WHERE “80566854-c62d-4cfb-bd0d-5d7b2cd8d7e7” IN (“feb29428-0cf4-4bf6-95a5-4c7a8f54d8ee”,“35738ee8-c9be-4d63-acda-694778ad13d2”) OR s.user_id = “80566854-c62d-4cfb-bd0d-5d7b2cd8d7e7” OR dataset_id IN ( SELECT res_id FROM dashboard_user_role ur LEFT JOIN dashboard_role_res rr ON ur.role_id = rr.role_id WHERE ur.user_id = “80566854-c62d-4cfb-bd0d-5d7b2cd8d7e7” AND rr.res_type = ‘dataset’ ) ORDER BY dataset_name
【背景】 做过哪些操作

【现象】 业务和数据库现象
CARTESIAN left outer semi join, other cond:eq(custombi.dashboard_dataset.dataset_id, custombi.dashboard_role_res.res_id)
去掉前面两个or之后就变正常了(两个or在业务中是为了判断是否超级管理员)
【问题】 当前遇到的问题
笛卡儿积导致缓慢
【业务影响】

【TiDB 版本】

【应用软件及版本】

【附件】 相关日志及配置信息

  • TiUP Cluster Display 信息
  • TiUP CLuster Edit config 信息

监控(https://metricstool.pingcap.com/)

  • TiDB-Overview Grafana监控
  • TiDB Grafana 监控
  • TiKV Grafana 监控
  • PD Grafana 监控
  • 对应模块日志(包含问题前后 1 小时日志)

若提问为性能优化、故障排查类问题,请下载脚本运行。终端输出的打印结果,请务必全选并复制粘贴上传。

先改写一下SQL试试。把IN里面那个LEFT JOIN改成JOIN,这样改是等价的,看看还报不报笛卡尔积

还是一样的

OK,再试下下面这种写法呢

SELECT 字段列表
FROM dashboard_dataset s LEFT JOIN dashboard_user u ON u.user_id = s.user_id
WHERE xxx
OR xxx

UNION ALL

SELECT 字段列表
FROM dashboard_dataset s
JOIN (SELECT DISTINCT res_id FROM dashboard_user_role ur JOIN dashboard_role_res rr ON ur.role_id = rr.role_id WHERE ur.user_id = “80566854-c62d-4cfb-bd0d-5d7b2cd8d7e7” AND rr.res_type = ‘dataset’) a ON s.dataset_id = a.res_id
LEFT JOIN dashboard_user u ON u.user_id = s.user_id

ORDER BY dataset_name

SELECT dataset_id AS id, s.user_id AS userId, dataset_name AS name, data_json AS data, category_name AS categoryName, type AS type, jobUUid AS jobUUid, u.user_name AS userName, u.login_name AS loginName, create_time AS createTime, update_time AS updateTime FROM dashboard_dataset s LEFT JOIN dashboard_user u ON u.user_id = s.user_id left join ( SELECT res_id FROM dashboard_user_role ur LEFT JOIN dashboard_role_res rr ON ur.role_id = rr.role_id WHERE ur.user_id = “80566854-c62d-4cfb-bd0d-5d7b2cd8d7e7” AND rr.res_type = ‘dataset’ ) x on s.dataset_id=x.res_id WHERE “80566854-c62d-4cfb-bd0d-5d7b2cd8d7e7” IN (“feb29428-0cf4-4bf6-95a5-4c7a8f54d8ee”,“35738ee8-c9be-4d63-acda-694778ad13d2”) OR s.user_id = “80566854-c62d-4cfb-bd0d-5d7b2cd8d7e7” OR x.res_id is not null ORDER BY dataset_name
我改成这样就好了,就是想知道为什么一开始那个不行,这个应用是从mysql迁过来的,有很多类似sql,一个个改太费劲了,就想看看是什么原因导致的笛卡尔

你这样改不是等价的哦。
从报错来看,估计是优化器认为这里的左连接和半连接的组合会产生笛卡尔积,具体我也没去看过源码。
所以我的思路就是怎么等价改写来避开左连接和半连接的组合

问题不是在左连接和半连接那里,如果去掉那两个or,最后的in就能正常判断为inner join
explain SELECT dataset_id AS id, s.user_id AS userId, dataset_name AS name, data_json AS data, category_name AS categoryName, type AS type, jobUUid AS jobUUid, u.user_name AS userName, u.login_name AS loginName, create_time AS createTime, update_time AS updateTime FROM dashboard_dataset s LEFT JOIN dashboard_user u ON u.user_id = s.user_id WHERE dataset_id IN ( SELECT res_id FROM dashboard_user_role ur JOIN dashboard_role_res rr ON ur.role_id = rr.role_id WHERE ur.user_id = “80566854-c62d-4cfb-bd0d-5d7b2cd8d7e7” AND rr.res_type = ‘dataset’ ) ORDER BY dataset_name

inner join, equal:[eq(custombi.dashboard_dataset.dataset_id, custombi.dashboard_role_res.res_id)]

贴一下原始SQL完整的执行计划看看

| id | estRows | task | access object | operator info |
±---------------------------------------------±--------±----------±---------------------------------±-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Sort_15 | 3881.60 | root | | custombi.dashboard_dataset.dataset_name |
| └─Projection_17 | 3881.60 | root | | custombi.dashboard_dataset.dataset_id, custombi.dashboard_dataset.user_id, custombi.dashboard_dataset.dataset_name, custombi.dashboard_dataset.data_json, custombi.dashboard_dataset.category_name, custombi.dashboard_dataset.type, custombi.dashboard_dataset.jobuuid, custombi.dashboard_user.user_name, custombi.dashboard_user.login_name, custombi.dashboard_dataset.create_time, custombi.dashboard_dataset.update_time |
| └─Selection_18 | 3881.60 | root | | or(0, or(eq(custombi.dashboard_dataset.user_id, “80566854-c62d-4cfb-bd0d-5d7b2cd8d7e7”), Column#24)) |
| └─HashJoin_19 | 4852.00 | root | | CARTESIAN left outer semi join, other cond:eq(custombi.dashboard_dataset.dataset_id, custombi.dashboard_role_res.res_id) |
| ├─IndexHashJoin_54(Build) | 2758.80 | root | | inner join, inner:IndexLookUp_51, outer key:custombi.dashboard_user_role.role_id, inner key:custombi.dashboard_role_res.role_id, equal cond:eq(custombi.dashboard_user_role.role_id, custombi.dashboard_role_res.role_id) |
| │ ├─IndexLookUp_97(Build) | 10.00 | root | | |
| │ │ ├─IndexRangeScan_94(Build) | 10.00 | cop[tikv] | table:ur, index:user_id(user_id) | range:[“80566854-c62d-4cfb-bd0d-5d7b2cd8d7e7”,“80566854-c62d-4cfb-bd0d-5d7b2cd8d7e7”], keep order:false |
| │ │ └─Selection_96(Probe) | 10.00 | cop[tikv] | | not(isnull(custombi.dashboard_user_role.role_id)) |
| │ │ └─TableRowIDScan_95 | 10.00 | cop[tikv] | table:ur | keep order:false |
| │ └─IndexLookUp_51(Probe) | 275.88 | root | | |
| │ ├─Selection_49(Build) | 1036.61 | cop[tikv] | | not(isnull(custombi.dashboard_role_res.role_id)) |
| │ │ └─IndexRangeScan_47 | 1036.61 | cop[tikv] | table:rr, index:role_id(role_id) | range: decided by [eq(custombi.dashboard_role_res.role_id, custombi.dashboard_user_role.role_id)], keep order:false |
| │ └─Selection_50(Probe) | 275.88 | cop[tikv] | | eq(custombi.dashboard_role_res.res_type, “dataset”) |
| │ └─TableRowIDScan_48 | 1036.61 | cop[tikv] | table:rr | keep order:false |
| └─HashJoin_33(Probe) | 4852.00 | root | | left outer join, equal:[eq(custombi.dashboard_dataset.user_id, custombi.dashboard_user.user_id)] |
| ├─TableReader_45(Build) | 1254.00 | root | | data:TableFullScan_44 |
| │ └─TableFullScan_44 | 1254.00 | cop[tikv] | table:u | keep order:false |
| └─TableReader_43(Probe) | 4852.00 | root | | data:TableFullScan_42 |
| └─TableFullScan_42 | 4852.00 | cop[tikv] | table:s | keep order:false |
±---------------------------------------------±--------±----------±---------------------------------±-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

tidb的开发规范就是把or改成in()或者union。
image
专栏 - TiDB 数据库开发规范 | TiDB 社区

看了一下执行计划,大概步骤是这样的:
首先"80566854-c62d-4cfb-bd0d-5d7b2cd8d7e7" IN (“feb29428-0cf4-4bf6-95a5-4c7a8f54d8ee”,“35738ee8-c9be-4d63-acda-694778ad13d2”) 这个条件直接被抛弃掉了

1、97~48:ur表和rr表先做NL内连接,并且都走了索引
2、45~33:u表和s表再做Hash外连接,都是全表扫
3、19~18:上述2个结果集进行笛卡尔连接,并进行逻辑OR

所以看下你不带OR的SQL,执行计划是不是表s和ur、rr先做半连接,最后再和u表做外连接?另外这个CARTESIAN只是笛卡尔连接,不会影响你SQL最终返回的结果

1 个赞

该主题在最后一个回复创建后60天后自动关闭。不再允许新的回复。