v6.5.2(包括目前的7.1)版本默认配置虽然利用paging能力解决了region过大导致大量数据返回给tidb-server导致的oom问题,但是即使使用paging能力还是存在如下问题:
-
- 默认tidb_enable_chunk_rpc=ON下,数据类型内存放大问题,尤其decimal极端场景下会放大40倍。
-
- union all、tidb_index_lookup_concurrency、tidb_index_lookup_size、tidb_hash_join_concurrency等并发参数导致distsql请求的放大。
-
- tidb_min_paging_size、tidb_max_paging_size,行大小不固定,如果大量decimal类型数据返回会导致内存急剧升高。
-
- hashAgg、sort等落盘速度跟不上tikv传过来数据的速度导致RPC返回来的数据大量堆积。
-
- hashJoin、hashAgg等虽然可以落盘,但是行指针均存放在内存中,数据量过大时无法全部缓存指针也会发生OOM,这种在实际生产中有些较大数据量处理还是很容易发生,
发生了再调整参数属于事后处理而且不知道哪些需要调调多少。该场景需要较大数据量模拟,暂不测试。
相关案例参考:当HashJoin的BuildSide过大时容易OOM
- hashJoin、hashAgg等虽然可以落盘,但是行指针均存放在内存中,数据量过大时无法全部缓存指针也会发生OOM,这种在实际生产中有些较大数据量处理还是很容易发生,
总结为:内存放大、并发放大、一个paging(多行)内存不固定、tikv-client缓存数据未与算子分隔内存管控太粗导致的语句级OOM问题。
根据该特性模拟较为极端SQL语句,希望产品可以优化能让下面这种语句在默认配置下可以不发生OOM(慢慢执行,不发生OOM,最终执行完毕)。
涉及默认配置项为:
mysql> show variables like '%tidb_mem_quota_query%';
+----------------------+------------+
| Variable_name | Value |
+----------------------+------------+
| tidb_mem_quota_query | 1073741824 |
+----------------------+------------+
1 row in set (0.00 sec)
mysql> show variables like 'tidb_enable_chunk_rpc';
+-----------------------+-------+
| Variable_name | Value |
+-----------------------+-------+
| tidb_enable_chunk_rpc | ON |
+-----------------------+-------+
1 row in set (0.00 sec)
mysql> show variables like 'tidb_executor_concurrency';
+---------------------------+-------+
| Variable_name | Value |
+---------------------------+-------+
| tidb_executor_concurrency | 5 |
+---------------------------+-------+
1 row in set (0.00 sec)
mysql> show variables like 'tidb_distsql_scan_concurrency';
+-------------------------------+-------+
| Variable_name | Value |
+-------------------------------+-------+
| tidb_distsql_scan_concurrency | 15 |
+-------------------------------+-------+
1 row in set (0.00 sec)
mysql> show variables like 'tidb_index_lookup_size';
+------------------------+-------+
| Variable_name | Value |
+------------------------+-------+
| tidb_index_lookup_size | 20000 |
+------------------------+-------+
1 row in set (0.00 sec)
mysql> show variables like 'tidb_index_lookup_concurrency';
+-------------------------------+-------+
| Variable_name | Value |
+-------------------------------+-------+
| tidb_index_lookup_concurrency | -1 |
+-------------------------------+-------+
1 row in set (0.00 sec)
mysql> show variables like '%paging%';
+----------------------+-------+
| Variable_name | Value |
+----------------------+-------+
| tidb_enable_paging | ON |
| tidb_max_paging_size | 50000 |
| tidb_min_paging_size | 128 |
+----------------------+-------+
3 rows in set (0.01 sec)
加载tpch为1的测试数据,并对lineitem表添加decimal字段
time tiup bench tpch -D tpch1 -H 192.168.31.201 -P 4000 -U root -p '' -T 5 prepare --ignore-error --sf=1
--添加decimal字段(内存放大效应)
alter table lineitem add column c1 decimal(5,2) not null default 1.0;
alter table lineitem add column c2 decimal(5,2) not null default 1.0;
alter table lineitem add column c3 decimal(5,2) not null default 1.0;
alter table lineitem add column c4 decimal(5,2) not null default 1.0;
alter table lineitem add column c5 decimal(5,2) not null default 1.0;
alter table lineitem add column c6 decimal(5,2) not null default 1.0;
alter table lineitem add column c7 decimal(5,2) not null default 1.0;
alter table lineitem add column c8 decimal(5,2) not null default 1.0;
alter table lineitem add column c9 decimal(5,2) not null default 1.0;
alter table lineitem add column c10 decimal(5,2) not null default 1.0;
alter table lineitem add column c11 decimal(5,2) not null default 1.0;
alter table lineitem add column c12 decimal(5,2) not null default 1.0;
alter table lineitem add column c13 decimal(5,2) not null default 1.0;
alter table lineitem add column c14 decimal(5,2) not null default 1.0;
alter table lineitem add column c15 decimal(5,2) not null default 1.0;
alter table lineitem add column c16 decimal(5,2) not null default 1.0;
alter table lineitem add column c17 decimal(5,2) not null default 1.0;
alter table lineitem add column c18 decimal(5,2) not null default 1.0;
alter table lineitem add column c19 decimal(5,2) not null default 1.0;
alter table lineitem add column c20 decimal(5,2) not null default 1.0;
--添加索引(索引
alter table lineitem add index idx1 (L_RETURNFLAG,L_SHIPDATE);
表结构如下:
mysql> show create table lineitem \G
*************************** 1. row ***************************
Table: lineitem
Create Table: CREATE TABLE `lineitem` (
`L_ORDERKEY` bigint(20) NOT NULL,
`L_PARTKEY` bigint(20) NOT NULL,
`L_SUPPKEY` bigint(20) NOT NULL,
`L_LINENUMBER` bigint(20) NOT NULL,
`L_QUANTITY` decimal(15,2) NOT NULL,
`L_EXTENDEDPRICE` decimal(15,2) NOT NULL,
`L_DISCOUNT` decimal(15,2) NOT NULL,
`L_TAX` decimal(15,2) NOT NULL,
`L_RETURNFLAG` char(1) NOT NULL,
`L_LINESTATUS` char(1) NOT NULL,
`L_SHIPDATE` date NOT NULL,
`L_COMMITDATE` date NOT NULL,
`L_RECEIPTDATE` date NOT NULL,
`L_SHIPINSTRUCT` char(25) NOT NULL,
`L_SHIPMODE` char(10) NOT NULL,
`L_COMMENT` varchar(44) NOT NULL,
`c1` decimal(5,2) NOT NULL DEFAULT '1.0',
`c2` decimal(5,2) NOT NULL DEFAULT '1.0',
`c3` decimal(5,2) NOT NULL DEFAULT '1.0',
`c4` decimal(5,2) NOT NULL DEFAULT '1.0',
`c5` decimal(5,2) NOT NULL DEFAULT '1.0',
`c6` decimal(5,2) NOT NULL DEFAULT '1.0',
`c7` decimal(5,2) NOT NULL DEFAULT '1.0',
`c8` decimal(5,2) NOT NULL DEFAULT '1.0',
`c9` decimal(5,2) NOT NULL DEFAULT '1.0',
`c10` decimal(5,2) NOT NULL DEFAULT '1.0',
`c11` decimal(5,2) NOT NULL DEFAULT '1.0',
`c12` decimal(5,2) NOT NULL DEFAULT '1.0',
`c13` decimal(5,2) NOT NULL DEFAULT '1.0',
`c14` decimal(5,2) NOT NULL DEFAULT '1.0',
`c15` decimal(5,2) NOT NULL DEFAULT '1.0',
`c16` decimal(5,2) NOT NULL DEFAULT '1.0',
`c17` decimal(5,2) NOT NULL DEFAULT '1.0',
`c18` decimal(5,2) NOT NULL DEFAULT '1.0',
`c19` decimal(5,2) NOT NULL DEFAULT '1.0',
`c20` decimal(5,2) NOT NULL DEFAULT '1.0',
PRIMARY KEY (`L_ORDERKEY`,`L_LINENUMBER`) /*T![clustered_index] CLUSTERED */,
KEY `idx1` (`L_RETURNFLAG`,`L_SHIPDATE`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
1 row in set (0.00 sec)
--表数据量为:
mysql> select count(*) from lineitem;
+----------+
| count(*) |
+----------+
| 6001215 |
+----------+
1 row in set (0.67 sec)
mysql> select count(*) from orders;
+----------+
| count(*) |
+----------+
| 1500000 |
+----------+
1 row in set (0.24 sec)
mysql> select count(*) from customer;
+----------+
| count(*) |
+----------+
| 150000 |
+----------+
1 row in set (0.11 sec)
--具体可以创建tpch --sf=1 的测试数据来观察。
测试SQL语句
--语句中包含union all 多个union all语句加剧distsql_scan
--语句最外层做笛卡尔积,但是probe端记录数为1000,build端数据为644207 * 5 = 3221035,因此即使笛卡尔积总扫描次数为:3221035 * 1000= 32亿次,不算多,理论上可以跑的完。
--做笛卡尔积的原因是,在较多场景中会存在此类情况,比如数据的多对多,null情况下的反半连接,错误的执行计划(评估数据量较少,可能走笛卡尔积更优)等等,笛卡尔积形式如果可以管控住其它较多场景也就涵盖住了。
explain analyze select /*+ HASH_JOIN_BUILD(b) */ * from
(select /*+ hash_join(a,b,c) */ a.*,c.* from customer a left join orders b on a.C_CUSTKEY=b.O_ORDERKEY left join orders c on b.O_ORDERKEY=c.O_ORDERKEY limit 1000) a
left join
(
select * from (select /*+use_index(lineitem,idx1) */ row_number()over(partition by L_ORDERKEY) as nbr ,lineitem.* from lineitem where L_RETURNFLAG=(select L_RETURNFLAG from lineitem limit 1) and L_SHIPDATE < date_sub((select max(L_SHIPDATE) from lineitem),interval 1 year)) a where nbr=1
union all
select * from (select /*+use_index(lineitem,idx1) */ row_number()over(partition by L_ORDERKEY) as nbr ,lineitem.* from lineitem where L_RETURNFLAG=(select L_RETURNFLAG from lineitem limit 1) and L_SHIPDATE < date_sub((select max(L_SHIPDATE) from lineitem),interval 1 year)) a where nbr=1
union all
select * from (select /*+use_index(lineitem,idx1) */ row_number()over(partition by L_ORDERKEY) as nbr ,lineitem.* from lineitem where L_RETURNFLAG=(select L_RETURNFLAG from lineitem limit 1) and L_SHIPDATE < date_sub((select max(L_SHIPDATE) from lineitem),interval 1 year)) a where nbr=1
union all
select * from (select /*+use_index(lineitem,idx1) */ row_number()over(partition by L_ORDERKEY) as nbr ,lineitem.* from lineitem where L_RETURNFLAG=(select L_RETURNFLAG from lineitem limit 1) and L_SHIPDATE < date_sub((select max(L_SHIPDATE) from lineitem),interval 1 year)) a where nbr=1
union all
select * from (select /*+use_index(lineitem,idx1) */ row_number()over(partition by L_ORDERKEY) as nbr ,lineitem.* from lineitem where L_RETURNFLAG=(select L_RETURNFLAG from lineitem limit 1) and L_SHIPDATE < date_sub((select max(L_SHIPDATE) from lineitem),interval 1 year)) a where nbr=1
) b on 1=1
order by b.L_QUANTITY desc,b.L_EXTENDEDPRICE asc,b.L_DISCOUNT desc limit 10000000,10;
整个语句的执行计划:
+-----------------------------------------------------+---------------+-----------+------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+-----------------------------------------------------+---------------+-----------+------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| TopN_273 | 10.00 | root | | Column#597:desc, Column#598, Column#599:desc, offset:10000000, count:10 |
| └─HashJoin_278 | 5502754848.24 | root | | CARTESIAN left outer join |
| ├─Union_316(Build) | 5502754.85 | root | | |
| │ ├─Projection_317 | 1100550.97 | root | | Column#139, tpch1.lineitem.l_orderkey, tpch1.lineitem.l_partkey, tpch1.lineitem.l_suppkey, tpch1.lineitem.l_linenumber, tpch1.lineitem.l_quantity, tpch1.lineitem.l_extendedprice, tpch1.lineitem.l_discount, tpch1.lineitem.l_tax, tpch1.lineitem.l_returnflag, tpch1.lineitem.l_linestatus, tpch1.lineitem.l_shipdate, tpch1.lineitem.l_commitdate, tpch1.lineitem.l_receiptdate, tpch1.lineitem.l_shipinstruct, tpch1.lineitem.l_shipmode, tpch1.lineitem.l_comment, tpch1.lineitem.c1, tpch1.lineitem.c2, tpch1.lineitem.c3, tpch1.lineitem.c4, tpch1.lineitem.c5, tpch1.lineitem.c6, tpch1.lineitem.c7, tpch1.lineitem.c8, tpch1.lineitem.c9, tpch1.lineitem.c10, tpch1.lineitem.c11, tpch1.lineitem.c12, tpch1.lineitem.c13, tpch1.lineitem.c14, tpch1.lineitem.c15, tpch1.lineitem.c16, tpch1.lineitem.c17, tpch1.lineitem.c18, tpch1.lineitem.c19, tpch1.lineitem.c20 |
| │ │ └─Selection_318 | 1100550.97 | root | | eq(Column#139, 1) |
| │ │ └─Shuffle_324 | 1375688.71 | root | | execution info: concurrency:5, data sources:[IndexLookUp_322] |
| │ │ └─Window_319 | 1375688.71 | root | | row_number()->Column#139 over(partition by tpch1.lineitem.l_orderkey rows between current row and current row) |
| │ │ └─Sort_323 | 1375688.71 | root | | tpch1.lineitem.l_orderkey |
| │ │ └─IndexLookUp_322 | 1375688.71 | root | | |
| │ │ ├─IndexRangeScan_320(Build) | 1375688.71 | cop[tikv] | table:lineitem, index:idx1(L_RETURNFLAG, L_SHIPDATE) | range:["A" -inf,"A" 1997-12-01), keep order:false |
| │ │ └─TableRowIDScan_321(Probe) | 1375688.71 | cop[tikv] | table:lineitem | keep order:false |
| │ ├─Projection_325 | 1100550.97 | root | | Column#252, tpch1.lineitem.l_orderkey, tpch1.lineitem.l_partkey, tpch1.lineitem.l_suppkey, tpch1.lineitem.l_linenumber, tpch1.lineitem.l_quantity, tpch1.lineitem.l_extendedprice, tpch1.lineitem.l_discount, tpch1.lineitem.l_tax, tpch1.lineitem.l_returnflag, tpch1.lineitem.l_linestatus, tpch1.lineitem.l_shipdate, tpch1.lineitem.l_commitdate, tpch1.lineitem.l_receiptdate, tpch1.lineitem.l_shipinstruct, tpch1.lineitem.l_shipmode, tpch1.lineitem.l_comment, tpch1.lineitem.c1, tpch1.lineitem.c2, tpch1.lineitem.c3, tpch1.lineitem.c4, tpch1.lineitem.c5, tpch1.lineitem.c6, tpch1.lineitem.c7, tpch1.lineitem.c8, tpch1.lineitem.c9, tpch1.lineitem.c10, tpch1.lineitem.c11, tpch1.lineitem.c12, tpch1.lineitem.c13, tpch1.lineitem.c14, tpch1.lineitem.c15, tpch1.lineitem.c16, tpch1.lineitem.c17, tpch1.lineitem.c18, tpch1.lineitem.c19, tpch1.lineitem.c20 |
| │ │ └─Selection_326 | 1100550.97 | root | | eq(Column#252, 1) |
| │ │ └─Shuffle_332 | 1375688.71 | root | | execution info: concurrency:5, data sources:[IndexLookUp_330] |
| │ │ └─Window_327 | 1375688.71 | root | | row_number()->Column#252 over(partition by tpch1.lineitem.l_orderkey rows between current row and current row) |
| │ │ └─Sort_331 | 1375688.71 | root | | tpch1.lineitem.l_orderkey |
| │ │ └─IndexLookUp_330 | 1375688.71 | root | | |
| │ │ ├─IndexRangeScan_328(Build) | 1375688.71 | cop[tikv] | table:lineitem, index:idx1(L_RETURNFLAG, L_SHIPDATE) | range:["A" -inf,"A" 1997-12-01), keep order:false |
| │ │ └─TableRowIDScan_329(Probe) | 1375688.71 | cop[tikv] | table:lineitem | keep order:false |
| │ ├─Projection_333 | 1100550.97 | root | | Column#365, tpch1.lineitem.l_orderkey, tpch1.lineitem.l_partkey, tpch1.lineitem.l_suppkey, tpch1.lineitem.l_linenumber, tpch1.lineitem.l_quantity, tpch1.lineitem.l_extendedprice, tpch1.lineitem.l_discount, tpch1.lineitem.l_tax, tpch1.lineitem.l_returnflag, tpch1.lineitem.l_linestatus, tpch1.lineitem.l_shipdate, tpch1.lineitem.l_commitdate, tpch1.lineitem.l_receiptdate, tpch1.lineitem.l_shipinstruct, tpch1.lineitem.l_shipmode, tpch1.lineitem.l_comment, tpch1.lineitem.c1, tpch1.lineitem.c2, tpch1.lineitem.c3, tpch1.lineitem.c4, tpch1.lineitem.c5, tpch1.lineitem.c6, tpch1.lineitem.c7, tpch1.lineitem.c8, tpch1.lineitem.c9, tpch1.lineitem.c10, tpch1.lineitem.c11, tpch1.lineitem.c12, tpch1.lineitem.c13, tpch1.lineitem.c14, tpch1.lineitem.c15, tpch1.lineitem.c16, tpch1.lineitem.c17, tpch1.lineitem.c18, tpch1.lineitem.c19, tpch1.lineitem.c20 |
| │ │ └─Selection_334 | 1100550.97 | root | | eq(Column#365, 1) |
| │ │ └─Shuffle_340 | 1375688.71 | root | | execution info: concurrency:5, data sources:[IndexLookUp_338] |
| │ │ └─Window_335 | 1375688.71 | root | | row_number()->Column#365 over(partition by tpch1.lineitem.l_orderkey rows between current row and current row) |
| │ │ └─Sort_339 | 1375688.71 | root | | tpch1.lineitem.l_orderkey |
| │ │ └─IndexLookUp_338 | 1375688.71 | root | | |
| │ │ ├─IndexRangeScan_336(Build) | 1375688.71 | cop[tikv] | table:lineitem, index:idx1(L_RETURNFLAG, L_SHIPDATE) | range:["A" -inf,"A" 1997-12-01), keep order:false |
| │ │ └─TableRowIDScan_337(Probe) | 1375688.71 | cop[tikv] | table:lineitem | keep order:false |
| │ ├─Projection_341 | 1100550.97 | root | | Column#478, tpch1.lineitem.l_orderkey, tpch1.lineitem.l_partkey, tpch1.lineitem.l_suppkey, tpch1.lineitem.l_linenumber, tpch1.lineitem.l_quantity, tpch1.lineitem.l_extendedprice, tpch1.lineitem.l_discount, tpch1.lineitem.l_tax, tpch1.lineitem.l_returnflag, tpch1.lineitem.l_linestatus, tpch1.lineitem.l_shipdate, tpch1.lineitem.l_commitdate, tpch1.lineitem.l_receiptdate, tpch1.lineitem.l_shipinstruct, tpch1.lineitem.l_shipmode, tpch1.lineitem.l_comment, tpch1.lineitem.c1, tpch1.lineitem.c2, tpch1.lineitem.c3, tpch1.lineitem.c4, tpch1.lineitem.c5, tpch1.lineitem.c6, tpch1.lineitem.c7, tpch1.lineitem.c8, tpch1.lineitem.c9, tpch1.lineitem.c10, tpch1.lineitem.c11, tpch1.lineitem.c12, tpch1.lineitem.c13, tpch1.lineitem.c14, tpch1.lineitem.c15, tpch1.lineitem.c16, tpch1.lineitem.c17, tpch1.lineitem.c18, tpch1.lineitem.c19, tpch1.lineitem.c20 |
| │ │ └─Selection_342 | 1100550.97 | root | | eq(Column#478, 1) |
| │ │ └─Shuffle_348 | 1375688.71 | root | | execution info: concurrency:5, data sources:[IndexLookUp_346] |
| │ │ └─Window_343 | 1375688.71 | root | | row_number()->Column#478 over(partition by tpch1.lineitem.l_orderkey rows between current row and current row) |
| │ │ └─Sort_347 | 1375688.71 | root | | tpch1.lineitem.l_orderkey |
| │ │ └─IndexLookUp_346 | 1375688.71 | root | | |
| │ │ ├─IndexRangeScan_344(Build) | 1375688.71 | cop[tikv] | table:lineitem, index:idx1(L_RETURNFLAG, L_SHIPDATE) | range:["A" -inf,"A" 1997-12-01), keep order:false |
| │ │ └─TableRowIDScan_345(Probe) | 1375688.71 | cop[tikv] | table:lineitem | keep order:false |
| │ └─Projection_349 | 1100550.97 | root | | Column#591, tpch1.lineitem.l_orderkey, tpch1.lineitem.l_partkey, tpch1.lineitem.l_suppkey, tpch1.lineitem.l_linenumber, tpch1.lineitem.l_quantity, tpch1.lineitem.l_extendedprice, tpch1.lineitem.l_discount, tpch1.lineitem.l_tax, tpch1.lineitem.l_returnflag, tpch1.lineitem.l_linestatus, tpch1.lineitem.l_shipdate, tpch1.lineitem.l_commitdate, tpch1.lineitem.l_receiptdate, tpch1.lineitem.l_shipinstruct, tpch1.lineitem.l_shipmode, tpch1.lineitem.l_comment, tpch1.lineitem.c1, tpch1.lineitem.c2, tpch1.lineitem.c3, tpch1.lineitem.c4, tpch1.lineitem.c5, tpch1.lineitem.c6, tpch1.lineitem.c7, tpch1.lineitem.c8, tpch1.lineitem.c9, tpch1.lineitem.c10, tpch1.lineitem.c11, tpch1.lineitem.c12, tpch1.lineitem.c13, tpch1.lineitem.c14, tpch1.lineitem.c15, tpch1.lineitem.c16, tpch1.lineitem.c17, tpch1.lineitem.c18, tpch1.lineitem.c19, tpch1.lineitem.c20 |
| │ └─Selection_350 | 1100550.97 | root | | eq(Column#591, 1) |
| │ └─Shuffle_356 | 1375688.71 | root | | execution info: concurrency:5, data sources:[IndexLookUp_354] |
| │ └─Window_351 | 1375688.71 | root | | row_number()->Column#591 over(partition by tpch1.lineitem.l_orderkey rows between current row and current row) |
| │ └─Sort_355 | 1375688.71 | root | | tpch1.lineitem.l_orderkey |
| │ └─IndexLookUp_354 | 1375688.71 | root | | |
| │ ├─IndexRangeScan_352(Build) | 1375688.71 | cop[tikv] | table:lineitem, index:idx1(L_RETURNFLAG, L_SHIPDATE) | range:["A" -inf,"A" 1997-12-01), keep order:false |
| │ └─TableRowIDScan_353(Probe) | 1375688.71 | cop[tikv] | table:lineitem | keep order:false |
| └─Limit_281(Probe) | 1000.00 | root | | offset:0, count:1000 |
| └─HashJoin_292 | 1000.00 | root | | left outer join, equal:[eq(tpch1.orders.o_orderkey, tpch1.orders.o_orderkey)] |
| ├─Limit_295(Build) | 1000.00 | root | | offset:0, count:1000 |
| │ └─HashJoin_306 | 1000.00 | root | | left outer join, equal:[eq(tpch1.customer.c_custkey, tpch1.orders.o_orderkey)] |
| │ ├─Limit_307(Build) | 1000.00 | root | | offset:0, count:1000 |
| │ │ └─TableReader_311 | 1000.00 | root | | data:Limit_310 |
| │ │ └─Limit_310 | 1000.00 | cop[tikv] | | offset:0, count:1000 |
| │ │ └─TableFullScan_309 | 1000.00 | cop[tikv] | table:a | keep order:false |
| │ └─TableReader_313(Probe) | 1872736.00 | root | | data:TableFullScan_312 |
| │ └─TableFullScan_312 | 1872736.00 | cop[tikv] | table:b | keep order:false |
| └─TableReader_315(Probe) | 1872736.00 | root | | data:TableFullScan_314 |
| └─TableFullScan_314 | 1872736.00 | cop[tikv] | table:c | keep order:false |
+-----------------------------------------------------+---------------+-----------+------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
55 rows in set (3.48 sec)
其中HashJoin_271的Build端是由多个union all组成,union中的一个子语句(如:Projection_310)的执行信息如下:
mysql> explain analyze select * from (select /*+use_index(lineitem,idx1) */ row_number()over(partition by L_ORDERKEY) as nbr ,lineitem.* from lineitem where L_RETURNFLAG=(select L_RETURNFLAG from lineitem limit 1) and L_SHIPDATE < date_sub((select max(L_SHIPDATE) from lineitem),interval 1 year)) a where nbr=1;
+--------------------------------------------+------------+---------+-----------+------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+----------+
| id | estRows | actRows | task | access object | execution info | operator info | memory | disk |
+--------------------------------------------+------------+---------+-----------+------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+----------+
| Projection_50 | 1100550.97 | 644207 | root | | time:1m11.9s, loops:631, Concurrency:5 | Column#113, tpch1.lineitem.l_orderkey, tpch1.lineitem.l_partkey, tpch1.lineitem.l_suppkey, tpch1.lineitem.l_linenumber, tpch1.lineitem.l_quantity, tpch1.lineitem.l_extendedprice, tpch1.lineitem.l_discount, tpch1.lineitem.l_tax, tpch1.lineitem.l_returnflag, tpch1.lineitem.l_linestatus, tpch1.lineitem.l_shipdate, tpch1.lineitem.l_commitdate, tpch1.lineitem.l_receiptdate, tpch1.lineitem.l_shipinstruct, tpch1.lineitem.l_shipmode, tpch1.lineitem.l_comment, tpch1.lineitem.c1, tpch1.lineitem.c2, tpch1.lineitem.c3, tpch1.lineitem.c4, tpch1.lineitem.c5, tpch1.lineitem.c6, tpch1.lineitem.c7, tpch1.lineitem.c8, tpch1.lineitem.c9, tpch1.lineitem.c10, tpch1.lineitem.c11, tpch1.lineitem.c12, tpch1.lineitem.c13, tpch1.lineitem.c14, tpch1.lineitem.c15, tpch1.lineitem.c16, tpch1.lineitem.c17, tpch1.lineitem.c18, tpch1.lineitem.c19, tpch1.lineitem.c20 | 11.5 MB | N/A |
| └─Selection_51 | 1100550.97 | 644207 | root | | time:1m11.9s, loops:631 | eq(Column#113, 1) | 1.13 MB | N/A |
| └─Shuffle_57 | 1375688.71 | 1478493 | root | | time:1m10.9s, loops:1448, ShuffleConcurrency:5 | execution info: concurrency:5, data sources:[IndexLookUp_55] | N/A | N/A |
| └─Window_52 | 1375688.71 | 1478493 | root | | time:3m53.5s, loops:1451 | row_number()->Column#113 over(partition by tpch1.lineitem.l_orderkey rows between current row and current row) | N/A | N/A |
| └─Sort_56 | 1375688.71 | 1478493 | root | | time:3m53s, loops:1451 | tpch1.lineitem.l_orderkey | 190.1 MB | 223.9 MB |
| └─ShuffleReceiver_58 | 1375688.71 | 1478493 | root | | time:1m1s, loops:1451 | | N/A | N/A |
| └─IndexLookUp_55 | 1375688.71 | 1478493 | root | | time:5.72s, loops:1445, index_task: {total_time: 12.4s, fetch_handle: 5.53s, build: 337.3µs, wait: 6.87s}, table_task: {total_time: 1m2.4s, num: 76, concurrency: 5}, next: {wait_index: 2.05ms, wait_table_lookup_build: 8.17ms, wait_table_lookup_resp: 4.34s} | | 270.7 MB | N/A |
| ├─IndexRangeScan_53(Build) | 1375688.71 | 1478493 | cop[tikv] | table:lineitem, index:idx1(L_RETURNFLAG, L_SHIPDATE) | time:3.58s, loops:1452, cop_task: {num: 54, max: 711.2ms, min: 538.1µs, avg: 206.6ms, p95: 649ms, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 8.5s, tot_wait: 1.84s, rpc_num: 54, rpc_time: 11.1s, copr_cache: disabled, distsql_concurrency: 15}, tikv_task:{proc max:594ms, min:0s, avg: 157ms, p80:391ms, p95:522ms, iters:1655, tasks:54}, scan_detail: {total_process_keys: 1478493, total_process_keys_size: 110886975, total_keys: 1478547, get_snapshot_time: 1.13ms, rocksdb: {key_skipped_count: 1478493, block: {cache_hit_count: 1719}}} | range:["A" -inf,"A" 1997-12-01), keep order:false | N/A | N/A |
| └─TableRowIDScan_54(Probe) | 1375688.71 | 1478493 | cop[tikv] | table:lineitem | time:53.5s, loops:1524, cop_task: {num: 988, max: 1.47s, min: 17.4ms, avg: 414.9ms, p95: 722.9ms, max_proc_keys: 1756, p95_proc_keys: 1694, tot_proc: 5m23.2s, tot_wait: 59.2s, rpc_num: 988, rpc_time: 6m49.6s, copr_cache: disabled, distsql_concurrency: 15}, tikv_task:{proc max:1.35s, min:1ms, avg: 324.8ms, p80:472ms, p95:622ms, iters:5792, tasks:988}, scan_detail: {total_process_keys: 1478493, total_process_keys_size: 287479930, total_keys: 2956986, get_snapshot_time: 360.7ms, rocksdb: {key_skipped_count: 1478493, block: {cache_hit_count: 9879914}}} | keep order:false | N/A | N/A |
+--------------------------------------------+------------+---------+-----------+------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+----------+
9 rows in set (1 min 12.71 sec)
期待官方对tidb-server的oom管控方面进一步优化,如果在默认配置下上面这条语句可以正常运行不会被oom-killer掉,那么我认为在内存管控方面已经能满足绝大多数的场景需求了。
优化方式比如:
1、划分统一缓冲池,tikv-client数据可溢出到缓冲池,缓冲池统一管理落盘。相关需求帖子参考:临时表能力的增强与tidb-server层缓存机制的建立
2、tikv-client独立内存管控(和算子分开,设置单独追踪链和溢出机制),根据算子的执行顺序和特性设置优先级(比如probe端需要low-speed即可,build需要high-speed)优先级不同内存分配不同来保证尽量少用内存的情况下速度不影响,并在语句整体上控制tikv-client使用量
3、内存评估,通过上一批返回数据评估行大小来根据内存大小自适应调整行记录请求。
4、算子计算多对多记录时候分批返回给父算子(比如外表一行匹配内表1kw行,占用大量内存,可多批次返回)。
上面建议纯属发散,总之比较迫切的希望tidb-server在oom方面进一步优化,减少语句级oom-killer的发生。