Huge query causes system OOM

[TiDB Environment] Production
[TiDB Version] 5.4.2

Problem: a single large SQL statement can fill up system memory and OOM the whole host. Is there a parameter that keeps this under control?

1. First case
Run the SQL directly, without a hint. The query exceeds the configured mem-quota-query=500M limit, so TiDB immediately reports "Out Of Memory Quota!". This shows mem-quota-query takes effect; it is TiDB's memory-protection mechanism and works as expected.


[SQL]SELECT
        wgc.*,wu.nickname AS 'sendUserNickName',wu.channel_id as register_channel_id,
        b.charm_score*wgc.gift_amount as charm_score from weplay_gift_consume wgc
        LEFT JOIN weplay_user wu ON wgc.uid = wu.uid
        LEFT JOIN weplay_gift_config b ON wgc.gift_id = b.id
          ORDER BY wgc.create_time DESC

[Err] 1105 - Out Of Memory Quota![conn_id=5]
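As a side note, the quota behind this error can also be raised per session instead of cluster-wide; a sketch, with 1 GiB as an illustrative value (the variable takes bytes):

```sql
-- Raise the per-query memory quota for the current session only.
-- 1073741824 bytes = 1 GiB; pick a value that fits the tidb-server host.
SET tidb_mem_quota_query = 1073741824;

-- Verify the effective value.
SHOW VARIABLES LIKE 'tidb_mem_quota_query';
```

This only moves the ceiling, of course; it does not change whether the query can finish within it.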

2. Second case. With a hint, the SQL keeps running until the system OOMs. Is there a parameter to control this, so a single SQL statement cannot exhaust system memory?


SELECT
        /*+ MERGE_JOIN(wu) */
        wgc.*,wu.nickname AS 'sendUserNickName',wu.channel_id as register_channel_id,
        b.charm_score*wgc.gift_amount as charm_score from weplay_gift_consume wgc
        LEFT JOIN weplay_user wu ON wgc.uid = wu.uid
        LEFT JOIN weplay_gift_config b ON wgc.gift_id = b.id
        ORDER BY wgc.create_time DESC

Take a look at this


Thanks for sharing, though it didn't solve the problem.

Do you mean that when the hint is used, the query is no longer bound by mem-quota-query?

Run EXPLAIN on the second statement and post the execution plan.

Yes, it keeps running until the server OOMs.

Pictures20221109175248.bmp (3.0 MB)

Projection_75	102550875.22	root		weplayzy.weplay_gift_consume.id, weplayzy.weplay_gift_consume.uid, weplayzy.weplay_gift_consume.recipient_id, weplayzy.weplay_gift_consume.recipient_name, weplayzy.weplay_gift_consume.gift_id, weplayzy.weplay_gift_consume.gift_name, weplayzy.weplay_gift_consume.gift_type, weplayzy.weplay_gift_consume.gift_amount, weplayzy.weplay_gift_consume.gift_icon, weplayzy.weplay_gift_consume.gift_price, weplayzy.weplay_gift_consume.host_id, weplayzy.weplay_gift_consume.room_id, weplayzy.weplay_gift_consume.diamond_amount, weplayzy.weplay_gift_consume.wealth_amount, weplayzy.weplay_gift_consume.remark, weplayzy.weplay_gift_consume.data_source, weplayzy.weplay_gift_consume.channel_id, weplayzy.weplay_gift_consume.create_time, weplayzy.weplay_gift_consume.gift_source, weplayzy.weplay_gift_consume.gift_order_id, weplayzy.weplay_user.nickname, weplayzy.weplay_user.channel_id, mul(weplayzy.weplay_gift_config.charm_score, weplayzy.weplay_gift_consume.gift_amount)->Column#115
└─IndexJoin_81	102550875.22	root		left outer join, inner:TableReader_78, outer key:weplayzy.weplay_gift_consume.gift_id, inner key:weplayzy.weplay_gift_config.id, equal cond:eq(weplayzy.weplay_gift_consume.gift_id, weplayzy.weplay_gift_config.id)
  ├─IndexJoin_93(Build)	102550875.22	root		left outer join, inner:TableReader_90, outer key:weplayzy.weplay_gift_consume.uid, inner key:weplayzy.weplay_user.uid, equal cond:eq(weplayzy.weplay_gift_consume.uid, weplayzy.weplay_user.uid)
  │ ├─IndexLookUp_99(Build)	102158297.00	root		
  │ │ ├─IndexFullScan_97(Build)	102158297.00	cop[tikv]	table:wgc, index:idx_create_time(create_time)	keep order:true, desc
  │ │ └─TableRowIDScan_98(Probe)	102158297.00	cop[tikv]	table:wgc	keep order:false
  │ └─TableReader_90(Probe)	1.00	root		data:TableRangeScan_89
  │   └─TableRangeScan_89	1.00	cop[tikv]	table:wu	range: decided by [weplayzy.weplay_gift_consume.uid], keep order:false
  └─TableReader_78(Probe)	0.96	root		data:TableRangeScan_77
    └─TableRangeScan_77	0.96	cop[tikv]	table:b	range: decided by [weplayzy.weplay_gift_consume.gift_id], keep order:false
1. Try turning off the tidb_enable_rate_limit_action variable.
2. Enable spill-to-disk.
3. Try lowering tidb_distsql_scan_concurrency (to 10, 8, 6, or even lower).
Try 1 and 2 first; if that doesn't help, add 3.
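Suggestion 2 (spill-to-disk) is a tidb-server config-file item rather than a session variable; a minimal tidb.toml sketch (the path is illustrative, point it at a fast disk):

```toml
# tidb.toml -- spill-to-disk settings (these are also the defaults in 5.4)
oom-use-tmp-storage = true
tmp-storage-path = "/data/tidb-tmp"   # illustrative; use a fast local disk
tmp-storage-quota = -1                # -1 = no limit on spilled data
```

Changing these requires a tidb-server restart (e.g. via `tiup cluster edit-config` + `reload`).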

Applied all of 1, 2, and 3:
set tidb_enable_rate_limit_action = 0;
set tidb_distsql_scan_concurrency = 1;

System memory no longer grows as fast as before, but it still climbs slowly; after running for almost 10 minutes, memory filled up and the system OOMed anyway.

It looks like TiDB really doesn't have MySQL's "trade disk for time" mechanism: MySQL may be slow, but the query always finishes. With TiDB it's either "Out Of Memory Quota" or, in the end, a system OOM. Is there really no way to run large SQL???

[2022/11/10 11:17:10.053 +08:00] [INFO] [row_container.go:368] ["memory exceeds quota, spill to disk now."] [consumed=524678879] [quota=524288000]
[2022/11/10 11:17:10.053 +08:00] [INFO] [row_container.go:562] ["memory exceeds quota, spill to disk now."] [consumed=524638922] [quota=524288000]
[2022/11/10 11:17:10.058 +08:00] [INFO] [row_container.go:368] ["memory exceeds quota, spill to disk now."] [consumed=525997030] [quota=524288000]
[2022/11/10 11:17:10.669 +08:00] [INFO] [row_container.go:368] ["memory exceeds quota, spill to disk now."] [consumed=524618199] [quota=524288000]
[2022/11/10 11:17:10.677 +08:00] [WARN] [expensivequery.go:179] [expensive_query] [cost_time=15.470587844s] [cop_time=44.804289864s] [process_time=145.775s] [wait_time=5.234s] [request_count=21621] [total_keys=7242446] [process_keys=6781539] [num_cop_tasks=21621] [process_avg_time=0.006742287s] [process_p90_time=0.017s] [process_max_time=0.887s] [process_max_addr=10.0.8.39:20160] [wait_avg_time=0.000242079s] [wait_p90_time=0.001s] [wait_max_time=0.091s] [wait_max_addr=10.0.8.39:20161] [stats=weplay_user:437269348093526017,weplay_gift_config:437246981652611087,weplay_gift_consume:437268436107395086] [conn_id=29] [user=root] [database=weplayzy] [table_ids="[1008,308,1012]"] [index_names="[weplay_gift_consume:idx_uid]"] [txn_start_ts=437269355603689479] [mem_max="658532008 Bytes (628.0 MB)"] [sql="SELECT\r\n        /*+ MERGE_JOIN(wu) */\r\n        wgc.*,wu.nickname AS 'sendUserNickName',wu.channel_id as register_channel_id,\r\n        b.charm_score*wgc.gift_amount as charm_score from weplay_gift_consume wgc\r\n        LEFT JOIN weplay_user wu ON wgc.uid = wu.uid\r\n        LEFT JOIN weplay_gift_config b ON wgc.gift_id = b.id\r\n        ORDER BY wgc.create_time DESC"]

What business scenario needs to pull 100M+ rows in one query? This looks like something to optimize at the application level.

Right. In the actual application we filter on the time column; one week is under 2M rows and the query completes. (It needs the hint /*+ MERGE_JOIN(wu) */; without the hint it reports Out Of Memory Quota.)
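The working production form described here might look like the following sketch (the DATE_SUB bound is illustrative; the real application supplies its own time range):

```sql
SELECT
    /*+ MERGE_JOIN(wu) */
    wgc.*, wu.nickname AS 'sendUserNickName', wu.channel_id AS register_channel_id,
    b.charm_score * wgc.gift_amount AS charm_score
FROM weplay_gift_consume wgc
LEFT JOIN weplay_user wu ON wgc.uid = wu.uid
LEFT JOIN weplay_gift_config b ON wgc.gift_id = b.id
WHERE wgc.create_time >= DATE_SUB(NOW(), INTERVAL 7 DAY)  -- ~2M rows per week
ORDER BY wgc.create_time DESC;
```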

Setting the application aside: for large SQL, the choices TiDB offers seem to be either "out of memory quota" or a system OOM. There appears to be no real disk-for-efficiency mechanism (it does spill to disk, but memory is still consumed and the system OOMs in the end). The effect I want is for large SQL to be able to finish on TiDB too.

tidb_mem_quota_query
tidb_mem_oom_action
oom-use-tmp-storage
tmp-storage-quota

Check these variables and config items; what are they set to?

MySQL [(none)]> show variables like 'tidb_mem_quota_query%';
+----------------------+-----------+
| Variable_name        | Value     |
+----------------------+-----------+
| tidb_mem_quota_query | 524288000 |
+----------------------+-----------+
1 row in set (0.020 sec)

MySQL [(none)]> show config where name like '%oom_action%';
+------+-----------------+------------+--------+
| Type | Instance        | Name       | Value  |
+------+-----------------+------------+--------+
| tidb | 10.0.8.183:4000 | oom-action | cancel |
| tidb | 10.0.8.45:4000  | oom-action | cancel |
| tidb | 10.0.8.39:4000  | oom-action | cancel |
+------+-----------------+------------+--------+
3 rows in set (0.019 sec)

MySQL [(none)]> show config where name like '%tmp-storage%';
+------+-----------------+---------------------+--------------------------------------------------------------+
| Type | Instance        | Name                | Value                                                        |
+------+-----------------+---------------------+--------------------------------------------------------------+
| tidb | 10.0.8.183:4000 | oom-use-tmp-storage | true                                                         |
| tidb | 10.0.8.183:4000 | tmp-storage-path    | /tmp/0_tidb/MC4wLjAuMDo0MDAwLzAuMC4wLjA6MTAwODA=/tmp-storage |
| tidb | 10.0.8.183:4000 | tmp-storage-quota   | -1                                                           |
| tidb | 10.0.8.45:4000  | oom-use-tmp-storage | true                                                         |
| tidb | 10.0.8.45:4000  | tmp-storage-path    | /tmp/0_tidb/MC4wLjAuMDo0MDAwLzAuMC4wLjA6MTAwODA=/tmp-storage |
| tidb | 10.0.8.45:4000  | tmp-storage-quota   | -1                                                           |
| tidb | 10.0.8.39:4000  | oom-use-tmp-storage | true                                                         |
| tidb | 10.0.8.39:4000  | tmp-storage-path    | /tmp/0_tidb/MC4wLjAuMDo0MDAwLzAuMC4wLjA6MTAwODA=/tmp-storage |
| tidb | 10.0.8.39:4000  | tmp-storage-quota   | -1                                                           |
+------+-----------------+---------------------+--------------------------------------------------------------+
9 rows in set (0.022 sec)


What kind of disk is your /tmp on?
When operators spill to disk, temp I/O becomes a factor: if the write-out speed isn't fast enough, rows keep piling up in memory and you still OOM.
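A quick way to sanity-check the temp disk's sequential write speed; a rough sketch, with TMPDIR assumed to be the directory from tmp-storage-path (adjust as needed):

```shell
#!/bin/sh
# Rough sequential-write benchmark for the TiDB tmp-storage disk.
# TMPDIR is an assumption -- point it at the directory from tmp-storage-path.
TMPDIR="${TMPDIR:-/tmp}"
TESTFILE="$TMPDIR/tidb_spill_io_test"

# Write 64 MB and fsync it, so the page cache doesn't hide slow media.
# The last dd line reports the achieved throughput.
dd if=/dev/zero of="$TESTFILE" bs=1M count=64 conv=fdatasync 2>&1 | tail -n 1

rm -f "$TESTFILE"
```

Throughput far below what the query spills per second would explain memory piling up despite the spill messages in the log.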

It's an SSD.

How much total memory does tidb-server have, how big is tmp, and how big are the 100M+ rows of data?
If all of those are sufficient, the query should be able to finish; if any one of them isn't, the OOM is expected.
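The first two of those can be checked on the tidb-server host with standard tools (a sketch; /tmp is assumed to be where tmp-storage-path lives):

```shell
#!/bin/sh
# Host memory and free temp space, the two capacity limits mentioned above.
free -m | awk 'NR==2 {print "host memory (MB): " $2}'
df -m /tmp | awk 'NR==2 {print "free tmp space (MB): " $4}'
```

The third number (on-disk size of the tables involved) can be read from TiDB's `information_schema.tables` `DATA_LENGTH` column or the monitoring dashboards.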

tmp has been moved to the SSD data disk; 400 GB of space, plenty.
After running for 10+ minutes, the system OOMed.

top - 15:05:27 up 54 days, 22:22,  3 users,  load average: 69.11, 29.01, 23.99
Tasks: 247 total,   2 running, 245 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.2 us, 48.4 sy,  0.0 ni,  0.5 id, 48.8 wa,  0.0 hi,  2.1 si,  0.0 st
KiB Mem : 99.3/32778404 [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ]
KiB Swap:  0.0/0        [                                                                                                    ]
     

## tidb log (the full log from the start through the OOM and the tidb restart)
tidb.log (544.6 KB)

How much memory does your tidb-server process get?

1. Try turning off the tidb_enable_rate_limit_action variable.
2. Enable spill-to-disk.
3. Try lowering tidb_distsql_scan_concurrency (to around 2).
4. Set the tidb_enable_chunk_rpc system variable to OFF (this lowers statement execution efficiency).
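Taken together, the session-level settings from these suggestions would look like this sketch (values illustrative):

```sql
-- Session-level settings corresponding to the suggestions above.
SET tidb_enable_rate_limit_action = OFF;  -- 1: don't rate-limit, let the memory tracker act
SET tidb_distsql_scan_concurrency = 2;    -- 3: fewer concurrent cop tasks in flight
SET tidb_enable_chunk_rpc = OFF;          -- 4: lower peak memory, slower execution
```

These trade throughput for a lower memory peak; they don't guarantee an arbitrarily large query fits.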

In the end, the recommendation is still to optimize the statement.