TIDB 进程OOM

为提高效率,请提供以下信息,问题描述清晰能够更快得到解决:

【概述】:TIDB 进程OOM

【背景】:一条全表update语句,表有2亿条数据

【现象】:TIDB 进程OOM

【问题】:线上研发执行了一个全表update 语句,看监控TIDB 进程使用内存一下飙升到100多G,导致TIDB 进程OOM, 集群查了下 mem-quota-query 参数是默认的1G限制,这参数我的理解是单条 SQL 语句可以占用的最大内存阈值,那为啥这条SQL还会内存使用飙升到100G?请问以后从tidb层面如何规避此类问题?是否还有哪里需要设置?

【业务影响】:

【TiDB 版本】:v5.0

【TiDB Operator 版本】:

【K8s 版本】:

【附件】:


若提问为性能优化、故障排查类问题,请下载脚本运行。终端输出的打印结果,请务必全选并复制粘贴上传。

  1. tidb 版本是 v5.0.0 吗? 麻烦确认下小版本
  2. 可以查看下慢日志,从dashboard 或者 tidb.log 反馈下这个sql对应的信息
  3. 能否提供这个表结构?
  4. 麻烦查看下tidb.log 上次启动时的参数 mem-quota-query 值,多谢。

1、版本是v5.0.0
2、


3、Create Table: CREATE TABLE iap_daily_user_detail_1051 (
iggid bigint(20) NOT NULL COMMENT ‘iggid’,
order_id bigint(20) NOT NULL COMMENT ‘订单id’,
event_id varchar(255) DEFAULT NULL COMMENT ‘事件id’,
sn varchar(56) NOT NULL COMMENT ‘订单号’,
order_amount decimal(11,3) NOT NULL COMMENT ‘订单金额(单位美元)’,
game_id bigint(20) NOT NULL COMMENT ‘支付的游戏id’,
gc_id int(11) NOT NULL COMMENT ‘游戏分类id’,
gw_id int(11) NOT NULL COMMENT ‘游戏大世界id’,
platform_id int(11) NOT NULL COMMENT ‘平台id(新)’,
lang_id int(11) NOT NULL COMMENT ‘语言id’,
order_time datetime NOT NULL COMMENT ‘订单时间’,
event_time datetime DEFAULT NULL COMMENT ‘事件时间’,
pm_id int(11) NOT NULL COMMENT ‘支付方式id’,
pm_type int(11) NOT NULL COMMENT ‘支付类型’,
pc_id int(11) NOT NULL COMMENT ‘点卡id’,
ip varchar(56) NOT NULL COMMENT ‘支付IP’,
country_iso_code varchar(56) DEFAULT NULL COMMENT ‘区域/国家’,
origin_country_iso_code varchar(56) DEFAULT NULL COMMENT ‘原始区域/国家’,
subdivision varchar(255) DEFAULT NULL COMMENT ‘子区域’,
city varchar(255) DEFAULT NULL COMMENT ‘城市’,
continent_code varchar(56) DEFAULT NULL COMMENT ‘大洲’,
server_id varchar(56) DEFAULT NULL COMMENT ‘分服id’,
order_state int(11) NOT NULL COMMENT ‘订单状态(1正常订单;2还款;5退款;9欺诈)’,
create_time datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT ‘创建时间’,
modify_time datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT ‘最后修改时间’,
KEY idx_key_iggid (iggid),
KEY idx_key_channel_group (event_time,gw_id,game_id,iggid,pm_id,country_iso_code),
KEY idx_key_event_time_state (event_time,pm_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin COMMENT=‘应用内购买明细’
PARTITION BY RANGE ( TO_DAYS(event_time) ) (
PARTITION p0 VALUES LESS THAN (736695),
PARTITION p1 VALUES LESS THAN (736726),
PARTITION p2 VALUES LESS THAN (736754),
PARTITION p3 VALUES LESS THAN (736785),
PARTITION p4 VALUES LESS THAN (736815),
PARTITION p5 VALUES LESS THAN (736846),
PARTITION p6 VALUES LESS THAN (736876),
PARTITION p7 VALUES LESS THAN (736907),
PARTITION p8 VALUES LESS THAN (736938),
PARTITION p9 VALUES LESS THAN (736968),
PARTITION p10 VALUES LESS THAN (736999),
PARTITION p11 VALUES LESS THAN (737029),
PARTITION p12 VALUES LESS THAN (737060),
PARTITION p13 VALUES LESS THAN (737091),
PARTITION p14 VALUES LESS THAN (737119),
PARTITION p15 VALUES LESS THAN (737150),
PARTITION p16 VALUES LESS THAN (737180),
PARTITION p17 VALUES LESS THAN (737211),
PARTITION p18 VALUES LESS THAN (737241),
PARTITION p19 VALUES LESS THAN (737272),
PARTITION p20 VALUES LESS THAN (737303),
PARTITION p21 VALUES LESS THAN (737333),
PARTITION p22 VALUES LESS THAN (737364),
PARTITION p23 VALUES LESS THAN (737394),
PARTITION p24 VALUES LESS THAN (737425),
PARTITION p25 VALUES LESS THAN (737456),
PARTITION p26 VALUES LESS THAN (737484),
PARTITION p27 VALUES LESS THAN (737515),
PARTITION p28 VALUES LESS THAN (737545),
PARTITION p29 VALUES LESS THAN (737576),
PARTITION p30 VALUES LESS THAN (737606),
PARTITION p31 VALUES LESS THAN (737637),
PARTITION p32 VALUES LESS THAN (737668),
PARTITION p33 VALUES LESS THAN (737698),
PARTITION p34 VALUES LESS THAN (737729),
PARTITION p35 VALUES LESS THAN (737759),
PARTITION p36 VALUES LESS THAN (737790),
PARTITION p37 VALUES LESS THAN (737821),
PARTITION p38 VALUES LESS THAN (737850),
PARTITION p39 VALUES LESS THAN (737881),
PARTITION p40 VALUES LESS THAN (737911),
PARTITION p41 VALUES LESS THAN (737942),
PARTITION p42 VALUES LESS THAN (737972),
PARTITION p43 VALUES LESS THAN (738003),
PARTITION p44 VALUES LESS THAN (738034),
PARTITION p45 VALUES LESS THAN (738064),
PARTITION p46 VALUES LESS THAN (738095),
PARTITION p47 VALUES LESS THAN (738125),
PARTITION p48 VALUES LESS THAN (738156),
PARTITION p49 VALUES LESS THAN (738187),
PARTITION p50 VALUES LESS THAN (738215),
PARTITION p51 VALUES LESS THAN (738246),
PARTITION p52 VALUES LESS THAN (738276),
PARTITION p53 VALUES LESS THAN (738307),
PARTITION p54 VALUES LESS THAN (738337),
PARTITION p55 VALUES LESS THAN (738368),
PARTITION p56 VALUES LESS THAN (738399),
PARTITION p57 VALUES LESS THAN (738429),
PARTITION p58 VALUES LESS THAN (738460),
PARTITION p59 VALUES LESS THAN (738490),
PARTITION p60 VALUES LESS THAN (738521),
PARTITION p61 VALUES LESS THAN (738552),
PARTITION p62 VALUES LESS THAN (738580),
PARTITION p63 VALUES LESS THAN (738611),
PARTITION p64 VALUES LESS THAN (738641),
PARTITION p65 VALUES LESS THAN (738672),
PARTITION p66 VALUES LESS THAN (738702),
PARTITION p67 VALUES LESS THAN (738733),
PARTITION p68 VALUES LESS THAN (738764),
PARTITION p69 VALUES LESS THAN (738794),
PARTITION p70 VALUES LESS THAN (738825),
PARTITION p71 VALUES LESS THAN (738855),
PARTITION p72 VALUES LESS THAN (738886),
PARTITION p73 VALUES LESS THAN (738917),
PARTITION p74 VALUES LESS THAN (738945),
PARTITION p75 VALUES LESS THAN (738976),
PARTITION p76 VALUES LESS THAN (739006),
PARTITION p77 VALUES LESS THAN (739037),
PARTITION p78 VALUES LESS THAN (739067),
PARTITION p79 VALUES LESS THAN (739098),
PARTITION p80 VALUES LESS THAN (739129),
PARTITION p81 VALUES LESS THAN (739159),
PARTITION p82 VALUES LESS THAN (739190),
PARTITION p83 VALUES LESS THAN (739220),
PARTITION p84 VALUES LESS THAN (739251),
PARTITION p85 VALUES LESS THAN (739282),
PARTITION p86 VALUES LESS THAN (739311),
PARTITION p87 VALUES LESS THAN (739342),
PARTITION p88 VALUES LESS THAN (739372),
PARTITION p89 VALUES LESS THAN (739403),
PARTITION p90 VALUES LESS THAN (739433),
PARTITION p91 VALUES LESS THAN (739464),
PARTITION p92 VALUES LESS THAN (739495),
PARTITION p93 VALUES LESS THAN (739525),
PARTITION p94 VALUES LESS THAN (739556),
PARTITION p95 VALUES LESS THAN (739586),
PARTITION p96 VALUES LESS THAN (739617),
PARTITION p97 VALUES LESS THAN (739648),
PARTITION p98 VALUES LESS THAN (739676),
PARTITION p99 VALUES LESS THAN (739707),
PARTITION p100 VALUES LESS THAN (739737),
PARTITION p101 VALUES LESS THAN (739768),
PARTITION p102 VALUES LESS THAN (739798),
PARTITION p103 VALUES LESS THAN (739829),
PARTITION p104 VALUES LESS THAN (739860),
PARTITION p105 VALUES LESS THAN (739890),
PARTITION p106 VALUES LESS THAN (739921),
PARTITION p107 VALUES LESS THAN (739951),
PARTITION p108 VALUES LESS THAN (739982),
PARTITION p109 VALUES LESS THAN (740013),
PARTITION p110 VALUES LESS THAN (740041),
PARTITION p111 VALUES LESS THAN (740072),
PARTITION p112 VALUES LESS THAN (740102),
PARTITION p113 VALUES LESS THAN (740133),
PARTITION p114 VALUES LESS THAN (740163),
PARTITION p115 VALUES LESS THAN (740194),
PARTITION p116 VALUES LESS THAN (740225),
PARTITION p117 VALUES LESS THAN (740255),
PARTITION p118 VALUES LESS THAN (740286),
PARTITION p119 VALUES LESS THAN (740316),
PARTITION p120 VALUES LESS THAN (740347),
PARTITION p121 VALUES LESS THAN (740378),
PARTITION p122 VALUES LESS THAN (740406),
PARTITION p123 VALUES LESS THAN (740437),
PARTITION p124 VALUES LESS THAN (740467),
PARTITION p125 VALUES LESS THAN (740498),
PARTITION p126 VALUES LESS THAN (740528),
PARTITION p127 VALUES LESS THAN (740559),
PARTITION p128 VALUES LESS THAN (740590),
PARTITION p129 VALUES LESS THAN (740620),
PARTITION p130 VALUES LESS THAN (740651),
PARTITION p131 VALUES LESS THAN (740681),
PARTITION p132 VALUES LESS THAN (740712),
PARTITION p133 VALUES LESS THAN (740743),
PARTITION p134 VALUES LESS THAN (740772),
PARTITION p135 VALUES LESS THAN (740803),
PARTITION p136 VALUES LESS THAN (740833),
PARTITION p137 VALUES LESS THAN (740864),
PARTITION p138 VALUES LESS THAN (740894),
PARTITION p139 VALUES LESS THAN (740925),
PARTITION p140 VALUES LESS THAN (740956),
PARTITION p141 VALUES LESS THAN (740986),
PARTITION p142 VALUES LESS THAN (741017),
PARTITION p143 VALUES LESS THAN (741047),
PARTITION p144 VALUES LESS THAN (741078),
PARTITION p145 VALUES LESS THAN (741109),
PARTITION p146 VALUES LESS THAN (741137),
PARTITION p147 VALUES LESS THAN (741168),
PARTITION p148 VALUES LESS THAN (741198),
PARTITION p149 VALUES LESS THAN (741229),
PARTITION p150 VALUES LESS THAN (741259),
PARTITION p151 VALUES LESS THAN (741290),
PARTITION p152 VALUES LESS THAN (741321),
PARTITION p153 VALUES LESS THAN (741351),
PARTITION p154 VALUES LESS THAN (741382),
PARTITION p155 VALUES LESS THAN (741412),
PARTITION p156 VALUES LESS THAN (741443),
PARTITION p157 VALUES LESS THAN (741474),
PARTITION p158 VALUES LESS THAN (741502),
PARTITION p159 VALUES LESS THAN (741533),
PARTITION p160 VALUES LESS THAN (741563),
PARTITION p161 VALUES LESS THAN (741594),
PARTITION p162 VALUES LESS THAN (741624),
PARTITION p163 VALUES LESS THAN (741655),
PARTITION p164 VALUES LESS THAN (741686),
PARTITION p165 VALUES LESS THAN (741716),
PARTITION p166 VALUES LESS THAN (741747),
PARTITION p167 VALUES LESS THAN (741777),
PARTITION pother VALUES LESS THAN (MAXVALUE)
)
4、“oom-action”:“cancel”,“mem-quota-query”:1073741824,

是否有参数设置可以加快tidb 内存占用的释放回收速度?

推测是由于 tidb 目前的 memmory tracker 只追踪了读路径的内存消耗,update 语句数据读上来时候未触发到 mem-quota 的阈值,而加上 membuffer 后,触发了操作了系统的 OOM killer,进而造成了 tidb 的 OOM。所以可能需要改下 SQL 逻辑,减少 update 影响的语句数量。

我看 Dashboard 中记录的 SQL 最大使用内存已经是 7.6GB 了,oom 阈值配置的是 1GB,oom action 配置的是 cancel,但是这条 query 貌似并没有被 cancel 掉,可以看看 dashboard 中这条记录的 SQL 还有对应的执行计划吗?