Troubleshooting the cause of a TiDB OOM

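[TiDB environment] Production
[TiDB version] v6.1.0
[Steps to reproduce] None
[Problem: symptoms and impact]
Help needed: all three TiDB nodes were OOM-killed at the same time. Peak memory utilization on the physical machines was 50%, and peak memory consumption was about 13 GB (TiDB and PD are co-deployed on the same machines).
dmesg -T | grep tidb-server shows anon-rss at about 25 GB. anon-rss is the resident set size (RSS) of anonymous memory, that is, memory not mapped to a file. Could this be caused by a memory leak?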
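To read the anon-rss figure without converting kB by hand, the kernel's OOM-killer summary line can be parsed. The sketch below assumes the usual "Killed process PID (name) ... anon-rss:NNNkB" layout of that line; the sample line and its numbers are made up for illustration and are not taken from this cluster.

```python
import re

# Matches the summary line the Linux OOM killer writes to the kernel log.
OOM_KILL_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\).*?"
    r"anon-rss:(?P<anon_kb>\d+)kB"
)


def parse_oom_kill(line):
    """Return pid, process name and anon-rss in GiB, or None if no match."""
    m = OOM_KILL_RE.search(line)
    if m is None:
        return None
    return {
        "pid": int(m.group("pid")),
        "name": m.group("name"),
        # The kernel reports kB; 1 GiB = 1024 * 1024 kB.
        "anon_rss_gib": int(m.group("anon_kb")) / (1024 * 1024),
    }


# Hypothetical sample line (assumption: invented values).
sample = (
    "[Mon Mar 11 12:19:58 2024] Out of memory: Killed process 12345 "
    "(tidb-server) total-vm:31457280kB, anon-rss:26214400kB, "
    "file-rss:0kB, shmem-rss:0kB"
)
info = parse_oom_kill(sample)
print(info)
```

Comparing this number with the heap profile total gives a rough idea of how much resident memory is not accounted for by live Go heap objects.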
[Resource configuration]
TiDB nodes: 16 cores, 32 GB
[Attachments: screenshots/logs/monitoring]



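Analysis of the heap file captured at the moment of the OOM shows about 9 GB in use in total. How should I go about finding the actual cause of the OOM?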

Grep tidb.log for expensive_query and see whether, together with Dashboard, you can find the SQL that was running before the node was OOM-killed and restarted.
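A minimal sketch of that search, assuming the expensive_query line layout that appears in the tidb.log excerpt quoted in this thread. The cutoff is the timestamp of the "loaded config" startup line from that excerpt; the first sample entry is invented to show what a pre-restart hit would look like, and the helper names are hypothetical.

```python
import re
from datetime import datetime

TS_FORMAT = "%Y/%m/%d %H:%M:%S.%f %z"

# Leading timestamp plus the [expensive_query] marker.
ENTRY_RE = re.compile(
    r"^\[(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{2}:\d{2})\]"
    r".*\[expensive_query\]"
)
# [key=value] or [key="quoted value"] fields.
FIELD_RE = re.compile(r'\[(\w+)=("(?:[^"\\]|\\.)*"|[^\]]*)\]')


def parse_expensive_query(line):
    """Parse one expensive_query log line into a dict, or return None."""
    m = ENTRY_RE.match(line)
    if m is None:
        return None
    fields = {k: v.strip('"') for k, v in FIELD_RE.findall(line)}
    fields["time"] = datetime.strptime(m.group("ts"), TS_FORMAT)
    return fields


def entries_before(lines, cutoff):
    """Keep only expensive_query entries logged before the cutoff time."""
    out = []
    for line in lines:
        entry = parse_expensive_query(line)
        if entry is not None and entry["time"] < cutoff:
            out.append(entry)
    return out


log = [
    # Hypothetical entry, invented to show a pre-restart hit.
    '[2024/03/11 12:18:05.101 +08:00] [Warn] [expensivequery.go:188] '
    '[expensive_query] [cost_time=60.1s] [conn_id=1] [txn_start_ts=0] '
    '[mem_max="2147483648 Bytes (2 GB)"] [sql="select 1"]',
    # Entry copied from the log in this thread (after the restart).
    '[2024/03/11 12:22:12.487 +08:00] [Warn] [expensivequery.go:188] '
    '[expensive_query] [cost_time=60.004846747s] [conn_id=42619269715853315] '
    '[txn_start_ts=0] [mem_max="645308 Bytes (630.2 KB)"] '
    '[sql="analyze table `sp_finance_prd`.`finance_shipment_event`"]',
]
cutoff = datetime.strptime("2024/03/11 12:20:30.130 +08:00", TS_FORMAT)
hits = entries_before(log, cutoff)
for e in hits:
    print(e["time"], e["mem_max"], e["sql"])
```

If this returns nothing for the real log, no statement crossed the expensive-query threshold before the kill, which points away from a single long-running statement.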

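In theory, three TiDB nodes should not hit OOM at the same time.

Try to avoid mixed deployment…

Mixed deployment is worse than deploying each node on its own. (Scheduling a mixed three-replica deployment is hard, and resource isolation has to be considered, which is very difficult in practice.)

Check whether the TiDB log contains the OOM keyword, and look at memory usage under System Info in the Grafana Overview dashboard. Also, have you set up NUMA resource isolation?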

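Check the OOM log in the log directory of tidb-4000, or tidb.log, to see which SQL caused the OOM.

Look through the logs for the SQL that caused the OOM.

The main thing is still to check the log and see what actually caused the OOM.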

A stuck SQL statement has turned up.

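I ran into a bug in v7.1 before, and I see it has been fixed in a newer version. I don't know whether your version is hitting a bug as well.

All three at the same time: the probability of that is low.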

https://github.com/pingcap/tidb/blob/master/pkg/types/parser_driver/value_expr.go#L204

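I suggest checking which SQL statements were running at the time. Judging from your dump, a large number of SQL statements were being parsed, so the function above was called over and over and allocated a great many objects, which blew up memory.
Also, all three nodes going down together is not characteristic of a memory leak. With a leak, every TiDB node would eventually be killed, but the chance of them all being killed at the same moment is very low. Going down together looks more like all three nodes received a large volume of SQL at once and the memory could not be reclaimed fast enough.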
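The timing argument can be checked mechanically once the kill time of each node is known, for example from dmesg -T. This is only a sketch: the node names and timestamps are hypothetical, and the one-minute window is an arbitrary assumption, not a TiDB rule.

```python
from datetime import datetime, timedelta


def kill_spread(kill_times):
    """Time between the first and the last OOM kill across nodes."""
    times = list(kill_times.values())
    return max(times) - min(times)


def killed_together(kill_times, window=timedelta(minutes=1)):
    """True if all nodes were killed within `window` of each other."""
    return len(kill_times) > 1 and kill_spread(kill_times) <= window


# Hypothetical kill times: a burst hitting all nodes at once.
burst = {
    "tidb-1": datetime(2024, 3, 11, 12, 19, 58),
    "tidb-2": datetime(2024, 3, 11, 12, 20, 3),
    "tidb-3": datetime(2024, 3, 11, 12, 20, 10),
}
# Hypothetical kill times: nodes failing hours apart, as a leak might.
staggered = {
    "tidb-1": datetime(2024, 3, 11, 3, 5, 0),
    "tidb-2": datetime(2024, 3, 11, 9, 40, 0),
    "tidb-3": datetime(2024, 3, 11, 12, 20, 0),
}

print(killed_together(burst))      # True
print(killed_together(staggered))  # False
```

A tight cluster of kill times supports the shared-workload explanation; widely spread times would fit gradual growth on each node better.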

The expensive_query entries in the log contain many identical DELETE statements, but they were logged after the OOM restart, not before the OOM.

[2024/03/11 12:20:30.130 +08:00] [Info] [printer.go:48] ["loaded config"] [config="{\"host\":\"0.0.0.0\",\"advertise-address\":\"10.50.33.243\",\"port\":4000,\"cors\":\"\",\"store\":\"tikv\",\"path\":\"10.50.33.243:2379,10.50.35.9:2379,10.50.37.214:2379\",\"socket\":\"/tmp/tidb-4000.sock\",\"lease\":\"45s\",\"run-ddl\":true,\"split-table\":true,\"token-limit\":1000,\"oom-use-tmp-storage\":true,\"tmp-storage-path\":\"/tmp/1202_tidb/MC4wLjAuMDo0MDAwLzAuMC4wLjA6MTAwODA=/tmp-storage\",\"tmp-storage-quota\":-1,\"server-version\":\"\",\"version-comment\":\"\",\"tidb-edition\":\"\",\"tidb-release-version\":\"\",\"log\":{\"level\":\"info\",\"format\":\"text\",\"disable-timestamp\":null,\"enable-timestamp\":null,\"disable-error-stack\":null,\"enable-error-stack\":null,\"file\":{\"filename\":\"/data/tidb-deploy/tidb-4000/log/tidb.log\",\"max-size\":300,\"max-days\":2,\"max-backups\":0},\"enable-slow-log\":true,\"slow-query-file\":\"/data/tidb-deploy/tidb-4000/log/tidb_slow_query.log\",\"slow-threshold\":3000,\"expensive-threshold\":10000,\"record-plan-in-slow-log\":1,\"query-log-max-len\":4096},\"instance\":{\"tidb_general_log\":false,\"tidb_pprof_sql_cpu\":false,\"ddl_slow_threshold\":300,\"tidb_expensive_query_time_threshold\":60,\"tidb_enable_slow_log\":true,\"tidb_slow_log_threshold\":3000,\"tidb_record_plan_in_slow_log\":1,\"tidb_check_mb4_value_in_utf8\":true,\"tidb_force_priority\":\"NO_PRIORITY\",\"tidb_memory_usage_alarm_ratio\":0.8,\"tidb_enable_collect_execution_info\":true,\"plugin_dir\":\"/data/deploy/plugin\",\"plugin_load\":\"\"},\"security\":{\"skip-grant-table\":false,\"ssl-ca\":\"\",\"ssl-cert\":\"\",\"ssl-key\":\"\",\"cluster-ssl-ca\":\"\",\"cluster-ssl-cert\":\"\",\"cluster-ssl-key\":\"\",\"cluster-verify-cn\":null,\"spilled-file-encryption-method\":\"plaintext\",\"enable-sem\":false,\"auto-tls\":false,\"tls-version\":\"\",\"rsa-key-size\":4096,\"secure-bootstrap\":false},\"status\":{\"status-host\":\"0.0.0.0\",\"metrics-addr\":\"\",\"status-port\":10080,\
"metrics-interval\":15,\"report-status\":true,\"record-db-qps\":false,\"grpc-keepalive-time\":10,\"grpc-keepalive-timeout\":3,\"grpc-concurrent-streams\":1024,\"grpc-initial-window-size\":2097152,\"grpc-max-send-msg-size\":2147483647},\"performance\":{\"max-procs\":0,\"max-memory\":0,\"server-memory-quota\":0,\"memory-usage-alarm-ratio\":0.8,\"stats-lease\":\"3s\",\"stmt-count-limit\":5000,\"feedback-probability\":0,\"query-feedback-limit\":1024,\"pseudo-estimate-ratio\":0.8,\"force-priority\":\"NO_PRIORITY\",\"bind-info-lease\":\"3s\",\"txn-entry-size-limit\":6291456,\"txn-total-size-limit\":104857600,\"tcp-keep-alive\":true,\"tcp-no-delay\":true,\"cross-join\":true,\"distinct-agg-push-down\":false,\"projection-push-down\":false,\"max-txn-ttl\":3600000,\"index-usage-sync-lease\":\"0s\",\"plan-replayer-gc-lease\":\"10m\",\"gogc\":100,\"enforce-mpp\":false,\"stats-load-concurrency\":5,\"stats-load-queue-size\":1000,\"enable-stats-cache-mem-quota\":false,\"committer-concurrency\":256,\"run-auto-analyze\":true},\"prepared-plan-cache\":{\"enabled\":true,\"capacity\":100,\"memory-guard-ratio\":0.1},\"opentracing\":{\"enable\":false,\"rpc-metrics\":false,\"sampler\":{\"type\":\"const\",\"param\":1,\"sampling-server-url\":\"\",\"max-operations\":0,\"sampling-refresh-interval\":0},\"reporter\":{\"queue-size\":0,\"buffer-flush-interval\":0,\"log-spans\":false,\"local-agent-host-port\":\"\"}},\"proxy-protocol\":{\"networks\":\"\",\"header-timeout\":5},\"pd-client\":{\"pd-server-timeout\":3},\"tikv-client\":{\"grpc-connection-count\":4,\"grpc-keepalive-time\":10,\"grpc-keepalive-timeout\":3,\"grpc-compression-type\":\"none\",\"commit-timeout\":\"41s\",\"async-commit\":{\"keys-limit\":256,\"total-key-size-limit\":4096,\"safe-window\":2000000000,\"allowed-clock-drift\":500000000},\"max-batch-size\":128,\"overload-threshold\":200,\"max-batch-wait-time\":0,\"batch-wait-size\":8,\"enable-chunk-rpc\":true,\"region-cache-ttl\":600,\"store-limit\":0,\"store-liveness-timeout\":\"1s\",\
"copr-cache\":{\"capacity-mb\":1000},\"ttl-refreshed-txn-size\":33554432,\"resolve-lock-lite-threshold\":16},\"binlog\":{\"enable\":false,\"ignore-error\":false,\"write-timeout\":\"15s\",\"binlog-socket\":\"\",\"strategy\":\"range\"},\"compatible-kill-query\":false,\"plugin\":{\"dir\":\"/data/deploy/plugin\",\"load\":\"\"},\"pessimistic-txn\":{\"max-retry-count\":256,\"deadlock-history-capacity\":10,\"deadlock-history-collect-retryable\":false,\"pessimistic-auto-commit\":false},\"check-mb4-value-in-utf8\":true,\"max-index-length\":3072,\"index-limit\":64,\"table-column-count-limit\":1017,\"graceful-wait-before-shutdown\":0,\"alter-primary-key\":false,\"treat-old-version-utf8-as-utf8mb4\":true,\"enable-table-lock\":false,\"delay-clean-table-lock\":0,\"split-region-max-num\":1000,\"top-sql\":{\"receiver-address\":\"\"},\"repair-mode\":false,\"repair-table-list\":[],\"isolation-read\":{\"engines\":[\"tikv\",\"tiflash\",\"tidb\"]},\"max-server-connections\":0,\"new_collations_enabled_on_first_bootstrap\":true,\"experimental\":{\"allow-expression-index\":false},\"enable-collect-execution-info\":true,\"skip-register-to-dashboard\":false,\"enable-telemetry\":true,\"labels\":{},\"enable-global-index\":false,\"deprecate-integer-display-length\":false,\"enable-enum-length-limit\":true,\"stores-refresh-interval\":60,\"enable-tcp4-only\":false,\"enable-forwarding\":false,\"max-ballast-object-size\":0,\"ballast-object-size\":0,\"enable-global-kill\":true,\"enable-batch-dml\":false,\"mem-quota-query\":1073741824,\"oom-action\":\"cancel\"}"]
[2024/03/11 12:22:12.487 +08:00] [Warn] [expensivequery.go:188] [expensive_query] [cost_time=60.004846747s] [conn_id=42619269715853315] [txn_start_ts=0] [mem_max="645308 Bytes (630.2 KB)"] [sql="analyze table `sp_finance_prd`.`finance_shipment_event`"]
[2024/03/11 12:23:14.787 +08:00] [Warn] [expensivequery.go:188] [expensive_query] [cost_time=60.013551403s] [conn_id=42619269715855561] [user=mws_user] [database=sp_finance_prd] [txn_start_ts=448300563752550633] [mem_max="0 Bytes (0 Bytes)"] [sql="DELETE FROM finance_shipment_event WHERE finance_shipment_event.seller_id = 'A1X43XAF7536UH' AND finance_shipment_event.financial_event_group_id = 'URAgbDO3NUyi4Divii3pjWl5EF2sNonibStISqg-pQY' AND finance_shipment_event.shipment_item_md5 IN ('2078c740090fc0e73cd50300aeb5e5eb0', '269be8d5d7b1928121da06eb36ca27e90', '1bda2237b042b202f59de6fdf492487a0', '1a11de5dae4f2bd916ff20b2697248460', '29327abccc5689edb49a1eb63c25729b0', '2b226954e8eacffb98a5e3ec306590ff0', '2b920fd04e1deb915feab21bf52bc0010', '1d55968b108b1080eee25408015e1b950', '261ac49b72d0ca3f20f00ef1c248f52f0', '2b693df4d35bb2fd6aca662d774a30790', '2a4e50ef8aaa0d2137cd5f09386c73a00', '19a6f5b3ab43e956a3a9d990797a054a0', '211abaaf1c428fc87e445168a4175faa0', '238139d4e086462723d9a5de086d155b0', '1f170862ccc328f340456b1d41f2766a0', '28e5bfa8050b0d334f528b461ac4a2c10', '1e0bcd3a31eb4ab936a57b27b88fe9c70', '264b443fbee76ee564c50ce796c128b60', '1d1f78acaa7bb9681aa47a17e8fc790d0', '22c38819e8eb059a5e281c9a528331cb0', '1cbcf96d5e5b16222ed3ea25a321f79a0', '28bd8499a73f557c3d45d2c55854e31d0', '216099e1e7025d34a3be6f80f2fea8d30', '29d75635b6cc582a5d6d4a915aa800910', '16236e44e0f32307244c19c9863b01930', '2ae071e515612d76c726ffc3df3734ef0', '16d093cc898546fbfc30db4017f04dc40', '2572ab1d0976328ee3cd655e67b564ae0', '18086d65eb76b2c4a493e0465f6f9bd30', '250bcfbe425dcd3fc9e34f934d98870d0', '2527d910271f8c49320dcd5c59d7cc460', '2524b2f65ad346d88aaed73a2b7ade5a0', '25da7e2a38e988ea8f3c82bdb459235d0', '2b2ede010dff35d7803c4c3813be23150', '1e05dba6e57815a3c8412b7dcdb6f32f0', '211aa152d5cf4654cf74ec11d616e79a0', '2bfde13b22cc98d6ef829f1d03c09e390', '2943ca6c36eac74858e90b28caa5b4120', '250dc73274473994782ee650eb72846f0', '1d5aa2b150d9874541165d2efca1711d0', 
'1f15f852724a9472e52a3585f29062d70', '299790098e59815092ca687e77b865420', '1dd73764d2d07aeca59c60e7319417d90', '24395f4e374de1e58c7edc52e0ae05370', '18cd76caba5091669df207d289f00abb0', '2981e6059281124faada9c9d05671b2f0', '213f9aa73bc55070d4fac9b41091deda0', '26d85201ad0171ef64c0821052a79ce60', '2d23c248edcd9eaf2b6b8d42f2700a570', '21a664fb234439dc4186b6de2fbda4350', '20f83a67988ca96cfdc700db7476f4b40', '21155df53064698bdce57ed72cff26fc0', '1cb3b0d16f32945a4c07bb2e0e12e0460', '2670134f70180415149405cdc89a09c20', '2c5f41c3c5e8b73b0699d5d4831940120', '21a1a86e9699b254171cca620427ef4e0', '1aa78a5a69853f5406b709ffc118f8e80', '1d399f49e759d7b1d317d0925e57d6ed0', '239acd780fa1a4937cf1a9ab3cb635130', '199bcf3b7c7bb21eee0fcef5f827051a0', '1d505799def73d43430d60888f6e94120', '1897c40fc8e7d6c2a38857d97d2bffa40', '27f66a680325495cb55abbf5f22ac9050', '1f2ea058ba48b9e9d0ece81f84c48f010', '2ce790e24ef9f1a61c9e05f6211102f60', '223790065bcf685a4919fd948350ef150', '19dd0c2dc9910f164f85ec48a22458270', '24ae455108f6c18db52fc8873c42835f0', '29639d7e743cf3d41428352e9b254e210', '1ea87b148ca96088eb1841add2785b700', '16ea55ad94262ef88d03554ba3b4d29c0', '2097c993cee8b3b781b23e35754995690', '1a5447396ab8b7f839ff52bc2812a7950', '2017f322f1ef51169cc7361d3b3c09440', '2277c39dfc41e206e1c741417622cb3f0', '2782c265a8b62388a5888963d1b9ea010', '2b268cbb435dd9ca659c8bf517e21bf00', '251a0dfb12e36f3789084617bc2b8c9c0', '26351c22f9121a24ada8528d22fb53900', '2bc368e037dd478394bf8ca331685f920', '18ecee8f102579c2a5f21fb43d2fed830', '2418d5964f85d94c35a742babd181b160', '1dee881a96b49cf21a6c074e625f63610', '210319398265feb2e4eaa492bf7513ae0', '2a993d7977575ff9e7b563a84d15bdd60', '29d014d3579ee8443ce2a04f9a3de0340', '174efbb9c7b1754565d7684a1cceadef0', '1e4d1f021584c53122947284c63dbe350', '257d5d2fd6188874f5e21ddf4695ad880', '2c93e71399aa91526724ba07a84869460', '1a8fa663a27042ed824019a8b51d80d70', '1f0c847aa0b0ea9240bb15f01347ec3d0', '293bed62e3ae658004a2b1b8be0f6be40', '2d3bc127a505c093a005f4b597fce1300', 
'1cd3049be4b5fbfa3bda2a49822d538c0', '18e4494b2cc509f691ecbb24c5db501c0', '1b86e47c8f393a3995f5c5182149a3d60', '1fb7edd500afae41b99eba94ddb5c7560', '265744e6ea11142b84073c549357135c0', '1bd5707b4f8d162afc37006ac0f6b6890', '1d3586fec96a274c2ca2dfecfbafb16c0', '2cf9a5e1423158a7d8948c5097df73140', '26f95eba5d14c940cf3c724f631be0ff0', '1ccc53ef44d89ec1e5182c8b283251e50', '251988cf728659183c9763f6101962340', '1f6ac89d6291e9242300ecc592d991ad0', '2cc9ef3933df6e0ce1bfdafaa6b7fb880', '1eb445c40519ce6a8c7f2b3f8a8997030', '2235ed2aa02df0ed14880195386a96f70', '2cda3233e4a14d7741f06f32808ec6e20', '21d8f53feea37d90311ae029219d18bc0', '21d8f53feea37d90311ae029219d18bc1', '25d9d20f9c717614c46d06163670e6ee0', '221d2a031daf3a02c5c0d5cd42df30770', '1f96ae0d3c6f700a02e93a3a37f218be0', '1e0cfcf7a54eb64bc3a40a8c208d65090', '1869340dce883ac1e5df24c121a7a7e40', '22b7b60bb5fb4ecd70ff5bb80f2cf4650', '1c90eb71a87538b3e3ac14c2dfc4dbcb0', '28bc603c64408a7ac39ce2ef6ee00bb60', '1879e9af6719f43f8cd99f3a80ffa8980', '1eef6de349d0f7398ac428acef2d06c50', '212a3a8a22e30f7a8294f824b2d4cc470', '24d8dae68b3a0cbb73c6efb065cb50ad0', '2a3f2dee024277265d0dd33f2d2697000', '2bbf4a72dffc7dc8fb75202403d104d80', '188e5ae6704c6eefb7ee9c2bb7ddd7630', '2c660ca5606bc9226d36f9370de0f7ef0', '260c5f5853d52aef3baa7e758dee3f750', '1f930c8420c27d860c2a06901aee0e1d0', '2c2c777fbe9abef35025b8a5d6f3b8820', '286309fbdfb4f23cb24d92a8bd190a240', '1ca5bfdabdd5ed1c78fd835fc14efeb80', '1e256736885dbd9bd860431d537cc0eb0', '298f15eb7d991576dc5c82df20e9b4df0', '27854ce36ac05aec0cb0c715e398765a0', '1e72e231e6e0378236d8eab943db08670', '2b70ca9cfc53f905fc896eb2ea642e9f0', '1641c515ddb7a94c05fc0a9f37e783680', '1aeb80ce493c40243e6daa130ed65b140', '2c37471b1f7e0f50faae57c12925a5fa0', '1e4e04a1d90b1a912604c99311f8e1710', '1c241f2c56036c29cfb7de5e56c713370', '1bce2ef786bd8910fd14835167e234500', '1b34d66b7b0dd197ccad57dc4b4942910', '1b78e781e087650e6c8658790c8995740', '2a6c5bda0b52e6a573bf6bb4f53ec5f30', '28092e14a0a41f69326025a191ae5b130', 
'219954ecdee511c5d36d22f4150811e40', '202c4476c7ad7ab8c6536a05ab1447e90', '1d4ff98eeba9703429867181772b24d30', '22d146eaea86d73a9bfffa8b9d1d2db20', '27d968ea653a6166780078ffe69d3beb0', '2686af842497cb237c1f88a72b295fa70', '23d7c06879b24e68a1b930edb69961a20', '292cc668f614149092144867d22c41fd0', '1ea9aa1991684b76020a56d518c9886b0', '1ea9aa1991684b76020a56d518c9886b1', '2d8711179906c9130afbf177fd5146280', '260558466815bbc116d8162e189d52bf0', '212ec3eff158f43e8c832b3fab024eec0', '24b0c4efbfed29a69650d13c16e1ef3c0', '1a366d24b29726ce14936bece6bc53aa0', '1f1d75b9baf560878dc1f9aba633abe20', '1a44595c62c1f1d91a4a07f8edf9f1930', '1b2763cb802979f54a89afbb82a2f4370', '17d989d23cc60bc2ab12f59c5ece1fa00', '17dae0bc0747aeb2fcadd8003e090e430', '1e1883b72f176b811df3596722ba00d00', '1c24ab03a442a53d9072713c17bf1b850', '26310af6b4c9a48d82258cc5524e4af60', '1764ee02e7940b8900f4bd846dd18b9a0', '2121cd94db17df72255310a87451c0ac0', '2c532a06a3dbb6a512d8011516c3537b0', '271b9342e8f521b75f58957e98bd687e0', '22d3a176aaa379d0559e13fd60ab0f230', '26cff92ebd2d46572ab1b5cb2142f7460', '2839d421834977f21cd0cf64eacfe5f30', '1789912dd1a4fc7a77a28779a447e03b0', '2aa56dea2b347fbebbc0a56e9f5f3fe10', '186058b4a54874f4c3e7024c6b0c8c530', '20926cf75b00193230405461c039e4b30', '261cfc1e945aa0babdd06d9c2b3646910', '23933095c4d1b61315a2cbf32a6806ab0', '200f2a45ce8f687d637bf1c7935ca0670', '2cd3cf079f6d4f3800cd8da6bc122a640', '1feeff75440a78f319387b4c53f418430', '2af014dca471c7617e794d87f4a937d10', '2d5f6de0dde47a20856771597187356a0', '185bb6730327dbed61a5a0b2fb5b20b90', '2365f6515cf123d26d6a580d5c1aa6ca0', '2b5213c8fb0720f7df160eadc9befae80', '16c447c223f1a13692b3e16f2e2871ae0', '1637ac6721827d84652bcfddeebf2fa80', '16354b34fbf2d7f6dca80af32ec4cedb0', '1c4e8a9876e138a1d115653b059afbb00', '19cb66ee58a230789fefd700cf5606e10', '2845c0228f1187681829c2635f0d93080', '2018898e7524282b0c23207f2e3c4ab30', '1d52835936ccfb0d88180e2ea759ea940', '20651b9e26956dcd34310f3ed9fde4560', '19e85dd4160dc8fdcb4dd1423e8a7f940', 
'1bbadd531272ba1e72c7ac6b80c1fd7c0', '23ef092f52b277c46057f414619bc5360', '29108eb95e8d4f2e94c3944bf02c719e0', '1a5340cd7c58cb7d82f776f4d65e6b6e0', '1d5e264cedf9c2d1b80fc1a57d6bf1fc0', '1d5e264cedf9c2d1b80fc1a57d6bf1fc1', '280a28b5730228bb1ebc1e8660323f380', '1db54a1ca8414128124ded701910efc20', '27b6d6608c7a95fa2289ae4603d0ae0f0', '2541c142cb6abcf3ae6fc69675d0fca50', '2d815ad7937051644422e99bae2026560', '2be29747a67ec4625072ab5b7588b0d90', '1bb046a201587565473b30e2467a190 len(18741)"]
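The "loaded config" line above carries the memory-related settings, but they are hard to spot in one long escaped string. The sketch below pulls them out, assuming the quoted value uses JSON-style escaping as it appears to in the excerpt, and that the wrapped log line has been joined back into a single line first. The sample is a shortened stand-in that keeps only keys and values visible in the log above; the helper names are hypothetical.

```python
import json
import re

# Captures the quoted JSON payload of the [config="..."] field.
CONFIG_RE = re.compile(r'\[config=(?P<quoted>".*")\]\s*$')


def load_logged_config(line):
    """Decode the config dict from a 'loaded config' log line, or None."""
    m = CONFIG_RE.search(line)
    if m is None:
        return None
    inner = json.loads(m.group("quoted"))  # undo the log's string escaping
    return json.loads(inner)               # parse the config JSON itself


def memory_settings(cfg):
    """Pick out the settings that bound tidb-server memory use."""
    perf = cfg.get("performance", {})
    return {
        "server-memory-quota": perf.get("server-memory-quota"),
        "memory-usage-alarm-ratio": perf.get("memory-usage-alarm-ratio"),
        "mem-quota-query": cfg.get("mem-quota-query"),
        "oom-action": cfg.get("oom-action"),
    }


# Shortened sample: same keys and values as the log above, rest omitted.
line = (
    r'[2024/03/11 12:20:30.130 +08:00] [Info] [printer.go:48] '
    r'["loaded config"] [config="{\"performance\":{\"server-memory-quota\":0,'
    r'\"memory-usage-alarm-ratio\":0.8},\"mem-quota-query\":1073741824,'
    r'\"oom-action\":\"cancel\"}"]'
)
settings = memory_settings(load_logged_config(line))
print(settings)
print(settings["mem-quota-query"] / 2**30, "GiB per query")
```

In this log, mem-quota-query is 1073741824 bytes (1 GiB) with oom-action set to cancel, and server-memory-quota is 0.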

If there had been a flood of requests, it does not show in QPS, which barely changed. The OOM happened at 12:20 on the 11th.



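It looks like the NUMA resource isolation you mentioned is not configured. I also looked at NUMA memory in the node monitoring again, and it appears to have been insufficient when the OOM happened.

Which bug in v7.1 are you referring to?

There seem to be no expensive_query entries before the restart; they only appear after the OOM restart. I posted the log in another reply.

Upgrade to a newer version, configure the memory limit for tidb-server, and allocate resources properly in a mixed deployment to reduce OOMs.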

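All three nodes hitting OOM at the same time does not look like the work of a single large SQL statement. Check the cluster configuration with tiup cluster display XXXX and tiup cluster show-config XXX.

Look at the deployment architecture of the TiDB cluster (run tiup cluster display) and the configuration of the corresponding machines.

Then look at memory usage in the TiKV and System Info monitoring panels.

Run tiup cluster edit-config and check whether the numa_node keyword is present. You can also run numactl --hardware to see the NUMA layout of your machine.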


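For example, I have a machine with 192 GB of memory and two NUMA nodes, with PD and tidb-server deployed on it. PD is bound to numa_node 0 and tidb-server to numa_node 1, so tidb-server can use at most 96 GB. Even if numa_node 0, where PD runs, still has free memory, tidb-server will not use it; once it goes past 96 GB it is OOM-killed.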
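The 96 GB ceiling in this example can be worked out from numactl --hardware output. The sketch below assumes the usual "node N size: X MB" lines of that command; the sample output is hypothetical and sized to match the 192 GB, two-node machine described above.

```python
import re

# "node 0 size: 98304 MB" lines from numactl --hardware.
NODE_SIZE_RE = re.compile(r"^node (\d+) size: (\d+) MB$", re.MULTILINE)


def node_sizes_mb(numactl_output):
    """Map NUMA node id to its total memory in MB."""
    return {int(n): int(mb) for n, mb in NODE_SIZE_RE.findall(numactl_output)}


def memory_ceiling_mb(numactl_output, bound_nodes):
    """Memory available to a process bound to the given NUMA nodes."""
    sizes = node_sizes_mb(numactl_output)
    return sum(sizes[n] for n in bound_nodes)


# Hypothetical output for a 192 GB machine with two NUMA nodes.
sample = """available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3
node 0 size: 98304 MB
node 0 free: 40000 MB
node 1 cpus: 4 5 6 7
node 1 size: 98304 MB
node 1 free: 1200 MB
"""

# tidb-server bound to numa_node 1 only, as in the example above.
print(memory_ceiling_mb(sample, [1]) / 1024, "GB for tidb-server")
# Unbound, both nodes would be usable.
print(memory_ceiling_mb(sample, [0, 1]) / 1024, "GB without binding")
```

This is also why host-level memory utilization can sit near 50% while a process bound to one node is still killed.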