tidb-server 单条语句内存限制mem-quota-query不生效

为提高效率,提问时请提供以下信息,问题描述清晰可优先响应。

  • 【TiDB 版本】:4.0.8
  • 【问题描述】:tidb-server 单条语句内存限制mem-quota-query不生效

tidb-server 相关配置
server_configs:
tidb:
log.slow-threshold: 500
mem-quota-query: 1073741824
oom-action: cancel

概要

MySQL [sbtest]> show config where Type = ‘tidb’;
±-----±--------------------±------------------------------------------------±----------------------------------------------------------------+
| Type | Instance | Name | Value |
±-----±--------------------±------------------------------------------------±----------------------------------------------------------------+
| tidb | 172.29.238.115:4000 | advertise-address | 172.29.238.115 |
| tidb | 172.29.238.115:4000 | alter-primary-key | false |
| tidb | 172.29.238.115:4000 | binlog.binlog-socket | |
| tidb | 172.29.238.115:4000 | binlog.enable | false |
| tidb | 172.29.238.115:4000 | binlog.ignore-error | false |
| tidb | 172.29.238.115:4000 | binlog.strategy | range |
| tidb | 172.29.238.115:4000 | binlog.write-timeout | 15s |
| tidb | 172.29.238.115:4000 | check-mb4-value-in-utf8 | true |
| tidb | 172.29.238.115:4000 | compatible-kill-query | false |
| tidb | 172.29.238.115:4000 | cors | |
| tidb | 172.29.238.115:4000 | delay-clean-table-lock | 0 |
| tidb | 172.29.238.115:4000 | enable-batch-dml | false |
| tidb | 172.29.238.115:4000 | enable-collect-execution-info | true |
| tidb | 172.29.238.115:4000 | enable-redact-log | 0 |
| tidb | 172.29.238.115:4000 | enable-streaming | false |
| tidb | 172.29.238.115:4000 | enable-table-lock | false |
| tidb | 172.29.238.115:4000 | enable-telemetry | true |
| tidb | 172.29.238.115:4000 | experimental.allow-expression-index | false |
| tidb | 172.29.238.115:4000 | host | 0.0.0.0 |
| tidb | 172.29.238.115:4000 | isolation-read.engines | [“tikv”,“tiflash”,“tidb”] |
| tidb | 172.29.238.115:4000 | lease | 45s |
| tidb | 172.29.238.115:4000 | log.disable-error-stack | null |
| tidb | 172.29.238.115:4000 | log.disable-timestamp | null |
| tidb | 172.29.238.115:4000 | log.enable-error-stack | null |
| tidb | 172.29.238.115:4000 | log.enable-slow-log | true |
| tidb | 172.29.238.115:4000 | log.enable-timestamp | null |
| tidb | 172.29.238.115:4000 | log.expensive-threshold | 10000 |
| tidb | 172.29.238.115:4000 | log.file.filename | /data/tidb/deploy/tidb-4000/log/tidb.log |
| tidb | 172.29.238.115:4000 | log.file.max-backups | 0 |
| tidb | 172.29.238.115:4000 | log.file.max-days | 0 |
| tidb | 172.29.238.115:4000 | log.file.max-size | 300 |
| tidb | 172.29.238.115:4000 | log.format | text |
| tidb | 172.29.238.115:4000 | log.level | info |
| tidb | 172.29.238.115:4000 | log.query-log-max-len | 4096 |
| tidb | 172.29.238.115:4000 | log.record-plan-in-slow-log | 1 |
| tidb | 172.29.238.115:4000 | log.slow-query-file | log/tidb_slow_query.log |
| tidb | 172.29.238.115:4000 | log.slow-threshold | 500 |
| tidb | 172.29.238.115:4000 | lower-case-table-names | 2 |
| tidb | 172.29.238.115:4000 | max-index-length | 3072 |
| tidb | 172.29.238.115:4000 | max-server-connections | 0 |
| tidb | 172.29.238.115:4000 | mem-quota-query | 1073741824 |
| tidb | 172.29.238.115:4000 | new_collations_enabled_on_first_bootstrap | false |
| tidb | 172.29.238.115:4000 | oom-action | cancel |
| tidb | 172.29.238.115:4000 | oom-use-tmp-storage | true |
| tidb | 172.29.238.115:4000 | opentracing.enable | false |
| tidb | 172.29.238.115:4000 | opentracing.reporter.buffer-flush-interval | 0 |
| tidb | 172.29.238.115:4000 | opentracing.reporter.local-agent-host-port | |
| tidb | 172.29.238.115:4000 | opentracing.reporter.log-spans | false |
| tidb | 172.29.238.115:4000 | opentracing.reporter.queue-size | 0 |
| tidb | 172.29.238.115:4000 | opentracing.rpc-metrics | false |
| tidb | 172.29.238.115:4000 | opentracing.sampler.max-operations | 0 |
| tidb | 172.29.238.115:4000 | opentracing.sampler.param | 1 |
| tidb | 172.29.238.115:4000 | opentracing.sampler.sampling-refresh-interval | 0 |
| tidb | 172.29.238.115:4000 | opentracing.sampler.sampling-server-url | |
| tidb | 172.29.238.115:4000 | opentracing.sampler.type | const |
| tidb | 172.29.238.115:4000 | path | 172.29.238.86:2379,172.29.238.113:2379,172.29.238.115:2379 |
| tidb | 172.29.238.115:4000 | performance.agg-push-down-join | false |
| tidb | 172.29.238.115:4000 | performance.bind-info-lease | 3s |
| tidb | 172.29.238.115:4000 | performance.committer-concurrency | 16 |
| tidb | 172.29.238.115:4000 | performance.cross-join | true |
| tidb | 172.29.238.115:4000 | performance.feedback-probability | 0 |
| tidb | 172.29.238.115:4000 | performance.force-priority | NO_PRIORITY |
| tidb | 172.29.238.115:4000 | performance.max-memory | 0 |
| tidb | 172.29.238.115:4000 | performance.max-procs | 0 |
| tidb | 172.29.238.115:4000 | performance.max-txn-ttl | 600000 |
| tidb | 172.29.238.115:4000 | performance.pseudo-estimate-ratio | 0.8 |
| tidb | 172.29.238.115:4000 | performance.query-feedback-limit | 512 |
| tidb | 172.29.238.115:4000 | performance.run-auto-analyze | true |
| tidb | 172.29.238.115:4000 | performance.stats-lease | 3s |
| tidb | 172.29.238.115:4000 | performance.stmt-count-limit | 5000 |
| tidb | 172.29.238.115:4000 | performance.tcp-keep-alive | true |
| tidb | 172.29.238.115:4000 | performance.txn-total-size-limit | 104857600 |
| tidb | 172.29.238.115:4000 | pessimistic-txn.enable | true |
| tidb | 172.29.238.115:4000 | pessimistic-txn.max-retry-count | 256 |
| tidb | 172.29.238.115:4000 | plugin.dir | |
| tidb | 172.29.238.115:4000 | plugin.load | |
| tidb | 172.29.238.115:4000 | port | 4000 |
| tidb | 172.29.238.115:4000 | prepared-plan-cache.capacity | 100 |
| tidb | 172.29.238.115:4000 | prepared-plan-cache.enabled | false |
| tidb | 172.29.238.115:4000 | prepared-plan-cache.memory-guard-ratio | 0.1 |
| tidb | 172.29.238.115:4000 | proxy-protocol.header-timeout | 5 |
| tidb | 172.29.238.115:4000 | proxy-protocol.networks | |
| tidb | 172.29.238.115:4000 | repair-mode | false |
| tidb | 172.29.238.115:4000 | repair-table-list | [] |
| tidb | 172.29.238.115:4000 | run-ddl | true |
| tidb | 172.29.238.115:4000 | security.cluster-ssl-ca | |
| tidb | 172.29.238.115:4000 | security.cluster-ssl-cert | |
| tidb | 172.29.238.115:4000 | security.cluster-ssl-key | |
| tidb | 172.29.238.115:4000 | security.cluster-verify-cn | null |
| tidb | 172.29.238.115:4000 | security.require-secure-transport | false |
| tidb | 172.29.238.115:4000 | security.skip-grant-table | false |
| tidb | 172.29.238.115:4000 | security.ssl-ca | |
| tidb | 172.29.238.115:4000 | security.ssl-cert | |
| tidb | 172.29.238.115:4000 | security.ssl-key | |
| tidb | 172.29.238.115:4000 | server-version | |
| tidb | 172.29.238.115:4000 | skip-register-to-dashboard | false |
| tidb | 172.29.238.115:4000 | socket | |
| tidb | 172.29.238.115:4000 | split-region-max-num | 1000 |
| tidb | 172.29.238.115:4000 | split-table | true |
| tidb | 172.29.238.115:4000 | status.metrics-addr | |
| tidb | 172.29.238.115:4000 | status.metrics-interval | 15 |
| tidb | 172.29.238.115:4000 | status.record-db-qps | false |
| tidb | 172.29.238.115:4000 | status.report-status | true |
| tidb | 172.29.238.115:4000 | status.status-host | 0.0.0.0 |
| tidb | 172.29.238.115:4000 | status.status-port | 10080 |
| tidb | 172.29.238.115:4000 | stmt-summary.enable | true |
| tidb | 172.29.238.115:4000 | stmt-summary.enable-internal-query | false |
| tidb | 172.29.238.115:4000 | stmt-summary.history-size | 24 |
| tidb | 172.29.238.115:4000 | stmt-summary.max-sql-length | 4096 |
| tidb | 172.29.238.115:4000 | stmt-summary.max-stmt-count | 200 |
| tidb | 172.29.238.115:4000 | stmt-summary.refresh-interval | 1800 |
| tidb | 172.29.238.115:4000 | store | tikv |
| tidb | 172.29.238.115:4000 | tikv-client.batch-wait-size | 8 |
| tidb | 172.29.238.115:4000 | tikv-client.commit-timeout | 41s |
| tidb | 172.29.238.115:4000 | tikv-client.copr-cache.admission-max-result-mb | 10 |
| tidb | 172.29.238.115:4000 | tikv-client.copr-cache.admission-min-process-ms | 5 |
| tidb | 172.29.238.115:4000 | tikv-client.copr-cache.capacity-mb | 1000 |
| tidb | 172.29.238.115:4000 | tikv-client.copr-cache.enable | false |
| tidb | 172.29.238.115:4000 | tikv-client.enable-chunk-rpc | true |
| tidb | 172.29.238.115:4000 | tikv-client.grpc-connection-count | 4 |
| tidb | 172.29.238.115:4000 | tikv-client.grpc-keepalive-time | 10 |
| tidb | 172.29.238.115:4000 | tikv-client.grpc-keepalive-timeout | 3 |
| tidb | 172.29.238.115:4000 | tikv-client.max-batch-size | 128 |
| tidb | 172.29.238.115:4000 | tikv-client.max-batch-wait-time | 0 |
| tidb | 172.29.238.115:4000 | tikv-client.overload-threshold | 200 |
| tidb | 172.29.238.115:4000 | tikv-client.region-cache-ttl | 600 |
| tidb | 172.29.238.115:4000 | tikv-client.store-limit | 0 |
| tidb | 172.29.238.115:4000 | tikv-client.store-liveness-timeout | 5s |
| tidb | 172.29.238.115:4000 | tmp-storage-path | /tmp/2004_tidb/MC4wLjAuMDo0MDAwLzAuMC4wLjA6MTAwODA=/tmp-storage |
| tidb | 172.29.238.115:4000 | tmp-storage-quota | -1 |
| tidb | 172.29.238.115:4000 | token-limit | 1000 |
| tidb | 172.29.238.115:4000 | treat-old-version-utf8-as-utf8mb4 | true |

执行语句为:
select id,k,count(c),sum(k) from sbtest1 where id >=10000 group by pad;
此表数据量较大,2亿。我想达到的效果是通过mem-quota-query 参数限制每条sql 用到的内存最大为1G
但是现在的情况是还会oom

tidb-server 的日志为


[2020/12/31 13:04:39.839 +08:00] [INFO] [printer.go:33] [“Welcome to TiDB.”] [“Release Version”=v4.0.8] [Edition=Community]

补充一下:
刚才测试中发现,有的情况下会出发1G 内存限制,有的情况下还是会oom

如果能够生效,那么是否OOM时,并发很高,虽然都没有到达1G,但是所有sql使用的内存,超过了总内存大小的限制? 可以尝试扩容 tidb,分散到不同tidb-server,或者降低1G为 500M试试?

我这个是新的集群,就我自己手动执行了一条复杂的sql,没有其他链接。
有时不会oom,会提示内存限制那个。有时就会oom 。
我把那个参数降低到500M,再看看。感觉这个机制不是很灵敏

如果是相同的sql,并且只是你手工执行一次,如果 OOM 比较好复现,可以试着在OOM时收集下火焰图

curl -G “ip:port/debug/pprof/heap?seconds=30” > heap.profile 内存

ip地址为tidb服务器的ip,端口为tidb_status_port的端口

我看之前的帖子也有这个问题,不知现在修复了没有。

另外我在快要oom 的时候抓取的火焰图如下
heap.profile4 (25.4 KB)

heap.profile5 (25.5 KB)

  1. 查看火焰图,HashAggPartialWorker占用的很多
  2. 麻烦反馈下 sql 的执行计划 explain analyze sql

目前 4.0 上还不支持统计聚合函数内部的内存使用。
可以考虑用 hint 让它使用 Stream Aggregation 绕过 OOM 的问题。

好的, 多谢。

:+1:

TIDB V5.0.1
使用tiup 的如何操作?
在 tiup cluster edit-config 配置中增加了mem-quota-query仍不生效:
。。。。。。。。
server_configs:
tidb:
log.slow-threshold: 300
mem-quota-query: 34359738368
oom-action: log
。。。。。。。。。。

此话题已在最后回复的 1 分钟后被自动关闭。不再允许新回复。