TiDB queries interrupted & restarts, cause unknown

To help us respond efficiently, please provide the following information when asking a question; clearly described problems get priority.

  • [TiDB version]: 2.1.4
  • [Problem description]: A query stops partway through with the error "can't access to MySQL server"

Logging in to the TiDB machine and running `ps -aux | grep tidb-server`, it looks like the process has been restarted:

```
[root@TiDBserver ~]# ps -aux | grep tidb-ser
tidb  7185  268 60.4 42260056 39785168 ? Ssl 10:28 14:17 bin/tidb-server -P 4000 --status=10080 --advertise-address=50.16.170.104 --path=50.16.170.114:2379,50.16.170.115:2379 --config=conf/tidb.toml --log-file=/data/deploy/log/tidb.log
```

This happens roughly once every two days.

I don't know how to debug this to confirm the cause of the restarts.

Example query:

```sql
select jra.entity_type, jra.entity_value,
       count(DISTINCT CONCAT(LEFT(jra.xzqh,2),'0000')) as p_count,
       count(DISTINCT CONCAT(LEFT(jra.xzqh,4),'00')) as pc_count,
       count(DISTINCT jra.jq_id) as jq_id_count,
       min(CONCAT(jte.BJNR, jte.CLJGNR)) as jjnr_cljgnr
FROM jingqing_related_all jra
join jingqing_table jte on jra.jq_id = jte.JQBH
group by jra.entity_type, jra.entity_value
order by jq_id_count DESC
limit 1000
```

If your question is about performance tuning or troubleshooting, please download and run the diagnostic script, then select all of the terminal output and paste it in full.

Log from around the restart:

```
2020/02/11 10:35:04.877 region_cache.go:482: [info] drop regions that on the store 4(50.16.170.111:20160) due to send request fail, err: rpc error: code = DeadlineExceeded desc = context deadline exceeded
2020/02/11 10:35:04.894 region_cache.go:482: [info] drop regions that on the store 1(50.16.170.112:20160) due to send request fail, err: rpc error: code = DeadlineExceeded desc = context deadline exceeded
2020/02/11 10:35:05.990 region_cache.go:482: [info] drop regions that on the store 5(50.16.170.113:20160) due to send request fail, err: rpc error: code = DeadlineExceeded desc = context deadline exceeded
2020/02/11 10:35:21.916 region_cache.go:482: [info] drop regions that on the store 5(50.16.170.113:20160) due to send request fail, err: rpc error: code = DeadlineExceeded desc = context deadline exceeded
2020/02/11 10:35:24.164 region_cache.go:482: [info] drop regions that on the store 1(50.16.170.112:20160) due to send request fail, err: rpc error: code = DeadlineExceeded desc = context deadline exceeded
2020/02/11 10:35:48.177 printer.go:52: [info] Config:
{"host":"0.0.0.0","advertise-address":"50.16.170.104","port":4000,"store":"tikv","path":"50.16.170.114:2379,50.16.170.115:2379","socket":"","lease":"45s","run-ddl":true,"split-table":true,"token-limit":1000,"oom-action":"log","mem-quota-query":34359738368,"enable-streaming":false,"txn-local-latches":{"enabled":false,"capacity":10240000},"lower-case-table-names":2,"log":{"level":"info","format":"text","disable-timestamp":false,"file":{"filename":"/data/deploy/log/tidb.log","log-rotate":true,"max-size":300,"max-days":0,"max-backups":0},"slow-query-file":"","slow-threshold":300,"expensive-threshold":10000,"query-log-max-len":2048},"security":{"skip-grant-table":false,"ssl-ca":"","ssl-cert":"","ssl-key":"","cluster-ssl-ca":"","cluster-ssl-cert":"","cluster-ssl-key":""},"status":{"report-status":true,"status-port":10080,"metrics-addr":"","metrics-interval":15},"performance":{"max-procs":0,"tcp-keep-alive":true,"cross-join":true,"stats-lease":"3s","run-auto-analyze":true,"stmt-count-limit":5000,"feedback-probability":0.05,"query-feedback-limit":1024,"pseudo-estimate-ratio":0.8,"force-priority":"NO_PRIORITY"},"xprotocol":{"xserver":false,"xhost":"","xport":0,"xsocket":""},"prepared-plan-cache":{"enabled":false,"capacity":100},"opentracing":{"enable":false,"sampler":{"type":"const","param":1,"sampling-server-url":"","max-operations":0,"sampling-refresh-interval":0},"reporter":{"queue-size":0,"buffer-flush-interval":0,"log-spans":false,"local-agent-host-port":""},"rpc-metrics":false},"proxy-protocol":{"networks":"","header-timeout":5},"tikv-client":{"grpc-connection-count":16,"grpc-keepalive-time":10,"grpc-keepalive-timeout":3,"commit-timeout":"41s"},"binlog":{"enable":false,"write-timeout":"15s","ignore-error":false,"binlog-socket":""},"compatible-kill-query":false,"check-mb4-value-in-utf8":true}
```

Check the system log to see whether the TiDB restart was caused by the OOM killer.

Got it. Which TiDB node should I check, and is there a command I can use for reference? Thanks!

On the TiDB node(s) where a restart occurred, run:

```
grep -i "out of memory" /var/log/messages
# or
grep -i "oom" /var/log/messages
```

Thanks, yes it is. So how should this kind of problem be fixed or avoided? Would upgrading to 3.0 help?

```
Feb 11 10:35:30 TiDBserver kernel: Out of memory: Kill process 7197 (tidb-server) score 976 or sacrifice child
Feb 11 10:35:30 TiDBserver kernel: Out of memory: Kill process 7244 (tidb-server) score 976 or sacrifice child
Feb 11 11:05:28 TiDBserver kernel: Out of memory: Kill process 7281 (tidb-server) score 975 or sacrifice child
Feb 11 11:10:59 TiDBserver kernel: Out of memory: Kill process 7400 (tidb-server) score 976 or sacrifice child
Feb 11 12:52:29 TiDBserver kernel: Out of memory: Kill process 7480 (tidb-server) score 976 or sacrifice child
Feb 11 13:00:59 TiDBserver kernel: Out of memory: Kill process 7596 (tidb-server) score 976 or sacrifice child
Feb 11 13:08:53 TiDBserver kernel: Out of memory: Kill process 7690 (tidb-server) score 977 or sacrifice child
Feb 11 13:08:53 TiDBserver kernel: Out of memory: Kill process 7694 (tidb-server) score 977 or sacrifice child
Feb 11 13:08:53 TiDBserver kernel: Out of memory: Kill process 7699 (tidb-server) score 977 or sacrifice child
Feb 11 13:08:53 TiDBserver kernel: Out of memory: Kill process 7703 (tidb-server) score 977 or sacrifice child
Feb 11 13:08:53 TiDBserver kernel: Out of memory: Kill process 7728 (tidb-server) score 977 or sacrifice child
Feb 11 13:55:34 TiDBserver kernel: Out of memory: Kill process 7770 (tidb-server) score 976 or sacrifice child
Feb 11 18:43:25 TiDBserver kernel: Out of memory: Kill process 7867 (tidb-server) score 976 or sacrifice child
Feb 11 18:43:25 TiDBserver kernel: Out of memory: Kill process 7871 (tidb-server) score 976 or sacrifice child
Feb 11 18:43:25 TiDBserver kernel: Out of memory: Kill process 7874 (tidb-server) score 976 or sacrifice child
Feb 11 18:43:25 TiDBserver kernel: Out of memory: Kill process 7895 (tidb-server) score 976 or sacrifice child
Feb 11 18:43:25 TiDBserver kernel: Out of memory: Kill process 7907 (tidb-server) score 976 or sacrifice child
Feb 11 18:43:25 TiDBserver kernel: Out of memory: Kill process 7921 (tidb-server) score 976 or sacrifice child
Feb 11 18:43:25 TiDBserver kernel: Out of memory: Kill process 7936 (tidb-server) score 976 or sacrifice child
Feb 11 18:43:25 TiDBserver kernel: Out of memory: Kill process 7939 (tidb-server) score 976 or sacrifice child
Feb 11 18:43:25 TiDBserver kernel: Out of memory: Kill process 7942 (tidb-server) score 976 or sacrifice child
Feb 11 18:43:25 TiDBserver kernel: Out of memory: Kill process 7959 (tidb-server) score 976 or sacrifice child
Feb 11 18:43:25 TiDBserver kernel: Out of memory: Kill process 7965 (tidb-server) score 976 or sacrifice child
```

```
Feb 10 17:32:01 TiDBserver kernel: prometheus invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 10 17:32:01 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 10 17:32:01 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 10 17:59:13 TiDBserver kernel: prometheus invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 10 17:59:13 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 10 17:59:13 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 10 18:39:51 TiDBserver kernel: tidb-server invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 10 18:39:51 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 10 18:39:51 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 10 18:39:51 TiDBserver kernel: systemd-journal invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 10 18:39:51 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 10 18:39:51 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 10 18:39:51 TiDBserver kernel: systemd-journal invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 10 18:39:51 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 10 18:39:51 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 10 18:39:51 TiDBserver kernel: systemd-journal invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 10 18:39:51 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 10 18:39:51 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 10 18:47:30 TiDBserver kernel: tidb-server invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 10 18:47:30 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 10 18:47:30 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 10 18:47:30 TiDBserver kernel: systemd-journal invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 10 18:47:30 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 10 18:47:30 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 10 18:47:30 TiDBserver kernel: systemd-journal invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 10 18:47:30 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 10 18:47:30 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 10 18:47:30 TiDBserver kernel: in:imjournal invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 10 18:47:30 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 10 18:47:30 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 10 18:47:30 TiDBserver kernel: grafana-server invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 10 18:47:30 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 10 18:47:30 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 10 18:58:28 TiDBserver kernel: in:imjournal invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 10 18:58:28 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 10 18:58:28 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 10 18:58:28 TiDBserver kernel: tuned invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 10 18:58:28 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 10 18:58:28 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 10 19:09:38 TiDBserver kernel: grafana-server invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 10 19:09:38 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 10 19:09:38 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 10 19:25:44 TiDBserver kernel: tidb-server invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 10 19:25:44 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 10 19:25:44 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 10 19:37:44 TiDBserver kernel: grafana-server invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 10 19:37:44 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 10 19:37:44 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 11 08:18:58 TiDBserver kernel: tidb-server invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 11 08:18:58 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 11 08:18:58 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 11 08:51:07 TiDBserver kernel: prometheus invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 11 08:51:07 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 11 08:51:07 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 11 08:51:07 TiDBserver kernel: prometheus invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 11 08:51:07 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 11 08:51:07 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 11 08:51:07 TiDBserver kernel: prometheus invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 11 08:51:07 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 11 08:51:07 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 11 08:51:07 TiDBserver kernel: prometheus invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 11 08:51:07 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 11 08:51:07 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 11 08:51:07 TiDBserver kernel: prometheus invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 11 08:51:07 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 11 08:51:07 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 11 10:06:30 TiDBserver kernel: prometheus invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 11 10:06:30 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 11 10:06:31 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 11 10:19:34 TiDBserver kernel: tidb-server invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 11 10:19:34 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 11 10:19:34 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 11 10:28:40 TiDBserver kernel: prometheus invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 11 10:28:40 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 11 10:28:40 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 11 10:35:30 TiDBserver kernel: irqbalance invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 11 10:35:30 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 11 10:35:30 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 11 10:35:30 TiDBserver kernel: irqbalance invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 11 10:35:30 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 11 10:35:30 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 11 10:35:30 TiDBserver kernel: prometheus invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 11 10:35:30 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 11 10:35:30 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 11 10:35:30 TiDBserver kernel: systemd-journal invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 11 10:35:30 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 11 10:35:30 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 11 10:35:30 TiDBserver kernel: systemd-journal invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 11 10:35:30 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 11 10:35:30 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 11 11:05:28 TiDBserver kernel: prometheus invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 11 11:05:28 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 11 11:05:28 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 11 11:10:59 TiDBserver kernel: tuned invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 11 11:10:59 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 11 11:10:59 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 11 12:52:29 TiDBserver kernel: prometheus invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 11 12:52:29 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 11 12:52:29 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 11 13:00:59 TiDBserver kernel: tuned invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 11 13:00:59 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 11 13:00:59 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 11 13:08:53 TiDBserver kernel: prometheus invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 11 13:08:53 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 11 13:08:53 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 11 13:08:53 TiDBserver kernel: prometheus invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 11 13:08:53 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 11 13:08:53 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 11 13:08:53 TiDBserver kernel: tuned invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 11 13:08:53 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 11 13:08:53 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 11 13:08:53 TiDBserver kernel: prometheus invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 11 13:08:53 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 11 13:08:53 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 11 13:08:53 TiDBserver kernel: prometheus invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 11 13:08:53 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 11 13:08:53 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 11 13:55:34 TiDBserver kernel: prometheus invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 11 13:55:34 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 11 13:55:34 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 11 18:43:25 TiDBserver kernel: tidb-server invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 11 18:43:25 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 11 18:43:25 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 11 18:43:25 TiDBserver kernel: systemd-journal invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 11 18:43:25 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 11 18:43:25 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 11 18:43:25 TiDBserver kernel: prometheus invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 11 18:43:25 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 11 18:43:25 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 11 18:43:25 TiDBserver kernel: prometheus invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 11 18:43:25 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 11 18:43:25 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 11 18:43:25 TiDBserver kernel: prometheus invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 11 18:43:25 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 11 18:43:25 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 11 18:43:25 TiDBserver kernel: prometheus invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 11 18:43:25 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 11 18:43:25 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 11 18:43:25 TiDBserver kernel: prometheus invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 11 18:43:25 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 11 18:43:25 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 11 18:43:25 TiDBserver kernel: prometheus invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 11 18:43:25 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 11 18:43:25 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 11 18:43:25 TiDBserver kernel: prometheus invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 11 18:43:25 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 11 18:43:25 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 11 18:43:25 TiDBserver kernel: in:imjournal invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 11 18:43:25 TiDBserver kernel: [] oom_kill_process+0x254/0x3d0
Feb 11 18:43:25 TiDBserver kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
```

  1. What is the hardware configuration of your machines?
  2. Consider setting the TiDB parameter `oom-action = "cancel"`, so that a query using too much memory is cancelled.
  3. Set the `mem-quota-query` parameter to adjust the threshold at which `oom-action` is triggered.
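For reference, in the deployed TiDB config file (this deployment starts tidb-server with `--config=conf/tidb.toml`, per the process command line above) the two settings might look like the following sketch; the values shown are illustrative, not a recommendation:

```toml
# Illustrative values only; tune mem-quota-query to your workload and RAM.
# When a single query's tracked memory exceeds mem-quota-query (in bytes),
# oom-action decides what happens: "log" (the default) records it,
# "cancel" aborts the query.
oom-action = "cancel"
mem-quota-query = 8589934592  # 8 GiB
```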

Thanks! Three TiKV nodes: 20 cores, 40 GB RAM, 2 TB SSD

One PD leader + TiDB node: 32 cores, 64 GB RAM, 500 GB SSD

Two PD followers: 4 cores, 8 GB RAM, 200 GB SAS

I found the config options here: https://pingcap.com/docs/v2.1/reference/configuration/tidb-server/configuration-file/ How can I change them online without a restart? And if that's not possible, how do I change them offline?

These two parameters don't currently support online modification. To change them: the config files on each node are generated from the templates on the control machine, so to keep the cluster configuration managed in one place, it's recommended to modify the corresponding template file under the conf directory of tidb-ansible on the control machine, then run the deploy or rolling_update playbook (rolling_update includes the deploy step) to push the updated config to each node and make it take effect.

Sorry to keep asking. On the control machine, the contents of /home/tidb/tidb-ansible are:

```
ansible.cfg common_tasks downloads inventory.ini README.md rolling_update.yml stop_spark.yml
bootstrap.yml conf fact_files library requirements.txt scripts stop.yml
callback_plugins create_users.yml filter_plugins LICENSE resources start_drainer.yml templates
clean_log_cron.yml deploy_drainer.yml graceful_stop.yml local_prepare.yml retry_files start_spark.yml unsafe_cleanup_container.yml
cloud deploy_ntp.yml group_vars log roles start.yml unsafe_cleanup_data.yml
collect_diagnosis.yml deploy.yml hosts.ini migrate_monitor.yml rolling_update_monitor.yml stop_drainer.yml unsafe_cleanup.yml
```

Files under /home/tidb/tidb-ansible/conf:

```
alertmanager.yml common_packages.yml grafana_collector.toml pd.yml spark-defaults.yml ssl tidb.yml tikv.yml
binary_packages.yml drainer.toml keys pump.yml spark-env.yml tidb-lightning.yml tikv-importer.yml
```

Which file should I change? Are there reference steps?

These two parameters belong to the tidb component, so modify the conf/tidb.yml file. Note that the file must remain valid YAML.
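As a sketch of that edit (assuming the release-2.1 tidb-ansible template layout, where top-level tidb.toml options sit under a `global:` key in conf/tidb.yml; check your file's actual structure before copying):

```yaml
# conf/tidb.yml on the tidb-ansible control machine -- illustrative sketch.
# The key names mirror tidb.toml; the `global:` grouping is an assumption
# based on the 2.1 template layout, so match your file's existing structure.
global:
  oom-action: "cancel"          # cancel an over-quota query instead of only logging it
  mem-quota-query: 8589934592   # per-query memory threshold in bytes (8 GiB here)
```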

Got it. After modifying tidb.yml, do I run `ansible-playbook deploy.yml`, or just `ansible-playbook start.yml`?

`ansible-playbook rolling_update.yml --tags=tidb` — this command performs a rolling update of the tidb nodes.

OK, I ran the command, and during the update it showed (see screenshot): `"oom-action": "cancel"`. But after running that SELECT query, the system log shows the process was still OOM-killed and restarted:

```
grep -i "out of memory" /var/log/messages
Feb 12 16:29:48 TiDBserver kernel: Out of memory: Kill process 10656 (tidb-server) score 976 or sacrifice child
```

Did you also adjust `mem-quota-query`? It defaults to 32 GB and controls the memory-usage threshold for a single SQL statement; only when that threshold is reached is the `oom-action` operation triggered.

I hadn't, because the machine has 64 GB overall and that seemed like enough. I just changed it to 3 GB, and the query ran without a restart. It logs this:

```
2020/02/12 16:39:46.132 2pc.go:141: [info] [BIG_TXN] con:1635 table id:14619 size:1329890, keys:15882, puts:10588, dels:5294, locks:0, startTS:414579888915742722
```

So probably the query had not yet reached 32 GB when it exhausted the system's memory, and the OS killed the process first, before TiDB's oom-action could trigger.

Right. Before adjusting it, I saw the tidb-server process already using about 60 GB. Given the machine has 64 GB, what value of `mem-quota-query` would you recommend? (With the default 32 GB, tidb-server actually reached 60 GB.)

```
[root@TiDBserver ~]# cat /proc/14531/status
Name:      tidb-server
Umask:     0022
State:     S (sleeping)
Tgid:      14531
Ngid:      0
Pid:       14531
PPid:      1
TracerPid: 0
Uid:       1000 1000 1000 1000
Gid:       1000 1000 1000 1000
FDSize:    1024
Groups:    1000
VmPeak:    63773296 kB
VmSize:    63773296 kB
VmLck:     0 kB
VmPin:     0 kB
VmHWM:     63460956 kB
VmRSS:     63460956 kB
```

That depends on your SQL workload and number of connections. If the queries are simple but there are many connections, you can set the value lower.
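There is no official formula tying `mem-quota-query` to total process memory: the quota tracks only part of a query's allocations, and the Go runtime, caches, and other sessions add overhead on top. As a rough, unofficial budgeting heuristic (names and numbers here are illustrative assumptions, not TiDB guidance), you could divide the RAM left after a reserve by the expected number of concurrent heavy queries:

```python
# Rough sizing heuristic -- an illustration, not an official TiDB formula.
# Reserve memory for the OS, monitoring, and runtime overhead, then divide
# the remainder by the number of heavy queries expected to run concurrently.

def suggest_mem_quota_query(total_ram_gb: int, reserved_gb: int,
                            concurrent_heavy_queries: int) -> int:
    """Return a suggested mem-quota-query value in bytes."""
    usable_gb = total_ram_gb - reserved_gb
    if usable_gb <= 0 or concurrent_heavy_queries < 1:
        raise ValueError("no memory budget left for queries")
    return (usable_gb * 1024 ** 3) // concurrent_heavy_queries

# 64 GB host, reserve 16 GB, allow 4 concurrent heavy queries -> 12 GiB each:
print(suggest_mem_quota_query(64, 16, 4))  # 12884901888
```

The reserve and concurrency figures are the judgment calls; on a host like this one, where tidb-server shares 64 GB with PD and monitoring, a generous reserve is the safer choice.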

OK, one more thing to confirm: I set these four parameters all to 12 GB, and with only a single SQL query running during the whole test, tidb-server's memory still climbed to 60 GB. I'm not sure how to set this properly anymore. Is there a calculation method that relates the configured value to actual memory usage?