Lost connection to MySQL server during query issue

To help us respond efficiently, please provide the following information when asking; clearly described issues get priority.

  • 【TiDB version】: V4.0
  • 【Problem description】: Querying a large table fails with a lost-connection error. The relevant query details are below. Could you help confirm whether this is related to the net_read_timeout parameter, or whether it is an OOM problem? Thanks.

Query error on the table
MySQL [hdb_broker]> SELECT r_build_id, COUNT(1) FROM cli WHERE r_build_id IS NOT NULL GROUP BY r_build_id HAVING COUNT(1) > 1;
ERROR 2013 (HY000): Lost connection to MySQL server during query

Disk space used by the table
select concat(truncate(sum(data_length)/1024/1024,2),' MB') as data_size, concat(truncate(sum(index_length)/1024/1024,2),'MB') as index_size from information_schema.tables where TABLE_NAME='cli';
+-------------+------------+
| data_size   | index_size |
+-------------+------------+
| 21140.11 MB | 6860.04MB  |
+-------------+------------+

Relevant database parameters
show variables like '%timeout';
+--------------------------------+----------+
| Variable_name                  | Value    |
+--------------------------------+----------+
| connect_timeout                | 10       |
| delayed_insert_timeout         | 300      |
| innodb_flush_log_at_timeout    | 1        |
| innodb_lock_wait_timeout       | 50       |
| innodb_rollback_on_timeout     | 0        |
| interactive_timeout            | 28800    |
| lock_wait_timeout              | 31536000 |
| net_read_timeout               | 30       |
| net_write_timeout              | 60       |
| rpl_semi_sync_master_timeout   |          |
| rpl_stop_slave_timeout         | 31536000 |
| slave_net_timeout              | 3600     |
| tidb_wait_split_region_timeout | 300      |
| wait_timeout                   | 28800    |
+--------------------------------+----------+

show VARIABLES like '%max_allowed_packet%';
+--------------------------+------------+
| Variable_name            | Value      |
+--------------------------+------------+
| max_allowed_packet       | 67108864   |
| slave_max_allowed_packet | 1073741824 |
+--------------------------+------------+

System log during the query
dmesg -T | grep -i oom | tail
[Thu Aug 6 09:15:54 2020] [] ? oom_unkillable_task+0xcd/0x120
[Thu Aug 6 09:15:54 2020] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[Thu Aug 6 09:15:54 2020] systemd-journal invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
[Thu Aug 6 09:15:54 2020] [] oom_kill_process+0x254/0x3e0
[Thu Aug 6 09:15:54 2020] [] ? oom_unkillable_task+0xcd/0x120
[Thu Aug 6 09:15:54 2020] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[Thu Aug 6 09:15:54 2020] systemd-journal invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=0
[Thu Aug 6 09:15:54 2020] [] oom_kill_process+0x254/0x3e0
[Thu Aug 6 09:15:54 2020] [] ? oom_unkillable_task+0xcd/0x120
[Thu Aug 6 09:15:54 2020] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name

I just re-ran the statement while watching host memory in real time with free -g, and found that all memory gets eaten up. Is there any way to optimize this situation?
Memory before the query
              total        used        free      shared  buff/cache   available
Mem:             31           0          29           0           0          30
Swap:             1           0           1

Memory during the query
              total        used        free      shared  buff/cache   available
Mem:             31          28           2           0           0           2
Swap:             1           0           1

A moment later:
              total        used        free      shared  buff/cache   available
Mem:             31          29           0           0           0           0
Swap:             1           0           1

  1. Optimize the SQL.
  2. Set the mem-quota-query configuration parameter so that SQL statements that use too much memory are cancelled instead of running to completion.
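For point 1, a hedged sketch: the GROUP BY has to scan every row of cli, so a secondary index on r_build_id may let TiDB aggregate over an index scan instead of a full table scan, which usually lowers memory pressure. The index name idx_r_build_id is hypothetical; verify with EXPLAIN that the plan actually changes before relying on it.

```sql
-- Hypothetical index name; confirm the plan with EXPLAIN before and after.
ALTER TABLE cli ADD INDEX idx_r_build_id (r_build_id);

EXPLAIN SELECT r_build_id, COUNT(1)
FROM cli
WHERE r_build_id IS NOT NULL
GROUP BY r_build_id
HAVING COUNT(1) > 1;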

https://docs.pingcap.com/zh/tidb/stable/tidb-configuration-file#tmp-storage-path
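Putting point 2 together with the linked page, a minimal tidb.toml sketch (the values are illustrative assumptions, not tuned recommendations — check the configuration docs for the defaults in your exact v4.0.x release):

```toml
# tidb.toml (illustrative values)

# Per-query memory quota in bytes (1 GiB here); exceeding it triggers oom-action.
mem-quota-query = 1073741824

# "cancel" aborts a query that exceeds mem-quota-query instead of letting
# tidb-server grow until the kernel OOM killer steps in.
oom-action = "cancel"

# Allow memory-heavy operators to spill to disk when the quota is exceeded;
# the path below is a placeholder, point it at a disk with enough free space.
oom-use-tmp-storage = true
tmp-storage-path = "/tmp/tidb/tmp-storage"
```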