每次查询sql时候会引起ERROR 2013 (HY000): Lost connection to MySQL server during query

为提高效率,请提供以下信息,问题描述清晰能够更快得到解决:
【 TiDB 使用环境】

linux
【概述】 场景 + 问题概述
查询一条sql,出现了ERROR 2013 (HY000): Lost connection to MySQL server during query
观看错误日志,有如下信息

尝试按照这篇解决,并没有起作用

【背景】 做过哪些操作
sql语句如下,单独执行select可以成功,目前尚不知如何解决
insert ignore into (cid,period)
select cid,date_format(
,‘%Y-%m-01’) as period
from ***
where >=date_format(current_date,‘%Y-%m-01’) - interval ‘3’ month
and record_type in ('
‘,’
‘,’‘,’‘) and order_type in (’‘,’‘,’*')
group by 1,2
having (cid,period) not in (select cid,period from *** group by 1,2)

【现象】 业务和数据库现象

【问题】 当前遇到的问题

【业务影响】
某些sql有问题
【TiDB 版本】
v4.0.0
【应用软件及版本】

【附件】 相关日志及配置信息

  • TiUP Cluster Display 信息
  • TiUP CLuster Edit config 信息

监控(https://metricstool.pingcap.com/)

  • TiDB-Overview Grafana监控
  • TiDB Grafana 监控
  • TiKV Grafana 监控
  • PD Grafana 监控
  • 对应模块日志(包含问题前后 1 小时日志)

若提问为性能优化、故障排查类问题,请下载脚本运行。终端输出的打印结果,请务必全选并复制粘贴上传。

1 个赞

建议检查下,在执行上述 SQL 期间,TiDB Server 的运行状态,比如是否出现过异常重启,OOM ~

没有,这点我确定,每一次执行这个sql不出两秒就会报这个,已经多次执行,并查看了tidb的运行状态,没有问题,暂时不清楚什么引起的,我把group by 替换为distinct就会查出对应结果,很奇怪

1、原始 SQL 运行期间的 TiDB Server 的 log,stderr.log

2、原始 SQL 执行期间 tidb server 的 grafana 的监控,不用太久,包括运行时间点十几分钟就好

3、去掉 insert 后,尝试看下执行计划:
explain analyze select cid,date_format(,’%Y-%m-01’) as period
from ***
where >=date_format(current_date,’%Y-%m-01’) - interval ‘3’ month
and record_type in (’’,’’,’’,’’) and order_type in (’’,’’,’*’)
group by 1,2
having (cid,period) not in (select cid,period from *** group by 1,2)

4、新 SQL group by 替换为 distinct 后的执行计划也请提供下

distinct的执行计划

group by 的执行计划

错误日志我需要处理下,稍等

错误日志
tmp.log (385.9 KB)
辛苦大佬看下

好的,收到,这边先看下,有进展会跟帖回复 ~

好的,期待回复

这边看到在 tidb.log 中有下述的报错信息:

[2021/08/09 08:11:39.161 +00:00] [ERROR] [conn.go:660] ["connection running loop panic"] [conn=10076154] [lastSQL"] [err="runtime error: index out of range [2] with length 2"] [stack="goroutine 27048478378 [running]:\
github.com/pingcap/tidb/server.(*clientConn).Run.func1(0x36c5420, 0xc017533350, 0xc007f86000)\
\t/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/server/conn.go:658 +0xee\
panic(0x2fa9060, 0xc00991e1e0)\
\t/usr/local/go/src/runtime/panic.go:679 +0x1b2\
github.com/pingcap/tidb/server.(*clientConn).writeResultset.func1(0x0, 0x36dfe80, 0xc01ba91310, 0xc00b1b3238, 0x36c5420, 0xc017533350, 0xc007f86000)\
\t/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/server/conn.go:1358 +0x597\
panic(0x2fa9060, 0xc00991e1e0)\
\t/usr/local/go/src/runtime/panic.go:679 +0x1b2\
github.com/pingcap/tidb/executor.colNames2ResultFields(0xc01c938270, 0xc01ba6e210, 0x2, 0x2, 0xc01b031480, 0x9, 0x20, 0xc01e6b4fa0, 0x2872636)\
\t/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/executor/adapter.go:81 +0x64f\
github.com/pingcap/tidb/executor.(*recordSet).Fields(0xc01ba912c0, 0x36c5420, 0xc017533350, 0xc01ba912c0)\
\t/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/executor/adapter.go:72 +0xd7\
github.com/pingcap/tidb/server.(*tidbResultSet).Columns(0xc01ba91310, 0x36c5420, 0xc017533350, 0xc01ba91360)\
\t/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/server/driver_tidb.go:428 +0xa8\
github.com/pingcap/tidb/server.(*clientConn).writeChunks(0xc007f86000, 0x36c5420, 0xc017533350, 0x36dfe80, 0xc01ba91310, 0x7200, 0x2f89940, 0xc01ba91310)\
\t/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/server/conn.go:1412 +0x454\
github.com/pingcap/tidb/server.(*clientConn).writeResultset(0xc007f86000, 0x36c5420, 0xc017533350, 0x36dfe80, 0xc01ba91310, 0xc00000f100, 0x0, 0x0, 0x0)\
\t/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/server/conn.go:1371 +0x206\
github.com/pingcap/tidb/server.(*clientConn).handleQuery(0xc007f86000, 0x36c5420, 0xc017533350, 0xc01a6e6341, 0x19a, 0x0, 0x0)\
\t/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/server/conn.go:1279 +0x209\
github.com/pingcap/tidb/server.(*clientConn).dispatch(0xc007f86000, 0x36c5420, 0xc017533350, 0xc01a6e6341, 0x19b, 0x19a, 0x0, 0x0)\
\t/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/server/conn.go:899 +0x5e2\
github.com/pingcap/tidb/server.(*clientConn).Run(0xc007f86000, 0x36c5420, 0xc017533350)\
\t/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/server/conn.go:713 +0x27c\
github.com/pingcap/tidb/server.(*Server).onConn(0xc001026000, 0xc007f86000)\
\t/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/server/server.go:415 +0xb12\
created by github.com/pingcap/tidb/server.(*Server).Run\
\t/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/server/server.go:333 +0x709\
"]

请确认下执行下述 SQL 时间点前后,是否有出现过上面的报错信息:

explain analyze select cid,date_format(,’%Y-%m-01’) as period
from ***
where >=date_format(current_date,’%Y-%m-01’) - interval ‘3’ month
and record_type in (’’,’’,’’,’’) and order_type in (’’,’’,’*’)
group by 1,2
having (cid,period) not in (select cid,period from *** group by 1,2)

另外,如果方便,请提供下 select 查询的目标表的表结构 ~

嗯,是有这个错误,每次查询都会有这个错误,查询别的语句就没有事,表结构现在不太方便提供,请问下报这个错误的原因大概都有什么呢

重新梳理下信息:

1、这个 sql 能执行成功,但是会返回下面的报错

  • 在 client 端同时返回 ERROR 2013 (HY000): Lost connection to MySQL server during query 的报错

  • tidb server 的日志同时显示下面的报错

    [2021/08/09 08:11:39.161 +00:00] [ERROR] [conn.go:660] ["connection running loop panic"] [conn=10076154] [lastSQL"] [err="runtime error: index out of range [2] with length 2"] [stack="goroutine 27048478378 [running]:\
    

github.com/pingcap/tidb/server.(*clientConn).Run.func1(0x36c5420, 0xc017533350, 0xc007f86000)
\t/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/server/conn.go:658 +0xee
panic(0x2fa9060, 0xc00991e1e0)
\t/usr/local/go/src/runtime/panic.go:679 +0x1b2
github.com/pingcap/tidb/server.(*clientConn).writeResultset.func1(0x0, 0x36dfe80, 0xc01ba91310, 0xc00b1b3238, 0x36c5420, 0xc017533350, 0xc007f86000)
\t/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/server/conn.go:1358 +0x597
panic(0x2fa9060, 0xc00991e1e0)
\t/usr/local/go/src/runtime/panic.go:679 +0x1b2
github.com/pingcap/tidb/executor.colNames2ResultFields(0xc01c938270, 0xc01ba6e210, 0x2, 0x2, 0xc01b031480, 0x9, 0x20, 0xc01e6b4fa0, 0x2872636)
\t/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/executor/adapter.go:81 +0x64f
github.com/pingcap/tidb/executor.(*recordSet).Fields(0xc01ba912c0, 0x36c5420, 0xc017533350, 0xc01ba912c0)
\t/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/executor/adapter.go:72 +0xd7
github.com/pingcap/tidb/server.(*tidbResultSet).Columns(0xc01ba91310, 0x36c5420, 0xc017533350, 0xc01ba91360)
\t/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/server/driver_tidb.go:428 +0xa8
github.com/pingcap/tidb/server.(*clientConn).writeChunks(0xc007f86000, 0x36c5420, 0xc017533350, 0x36dfe80, 0xc01ba91310, 0x7200, 0x2f89940, 0xc01ba91310)
\t/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/server/conn.go:1412 +0x454
github.com/pingcap/tidb/server.(*clientConn).writeResultset(0xc007f86000, 0x36c5420, 0xc017533350, 0x36dfe80, 0xc01ba91310, 0xc00000f100, 0x0, 0x0, 0x0)
\t/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/server/conn.go:1371 +0x206
github.com/pingcap/tidb/server.(*clientConn).handleQuery(0xc007f86000, 0x36c5420, 0xc017533350, 0xc01a6e6341, 0x19a, 0x0, 0x0)
\t/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/server/conn.go:1279 +0x209
github.com/pingcap/tidb/server.(*clientConn).dispatch(0xc007f86000, 0x36c5420, 0xc017533350, 0xc01a6e6341, 0x19b, 0x19a, 0x0, 0x0)
\t/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/server/conn.go:899 +0x5e2
github.com/pingcap/tidb/server.(*clientConn).Run(0xc007f86000, 0x36c5420, 0xc017533350)
\t/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/server/conn.go:713 +0x27c
github.com/pingcap/tidb/server.(*Server).onConn(0xc001026000, 0xc007f86000)
\t/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/server/server.go:415 +0xb12
created by github.com/pingcap/tidb/server.(*Server).Run
\t/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/server/server.go:333 +0x709
"]
```
2、看上面的信息当前集群版本为 v4.0.0,属于比较老的版本。如果不方便提供表结构,统计信息,完整的 sql 执行计划,那么不好确认是已修复的问题还是未修复的新问题,你那边,可以将测试环境升级到最新的 v4.0.14 ,观察下问题是否解决

sql是执行不成功的,2s之内客户端就报错了,应该是没有返回信息的,稍后我看下别的版本是否有样的报错,然后给个反馈,我估计是group by 所涉及到的算法的问题,因为distinct是没有报错的

好的,辛苦将 v4.0 的新版本测试验证后的结果上传下,这边再一起看下 ~

看样子像是oom了, 执行计划没有 tidb 没有下推到 tikv,

你到相关的节点 dmesg -T |grep oom

如果影响业务可以考虑 oom_action : cancel

好的,稍等我测试下,感觉不太像,不可能执行一条语句都oom,之前也看了监控,没有oom的现象,我可能得明天才能弄,这两天需要处理之前环境的问题

试了下4.0.14没有问题

这个没有发现oom,而且我运行很快就出现了报错,我觉得oom也要一段时间才会吧

好的,收到,我这里尝试找下修复该问题的 github issue,另外,也请你那里评估下集群升级的可行性 ~

好的,我这头暂时采用distinct来绕过,然后和开发一起评估下

此话题已在最后回复的 1 分钟后被自动关闭。不再允许新回复。