`connection running loop panic` when the application server queries the TiDB cluster

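【TiDB Environment】Production / Test
【TiDB Version】v7.1.0
【Reproduction Path】The problem is intermittent: sometimes it occurs, sometimes it does not
【Problem: Symptoms and Impact】When the application server queries the cluster database, TiDB logs a `connection running loop panic`
【Resource Configuration】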


【Attachments: Screenshots / Logs / Monitoring】
TiDB log:
[2023/09/23 11:00:56.044 +08:00] [ERROR] [conn.go:1072] ["connection running loop panic"] [conn=5574488768452667013] [lastSQL="select RES.ID_,\n RES.REV_,\n RES.DUEDATE_,\n RES.PROCESS_INSTANCE_ID_,\n RES.EXCLUSIVE_\n \n from ACT_RU_JOB RES\n\n where (RES.RETRIES_ > 0)\n and (\n \n RES.DUEDATE_ is null or\n \n RES.DUEDATE_ <= ?\n )\n and (RES.LOCK_OWNER_ is null or RES.LOCK_EXP_TIME_ < ?)\n and RES.SUSPENSION_STATE_ = 1\n\n \n\n \n \n\n and ( \n ( \n RES.EXCLUSIVE_ = 1\n and not exists(\n select J2.ID_ from ACT_RU_JOB J2\n where J2.PROCESS_INSTANCE_ID_ = RES.PROCESS_INSTANCE_ID_ -- from the same proc. inst.\n and (J2.EXCLUSIVE_ = 1) -- also exclusive\n and (J2.LOCK_OWNER_ is not null and J2.LOCK_EXP_TIME_ >= ?) -- in progress\n )\n )\n or\n \n RES.EXCLUSIVE_ = 0\n \n )\n\n \n\n \n LIMIT ? OFFSET ? [arguments: ("2023-09-23 11:00:56.042000", "2023-09-23 11:00:56.042000", "2023-09-23 11:00:56.042000", 3, 0)]"] [err="runtime error: invalid memory address or nil pointer dereference"] [stack="github.com/pingcap/tidb/server.(*clientConn).Run.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/conn.go:1075\nruntime.gopanic\n\t/usr/local/go/src/runtime/panic.go:884\nruntime.panicmem\n\t/usr/local/go/src/runtime/panic.go:260\nruntime.sigpanic\n\t/usr/local/go/src/runtime/signal_unix.go:837\ngithub.com/pingcap/tidb/planner/core.getJoinHints\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/planner/core/hints.go:128\ngithub.com/pingcap/tidb/planner/core.genHintsFromSingle\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/planner/core/hints.go:248\ngithub.com/pingcap/tidb/planner/core.GenHintsFromFlatPlan\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/planner/core/hints.go:47\ngithub.com/pingcap/tidb/executor.getEncodedPlan\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/executor/adapter.go:1700\ngithub.com/pingcap/tidb/executor.(*ExecStmt).SummaryStmt.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/executor/adapter.go:1757\ngithub.com/pingcap/tidb/util/stmtsummary.newStmtSummaryByDigestElement\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/util/stmtsummary/statement_summary.go:635\ngithub.com/pingcap/tidb/util/stmtsummary.(*stmtSummaryByDigest).add.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/util/stmtsummary/statement_summary.go:590\ngithub.com/pingcap/tidb/util/stmtsummary.(*stmtSummaryByDigest).add\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/util/stmtsummary/statement_summary.go:601\ngithub.com/pingcap/tidb/util/stmtsummary.(*stmtSummaryByDigestMap).AddStatement\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/util/stmtsummary/statement_summary.go:344\ngithub.com/pingcap/tidb/util/stmtsummary/v2.Add\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/util/stmtsummary/v2/stmtsummary.go:537\ngithub.com/pingcap/tidb/executor.(*ExecStmt).SummaryStmt\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/executor/adapter.go:1855\ngithub.com/pingcap/tidb/executor.(*ExecStmt).FinishExecuteStmt\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/executor/adapter.go:1342\ngithub.com/pingcap/tidb/executor.(*ExecStmt).CloseRecordSet\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/executor/adapter.go:1415\ngithub.com/pingcap/tidb/executor.(*recordSet).Close\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/executor/adapter.go:180\ngithub.com/pingcap/tidb/session.(*execStmtResult).Close\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/session/session.go:2425\ngithub.com/pingcap/tidb/server.(*tidbResultSet).Close\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/driver_tidb.go:446\ngithub.com/pingcap/tidb/parser/terror.Call\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/parser/terror/terror.go:315\ngithub.com/pingcap/tidb/server.(*clientConn).executePreparedStmtAndWriteResult\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/conn_stmt.go:357\ngithub.com/pingcap/tidb/server.(*clientConn).executePlanCacheStmt\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/conn_stmt.go:226\ngithub.com/pingcap/tidb/server.(*clientConn).handleStmtExecute\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/conn_stmt.go:218\ngithub.com/pingcap/tidb/server.(*clientConn).dispatch\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/conn.go:1399\ngithub.com/pingcap/tidb/server.(*clientConn).Run\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/conn.go:1153\ngithub.com/pingcap/tidb/server.(*Server).onConn\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/server.go:677"]
Application server log:
2023-06-18 12:53:32.007 [Druid-ConnectionPool-Create-180489140] [com.alibaba.druid.pool.DruidDataSource]
ERROR: create connection SQLException, url: jdbc:mysql://10.17.17.70:4000/ofs_xa?useSSL=false&useUnicode=true&characterEncoding=utf-8&zeroDateTimeBehavior=convertToNull&transformedBitIsBoolean=true&serverTimezone=GMT%2B8&nullCatalogMeansCurrent=true&allowPublicKeyRetrieval=true, errorCode 0, state 08S01
com.mysql.cj.jdbc.exceptions.CommunicationsException: Communications link failure

The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at com.mysql.cj.jdbc.exceptions.SQLError.createCommunicationsException(SQLError.java:174)
at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:64)
at com.mysql.cj.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:828)
at com.mysql.cj.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:448)
at com.mysql.cj.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:241)
at com.mysql.cj.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:198)
at com.alibaba.druid.filter.FilterChainImpl.connection_connect(FilterChainImpl.java:156)
at com.alibaba.druid.filter.stat.StatFilter.connection_connect(StatFilter.java:218)
at com.alibaba.druid.filter.FilterChainImpl.connection_connect(FilterChainImpl.java:150)
at com.alibaba.druid.pool.DruidAbstractDataSource.createPhysicalConnection(DruidAbstractDataSource.java:1646)
at com.alibaba.druid.pool.DruidAbstractDataSource.createPhysicalConnection(DruidAbstractDataSource.java:1710)
at com.alibaba.druid.pool.DruidDataSource$CreateConnectionThread.run(DruidDataSource.java:2777)
Caused by: com.mysql.cj.exceptions.CJCommunicationsException: Communications link failure

The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:61)
at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:105)
at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:151)
at com.mysql.cj.exceptions.ExceptionFactory.createCommunicationsException(ExceptionFactory.java:167)
at com.mysql.cj.protocol.a.NativeSocketConnection.connect(NativeSocketConnection.java:89)
at com.mysql.cj.NativeSession.connect(NativeSession.java:120)
at com.mysql.cj.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:948)
at com.mysql.cj.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:818)
... 9 common frames omitted
Caused by: java.net.ConnectException: Connection timed out (Connection timed out)
at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:412)
at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:255)
at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:237)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.base/java.net.Socket.connect(Socket.java:608)
at com.mysql.cj.protocol.StandardSocketFactory.connect(StandardSocketFactory.java:153)
at com.mysql.cj.protocol.a.NativeSocketConnection.connect(NativeSocketConnection.java:63)
... 12 common frames omitted

Database configuration:
tidb:
driver-class-name: com.mysql.cj.jdbc.Driver
url: jdbc:mysql://192.168.2.140:4000/test_ofs_xa?useSSL=false&useUnicode=true&characterEncoding=utf-8&zeroDateTimeBehavior=convertToNull&transformedBitIsBoolean=true&serverTimezone=GMT%2B8&nullCatalogMeansCurrent=true&allowPublicKeyRetrieval=true&useServerPrepStmts=true&cachePrepStmts=true&prepStmtCacheSize=100&prepStmtCacheSqlLimit=1024&rewriteBatchedStatements=true&allowMultiQueries=true
username: test_ofs_xa
password: kkMyei8Vb2Gvc34L
druid:
validation-query-timeout: 2000
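# Number of connections created in the pool when the application starts
initial-size: 50
# Minimum number of connections kept when idle connections are evicted
min-idle: 50
# Maximum number of active connections in the pool
max-active: 50
# Maximum wait (ms) when borrowing a connection from the pool; past this the request fails, i.e. no connection is available. -1 means wait indefinitely
max-wait: 60000
# Interval (ms) between idle-connection checks; a non-positive value disables the check (default: 1 minute)
time-between-eviction-runs-millis: 60000
# At the next idle check, connections that have been idle for at least this many ms are evicted (default: 5 minutes)
min-evictable-idle-time-millis: 300000
# Query used to check whether the data source is alive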
validation-query: select 1
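The TiDB stack trace above passes through `handleStmtExecute` and `executePlanCacheStmt`, i.e. the panic happened while TiDB was serving a binary-protocol prepared statement (COM_STMT_EXECUTE). With MySQL Connector/J that path is used when the URL sets `useServerPrepStmts=true`, as the configured URL does. A minimal sketch, assuming you only want to confirm which of these flags a given JDBC URL sets; the class and method names are hypothetical, and the parser is deliberately simplified (no percent-decoding, no multi-host URLs, and only the literal value `true` counts as enabled).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: extracts query parameters from a jdbc:mysql:// URL
// so you can see which Connector/J options a client is running with.
public class JdbcUrlFlags {

    // Parses "key=value" pairs after '?'. Simplified: no percent-decoding,
    // no multi-host URLs; keys without '=' map to an empty string.
    static Map<String, String> parseParams(String jdbcUrl) {
        Map<String, String> params = new LinkedHashMap<>();
        int q = jdbcUrl.indexOf('?');
        if (q < 0) {
            return params;
        }
        for (String pair : jdbcUrl.substring(q + 1).split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            if (eq < 0) {
                params.put(pair, "");
            } else {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    // True when the URL asks for server-side prepared statements,
    // i.e. statements are executed via COM_STMT_EXECUTE.
    static boolean usesServerPrepStmts(String jdbcUrl) {
        return "true".equalsIgnoreCase(parseParams(jdbcUrl).get("useServerPrepStmts"));
    }

    public static void main(String[] args) {
        String url = "jdbc:mysql://192.168.2.140:4000/test_ofs_xa?useSSL=false"
                + "&useServerPrepStmts=true&cachePrepStmts=true&prepStmtCacheSize=100";
        Map<String, String> p = parseParams(url);
        System.out.println("useServerPrepStmts = " + p.get("useServerPrepStmts"));
        System.out.println("cachePrepStmts     = " + p.get("cachePrepStmts"));
        System.out.println("server-side prepare: " + usesServerPrepStmts(url));
    }
}
```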

Looks like you've hit a bug 🤔 How about upgrading to v7.1.4 and seeing if it goes away?

Is this a TiDB bug? Sometimes this SQL runs fine, and sometimes the connection fails. It's also quite hard to reproduce.

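You're on v7.1.0, so upgrade to v7.1.4. The third number in a TiDB version is the patch version, so v7.1.4 carries four patch releases' worth of bug fixes. Your problem may well be a known issue.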

I have indeed seen similar problems in the community, e.g. "tidb 7.1.0 connection close error":

https://github.com/pingcap/tidb/issues/46791

It's most likely the bug above.

https://github.com/pingcap/tidb/pull/48642

It was merged into the 7.1 branch on 2023-12-04.

The minimum version to upgrade to is v7.1.3.
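The replies above reduce to a patch-level check: according to this thread, PR 48642 landed on the 7.1 branch on 2023-12-04, so v7.1.3 is the first 7.1.x release that should contain the fix. A minimal sketch of that check, assuming plain `vMAJOR.MINOR.PATCH` strings (no pre-release or build suffixes); the class and method names are hypothetical. It deliberately answers "unknown" for other release lines (e.g. 7.2 to 7.5), because a larger version number released before the merge date would not necessarily include the fix, and the thread does not cover those lines.

```java
import java.util.Optional;

// Hypothetical helper: decides whether a TiDB 7.1.x version is at or above
// the minimum patch release named in this thread (v7.1.3).
public class TidbPatchCheck {

    // Minimum 7.1 patch release named in this thread as containing the fix.
    static final int MIN_71_PATCH = 3;

    // Parses "v7.1.0" or "7.1.0" into {major, minor, patch}.
    static int[] parse(String version) {
        String v = version.startsWith("v") ? version.substring(1) : version;
        String[] parts = v.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected MAJOR.MINOR.PATCH: " + version);
        }
        return new int[] {
            Integer.parseInt(parts[0]), Integer.parseInt(parts[1]), Integer.parseInt(parts[2])
        };
    }

    // Optional.of(true/false) for v7.1.x versions; Optional.empty() for other
    // release lines, where this thread gives no information.
    static Optional<Boolean> hasFixOn71(String version) {
        int[] v = parse(version);
        if (v[0] != 7 || v[1] != 1) {
            return Optional.empty();
        }
        return Optional.of(v[2] >= MIN_71_PATCH);
    }

    public static void main(String[] args) {
        for (String v : new String[] {"v7.1.0", "v7.1.3", "v7.1.4", "v7.5.0"}) {
            String verdict = hasFixOn71(v)
                    .map(String::valueOf)
                    .orElse("unknown (check the release notes)");
            System.out.println(v + " -> " + verdict);
        }
    }
}
```

Returning `Optional` rather than a plain `boolean` keeps the "not covered by this thread" case distinct from "known not to include the fix".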

Upgrade, then.

Looks like upgrading is the only thing left to try.

But first check whether the problem is intermittent or happens every time; otherwise it will be hard to reproduce.