TiDB GC life time is shorter than transaction duration

为提高效率,请提供以下信息,问题描述清晰能够更快得到解决:

【概述】 场景 + 问题概述

【应用框架及开发适配业务逻辑】

   就简单的执行了一条SQl delete from xxx where xxx = ?; 

  TiDB程序直接报错:  
         {"log":"{\"level\":\"WARN\",\"time\":\"2021/07/04 15:48:47.986 +08:00\",\"caller\":\"session.go:1066\",\"message\":\"run statement failed\",\"conn\":348713,\"schemaVersion\":3402,\"error\":\"[tikv:9006]GC life time is shorter than transaction duration, transaction starts at 2021-07-04 14:48:57.043 +0800 CST, GC safe point is 2021-07-04 15:31:29.293 +0800 CST\",\"errorVerbose\":\"[tikv:9006]GC life time is shorter than transaction duration, transaction starts at 2021-07-04 14:48:57.043 +0800 CST, GC safe point is 2021-07-04 15:31:29.293 +0800 CST\\ngithub.com/pingcap/errors.AddStack\\n\\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20200917111840-a15ef68f753d/errors.go:174\\ngithub.com/pingcap/errors.(*Error).GenWithStackByArgs\\n\\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20200917111840-a15ef68f753d/terror_error.go:204\\ngithub.com/pingcap/tidb/store/tikv.(*tikvStore).CheckVisibility\\n\\t/go/src/github.com/pingcap/tidb/store/tikv/kv.go:190\\ngithub.com/pingcap/tidb/store/tikv.(*copIterator).Next\\n\\t/go/src/github.com/pingcap/tidb/store/tikv/coprocessor.go:671\\ngithub.com/pingcap/tidb/distsql.(*selectResult).fetchResp\\n\\t/go/src/github.com/pingcap/tidb/distsql/select_result.go:105\\ngithub.com/pingcap/tidb/distsql.(*selectResult).Next\\n\\t/go/src/github.com/pingcap/tidb/distsql/select_result.go:169\\ngithub.com/pingcap/tidb/executor.(*tableResultHandler).nextChunk\\n\\t/go/src/github.com/pingcap/tidb/executor/table_reader.go:285\\ngithub.com/pingcap/tidb/executor.(*TableReaderExecutor).Next\\n\\t/go/src/github.com/pingcap/tidb/executor/table_reader.go:174\\ngithub.com/pingcap/tidb/executor.Next\\n\\t/go/src/github.com/pingcap/tidb/executor/executor.go:253\\ngithub.com/pingcap/tidb/executor.(*DeleteExec).deleteSingleTableByChunk\\n\\t/go/src/github.com/pingcap/tidb/executor/delete.go:87\\ngithub.com/pingcap/tidb/executor.(*DeleteExec).Next\\n\\t/go/src/github.com/pingcap/tidb/executor/delete.go:48\\ngithub.com/pingcap/tidb/executor.Next\\n\\t/go/src/github.com/pingcap/tidb/executor/executor.go:253\\ngithub.com/pingcap/tidb/executor.(*ExecStmt).handleNoDelayExecutor\\n\\t/go/src/github.com/pingcap/tidb/executor/adapter.go:516\\ngithub.com/pingcap/tidb/executor.(*ExecStmt).handleNoDelay\\n\\t/go/src/github.com/pingcap/tidb/executor/adapter.go:398\\ngithub.com/pingcap/tidb/executor.(*ExecStmt).Exec\\n\\t/go/src/github.com/pingcap/tidb/executor/adapter.go:354\\ngithub.com/pingcap/tidb/session.runStmt\\n\\t/go/src/github.com/pingcap/tidb/session/tidb.go:286\\ngithub.com/pingcap/tidb/session.(*session).executeStatement\\n\\t/go/src/github.com/pingcap/tidb/session/session.go:1063\\ngithub.com/pingcap/tidb/session.(*session).execute\\n\\t/go/src/github.com/pingcap/tidb/session/session.go:1175\\ngithub.com/pingcap/tidb/session.(*session).Execute\\n\\t/go/src/github.com/pingcap/tidb/session/session.go:1106\\ngithub.com/pingcap/tidb/server.(*TiDBContext).Execute\\n\\t/go/src/github.com/pingcap/tidb/server/driver_tidb.go:248\\ngithub.com/pingcap/tidb/server.(*clientConn).handleQuery\\n\\t/go/src/github.com/pingcap/tidb/server/conn.go:1354\\ngithub.com/pingcap/tidb/server.(*clientConn).dispatch\\n\\t/go/src/github.com/pingcap/tidb/server/conn.go:985\\ngithub.com/pingcap/tidb/server.(*clientConn).Run\\n\\t/go/src/github.com/pingcap/tidb/server/conn.go:772\\ngithub.com/pingcap/tidb/server.(*Server).onConn\\n\\t/go/src/github.com/pingcap/tidb/server/server.go:421\\nruntime.goexit\\n\\t/usr/local/go/src/runtime/asm_amd64.s:1357\",\"session\":\"{\\n  \\\"currDBName\\\": \\\"tregistry\\\",\\n  \\\"id\\\": 348713,\\n  \\\"preparedStmtCount\\\": 17,\\n  \\\"status\\\": 2,\\n  \\\"strictMode\\\": true,\\n  \\\"user\\\": {\\n    \\\"Username\\\": \\\"root\\\",\\n    \\\"Hostname\\\": \\\"12.12.12.12\\\",\\n    \\\"CurrentUser\\\": false,\\n    \\\"AuthUsername\\\": \\\"root\\\",\\n    \\\"AuthHostname\\\": \\\"%\\\"\\n  }\\n}\"}\n","stream":"stdout","time":"2021-07-04T07:48:47.987156224Z"}

【背景】 做过哪些操作
Tikv gc life time 是10分钟

经过对TiDB 的日志分析发现每次在报这种 TiDB GC life time is shorter than transaction duration 错误之前,总会执行一条错误的sql (这个是业务方的错误)

{"log":"{"level":"INFO","time":"2021/07/04 14:48:57.080 +08:00","caller":"conn.go:787","message":"command dispatched failed","conn":348713,"connInfo":"id:348713, addr: 12.12.12.12:59366 status:10, collation:utf8_general_ci, user:root","command":"Prepare","status":"inTxn:0, autocommit:1","sql":"update table_xxx phxxx = ? where typexx = ? and name=? and repxxx_id =?","txn_mode":"PESSIMISTIC","err":"[parser:1064]You have an error in your SQL syntax; check the manual that corresponds to your TiDB version for the right syntax to use line 1 column 27 near \"= ? where type = ? and name=? and repository_id =?\" "}\n","stream":"stdout","time":"2021-07-04T06:48:57.080519247Z"}

`{"log":"{\"level\":\"INFO\",\"time\":\"2021/07/04 15:48:47.986 +08:00\",\"caller\":\"tidb.go:218\",\"message\":\"rollbackTxn for ddl/autocommit failed\"}\n","stream":"stdout","time":"2021-07-04T07:48:47.986802667Z"}`

但是 “TiDB GC life time is shorter than transaction duration” 这条错误日志中 transaction starts at 2021-07-04 14:48:57.043 +0800 CST 事务实际开始的时间是和上面这条update 格式错误的sql语句执行时间是对的上的, 并且这两条sql 是在同一个connection 中执行的。

并且TiDB 日志中还有这样的错误日志
“message”:“rollbackTxn for ddl/autocommit failed”

【现象】 业务和数据库现象

业务同一个connection在执行了错误格式的update 语句过了10分钟后,再去执行其他 delete 语句就会报错 "GC life time is shorter than transaction duration, transaction starts at 2021-07-04 14:48:57.043 +0800 CST, GC safe point is 2021-07-04 15:31:29.293 +0800 CST"

【问题】 当前遇到的问题

 现在怀疑在TiDB同一个connection中 执行一条错误格式的 update 这种 autocommit 的语句,导致事务提交不了,回滚也失败,会影响后面执行delete 这种语句的事务开始时间。

【业务影响】

 业务sql无法正常执行

【TiDB 版本】
v4.0.8
【附件】 相关日志及监控(https://metricstool.pingcap.com/)


若提问为性能优化、故障排查类问题,请下载脚本运行。终端输出的打印结果,请务必全选并复制粘贴上传。

1赞

可能和这个 PR https://github.com/pingcap/tidb/issues/24640 有关,先验证下,是否报错时间早于 gc 设置的 safe point 时间,如果是,尝试重启 tidb-server 看看能否解决。

让业务端把update 这条bug 语句修复了, 目前没有了GC life time 的错误,还在持续关注

看了下这个issue 的描述, TiDB报的错误是一样的, 但是具体的深入的原因可能还是有点区别

错误的SQL指的什么?是语法错误,还是什么?
这两个SQL在一个事物里吗?

同一个connection , 先执行一条 语法错误的update 语句,失败报错了 ! 过了一个小时再执行一条简单的delete 语句就会报这个错 “GC life time is shorter than transaction duration, transaction starts at 2021-07-04 14:48:57.043 +0800 CST, GC safe point is 2021-07-04 15:31:29.293 +0800 CST”。

业务代码中,这两条sql 不在同一个事务中。

现在业务方修掉 update 语法错bug 之后, 没有这个问题了。