Querying INFORMATION_SCHEMA.TIFLASH_TABLES returns an error

The query:

mysql> select * from INFORMATION_SCHEMA.TIFLASH_TABLES  limit 1\G
ERROR 1105 (HY000): rpc error: code = DeadlineExceeded desc = context deadline exceeded

The detailed error log is as follows:

[2024/11/28 15:37:00.441 +08:00] [INFO] [conn.go:1131] ["command dispatched failed"] [conn=851670586] [session_alias=] [connInfo="id:851670586, addr:172.18.251.224:43088 status:10, collation:utf8mb4_0900_ai_ci, user:root"] [command=Query] [status="inTxn:0, autocommit:1"] [sql="select * from INFORMATION_SCHEMA.TIFLASH_TABLES  limit 1"] [txn_mode=PESSIMISTIC] [timestamp=0] [err="rpc error: code = DeadlineExceeded desc = context deadline exceeded\ngithub.com/tikv/client-go/v2/tikvrpc.CallRPC\n\t/root/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.8-0.20240920100427-3725b31fa3c0/tikvrpc/tikvrpc.go:1095\ngithub.com/tikv/client-go/v2/internal/client.(*RPCClient).sendRequest\n\t/root/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.8-0.20240920100427-3725b31fa3c0/internal/client/client.go:654\ngithub.com/tikv/client-go/v2/internal/client.(*RPCClient).SendRequest\n\t/root/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.8-0.20240920100427-3725b31fa3c0/internal/client/client.go:669\ngithub.com/pingcap/tidb/pkg/store/driver.(*injectTraceClient).SendRequest\n\t/workspace/source/tidb/pkg/store/driver/tikv_driver.go:431\ngithub.com/tikv/client-go/v2/internal/client.interceptedClient.SendRequest\n\t/root/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.8-0.20240920100427-3725b31fa3c0/internal/client/client_interceptor.go:60\ngithub.com/tikv/client-go/v2/internal/client.reqCollapse.SendRequest\n\t/root/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.8-0.20240920100427-3725b31fa3c0/internal/client/client_collapse.go:74\ngithub.com/pingcap/tidb/pkg/executor.(*TiFlashSystemTableRetriever).dataForTiFlashSystemTables\n\t/workspace/source/tidb/pkg/executor/infoschema_reader.go:3083\ngithub.com/pingcap/tidb/pkg/executor.(*TiFlashSystemTableRetriever).retrieve\n\t/workspace/source/tidb/pkg/executor/infoschema_reader.go:3008\ngithub.com/pingcap/tidb/pkg/executor.(*MemTableReaderExec).Next\n\t/workspace/source/tidb/pkg/executor/memtable_reader.go:119\ngithub.com/pingcap/tidb/pkg/executor/internal/exec.Next\n\t/workspace/source/tidb/pkg/executor/internal/exec/executor.go:283\ngithub.com/pingcap/tidb/pkg/executor.(*LimitExec).Next\n\t/workspace/source/tidb/pkg/executor/executor.go:1369\ngithub.com/pingcap/tidb/pkg/executor/internal/exec.Next\n\t/workspace/source/tidb/pkg/executor/internal/exec/executor.go:283\ngithub.com/pingcap/tidb/pkg/executor.(*ExecStmt).next\n\t/workspace/source/tidb/pkg/executor/adapter.go:1216\ngithub.com/pingcap/tidb/pkg/executor.(*recordSet).Next\n\t/workspace/source/tidb/pkg/executor/adapter.go:156\ngithub.com/pingcap/tidb/pkg/server/internal/resultset.(*tidbResultSet).Next\n\t/workspace/source/tidb/pkg/server/internal/resultset/resultset.go:62\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).writeChunks\n\t/workspace/source/tidb/pkg/server/conn.go:2290\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).writeResultSet\n\t/workspace/source/tidb/pkg/server/conn.go:2233\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).handleStmt\n\t/workspace/source/tidb/pkg/server/conn.go:2101\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).handleQuery\n\t/workspace/source/tidb/pkg/server/conn.go:1838\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).dispatch\n\t/workspace/source/tidb/pkg/server/conn.go:1325\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).Run\n\t/workspace/source/tidb/pkg/server/conn.go:1098\ngithub.com/pingcap/tidb/pkg/server.(*Server).onConn\n\t/workspace/source/tidb/pkg/server/server.go:737\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650\ngithub.com/pingcap/errors.AddStack\n\t/root/go/pk
g/mod/github.com/pingcap/errors@v0.11.5-0.20240318064555-6bd07397691f/errors.go:178\ngithub.com/pingcap/errors.Trace\n\t/root/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20240318064555-6bd07397691f/juju_adaptor.go:15\ngithub.com/pingcap/tidb/pkg/executor.(*TiFlashSystemTableRetriever).dataForTiFlashSystemTables\n\t/workspace/source/tidb/pkg/executor/infoschema_reader.go:3085\ngithub.com/pingcap/tidb/pkg/executor.(*TiFlashSystemTableRetriever).retrieve\n\t/workspace/source/tidb/pkg/executor/infoschema_reader.go:3008\ngithub.com/pingcap/tidb/pkg/executor.(*MemTableReaderExec).Next\n\t/workspace/source/tidb/pkg/executor/memtable_reader.go:119\ngithub.com/pingcap/tidb/pkg/executor/internal/exec.Next\n\t/workspace/source/tidb/pkg/executor/internal/exec/executor.go:283\ngithub.com/pingcap/tidb/pkg/executor.(*LimitExec).Next\n\t/workspace/source/tidb/pkg/executor/executor.go:1369\ngithub.com/pingcap/tidb/pkg/executor/internal/exec.Next\n\t/workspace/source/tidb/pkg/executor/internal/exec/executor.go:283\ngithub.com/pingcap/tidb/pkg/executor.(*ExecStmt).next\n\t/workspace/source/tidb/pkg/executor/adapter.go:1216\ngithub.com/pingcap/tidb/pkg/executor.(*recordSet).Next\n\t/workspace/source/tidb/pkg/executor/adapter.go:156\ngithub.com/pingcap/tidb/pkg/server/internal/resultset.(*tidbResultSet).Next\n\t/workspace/source/tidb/pkg/server/internal/resultset/resultset.go:62\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).writeChunks\n\t/workspace/source/tidb/pkg/server/conn.go:2290\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).writeResultSet\n\t/workspace/source/tidb/pkg/server/conn.go:2233\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).handleStmt\n\t/workspace/source/tidb/pkg/server/conn.go:2101\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).handleQuery\n\t/workspace/source/tidb/pkg/server/conn.go:1838\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).dispatch\n\t/workspace/source/tidb/pkg/server/conn.go:1325\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).Run\n\t/workspace/source/tidb/pkg/server/conn.go:1098\ngithub.com/pingcap/tidb/pkg/server.(*Server).onConn\n\t/workspace/source/tidb/pkg/server/server.go:737\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650"]

Is your TiFlash working normally?

The TiFlash nodes are fine, but I'm using the TiFlash disaggregated storage and compute architecture, so I'm not sure whether that's related.

From the error, it looks like the request to the TiFlash interface timed out. When there are many tables, the TiFlash interface takes longer to respond. You could try increasing max_execution_time, for example as sketched below.
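
A minimal sketch of that suggestion, assuming you only want to raise the limit for the current session (in TiDB, max_execution_time is in milliseconds and 0 means no limit; the 5-minute value here is just an illustrative choice):

mysql> SET @@session.max_execution_time = 300000;  -- 300000 ms = 5 minutes
mysql> select * from INFORMATION_SCHEMA.TIFLASH_TABLES limit 1\G

As reported further down in the thread, this does not help in this particular case, because the failure comes from a hardcoded per-RPC deadline rather than the statement-level timeout.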

It doesn't seem to be that: the statement returns the error after only about 1-2 seconds. There are no errors on the TiFlash write nodes either.

I tried a disaggregated setup: even with the compute node down, this system table can still be queried, so the problem is probably unrelated to the disaggregated architecture.

When TiDB sends requests to a single TiFlash node, each RPC has a 1-second timeout. Having many tables may be what causes the timeout.

How many tables in this cluster have TiFlash replicas? Can you run a count(*) on TIFLASH_REPLICA to check the number?

select count(*) from INFORMATION_SCHEMA.TIFLASH_REPLICA;

In total there are 4,892 partitions/tables, the total data volume is around 13 TB × 2, and the Region count is around 788k × 2.

mysql> select count(*) from INFORMATION_SCHEMA.TIFLASH_REPLICA;
+----------+
| count(*) |
+----------+
|     3090 |
+----------+
1 row in set (0.04 sec)

mysql> select count(*) from INFORMATION_SCHEMA.TIFLASH_REPLICA t join INFORMATION_SCHEMA.partitions p on t.table_schema = p.table_schema and t.table_name= p.table_name;
+----------+
| count(*) |
+----------+
|     4892 |
+----------+
1 row in set (0.30 sec)


Confirmed: it fails after waiting about 1 second, and adjusting max_execution_time doesn't help.


Orz, I'd suggest bumping that timeout to 5 minutes :rofl:

https://github.com/pingcap/tidb/issues/57816

I simulated the non-disaggregated case with 4,900 tables: there was no timeout, and it took about 0.2s.

The likely cause is that, under the disaggregated storage and compute architecture, querying this system table involves reading some metadata from S3, which pushes the request time past the 1-second timeout hardcoded in the code. A later patch release will increase the query timeout for this system table.
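
Until a patched version is available, one workaround that may be worth trying (an assumption based on the filterable columns of this system table, not something confirmed in this thread) is to narrow the query using TIFLASH_INSTANCE, TIDB_DATABASE, and TIDB_TABLE, so each TiFlash node has less metadata to return per request. The instance address and schema/table names below are placeholders:

mysql> select * from INFORMATION_SCHEMA.TIFLASH_TABLES
    ->  where TIFLASH_INSTANCE = '<tiflash-write-node-address>'  -- placeholder: a single TiFlash instance
    ->    and TIDB_DATABASE = 'test' and TIDB_TABLE = 't1'       -- placeholder: a single schema and table
    ->  limit 1\G

Whether these filters reduce the per-RPC work enough to stay under the hardcoded 1-second deadline on a disaggregated cluster is not verified here.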
