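【TiDB environment】Production
【TiDB version】4.0.14
【Problem encountered】
The cluster is fairly large: about 160 TB of data and roughly 7.5 million / 3 regions. TiDB currently cannot obtain correct region routing information, so most reads and writes fail. Monitoring shows two kinds of errors, 9001 and 9005, with 9005 dominating. My initial suspicion was that PD was under too much scheduling pressure, so I adjusted the following parameters, but that did not solve the problem: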
tikv:
raftstore.hibernate-regions: true
raftstore.pd-heartbeat-tick-interval: 1m30s
raftstore.pd-store-heartbeat-tick-interval: 20s
Scheduling-related parameters in PD:
leader-schedule-limit
merge-schedule-limit
region-schedule-limit
replica-schedule-limit
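Because the external writer consumes messages from a message queue in order and writes them into TiDB, even a small number of errors blocks the entire write pipeline. Our current mitigation is to restart PD at regular intervals, which keeps queued data from piling up so long that it goes stale, but this is only a stopgap. So I would like to ask whether anyone has relevant experience or ideas for solving this. The error logs are attached below.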
9001:
[2022/06/06 17:19:05.900 +08:00] [WARN] [backoff.go:329] ["pdRPC backoffer.maxSleep 40000ms is exceeded, errors:\
region not found for key \"t\\x80\\x00\\x00\\x00\\x00\\x00\\x00\\xcf_r\\x81\\xa7T\\xbd\\xb3\\x8aR\\x17\" at 2022-06-06T17:19:02.483532549+08:00\
region not found for key \"t\\x80\\x00\\x00\\x00\\x00\\x00\\x00\\xcf_r\\x81\\xa7T\\xbd\\xb3\\x8aR\\x17\" at 2022-06-06T17:19:04.123025368+08:00\
region not found for key \"t\\x80\\x00\\x00\\x00\\x00\\x00\\x00\\xcf_r\\x81\\xa7T\\xbd\\xb3\\x8aR\\x17\" at 2022-06-06T17:19:05.900545063+08:00"]
[2022/06/06 17:19:05.900 +08:00] [WARN] [session.go:1384] ["run statement failed"] [conn=1413] [schemaVersion=412] [error="[tikv:9001]PD server timeout"] [session="{\
\"currDBName\": \"gifshow\",\
\"id\": 1413,\
\"status\": 1,\
\"strictMode\": true,\
\"txn\": \"433719110697222149\",\
\"user\": {\
\"Username\": \"pay_gateway_rw\",\
\"Hostname\": \"**.**.**.**\",\
\"CurrentUser\": false,\
\"AuthUsername\": \"pay_gateway_rw\",\
\"AuthHostname\": \"%\"\
}\
}"]
[2022/06/06 17:19:05.901 +08:00] [INFO] [conn.go:864] ["command dispatched failed"] [conn=1413] [connInfo="id:1413, addr:**.**.**.**:45704 status:1, collation:utf8_general_ci, user:pay_gateway_rw"] [command=Query] [status="inTxn:1, autocommit:0"] [sql="/* \
ktrace:CAISGhCZgICAoKaduwoY3xAggN61vZMwKKX6/7UOGhoQx4CAgLCFk7YKGPACILq3isKTMCig2ozPCiASKhdrc3BheS1jb3JlLWRhdGFidXMuUFJPRDIFa3NwYXk=\
trace_ctx:EgAyAA==\
*/
9005:
[2022/06/06 17:18:53.445 +08:00] [WARN] [backoff.go:329] ["regionMiss backoffer.maxSleep 40000ms is exceeded, errors:\
message:\"region 241388794 is missing\" region_not_found:<region_id:241388794 > at 2022-06-06T17:18:52.442196869+08:00\
message:\"region 241388794 is missing\" region_not_found:<region_id:241388794 > at 2022-06-06T17:18:52.94406682+08:00\
message:\"region 241388794 is missing\" region_not_found:<region_id:241388794 > at 2022-06-06T17:18:53.445932518+08:00"]
[2022/06/06 17:18:53.446 +08:00] [WARN] [session.go:1384] ["run statement failed"] [conn=567] [schemaVersion=412] [error="[tikv:9005]Region is unavailable"] [session="{\
\"currDBName\": \"gifshow\",\
\"id\": 567,\
\"status\": 1,\
\"strictMode\": true,\
\"txn\": \"433719107446637014\",\
\"user\": {\
\"Username\": \"pay_gateway_rw\",\
\"Hostname\": \"**.**.**.**\",\
\"CurrentUser\": false,\
\"AuthUsername\": \"pay_gateway_rw\",\
\"AuthHostname\": \"%\"\
}\
}"]
[2022/06/06 17:18:53.446 +08:00] [INFO] [conn.go:864] ["command dispatched failed"]
What I can confirm is that pd-ctl shows the corresponding region in a normal state.
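To cross-check the key from the 9001 log against what pd-ctl reports, it can help to decode the escaped key first. The sketch below assumes the usual TiDB record-key layout (`t`, an 8-byte table ID, `_r`, an 8-byte integer row handle, each integer stored big-endian with the sign bit flipped). The name `decode_record_key` is illustrative and not part of any TiDB tool.

```python
from typing import Tuple

SIGN_MASK = 0x8000000000000000  # sign bit flipped by the memcomparable int encoding


def _decode_int(raw: bytes) -> int:
    """Decode an 8-byte big-endian integer whose sign bit was flipped."""
    value = int.from_bytes(raw, "big") ^ SIGN_MASK
    # Reinterpret the unsigned result as a signed 64-bit integer.
    return value - (1 << 64) if value >= (1 << 63) else value


def decode_record_key(key: bytes) -> Tuple[int, int]:
    """Split a record key into (table_id, row_handle).

    Assumes an integer row handle; index keys and clustered-index
    keys use a different layout and are rejected here.
    """
    if len(key) != 19 or key[:1] != b"t" or key[9:11] != b"_r":
        raise ValueError("not an integer-handle record key")
    return _decode_int(key[1:9]), _decode_int(key[11:19])


if __name__ == "__main__":
    # Key copied from the "region not found for key" lines in the 9001 log.
    key_from_log = b"t\x80\x00\x00\x00\x00\x00\x00\xcf_r\x81\xa7T\xbd\xb3\x8aR\x17"
    table_id, handle = decode_record_key(key_from_log)
    print("table_id:", table_id, "row_handle:", handle)
```

For the key in the 9001 log this yields table ID 207 (0xcf), which narrows the failing lookups to a single table.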
Remainder of the first log:
[err="[tikv:9001]PD server timeout\
github.com/pingcap/errors.AddStack\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/errors.go:174\
github.com/pingcap/errors.Trace\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/juju_adaptor.go:15\
github.com/pingcap/tidb/store/tikv.(*RegionCache).loadRegion\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/tikv/region_cache.go:996\
github.com/pingcap/tidb/store/tikv.(*RegionCache).findRegionByKey\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/tikv/region_cache.go:575\
github.com/pingcap/tidb/store/tikv.(*RegionCache).LocateKey\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/tikv/region_cache.go:535\
github.com/pingcap/tidb/store/tikv.(*RegionCache).GroupKeysByRegion\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/tikv/region_cache.go:711\
github.com/pingcap/tidb/store/tikv.(*tikvSnapshot).batchGetKeysByRegions\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/tikv/snapshot.go:222\
github.com/pingcap/tidb/store/tikv.(*tikvSnapshot).BatchGet\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/tikv/snapshot.go:148\
github.com/pingcap/tidb/kv.(*BufferBatchGetter).BatchGet\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/kv/memdb_buffer.go:227\
github.com/pingcap/tidb/store/tikv.(*tikvTxn).BatchGet\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/tikv/txn.go:192\
github.com/pingcap/tidb/session.(*TxnState).BatchGet\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/session/txn.go:345\
github.com/pingcap/tidb/executor.prefetchUniqueIndices\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/executor/insert.go:136\
github.com/pingcap/tidb/executor.(*InsertValues).batchCheckAndInsert\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/executor/insert_common.go:1041\
github.com/pingcap/tidb/executor.(*InsertExec).exec\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/executor/insert.go:81\
github.com/pingcap/tidb/executor.insertRows\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/executor/insert_common.go:272\
github.com/pingcap/tidb/executor.(*InsertExec).Next\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/executor/insert.go:288\
github.com/pingcap/tidb/executor.Next\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/executor/executor.go:262\
github.com/pingcap/tidb/executor.(*ExecStmt).handleNoDelayExecutor\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/executor/adapter.go:531\
github.com/pingcap/tidb/executor.(*ExecStmt).handlePessimisticDML\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/executor/adapter.go:550\
github.com/pingcap/tidb/executor.(*ExecStmt).handleNoDelay\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/executor/adapter.go:411\
github.com/pingcap/tidb/executor.(*ExecStmt).Exec\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/executor/adapter.go:366\
github.com/pingcap/tidb/session.runStmt\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/session/tidb.go:322\
github.com/pingcap/tidb/session.(*session).ExecuteStmt\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/session/session.go:1381\
github.com/pingcap/tidb/server.(*TiDBContext).ExecuteStmt\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/driver_tidb.go:270\
github.com/pingcap/tidb/server.(*clientConn).handleStmt\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:1513\
github.com/pingcap/tidb/server.(*clientConn).handleQuery\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:1502\
github.com/pingcap/tidb/server.(*clientConn).dispatch\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:1080\
github.com/pingcap/tidb/server.(*clientConn).Run\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:849\
github.com/pingcap/tidb/server.(*Server).onConn\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/server.go:453\
runtime.goexit\
\t/usr/local/go/src/runtime/asm_amd64.s:1357"]
Remainder of the second log:
[err="[tikv:9005]Region is unavailable\
github.com/pingcap/errors.AddStack\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/errors.go:174\
github.com/pingcap/errors.Trace\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/juju_adaptor.go:15\
github.com/pingcap/tidb/store/tikv.(*tikvSnapshot).batchGetSingleRegion\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/tikv/snapshot.go:301\
github.com/pingcap/tidb/store/tikv.(*tikvSnapshot).batchGetKeysByRegions\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/tikv/snapshot.go:238\
github.com/pingcap/tidb/store/tikv.(*tikvSnapshot).batchGetSingleRegion\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/tikv/snapshot.go:303\
github.com/pingcap/tidb/store/tikv.(*tikvSnapshot).batchGetKeysByRegions\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/tikv/snapshot.go:238\
github.com/pingcap/tidb/store/tikv.(*tikvSnapshot).batchGetSingleRegion\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/tikv/snapshot.go:303\
github.com/pingcap/tidb/store/tikv.(*tikvSnapshot).batchGetKeysByRegions\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/tikv/snapshot.go:238\
github.com/pingcap/tidb/store/tikv.(*tikvSnapshot).batchGetSingleRegion\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/tikv/snapshot.go:303\
github.com/pingcap/tidb/store/tikv.(*tikvSnapshot).batchGetKeysByRegions\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/tikv/snapshot.go:238\
github.com/pingcap/tidb/store/tikv.(*tikvSnapshot).batchGetSingleRegion\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/tikv/snapshot.go:303\
github.com/pingcap/tidb/store/tikv.(*tikvSnapshot).batchGetKeysByRegions\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/tikv/snapshot.go:238\
github.com/pingcap/tidb/store/tikv.(*tikvSnapshot).batchGetSingleRegion\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/tikv/snapshot.go:303\
github.com/pingcap/tidb/store/tikv.(*tikvSnapshot).batchGetKeysByRegions\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/tikv/snapshot.go:238\
github.com/pingcap/tidb/store/tikv.(*tikvSnapshot).batchGetSingleRegion\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/tikv/snapshot.go:303\
github.com/pingcap/tidb/store/tikv.(*tikvSnapshot).batchGetKeysByRegions\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/tikv/snapshot.go:238\
github.com/pingcap/tidb/store/tikv.(*tikvSnapshot).batchGetSingleRegion\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/tikv/snapshot.go:303\
github.com/pingcap/tidb/store/tikv.(*tikvSnapshot).batchGetKeysByRegions\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/tikv/snapshot.go:238\
github.com/pingcap/tidb/store/tikv.(*tikvSnapshot).batchGetSingleRegion\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/tikv/snapshot.go:303\
github.com/pingcap/tidb/store/tikv.(*tikvSnapshot).batchGetKeysByRegions\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/tikv/snapshot.go:238\
github.com/pingcap/tidb/store/tikv.(*tikvSnapshot).batchGetSingleRegion\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/tikv/snapshot.go:303\
github.com/pingcap/tidb/store/tikv.(*tikvSnapshot).batchGetKeysByRegions\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/tikv/snapshot.go:238\
github.com/pingcap/tidb/store/tikv.(*tikvSnapshot).batchGetSingleRegion\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/tikv/snapshot.go:303\
github.com/pingcap/tidb/store/tikv.(*tikvSnapshot).batchGetKeysByRegions\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/tikv/snapshot.go:238\
github.com/pingcap/tidb/store/tikv.(*tikvSnapshot).batchGetSingleRegion\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/tikv/snapshot.go:303\
github.com/pingcap/tidb/store/tikv.(*tikvSnapshot).batchGetKeysByRegions\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/tikv/snapshot.go:238\
github.com/pingcap/tidb/store/tikv.(*tikvSnapshot).batchGetSingleRegion\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/tikv/snapshot.go:303\
github.com/pingcap/tidb/store/tikv.(*tikvSnapshot).batchGetKeysByRegions\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/tikv/snapshot.go:238\
github.com/pingcap/tidb/store/tikv.(*tikvSnapshot).batchGetSingleRegion\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/tikv/snapshot.go:303\
github.com/pingcap/tidb/store/tikv.(*tikvSnapshot).batchGetKeysByRegions\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/tikv/snapshot.go:238\
github.com/pingcap/tidb/store/tikv.(*tikvSnapshot).batchGetSingleRegion\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/tikv/snapshot.go:303\
github.com/pingcap/tidb/store/tikv.(*tikvSnapshot).batchGetKeysByRegions\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/tikv/snapshot.go:238"]
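
To quantify the 9001/9005 split outside of monitoring, the `run statement failed` lines in the tidb log can be tallied by error code. This is a minimal sketch that assumes the log format shown above; `count_tikv_errors` is a hypothetical helper name, and the log path is passed on the command line.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# Matches the error field of a "run statement failed" line,
# e.g. [error="[tikv:9005]Region is unavailable"].
ERROR_RE = re.compile(r'\[error="\[tikv:(\d+)\]')


def count_tikv_errors(lines: Iterable[str]) -> Counter:
    """Count occurrences of each tikv error code in tidb log lines."""
    counts: Counter = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Usage: python count_errors.py /path/to/tidb.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        for code, total in count_tikv_errors(log_file).most_common():
            print(code, total)
```

Running this per time window (for example, before and after a PD restart) would show whether the 9005 share changes once PD has been restarted.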