After scaling out a TiKV node, Regions are unevenly distributed and store scores differ greatly

Version: v5.0.3

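Problem:
After scaling out the TiDB cluster with one new TiKV node, only one of the existing nodes balanced data with the new node; the Region counts on the other nodes did not change. After balancing finished, the scores of the 6 TiKV nodes differ greatly.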
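To put a number on "the scores differ greatly", here is a minimal sketch that summarises the spread of per-store scores. The function name and the input values are hypothetical; the REGION_SCORE column of information_schema.tikv_store_status is one place such numbers could be read from, assuming your version exposes it.

```python
from statistics import mean, pstdev


def score_spread(scores):
    """Summarise how uneven a set of per-store scores is.

    `scores` maps a store label to its region score. The values are
    hypothetical inputs (e.g. copied by hand from REGION_SCORE).
    """
    values = list(scores.values())
    avg = mean(values)
    return {
        "min": min(values),
        "max": max(values),
        # 1.0 means every store has exactly the same score
        "max_over_min": max(values) / min(values),
        # coefficient of variation: population std-dev relative to the mean
        "cv": pstdev(values) / avg,
    }


if __name__ == "__main__":
    # Made-up scores for six stores, not data from this thread.
    demo = {"s1": 100.0, "s2": 100.0, "s3": 100.0,
            "s4": 60.0, "s5": 140.0, "s6": 100.0}
    print(score_spread(demo))
```

A max/min ratio close to 1.0 and a small coefficient of variation mean the stores are roughly even; the ratio is easy to eyeball, the CV is less sensitive to a single outlier.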

How should this be handled?


Leader balancing, however, is normal.

Are the data disks the same size?

Check with: select store_id,address,leader_weight,region_weight from information_schema.tikv_store_status;
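With many stores it is easier to let a script spot the odd one out. A minimal sketch, assuming the query result is available as a list of (store_id, address, leader_weight, region_weight) tuples, e.g. from a DB-API cursor's fetchall(); the function name and the sample rows are hypothetical.

```python
from collections import Counter


def stores_with_odd_weights(rows):
    """Return store_ids whose (leader_weight, region_weight) pair differs
    from the pair most stores use.

    `rows` is assumed to be a list of
    (store_id, address, leader_weight, region_weight) tuples.
    """
    pairs = Counter((lw, rw) for _sid, _addr, lw, rw in rows)
    if not pairs:
        return []
    common, _count = pairs.most_common(1)[0]
    return [sid for sid, _addr, lw, rw in rows if (lw, rw) != common]


if __name__ == "__main__":
    # Hypothetical rows; addresses are placeholders.
    rows = [
        (1, "tikv-1:20160", 1.0, 1.0),
        (2, "tikv-2:20160", 1.0, 1.0),
        (3, "tikv-3:20160", 1.0, 0.5),
    ]
    print(stores_with_odd_weights(rows))  # [3]
```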

leader_weight and region_weight are set to the same values on all stores.

The data disks are the same size.

What about the available (free) space?

[screenshot] The available space differs between stores.
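Why would equal disk sizes but different free space matter? As I understand it, PD's region score takes available space into account as well as Region size (PD has high-space-ratio and low-space-ratio settings for this). The sketch below is a deliberately simplified toy model, NOT PD's actual formula; it only illustrates the direction of the effect: for the same amount of data, a fuller disk gets a higher score. The function name and parameters are made up; only the 0.7 default echoes PD's high-space-ratio.

```python
def toy_region_score(region_size_mib, region_weight, capacity_gib,
                     available_gib, high_space_ratio=0.7):
    """Toy model only; this is NOT PD's real region-score formula.

    While disk usage stays at or below `high_space_ratio`, the score is
    just the weighted Region size. Once usage crosses it, the score is
    inflated as free space shrinks, so the fuller store looks "heavier".
    """
    base = region_size_mib / region_weight
    if available_gib <= 0:
        return float("inf")
    free_ratio = available_gib / capacity_gib
    min_free = 1 - high_space_ratio
    if free_ratio >= min_free:
        return base
    # Factor is 1.0 at the threshold and grows without bound as free -> 0.
    return base * (min_free / free_ratio)


if __name__ == "__main__":
    # Same data and same disk size, different free space (hypothetical).
    print(toy_region_score(100_000, 1.0, 1000, 500))
    print(toy_region_score(100_000, 1.0, 1000, 150))
```

In this toy, the second store scores about twice as high as the first even though both hold the same data, which is the kind of gap that makes PD keep moving Regions off the fuller store.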

Looking at the Region chart, the Region counts of the bottom 2 TiKV nodes are slowly trending upward, while those of the top 3 are trending downward.
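Reading "trending up" or "trending down" off a chart can be made a bit more objective with a least-squares slope over sampled Region counts. A minimal sketch; the function names and the idea of sampling counts at even intervals (e.g. from the Grafana panel) are assumptions for illustration.

```python
def slope(samples):
    """Least-squares slope of evenly spaced samples (change per sample)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    x_mean = (n - 1) / 2
    y_mean = sum(samples) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(samples))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def classify(series_by_store, tolerance=0.0):
    """Label each store's Region-count series as "up", "down" or "flat"."""
    labels = {}
    for store, samples in series_by_store.items():
        s = slope(samples)
        if s > tolerance:
            labels[store] = "up"
        elif s < -tolerance:
            labels[store] = "down"
        else:
            labels[store] = "flat"
    return labels
```

A non-zero `tolerance` keeps small noise from being reported as a trend.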

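Scheduling now looks normal, but around 12:37 at noon the cluster was unavailable for about 5 minutes: the application could not connect (the specific error was "MySQL server has gone away"), QPS dropped to the floor, and cluster latency reached 5 min.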

[screenshot]

TiDB node log:

[2022/08/19 12:37:56.735 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.033386751s] [conn_id=168261983] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.735 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.02839721s] [conn_id=168261987] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.736 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.092538948s] [conn_id=168261935] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.736 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.051988913s] [conn_id=168261963] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.832 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.054041672s] [conn_id=168262029] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.832 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.003240048s] [conn_id=168262071] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.832 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.074404667s] [conn_id=168262019] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.832 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.013422469s] [conn_id=168262065] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.833 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.035368856s] [conn_id=168262041] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.833 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.036402935s] [conn_id=168262039] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.833 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.08437904s] [conn_id=168262011] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.833 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.030256464s] [conn_id=168262047] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.833 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.084899383s] [conn_id=168262013] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.834 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.06933023s] [conn_id=168262023] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.834 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.029930219s] [conn_id=168262051] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.834 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.081400257s] [conn_id=168262015] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.834 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.101386805s] [conn_id=168262005] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.835 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.048546961s] [conn_id=168262033] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.835 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.047254189s] [conn_id=168262037] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.835 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.06448464s] [conn_id=168262027] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.835 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.048679756s] [conn_id=168262035] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.835 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.07375419s] [conn_id=168262021] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.836 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.06537717s] [conn_id=168262025] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.836 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.088627056s] [conn_id=168262009] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.836 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.056906712s] [conn_id=168262031] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.836 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.028678417s] [conn_id=168262053] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.836 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.021358943s] [conn_id=168262059] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.836 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.095096163s] [conn_id=168262007] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.836 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.032947036s] [conn_id=168262049] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.837 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.025771198s] [conn_id=168262055] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.837 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.036994875s] [conn_id=168262043] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.837 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.024306845s] [conn_id=168262057] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.838 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.037391433s] [conn_id=168262045] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.838 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.084064876s] [conn_id=168262017] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.838 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.018369699s] [conn_id=168262067] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]

[2022/08/19 12:38:35.754 +08:00] [INFO] [conn.go:812] ["command dispatched failed"] [conn=168230619] [connInfo="id:168230619, addr:10.30.219.227:34772 status:10, collation:latin1_swedish_ci, user:srv_t1111l"] [command=Query] [status="inTxn:0, autocommit:1"] [sql="insert into bill_xxxxxx (user_id, hfsm_id, fsm_id, state_num, detail, c_date) values (941526237,10068300,1,3,'',from_unixtime(1660883758));"] [txn_mode=PESSIMISTIC] [err="write tcp 10.30.128.28:4000->10.30.219.227:34772: write: broken pipe
github.com/pingcap/errors.AddStack
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/errors.go:174
github.com/pingcap/errors.Trace
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/juju_adaptor.go:15
github.com/pingcap/tidb/server.(*packetIO).flush
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/packetio.go:181
github.com/pingcap/tidb/server.(*clientConn).flush
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:1126
github.com/pingcap/tidb/server.(*clientConn).writeOkWith
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:1159
github.com/pingcap/tidb/server.(*clientConn).handleQuerySpecial
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:1703
github.com/pingcap/tidb/server.(*clientConn).handleStmt
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:1661
github.com/pingcap/tidb/server.(*clientConn).handleQuery
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:1503
github.com/pingcap/tidb/server.(*clientConn).dispatch
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:1037
github.com/pingcap/tidb/server.(*clientConn).Run
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:795
github.com/pingcap/tidb/server.(*Server).onConn
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/server.go:477
runtime.goexit
\t/usr/local/go/src/runtime/asm_amd64.s:1357"]
[2022/08/19 12:38:35.754 +08:00] [ERROR] [terror.go:291] ["encountered error"] [error="write tcp 10.30.128.28:4000->10.30.219.173:53016: write: broken pipe"] [stack="github.com/pingcap/parser/terror.Log
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/pkg/mod/github.com/pingcap/parser@v0.0.0-20210623034316-5ee95ed0081f/terror/terror.go:291
github.com/pingcap/tidb/server.(*packetIO).writePacket
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/packetio.go:168
github.com/pingcap/tidb/server.(*clientConn).writePacket
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:357
github.com/pingcap/tidb/server.(*clientConn).writeError
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:1193
github.com/pingcap/tidb/server.(*clientConn).Run
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:820
github.com/pingcap/tidb/server.(*Server).onConn
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/server.go:477"]
[2022/08/19 12:38:35.754 +08:00] [ERROR] [terror.go:291] ["encountered error"] [error="connection was bad"] [stack="github.com/pingcap/parser/terror.Log
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/pkg/mod/github.com/pingcap/parser@v0.0.0-20210623034316-5ee95ed0081f/terror/terror.go:291
github.com/pingcap/tidb/server.(*clientConn).Run
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:821
github.com/pingcap/tidb/server.(*Server).onConn
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/server.go:477"]
[2022/08/19 12:38:35.754 +08:00] [ERROR] [terror.go:291] ["encountered error"] [error="write tcp 10.30.128.28:4000->10.30.219.227:34772: write: broken pipe"] [stack="github.com/pingcap/parser/terror.Log
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/pkg/mod/github.com/pingcap/parser@v0.0.0-20210623034316-5ee95ed0081f/terror/terror.go:291
github.com/pingcap/tidb/server.(*packetIO).writePacket
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/packetio.go:168
github.com/pingcap/tidb/server.(*clientConn).writePacket
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:357
github.com/pingcap/tidb/server.(*clientConn).writeError
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:1193
github.com/pingcap/tidb/server.(*clientConn).Run
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:820
github.com/pingcap/tidb/server.(*Server).onConn
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/server.go:477"]
[2022/08/19 12:38:35.754 +08:00] [ERROR] [terror.go:291] ["encountered error"] [error="connection was bad"] [stack="github.com/pingcap/parser/terror.Log
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/pkg/mod/github.com/pingcap/parser@v0.0.0-20210623034316-5ee95ed0081f/terror/terror.go:291
github.com/pingcap/tidb/server.(*clientConn).Run
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:821
github.com/pingcap/tidb/server.(*Server).onConn
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/server.go:477"]
[2022/08/19 12:38:35.754 +08:00] [INFO] [conn.go:812] ["command dispatched failed"] [conn=168231163] [connInfo="id:168231163, addr:10.30.219.159:38330 status:10, collation:latin1_swedish_ci, user:srv_t1111l"] [command=Query] [status="inTxn:0, autocommit:1"] [sql="SET NAMES utf8"] [txn_mode=PESSIMISTIC] [err="write tcp 10.30.128.28:4000->10.30.219.159:38330: write: broken pipe
github.com/pingcap/errors.AddStack
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/errors.go:174
github.com/pingcap/errors.Trace
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/juju_adaptor.go:15
github.com/pingcap/tidb/server.(*packetIO).flush
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/packetio.go:181
github.com/pingcap/tidb/server.(*clientConn).flush
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:1126
github.com/pingcap/tidb/server.(*clientConn).writeOkWith
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:1159
github.com/pingcap/tidb/server.(*clientConn).handleQuerySpecial
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:1703
github.com/pingcap/tidb/server.(*clientConn).handleStmt
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:1661
github.com/pingcap/tidb/server.(*clientConn).handleQuery
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:1503
github.com/pingcap/tidb/server.(*clientConn).dispatch
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:1037
github.com/pingcap/tidb/server.(*clientConn).Run
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:795
github.com/pingcap/tidb/server.(*Server).onConn
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/server.go:477
runtime.goexit
\t/usr/local/go/src/runtime/asm_amd64.s:1357"]
[2022/08/19 12:38:35.754 +08:00] [ERROR] [terror.go:291] ["encountered error"] [error="write tcp 10.30.128.28:4000->10.30.219.159:38330: write: broken pipe"] [stack="github.com/pingcap/parser/terror.Log
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/pkg/mod/github.com/pingcap/parser@v0.0.0-20210623034316-5ee95ed0081f/terror/terror.go:291
github.com/pingcap/tidb/server.(*packetIO).writePacket
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/packetio.go:168
github.com/pingcap/tidb/server.(*clientConn).writePacket
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:357
github.com/pingcap/tidb/server.(*clientConn).writeError
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:1193
github.com/pingcap/tidb/server.(*clientConn).Run
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:820
github.com/pingcap/tidb/server.(*Server).onConn
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/server.go:477"]
[2022/08/19 12:38:35.754 +08:00] [ERROR] [terror.go:291] ["encountered error"] [error="connection was bad"] [stack="github.com/pingcap/parser/terror.Log
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/pkg/mod/github.com/pingcap/parser@v0.0.0-20210623034316-5ee95ed0081f/terror/terror.go:291
github.com/pingcap/tidb/server.(*clientConn).Run
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:821
github.com/pingcap/tidb/server.(*Server).onConn
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/server.go:477"]
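Two things stand out in this log: every expensive_query entry is a plain `use testdb` that took about 60 s, and "write: broken pipe" on a write means the client side had already closed the TCP connection by the time TiDB tried to send its reply. For a longer log, a small script can summarise both. A minimal sketch; the regexes are written against the lines pasted here only, accept both straight and "smart" quotes (forum copies often convert them), and the function name is hypothetical.

```python
import re
from collections import Counter

# Accept straight or curly double quotes around quoted log fields.
_Q = "[\"\u201c\u201d]"
EXPENSIVE = re.compile(
    r"\[expensive_query\].*?\[cost_time=([\d.]+)s\].*?\[sql=" + _Q + r"(.*?)" + _Q + r"\]"
)
BROKEN_PIPE = re.compile(r"write tcp [\d.:]+->([\d.]+):\d+: write: broken pipe")


def summarize(lines):
    """Count slow statements by SQL text, track the worst cost_time, and
    collect client IPs that TiDB hit a broken pipe writing to."""
    slow_sql = Counter()
    worst = 0.0
    clients = set()
    for line in lines:
        m = EXPENSIVE.search(line)
        if m:
            worst = max(worst, float(m.group(1)))
            slow_sql[m.group(2)] += 1
        m = BROKEN_PIPE.search(line)
        if m:
            clients.add(m.group(1))
    return slow_sql, worst, clients
```

Usage would be along the lines of `summarize(open("tidb.log", encoding="utf-8"))`, where the file name is a placeholder. Seeing many clients affected at the same moment points at something shared (TiDB itself, the network path, or a load balancer) rather than a single bad client.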

Is there a network problem?

If the servers are configured differently, including the network environment, it is normal for the scores to differ, isn't it?

The server configuration and network environment are identical.

It is not a network problem.

Can you still find the servers' resource usage for the period when latency was high?

The logs look like a network problem; I suggest investigating that, e.g. network jitter.

At least the leaders are evenly distributed. Could some empty Regions be skewing the statistics?
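To test the empty-Region theory: as far as I know, PD balances Regions by a size-based score rather than by raw Region count, so a store can hold noticeably more Regions than its peers while its total Region size is in line. A minimal sketch that compares count, total size and number of empty Regions per store; the function name and the input data are hypothetical. If empty Regions do turn out to be the cause, PD's Region merge (tuned via max-merge-region-size and max-merge-region-keys) is the mechanism that folds them into their neighbours.

```python
def balance_view(stores):
    """Summarise each store's Regions by count, total size and empties.

    `stores` maps a store label to a list of Region sizes in MiB
    (hypothetical data for illustration).
    """
    return {
        store: {
            "count": len(sizes),
            "size_mib": sum(sizes),
            "empty": sum(1 for s in sizes if s == 0),
        }
        for store, sizes in stores.items()
    }


if __name__ == "__main__":
    # Same total size, but store "b" also holds three empty Regions.
    demo = {"a": [96, 96, 96], "b": [96, 96, 96, 0, 0, 0]}
    print(balance_view(demo))
    # {'a': {'count': 3, 'size_mib': 288, 'empty': 0}, 'b': {'count': 6, 'size_mib': 288, 'empty': 3}}
```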