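Label monitoring issue

To improve efficiency, please provide the following information; a clearly described problem gets resolved faster:
【TiDB environment】
Production
【Summary】
Many slow queries; p999 latency is very high.
【Background】
We scaled out by adding nodes and removed the previous TiFlash node. Since then the labels shown in PD Dashboard are wrong: there are only 9 nodes in total and all 9 already have labels, yet one more node is still displayed with label unknown.
【Symptoms】
The label display is wrong,
and the leader / Raft display is also wrong.
【Business impact】
Both queries and writes are currently very slow.
【TiDB version】
v4.0.5
【Attachments】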

  1. TiUP Cluster Display information

  2. TiUP Cluster Edit Config information
    {
      "replication": {
        "enable-placement-rules": "false",
        "location-labels": "zone,dc,rack,host",
        "max-replicas": 3,
        "strictly-match-label": "false"
      },
      "schedule": {
        "enable-cross-table-merge": "false",
        "enable-debug-metrics": "false",
        "enable-location-replacement": "true",
        "enable-make-up-replica": "true",
        "enable-one-way-merge": "false",
        "enable-remove-down-replica": "true",
        "enable-remove-extra-replica": "true",
        "enable-replace-offline-replica": "true",
        "high-space-ratio": 0.7,
        "hot-region-cache-hits-threshold": 3,
        "hot-region-schedule-limit": 4,
        "leader-schedule-limit": 4,
        "leader-schedule-policy": "count",
        "low-space-ratio": 0.8,
        "max-merge-region-keys": 200000,
        "max-merge-region-size": 20,
        "max-pending-peer-count": 16,
        "max-snapshot-count": 3,
        "max-store-down-time": "30m0s",
        "merge-schedule-limit": 8,
        "patrol-region-interval": "100ms",
        "region-schedule-limit": 2048,
        "replica-schedule-limit": 64,
        "scheduler-max-waiting-operator": 5,
        "split-merge-interval": "1h0m0s",
        "store-limit-mode": "manual",
        "tolerant-size-ratio": 20
      }
    }
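
Since `location-labels` above is `zone,dc,rack,host`, one way to find the store that shows up as label unknown is to check every store PD knows about for missing label keys. The following is a minimal sketch in Python. It assumes a store listing shaped like the JSON printed by pd-ctl's `store` command (the field names `stores`, `store`, `labels` and `state_name` are assumptions here), and the sample data is hypothetical, not taken from this cluster.

```python
import json

# Taken from replication.location-labels in the config above.
LOCATION_LABELS = "zone,dc,rack,host"

# Hypothetical sample data; the JSON shape is an assumption, adjust it to
# whatever your store listing actually looks like.
SAMPLE_STORES = json.loads("""
{
  "count": 3,
  "stores": [
    {"store": {"id": 1, "address": "10.0.0.1:20160", "state_name": "Up",
               "labels": [{"key": "zone", "value": "z1"}, {"key": "dc", "value": "d1"},
                          {"key": "rack", "value": "r1"}, {"key": "host", "value": "h1"}]}},
    {"store": {"id": 2, "address": "10.0.0.2:20160", "state_name": "Up",
               "labels": [{"key": "zone", "value": "z1"}, {"key": "host", "value": "h2"}]}},
    {"store": {"id": 3, "address": "10.0.0.3:3930", "state_name": "Offline"}}
  ]
}
""")


def stores_missing_labels(stores_doc, location_labels):
    """Return (store id, state, missing label keys) for every store that
    lacks at least one of the configured location label keys."""
    wanted = [key.strip() for key in location_labels.split(",") if key.strip()]
    report = []
    for item in stores_doc.get("stores", []):
        store = item.get("store", {})
        present = {label["key"] for label in store.get("labels") or []}
        missing = [key for key in wanted if key not in present]
        if missing:
            report.append((store.get("id"), store.get("state_name"), missing))
    return report


for store_id, state, missing in stores_missing_labels(SAMPLE_STORES, LOCATION_LABELS):
    print(f"store {store_id} ({state}) is missing labels: {', '.join(missing)}")
```

A store that is no longer `Up` (for example a removed node that PD still lists) would appear in this report together with its state, which may help explain an extra unlabeled entry.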

  3. TiDB Overview monitoring
    tidb-4s-cluster-Overview_2021-09-08T12_48_05.893Z.json (1.8 MB)

  • Logs of the relevant modules (including logs from 1 hour before and after the problem)

[2021/09/08 15:14:23.114 +08:00] [INFO] [region_cache.go:829] ["switch region leader to specific leader due to kv return NotLeader"] [regionID=946855] [currIdx=1] [leaderStoreID=1]

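The compute (TiDB) nodes keep printing this log.

In addition, the leader region count on one storage (TiKV) node occasionally drops by about 1k all of a sudden, which shows up as sharp dips in the graph.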
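
To check whether the NotLeader messages quoted above concentrate on a single store, the TiDB log can be tallied by `leaderStoreID`. This is a rough sketch in Python that relies only on the log line format shown in this thread; the second sample line is a hypothetical unrelated entry, and reading from a real log file is left out.

```python
import re
from collections import Counter

# Matches the message and the two bracketed fields as they appear in the
# log line quoted in this thread.
NOT_LEADER = re.compile(
    r"switch region leader to specific leader due to kv return NotLeader"
    r".*?\[regionID=(\d+)\].*?\[leaderStoreID=(\d+)\]"
)


def tally_not_leader(lines):
    """Count NotLeader switches per target store and collect the regions involved."""
    by_store = Counter()
    regions = set()
    for line in lines:
        match = NOT_LEADER.search(line)
        if match:
            regions.add(int(match.group(1)))
            by_store[int(match.group(2))] += 1
    return by_store, regions


SAMPLE_LINES = [
    '[2021/09/08 15:14:23.114 +08:00] [INFO] [region_cache.go:829] '
    '["switch region leader to specific leader due to kv return NotLeader"] '
    '[regionID=946855] [currIdx=1] [leaderStoreID=1]',
    # Hypothetical unrelated line, only here to show that it is skipped.
    '[2021/09/08 15:14:23.200 +08:00] [INFO] [example.go:1] ["some other message"]',
]

by_store, regions = tally_not_leader(SAMPLE_LINES)
for store_id, count in by_store.most_common():
    print(f"store {store_id}: {count} NotLeader switches across {len(regions)} regions")
```

If most of the switches point at the store whose leader count shows the dips, that would suggest the two symptoms are related; the tally alone does not prove it.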

Let's look at the specific SQL. Could you share its slow query log?

REPLACE INTO xes_eagleeye_commit_log_20210909 (
source_id, unique_id, STATUS, source, ACTION, modify_time, created, updated, mod_id, grp_id, biz_id, wrong_time, fix_time, succ_time
)
VALUES
('73955460', 'c4smdrgq7g3rgimoelmg-20210909', 130, 'ARRIS-29b21a74362fbf9e', 2, 1631148074956, 1631151854102, 1631151854102, 1, 2, 'default', 0, 0, 0),
('69514794', 'c4smdqoq7g3u7a63l5ng-20210909', 200, 'ARRIS-29b21a74362fbf9e', 2, 1631148074956, 1631151854102, 1631151854102, 1, 2, 'default', 0, 0, 1631151853888),
('74234005_31571', 'c4smdr0q7g3kc9hjbgcg-20210909', 200, 'ARRIS-b2468873cb462fe7', 1, 1631143347330, 1631151854102, 1631151854102, 1, 2, 'default', 0, 0, 1631151853944),
('17970575', 'c4smdr0q7g3u07hkfe00-20210909', 200, 'ARRIS-29b21a74362fbf9e', 2, 1631148074956, 1631151854102, 1631151854102, 1, 2, 'default', 0, 0, 1631151853963),
('wmHqcDDgAAuGENrgL_gR5qu1vfw-O4Hw_wrHqcDDgAAkv0rxvgd4yqVPnFX2k8luQ', 'c4smdr0q7g3gipjcg6mg-20210909', 200, 'ARRIS-bb14b9c6c9785ac2', 1, 1631144454374, 1631151854102, 1631151854102, 1, 2, 'default', 0, 0, 1631151854078),
('70567863', 'c4smdr8q7g3vvrlcuneg-20210909', 200, 'ARRIS-29b21a74362fbf9e', 2, 1631148074956, 1631151854102, 1631151854102, 1, 2, 'default', 0, 0, 1631151854090),
('51424225_70008', 'c4smdr8q7g3ibhrfpn90-20210909', 200, 'ARRIS-b2468873cb462fe7', 1, 1631143347330, 1631151854102, 1631151854102, 1, 2, 'default', 0, 0, 1631151854101),
('wmHqcDDgAAIqL1nRopCXtooQ6NPPvV5g_wrHqcDDgAAJGPvWIPzmd3t795vxITTFg', 'c4smdr0q7g3v3f17le40-20210909', 200, 'ARRIS-bb14b9c6c9785ac2', 1, 1631144454374, 1631151854102, 1631151854102, 1, 2, 'default', 0, 0, 1631151853951),
('65709093', 'c4smdr0q7g3jvp7c7ip0-20210909', 200, 'ARRIS-29b21a74362fbf9e', 2, 1631148074956, 1631151854102, 1631151854102, 1, 2, 'default', 0, 0, 1631151853963),
('6526381_73344', 'c4smdr8q7g3ibhrfpq3g-20210909', 200, 'ARRIS-b2468873cb462fe7', 1, 1631143347330, 1631151854102, 1631151854102, 1, 2, 'default', 0, 0, 1631151854029),
('64966280', 'c4smdr8q7g3u7a63lkq0-20210909', 200, 'ARRIS-29b21a74362fbf9e', 2, 1631148074956, 1631151854102, 1631151854102, 1, 2, 'default', 0, 0, 1631151854022),
('wmHqcDDgAAVpYx84dUz4bZySTZNDRl5A_wrHqcDDgAAkXSmmCJlGx8vlKiK_OpjDQ', 'c4smdr8q7g3o75hqc99g-20210909', 200, 'ARRIS-bb14b9c6c9785ac2', 1, 1631144454374, 1631151854102, 1631151854102, 1, 2, 'default', 0, 0, 1631151854043);

    insert ignore into `xes_eagleeye_commit_metadata_20210909` (source_id,unique_id,status,source,action,modify_time,created,updated,mod_id,grp_id,biz_id) values ('wmHqcDDgAAUAn7VmEiNRdYwg3YswWMaQ_wrHqcDDgAAFKJMzp_av7aHvUlpbCOKLQ','c4smqg8q7g3kc585d3pg-20210909',120,'ARRIS-bb14b9c6c9785ac2',1,1631144454374,1631153473265,1631153473265,1,2,'default');
    Time: 2021-09-09T10:11:14.157376224+08:00
    Txn_start_ts: 427597096044724415
    User: eagleeye_tw@10.20.43.52
    Conn_ID: 76568
    Query_time: 1.034083273
    Parse_time: 0.000114229
    Compile_time: 0.000074773
    Prewrite_time: 1.031835809 Commit_time: 0.000980454 Get_commit_ts_time: 0.000255069 Write_keys: 24 Write_size: 2344 Prewrite_region: 3
    DB: eagleeye_split
    Is_internal: false
    Digest: 6c31d02f419102e6d9ef15999021b08442e28c450a5885ba88dda076c13f2535
    Num_cop_tasks: 0
    Mem_max: 13160
    Prepared: false
    Plan_from_cache: false
    Has_more_results: false
    Succ: true
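
To see which phase dominates this slow query entry, the timing fields can be compared against `Query_time`. The sketch below, in Python, copies the numeric fields from the entry above and prints each phase's share; the helper names are made up for illustration. For this entry, prewrite accounts for nearly all of the roughly 1.03 s, which points at the write path on the storage side rather than parsing or planning in TiDB.

```python
import re

# Timing fields copied from the slow query entry above.
SLOW_LOG_FIELDS = """\
Query_time: 1.034083273
Parse_time: 0.000114229
Compile_time: 0.000074773
Prewrite_time: 1.031835809 Commit_time: 0.000980454 Get_commit_ts_time: 0.000255069 Write_keys: 24 Write_size: 2344 Prewrite_region: 3
"""

PHASES = ["Parse_time", "Compile_time", "Prewrite_time", "Commit_time", "Get_commit_ts_time"]


def parse_fields(text):
    """Parse 'Key: number' pairs, including several pairs on one line."""
    return {key: float(value) for key, value in re.findall(r"(\w+): ([0-9.]+)", text)}


def phase_shares(fields):
    """Return each phase's fraction of the total Query_time."""
    total = fields["Query_time"]
    return {phase: fields[phase] / total for phase in PHASES if phase in fields}


fields = parse_fields(SLOW_LOG_FIELDS)
for phase, share in sorted(phase_shares(fields).items(), key=lambda kv: -kv[1]):
    print(f"{phase:<20} {fields[phase]:.9f}s  {share:6.2%}")
```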


Please share the TiDB and TiKV-details monitoring for that point in time, covering 1 hour before and after "Time: 2021-09-09T10:11:14.157376224+08:00".
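
The requested window can be derived directly from the slow log `Time` value. A small Python sketch follows, assuming the timestamp format shown above; the function name `monitoring_window` is made up for illustration. It also prints epoch milliseconds, on the assumption that the dashboard's time range picker or export accepts them.

```python
import re
from datetime import datetime, timedelta


def monitoring_window(slow_log_time, hours=1):
    """Return (start, end) datetimes spanning `hours` before and after the
    slow log Time value."""
    # datetime stores microseconds only, so trim the nanosecond digits.
    trimmed = re.sub(r"(\.\d{6})\d+", r"\1", slow_log_time)
    center = datetime.strptime(trimmed, "%Y-%m-%dT%H:%M:%S.%f%z")
    delta = timedelta(hours=hours)
    return center - delta, center + delta


start, end = monitoring_window("2021-09-09T10:11:14.157376224+08:00")
print("from:", start.isoformat(), int(start.timestamp() * 1000))
print("to:  ", end.isoformat(), int(end.timestamp() * 1000))
```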