What does each attribute under PD Region Health mean?

[TiDB version]:

3.0.5

[Problem description]:

1. What does each of the following attributes mean?

  • down-peer-region-count
  • empty-region-count
  • extra-peer-region-count
  • incorrect-namespace-region-count
  • learner-peer-region-count
  • miss-peer-region-count
  • offline-peer-region-count
  • pending-peer-region-count

2. How do I get the Regions counted by empty-region-count merged?

[tidb@dev10 bin]$ ./pd-ctl -u http://192.168.180.33:2379 config show max-pending-peer-count
{
  "replication": {
    "location-labels": "",
    "max-replicas": 3,
    "strictly-match-label": "false"
  },
  "schedule": {
    "disable-location-replacement": "false",
    "disable-make-up-replica": "false",
    "disable-namespace-relocation": "false",
    "disable-raft-learner": "false",
    "disable-remove-down-replica": "false",
    "disable-remove-extra-replica": "false",
    "disable-replace-offline-replica": "false",
    "enable-one-way-merge": "false",
    "high-space-ratio": 0.6,
    "hot-region-cache-hits-threshold": 3,
    "hot-region-schedule-limit": 4,
    "leader-schedule-limit": 4,
    "low-space-ratio": 0.8,
    "max-merge-region-keys": 200000,
    "max-merge-region-size": 20,
    "max-pending-peer-count": 16,
    "max-snapshot-count": 3,
    "max-store-down-time": "30m0s",
    "merge-schedule-limit": 8,
    "patrol-region-interval": "100ms",
    "region-schedule-limit": 4,
    "replica-schedule-limit": 8,
    "scheduler-max-waiting-operator": 3,
    "schedulers-v2": [
      {
        "args": null,
        "disable": false,
        "type": "balance-region"
      },
      {
        "args": null,
        "disable": false,
        "type": "balance-leader"
      },
      {
        "args": null,
        "disable": false,
        "type": "hot-region"
      },
      {
        "args": null,
        "disable": false,
        "type": "label"
      }
    ],
    "split-merge-interval": "1h0m0s",
    "store-balance-rate": 15,
    "tolerant-size-ratio": 5
  }
}
[tidb@dev10 bin]$
[tidb@dev10 bin]$ ./pd-ctl -u http://192.168.180.33:2379 config show | grep  max-pending-peer-count
    "max-pending-peer-count": 16,
[tidb@dev10 bin]$ ./pd-ctl -u http://192.168.180.33:2379 config show | grep max-merge-region-size
    "max-merge-region-size": 20,
[tidb@dev10 bin]$ ./pd-ctl -u http://192.168.180.33:2379 config show | grep split-merge-interval
    "split-merge-interval": "1h0m0s",
[tidb@dev10 bin]$

  • down-peer-region-count # total number of Regions with a peer in Down state
  • empty-region-count # total number of empty Regions (Regions that contain no data)
  • extra-peer-region-count # total number of Regions with more peers than the configured number of replicas
  • incorrect-namespace-region-count # total number of Regions with peers that violate the namespace constraints
  • learner-peer-region-count # total number of Regions with a learner peer
  • miss-peer-region-count # total number of Regions with fewer peers than the configured number of replicas
  • offline-peer-region-count # total number of Regions with a peer on a store that is being taken offline
  • pending-peer-region-count # total number of Regions with a peer in Pending state

In addition, for the meaning of pending and down, see this link: https://pingcap.com/docs-cn/stable/glossary/#pendingdown

https://pingcap.com/docs-cn/stable/reference/tools/pd-control/#region-check-miss-peer--extra-peer--down-peer--pending-peer--incorrect-ns
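
For reference, the region check subcommands documented at the link above can list the actual Regions behind several of these counters. A minimal sketch, reusing the PD address from the commands above:

./pd-ctl -u http://192.168.180.33:2379 region check miss-peer      # Regions with missing peers
./pd-ctl -u http://192.168.180.33:2379 region check extra-peer     # Regions with extra peers
./pd-ctl -u http://192.168.180.33:2379 region check down-peer      # Regions with a Down peer
./pd-ctl -u http://192.168.180.33:2379 region check pending-peer   # Regions with a Pending peer
./pd-ctl -u http://192.168.180.33:2379 region check incorrect-ns   # Regions violating namespace constraints
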
2. Empty or small Regions can be merged by enabling Region Merge: https://pingcap.com/docs-cn/stable/reference/best-practices/massive-regions/#方法五开启-region-merge

Thanks, I still have some questions.

https://pingcap.com/docs-cn/stable/reference/best-practices/pd-scheduling/#region-merge-速度慢

Q:
"Speed up merging by adjusting max-merge-region-size and max-merge-region-keys to smaller values."

1. By what ratio should they be reduced each time?




Q:
"If the split-table feature is enabled, these empty Regions cannot be merged, and the following parameters need to be adjusted to disable the feature."

2. Does that mean split-region-on-table also needs to be turned off?

https://pingcap.com/docs-cn/stable/reference/configuration/tikv-server/configuration-file/#split-region-on-table

3. Does namespace-classifier also need to be adjusted?





4. Since the settings in questions 2 and 3 are the TiDB v3.0.5 defaults, doesn't the default configuration then conflict with the whole idea of Region Merge?





1. There is no fixed ratio for lowering them each time; adjust according to your own needs.

  • For a cluster with many empty Regions (for example one where tables are frequently dropped), you can lower max-merge-region-size and max-merge-region-keys, e.g. to 2 and 2000, so that PD prefers to merge Regions that are almost empty (see the sketch after this list).
  • To speed up merging, you can raise region-schedule-limit and merge-schedule-limit.
  • Apart from that, the Region Merge defaults are generally recommended. If the 20 MB Regions have mostly been merged already and you still want to reduce the Region count further, you can raise the 20. In short, the larger these two values are, the more Regions become eligible for merging, but the average merge speed gets slower (because the average size of the Regions being merged becomes larger).
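
A minimal sketch of the first bullet above, using the illustrative 2 / 2000 values mentioned there (adjust them to your own needs):

./pd-ctl -u http://192.168.180.33:2379 config set max-merge-region-size 2      # only Regions smaller than 2 MiB become merge candidates
./pd-ctl -u http://192.168.180.33:2379 config set max-merge-region-keys 2000   # only Regions with fewer than 2000 keys become merge candidates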

For the second question, let me confirm the current implementation first and then get back to you.

I changed the configuration, but it doesn't seem to have any effect.

[tidb@dev10 bin]$ ./pd-ctl -u http://192.168.180.33:2379 config show
{
  "replication": {
    "location-labels": "",
    "max-replicas": 3,
    "strictly-match-label": "false"
  },
  "schedule": {
    "disable-location-replacement": "false",
    "disable-make-up-replica": "false",
    "disable-namespace-relocation": "false",
    "disable-raft-learner": "false",
    "disable-remove-down-replica": "false",
    "disable-remove-extra-replica": "false",
    "disable-replace-offline-replica": "false",
    "enable-one-way-merge": "false",
    "high-space-ratio": 0.6,
    "hot-region-cache-hits-threshold": 3,
    "hot-region-schedule-limit": 4,
    "leader-schedule-limit": 4,
    "low-space-ratio": 0.8,
    "max-merge-region-keys": 2000,
    "max-merge-region-size": 2,
    "max-pending-peer-count": 16,
    "max-snapshot-count": 3,
    "max-store-down-time": "30m0s",
    "merge-schedule-limit": 8,
    "patrol-region-interval": "10ms",
    "region-schedule-limit": 4,
    "replica-schedule-limit": 8,
    "scheduler-max-waiting-operator": 3,
    "schedulers-v2": [
      {
        "args": null,
        "disable": false,
        "type": "balance-region"
      },
      {
        "args": null,
        "disable": false,
        "type": "balance-leader"
      },
      {
        "args": null,
        "disable": false,
        "type": "hot-region"
      },
      {
        "args": null,
        "disable": false,
        "type": "label"
      }
    ],
    "split-merge-interval": "1h0m0s",
    "store-balance-rate": 15,
    "tolerant-size-ratio": 5
  }
}
[tidb@dev10 bin]$

You can refer to split-merge-interval in pd-ctl: it controls the interval between a split and a merge operation on the same Region, i.e. a newly split Region will not be merged for that period of time. That is how the conflict is resolved.

https://pingcap.com/docs-cn/stable/reference/tools/pd-control/
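
As a sketch, the interval can be lowered online with pd-ctl (10m is only an example value; this assumes pd-ctl accepts a duration string for this option, as it does for the other duration settings shown above):

./pd-ctl -u http://192.168.180.33:2379 config set split-merge-interval 10m   # newly split Regions become merge candidates after 10 minutes instead of 1 hour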

I already changed it to 10 minutes, and it still doesn't work.

Could you please describe the problem in more detail: what result are you expecting, and which parameters have you adjusted? Thanks.

I modified the following parameters:

max-merge-region-keys 2000

max-merge-region-size 2

patrol-region-interval "10ms"

[tidb@dev10 conf]$ /home/tidb/tidb-ansible/resources/bin/pd-ctl -u http://192.168.180.33:2379 config set max-merge-region-keys 2000
Success!
[tidb@dev10 conf]$


[tidb@dev10 conf]$ /home/tidb/tidb-ansible/resources/bin/pd-ctl -u http://192.168.180.33:2379 config set max-merge-region-size 2
Success!
[tidb@dev10 conf]$


[tidb@dev10 conf]$ /home/tidb/tidb-ansible/resources/bin/pd-ctl -u http://192.168.180.33:2379 config set patrol-region-interval "10ms"
Success!
[tidb@dev10 conf]$

I expected the number of empty Regions to go down. The actual result is shown in the screenshot below: the empty Region count has not changed at all.

Hi, please run the following command and post the result, thanks. If jq is not installed, please install it first: ./bin/pd-ctl -u http://127.0.0.1:2379 -d region | jq ".regions | map(select(.approximate_size < 2 and .approximate_keys < 2000)) | length"

[tidb@dev10 bin]$ ./pd-ctl -u http://192.168.180.33:2379 -d region | jq ".regions | map(select(.approximate_size < 2 and .approximate_keys < 2000)) | length"
5799
[tidb@dev10 bin]$
  1. You can increase merge-schedule-limit;
  2. Use ./pd-ctl -u http://192.168.180.33:2379 -d region | jq ".regions | map(select(.approximate_size < 2 and .approximate_keys < 2000)) | length" to watch whether the number of small Regions is decreasing (see the sketch below).
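
A sketch of both steps (the limit value 16 is only illustrative):

./pd-ctl -u http://192.168.180.33:2379 config set merge-schedule-limit 16   # allow more merge operators to run concurrently

# re-run the small-Region count every 60 seconds to see whether it trends down
while true; do
  ./pd-ctl -u http://192.168.180.33:2379 -d region | jq ".regions | map(select(.approximate_size < 2 and .approximate_keys < 2000)) | length"
  sleep 60
done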

After the change, the count actually went up.

Keep observing for a while.

config show all

[tidb@dev10 ~]$  /home/tidb/tidb-ansible/resources/bin/pd-ctl -u http://192.168.180.33:2379 config show all
{
  "client-urls": "http://192.168.181.56:2379",
  "peer-urls": "http://192.168.181.56:2380",
  "advertise-client-urls": "http://192.168.181.56:2379",
  "advertise-peer-urls": "http://192.168.181.56:2380",
  "name": "pd_dev27",
  "data-dir": "/home/tidb/deploy/data.pd",
  "force-new-cluster": false,
  "enable-grpc-gateway": true,
  "initial-cluster": "pd_dev11=http://192.168.180.33:2380,pd_dev26=http://192.168.181.55:2380,pd_dev27=http://192.168.181.56:2380",
  "initial-cluster-state": "new",
  "join": "",
  "lease": 3,
  "log": {
    "level": "info",
    "format": "text",
    "disable-timestamp": false,
    "file": {
      "filename": "/home/tidb/deploy/log/pd.log",
      "log-rotate": true,
      "max-size": 300,
      "max-days": 0,
      "max-backups": 0
    },
    "development": false,
    "disable-caller": false,
    "disable-stacktrace": false,
    "disable-error-verbose": true,
    "sampling": null
  },
  "log-file": "",
  "log-level": "",
  "tso-save-interval": "3s",
  "metric": {
    "job": "pd_dev27",
    "address": "",
    "interval": "15s"
  },
  "schedule": {
    "max-snapshot-count": 3,
    "max-pending-peer-count": 16,
    "max-merge-region-size": 2,
    "max-merge-region-keys": 2000,
    "split-merge-interval": "1h0m0s",
    "enable-one-way-merge": "false",
    "patrol-region-interval": "10ms",
    "max-store-down-time": "30m0s",
    "leader-schedule-limit": 4,
    "region-schedule-limit": 4,
    "replica-schedule-limit": 8,
    "merge-schedule-limit": 16,
    "hot-region-schedule-limit": 4,
    "hot-region-cache-hits-threshold": 3,
    "store-balance-rate": 15,
    "tolerant-size-ratio": 5,
    "low-space-ratio": 0.8,
    "high-space-ratio": 0.6,
    "scheduler-max-waiting-operator": 3,
    "disable-raft-learner": "false",
    "disable-remove-down-replica": "false",
    "disable-replace-offline-replica": "false",
    "disable-make-up-replica": "false",
    "disable-remove-extra-replica": "false",
    "disable-location-replacement": "false",
    "disable-namespace-relocation": "false",
    "schedulers-v2": [
      {
        "type": "balance-region",
        "args": null,
        "disable": false
      },
      {
        "type": "balance-leader",
        "args": null,
        "disable": false
      },
      {
        "type": "hot-region",
        "args": null,
        "disable": false
      },
      {
        "type": "label",
        "args": null,
        "disable": false
      }
    ]
  },
  "replication": {
    "max-replicas": 3,
    "location-labels": "",
    "strictly-match-label": "false"
  },
  "namespace": {},
  "pd-server": {
    "use-region-storage": "true"
  },
  "cluster-version": "3.0.5",
  "quota-backend-bytes": "0B",
  "auto-compaction-mode": "periodic",
  "auto-compaction-retention-v2": "1h",
  "TickInterval": "500ms",
  "ElectionInterval": "3s",
  "PreVote": true,
  "security": {
    "cacert-path": "",
    "cert-path": "",
    "key-path": ""
  },
  "label-property": {},
  "WarningMsgs": null,
  "namespace-classifier": "table",
  "LeaderPriorityCheckInterval": "1m0s"
}

[tidb@dev10 ~]$

store

[tidb@dev10 ~]$  /home/tidb/tidb-ansible/resources/bin/pd-ctl -u http://192.168.180.33:2379 store
{
  "count": 3,
  "stores": [
    {
      "store": {
        "id": 72015,
        "address": "192.168.180.51:20160",
        "version": "3.0.5",
        "state_name": "Up"
      },
      "status": {
        "capacity": "392.7GiB",
        "available": "249.9GiB",
        "leader_count": 242,
        "leader_weight": 1,
        "leader_score": 8344,
        "leader_size": 8344,
        "region_count": 6202,
        "region_weight": 1,
        "region_score": 24883,
        "region_size": 24883,
        "start_ts": "2020-01-07T17:53:36+08:00",
        "last_heartbeat_ts": "2020-01-08T16:02:22.388962962+08:00",
        "uptime": "22h8m46.388962962s"
      }
    },
    {
      "store": {
        "id": 33067,
        "address": "192.168.180.52:20160",
        "version": "3.0.5",
        "state_name": "Up"
      },
      "status": {
        "capacity": "392.7GiB",
        "available": "345.1GiB",
        "leader_count": 2641,
        "leader_weight": 1,
        "leader_score": 8250,
        "leader_size": 8250,
        "region_count": 6202,
        "region_weight": 1,
        "region_score": 24883,
        "region_size": 24883,
        "start_ts": "2020-01-07T17:52:21+08:00",
        "last_heartbeat_ts": "2020-01-08T16:02:25.188217127+08:00",
        "uptime": "22h10m4.188217127s"
      }
    },
    {
      "store": {
        "id": 72014,
        "address": "192.168.180.53:20160",
        "version": "3.0.5",
        "state_name": "Up"
      },
      "status": {
        "capacity": "392.7GiB",
        "available": "230.5GiB",
        "leader_count": 3319,
        "leader_weight": 1,
        "leader_score": 8289,
        "leader_size": 8289,
        "region_count": 6202,
        "region_weight": 1,
        "region_score": 24883,
        "region_size": 24883,
        "start_ts": "2020-01-07T17:54:28+08:00",
        "last_heartbeat_ts": "2020-01-08T16:02:30.092476143+08:00",
        "uptime": "22h8m2.092476143s"
      }
    }
  ]
}

[tidb@dev10 ~]$

Region count

[tidb@dev10 ~]$ /home/tidb/tidb-ansible/resources/bin/pd-ctl -u http://192.168.180.33:2379 -d region | jq ".regions | map(select(.approximate_size < 2 and .approximate_keys < 2000)) | length"  
5792
[tidb@dev10 ~]$ /home/tidb/tidb-ansible/resources/bin/pd-ctl -u http://192.168.180.33:2379 -d region | jq ".regions | map(select(.approximate_size < 2 and .approximate_keys < 2000)) | length"
5792
[tidb@dev10 ~]$ /home/tidb/tidb-ansible/resources/bin/pd-ctl -u http://192.168.180.33:2379 -d region | jq ".regions | map(select(.approximate_size < 2 and .approximate_keys < 2000)) | length"
5792
[tidb@dev10 ~]$ /home/tidb/tidb-ansible/resources/bin/pd-ctl -u http://192.168.180.33:2379 -d region | jq ".regions | map(select(.approximate_size < 2 and .approximate_keys < 2000)) | length"
5792
[tidb@dev10 ~]$

You could try adjusting the following settings: change namespace-classifier to default, and change split-merge-interval to 10m.
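
A sketch of the two changes. split-merge-interval can be set online as shown earlier; namespace-classifier, as far as I know, is only read when PD starts in v3.0, so it cannot be changed with pd-ctl config set and has to go into the PD configuration file (or start-up options) followed by a rolling restart of the PD nodes. Treat the details below as assumptions about your deployment:

./pd-ctl -u http://192.168.180.33:2379 config set split-merge-interval 10m

# in the PD configuration file (e.g. pd.toml), matching the "namespace-classifier"
# field visible in the config show all output above, then restart PD:
# namespace-classifier = "default"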

Why is it that I cannot change namespace-classifier to "default"?

Upgrade to v3.0.8 and then repeat the earlier steps. In my test environment, an empty-region-count of around 200,000 has already come down after doing that.

May I ask whether the database currently has a large number of tables?

There are too many databases to check manually. Does TiDB have a statement that can query this directly?
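
For example (a sketch; the TiDB server address, port, and user are placeholders), the total number of tables can be read from INFORMATION_SCHEMA:

mysql -h <tidb-host> -P 4000 -u root -p -e "SELECT COUNT(*) FROM INFORMATION_SCHEMA.TABLES;"   # note: this count also includes the system schemas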