[TiDB 4.0 PCTA Study Notes] - 2.5.3 Usage of PD Control (Typical Use Cases of PD Control) @Class 2 + 李响

Course name: 2.5.3 Usage of PD Control (Typical Use Cases of PD Control)

Study time:

50 minutes

Takeaways:

Become proficient with the pd-ctl tool

Course content:

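I. Review of PD concepts

  1. PD stands for Placement Driver; it is the component responsible for scheduling.
  2. PD Control
  • When PD does not schedule the way you want, PD Control can be used to adjust the configuration and the scheduling strategy.
  • It can retrieve the cluster information you need, including all information about PD members and TiKV stores.
  3. Using PD Control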
  • Launch method
    tiup ctl pd -u http://<pd_ip>:<pd_port> [-i] (new way; -i enables interactive mode)
    tidb-ansible/resources/bin/pd-ctl -u http://<pd_ip>:<pd_port> [-i] (old way, to be deprecated)
  • Most useful parameters
    --detach, -d    Single-command mode (default mode)
    --interact, -i  Interactive mode
    --pd, -u        Specifies the PD address (default address: http://127.0.0.1:2379)
  • Get pd information from process
    bin/tidb-server … --path=172.16.4.71:2379,172.16.4.66:2379,172.16.4.60:2379
    bin/tikv-server … --pd 172.16.4.71:2379,172.16.4.66:2379,172.16.4.60:2379
    tiup cluster display <cluster_name>
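
The launch method above can also be scripted. As a minimal sketch in Python, the helper below only assembles the argument list for single-command mode. The name `pd_ctl_argv` is hypothetical and not part of TiUP or pd-ctl. Actually running the result (for example with `subprocess.run`) assumes `tiup` is on `PATH` and the PD address is reachable.

```python
from typing import List


def pd_ctl_argv(pd_addr: str, *command: str) -> List[str]:
    """Build the argv for one pd-ctl command in single-command mode.

    pd_addr is "<pd_ip>:<pd_port>"; the http:// scheme is added if missing.
    This hypothetical helper only builds the list and does not run anything.
    """
    if not command:
        raise ValueError("a pd-ctl command such as 'member' is required")
    url = pd_addr if pd_addr.startswith("http") else f"http://{pd_addr}"
    return ["tiup", "ctl", "pd", "-u", url, *command]


if __name__ == "__main__":
    # Print the command instead of executing it, so the sketch is safe to run.
    print(" ".join(pd_ctl_argv("127.0.0.1:2379", "member", "leader", "show")))
```

Returning a list rather than a single string avoids shell quoting problems if the list is later handed to `subprocess.run`.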

II. Getting PD information step by step

  1. Cluster info
    cluster
  2. Member info
    member
  3. Member leader info
    member leader show
  4. Member health info
    health
  5. Query TSO info
    tso 34233247795216239
  6. Move leader away from the current member
    member leader resign
  7. Migrate leader to a specified member
    member leader transfer <pd_name>
  8. Set the priority to be elected as leader
    member leader_priority <pd_name> <priority>
  9. Delete the specified member
    member delete name <pd_name>
    member delete id <pd_id>  // only delete a member with official support
  10. KV Cluster info
    store
  11. List all Regions info
    region
  12. List the specific Region info
    region <region_id>
  13. List all Regions of a specific store
    region store <store_id>
  14. List store label (DC/Zone/Rack/Host) info
    label
    label store dc <dc_name>
  15. Set the weight of a specific store
    store weight <store_id> <leader_weight> <region_weight>
  16. Set the label kv pair of a specific store
    store label <store_id> <key> <value>  // recommended: use tiup cluster edit-config to configure labels
  17. Delete the specific store
    store delete <store_id>  // only delete a store with official support
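
The `tso` command in item 5 splits a timestamp into its two parts. A TSO is a 64-bit integer whose low 18 bits are a logical counter and whose remaining high bits are a physical time in milliseconds since the Unix epoch. The sketch below reproduces that split in Python; the function names are made up for illustration and it does not try to match the exact output format of pd-ctl.

```python
from datetime import datetime, timezone
from typing import Tuple

LOGICAL_BITS = 18  # the low 18 bits of a TSO hold the logical counter


def decode_tso(tso: int) -> Tuple[int, int]:
    """Split a TSO into (physical time in ms since the Unix epoch, logical counter)."""
    physical_ms = tso >> LOGICAL_BITS
    logical = tso & ((1 << LOGICAL_BITS) - 1)
    return physical_ms, logical


def tso_to_utc(tso: int) -> datetime:
    """Convert the physical part of a TSO to a timezone-aware UTC datetime."""
    physical_ms, _ = decode_tso(tso)
    return datetime.fromtimestamp(physical_ms / 1000, tz=timezone.utc)


if __name__ == "__main__":
    # Build a TSO from known parts, then decode it again.
    example = (1_600_000_000_000 << LOGICAL_BITS) | 7
    print(decode_tso(example), tso_to_utc(example).isoformat())
```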
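  18. Main functions of PD scheduling
  • Balancing leaders
  • Balancing the number of Regions
  • Scheduling hot spots and evicting leaders
  19. Scheduler configuration
  • Performance-related
    Schedule Performance:
    leader-schedule-limit: controls the number of leader scheduling tasks generated at the same time
    replica-schedule-limit: controls how many replica scheduling tasks run at the same time
    hot-region-schedule-limit: controls the number of hot Region scheduling tasks running at the same time
    merge-schedule-limit: controls how many Region merge tasks run at the same time
    max-snapshot-count: controls the maximum number of snapshots a single TiKV store sends or receives at the same time
    tolerant-size-ratio: gives Region balancing a buffer by tolerating a difference in Region size, which reduces scheduling
    Region Merge Limit
    max-merge-region-keys: a Region qualifies for merging only if its key count does not exceed this value (default 200,000)
    max-merge-region-size: a Region qualifies for merging only if its size in MB does not exceed this value (default 20)
    Rebalance Timer
    max-store-down-time: default 30 minutes. If a store is down for less than this time, it only has to catch up on the new data after it is brought back up; a very large value is not recommended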
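
How the two Region Merge limits combine can be sketched as follows. This is a simplified model, not PD's implementation: it assumes a Region is a merge candidate only when it stays within both limits, uses 20 and 200000 as the assumed defaults, and leaves out every other condition PD checks before merging. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MergeLimits:
    """Assumed defaults for the two Region Merge limits."""
    max_merge_region_size: int = 20        # max-merge-region-size, in MB
    max_merge_region_keys: int = 200_000   # max-merge-region-keys


def is_merge_candidate(size_mb: int, keys: int,
                       limits: MergeLimits = MergeLimits()) -> bool:
    """Return True only if the Region is small in both size and key count."""
    return (size_mb <= limits.max_merge_region_size
            and keys <= limits.max_merge_region_keys)


if __name__ == "__main__":
    print(is_merge_candidate(10, 1_000))     # small on both axes
    print(is_merge_candidate(64, 1_000))     # too large in size
    print(is_merge_candidate(10, 500_000))   # too many keys
```

Raising either limit widens the set of candidates, which is why the two values are usually tuned together.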
  20. Modifying the configuration
  • Modify the config
    config set <name> <value>
  • Modify these configs related to Region Merge
    config set region-schedule-limit 28
    config set replica-schedule-limit 32
    config set merge-schedule-limit 24
    config set tolerant-size-ratio 50
    config set max-merge-region-size 20
    config set max-merge-region-keys 200000
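
When several settings change together, it can help to keep them in one place and generate the `config set` lines from it. The sketch below does only that text generation; `config_set_commands` is a hypothetical helper, and the values are the ones listed above rather than recommendations.

```python
from typing import Dict, List


def config_set_commands(settings: Dict[str, int]) -> List[str]:
    """Render one 'config set <name> <value>' line per setting, in insertion order."""
    return [f"config set {name} {value}" for name, value in settings.items()]


if __name__ == "__main__":
    merge_settings = {
        "merge-schedule-limit": 24,
        "max-merge-region-size": 20,
        "max-merge-region-keys": 200000,
    }
    # Each printed line can be typed into pd-ctl interactive mode as-is.
    for line in config_set_commands(merge_settings):
        print(line)
```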
  21. Control Scheduling Strategy
  • Move the leaders of all Regions on the given TiKV store to that store
    scheduler add grant-leader-scheduler <store_id>
  • Evict all Region leaders from the given TiKV store
    scheduler add evict-leader-scheduler <store_id>
  • Shuffle leaders and Regions when they are concentrated on a few stores
    scheduler add shuffle-leader-scheduler
    scheduler add shuffle-region-scheduler
  • Show all current schedulers
    scheduler show
  • Remove a scheduler
    scheduler remove grant-leader-scheduler-<store_id>
  • Common operations
    TransferLeader
    AddPeer / RemovePeer
    AddLearner / PromoteLearner
    SplitRegion
  22. Operation
  • Display operators
    operator show
    operator show admin
    operator show leader
    operator show region
  • Add / Remove a replica of the specific Region on the specific store
    operator add add-peer <region_id> <store_id>
    operator add remove-peer <region_id> <store_id>
  • Schedule the leader of the specific Region to the specific store
    operator add transfer-leader <region_id> <store_id>
  • Schedule the specific Region to the specific store
    operator add transfer-region <region_id> <store_id1> <store_id2> <store_id3>
  • Move the replica of the specific Region from one store to another store
    operator add transfer-peer <region_id> <orig_store_id> <target_store_id>
  • Merge the specific Region with another Region
    operator add merge-region <region_id1> <region_id2> (merges the two Regions)
  • Split one Region into two Regions in halves, based on estimated / accurate value
    operator add split-region <region_id> --policy=approximate (splits based on the estimated Region size)
    operator add split-region <region_id> --policy=scan (scans the actual keys and splits at the middle)
  • Remove the scheduling operation of the specific Region
    operator remove <region_id>
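
Because each operator takes a different number of IDs, a small lookup table can catch a wrong argument count before the command reaches PD. The sketch below encodes only the argument lists shown in the notes above; `OPERATOR_ARGS` and `operator_add` are hypothetical names, and the function only builds the command text.

```python
from typing import Dict, Tuple

# Operator name -> names of its positional arguments, as listed in the notes.
OPERATOR_ARGS: Dict[str, Tuple[str, ...]] = {
    "add-peer": ("region_id", "store_id"),
    "remove-peer": ("region_id", "store_id"),
    "transfer-leader": ("region_id", "store_id"),
    "transfer-peer": ("region_id", "orig_store_id", "target_store_id"),
    "merge-region": ("region_id1", "region_id2"),
}


def operator_add(name: str, *ids: int) -> str:
    """Build an 'operator add ...' line after checking the argument count."""
    if name not in OPERATOR_ARGS:
        raise ValueError(f"unknown operator: {name}")
    expected = OPERATOR_ARGS[name]
    if len(ids) != len(expected):
        raise ValueError(f"{name} expects {', '.join(expected)}")
    return f"operator add {name} " + " ".join(str(i) for i in ids)


if __name__ == "__main__":
    # Move the replica of Region 1 from store 2 to store 3.
    print(operator_add("transfer-peer", 1, 2, 3))
```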
  23. Store Limit Config
  • Store limit config limits the speed at which operators are consumed
    The default value is 15
    It limits the speed of two operations: adding learners/peers and deleting peers
    Store limit is a mapping in the memory, reset after the leader is switched or PD is restarted
  • Shows the speed limit of adding and deleting peers in all stores
    store limit
  • Shows the speed limit of adding peers in all stores
    store limit region-add / store limit add-peer (>= 4.0.2)
  • Shows the speed limit of deleting peers in all stores
    store limit region-remove / store limit remove-peer (>= 4.0.2)
  • Set the speed limit for a single store
    store limit <store_id> <limit> region-add / store limit <store_id> <limit> add-peer (>= 4.0.2)
    store limit <store_id> <limit> region-remove / store limit <store_id> <limit> remove-peer (>= 4.0.2)
  • Set the speed limit per minute for all stores
    store limit all <limit> region-add / store limit all <limit> add-peer (>= 4.0.2)
    store limit all <limit> region-remove / store limit all <limit> remove-peer (>= 4.0.2)
  • Set store-balance-rate to persist the modification
    config set store-balance-rate <value>
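
To get a feel for what a per-minute store limit means, the rough estimate below computes how long one store would need to work through a backlog of peer operations. It is only back-of-the-envelope arithmetic: it assumes every operation consumes exactly one unit of the limit, which ignores how PD actually weighs operations, and `minutes_to_drain` is a hypothetical name.

```python
import math


def minutes_to_drain(pending_peers: int, limit_per_minute: float = 15) -> int:
    """Estimate the whole minutes needed to process pending peer operations on one store."""
    if limit_per_minute <= 0:
        raise ValueError("limit_per_minute must be positive")
    if pending_peers < 0:
        raise ValueError("pending_peers must not be negative")
    return math.ceil(pending_peers / limit_per_minute)


if __name__ == "__main__":
    # Compare the default limit with a raised limit for the same backlog.
    print(minutes_to_drain(100), minutes_to_drain(100, 50))
```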