Course title: 2.5.3 Usage of PD Control (typical use cases of PD Control)
Duration:
50 minutes
Learning outcome:
Use the pd-ctl tool proficiently
Course content:
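1. Review of PD concepts
- PD stands for Placement Driver, the component responsible for scheduling
- PD Control
- When PD does not schedule the way you want, use PD Control to adjust the scheduling configuration and strategy
- Use it to retrieve the cluster information you need, including all information about PD members and TiKV stores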
- PD Control
- Launch method
tiup ctl pd -u http://<pd_ip>:<pd_port> [-i] (new version; -i enters interactive mode)
tidb-ansible/resources/bin/pd-ctl -u http://<pd_ip>:<pd_port> [-i] (old version, to be deprecated)
- Most useful parameters
--detach, -d Single-command mode (default mode)
--interact, -i Interactive mode
--pd, -u Specifies the PD address (default address: http://127.0.0.1:2379)
- Get PD information from the running processes
bin/tidb-server … --path=172.16.4.71:2379,172.16.4.66:2379,172.16.4.60:2379
bin/tikv-server … --pd 172.16.4.71:2379,172.16.4.66:2379,172.16.4.60:2379
tiup cluster display <cluster_name>
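The endpoint list found on the tikv-server command line can be turned into a pd-ctl invocation with a little string handling. The sketch below is a dry run: it only prints the command and never contacts a cluster. The function names `first_pd_url` and `pd_ctl_cmd` are hypothetical helpers made up for this example, and the `http://` scheme is an assumption (a TLS-enabled cluster would need `https://` plus certificates).

```shell
# Hypothetical helper: take the first endpoint of a comma-separated PD list
# (the value passed to tikv-server via --pd) and turn it into a URL for -u.
first_pd_url() {
  local endpoints="$1"
  local first="${endpoints%%,*}"   # keep everything before the first comma
  printf 'http://%s\n' "$first"
}

# Hypothetical helper: print (do not run) a single-command pd-ctl invocation.
pd_ctl_cmd() {
  local endpoints="$1"
  shift
  printf 'tiup ctl pd -u %s %s\n' "$(first_pd_url "$endpoints")" "$*"
}

pd_ctl_cmd "172.16.4.71:2379,172.16.4.66:2379,172.16.4.60:2379" member leader show
# prints: tiup ctl pd -u http://172.16.4.71:2379 member leader show
```

Any PD member in the list can serve the request, so picking the first endpoint is only a convenience.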
2. Getting PD information step by step
- Cluster info
cluster
- Member info
member
- Member leader info
member leader show
- Member health info
health
- Query TSO info
tso 34233247795216239
- Move leader away from the current member
member leader resign
- Migrate leader to a specified member
member leader transfer <pd_name>
- Set the priority to be elected as leader
member leader_priority <pd_name> <priority>
- Delete the specified member
member delete name <pd_name>
member delete id <pd_id> // delete members only with official support
- KV cluster info
store
- List all Regions info
region
- List the specific Region info
region <region_id>
- List all Regions of a specific store
region store <store_id>
- List store label (DC/Zone/Rack/Host) info
label
label store dc <dc_name>
- Set the weight of a specific store
store weight <store_id> <leader_weight> <region_weight>
- Set the label key-value pair of a specific store
store label 1 <key> <value> // tiup cluster edit-config is the recommended way to configure labels
- Delete the specific store
store delete <store_id> // delete stores only with official support
- Main functions of PD scheduling
- Leader balancing
- Balancing the number of Regions
- Hot-spot scheduling and leader eviction
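- Scheduler configuration
- Performance-related
Schedule Performance:
leader-schedule-limit: number of leader scheduling tasks generated at the same time
replica-schedule-limit: number of replica scheduling tasks running at the same time
hot-region-schedule-limit: number of hot Region scheduling tasks running at the same time
merge-schedule-limit: number of Region merge tasks running at the same time
max-snapshot-count: maximum number of snapshots a single TiKV store handles (sends or receives) at the same time, which caps its pending snapshots
tolerant-size-ratio: buffer for Region balance; tolerates a difference in Region size between stores so that less scheduling is triggered
Region Merge Limit
max-merge-region-keys: a Region with fewer keys than this can be merged (default 200000)
max-merge-region-size: a Region smaller than this size in MiB can be merged (default 20)
Rebalance Timer
max-store-down-time: default 30 minutes. If a store comes back within this time, it only has to catch up on the data written while it was down; after this time PD starts replenishing its replicas on other stores (setting it too large is not recommended)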
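The two Region Merge limits work together, which a small check can make concrete. This is a minimal sketch under stated assumptions: a Region counts as a merge candidate only when both its size and its key count are below the limits, the defaults are the values 20 (MiB) and 200000 keys used in this course's config examples, and the strict "less than" comparison follows the wording of these notes. `is_merge_candidate` is a hypothetical name, and PD applies further conditions that this sketch ignores.

```shell
# Hypothetical check: succeeds (exit status 0) when a Region is below BOTH
# merge thresholds. Arguments: size in MiB, key count, then optional
# overrides for max-merge-region-size and max-merge-region-keys.
is_merge_candidate() {
  local size_mib="$1"
  local keys="$2"
  local max_size="${3:-20}"        # assumed default, in MiB
  local max_keys="${4:-200000}"    # assumed default
  [ "$size_mib" -lt "$max_size" ] && [ "$keys" -lt "$max_keys" ]
}

if is_merge_candidate 5 1000; then
  echo "5 MiB / 1000 keys: merge candidate"
fi
if ! is_merge_candidate 5 300000; then
  echo "5 MiB / 300000 keys: too many keys, not merged"
fi
```

Raising either limit makes more Regions eligible, so the two values are usually changed together.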
- Modifying the configuration
- Modify the config
config set <name> <value>
- Modify these configs related to Region Merge
config set region-schedule-limit 28
config set replica-schedule-limit 32
config set merge-schedule-limit 24
config set tolerant-size-ratio 50
config set max-merge-region-size 20
config set max-merge-region-keys 200000
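When several settings change together, it can help to generate the single-command calls first and review them before running anything. The following is a dry-run sketch: `print_config_set` is a hypothetical helper, `PD_URL` is an assumed placeholder for a real PD address, and the function only prints commands.

```shell
# Assumed placeholder address; override PD_URL with a real PD endpoint.
PD_URL="${PD_URL:-http://127.0.0.1:2379}"

# Hypothetical dry-run helper: prints one single-command pd-ctl call per
# name=value argument, for example merge-schedule-limit=24.
print_config_set() {
  local pair
  for pair in "$@"; do
    # ${pair%%=*} is the text before the first "=", ${pair#*=} the text after it
    printf 'tiup ctl pd -u %s config set %s %s\n' "$PD_URL" "${pair%%=*}" "${pair#*=}"
  done
}

print_config_set merge-schedule-limit=24 max-merge-region-size=20 max-merge-region-keys=200000
```

The printed lines can be inspected and then pasted into a terminal, which keeps a record of exactly what was changed.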
- Control Scheduling Strategy
- Make the specified TiKV store the leader of every Region it holds
scheduler add grant-leader-scheduler <store_id>
- Evict all Region leaders from the specified TiKV store
scheduler add evict-leader-scheduler <store_id>
- Shuffle leaders and Regions when they are relatively concentrated
scheduler add shuffle-leader-scheduler
scheduler add shuffle-region-scheduler
- Show all current schedulers
scheduler show
- Remove a scheduler
scheduler remove grant-leader-scheduler-<store_id>
- Common operations
TransferLeader
AddPeer / RemovePeer
AddLearner / PromoteLearner
SplitRegion
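A typical use of the evict-leader scheduler is to drain leaders from a store before maintenance and restore normal scheduling afterwards. The sketch below only prints the pd-ctl subcommands for that sequence so it can be reviewed first; `evict_leader_cmds` is a hypothetical helper name. It also shows that a per-store scheduler is removed by a name carrying the store ID as a suffix.

```shell
# Hypothetical dry-run helper: prints the pd-ctl subcommands for draining
# leaders from a store and for undoing it afterwards. Nothing is executed.
evict_leader_cmds() {
  local store_id="$1"
  echo "scheduler add evict-leader-scheduler ${store_id}"
  echo "scheduler show"
  echo "scheduler remove evict-leader-scheduler-${store_id}"
}

evict_leader_cmds 4
```

Between the first and last command, `store` can be used to confirm that the leader count of the store has dropped before maintenance starts.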
- Operation
- Display operators
operator show
operator show admin
operator show leader
operator show region
- Add / Remove a replica of the specific Region on the specific store
operator add add-peer <region_id> <store_id>
operator add remove-peer <region_id> <store_id>
- Schedule the leader of the specific Region to the specific store
operator add transfer-leader <region_id> <store_id>
- Schedule the specific Region to the specific stores
operator add transfer-region <region_id> <store_id1> <store_id2> <store_id3>
- Schedule the replica of the specific Region on one store to another store
operator add transfer-peer <region_id> <orig_store_id> <target_store_id>
- Merge the specific Region with another Region
operator add merge-region <region_id1> <region_id2> (merges the two Regions)
- Split one Region into two Regions in halves, based on estimated / accurate value
operator add split-region <region_id> --policy=approximate (splits by estimated Region size)
operator add split-region <region_id> --policy=scan (scans the Region and splits at the row in the middle)
- Remove the scheduling operation of the specific Region
operator remove <region_id>
- Store Limit Config
- store limit config limits the speed at which operators are consumed
The default value is 15
It limits the speed of two operations: adding learners/peers and deleting peers
Store limit is an in-memory mapping; it is reset after the PD leader switches or PD restarts
- Show the speed limit of adding and deleting peers in all stores
store limit
- Show the speed limit of adding peers in all stores
store limit region-add / store limit add-peer (>= 4.0.2)
- Show the speed limit of deleting peers in all stores
store limit region-remove / store limit remove-peer (>= 4.0.2)
- Set the speed limit for a single store
store limit <store_id> <limit> region-add / store limit <store_id> <limit> add-peer (>= 4.0.2)
store limit <store_id> <limit> region-remove / store limit <store_id> <limit> remove-peer (>= 4.0.2)
- Set the speed limit per minute for all stores
store limit all <limit> region-add / store limit all <limit> add-peer (>= 4.0.2)
store limit all <limit> region-remove / store limit all <limit> remove-peer (>= 4.0.2)
- Set store-balance-rate to persist the modification
config set store-balance-rate <value>
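Because the store limit is expressed per minute, it gives a rough lower bound on how long a large replica movement takes. The sketch below does that arithmetic, assuming the limit is the only bottleneck (snapshot transfer, disk, and network are ignored) and using the default of 15 quoted above. `min_minutes_for_peers` is a hypothetical helper name.

```shell
# Rough lower bound: minutes needed to add or remove `peers` peers on one
# store when the store limit allows `limit` operations per minute.
# Integer division rounded up; 15 is the default quoted in this section.
min_minutes_for_peers() {
  local peers="$1"
  local limit="${2:-15}"
  echo $(( (peers + limit - 1) / limit ))
}

min_minutes_for_peers 300      # prints 20
min_minutes_for_peers 100 30   # prints 4
```

For example, moving 300 peers onto a new store at the default limit takes at least 20 minutes, so raising the limit temporarily is one way to speed up scaling out.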