TiKV rolling reload fails after modifying parameters

To help resolve the issue faster, please provide the following information; a clearly described problem gets answered more quickly:

[TiDB version] 4.0.10

[Problem description]
Modified the TiKV configuration as follows:
tiup cluster edit-config test-cluster

server_configs:
  tikv:
    # server.grpc-concurrency: 4
    # raftstore.apply-pool-size: 2
    raftstore.store-pool-size: 6
    raftstore.sync-log: false
    # rocksdb.max-sub-compactions: 1
    # storage.block-cache.capacity: "16GB"
    # readpool.unified.max-thread-count: 12
    # readpool.storage.use-unified-pool: false
    # readpool.coprocessor.use-unified-pool: true

tiup cluster reload test-cluster -R tikv
Starting component cluster: /home/tidb/.tiup/components/cluster/v1.3.2/tiup-cluster reload test-cluster -R tikv

  • [ Serial ] - SSHKeySet: privateKey=/home/tidb/.tiup/storage/cluster/clusters/test-cluster/ssh/id_rsa, publicKey=/home/tidb/.tiup/storage/cluster/clusters/test-cluster/ssh/id_rsa.pub
  • [Parallel] - UserSSH: user=tidb, host=172.31.35.127
  • [Parallel] - UserSSH: user=tidb, host=172.31.42.205
  • [Parallel] - UserSSH: user=tidb, host=172.31.35.127
  • [Parallel] - UserSSH: user=tidb, host=172.31.43.8
  • [Parallel] - UserSSH: user=tidb, host=172.31.42.165
  • [Parallel] - UserSSH: user=tidb, host=172.31.41.225
  • [Parallel] - UserSSH: user=tidb, host=172.31.38.159
  • [Parallel] - UserSSH: user=tidb, host=172.31.42.249
  • [Parallel] - UserSSH: user=tidb, host=172.31.42.102
  • [Parallel] - UserSSH: user=tidb, host=172.31.35.127
  • [Parallel] - UserSSH: user=tidb, host=172.31.46.18
  • [Parallel] - UserSSH: user=tidb, host=172.31.43.197
  • [Parallel] - UserSSH: user=tidb, host=172.31.38.13
  • [Parallel] - UserSSH: user=tidb, host=172.31.35.127
  • [ Serial ] - UpdateTopology: cluster=test-cluster
  • Refresh instance configs
    • Regenerate config pd -> 172.31.43.8:2379 … Done
    • Regenerate config pd -> 172.31.42.205:2379 … Done
    • Regenerate config pd -> 172.31.35.127:2379 … Done
    • Regenerate config tikv -> 172.31.42.165:20160 … Done
    • Regenerate config tikv -> 172.31.42.249:20160 … Done
    • Regenerate config tikv -> 172.31.43.197:20160 … Done
    • Regenerate config tikv -> 172.31.46.18:20160 … Done
    • Regenerate config tikv -> 172.31.38.13:20160 … Done
    • Regenerate config tidb -> 172.31.41.225:4000 … Done
    • Regenerate config tidb -> 172.31.38.159:4000 … Done
    • Regenerate config tidb -> 172.31.42.102:4000 … Done
    • Regenerate config prometheus -> 172.31.35.127:9090 … Done
    • Regenerate config grafana -> 172.31.35.127:3000 … Done
    • Regenerate config alertmanager -> 172.31.35.127:9093 … Done
  • Refresh monitor configs
    • Refresh config node_exporter -> 172.31.42.205 … Error
    • Refresh config node_exporter -> 172.31.42.165 … Error
    • Refresh config node_exporter -> 172.31.43.197 … Error
    • Refresh config node_exporter -> 172.31.46.18 … Error
    • Refresh config node_exporter -> 172.31.42.102 … Error
    • Refresh config node_exporter -> 172.31.43.8 … Error
    • Refresh config node_exporter -> 172.31.35.127 … Error
    • Refresh config node_exporter -> 172.31.42.249 … Error
    • Refresh config node_exporter -> 172.31.38.13 … Error
    • Refresh config node_exporter -> 172.31.41.225 … Error
    • Refresh config node_exporter -> 172.31.38.159 … Error
    • Refresh config blackbox_exporter -> 172.31.46.18 … Error
    • Refresh config blackbox_exporter -> 172.31.42.102 … Error
    • Refresh config blackbox_exporter -> 172.31.42.205 … Error
    • Refresh config blackbox_exporter -> 172.31.42.165 … Error
    • Refresh config blackbox_exporter -> 172.31.43.197 … Error
    • Refresh config blackbox_exporter -> 172.31.38.13 … Error
    • Refresh config blackbox_exporter -> 172.31.41.225 … Error
    • Refresh config blackbox_exporter -> 172.31.38.159 … Error
    • Refresh config blackbox_exporter -> 172.31.43.8 … Error
    • Refresh config blackbox_exporter -> 172.31.35.127 … Error
    • Refresh config blackbox_exporter -> 172.31.42.249 … Error

Error: failed to scp /home/tidb/.tiup/storage/cluster/clusters/test-cluster/config-cache/blackbox_172.31.42.102.yaml to tidb@172.31.42.102:/home/tidb/deploy/monitor-9100/conf/blackbox.yml: Process exited with status 1

Verbose debug logs has been written to /home/tidb/.tiup/logs/tiup-cluster-debug-2021-02-20-06-48-43.log.
Error: run /home/tidb/.tiup/components/cluster/v1.3.2/tiup-cluster (wd:/home/tidb/.tiup/data/SPXPupT) failed: exit status 1


For performance tuning or troubleshooting questions, please download and run the diagnostic script, and be sure to select all of the terminal output and paste it when uploading.

  1. Was this cluster imported into TiUP (tiup cluster import)?
  2. Do you get the same error if you run the reload again?

You can take a look at the contents of the log file /home/tidb/.tiup/logs/tiup-cluster-debug-2021-02-20-06-48-43.log.
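
A minimal way to dig deeper, assuming SSH access as the tidb user (the grep pattern and the manual checks below are illustrative sketches, not commands from the original thread):

# look for the scp failure details in the tiup debug log
grep -n -i "scp" /home/tidb/.tiup/logs/tiup-cluster-debug-2021-02-20-06-48-43.log | tail -n 20

# check whether the destination directory from the error message actually exists on the node
ssh tidb@172.31.42.102 'ls -ld /home/tidb/deploy/monitor-9100/conf'

# try copying the generated config by hand to reproduce the failure
scp /home/tidb/.tiup/storage/cluster/clusters/test-cluster/config-cache/blackbox_172.31.42.102.yaml \
    tidb@172.31.42.102:/home/tidb/deploy/monitor-9100/conf/blackbox.yml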

Solved. The cluster was originally deployed with TiDB Ansible.
vim /home/tidb/.tiup/storage/cluster/clusters/test-cluster/meta.yaml   (changed the monitor directories to the correct paths)

monitored:
  node_exporter_port: 9100
  blackbox_exporter_port: 9115
  # deploy_dir: /home/tidb/deploy/monitor-9100
  deploy_dir: /data/deploy
  # data_dir: /home/tidb/deploy/monitor-9100/data/monitor-9100
  data_dir: /data/deploy/data/monitor-9100
  # log_dir: /home/tidb/deploy/monitor-9100/deploy/monitor-9100/log
  log_dir: /data/deploy/deploy/monitor-9100/log
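
A quick sanity check before re-running the reload (a sketch; it assumes the Ansible-era monitor agents really live under /data/deploy on each node, as the corrected meta.yaml says):

# confirm the corrected deploy directory exists on one of the nodes that failed before
ssh tidb@172.31.42.102 'ls -ld /data/deploy /data/deploy/conf'

# then re-run the reload that failed earlier
tiup cluster reload test-cluster -R tikv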

:+1::+1::+1:

Hi, with the TiKV config raftstore.store-pool-size: 6, the monitoring shows these threads' CPU usage is very high (as in the screenshot). First I'd like to understand: how high can the raftstore.store-pool-size parameter be raised?

In principle there is no hard limit; it should be set according to the machine's CPU core count. Setting it to 6 means the Raft store can use at most 6 cores, so on the TiKV-Details -> Thread CPU -> Raft store CPU panel the upper bound would be 600%. In practice it will not reach 600%, because raftstore is not purely CPU work and also includes some IO. As a rule of thumb, once Raft store CPU reaches about 75% of store-pool-size × 100%, the Raft store can be considered saturated and you can consider adjusting the parameter.
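
As a concrete illustration of that rule of thumb, applied to the setting in this thread (just the arithmetic, not new tuning advice):

With raftstore.store-pool-size: 6
  Raft store CPU panel ceiling ≈ 6 × 100% = 600%
  saturation threshold         ≈ 600% × 75% = 450%

So if the Raft store CPU panel consistently sits above roughly 450%, the Raft store can be considered saturated.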

On a 32-core machine the overall CPU usage is only about 25%, but with store-pool-size: 6 the Raft store CPU is already saturated. I'll raise it to 10 and see.

Sure, you can adjust it and see how it goes.
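
For reference, the change would follow the same edit-config / reload flow used earlier in this thread; store-pool-size: 10 below is simply the value the poster proposed, not a recommendation:

tiup cluster edit-config test-cluster

server_configs:
  tikv:
    raftstore.store-pool-size: 10   # raised from 6 as discussed above

tiup cluster reload test-cluster -R tikv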