tiup upgrade from v4.0.8 to v4.0.9 fails with an error

To get the issue resolved faster, please provide the following information; a clear problem description helps:

[TiDB version]
v4.0.8
[Problem description]

[tidb@data11 ~]$ tiup cluster display jiuji-tidb-cluster-v2
Starting component `cluster`: /home/tidb/.tiup/components/cluster/v1.3.2/tiup-cluster display jiuji-tidb-cluster-v2
Cluster type:       tidb
Cluster name:       jiuji-tidb-cluster-v2
Cluster version:    v4.0.8
SSH type:           builtin
Dashboard URL:      http://192.168.254.32:9379/dashboard
ID                   Role          Host            Ports                            OS/Arch       Status  Data Dir                              Deploy Dir
--                   ----          ----            -----                            -------       ------  --------                              ----------
192.168.254.12:9095  alertmanager  192.168.254.12  9095/9096                        linux/x86_64  Up      /data/tidb-data-v2/alertmanager-9095  /data/tidb-deploy-v2/alertmanager-9095
192.168.254.12:3030  grafana       192.168.254.12  3030                             linux/x86_64  Up      -                                     /data/tidb-deploy-v2/grafana-3030
192.168.254.13:9379  pd            192.168.254.13  9379/9380                        linux/x86_64  Up      /data/tidb-data-v2/pd-9379            /data/tidb-deploy-v2/pd-9379
192.168.254.31:9379  pd            192.168.254.31  9379/9380                        linux/x86_64  Up|L    /data/tidb-data-v2/pd-9379            /data/tidb-deploy-v2/pd-9379
192.168.254.32:9379  pd            192.168.254.32  9379/9380                        linux/x86_64  Up|UI   /data/tidb-data-v2/pd-9379            /data/tidb-deploy-v2/pd-9379
192.168.254.12:9080  prometheus    192.168.254.12  9080                             linux/x86_64  Up      /data/tidb-data-v2/prometheus-9080    /data/tidb-deploy-v2/prometheus-9080
192.168.254.13:9383  tidb          192.168.254.13  9383/9384                        linux/x86_64  Up      -                                     /data/tidb-deploy-v2/tidb-9383
192.168.254.31:9383  tidb          192.168.254.31  9383/9384                        linux/x86_64  Up      -                                     /data/tidb-deploy-v2/tidb-9383
192.168.254.32:9383  tidb          192.168.254.32  9383/9384                        linux/x86_64  Up      -                                     /data/tidb-deploy-v2/tidb-9383
192.168.254.12:8000  tiflash       192.168.254.12  8000/7123/2930/20270/20192/7234  linux/x86_64  Up      /data/tidb-data-v2/tiflash-8000       /data/tidb-deploy-v2/tiflash-8000
192.168.254.13:9385  tikv          192.168.254.13  9385/9386                        linux/x86_64  Up      /data/tidb-data-v2/tikv-9385          /data/tidb-deploy-v2/tikv-9385
192.168.254.31:9385  tikv          192.168.254.31  9385/9386                        linux/x86_64  Up      /data/tidb-data-v2/tikv-9385          /data/tidb-deploy-v2/tikv-9385
192.168.254.32:9385  tikv          192.168.254.32  9385/9386                        linux/x86_64  Up      /data/tidb-data-v2/tikv-9385          /data/tidb-deploy-v2/tikv-9385
Total nodes: 13


[tidb@data11 ~]$ tiup update --self
download https://tiup-mirrors.pingcap.com/tiup-v1.3.2-linux-amd64.tar.gz 8.49 MiB / 8.49 MiB 100.00% 2.25 GiB p/s                                                  
Updated successfully!
[tidb@data11 ~]$ tiup update cluster
component cluster version v1.3.2 is already installed
Updated successfully!

Running

tiup cluster upgrade jiuji-tidb-cluster-v2 v4.0.9

produces the error log below (upgrading straight to v4.0.10 fails with the same error):

+ [ Serial ] - InitConfig: cluster=jiuji-tidb-cluster-v2, user=tidb, host=192.168.254.12, path=/home/tidb/.tiup/storage/cluster/clusters/jiuji-tidb-cluster-v2/config-cache/grafana-3030.service, deploy_dir=/data/tidb-deploy-v2/grafana-3030, data_dir=[], log_dir=/data/tidb-deploy-v2/grafana-3030/log, cache_dir=/home/tidb/.tiup/storage/cluster/clusters/jiuji-tidb-cluster-v2/config-cache
+ [ Serial ] - InitConfig: cluster=jiuji-tidb-cluster-v2, user=tidb, host=192.168.254.31, path=/home/tidb/.tiup/storage/cluster/clusters/jiuji-tidb-cluster-v2/config-cache/tikv-9385.service, deploy_dir=/data/tidb-deploy-v2/tikv-9385, data_dir=[/data/tidb-data-v2/tikv-9385], log_dir=/data/tidb-deploy-v2/tikv-9385/log, cache_dir=/home/tidb/.tiup/storage/cluster/clusters/jiuji-tidb-cluster-v2/config-cache
+ [ Serial ] - InitConfig: cluster=jiuji-tidb-cluster-v2, user=tidb, host=192.168.254.32, path=/home/tidb/.tiup/storage/cluster/clusters/jiuji-tidb-cluster-v2/config-cache/tikv-9385.service, deploy_dir=/data/tidb-deploy-v2/tikv-9385, data_dir=[/data/tidb-data-v2/tikv-9385], log_dir=/data/tidb-deploy-v2/tikv-9385/log, cache_dir=/home/tidb/.tiup/storage/cluster/clusters/jiuji-tidb-cluster-v2/config-cache
+ [ Serial ] - InitConfig: cluster=jiuji-tidb-cluster-v2, user=tidb, host=192.168.254.12, path=/home/tidb/.tiup/storage/cluster/clusters/jiuji-tidb-cluster-v2/config-cache/tiflash-8000.service, deploy_dir=/data/tidb-deploy-v2/tiflash-8000, data_dir=[/data/tidb-data-v2/tiflash-8000], log_dir=/data/tidb-deploy-v2/tiflash-8000/log, cache_dir=/home/tidb/.tiup/storage/cluster/clusters/jiuji-tidb-cluster-v2/config-cache

Error: init config failed: 192.168.254.31:9385: executor.ssh.execute_failed: Failed to execute command over SSH for 'tidb@192.168.254.31:22' {ssh_stderr: invalid configuration: default rocksdb not exist, buf raftdb exist
, ssh_stdout: , ssh_command: export LANG=C; PATH=$PATH:/usr/bin:/usr/sbin /data/tidb-deploy-v2/tikv-9385/bin/tikv-server --config-check --config=/data/tidb-deploy-v2/tikv-9385/conf/tikv.toml --pd=""}, cause: Process exited with status 1: check config failed

Verbose debug logs has been written to /home/tidb/.tiup/logs/tiup-cluster-debug-2021-02-02-01-41-00.log.
Error: run `/home/tidb/.tiup/components/cluster/v1.3.2/tiup-cluster` (wd:/home/tidb/.tiup/data/SNmy76V) failed: exit status 1
2021-02-02T01:40:59.859+0800	DEBUG	TaskFinish	{"task": "BackupComponent: component=tiflash, currentVersion=v4.0.8, remote=192.168.254.12:/data/tidb-deploy-v2/tiflash-8000\
CopyComponent: component=tiflash, version=v4.0.9, remote=192.168.254.12:/data/tidb-deploy-v2/tiflash-8000 os=linux, arch=amd64\
InitConfig: cluster=jiuji-tidb-cluster-v2, user=tidb, host=192.168.254.12, path=/home/tidb/.tiup/storage/cluster/clusters/jiuji-tidb-cluster-v2/config-cache/tiflash-8000.service, deploy_dir=/data/tidb-deploy-v2/tiflash-8000, data_dir=[/data/tidb-data-v2/tiflash-8000], log_dir=/data/tidb-deploy-v2/tiflash-8000/log, cache_dir=/home/tidb/.tiup/storage/cluster/clusters/jiuji-tidb-cluster-v2/config-cache\
BackupComponent: component=pd, currentVersion=v4.0.8, remote=192.168.254.13:/data/tidb-deploy-v2/pd-9379\
CopyComponent: component=pd, version=v4.0.9, remote=192.168.254.13:/data/tidb-deploy-v2/pd-9379 os=linux, arch=amd64\
InitConfig: cluster=jiuji-tidb-cluster-v2, user=tidb, host=192.168.254.13, path=/home/tidb/.tiup/storage/cluster/clusters/jiuji-tidb-cluster-v2/config-cache/pd-9379.service, deploy_dir=/data/tidb-deploy-v2/pd-9379, data_dir=[/data/tidb-data-v2/pd-9379], log_dir=/data/tidb-deploy-v2/pd-9379/log, cache_dir=/home/tidb/.tiup/storage/cluster/clusters/jiuji-tidb-cluster-v2/config-cache\
BackupComponent: component=pd, currentVersion=v4.0.8, remote=192.168.254.31:/data/tidb-deploy-v2/pd-9379\
CopyComponent: component=pd, version=v4.0.9, remote=192.168.254.31:/data/tidb-deploy-v2/pd-9379 os=linux, arch=amd64\
InitConfig: cluster=jiuji-tidb-cluster-v2, user=tidb, host=192.168.254.31, path=/home/tidb/.tiup/storage/cluster/clusters/jiuji-tidb-cluster-v2/config-cache/pd-9379.service, deploy_dir=/data/tidb-deploy-v2/pd-9379, data_dir=[/data/tidb-data-v2/pd-9379], log_dir=/data/tidb-deploy-v2/pd-9379/log, cache_dir=/home/tidb/.tiup/storage/cluster/clusters/jiuji-tidb-cluster-v2/config-cache\
BackupComponent: component=pd, currentVersion=v4.0.8, remote=192.168.254.32:/data/tidb-deploy-v2/pd-9379\
CopyComponent: component=pd, version=v4.0.9, remote=192.168.254.32:/data/tidb-deploy-v2/pd-9379 os=linux, arch=amd64\
InitConfig: cluster=jiuji-tidb-cluster-v2, user=tidb, host=192.168.254.32, path=/home/tidb/.tiup/storage/cluster/clusters/jiuji-tidb-cluster-v2/config-cache/pd-9379.service, deploy_dir=/data/tidb-deploy-v2/pd-9379, data_dir=[/data/tidb-data-v2/pd-9379], log_dir=/data/tidb-deploy-v2/pd-9379/log, cache_dir=/home/tidb/.tiup/storage/cluster/clusters/jiuji-tidb-cluster-v2/config-cache\
BackupComponent: component=tikv, currentVersion=v4.0.8, remote=192.168.254.13:/data/tidb-deploy-v2/tikv-9385\
CopyComponent: component=tikv, version=v4.0.9, remote=192.168.254.13:/data/tidb-deploy-v2/tikv-9385 os=linux, arch=amd64\
InitConfig: cluster=jiuji-tidb-cluster-v2, user=tidb, host=192.168.254.13, path=/home/tidb/.tiup/storage/cluster/clusters/jiuji-tidb-cluster-v2/config-cache/tikv-9385.service, deploy_dir=/data/tidb-deploy-v2/tikv-9385, data_dir=[/data/tidb-data-v2/tikv-9385], log_dir=/data/tidb-deploy-v2/tikv-9385/log, cache_dir=/home/tidb/.tiup/storage/cluster/clusters/jiuji-tidb-cluster-v2/config-cache\
BackupComponent: component=tikv, currentVersion=v4.0.8, remote=192.168.254.31:/data/tidb-deploy-v2/tikv-9385\
CopyComponent: component=tikv, version=v4.0.9, remote=192.168.254.31:/data/tidb-deploy-v2/tikv-9385 os=linux, arch=amd64\
InitConfig: cluster=jiuji-tidb-cluster-v2, user=tidb, host=192.168.254.31, path=/home/tidb/.tiup/storage/cluster/clusters/jiuji-tidb-cluster-v2/config-cache/tikv-9385.service, deploy_dir=/data/tidb-deploy-v2/tikv-9385, data_dir=[/data/tidb-data-v2/tikv-9385], log_dir=/data/tidb-deploy-v2/tikv-9385/log, cache_dir=/home/tidb/.tiup/storage/cluster/clusters/jiuji-tidb-cluster-v2/config-cache\
BackupComponent: component=tikv, currentVersion=v4.0.8, remote=192.168.254.32:/data/tidb-deploy-v2/tikv-9385\
CopyComponent: component=tikv, version=v4.0.9, remote=192.168.254.32:/data/tidb-deploy-v2/tikv-9385 os=linux, arch=amd64\
InitConfig: cluster=jiuji-tidb-cluster-v2, user=tidb, host=192.168.254.32, path=/home/tidb/.tiup/storage/cluster/clusters/jiuji-tidb-cluster-v2/config-cache/tikv-9385.service, deploy_dir=/data/tidb-deploy-v2/tikv-9385, data_dir=[/data/tidb-data-v2/tikv-9385], log_dir=/data/tidb-deploy-v2/tikv-9385/log, cache_dir=/home/tidb/.tiup/storage/cluster/clusters/jiuji-tidb-cluster-v2/config-cache\
BackupComponent: component=tidb, currentVersion=v4.0.8, remote=192.168.254.13:/data/tidb-deploy-v2/tidb-9383\
CopyComponent: component=tidb, version=v4.0.9, remote=192.168.254.13:/data/tidb-deploy-v2/tidb-9383 os=linux, arch=amd64\
InitConfig: cluster=jiuji-tidb-cluster-v2, user=tidb, host=192.168.254.13, path=/home/tidb/.tiup/storage/cluster/clusters/jiuji-tidb-cluster-v2/config-cache/tidb-9383.service, deploy_dir=/data/tidb-deploy-v2/tidb-9383, data_dir=[], log_dir=/data/tidb-deploy-v2/tidb-9383/log, cache_dir=/home/tidb/.tiup/storage/cluster/clusters/jiuji-tidb-cluster-v2/config-cache\
BackupComponent: component=tidb, currentVersion=v4.0.8, remote=192.168.254.31:/data/tidb-deploy-v2/tidb-9383\
CopyComponent: component=tidb, version=v4.0.9, remote=192.168.254.31:/data/tidb-deploy-v2/tidb-9383 os=linux, arch=amd64\
InitConfig: cluster=jiuji-tidb-cluster-v2, user=tidb, host=192.168.254.31, path=/home/tidb/.tiup/storage/cluster/clusters/jiuji-tidb-cluster-v2/config-cache/tidb-9383.service, deploy_dir=/data/tidb-deploy-v2/tidb-9383, data_dir=[], log_dir=/data/tidb-deploy-v2/tidb-9383/log, cache_dir=/home/tidb/.tiup/storage/cluster/clusters/jiuji-tidb-cluster-v2/config-cache\
BackupComponent: component=tidb, currentVersion=v4.0.8, remote=192.168.254.32:/data/tidb-deploy-v2/tidb-9383\
CopyComponent: component=tidb, version=v4.0.9, remote=192.168.254.32:/data/tidb-deploy-v2/tidb-9383 os=linux, arch=amd64\
InitConfig: cluster=jiuji-tidb-cluster-v2, user=tidb, host=192.168.254.32, path=/home/tidb/.tiup/storage/cluster/clusters/jiuji-tidb-cluster-v2/config-cache/tidb-9383.service, deploy_dir=/data/tidb-deploy-v2/tidb-9383, data_dir=[], log_dir=/data/tidb-deploy-v2/tidb-9383/log, cache_dir=/home/tidb/.tiup/storage/cluster/clusters/jiuji-tidb-cluster-v2/config-cache\
BackupComponent: component=prometheus, currentVersion=v4.0.8, remote=192.168.254.12:/data/tidb-deploy-v2/prometheus-9080\
CopyComponent: component=prometheus, version=v4.0.9, remote=192.168.254.12:/data/tidb-deploy-v2/prometheus-9080 os=linux, arch=amd64\
InitConfig: cluster=jiuji-tidb-cluster-v2, user=tidb, host=192.168.254.12, path=/home/tidb/.tiup/storage/cluster/clusters/jiuji-tidb-cluster-v2/config-cache/prometheus-9080.service, deploy_dir=/data/tidb-deploy-v2/prometheus-9080, data_dir=[/data/tidb-data-v2/prometheus-9080], log_dir=/data/tidb-deploy-v2/prometheus-9080/log, cache_dir=/home/tidb/.tiup/storage/cluster/clusters/jiuji-tidb-cluster-v2/config-cache\
BackupComponent: component=grafana, currentVersion=v4.0.8, remote=192.168.254.12:/data/tidb-deploy-v2/grafana-3030\
CopyComponent: component=grafana, version=v4.0.9, remote=192.168.254.12:/data/tidb-deploy-v2/grafana-3030 os=linux, arch=amd64\
InitConfig: cluster=jiuji-tidb-cluster-v2, user=tidb, host=192.168.254.12, path=/home/tidb/.tiup/storage/cluster/clusters/jiuji-tidb-cluster-v2/config-cache/grafana-3030.service, deploy_dir=/data/tidb-deploy-v2/grafana-3030, data_dir=[], log_dir=/data/tidb-deploy-v2/grafana-3030/log, cache_dir=/home/tidb/.tiup/storage/cluster/clusters/jiuji-tidb-cluster-v2/config-cache\
BackupComponent: component=alertmanager, currentVersion=v4.0.8, remote=192.168.254.12:/data/tidb-deploy-v2/alertmanager-9095\
CopyComponent: component=alertmanager, version=v0.17.0, remote=192.168.254.12:/data/tidb-deploy-v2/alertmanager-9095 os=linux, arch=amd64\
InitConfig: cluster=jiuji-tidb-cluster-v2, user=tidb, host=192.168.254.12, path=/home/tidb/.tiup/storage/cluster/clusters/jiuji-tidb-cluster-v2/config-cache/alertmanager-9095.service, deploy_dir=/data/tidb-deploy-v2/alertmanager-9095, data_dir=[/data/tidb-data-v2/alertmanager-9095], log_dir=/data/tidb-deploy-v2/alertmanager-9095/log, cache_dir=/home/tidb/.tiup/storage/cluster/clusters/jiuji-tidb-cluster-v2/config-cache", "error": "init config failed: 192.168.254.31:9385: executor.ssh.execute_failed: Failed to execute command over SSH for 'tidb@192.168.254.31:22' {ssh_stderr: invalid configuration: default rocksdb not exist, buf raftdb exist\
, ssh_stdout: , ssh_command: export LANG=C; PATH=$PATH:/usr/bin:/usr/sbin /data/tidb-deploy-v2/tikv-9385/bin/tikv-server --config-check --config=/data/tidb-deploy-v2/tikv-9385/conf/tikv.toml --pd=\"\"}, cause: Process exited with status 1: check config failed", "errorVerbose": "check config failed\
executor.ssh.execute_failed: Failed to execute command over SSH for 'tidb@192.168.254.31:22' {ssh_stderr: invalid configuration: default rocksdb not exist, buf raftdb exist\
, ssh_stdout: , ssh_command: export LANG=C; PATH=$PATH:/usr/bin:/usr/sbin /data/tidb-deploy-v2/tikv-9385/bin/tikv-server --config-check --config=/data/tidb-deploy-v2/tikv-9385/conf/tikv.toml --pd=\"\"}, cause: Process exited with status 1\
github.com/pingcap/tiup/pkg/cluster/spec.checkConfig\
\tgithub.com/pingcap/tiup@/pkg/cluster/spec/server_config.go:268\
github.com/pingcap/tiup/pkg/cluster/spec.(*TiKVInstance).InitConfig\
\tgithub.com/pingcap/tiup@/pkg/cluster/spec/tikv.go:272\
github.com/pingcap/tiup/pkg/cluster/task.(*InitConfig).Execute\
\tgithub.com/pingcap/tiup@/pkg/cluster/task/init_config.go:49\
github.com/pingcap/tiup/pkg/cluster/task.(*Serial).Execute\
\tgithub.com/pingcap/tiup@/pkg/cluster/task/task.go:196\
github.com/pingcap/tiup/pkg/cluster/task.(*Parallel).Execute.func1\
\tgithub.com/pingcap/tiup@/pkg/cluster/task/task.go:241\
runtime.goexit\
\truntime/asm_amd64.s:1357\
init config failed: 192.168.254.31:9385"}
2021-02-02T01:40:59.860+0800	INFO	Execute command finished	{"code": 1, "error": "init config failed: 192.168.254.31:9385: executor.ssh.execute_failed: Failed to execute command over SSH for 'tidb@192.168.254.31:22' {ssh_stderr: invalid configuration: default rocksdb not exist, buf raftdb exist\
, ssh_stdout: , ssh_command: export LANG=C; PATH=$PATH:/usr/bin:/usr/sbin /data/tidb-deploy-v2/tikv-9385/bin/tikv-server --config-check --config=/data/tidb-deploy-v2/tikv-9385/conf/tikv.toml --pd=\"\"}, cause: Process exited with status 1: check config failed", "errorVerbose": "check config failed\
executor.ssh.execute_failed: Failed to execute command over SSH for 'tidb@192.168.254.31:22' {ssh_stderr: invalid configuration: default rocksdb not exist, buf raftdb exist\
, ssh_stdout: , ssh_command: export LANG=C; PATH=$PATH:/usr/bin:/usr/sbin /data/tidb-deploy-v2/tikv-9385/bin/tikv-server --config-check --config=/data/tidb-deploy-v2/tikv-9385/conf/tikv.toml --pd=\"\"}, cause: Process exited with status 1\
github.com/pingcap/tiup/pkg/cluster/spec.checkConfig\
\tgithub.com/pingcap/tiup@/pkg/cluster/spec/server_config.go:268\
github.com/pingcap/tiup/pkg/cluster/spec.(*TiKVInstance).InitConfig\
\tgithub.com/pingcap/tiup@/pkg/cluster/spec/tikv.go:272\
github.com/pingcap/tiup/pkg/cluster/task.(*InitConfig).Execute\
\tgithub.com/pingcap/tiup@/pkg/cluster/task/init_config.go:49\
github.com/pingcap/tiup/pkg/cluster/task.(*Serial).Execute\
\tgithub.com/pingcap/tiup@/pkg/cluster/task/task.go:196\
github.com/pingcap/tiup/pkg/cluster/task.(*Parallel).Execute.func1\
\tgithub.com/pingcap/tiup@/pkg/cluster/task/task.go:241\
runtime.goexit\
\truntime/asm_amd64.s:1357\
init config failed: 192.168.254.31:9385"}

The log above reports the following error:

invalid configuration: default rocksdb not exist, buf raftdb exist

1. Please check what error the TiKV instance at 192.168.254.31:9385 reports in its own log.
2. Please also share the cluster's topology file so we can check whether any configuration item is set incorrectly. (One way to collect both is sketched below.)
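A rough sketch for collecting both, assuming the tidb user, host, and deploy paths shown in the display output above (adjust if yours differ; the meta.yaml path is assumed to be tiup's on-disk copy of the topology, next to the config-cache directory seen in the log):

# 1. Look for errors in the TiKV log on 192.168.254.31
ssh tidb@192.168.254.31 'grep -E "ERROR|FATAL" /data/tidb-deploy-v2/tikv-9385/log/tikv.log | tail -n 100'
ssh tidb@192.168.254.31 'tail -n 50 /data/tidb-deploy-v2/tikv-9385/log/tikv_stderr.log'

# 2. Dump the topology tiup actually uses for this cluster (assumed location)
cat /home/tidb/.tiup/storage/cluster/clusters/jiuji-tidb-cluster-v2/meta.yaml
# tiup cluster edit-config jiuji-tidb-cluster-v2 shows the same content interactively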

Topology file

[tidb@data11 ~]$ cat topology_v2.yaml
# # Global variables are applied to all deployments and used as the default value of
# # the deployments if a specific deployment value is missing.
global:
  user: "tidb"
  ssh_port: 22
  deploy_dir: "/data/tidb-deploy-v2"
  data_dir: "/data/tidb-data-v2"

# # Monitored variables are applied to all the machines.
monitored:
  node_exporter_port: 9105
  blackbox_exporter_port: 9120
  # deploy_dir: "/tidb-deploy/monitored-9100"
  # data_dir: "/tidb-data/monitored-9100"
  # log_dir: "/tidb-deploy/monitored-9100/log"

# # Server configs are used to specify the runtime configuration of TiDB components.
# # All configuration items can be found in TiDB docs:
# # - TiDB: https://pingcap.com/docs/stable/reference/configuration/tidb-server/configuration-file/
# # - TiKV: https://pingcap.com/docs/stable/reference/configuration/tikv-server/configuration-file/
# # - PD: https://pingcap.com/docs/stable/reference/configuration/pd-server/configuration-file/
# # All configuration items use points to represent the hierarchy, e.g:
# #   readpool.storage.use-unified-pool
# #      
# # You can overwrite this configuration via the instance-level `config` field.

server_configs:
  tidb:
    log.slow-threshold: 300
    binlog.enable: false
    binlog.ignore-error: false
    # Works around a compatibility issue when upgrading from v3.0.7 and earlier
    # (to tolerate over-long composite indexes, the original limit of 3072 is raised to 4x here)
    #max-index-length: 12288
    # Enable case-insensitive collation support; only takes effect at cluster bootstrap, default false
    new_collations_enabled_on_first_bootstrap: true
  tikv:
    # server.grpc-concurrency: 4
    # raftstore.apply-pool-size: 2
    # raftstore.store-pool-size: 2
    # rocksdb.max-sub-compactions: 1
    # storage.block-cache.capacity: "16GB"
    # readpool.unified.max-thread-count: 12
    storage.block-cache.capacity: 20GB
    readpool.storage.use-unified-pool: false
    readpool.coprocessor.use-unified-pool: true
  pd:
    schedule.leader-schedule-limit: 4
    schedule.region-schedule-limit: 2048
    schedule.replica-schedule-limit: 64
    replication.enable-placement-rules: true
  tiflash:
    logger.level: "info"
  # pump:
  #   gc: 7

pd_servers:
  - host: 192.168.254.13
    # ssh_port: 22
    # name: "pd-1"
    client_port: 9379
    peer_port: 9380
    # deploy_dir: "/tidb-deploy/pd-2379"
    # data_dir: "/tidb-data/pd-2379"
    # log_dir: "/tidb-deploy/pd-2379/log"
    # numa_node: "0,1"
    # # The following configs are used to overwrite the `server_configs.pd` values.
    # config:
    #   schedule.max-merge-region-size: 20
    #   schedule.max-merge-region-keys: 200000
  - host: 192.168.254.31
    client_port: 9379
    peer_port: 9380
  - host: 192.168.254.32
    client_port: 9379
    peer_port: 9380


tidb_servers:
  - host: 192.168.254.13
    # ssh_port: 22
    port: 9383
    status_port: 9384
    # deploy_dir: "/tidb-deploy/tidb-4000"
    # log_dir: "/tidb-deploy/tidb-4000/log"
    # numa_node: "0,1"
    # # The following configs are used to overwrite the `server_configs.tidb` values.
    # config:
    #   log.slow-query-file: tidb-slow-overwrited.log
  - host: 192.168.254.31
    port: 9383
    status_port: 9384
  - host: 192.168.254.32
    port: 9383
    status_port: 9384

tikv_servers:
  - host: 192.168.254.13
    # ssh_port: 22
    port: 9385
    status_port: 9386
    # deploy_dir: "/tidb-deploy-v2/tikv-9385"
    # data_dir: "/tidb-data-v2/tikv-9385"
    # log_dir: "/tidb-deploy-v2/tikv-9385/log"
    # numa_node: "0,1"
    # # The following configs are used to overwrite the `server_configs.tikv` values.
    # config:
    #   server.grpc-concurrency: 4
    #   server.labels: { zone: "zone1", dc: "dc1", host: "host1" }
  - host: 192.168.254.31
    port: 9385
    status_port: 9386
  - host: 192.168.254.32
    port: 9385
    status_port: 9386

tiflash_servers:
  - host: 192.168.254.12
    ssh_port: 22
    tcp_port: 8000
    http_port: 7123
    flash_service_port: 2930
    flash_proxy_port: 20270
    flash_proxy_status_port: 20192
    metrics_port: 7234
    # deploy_dir: /tidb-deploy/tiflash-9000
    # data_dir: /tidb-data/tiflash-9000
    # log_dir: /tidb-deploy/tiflash-9000/log
    # numa_node: "0,1"
    # # The following configs are used to overwrite the `server_configs.tiflash` values.
    # config:
    #   logger.level: "info"
    # learner_config:
    #   log-level: "info"
  # - host: 10.0.1.15
  # - host: 10.0.1.16

# pump_servers:
#   - host: 10.0.1.17
#     ssh_port: 22
#     port: 8250
#     deploy_dir: "/tidb-deploy/pump-8249"
#     data_dir: "/tidb-data/pump-8249"
#     log_dir: "/tidb-deploy/pump-8249/log"
#     numa_node: "0,1"
#     # The following configs are used to overwrite the `server_configs.drainer` values.
#     config:
#       gc: 7
#   - host: 10.0.1.18
#   - host: 10.0.1.19

# drainer_servers:
#   - host: 10.0.1.17
#     port: 8249
#     data_dir: "/tidb-data/drainer-8249"
#     # If drainer doesn't have a checkpoint, use initial commitTS as the initial checkpoint.
#     # Will get a latest timestamp from pd if commit_ts is set to -1 (the default value).
#     commit_ts: -1
#     deploy_dir: "/tidb-deploy/drainer-8249"
#     log_dir: "/tidb-deploy/drainer-8249/log"
#     numa_node: "0,1"
#     # The following configs are used to overwrite the `server_configs.drainer` values.
#     config:
#       syncer.db-type: "mysql"
#       syncer.to.host: "127.0.0.1"
#       syncer.to.user: "root"
#       syncer.to.password: ""
#       syncer.to.port: 3306
#   - host: 10.0.1.19

# cdc_servers:
#   - host: 10.0.1.20
#     ssh_port: 22
#     port: 8300
#     deploy_dir: "/tidb-deploy/cdc-8300"
#     log_dir: "/tidb-deploy/cdc-8300/log"
#     numa_node: "0,1"
#   - host: 10.0.1.21
#   - host: 10.0.1.22

monitoring_servers:
  - host: 192.168.254.12
    # ssh_port: 22
    port: 9080
    #deploy_dir: "/tidb-deploy-v2/prometheus-9080"
    #data_dir: "/tidb-data-v2/prometheus-9080"
    #log_dir: "/tidb-deploy-v2/prometheus-9080/log"

grafana_servers:
  - host: 192.168.254.12
    port: 3030
    #deploy_dir: "/tidb-deploy-v2/grafana-3030"

alertmanager_servers:
  - host: 192.168.254.12
    # ssh_port: 22
    web_port: 9095
    cluster_port: 9096
    #deploy_dir: "/tidb-deploy-v2/alertmanager-9095"
    #data_dir: "/tidb-data-v2/alertmanager-9095"
    #log_dir: "/tidb-deploy-v2/alertmanager-9095/log"

[tidb@data11 ~]$ 

TiKV 192.168.254.31:9385 log

[tidb@saas1 ~]$ cat /data/tidb-deploy-v2/tikv-9385/log/tikv_stderr.log 
[tidb@saas1 ~]$ cat /data/tidb-deploy-v2/tikv-9385/log/tikv.log | grep ERROR
[tidb@saas1 ~]$ 


Log from around the upgrade window

192.168.254.31_tikv.log (83.6 KB)

[tidb@saas1 ~]$ cat /data/tidb-deploy-v2/tikv-9385/conf/tikv.toml 
# WARNING: This file is auto-generated. Do not edit! All your modification will be overwritten!
# You can use 'tiup cluster edit-config' and 'tiup cluster reload' to update the configuration
# All configuration items you want to change can be added to:
# server_configs:
#   tikv:
#     aa.b1.c3: value
#     aa.b2.c4: value
[readpool]
[readpool.coprocessor]
use-unified-pool = true
[readpool.storage]
use-unified-pool = false

[storage]
[storage.block-cache]
capacity = "20GB"
[tidb@saas1 ~]$ cat /data/tidb-deploy-v2/tikv-9385/scripts/run_tikv.sh 
#!/bin/bash
set -e

# WARNING: This file was auto-generated. Do not edit!
#          All your edit might be overwritten!
cd "/data/tidb-deploy-v2/tikv-9385" || exit 1

echo -n 'sync ... '
stat=$(time sync || sync)
echo ok
echo $stat
exec bin/tikv-server \
    --addr "0.0.0.0:9385" \
    --advertise-addr "192.168.254.31:9385" \
    --status-addr "0.0.0.0:9386" \
    --advertise-status-addr "192.168.254.31:9386" \
    --pd "192.168.254.13:9379,192.168.254.31:9379,192.168.254.32:9379" \
    --data-dir "/data/tidb-data-v2/tikv-9385" \
    --config conf/tikv.toml \
    --log-file "/data/tidb-deploy-v2/tikv-9385/log/tikv.log" 2>> "/data/tidb-deploy-v2/tikv-9385/log/tikv_stderr.log"
[tidb@saas1 ~]$ 

It looks like the PD address passed in during the upgrade check is wrong:

export LANG=C; PATH=$PATH:/usr/bin:/usr/sbin /data/tidb-deploy-v2/tikv-9385/bin/tikv-server --config-check --config=/data/tidb-deploy-v2/tikv-9385/conf/tikv.toml --pd=\"\"},
cause: Process exited with status 1
github.com/pingcap/tiup/pkg/cluster/spec.checkConfig

--pd=""
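To see whether the empty --pd is really what breaks the check, the same command can be re-run by hand on the node, once exactly as tiup runs it and once with the real PD list from run_tikv.sh (the commands are copied from the error and the start script above; --config-check should only validate the config file and exit, so this is assumed to be safe to try):

# Re-run the check exactly as tiup does (empty --pd)
ssh tidb@192.168.254.31 'export LANG=C; PATH=$PATH:/usr/bin:/usr/sbin /data/tidb-deploy-v2/tikv-9385/bin/tikv-server --config-check --config=/data/tidb-deploy-v2/tikv-9385/conf/tikv.toml --pd=""'

# Same check with the real PD endpoints from run_tikv.sh, to see if the result changes
ssh tidb@192.168.254.31 'export LANG=C; PATH=$PATH:/usr/bin:/usr/sbin /data/tidb-deploy-v2/tikv-9385/bin/tikv-server --config-check --config=/data/tidb-deploy-v2/tikv-9385/conf/tikv.toml --pd="192.168.254.13:9379,192.168.254.31:9379,192.168.254.32:9379"'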

That is not the current topology of the cluster. Please share the configuration shown by tiup cluster edit-config {cluster-name}.

global:
  user: tidb
  ssh_port: 22
  ssh_type: builtin
  deploy_dir: /data/tidb-deploy-v2
  data_dir: /data/tidb-data-v2
  os: linux
  arch: amd64
monitored:
  node_exporter_port: 9105
  blackbox_exporter_port: 9120
  deploy_dir: /data/tidb-deploy-v2/monitor-9105
  data_dir: /data/tidb-data-v2/monitor-9105
  log_dir: /data/tidb-deploy-v2/monitor-9105/log
server_configs:
  tidb:
    binlog.enable: false
    binlog.ignore-error: false
    log.slow-threshold: 300
    new_collations_enabled_on_first_bootstrap: true
  tikv:
    readpool.coprocessor.use-unified-pool: true
    readpool.storage.use-unified-pool: false
    storage.block-cache.capacity: 20GB
  pd:
    replication.enable-placement-rules: true
    schedule.leader-schedule-limit: 4
    schedule.region-schedule-limit: 2048
    schedule.replica-schedule-limit: 64
  tiflash:
    logger.level: info
  tiflash-learner: {}
  pump: {}
  drainer: {}
  cdc: {}
tidb_servers:
- host: 192.168.254.13
  ssh_port: 22
  port: 9383
  status_port: 9384
  deploy_dir: /data/tidb-deploy-v2/tidb-9383
  arch: amd64
  os: linux
- host: 192.168.254.31
  ssh_port: 22
  port: 9383
  status_port: 9384
  deploy_dir: /data/tidb-deploy-v2/tidb-9383
  arch: amd64
  os: linux
- host: 192.168.254.32
  ssh_port: 22
  port: 9383
  status_port: 9384
  deploy_dir: /data/tidb-deploy-v2/tidb-9383
  arch: amd64
  os: linux
tikv_servers:
- host: 192.168.254.13
  ssh_port: 22
  port: 9385
  status_port: 9386
  deploy_dir: /data/tidb-deploy-v2/tikv-9385
  data_dir: /data/tidb-data-v2/tikv-9385
  arch: amd64
  os: linux
- host: 192.168.254.31
  ssh_port: 22
  port: 9385
  status_port: 9386
  deploy_dir: /data/tidb-deploy-v2/tikv-9385
  data_dir: /data/tidb-data-v2/tikv-9385
  arch: amd64
  os: linux
- host: 192.168.254.32
  ssh_port: 22
  port: 9385
  status_port: 9386
  deploy_dir: /data/tidb-deploy-v2/tikv-9385
  data_dir: /data/tidb-data-v2/tikv-9385
  arch: amd64
  os: linux
tiflash_servers:
- host: 192.168.254.12
  ssh_port: 22
  tcp_port: 8000
  http_port: 7123
  flash_service_port: 2930
  flash_proxy_port: 20270
  flash_proxy_status_port: 20192
  metrics_port: 7234
  deploy_dir: /data/tidb-deploy-v2/tiflash-8000
  data_dir: /data/tidb-data-v2/tiflash-8000
  arch: amd64
  os: linux
pd_servers:
- host: 192.168.254.13
  ssh_port: 22
  name: pd-192.168.254.13-9379
  client_port: 9379
  peer_port: 9380
  deploy_dir: /data/tidb-deploy-v2/pd-9379
  data_dir: /data/tidb-data-v2/pd-9379
  arch: amd64
  os: linux
- host: 192.168.254.31
  ssh_port: 22
  name: pd-192.168.254.31-9379
  client_port: 9379
  peer_port: 9380
  deploy_dir: /data/tidb-deploy-v2/pd-9379
  data_dir: /data/tidb-data-v2/pd-9379
  arch: amd64
  os: linux
- host: 192.168.254.32
  ssh_port: 22
  name: pd-192.168.254.32-9379
  client_port: 9379
  peer_port: 9380
  deploy_dir: /data/tidb-deploy-v2/pd-9379
  data_dir: /data/tidb-data-v2/pd-9379
  arch: amd64
  os: linux
monitoring_servers:
- host: 192.168.254.12
  ssh_port: 22
  port: 9080
  deploy_dir: /data/tidb-deploy-v2/prometheus-9080
  data_dir: /data/tidb-data-v2/prometheus-9080
  arch: amd64
  os: linux
grafana_servers:
- host: 192.168.254.12
  ssh_port: 22
  port: 3030
  deploy_dir: /data/tidb-deploy-v2/grafana-3030
  arch: amd64
  os: linux
  username: admin
  password: admin
alertmanager_servers:
- host: 192.168.254.12
  ssh_port: 22
  web_port: 9095
  cluster_port: 9096
  deploy_dir: /data/tidb-deploy-v2/alertmanager-9095
  data_dir: /data/tidb-data-v2/alertmanager-9095
  arch: amd64
  os: linux

I don't see anything obviously wrong in the TiKV log or in the cluster configuration. Would it be convenient for you to reload the cluster and then retry the upgrade?
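Roughly, with the cluster name from this thread (reload pushes the current configuration to every node and does a rolling restart, so plan a maintenance window):

tiup cluster reload jiuji-tidb-cluster-v2
tiup cluster upgrade jiuji-tidb-cluster-v2 v4.0.9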

I can only do that in the small hours tonight. I find it strange too, since everything looks correct. I'm worried the cluster won't come back up after a reload or restart; there is quite a lot of data on it.

Can I upgrade directly from v4.0.8 to v4.0.10? I tried that once before and it failed as well.

Yes, tiup supports upgrading directly across minor versions within 4.x.

I ran reload and then upgrade to v4.0.10, and it still failed with the same error, so I restarted the whole cluster. The state is a bit odd now:
tiup shows v4.0.8, but the Dashboard shows v4.0.10 for every component. How do I fix this?

[tidb@data11 ~]$ tiup cluster display jiuji-tidb-cluster-v2
Starting component `cluster`: /home/tidb/.tiup/components/cluster/v1.3.2/tiup-cluster display jiuji-tidb-cluster-v2
Cluster type:       tidb
Cluster name:       jiuji-tidb-cluster-v2
Cluster version:    v4.0.8
SSH type:           builtin
Dashboard URL:      http://192.168.254.32:9379/dashboard
ID                   Role          Host            Ports                            OS/Arch       Status  Data Dir                              Deploy Dir
--                   ----          ----            -----                            -------       ------  --------                              ----------
192.168.254.12:9095  alertmanager  192.168.254.12  9095/9096                        linux/x86_64  Up      /data/tidb-data-v2/alertmanager-9095  /data/tidb-deploy-v2/alertmanager-9095
192.168.254.12:3030  grafana       192.168.254.12  3030                             linux/x86_64  Up      -                                     /data/tidb-deploy-v2/grafana-3030
192.168.254.13:9379  pd            192.168.254.13  9379/9380                        linux/x86_64  Up      /data/tidb-data-v2/pd-9379            /data/tidb-deploy-v2/pd-9379
192.168.254.31:9379  pd            192.168.254.31  9379/9380                        linux/x86_64  Up|L    /data/tidb-data-v2/pd-9379            /data/tidb-deploy-v2/pd-9379
192.168.254.32:9379  pd            192.168.254.32  9379/9380                        linux/x86_64  Up|UI   /data/tidb-data-v2/pd-9379            /data/tidb-deploy-v2/pd-9379
192.168.254.12:9080  prometheus    192.168.254.12  9080                             linux/x86_64  Up      /data/tidb-data-v2/prometheus-9080    /data/tidb-deploy-v2/prometheus-9080
192.168.254.13:9383  tidb          192.168.254.13  9383/9384                        linux/x86_64  Up      -                                     /data/tidb-deploy-v2/tidb-9383
192.168.254.31:9383  tidb          192.168.254.31  9383/9384                        linux/x86_64  Up      -                                     /data/tidb-deploy-v2/tidb-9383
192.168.254.32:9383  tidb          192.168.254.32  9383/9384                        linux/x86_64  Up      -                                     /data/tidb-deploy-v2/tidb-9383
192.168.254.12:8000  tiflash       192.168.254.12  8000/7123/2930/20270/20192/7234  linux/x86_64  Up      /data/tidb-data-v2/tiflash-8000       /data/tidb-deploy-v2/tiflash-8000
192.168.254.13:9385  tikv          192.168.254.13  9385/9386                        linux/x86_64  Up      /data/tidb-data-v2/tikv-9385          /data/tidb-deploy-v2/tikv-9385
192.168.254.31:9385  tikv          192.168.254.31  9385/9386                        linux/x86_64  Up      /data/tidb-data-v2/tikv-9385          /data/tidb-deploy-v2/tikv-9385
192.168.254.32:9385  tikv          192.168.254.32  9385/9386                        linux/x86_64  Up      /data/tidb-data-v2/tikv-9385          /data/tidb-deploy-v2/tikv-9385
Total nodes: 13
[tidb@data11 ~]$ 

Component instances as shown in the Dashboard:

Instance             Status  Up Since     Version  Git Hash                                  Deploy Path
192.168.254.13:9379  Online  Today 00:35  v4.0.10  560df52710293d9d67bd7b32503de0e53addfa11  /data/tidb-deploy-v2/pd-9379/bin
192.168.254.31:9379  Online  Today 00:35  v4.0.10  560df52710293d9d67bd7b32503de0e53addfa11  /data/tidb-deploy-v2/pd-9379/bin
192.168.254.32:9379  Online  Today 00:35  v4.0.10  560df52710293d9d67bd7b32503de0e53addfa11  /data/tidb-deploy-v2/pd-9379/bin
192.168.254.13:9383  Online  Today 00:38  v4.0.10  dbade8cda4c5a329037746e171449e0a1dfdb8b3  /data/tidb-deploy-v2/tidb-9383/bin
192.168.254.31:9383  Online  Today 00:38  v4.0.10  dbade8cda4c5a329037746e171449e0a1dfdb8b3  /data/tidb-deploy-v2/tidb-9383/bin
192.168.254.32:9383  Online  Today 00:38  v4.0.10  dbade8cda4c5a329037746e171449e0a1dfdb8b3  /data/tidb-deploy-v2/tidb-9383/bin
192.168.254.13:9385  Online  Today 00:36  v4.0.10  2ea4e608509150f8110b16d6e8af39284ca6c93a  /data/tidb-deploy-v2/tikv-9385/bin
192.168.254.31:9385  Online  Today 00:37  v4.0.10  2ea4e608509150f8110b16d6e8af39284ca6c93a  /data/tidb-deploy-v2/tikv-9385/bin
192.168.254.32:9385  Online  Today 00:38  v4.0.10  2ea4e608509150f8110b16d6e8af39284ca6c93a  /data/tidb-deploy-v2/tikv-9385/bin
192.168.254.12:2930  Online  Today 00:35  v4.0.10  6665d6e906d0cba745a2ec726c1fc892843e4cbb  /data/tidb-deploy-v2/tiflash-8000/bin/tiflash

What exact error does the upgrade report this time?

Basically the same as before.

tiup-cluster-debug-2021-02-04-00-34-34.log (124.2 KB)

Upgrade time was 2021-02-04 00:34:34; the matching TiKV server log with the error:
tikv.log.2021-02-04-15.tar.gz (4.2 MB)

Any ideas on how to solve this?

1. Was this cluster originally deployed at v4.0.8, or was it upgraded from a lower version?
2. Please check whether every TiKV instance is on exactly the same version; we have seen this error caused by mismatched versions before: https://github.com/tikv/tikv/issues/2749 (a quick loop to compare them is sketched below).
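A minimal sketch for that version check, assuming the tidb user and the deploy path from the topology above:

# Compare the TiKV binaries on the three hosts
for h in 192.168.254.13 192.168.254.31 192.168.254.32; do
  echo "== $h =="
  ssh tidb@$h '/data/tidb-deploy-v2/tikv-9385/bin/tikv-server -V'
done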

We created the cluster at v4.0.5. The Dashboard now shows everything at 4.0.10, but tiup still reports the cluster as v4.0.8.

[tidb@data11 bin]$ ./tikv-server -V
TiKV 
Release Version:   4.0.10
Edition:           Community
Git Commit Hash:   2ea4e608509150f8110b16d6e8af39284ca6c93a
Git Commit Branch: heads/refs/tags/v4.0.10
UTC Build Time:    2021-01-15 03:16:35
Rust Version:      rustc 1.42.0-nightly (0de96d37f 2019-12-19)
Enable Features:   jemalloc mem-profiling portable sse protobuf-codec
Profile:           dist_release
[tidb@data11 bin]$ 

[tidb@saas1 bin]$ ./tikv-server -V
TiKV 
Release Version:   4.0.10
Edition:           Community
Git Commit Hash:   2ea4e608509150f8110b16d6e8af39284ca6c93a
Git Commit Branch: heads/refs/tags/v4.0.10
UTC Build Time:    2021-01-15 03:16:35
Rust Version:      rustc 1.42.0-nightly (0de96d37f 2019-12-19)
Enable Features:   jemalloc mem-profiling portable sse protobuf-codec
Profile:           dist_release

[tidb@saas2 bin]$ ./tikv-server -V
TiKV 
Release Version:   4.0.10
Edition:           Community
Git Commit Hash:   2ea4e608509150f8110b16d6e8af39284ca6c93a
Git Commit Branch: heads/refs/tags/v4.0.10
UTC Build Time:    2021-01-15 03:16:35
Rust Version:      rustc 1.42.0-nightly (0de96d37f 2019-12-19)
Enable Features:   jemalloc mem-profiling portable sse protobuf-codec
Profile:           dist_release
[tidb@saas2 bin]$ 



The only piece of information that is still wrong is this:

[tidb@data11 ~]$ tiup cluster display jiuji-tidb-cluster-v2
Starting component `cluster`: /home/tidb/.tiup/components/cluster/v1.3.2/tiup-cluster display jiuji-tidb-cluster-v2
Cluster type:       tidb
Cluster name:       jiuji-tidb-cluster-v2
Cluster version:    v4.0.8  // this is wrong; the Dashboard shows v4.0.10 everywhere
SSH type:           builtin
Dashboard URL:      http://192.168.254.32:9379/dashboard
ID                   Role          Host            Ports                            OS/Arch       Status  Data Dir                              Deploy Dir
--                   ----          ----            -----                            -------       ------  --------                              ----------
192.168.254.12:9095  alertmanager  192.168.254.12  9095/9096                        linux/x86_64  Up      /data/tidb-data-v2/alertmanager-9095  /data/tidb-deploy-v2/alertmanager-9095
192.168.254.12:3030  grafana       192.168.254.12  3030                             linux/x86_64  Up      -                                     /data/tidb-deploy-v2/grafana-3030
192.168.254.13:9379  pd            192.168.254.13  9379/9380                        linux/x86_64  Up      /data/tidb-data-v2/pd-9379            /data/tidb-deploy-v2/pd-9379
192.168.254.31:9379  pd            192.168.254.31  9379/9380                        linux/x86_64  Up|L    /data/tidb-data-v2/pd-9379            /data/tidb-deploy-v2/pd-9379
192.168.254.32:9379  pd            192.168.254.32  9379/9380                        linux/x86_64  Up|UI   /data/tidb-data-v2/pd-9379            /data/tidb-deploy-v2/pd-9379
192.168.254.12:9080  prometheus    192.168.254.12  9080                             linux/x86_64  Up      /data/tidb-data-v2/prometheus-9080    /data/tidb-deploy-v2/prometheus-9080
192.168.254.13:9383  tidb          192.168.254.13  9383/9384                        linux/x86_64  Up      -                                     /data/tidb-deploy-v2/tidb-9383
192.168.254.31:9383  tidb          192.168.254.31  9383/9384                        linux/x86_64  Up      -                                     /data/tidb-deploy-v2/tidb-9383
192.168.254.32:9383  tidb          192.168.254.32  9383/9384                        linux/x86_64  Up      -                                     /data/tidb-deploy-v2/tidb-9383
192.168.254.12:8000  tiflash       192.168.254.12  8000/7123/2930/20270/20192/7234  linux/x86_64  Up      /data/tidb-data-v2/tiflash-8000       /data/tidb-deploy-v2/tiflash-8000
192.168.254.13:9385  tikv          192.168.254.13  9385/9386                        linux/x86_64  Up      /data/tidb-data-v2/tikv-9385          /data/tidb-deploy-v2/tikv-9385
192.168.254.31:9385  tikv          192.168.254.31  9385/9386                        linux/x86_64  Up      /data/tidb-data-v2/tikv-9385          /data/tidb-deploy-v2/tikv-9385
192.168.254.32:9385  tikv          192.168.254.32  9385/9386                        linux/x86_64  Up      /data/tidb-data-v2/tikv-9385          /data/tidb-deploy-v2/tikv-9385

1. When you upgraded from v4.0.5 to v4.0.8, did you hit any errors, or did that upgrade complete cleanly?
2. I haven't been able to reproduce this upgrade failure locally so far. Would it be convenient for you to restart the cluster and then try upgrading to v4.0.10 again? (Roughly the commands sketched below.)
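For reference, with the cluster name from this thread that would be roughly:

tiup cluster restart jiuji-tidb-cluster-v2
tiup cluster upgrade jiuji-tidb-cluster-v2 v4.0.10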