Upgrading TiDB 4.0.11 to 5.0.1

The current cluster version is 4.0.11, and I want to upgrade offline to 5.0.1.
I have already downloaded the offline installation tar package.
Following the official docs, I should be using the offline upgrade method.
When I checked cluster health with tiup cluster check tidb-okaydev --cluster,
I got a lot of Fail results. The port-in-use ones should be normal, right? The cluster is running, so those ports are naturally up.
Node Check Result Message
10.60.0.75 os-version Pass OS is CentOS Linux 7 (Core) 7.9.2009
10.60.0.75 cpu-cores Pass number of CPU cores / threads: 16
10.60.0.75 swap Fail swap is enabled, please disable it for best performance
10.60.0.75 memory Pass memory size is 31442MB
10.60.0.75 epoll-exclusive Fail epoll exclusive is not supported
10.60.0.75 disk Warn mount point / does not have 'noatime' option set
10.60.0.75 disk Warn mount point / does not have 'noatime' option set
10.60.0.75 listening-port Fail port 2379 is already in use
10.60.0.75 listening-port Fail port 2380 is already in use
10.60.0.75 listening-port Fail port 20160 is already in use
10.60.0.75 listening-port Fail port 20180 is already in use
10.60.0.75 listening-port Fail port 4000 is already in use
10.60.0.75 listening-port Fail port 10080 is already in use
10.60.0.75 limits Fail soft limit of 'nofile' for user 'tidb' is not set or too low
10.60.0.75 limits Fail hard limit of 'nofile' for user 'tidb' is not set or too low
10.60.0.75 limits Fail soft limit of 'stack' for user 'tidb' is not set or too low
10.60.0.75 sysctl Fail net.core.somaxconn = 128, should be greater than 32768
10.60.0.75 sysctl Fail net.ipv4.tcp_syncookies = 1, should be 0
10.60.0.75 sysctl Fail net.ipv4.tcp_tw_recycle = 1, should be 0
10.60.0.75 selinux Pass SELinux is disabled
10.60.0.75 thp Fail THP is enabled, please disable it for best performance
10.60.0.75 command Fail numactl not usable, bash: numactl: command not found
10.60.0.76 os-version Pass OS is CentOS Linux 7 (Core) 7.9.2009
10.60.0.76 cpu-cores Pass number of CPU cores / threads: 16
10.60.0.76 swap Fail swap is enabled, please disable it for best performance
10.60.0.76 memory Pass memory size is 31442MB
10.60.0.76 epoll-exclusive Fail epoll exclusive is not supported
10.60.0.76 disk Warn mount point / does not have 'noatime' option set
10.60.0.76 disk Warn mount point / does not have 'noatime' option set
10.60.0.76 listening-port Fail port 2379 is already in use
10.60.0.76 listening-port Fail port 2380 is already in use
10.60.0.76 listening-port Fail port 20160 is already in use
10.60.0.76 listening-port Fail port 20180 is already in use
10.60.0.76 listening-port Fail port 4000 is already in use
10.60.0.76 listening-port Fail port 10080 is already in use
10.60.0.76 limits Fail soft limit of 'nofile' for user 'tidb' is not set or too low
10.60.0.76 limits Fail hard limit of 'nofile' for user 'tidb' is not set or too low
10.60.0.76 limits Fail soft limit of 'stack' for user 'tidb' is not set or too low
10.60.0.76 sysctl Fail net.core.somaxconn = 128, should be greater than 32768
10.60.0.76 sysctl Fail net.ipv4.tcp_syncookies = 1, should be 0
10.60.0.76 sysctl Fail net.ipv4.tcp_tw_recycle = 1, should be 0
10.60.0.76 selinux Pass SELinux is disabled
10.60.0.76 thp Pass THP is disabled
10.60.0.76 command Fail numactl not usable, bash: numactl: command not found
10.60.0.78 os-version Pass OS is CentOS Linux 7 (Core) 7.9.2009
10.60.0.78 cpu-cores Pass number of CPU cores / threads: 16
10.60.0.78 swap Fail swap is enabled, please disable it for best performance
10.60.0.78 memory Pass memory size is 31442MB
10.60.0.78 epoll-exclusive Fail epoll exclusive is not supported
10.60.0.78 disk Warn mount point / does not have 'noatime' option set
10.60.0.78 disk Warn mount point / does not have 'noatime' option set
10.60.0.78 listening-port Fail port 2379 is already in use
10.60.0.78 listening-port Fail port 2380 is already in use
10.60.0.78 listening-port Fail port 20160 is already in use
10.60.0.78 listening-port Fail port 20180 is already in use
10.60.0.78 listening-port Fail port 4000 is already in use
10.60.0.78 listening-port Fail port 10080 is already in use
10.60.0.78 limits Fail soft limit of 'nofile' for user 'tidb' is not set or too low
10.60.0.78 limits Fail hard limit of 'nofile' for user 'tidb' is not set or too low
10.60.0.78 limits Fail soft limit of 'stack' for user 'tidb' is not set or too low
10.60.0.78 sysctl Fail net.core.somaxconn = 128, should be greater than 32768
10.60.0.78 sysctl Fail net.ipv4.tcp_syncookies = 1, should be 0
10.60.0.78 sysctl Fail net.ipv4.tcp_tw_recycle = 1, should be 0
10.60.0.78 selinux Pass SELinux is disabled
10.60.0.78 thp Pass THP is disabled
10.60.0.78 command Fail numactl not usable, bash: numactl: command not found
10.60.0.188 os-version Pass OS is CentOS Linux 7 (Core) 7.9.2009
10.60.0.188 cpu-cores Pass number of CPU cores / threads: 16
10.60.0.188 swap Fail swap is enabled, please disable it for best performance
10.60.0.188 memory Pass memory size is 32321MB
10.60.0.188 epoll-exclusive Fail epoll exclusive is not supported
10.60.0.188 disk Fail mount point /xdfapp does not have 'nodelalloc' option set
10.60.0.188 disk Warn mount point /xdfapp does not have 'noatime' option set
10.60.0.188 listening-port Fail port 3930 is already in use
10.60.0.188 listening-port Fail port 20170 is already in use
10.60.0.188 listening-port Fail port 20292 is already in use
10.60.0.188 listening-port Fail port 8234 is already in use
10.60.0.188 listening-port Fail port 8001 is already in use
10.60.0.188 listening-port Fail port 8223 is already in use
10.60.0.188 limits Fail soft limit of 'nofile' for user 'tidb' is not set or too low
10.60.0.188 limits Fail hard limit of 'nofile' for user 'tidb' is not set or too low
10.60.0.188 limits Fail soft limit of 'stack' for user 'tidb' is not set or too low
10.60.0.188 sysctl Fail net.core.somaxconn = 128, should be greater than 32768
10.60.0.188 sysctl Fail net.ipv4.tcp_tw_recycle = 1, should be 0
10.60.0.188 selinux Pass SELinux is disabled
10.60.0.188 thp Fail THP is enabled, please disable it for best performance
10.60.0.188 command Fail numactl not usable, bash: numactl: command not found
10.60.0.74 os-version Pass OS is CentOS Linux 7 (Core) 7.9.2009
10.60.0.74 cpu-cores Pass number of CPU cores / threads: 16
10.60.0.74 swap Fail swap is enabled, please disable it for best performance
10.60.0.74 memory Pass memory size is 32000MB
10.60.0.74 listening-port Fail port 9093 is already in use
10.60.0.74 listening-port Fail port 9094 is already in use
10.60.0.74 listening-port Fail port 8090 is already in use
10.60.0.74 listening-port Fail port 3000 is already in use
10.60.0.74 limits Fail soft limit of 'nofile' for user 'tidb' is not set or too low
10.60.0.74 limits Fail hard limit of 'nofile' for user 'tidb' is not set or too low
10.60.0.74 limits Fail soft limit of 'stack' for user 'tidb' is not set or too low
10.60.0.74 sysctl Fail net.core.somaxconn = 128, should be greater than 32768
10.60.0.74 sysctl Fail net.ipv4.tcp_syncookies = 1, should be 0
10.60.0.74 sysctl Fail net.ipv4.tcp_tw_recycle = 1, should be 0
10.60.0.74 selinux Pass SELinux is disabled
10.60.0.74 thp Fail THP is enabled, please disable it for best performance
10.60.0.74 command Fail numactl not usable, bash: numactl: command not found


Yes, if the cluster is up and running, those ports being in use is normal.

  1. The filesystem is xfs. Can a Fail like "mount point /xdfapp does not have 'nodelalloc' option set" be ignored?
  2. How do I resolve "epoll-exclusive Fail epoll exclusive is not supported"? My kernel is 3.10.0-123.el7.x86_64. Some people say to upgrade the kernel, but that is not realistic, and the deployment docs do not seem to state a required kernel version.
  3. limits Fail soft limit of 'nofile' for user 'tidb' is not set or too low
    limits Fail soft limit of 'stack' for user 'tidb' is not set or too low
    What should these limits be set to?
  4. "swap Fail swap is enabled, please disable it for best performance" can be fixed with swapoff -a, but the docs do not seem to explicitly recommend disabling swap either.
  5. sysctl Fail net.ipv4.tcp_syncookies = 1, should be 0
    sysctl Fail net.ipv4.tcp_tw_recycle = 1, should be 0
    sysctl Fail net.core.somaxconn = 128, should be greater than 32768
    Do these parameters really have to be set?

In short, can the items whose status is Fail or Warn be ignored?

Fix whatever you can; judge by your own environment and the impact on your workload.
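For the limits / sysctl / swap / THP items, the fixes are the standard OS tuning from the TiDB deployment docs. A minimal sketch, run as root on each node (the nofile/stack values are the commonly recommended ones; double-check them against the docs for your version):

# Raise open-file and stack limits for the tidb user (takes effect on next login)
cat >> /etc/security/limits.conf << EOF
tidb    soft    nofile    1000000
tidb    hard    nofile    1000000
tidb    soft    stack     32768
tidb    hard    stack     32768
EOF

# Kernel parameters flagged by the check
cat >> /etc/sysctl.conf << EOF
net.core.somaxconn = 32768
net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_tw_recycle = 0
EOF
sysctl -p

# Disable swap (also comment out the swap entry in /etc/fstab so it stays off after reboot)
swapoff -a

# Disable transparent huge pages until the next reboot
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag

# Install numactl for the "numactl not usable" check
yum install -y numactl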

OK, thanks. For a development environment, if I ignore these Fails and upgrade directly, will the upgrade go through?

I uploaded the offline tar package and extracted it. Following the docs, I directly ran tiup cluster upgrade v5.0.1.
My extracted package is tidb-community-server-v5.0.1-linux-amd64.
The root directory also contains the 4.0.11 package:
-bash-4.2$ ls
etc tidb-community-server-v4.0.11-linux-amd64 tidb-community-server-v5.0.1-linux-amd64 tidb-data tidb-deploy
But the upgrade reported the following error:
2021-05-20T11:02:48.003+0800 INFO Execute command finished {"code": 1, "error": "version v5.0.1 on linux/amd64 for component grafana not found", "errorVerbose": "version v5.0.1 on linux/amd64 for component grafana not found\ngithub.com/pingcap/errors.AddStack\n\tgithub.com/pingcap/errors@v0.11.5-0.20200820035142-66eb5bf1d1cd/errors.go:174\ngithub.com/pingcap/errors.Trace\n\tgithub.com/pingcap/errors@v0.11.5-0.20200820035142-66eb5bf1d1cd/juju_adaptor.go:15\ngithub.com/pingcap/tiup/pkg/cluster/manager.(*Manager).Upgrade\n\tgithub.com/pingcap/tiup@/pkg/cluster/manager/upgrade.go:177\ngithub.com/pingcap/tiup/components/cluster/command.newUpgradeCmd.func1\n\tgithub.com/pingcap/tiup@/components/cluster/command/upgrade.go:38\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/cobra@v1.0.0/command.go:842\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/cobra@v1.0.0/command.go:950\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/cobra@v1.0.0/command.go:887\ngithub.com/pingcap/tiup/components/cluster/command.Execute\n\tgithub.com/pingcap/tiup@/components/cluster/command/root.go:247\nmain.main\n\tgithub.com/pingcap/tiup@/components/cluster/main.go:23\nruntime.main\n\truntime/proc.go:203\nruntime.goexit\n\truntime/asm_amd64.s:1357"}

May I ask: did you follow this procedure for the upgrade? (In particular, did you update the local TiUP offline mirror?) https://docs.pingcap.com/zh/tidb/stable/upgrade-tidb-using-tiup-offline#使用-tiup-离线镜像升级-tidb
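The "component grafana not found" error usually means the mirror tiup is using does not contain v5.0.1 yet. A quick way to check (a sketch; tiup mirror show requires a reasonably recent tiup, so treat that subcommand as an assumption to verify):

tiup mirror show      # which mirror directory/address tiup is currently pointed at
tiup list grafana     # which versions of the grafana component that mirror provides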

Yes. Hi, the upgrade succeeded. I put together my own summary of the upgrade steps. Thanks!

Could you share the upgrade steps you summarized?

I only stripped out the parts of the official docs that are easy to mix up; everything is excerpted from the documentation, so it may only suit my own setup. Roughly as follows:
This is based on upgrading from v4.0.11 to v5.0.1; all upgrade operations are executed on the control machine.
Refer to: https://docs.pingcap.com/zh/tidb/stable/upgrade-tidb-using-tiup
1. Check cluster health
tiup cluster check tidb-okaydev --cluster
Fail items can be repaired first; a test environment may choose to ignore them, but repairing them is recommended in production.
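Depending on the tiup cluster version, some Fail items can reportedly be repaired automatically with an --apply flag; treat the flag as an assumption and confirm it with tiup cluster check --help first:

tiup cluster check tidb-okaydev --cluster --apply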

2. Download the v5.0.1 offline package and extract it
tidb-community-server-v5.0.1-linux-amd64
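Assuming the package was downloaded to the control machine as a .tar.gz, extraction is just:

tar xzvf tidb-community-server-v5.0.1-linux-amd64.tar.gz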

3. Upgrade tiup (tiup version 1.4.0 or later is recommended)
tiup update --self
tiup --version

4. Upgrade tiup cluster (tiup cluster version 1.4.0 or later is recommended)
tiup update cluster
tiup cluster --version

5. Update the tiup offline mirror
Running the local_install.sh script in the new mirror directory performs the overwrite upgrade:
sh tidb-community-server-${version}-linux-amd64/local_install.sh    # version here is v5.0.1
e.g.: sh tidb-community-server-v5.0.1-linux-amd64/local_install.sh
Then update the environment variables as prompted:
source /xdfapp/tidb/.profile
Upgrade the cluster component:
tiup update cluster
Note: at this point the offline mirror has been updated. If TiUP reports errors after the overwrite, the manifest may not have been refreshed; try rm -rf ~/.tiup/manifests/* and run it again.

6. Upgrade the TiDB cluster
Note: there are two upgrade modes, rolling (online) and downtime (offline). TiUP Cluster upgrades a TiDB cluster online by default, i.e. the cluster keeps serving traffic during the upgrade. Each node has its leaders migrated away before it is upgraded and restarted, so a large cluster needs quite a long time to complete the whole upgrade. If the business has a maintenance window in which the database can be taken down, the downtime upgrade is the faster option.
a. Rolling upgrade (online upgrade)
tiup cluster upgrade tidb-okaydev v5.0.1
A rolling upgrade upgrades the components one by one. While a TiKV instance is being upgraded, all leaders on it are transferred away before the instance is stopped. The default timeout is 5 minutes (300 seconds); once it expires, the instance is stopped directly.
If you do not need leader eviction and want to move the cluster to the new version quickly, add --force to the command above; this causes performance jitter but no data loss.
If you want performance to stay stable, every leader must be evicted from a TiKV instance before it is stopped; set --transfer-timeout to a larger value, e.g. --transfer-timeout 3600 (unit: seconds).
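For example, the two variants for this cluster would be:

tiup cluster upgrade tidb-okaydev v5.0.1 --force                  # skip leader eviction; expect jitter
tiup cluster upgrade tidb-okaydev v5.0.1 --transfer-timeout 3600  # wait up to an hour per TiKV instance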
b. Downtime upgrade (offline upgrade)
First stop the whole cluster:
tiup cluster stop tidb-okaydev
Then upgrade with the --offline flag added to the upgrade command:
tiup cluster upgrade tidb-okaydev v5.0.1 --offline
Start the cluster (a downtime upgrade does not start the cluster automatically afterwards; start it manually):
tiup cluster start tidb-okaydev

7. Verify the cluster version
tiup cluster display tidb-okaydev
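The version can also be confirmed from SQL, e.g. against the TiDB server on 10.60.0.75 from the check output above (the credentials are placeholders):

mysql -h 10.60.0.75 -P 4000 -u root -p -e 'select tidb_version()\G'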

Appendix: FAQ

  1. If the upgrade is interrupted by an error, how do I continue after handling the error?
    Re-run the tiup cluster upgrade command; the upgrade operation will restart nodes that were already upgraded. If you do not want already-upgraded nodes to be restarted, use the replay subcommand to retry the operation, as follows (see the sketch after this item):
    a. Use tiup cluster audit to view the operation records
    Find the failed upgrade record and note its ID; in the next step, <audit-id> denotes that operation record ID
    b. Use tiup cluster replay <audit-id> to retry the operation
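    A minimal sketch of that flow (the audit ID shown is made up):
    tiup cluster audit            # find the failed upgrade in the list and note its ID, e.g. 4BLhr0
    tiup cluster replay 4BLhr0    # retry exactly that operation without restarting upgraded nodes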

  2. The evict leader step takes too long during the upgrade; how do I skip it and upgrade quickly?
    Add --force; the upgrade then skips the PD transfer leader and TiKV evict leader steps and directly restarts and upgrades the cluster, which has a large performance impact on a production cluster. The command is:
    tiup cluster upgrade tidb-okaydev v5.0.1 --force

  3. After the upgrade is done, how do I update companion tools such as pd-ctl?
    Install the matching version of the ctl component via TiUP to update those tools:
    tiup install ctl:v5.0.1
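    Once installed, the tools are invoked through tiup; for example, to query PD on this cluster (PD endpoint taken from the check output above):
    tiup ctl:v5.0.1 pd -u http://10.60.0.75:2379 store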
End of appendix
