tiup reload fails to apply configuration changes

【TiDB Environment】

【Overview】 Scenario + problem summary
After modifying the TiDB configuration with tiup cluster edit-config, reloading it with tiup cluster reload tidb-test -R tidb fails.

【Background】 Operations performed

【Symptoms】 Business and database symptoms

The error message returned is as follows:

  • [ Serial ] - UpdateTopology: cluster=tidb-test
    {"level":"warn","ts":"2021-07-26T10:45:38.760+0800","logger":"etcd-client","caller":"v3@v3.5.0/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000497dc0/#initially=[10.10.10.207:2379;10.10.10.208:2379;10.10.10.209:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection closed"}

Error: context deadline exceeded

Verbose debug logs has been written to /root/.tiup/logs/tiup-cluster-debug-2021-07-26-10-45-43.log.
Error: run /root/.tiup/components/cluster/v1.5.2/tiup-cluster (wd:/root/.tiup/data/SeEahiP) failed: exit status 1

The log file tiup-cluster-debug-2021-07-26-10-45-43.log contains the following error messages:
2021-07-26T10:45:28.759+0800 DEBUG TaskBegin {"task": "UpdateTopology: cluster=tidb-test"}
2021-07-26T10:45:38.760+0800 DEBUG TaskFinish {"task": "UpdateTopology: cluster=tidb-test", "error": "context deadline exceeded"}
2021-07-26T10:45:38.760+0800 INFO Execute command finished {"code": 1, "error": "context deadline exceeded", "errorVerbose": "context deadline exceeded\ngithub.com/pingcap/errors.AddStack\n\tgithub.com/pingcap/errors@v0.11.4/errors.go:174\ngithub.com/pingcap/errors.Trace\n\tgithub.com/pingcap/errors@v0.11.4/juju_adaptor.go:15\ngithub.com/pingcap/tiup/pkg/cluster/manager.(*Manager).Reload\n\tgithub.com/pingcap/tiup/pkg/cluster/manager/reload.go:121\ngithub.com/pingcap/tiup/components/cluster/command.newReloadCmd.func1\n\tgithub.com/pingcap/tiup/components/cluster/command/reload.go:40\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/cobra@v1.1.3/command.go:852\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/cobra@v1.1.3/command.go:960\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/cobra@v1.1.3/command.go:897\ngithub.com/pingcap/tiup/components/cluster/command.Execute\n\tgithub.com/pingcap/tiup/components/cluster/command/root.go:264\nmain.main\n\tgithub.com/pingcap/tiup/components/cluster/main.go:23\nruntime.main\n\truntime/proc.go:225\nruntime.goexit\n\truntime/asm_amd64.s:1371"}

The output of tiup cluster display is:
Starting component cluster: /root/.tiup/components/cluster/v1.5.2/tiup-cluster display tidb-test
Cluster type: tidb
Cluster name: tidb-test
Cluster version: v5.1.0
Deploy user: tidb
SSH type: builtin
Dashboard URL: http://10.10.10.209:2379/dashboard
ID Role Host Ports OS/Arch Status Data Dir Deploy Dir
-- ---- ---- ----- ------- ------ -------- ----------
10.10.10.207:9093 alertmanager 10.10.10.207 9093/9094 linux/x86_64 Up /tidb-data/alertmanager-9093 /tidb-deploy/alertmanager-9093
10.10.10.207:3000 grafana 10.10.10.207 3000 linux/x86_64 Up - /tidb-deploy/grafana-3000
10.10.10.207:2379 pd 10.10.10.207 2379/2380 linux/x86_64 Up /tidb-data/pd-2379 /tidb-deploy/pd-2379
10.10.10.208:2379 pd 10.10.10.208 2379/2380 linux/x86_64 Up /tidb-data/pd-2379 /tidb-deploy/pd-2379
10.10.10.209:2379 pd 10.10.10.209 2379/2380 linux/x86_64 Up|L|UI /tidb-data/pd-2379 /tidb-deploy/pd-2379
10.10.10.207:9090 prometheus 10.10.10.207 9090 linux/x86_64 Up /tidb-data/prometheus-9090 /tidb-deploy/prometheus-9090
10.10.10.207:4000 tidb 10.10.10.207 4000/10080 linux/x86_64 Up - /tidb-deploy/tidb-4000
10.10.10.208:4000 tidb 10.10.10.208 4000/10080 linux/x86_64 Up - /tidb-deploy/tidb-4000
10.10.10.209:4000 tidb 10.10.10.209 4000/10080 linux/x86_64 Up - /tidb-deploy/tidb-4000
10.10.10.207:9000 tiflash 10.10.10.207 9000/8123/3930/20170/20292/8234 linux/x86_64 Up /tiflash_data/tiflash-9000 /tidb-deploy/tiflash-9000
10.10.10.207:20160 tikv 10.10.10.207 20160/20180 linux/x86_64 Up /tikv_data/tikv-20160 /tidb-deploy/tikv-20160
10.10.10.208:20160 tikv 10.10.10.208 20160/20180 linux/x86_64 Up /tikv_data/tikv-20160 /tidb-deploy/tikv-20160
10.10.10.209:20160 tikv 10.10.10.209 20160/20180 linux/x86_64 Up /tikv_data/tikv-20160 /tidb-deploy/tikv-20160
Total nodes: 13

【Problem】 The current issue

【Business Impact】
The TiDB parameters cannot be modified.

【TiDB Version】
5.1


Which parameter exactly did you fail to modify via tiup cluster edit-config? How was the parameter set?

Has your problem been resolved?

I was planning to add the following parameters:

server_configs:
tidb:
log.level: warn
performance.gogc: 1000
performance.max-procs: 384

But I suspect this problem has little to do with the specific parameter values.
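(For reference: tiup cluster edit-config expects standard YAML indentation, and the forum stripped the leading spaces from the block above, so the intended layout would presumably be:)

```yaml
# Values taken from this thread; two-space YAML indentation as edit-config expects
server_configs:
  tidb:
    log.level: warn
    performance.gogc: 1000
    performance.max-procs: 384
```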

Was the format you used the same as in the screenshot below? I tested it and had no problem. Also, check whether tidb.log reports any errors while you reload tidb.

The errors in the log are as follows:

2021-07-26T10:45:28.759+0800 DEBUG TaskBegin {"task": "UpdateTopology: cluster=tidb-test"}
2021-07-26T10:45:38.760+0800 DEBUG TaskFinish {"task": "UpdateTopology: cluster=tidb-test", "error": "context deadline exceeded"}
2021-07-26T10:45:38.760+0800 INFO Execute command finished {"code": 1, "error": "context deadline exceeded", "errorVerbose": "context deadline exceeded\ngithub.com/pingcap/errors.AddStack\n\tgithub.com/pingcap/errors@v0.11.4/errors.go:174\ngithub.com/pingcap/errors.Trace\n\tgithub.com/pingcap/errors@v0.11.4/juju_adaptor.go:15\ngithub.com/pingcap/tiup/pkg/cluster/manager.(*Manager).Reload\n\tgithub.com/pingcap/tiup/pkg/cluster/manager/reload.go:121\ngithub.com/pingcap/tiup/components/cluster/command.newReloadCmd.func1\n\tgithub.com/pingcap/tiup/components/cluster/command/reload.go:40\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/cobra@v1.1.3/command.go:852\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/cobra@v1.1.3/command.go:960\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/cobra@v1.1.3/command.go:897\ngithub.com/pingcap/tiup/components/cluster/command.Execute\n\tgithub.com/pingcap/tiup/components/cluster/command/root.go:264\nmain.main\n\tgithub.com/pingcap/tiup/components/cluster/main.go:23\nruntime.main\n\truntime/proc.go:225\nruntime.goexit\n\truntime/asm_amd64.s:1371"}

I looked at the code at line 121 of reload.go:
tlsCfg, err := topo.TLSConfig(m.specManager.Path(name, spec.TLSCertKeyDir))
if err != nil {
    return err
}
Could it be related to the TLS configuration?

TLS is not enabled in a cluster by default, so it should not be related unless you explicitly enabled it. Also check that communication between the tiup control machine and the tidb nodes is normal, and whether there are any error messages in tidb.log.

Starting and stopping tidb nodes with tiup works fine. Does that show that communication between the tiup control machine and the tidb nodes is normal?
In the tidb log I can see the following errors:

[2021/07/29 07:52:07.921 +08:00] [INFO] [grpclogger.go:77] ["ClientConn switching balancer to \"pick_first\""] [system=grpc] [grpc_log=true]
[2021/07/29 07:52:25.942 +08:00] [INFO] [grpclogger.go:77] ["ccResolverWrapper: sending new addresses to cc: [{http://10.10.10.208:2379 0 } {http://10.10.10.209:2379 0 } {http://10.10.10.207:2379 0 }]"] [system=grpc] [grpc_log=true]
[2021/07/29 07:52:30.634 +08:00] [WARN] [grpclogger.go:85] ["grpc: Server.Serve failed to create ServerTransport: connection error: desc = \"transport: http2Server.HandleStreams failed to receive the preface from client: EOF\""] [system=grpc] [grpc_log=true]
[2021/07/29 07:52:35.014 +08:00] [INFO] [grpclogger.go:77] ["ccResolverWrapper: sending new addresses to cc: [{http://10.10.10.208:2379 0 } {http://10.10.10.209:2379 0 } {http://10.10.10.207:2379 0 }]"] [system=grpc] [grpc_log=true]
[2021/07/29 07:52:55.943 +08:00] [INFO] [grpclogger.go:77] ["ccResolverWrapper: sending new addresses to cc: [{http://10.10.10.208:2379 0 } {http://10.10.10.209:2379 0 } {http://10.10.10.207:2379 0 }]"] [system=grpc] [grpc_log=true]
[2021/07/29 07:53:00.635 +08:00] [WARN] [grpclogger.go:85] ["grpc: Server.Serve failed to create ServerTransport: connection error: desc = \"transport: http2Server.HandleStreams failed to receive the preface from client: EOF\""] [system=grpc] [grpc_log=true]
[2021/07/29 07:53:05.014 +08:00] [INFO] [grpclogger.go:77] ["ccResolverWrapper: sending new addresses to cc: [{http://10.10.10.208:2379 0 } {http://10.10.10.209:2379 0 } {http://10.10.10.207:2379 0 }]"] [system=grpc] [grpc_log=true]
[2021/07/29 07:53:25.945 +08:00] [INFO] [grpclogger.go:77] ["ccResolverWrapper: sending new addresses to cc: [{http://10.10.10.208:2379 0 } {http://10.10.10.209:2379 0 } {http://10.10.10.207:2379 0 }]"] [system=grpc] [grpc_log=true]
[2021/07/29 07:53:30.635 +08:00] [WARN] [grpclogger.go:85] ["grpc: Server.Serve failed to create ServerTransport: connection error: desc = \"transport: http2Server.HandleStreams failed to receive the preface from client: EOF\""] [system=grpc] [grpc_log=true]
[2021/07/29 07:53:35.016 +08:00] [INFO] [grpclogger.go:77] ["ccResolverWrapper: sending new addresses to cc: [{http://10.10.10.208:2379 0 } {http://10.10.10.209:2379 0 } {http://10.10.10.207:2379 0 }]"] [system=grpc] [grpc_log=true]
[2021/07/29 07:53:55.946 +08:00] [INFO] [grpclogger.go:77] ["ccResolverWrapper: sending new addresses to cc: [{http://10.10.10.208:2379 0 } {http://10.10.10.209:2379 0 } {http://10.10.10.207:2379 0 }]"] [system=grpc] [grpc_log=true]
[2021/07/29 07:54:00.635 +08:00] [WARN] [grpclogger.go:85] ["grpc: Server.Serve failed to create ServerTransport: connection error: desc = \"transport: http2Server.HandleStreams failed to receive the preface from client: EOF\""] [system=grpc] [grpc_log=true]
[2021/07/29 07:54:05.017 +08:00] [INFO] [grpclogger.go:77] ["ccResolverWrapper: sending new addresses to cc: [{http://10.10.10.208:2379 0 } {http://10.10.10.209:2379 0 } {http://10.10.10.207:2379 0 }]"] [system=grpc] [grpc_log=true]
[2021/07/29 07:54:25.948 +08:00] [INFO] [grpclogger.go:77] ["ccResolverWrapper: sending new addresses to cc: [{http://10.10.10.208:2379 0 } {http://10.10.10.209:2379 0 } {http://10.10.10.207:2379 0 }]"] [system=grpc] [grpc_log=true]
[2021/07/29 07:54:30.634 +08:00] [WARN] [grpclogger.go:85] ["grpc: Server.Serve failed to create ServerTransport: connection error: desc = \"transport: http2Server.HandleStreams failed to receive the preface from client: EOF\""] [system=grpc] [grpc_log=true]
[2021/07/29 07:54:35.017 +08:00] [INFO] [grpclogger.go:77] ["ccResolverWrapper: sending new addresses to cc: [{http://10.10.10.208:2379 0 } {http://10.10.10.209:2379 0 } {http://10.10.10.207:2379 0 }]"] [system=grpc] [grpc_log=true]

It's probably not a parameter problem; the same change succeeded on one of my v5.0.2 test environments.
(screenshot)

Yes, I also don't think it's a parameter problem. I want to test the performance impact of these two parameters. Is there any way to make them take effect without using tiup reload?

This looks like an error during startup; reload is failing at that same startup stage. My understanding is that a restart should apply the change, but in your case it apparently won't even restart.

Startup is fine and all nodes are normal. The problem only appears when I reload after modifying the configuration.

Then you can try directly modifying the configuration file on the corresponding nodes and restarting them.

Do you mean the /tidb-deploy/tidb-4000/conf/tidb.toml file, or some other file? That file contains the following notice, so I'm worried my edits won't take effect:

WARNING: This file is auto-generated. Do not edit! All your modification will be overwritten!

You can use 'tiup cluster edit-config' and 'tiup cluster reload' to update the configuration

All configuration items you want to change can be added to:

/data/tidb/.tiup/storage/cluster/clusters/cluster-name/meta.yaml
Edit that file in addition to this one.
This approach is not recommended, but if it's a test environment and you only need to test parameters, it's acceptable, since your reload failure probably can't be diagnosed in a short time.
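(For reference: the generated tidb.toml is in TOML format rather than YAML, so the dotted keys under server_configs map to TOML sections. A sketch using the parameters from this thread, assuming your TiDB version accepts these items; this bypasses tiup and is not a recommended long-term approach:)

```toml
# Equivalent of log.level / performance.gogc / performance.max-procs in tidb.toml
[log]
level = "warn"

[performance]
gogc = 1000
max-procs = 384
```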

OK. I checked, and meta.yaml has already been updated. Should tidb.toml also be modified in the format below? Also, how can I verify that the parameter changes took effect?
server_configs:
tidb:
log.level: warn
performance.gogc: 500
performance.max-procs: 384

I was just about to say that your format doesn't look right. It should be written like mine, with leading spaces. After you edit it, if edit-config shows the same result as your modification, it succeeded.

My edit did have the spaces, but they were stripped when I pasted it here. The actual format looks like this:
(screenshot)

I see, then there's no problem. Also double-check that the colons are ASCII (half-width) characters. You can try my suggested approach first and see.

OK, it's done now. Modifying the tidb.toml file in this format works; if I instead follow the meta.yaml format, tidb fails to start. One last question: how do I tell whether the modified parameters have taken effect?
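(One possible way to check the effective values, as a sketch: TiDB v4.0 and later support the SHOW CONFIG statement, so you can query the running instances from any SQL client:)

```sql
-- Show the effective configuration of every tidb instance in the cluster
SHOW CONFIG WHERE type = 'tidb' AND name IN ('log.level', 'performance.max-procs');
```

If a value does not show up as expected after the restart, the edit presumably did not take effect.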