tiup reload fails to apply configuration changes

【TiDB Environment】

【Overview】 Scenario + problem summary
After modifying the TiDB configuration with tiup cluster edit-config, reloading it with tiup cluster reload tidb-test -R tidb fails.

【Background】 Operations performed

【Symptoms】 Business and database symptoms

The error message returned is as follows:

  • [ Serial ] - UpdateTopology: cluster=tidb-test
    {"level":"warn","ts":"2021-07-26T10:45:38.760+0800","logger":"etcd-client","caller":"v3@v3.5.0/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000497dc0/#initially=[10.10.10.207:2379;10.10.10.208:2379;10.10.10.209:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection closed"}

Error: context deadline exceeded

Verbose debug logs has been written to /root/.tiup/logs/tiup-cluster-debug-2021-07-26-10-45-43.log.
Error: run /root/.tiup/components/cluster/v1.5.2/tiup-cluster (wd:/root/.tiup/data/SeEahiP) failed: exit status 1

The log file tiup-cluster-debug-2021-07-26-10-45-43.log contains the following error messages:
2021-07-26T10:45:28.759+0800 DEBUG TaskBegin {"task": "UpdateTopology: cluster=tidb-test"}
2021-07-26T10:45:38.760+0800 DEBUG TaskFinish {"task": "UpdateTopology: cluster=tidb-test", "error": "context deadline exceeded"}
2021-07-26T10:45:38.760+0800 INFO Execute command finished {"code": 1, "error": "context deadline exceeded", "errorVerbose": "context deadline exceeded\ngithub.com/pingcap/errors.AddStack\n\tgithub.com/pingcap/errors@v0.11.4/errors.go:174\ngithub.com/pingcap/errors.Trace\n\tgithub.com/pingcap/errors@v0.11.4/juju_adaptor.go:15\ngithub.com/pingcap/tiup/pkg/cluster/manager.(*Manager).Reload\n\tgithub.com/pingcap/tiup/pkg/cluster/manager/reload.go:121\ngithub.com/pingcap/tiup/components/cluster/command.newReloadCmd.func1\n\tgithub.com/pingcap/tiup/components/cluster/command/reload.go:40\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/cobra@v1.1.3/command.go:852\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/cobra@v1.1.3/command.go:960\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/cobra@v1.1.3/command.go:897\ngithub.com/pingcap/tiup/components/cluster/command.Execute\n\tgithub.com/pingcap/tiup/components/cluster/command/root.go:264\nmain.main\n\tgithub.com/pingcap/tiup/components/cluster/main.go:23\nruntime.main\n\truntime/proc.go:225\nruntime.goexit\n\truntime/asm_amd64.s:1371"}

The output of tiup cluster display is:
Starting component cluster: /root/.tiup/components/cluster/v1.5.2/tiup-cluster display tidb-test
Cluster type: tidb
Cluster name: tidb-test
Cluster version: v5.1.0
Deploy user: tidb
SSH type: builtin
Dashboard URL: http://10.10.10.209:2379/dashboard
ID Role Host Ports OS/Arch Status Data Dir Deploy Dir
-- ---- ---- ----- ------- ------ -------- ----------
10.10.10.207:9093 alertmanager 10.10.10.207 9093/9094 linux/x86_64 Up /tidb-data/alertmanager-9093 /tidb-deploy/alertmanager-9093
10.10.10.207:3000 grafana 10.10.10.207 3000 linux/x86_64 Up - /tidb-deploy/grafana-3000
10.10.10.207:2379 pd 10.10.10.207 2379/2380 linux/x86_64 Up /tidb-data/pd-2379 /tidb-deploy/pd-2379
10.10.10.208:2379 pd 10.10.10.208 2379/2380 linux/x86_64 Up /tidb-data/pd-2379 /tidb-deploy/pd-2379
10.10.10.209:2379 pd 10.10.10.209 2379/2380 linux/x86_64 Up|L|UI /tidb-data/pd-2379 /tidb-deploy/pd-2379
10.10.10.207:9090 prometheus 10.10.10.207 9090 linux/x86_64 Up /tidb-data/prometheus-9090 /tidb-deploy/prometheus-9090
10.10.10.207:4000 tidb 10.10.10.207 4000/10080 linux/x86_64 Up - /tidb-deploy/tidb-4000
10.10.10.208:4000 tidb 10.10.10.208 4000/10080 linux/x86_64 Up - /tidb-deploy/tidb-4000
10.10.10.209:4000 tidb 10.10.10.209 4000/10080 linux/x86_64 Up - /tidb-deploy/tidb-4000
10.10.10.207:9000 tiflash 10.10.10.207 9000/8123/3930/20170/20292/8234 linux/x86_64 Up /tiflash_data/tiflash-9000 /tidb-deploy/tiflash-9000
10.10.10.207:20160 tikv 10.10.10.207 20160/20180 linux/x86_64 Up /tikv_data/tikv-20160 /tidb-deploy/tikv-20160
10.10.10.208:20160 tikv 10.10.10.208 20160/20180 linux/x86_64 Up /tikv_data/tikv-20160 /tidb-deploy/tikv-20160
10.10.10.209:20160 tikv 10.10.10.209 20160/20180 linux/x86_64 Up /tikv_data/tikv-20160 /tidb-deploy/tikv-20160
Total nodes: 13

【Problem】 The current issue

【Business Impact】
The TiDB parameters cannot be modified.

【TiDB Version】
5.1


Which parameter exactly did you fail to modify via tiup cluster edit-config? How was the parameter set?

Has your problem been resolved?

I was planning to add the following parameters:

server_configs:
tidb:
log.level: warn
performance.gogc: 1000
performance.max-procs: 384

But I suspect this problem has little to do with the specific parameter values.
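(For reference: tiup cluster edit-config expects standard YAML indentation, and the forum stripped the leading spaces from the block above, so the intended layout would presumably be:)

```yaml
# Values taken from this thread; two-space YAML indentation as edit-config expects
server_configs:
  tidb:
    log.level: warn
    performance.gogc: 1000
    performance.max-procs: 384
```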

Was the format you used the same as in the screenshot below? I tested it and had no problem. Also, check whether tidb.log reports any errors while you reload tidb.

The errors in the log are as follows:

2021-07-26T10:45:28.759+0800 DEBUG TaskBegin {"task": "UpdateTopology: cluster=tidb-test"}
2021-07-26T10:45:38.760+0800 DEBUG TaskFinish {"task": "UpdateTopology: cluster=tidb-test", "error": "context deadline exceeded"}
2021-07-26T10:45:38.760+0800 INFO Execute command finished {"code": 1, "error": "context deadline exceeded", "errorVerbose": "context deadline exceeded\ngithub.com/pingcap/errors.AddStack\n\tgithub.com/pingcap/errors@v0.11.4/errors.go:174\ngithub.com/pingcap/errors.Trace\n\tgithub.com/pingcap/errors@v0.11.4/juju_adaptor.go:15\ngithub.com/pingcap/tiup/pkg/cluster/manager.(*Manager).Reload\n\tgithub.com/pingcap/tiup/pkg/cluster/manager/reload.go:121\ngithub.com/pingcap/tiup/components/cluster/command.newReloadCmd.func1\n\tgithub.com/pingcap/tiup/components/cluster/command/reload.go:40\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/cobra@v1.1.3/command.go:852\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/cobra@v1.1.3/command.go:960\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/cobra@v1.1.3/command.go:897\ngithub.com/pingcap/tiup/components/cluster/command.Execute\n\tgithub.com/pingcap/tiup/components/cluster/command/root.go:264\nmain.main\n\tgithub.com/pingcap/tiup/components/cluster/main.go:23\nruntime.main\n\truntime/proc.go:225\nruntime.goexit\n\truntime/asm_amd64.s:1371"}

I looked at the code at line 121 of reload.go:
tlsCfg, err := topo.TLSConfig(m.specManager.Path(name, spec.TLSCertKeyDir))
if err != nil {
    return err
}
Could it be related to the TLS configuration?

TLS is not enabled in a cluster by default, so it should not be related unless you explicitly enabled it. Also check that communication between the tiup control machine and the tidb nodes is normal, and whether there are any error messages in tidb.log.

Starting and stopping tidb nodes with tiup works fine. Does that show that communication between the tiup control machine and the tidb nodes is normal?
In the tidb log I can see the following errors:

[2021/07/29 07:52:07.921 +08:00] [INFO] [grpclogger.go:77] ["ClientConn switching balancer to \"pick_first\""] [system=grpc] [grpc_log=true]
[2021/07/29 07:52:25.942 +08:00] [INFO] [grpclogger.go:77] ["ccResolverWrapper: sending new addresses to cc: [{http://10.10.10.208:2379 0 } {http://10.10.10.209:2379 0 } {http://10.10.10.207:2379 0 }]"] [system=grpc] [grpc_log=true]
[2021/07/29 07:52:30.634 +08:00] [WARN] [grpclogger.go:85] ["grpc: Server.Serve failed to create ServerTransport: connection error: desc = \"transport: http2Server.HandleStreams failed to receive the preface from client: EOF\""] [system=grpc] [grpc_log=true]
[2021/07/29 07:52:35.014 +08:00] [INFO] [grpclogger.go:77] ["ccResolverWrapper: sending new addresses to cc: [{http://10.10.10.208:2379 0 } {http://10.10.10.209:2379 0 } {http://10.10.10.207:2379 0 }]"] [system=grpc] [grpc_log=true]
[2021/07/29 07:52:55.943 +08:00] [INFO] [grpclogger.go:77] ["ccResolverWrapper: sending new addresses to cc: [{http://10.10.10.208:2379 0 } {http://10.10.10.209:2379 0 } {http://10.10.10.207:2379 0 }]"] [system=grpc] [grpc_log=true]
[2021/07/29 07:53:00.635 +08:00] [WARN] [grpclogger.go:85] ["grpc: Server.Serve failed to create ServerTransport: connection error: desc = \"transport: http2Server.HandleStreams failed to receive the preface from client: EOF\""] [system=grpc] [grpc_log=true]
[2021/07/29 07:53:05.014 +08:00] [INFO] [grpclogger.go:77] ["ccResolverWrapper: sending new addresses to cc: [{http://10.10.10.208:2379 0 } {http://10.10.10.209:2379 0 } {http://10.10.10.207:2379 0 }]"] [system=grpc] [grpc_log=true]
[2021/07/29 07:53:25.945 +08:00] [INFO] [grpclogger.go:77] ["ccResolverWrapper: sending new addresses to cc: [{http://10.10.10.208:2379 0 } {http://10.10.10.209:2379 0 } {http://10.10.10.207:2379 0 }]"] [system=grpc] [grpc_log=true]
[2021/07/29 07:53:30.635 +08:00] [WARN] [grpclogger.go:85] ["grpc: Server.Serve failed to create ServerTransport: connection error: desc = \"transport: http2Server.HandleStreams failed to receive the preface from client: EOF\""] [system=grpc] [grpc_log=true]
[2021/07/29 07:53:35.016 +08:00] [INFO] [grpclogger.go:77] ["ccResolverWrapper: sending new addresses to cc: [{http://10.10.10.208:2379 0 } {http://10.10.10.209:2379 0 } {http://10.10.10.207:2379 0 }]"] [system=grpc] [grpc_log=true]
[2021/07/29 07:53:55.946 +08:00] [INFO] [grpclogger.go:77] ["ccResolverWrapper: sending new addresses to cc: [{http://10.10.10.208:2379 0 } {http://10.10.10.209:2379 0 } {http://10.10.10.207:2379 0 }]"] [system=grpc] [grpc_log=true]
[2021/07/29 07:54:00.635 +08:00] [WARN] [grpclogger.go:85] ["grpc: Server.Serve failed to create ServerTransport: connection error: desc = \"transport: http2Server.HandleStreams failed to receive the preface from client: EOF\""] [system=grpc] [grpc_log=true]
[2021/07/29 07:54:05.017 +08:00] [INFO] [grpclogger.go:77] ["ccResolverWrapper: sending new addresses to cc: [{http://10.10.10.208:2379 0 } {http://10.10.10.209:2379 0 } {http://10.10.10.207:2379 0 }]"] [system=grpc] [grpc_log=true]
[2021/07/29 07:54:25.948 +08:00] [INFO] [grpclogger.go:77] ["ccResolverWrapper: sending new addresses to cc: [{http://10.10.10.208:2379 0 } {http://10.10.10.209:2379 0 } {http://10.10.10.207:2379 0 }]"] [system=grpc] [grpc_log=true]
[2021/07/29 07:54:30.634 +08:00] [WARN] [grpclogger.go:85] ["grpc: Server.Serve failed to create ServerTransport: connection error: desc = \"transport: http2Server.HandleStreams failed to receive the preface from client: EOF\""] [system=grpc] [grpc_log=true]
[2021/07/29 07:54:35.017 +08:00] [INFO] [grpclogger.go:77] ["ccResolverWrapper: sending new addresses to cc: [{http://10.10.10.208:2379 0 } {http://10.10.10.209:2379 0 } {http://10.10.10.207:2379 0 }]"] [system=grpc] [grpc_log=true]

It's probably not a parameter problem; the same change succeeded on one of my v5.0.2 test environments.
(screenshot)

Yes, I also don't think it's a parameter problem. I want to test the performance impact of these two parameters. Is there any way to make them take effect without using tiup reload?

This looks like an error during startup; reload is failing at that same startup stage. My understanding is that a restart should apply the change, but in your case it apparently won't even restart.

Startup is fine and all nodes are normal. The problem only appears when I reload after modifying the configuration.

Then you can try directly modifying the configuration file on the corresponding nodes and restarting them.

Do you mean the /tidb-deploy/tidb-4000/conf/tidb.toml file, or some other file? That file contains the following notice, so I'm worried my edits won't take effect:

WARNING: This file is auto-generated. Do not edit! All your modification will be overwritten!

You can use 'tiup cluster edit-config' and 'tiup cluster reload' to update the configuration

All configuration items you want to change can be added to:

/data/tidb/.tiup/storage/cluster/clusters/cluster-name/meta.yaml
Edit that file in addition to this one.
This approach is not recommended, but if it's a test environment and you only need to test parameters, it's acceptable, since your reload failure probably can't be diagnosed in a short time.
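(For reference: the generated tidb.toml is in TOML format rather than YAML, so the dotted keys under server_configs map to TOML sections. A sketch using the parameters from this thread, assuming your TiDB version accepts these items; this bypasses tiup and is not a recommended long-term approach:)

```toml
# Equivalent of log.level / performance.gogc / performance.max-procs in tidb.toml
[log]
level = "warn"

[performance]
gogc = 1000
max-procs = 384
```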

OK. I checked, and meta.yaml has already been updated. Should tidb.toml also be modified in the format below? Also, how can I verify that the parameter changes took effect?
server_configs:
tidb:
log.level: warn
performance.gogc: 500
performance.max-procs: 384

I was just about to say that your format doesn't look right. It should be written like mine, with leading spaces. After you edit it, if edit-config shows the same result as your modification, it succeeded.

My edit did have the spaces, but they were stripped when I pasted it here. The actual format looks like this:
(screenshot)

I see, then there's no problem. Also double-check that the colons are ASCII (half-width) characters. You can try my suggested approach first and see.

OK, it's done now. Modifying the tidb.toml file in this format works; if I instead follow the meta.yaml format, tidb fails to start. One last question: how do I tell whether the modified parameters have taken effect?
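(One possible way to check the effective values, as a sketch: TiDB v4.0 and later support the SHOW CONFIG statement, so you can query the running instances from any SQL client:)

```sql
-- Show the effective configuration of every tidb instance in the cluster
SHOW CONFIG WHERE type = 'tidb' AND name IN ('log.level', 'performance.max-procs');
```

If a value does not show up as expected after the restart, the edit presumably did not take effect.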