Hi consultant,
Version: v4.0.8
Node roles:
172.31.13.101 TiDB/PD
172.31.13.102 TiKV
172.31.13.103 TiKV
172.31.13.104 TiKV
172.31.13.105 Grafana, Prometheus, Alertmanager
While upgrading to v4.0.9, the process failed because the following parameters were not recognized:
config file conf/tidb.toml contained unknown configuration options: performance.memory-usage-alarm-ratio, performance.server-memory-quota
So via tiup cluster edit-config tidbcluster
we removed those two parameters and then restarted the tidb nodes:
tiup cluster restart tidbcluster -R tidb
After that, the TiDB role failed to start with the following errors.
Error messages in /data/tidb-deploy/tidb-4000/log/tidb.log:
[2020/12/24 15:23:45.286 +08:00] [ERROR] [http_status.go:354] ["start status/rpc server error"] [error="accept tcp [::]:10080: use of closed network connection"]
[2020/12/24 15:23:45.286 +08:00] [ERROR] [http_status.go:344] ["grpc server error"] [error="mux: listener closed"]
[2020/12/24 15:23:45.286 +08:00] [ERROR] [http_status.go:349] ["http server error"] [error="http: Server closed"]
Error messages in /data/tidb-deploy/tidb-4000/log/tidb_stderr.log:
The contents of tidb.toml are as follows:
[log]
slow-query-file = "tidb-slow-overwrited.log"
slow-threshold = 300
[log.file]
max-days = 7
[tikv-client]
[tikv-client.copr-cache]
admission-max-result-mb = 10
admission-min-process-ms = 5
capacity-mb = 1000
北京大爷:
Judging from the error message, the tidb.toml on the corresponding server
failed the check-config step.
You can log in to that tidb server and inspect its tidb.toml.
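For example, the same check that tiup performs can be run by hand on that node (a sketch; the paths follow the deploy layout in this thread, so adjust them to your own):

/data/tidb-deploy/tidb-4000/bin/tidb-server --config-check --config=/data/tidb-deploy/tidb-4000/conf/tidb.toml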
Also, the error message you posted is from tidb 4.0.8. Is this the same issue as the 4.0.9 upgrade?
Hi 北京大爷,
[Since this environment is migrating real production data, please help us see whether it can be repaired. Thanks!]
Node roles:
172.31.13.101 TiDB/PD
172.31.13.102 TiKV
172.31.13.103 TiKV
172.31.13.104 TiKV
172.31.13.105 TiFlash, Grafana, Prometheus, Alertmanager
172.31.13.106 TiSpark
Our deployment architecture is shown below (diagram omitted).
We were originally running TiDB v4.0.8.
Earlier, to deal with query-induced OOM, we tried two ways of adding the following parameters:
performance.memory-usage-alarm-ratio: 0.8
performance.server-memory-quota: 34359738368
- via tiup cluster edit-config tidbcluster
- by editing tidb.toml directly
We then loaded the settings with tiup cluster reload tidbcluster -R tidb.
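For the edit-config route, the two parameters would sit under server_configs.tidb in the topology that edit-config opens, roughly like this (a sketch using the values above):

server_configs:
  tidb:
    performance.memory-usage-alarm-ratio: 0.8
    performance.server-memory-quota: 34359738368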
Yesterday, seeing that the new v4.0.9 was available, we started the upgrade:
tiup cluster upgrade tidbcluster v4.0.9
During execution it reported the following error:
config file conf/tidb.toml contained unknown options: memory-usage-alarm-ratio, server-memory-quota
So we removed those two parameters via edit-config and restarted the tidb nodes,
after which TiDB failed to start. The log messages are as follows.
Error messages in /data/tidb-deploy/tidb-4000/log/tidb.log:
[2020/12/24 15:23:45.286 +08:00] [ERROR] [http_status.go:354] ["start status/rpc server error"] [error="accept tcp [::]:10080: use of closed network connection"]
[2020/12/24 15:23:45.286 +08:00] [ERROR] [http_status.go:344] ["grpc server error"] [error="mux: listener closed"]
[2020/12/24 15:23:45.286 +08:00] [ERROR] [http_status.go:349] ["http server error"] [error="http: Server closed"]
In the end we ran out of ideas, so we removed the tidb node first, figuring we could scale-out a clean tidb node back into the cluster. That is when the following error appeared:
Error: executor.ssh.execute_failed: Failed to execute command over SSH for 'tidb@172.31.13.101:22' {ssh_stderr: , ssh_stdout: [2020/12/24 17:34:25.670 +08:00] [FATAL] [terror.go:257] ["unexpected error"] [error="toml: cannot load TOML value of type int64 into a Go float"] [stack="github.com/pingcap/parser/terror.MustNil
\t/home/jenkins/agent/workspace/tidb_v4.0.8/go/pkg/mod/github.com/pingcap/parser@v0.0.0-20201022083903-fbe80b0c40bb/terror/terror.go:257
github.com/pingcap/tidb/config.InitializeConfig
\t/home/jenkins/agent/workspace/tidb_v4.0.8/go/src/github.com/pingcap/tidb/config/config.go:759
main.main
\t/home/jenkins/agent/workspace/tidb_v4.0.8/go/src/github.com/pingcap/tidb/tidb-server/main.go:165
runtime.main
\t/usr/local/go/src/runtime/proc.go:203"] [stack="github.com/pingcap/parser/terror.MustNil
\t/home/jenkins/agent/workspace/tidb_v4.0.8/go/pkg/mod/github.com/pingcap/parser@v0.0.0-20201022083903-fbe80b0c40bb/terror/terror.go:257
github.com/pingcap/tidb/config.InitializeConfig
\t/home/jenkins/agent/workspace/tidb_v4.0.8/go/src/github.com/pingcap/tidb/config/config.go:759
main.main
\t/home/jenkins/agent/workspace/tidb_v4.0.8/go/src/github.com/pingcap/tidb/tidb-server/main.go:165
runtime.main
\t/usr/local/go/src/runtime/proc.go:203"]
, ssh_command: export LANG=C; PATH=$PATH:/usr/bin:/usr/sbin /data/tidb-deploy/tidb-4000/bin/tidb-server --config-check --config=/data/tidb-deploy/tidb-4000/conf/tidb.toml }, cause: Process exited with status 1: check config failed
The contents of tidb-scale-out.yaml are as follows:
tidb_servers:
  - host: 172.31.13.101
    ssh_port: 22
    port: 4000
    status_port: 10080
    deploy_dir: "/data/tidb-deploy/tidb-4000"
    log_dir: "/data/tidb-deploy/tidb-4000/log"
    #numa_node: "0,1"
    # The following configs are used to overwrite the `server_configs.tidb` values.
    config:
      log.slow-threshold: 300
      log.slow-query-file: "tidb-slow-overwrited.log"
The contents of tidb.toml are below; we really cannot tell which parameter has the type-conversion problem:
[log]
slow-query-file = "tidb-slow-overwrited.log"
slow-threshold = 300
[log.file]
max-days = 7
[tikv-client]
[tikv-client.copr-cache]
admission-max-result-mb = 10
admission-min-process-ms = 5
capacity-mb = 1000
@北京大爷
Hi consultant,
We found the problem later: these two parameters need to be floats, but TiUP scale-out xxx.yaml currently sets the
default values as ints, which triggers [error="toml: cannot load TOML value of type int64 into a Go float"].
Could you please look into whether this can be fixed on your side?
Parameter description from the official docs:
[log]
slow-query-file = "tidb-slow-overwrited.log"
slow-threshold = 300
[log.file]
max-days = 7
[tikv-client]
[tikv-client.copr-cache]
admission-max-result-mb = 10 => 10.0
admission-min-process-ms = 5
capacity-mb = 1000 => 1000.0
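The failure is easy to reproduce with tidb-server's built-in config checker (a minimal sketch; /tmp/copr-cache-check.toml is a hypothetical scratch file, and the binary path follows the deploy layout above):

cat > /tmp/copr-cache-check.toml <<'EOF'
[tikv-client.copr-cache]
capacity-mb = 1000
EOF
/data/tidb-deploy/tidb-4000/bin/tidb-server --config-check --config=/tmp/copr-cache-check.toml
# fails with: toml: cannot load TOML value of type int64 into a Go float
# with capacity-mb = 1000.0 the check passes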
Solution:
In tidb-scale-out.yaml, specify these two parameter values as floats,
and change the ports to 4001 & 10081 first,
otherwise a port-conflict error is thrown (see the reply below):
tidb_servers:
  - host: 172.31.13.101
    ssh_port: 22
    port: 4001                                   # changed from 4000
    status_port: 10081                           # changed from 10080
    deploy_dir: "/data/tidb-deploy/tidb-4001"    # changed
    log_dir: "/data/tidb-deploy/tidb-4001/log"   # changed
    #numa_node: "0,1"
    # The following configs are used to overwrite the `server_configs.tidb` values.
    config:
      log.slow-threshold: 300
      log.slow-query-file: "tidb-slow-overwrited.log"
      tikv-client.copr-cache.admission-max-result-mb: 10.0
      tikv-client.copr-cache.capacity-mb: 1000.0
tiup cluster scale-out tidbcluster tidb-scale-out.yaml
With that, the TiDB node scaled out successfully (ports 4001, 10081).
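The new node's status can then be confirmed with:

tiup cluster display tidbcluster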
To keep the original port 4000, we then had to write another scale-out yaml using port 4000, and remove the TiDB node on port 4001, as sketched below.
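Something like this (tidb-scale-out-4000.yaml is a hypothetical second topology file pointing back at port 4000/10080 and the original deploy_dir):

tiup cluster scale-out tidbcluster tidb-scale-out-4000.yaml
tiup cluster scale-in tidbcluster -N 172.31.13.101:4001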
Finally we ran the upgrade command again:
tiup cluster upgrade tidbcluster v4.0.9
It completed successfully (screenshot omitted).
北京大爷:
Regarding port conflict detection:
TiUP checks IPs and ports against the metadata it stores. If that metadata already contains a deployment with the same IP and port, subsequent scale-out operations are blocked.
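That metadata lives under the TiUP home, so the recorded ports can be inspected directly (a sketch, assuming the default TiUP storage path):

grep -n 'port' ~/.tiup/storage/cluster/clusters/tidbcluster/meta.yaml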
As for the float issue, I will double-check it on our side.
Thanks for the feedback.
来了老弟:
The documentation issue has been reported; please follow it here: https://github.com/pingcap/docs-cn/pull/5182
system:
This topic was automatically closed 1 minute after the last reply. New replies are no longer allowed.