TiUP deployment fails ([FATAL] [lib.rs:483] ["couldn't find the OPTIONS file"]) - v4.0.6

Hi TiDB consultants,

 I configured TiKV according to the parameters in the official TiKV configuration file reference below (the yaml file is attached):
 https://docs.pingcap.com/zh/tidb/stable/tikv-configuration-file

The TiUP deployment completes normally as shown below, but TiKV fails to start when I start the cluster. The error message in its log is:

[FATAL] [lib.rs:483] ["couldn't find the OPTIONS file"]

[tidb@tidb01 ~]$ tiup cluster deploy tidb-test v4.0.6 ./topology.yaml --user tidb -i /home/tidb/.ssh/id_rsa
Starting component cluster: /home/tidb/.tiup/components/cluster/v1.1.2/tiup-cluster deploy tidb-test v4.0.6 ./topology.yaml --user tidb -i /home/tidb/.ssh/id_rsa
Please confirm your topology:
tidb Cluster: tidb-test
tidb Version: v4.0.6
Type Host Ports OS/Arch Directories


pd 192.168.33.11 2379/2380 linux/x86_64 /tidb-deploy/pd-2379,/tidb-data/pd-2379
pd 192.168.33.12 2379/2380 linux/x86_64 /tidb-deploy/pd-2379,/tidb-data/pd-2379
pd 192.168.33.13 2379/2380 linux/x86_64 /tidb-deploy/pd-2379,/tidb-data/pd-2379
tikv 192.168.33.11 20160/20180 linux/x86_64 /tidb-deploy/tikv-20160,/tidb-data/tikv-20160
tikv 192.168.33.12 20160/20180 linux/x86_64 /tidb-deploy/tikv-20160,/tidb-data/tikv-20160
tikv 192.168.33.13 20160/20180 linux/x86_64 /tidb-deploy/tikv-20160,/tidb-data/tikv-20160
tidb 192.168.33.11 4000/10080 linux/x86_64 /tidb-deploy/tidb-4000
tidb 192.168.33.12 4000/10080 linux/x86_64 /tidb-deploy/tidb-4000
tidb 192.168.33.13 4000/10080 linux/x86_64 /tidb-deploy/tidb-4000
tiflash 192.168.33.14 9000/8123/3930/20170/20292/8234 linux/x86_64 /tidb-deploy/tiflash-9000,/tidb-data/tiflash-9000
prometheus 192.168.33.14 9090 linux/x86_64 /tidb-deploy/prometheus-8249,/tidb-data/prometheus-8249
grafana 192.168.33.14 3000 linux/x86_64 /tidb-deploy/grafana-3000
alertmanager 192.168.33.14 9093/9094 linux/x86_64 /tidb-deploy/alertmanager-9093,/tidb-data/alertmanager-9093
Attention:
1. If the topology is not what you expected, check your yaml file.
2. Please confirm there is no port/directory conflicts in same host.
Do you want to continue? [y/N]: y

  • Generate SSH keys … Done
  • Download TiDB components
    • Download pd:v4.0.6 (linux/amd64) … Done
    • Download tikv:v4.0.6 (linux/amd64) … Done
    • Download tidb:v4.0.6 (linux/amd64) … Done
    • Download tiflash:v4.0.6 (linux/amd64) … Done
    • Download prometheus:v4.0.6 (linux/amd64) … Done
    • Download grafana:v4.0.6 (linux/amd64) … Done
    • Download alertmanager:v0.17.0 (linux/amd64) … Done
    • Download node_exporter:v0.17.0 (linux/amd64) … Done
    • Download blackbox_exporter:v0.12.0 (linux/amd64) … Done
  • Initialize target host environments
    • Prepare 192.168.33.11:22 … Done
    • Prepare 192.168.33.12:22 … Done
    • Prepare 192.168.33.13:22 … Done
    • Prepare 192.168.33.14:22 … Done
  • Copy files
    • Copy pd → 192.168.33.11 … Done
    • Copy pd → 192.168.33.12 … Done
    • Copy pd → 192.168.33.13 … Done
    • Copy tikv → 192.168.33.11 … Done
    • Copy tikv → 192.168.33.12 … Done
    • Copy tikv → 192.168.33.13 … Done
    • Copy tidb → 192.168.33.11 … Done
    • Copy tidb → 192.168.33.12 … Done
    • Copy tidb → 192.168.33.13 … Done
    • Copy tiflash → 192.168.33.14 … Done
    • Copy prometheus → 192.168.33.14 … Done
    • Copy grafana → 192.168.33.14 … Done
    • Copy alertmanager → 192.168.33.14 … Done
    • Copy node_exporter → 192.168.33.11 … Done
    • Copy node_exporter → 192.168.33.12 … Done
    • Copy node_exporter → 192.168.33.13 … Done
    • Copy node_exporter → 192.168.33.14 … Done
    • Copy blackbox_exporter → 192.168.33.11 … Done
    • Copy blackbox_exporter → 192.168.33.12 … Done
    • Copy blackbox_exporter → 192.168.33.13 … Done
    • Copy blackbox_exporter → 192.168.33.14 … Done
  • Check status
    Deployed cluster tidb-test successfully, you can start the cluster via tiup cluster start tidb-test

tiup cluster start tidb-test

Error: failed to start tikv: 	tikv 192.168.33.11:20160 failed to start: timed out waiting for port 20160 to be started after 2m0s, please check the log of the instance: timed out waiting for port 20160 to be started after 2m0s

The TiKV log shows the following:

[2020/09/29 02:17:20.280 +00:00] [FATAL] [lib.rs:483] ["couldn't find the OPTIONS file"] [backtrace="stack backtrace:
0: tikv_util::set_panic_hook::{{closure}}
at components/tikv_util/src/lib.rs:482
1: std::panicking::rust_panic_with_hook
at src/libstd/panicking.rs:475
2: std::panicking::begin_panic
at /rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/libstd/panicking.rs:404
3: engine::rocks::util::new_engine_opt::{{closure}}
at components/engine/src/rocks/util/mod.rs:159
core::option::Option::unwrap_or_else
at /rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/libcore/option.rs:422
engine::rocks::util::new_engine_opt
at components/engine/src/rocks/util/mod.rs:157
4: cmd::server::TiKVServer::init_engines
at cmd/src/server.rs:356
cmd::server::run_tikv
at cmd/src/server.rs:98
5: tikv_server::main
at cmd/src/bin/tikv-server.rs:166
6: std::rt::lang_start::{{closure}}
at /rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/libstd/rt.rs:67
7: main
8: __libc_start_main
9:
"] [location=components/engine/src/rocks/util/mod.rs:159] [thread_name=main]

topology.yaml (52.8 KB)

Hello,
Based on your configuration file, please note the following:

  1. After setting the TiKV labels, you also need to add the location-labels setting to the PD configuration.
    Regarding the TiKV startup failure:
  2. Could you provide the complete tikv-deploy-dir/log/* for tikv 192.168.33.11:20160? We need the full context around this error.

Hi consultant,

  1. My TiKV nodes are 192.168.33.11, 192.168.33.12, and 192.168.33.13.
    Could you advise how location-labels should be configured in the PD configuration file?

  2. The complete tikv-deploy-dir/log/* is attached:
    tikv.log (2.7 MB)

P.S. tikv_stderr.log is empty

According to the log:

[2020/09/29 01:56:41.554 +00:00] [INFO] [mod.rs:46] ["memory limit in bytes: 1927102464, cpu cores quota: 2"]

Could you test again on a server with a bit more memory and CPU to see whether the deployment succeeds, or adjust the topology as follows (the rule of thumb is to reserve the best server for TiKV and let it run on a machine of its own), as sketched below:

tidb:
192.168.33.11
tikv:
192.168.33.12
pd:
192.168.33.13
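
A minimal sketch of that layout as a topology.yaml fragment (hypothetical: ports and directories are left at the TiUP defaults, and the monitoring hosts are omitted):

 # sketch: one component per host, with the strongest machine reserved for TiKV
 pd_servers:
   - host: 192.168.33.13
 tidb_servers:
   - host: 192.168.33.11
 tikv_servers:
   - host: 192.168.33.12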

You can make this change with the following steps:

  1. tiup cluster display tidb-test: check the current cluster node information
  2. tiup cluster destroy tidb-test: destroy the cluster
  3. vi topology.yaml: update the topology accordingly
  4. Run the deploy step again, then start the cluster once it succeeds.

Hi consultant,

 Following your advice, I have:
 1. Increased the machines' CPU cores from 2 to 4
  (our Dev environment resources are limited, so I cannot deploy each component on its own server)

 2. Set location-labels in the PD block of topology.yaml
 -  the complete topology.yaml:

topology.yaml (52.8 KB)

 pd:
      replication.location-labels: ["zone","dc","host"]
 
 tikv_servers:
      - host: 192.168.33.11
        config:
              server.grpc-concurrency: 4
              server.labels: { zone: "zone1", dc: "dc1", host: "tidb01" }
      - host: 192.168.33.12
        config:
              server.grpc-concurrency: 4
              server.labels: { zone: "zone1", dc: "dc1", host: "tidb02" }
      - host: 192.168.33.13
        config:
              server.grpc-concurrency: 4
              server.labels: { zone: "zone1", dc: "dc1", host: "tidb03" }

 3. After destroying the cluster and redeploying with the new topology.yaml, a different error appears:
** invalid configuration: default rocksdb not exist, buf raftdb exist **

  **Is there anything else in the configuration file that needs to be adjusted?**

 The complete log is as follows:

tikv.log (211.0 KB)

P.S. tikv_stderr.log is empty

For each TiDB component, apart from the few settings that genuinely need to be set by hand, everything else uses default values. For the TiKV-related settings in particular, TiUP derives them from the current server environment and adjusts them dynamically, so there is no need to write out the defaults yourself. From your file it looks like you are used to MySQL-style configuration files where every parameter is listed explicitly; it is enough to check the defaults on GitHub when you need them.

I suggest simplifying your topology file to something like this:
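
A minimal sketch along those lines, reusing the hosts and labels already shown in this thread: only the labels and location-labels are set explicitly, all TiKV tuning parameters are left at their defaults, and the TiFlash and monitoring sections are omitted (an illustration, not an exact recommended file):

 # sketch only: values mirror the cluster described earlier in this thread
 global:
   user: "tidb"
   deploy_dir: "/tidb-deploy"
   data_dir: "/tidb-data"

 server_configs:
   pd:
     # tell PD which label levels describe the TiKV topology
     replication.location-labels: ["zone", "dc", "host"]

 pd_servers:
   - host: 192.168.33.11
   - host: 192.168.33.12
   - host: 192.168.33.13

 tidb_servers:
   - host: 192.168.33.11
   - host: 192.168.33.12
   - host: 192.168.33.13

 tikv_servers:
   - host: 192.168.33.11
     config:
       server.labels: { zone: "zone1", dc: "dc1", host: "tidb01" }
   - host: 192.168.33.12
     config:
       server.labels: { zone: "zone1", dc: "dc1", host: "tidb02" }
   - host: 192.168.33.13
     config:
       server.labels: { zone: "zone1", dc: "dc1", host: "tidb03" }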

Hi consultant,

We later left the TiKV section blank and used the default values for everything, and the TiKV nodes now start fine. However, we would still like to know which parameter in the TiKV block causes the following error at startup. The parameter file was filled in with the default values from the official documentation, so in theory it should make no difference. Or is this a bug in v4.0.6?

[FATAL] [lib.rs:483] ["couldn't find the OPTIONS file"]

The full parameter file is attached:
topology.yaml (64.1 KB)

Error message:
tikv.log (1009.8 KB)

Thanks for the feedback. We will test this on our side first.

Thanks again for looking into this!

Which parameter exactly is causing this? Could you compare the default values with your values, changing each differing parameter one at a time, to help us pin down which parameter triggers the error? Thanks.

We tested the TiKV-related parameters from the TiUP configuration file you provided in a local environment and did not see this error. Judging from the position in the error stack, the error is returned when RocksDB starts up. Could you check whether the data path still contains leftover files from earlier tests that were not cleaned up?

Hi consultants,

 I will find some time to test this and report back. Thanks for the help ^_^

ok~