When setting up a new cluster, the deploy.yml stage fails to generate the TiKV config file

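To help us respond efficiently, please provide the following information when asking a question. Clearly described problems get priority.

  • [TiDB version]: v3.0.9
  • [Problem description]: While setting up a new cluster, running ansible-playbook deploy.yml fails, and tidb-ansible/log/fail.log reports: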

[TiKV3-e01]: Ansible FAILED! => playbook: deploy.yml; TASK: tikv : create config file; message: {"changed": false, "msg": "IndexError: list index out of range"}

[TiKV3-e02]: Ansible FAILED! => playbook: deploy.yml; TASK: tikv : create config file; message: {"changed": false, "msg": "IndexError: list index out of range"}

[TiKV3-e03]: Ansible FAILED! => playbook: deploy.yml; TASK: tikv : create config file; message: {"changed": false, "msg": "IndexError: list index out of range"}

The TiKV server section of tidb-ansible/inventory.ini is as follows:

[tikv_servers]
TiKV3-e01 ansible_host="test-tikv-e01" deploy_dir=/data1/deploy_tidb3/tikv3 tikv_port=20891 tikv_status_port=20181 labels="test-tikv-e01"
TiKV3-e02 ansible_host="test-tikv-e02" deploy_dir=/data1/deploy_tidb3/tikv3 tikv_port=20891 tikv_status_port=20181 labels="test-tikv-e02"
TiKV3-e03 ansible_host="test-tikv-e03" deploy_dir=/data1/deploy_tidb3/tikv3 tikv_port=20891 tikv_status_port=20181 labels="test-tikv-e03"

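On the target machines, the TiKV config file was not generated and the deploy_dir directory is empty, as shown in the screenshot: [image]

During the setup above, I did not modify anything in tidb-ansible/conf/tikv.yml.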

ansible-playbook bootstrap.yml

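Did the bootstrap command run successfully?

I ran it twice.

The first time, without commenting out the TiKV disk IOPS check, it failed.

The second time, with the check commented out, it succeeded.

This is the play after commenting it out: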

- name: tikv_servers machine benchmark
  hosts: tikv_servers
  gather_facts: false
  roles:
    #- { role: machine_benchmark, when: not dev_mode|default(false) }

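Please upload the complete inventory.ini file so we can take a look.

The formatting gets mangled when pasted, so I have put it in a file.

deploy_ini.ini (3.4 KB)

Only tikv_servers needs labels; the other groups do not. Remove the labels from the other groups and try again.

Well, I configured it according to the template you provided earlier.

Removing the other labels should not matter, though: the config files for PD, TiDB server, and monitoring are all generated in full.

(That said, I am dutifully trying it now…)

What does this error actually mean, and where in the source code does it come from? Is it a bug?

After removing the other labels as suggested, it still fails, as follows: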

TASK [tikv : create config file] **********************************************************************************

fatal: [TiKV3-e01]: FAILED! => {"changed": false, "msg": "IndexError: list index out of range"}

fatal: [TiKV3-e02]: FAILED! => {"changed": false, "msg": "IndexError: list index out of range"}

fatal: [TiKV3-e03]: FAILED! => {"changed": false, "msg": "IndexError: list index out of range"}

to retry, use: --limit @/home/tidb3/tidb-ansible/retry_files/deploy.retry

PLAY RECAP ********************************************************************************************************

PD3-e01 : ok=31 changed=6 unreachable=0 failed=0

PD3-e02 : ok=20 changed=1 unreachable=0 failed=0

PD3-e03 : ok=20 changed=1 unreachable=0 failed=0

TiDB3 : ok=15 changed=5 unreachable=0 failed=0

TiKV3-e01 : ok=28 changed=5 unreachable=0 failed=1

TiKV3-e02 : ok=16 changed=0 unreachable=0 failed=1

TiKV3-e03 : ok=16 changed=0 unreachable=0 failed=1

grafana3 : ok=21 changed=5 unreachable=0 failed=0

importer3_e02 : ok=3 changed=0 unreachable=0 failed=0

lightning3_e03 : ok=3 changed=0 unreachable=0 failed=0

localhost : ok=7 changed=4 unreachable=0 failed=0

nodeblack-e01 : ok=36 changed=3 unreachable=0 failed=0

nodeblack-e02 : ok=35 changed=3 unreachable=0 failed=0

nodeblack-e03 : ok=35 changed=3 unreachable=0 failed=0

prometheus3 : ok=29 changed=5 unreachable=0 failed=0

ERROR MESSAGE SUMMARY *********************************************************************************************

[TiKV3-e01]: Ansible FAILED! => playbook: deploy.yml; TASK: tikv : create config file; message: {"changed": false, "msg": "IndexError: list index out of range"}

[TiKV3-e02]: Ansible FAILED! => playbook: deploy.yml; TASK: tikv : create config file; message: {"changed": false, "msg": "IndexError: list index out of range"}

[TiKV3-e03]: Ansible FAILED! => playbook: deploy.yml; TASK: tikv : create config file; message: {"changed": false, "msg": "IndexError: list index out of range"}

Ask for help: Contact us: support@pingcap.com It seems that you encounter some problems. You can send an email to the above email address, attached with the tidb-ansible/inventory.ini and tidb-ansible/log/ansible.log files and the error message, or new issue on https://github.com/pingcap/tidb-ansible/issues. We’ll try our best to help you deploy a TiDB cluster. Thanks. :slight_smile:

When using TiKV labels, the corresponding label keys must also be configured in PD's location_labels.
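The rule above can be checked before running the playbook. The following is a minimal sketch, not part of tidb-ansible: `undeclared_label_keys` is a hypothetical helper, and it assumes the TiKV `labels` strings and the PD `location_labels` list have already been read out of inventory.ini by hand.

```python
def undeclared_label_keys(tikv_labels, location_labels):
    """Return {host: [label keys missing from location_labels]}.

    tikv_labels maps an inventory host name to its raw labels string,
    e.g. "zone=z1,host=h1". Hosts without problems are omitted.
    """
    problems = {}
    for host, labels in tikv_labels.items():
        # Take the part before the first '=' of every comma-separated entry.
        keys = [
            item.split("=", 1)[0].strip()
            for item in labels.split(",")
            if item.strip()
        ]
        missing = [key for key in keys if key not in location_labels]
        if missing:
            problems[host] = missing
    return problems
```

An entry written without a key (a bare value) is reported too, because the whole entry is treated as the key and will not appear in `location_labels`.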

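Already configured; still the same problem.

The error is the same as above.

Attached file: deploy_ini.ini (3.2 KB)

Try deploying with the following inventory.ini configuration file: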

## TiDB Cluster Part
[tidb_servers]
TiDB3 ansible_host="test-tikv-e01" deploy_dir=/data1/deploy_tidb3/tidb3 tidb_port=4003 tidb_status_port=10003

[tikv_servers]
TiKV3-e01 ansible_host="test-tikv-e01" deploy_dir=/data1/deploy_tidb3/tikv3 tikv_port=20891 tikv_status_port=20181 labels="host=test-tikv-e01"
TiKV3-e02 ansible_host="test-tikv-e02" deploy_dir=/data1/deploy_tidb3/tikv3 tikv_port=20891 tikv_status_port=20181 labels="host=test-tikv-e02"
TiKV3-e03 ansible_host="test-tikv-e03" deploy_dir=/data1/deploy_tidb3/tikv3 tikv_port=20891 tikv_status_port=20181 labels="host=test-tikv-e03"


[pd_servers]
PD3-e01 ansible_host="test-tikv-e01" deploy_dir=/data1/deploy_tidb3/pd3 pd_client_port=2581 pd_peer_port=2591
PD3-e02 ansible_host="test-tikv-e02" deploy_dir=/data1/deploy_tidb3/pd3 pd_client_port=2581 pd_peer_port=2591
PD3-e03 ansible_host="test-tikv-e03" deploy_dir=/data1/deploy_tidb3/pd3 pd_client_port=2581 pd_peer_port=2591


[spark_master]

[spark_slaves]

[lightning_server]
lightning3_e03 ansible_host="test-tikv-e03" deploy_dir=/data1/deploy_tidb3/lightning3 tidb_lightning_pprof_port=70089

[importer_server]
importer3_e02 ansible_host="test-tikv-e02" deploy_dir=/data1/deploy_tidb3/importer3 tikv_importer_port=20189

## Monitoring Part
# prometheus and pushgateway servers
[monitoring_servers]
prometheus3 ansible_host="test-tikv-e03" prometheus_port=8001 pushgateway_port=8002

[grafana_servers]
grafana3 ansible_host="test-tikv-e03" grafana_port=8003 grafana_collector_port=8004

# node_exporter and blackbox_exporter servers
[monitored_servers]
nodeblack-e01 ansible_host="test-tikv-e01" node_exporter_port=7101 blackbox_exporter_port=7111
nodeblack-e02 ansible_host="test-tikv-e02" node_exporter_port=7102 blackbox_exporter_port=7112
nodeblack-e03 ansible_host="test-tikv-e03" node_exporter_port=7103 blackbox_exporter_port=7113

[alertmanager_servers]

[kafka_exporter_servers]

## Binlog Part
[pump_servers]

[drainer_servers]

## Group variables
[pd_servers:vars]
# location_labels = ["zone","rack","host"]
location_labels = ["host"]

## Global variables
[all:vars]
deploy_dir = /data1/deploy_tidb3


## Connection
# ssh via normal user
ansible_user = tidb3

cluster_name = test-tidb3

tidb_version = v3.0.9

# process supervision, [systemd, supervise]
process_supervision = systemd

timezone = Asia/Shanghai

enable_firewalld = False
# check NTP service
enable_ntpd = True
set_hostname = False

## binlog trigger
enable_binlog = False

# kafka cluster address for monitoring, example:
# kafka_addrs = "192.168.0.11:9092,192.168.0.12:9092,192.168.0.13:9092"
kafka_addrs = ""

# zookeeper address of kafka cluster for monitoring, example:
# zookeeper_addrs = "192.168.0.11:2181,192.168.0.12:2181,192.168.0.13:2181"
zookeeper_addrs = ""

# enable TLS authentication in the TiDB cluster
enable_tls = False

# KV mode
deploy_without_tidb = False

# wait for region replication complete before start tidb-server.
wait_replication = True

# Optional: Set if you already have a alertmanager server.
# Format: alertmanager_host:alertmanager_port
alertmanager_target = ""

grafana_admin_user = "admin"
grafana_admin_password = "admin"


### Collect diagnosis
collect_log_recent_hours = 2

enable_bandwidth_limit = True
# default: 10Mb/s, unit: Kbit/s
collect_bandwidth_limit = 10000

Problem solved. The TiKV configuration in inventory.ini was indeed wrong: each labels value needs the "host=" key prefix.
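The error text is consistent with a parser that splits each label on "=" and reads the second element. The sketch below reproduces that failure in plain Python. Whether the tidb-ansible TiKV template does exactly this is an assumption that this thread does not confirm, and `naive_parse`, `strict_parse`, and `fails_with_index_error` are hypothetical names used only for illustration.

```python
def naive_parse(labels):
    """Split 'k1=v1,k2=v2' into a dict without validating the input."""
    result = {}
    for item in labels.split(","):
        parts = item.split("=")
        # A bare value such as "test-tikv-e01" yields a one-element list,
        # so parts[1] raises IndexError: list index out of range.
        result[parts[0]] = parts[1]
    return result


def strict_parse(labels):
    """Same parsing, but reject entries that are not in key=value form."""
    result = {}
    for item in labels.split(","):
        key, sep, value = item.strip().partition("=")
        if not sep or not key or not value:
            raise ValueError(f"label {item!r} is not in key=value form")
        result[key] = value
    return result


def fails_with_index_error(labels):
    """Report whether naive_parse raises IndexError for this input."""
    try:
        naive_parse(labels)
    except IndexError:
        return True
    return False
```

Running `strict_parse` over each `labels` value before deploying would turn the opaque IndexError into a message that names the offending entry.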

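Which part of the source code reads this configuration? Is there an article or link about it?

This is not in the code; it is a problem encountered during the Ansible deployment. We recommend filling in the configuration based on the template in the official documentation when deploying.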