Installing TiFlash on an existing TiDB 3.1 cluster fails at startup when running ansible-playbook -t tiflash start.yml

FAILED - RETRYING: wait until the TiFlash status page is available (3 retries left).
FAILED - RETRYING: wait until the TiFlash status page is available (3 retries left).
FAILED - RETRYING: wait until the TiFlash status page is available (2 retries left).
FAILED - RETRYING: wait until the TiFlash status page is available (2 retries left).
FAILED - RETRYING: wait until the TiFlash status page is available (2 retries left).
FAILED - RETRYING: wait until the TiFlash status page is available (1 retries left).
FAILED - RETRYING: wait until the TiFlash status page is available (1 retries left).
FAILED - RETRYING: wait until the TiFlash status page is available (1 retries left).
fatal: [xxx.xxx.10.102]: FAILED! => {"attempts": 12, "changed": false, "content": "", "msg": "Status code was -1 and not [200]: Request failed: <urlopen error [Errno 111] Connection refused>", "redirected": false, "status": -1, "url": "http://xxx.xxx.10.102:8123/?query=select%20version()"}
fatal: [xxx.xxx.10.104]: FAILED! => {"attempts": 12, "changed": false, "content": "", "msg": "Status code was -1 and not [200]: Request failed: <urlopen error [Errno 111] Connection refused>", "redirected": false, "status": -1, "url": "http://xxx.xxx.10.104:8123/?query=select%20version()"}
fatal: [xxx.xxx.10.103]: FAILED! => {"attempts": 12, "changed": false, "content": "", "msg": "Status code was -1 and not [200]: Request failed: <urlopen error [Errno 111] Connection refused>", "redirected": false, "status": -1, "url": "http://xxx.xxx.10.103:8123/?query=select%20version()"}
to retry, use: --limit @/home/tidb/tidb-ansible/retry_files/start.retry

PLAY RECAP ****************************************************************
xxx.xxx.10.101 : ok=3 changed=0 unreachable=0 failed=0
xxx.xxx.10.102 : ok=5 changed=1 unreachable=0 failed=1
xxx.xxx.10.103 : ok=5 changed=1 unreachable=0 failed=1
xxx.xxx.10.104 : ok=5 changed=1 unreachable=0 failed=1
localhost : ok=7 changed=4 unreachable=0 failed=0

ERROR MESSAGE SUMMARY *****************************************************
[xxx.xxx.10.102]: Ansible FAILED! => playbook: start.yml; TASK: wait until the TiFlash status page is available; message: {"attempts": 12, "changed": false, "content": "", "msg": "Status code was -1 and not [200]: Request failed: <urlopen error [Errno 111] Connection refused>", "redirected": false, "status": -1, "url": "http://xxx.xxx.10.102:8123/?query=select%20version()"}

[xxx.xxx.10.104]: Ansible FAILED! => playbook: start.yml; TASK: wait until the TiFlash status page is available; message: {"attempts": 12, "changed": false, "content": "", "msg": "Status code was -1 and not [200]: Request failed: <urlopen error [Errno 111] Connection refused>", "redirected": false, "status": -1, "url": "http://xxx.xxx.10.104:8123/?query=select%20version()"}

[xxx.xxx.10.103]: Ansible FAILED! => playbook: start.yml; TASK: wait until the TiFlash status page is available; message: {"attempts": 12, "changed": false, "content": "", "msg": "Status code was -1 and not [200]: Request failed: <urlopen error [Errno 111] Connection refused>", "redirected": false, "status": -1, "url": "http://xxx.xxx.10.103:8123/?query=select%20version()"}

============================= The following entries repeat over and over in tiflash.log:
2020.05.19 18:16:49.963942 [ 6 ] Application: Start tiflash proxy
2020.05.19 18:16:49.964429 [ 1 ] Application: Listening http://0.0.0.0:8123
2020.05.19 18:16:49.966478 [ 1 ] Application: Listening tcp: 0.0.0.0:9000
2020.05.19 18:16:49.967042 [ 1 ] Application: Available RAM = 31.26 GiB; physical cores = 4; threads = 8.
2020.05.19 18:16:49.967063 [ 1 ] Application: Ready for connections.
2020.05.19 18:16:49.967279 [ 1 ] Prometheus: Config: status.metrics_interval = 15
2020.05.19 18:16:49.967309 [ 1 ] Prometheus: Disable sending metrics to prometheus, cause status.metrics_addr is not set!
2020.05.19 18:16:49.967633 [ 1 ] Prometheus: Metrics Port = 8234
2020.05.19 18:16:49.967813 [ 1 ] ClusterManagerService: Registered timed cluster manager task at rate 5 seconds

============================= Starting TiFlash with systemctl or with ./run_tiflash does not work either.

============================= TiFlash uses the default configuration; no parameters were changed.

The error is [Errno 111] Connection refused. Please check whether the port is open and whether a firewall is blocking it. Thanks.

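None of the three TiFlash machines has a firewall or any port restriction. Running ./run_tiflash directly on a TiFlash machine still does not bring up the service on port 8123, so this should not be a port problem.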
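To separate "the port is blocked" from "nothing is listening", the check can be run on the TiFlash node itself. The following is a minimal Python sketch, not part of tidb-ansible: the function names port_is_open and tiflash_version are made up for illustration, and the URL is simply the one shown in the playbook's error message.

```python
import socket
import urllib.error
import urllib.request


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused" (Errno 111), timeouts, unreachable hosts.
        return False


def tiflash_version(host: str, port: int = 8123, timeout: float = 2.0):
    """Send the same query the playbook task sends.

    Returns the response body as text, or None if the request fails.
    """
    url = f"http://{host}:{port}/?query=select%20version()"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode().strip()
    except (urllib.error.URLError, OSError):
        return None


if __name__ == "__main__":
    host = "127.0.0.1"  # placeholder: run this on the TiFlash node itself
    print("port 8123 open:", port_is_open(host, 8123))
    print("version reply:", tiflash_version(host))
```

If the port is refused even from localhost, a firewall is unlikely to be the cause and the process itself is probably exiting, which matches the report above.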

systemctl status shows the TiFlash service as running, then after a while it goes down, and it keeps cycling: running, down, running, down.
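The restart cycle can also be measured from tiflash.log. The sketch below is only an illustration: it assumes that each process start logs "Application: Ready for connections." exactly once, and that every line uses the timestamp layout seen in the excerpt earlier in this thread. The helper names are hypothetical.

```python
import re
from datetime import datetime

# Line layout taken from the excerpt: "2020.05.19 18:16:49.967063 [ 1 ] message"
LINE_RE = re.compile(
    r"^(\d{4}\.\d{2}\.\d{2} \d{2}:\d{2}:\d{2}\.\d+) \[\s*\d+\s*\] (.*)$"
)
# Assumption: this message appears once per process start.
START_MARKER = "Application: Ready for connections."


def startup_times(lines):
    """Return the timestamp of every line that reports a completed startup."""
    times = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group(2).startswith(START_MARKER):
            times.append(datetime.strptime(m.group(1), "%Y.%m.%d %H:%M:%S.%f"))
    return times


def restart_intervals(lines):
    """Return the seconds elapsed between consecutive startups."""
    t = startup_times(lines)
    return [(b - a).total_seconds() for a, b in zip(t, t[1:])]
```

Passing the lines of tiflash.log to restart_intervals gives one number per restart. A long list of roughly equal values would suggest that the service manager keeps restarting a process that exits shortly after startup; the lines logged just before each restart are then the place to look for the actual cause.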

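  1. Please add the -vvv flag when starting with ansible, and send us ansible.log from the log directory under the deployment directory on the control machine. Thanks.
  2. Please try starting TiFlash manually and upload the tiflash.log file. Thanks.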

ansible.log (71.7 KB) tiflash.log (1.0 MB)

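  1. What environment was this installed in? Is it Docker? Is it a single-machine test of your own? The IPs in the log all appear to be 0.0.0.0, which looks like the connection cannot be established: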

Prometheus: Disable sending metrics to prometheus, cause status.metrics_addr is not set

  1. How was TiFlash configured when you scaled out? Could you upload the configuration files? Thanks.

Changing listen_port in tiflash.toml to the server's IP does not help either.

tiflash.toml (1.3 KB) tiflash-learner.toml (521 bytes)

Alibaba Cloud servers. Specs: 32 GB RAM, 4 cores / 8 threads, SAS disks.

Have you configured the enable-placement-rules parameter?

https://pingcap.com/docs-cn/stable/tiflash/deploy-tiflash/#在原有-tidb-集群上新增-tiflash-组件
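One way to check the current value of enable-placement-rules without guessing is to read the running PD configuration. The Python sketch below rests on assumptions that should be verified against your PD version: that PD serves its config as JSON at /pd/api/v1/config, that the setting sits under the "replication" section, and that PD listens on the address you pass in (2379 is only the usual default). The function names are hypothetical.

```python
import json
import urllib.request


def placement_rules_enabled(pd_config: dict) -> bool:
    """Read replication.enable-placement-rules from a PD config dict.

    Accepts a JSON boolean or the strings "true"/"false", because the exact
    encoding used by a given PD version is an assumption here.
    A missing key is treated as disabled.
    """
    value = pd_config.get("replication", {}).get("enable-placement-rules", False)
    if isinstance(value, str):
        return value.strip().lower() == "true"
    return bool(value)


def fetch_pd_config(pd_addr: str, timeout: float = 3.0) -> dict:
    """Fetch the running config from PD's HTTP API (endpoint path assumed)."""
    url = f"http://{pd_addr}/pd/api/v1/config"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, placement_rules_enabled(fetch_pd_config("xxx.xxx.10.101:2379")) would report the current state, where the address is a placeholder for one of your PD nodes. The linked guide describes how to turn the setting on with pd-ctl.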

Where are these two files located?

Hello, this issue is quite old. Please open a new thread if you would like to discuss it.