TiFlash port 3930 fails to start

【TiDB version】
v5.0.0-rc

【Environment】
Tencent Cloud host
CentOS 7.x
x86_64
8 cores, 32 GB RAM, 200 GB SSD

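【Problem description】
I deployed a single-machine cluster following the official documentation, but TiFlash cannot be used. Data replicates normally, but analytical queries fail with error 9012.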



Stop TiFlash with the tiup command, restart it with systemctl, and then check whether anything is wrong with port 3930.

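"Cannot assign requested address" suggests that the port may already be in use.
If netstat shows nothing occupying the port, could you switch to a different port and then restart TiFlash?
The error in the log says that the bind syscall for this port was attempted and failed.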
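
One way to narrow this down is to attempt the bind directly and look at the errno the OS returns. The following is a minimal Python sketch, not part of TiFlash or tiup; the helper name `probe_bind` and the example address are hypothetical. On Linux, a port that is already taken normally makes bind() fail with EADDRINUSE, whereas EADDRNOTAVAIL ("Cannot assign requested address") from bind() normally means the IP address itself is not assigned to any local interface, in which case changing the port alone would not help.

```python
import errno
import socket


def probe_bind(host: str, port: int) -> str:
    """Try to bind a TCP socket to host:port and classify the outcome.

    Hypothetical helper for illustration; it only reports what the OS says.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return "port already in use"
            if exc.errno == errno.EADDRNOTAVAIL:
                return "address not assigned to a local interface"
            return f"bind failed: {exc.strerror}"
        return "ok"


if __name__ == "__main__":
    # Replace with the IP and port from your own topology file.
    print(probe_bind("127.0.0.1", 3930))
```

Run it on the TiFlash host while TiFlash is stopped, passing the same IP that the topology file assigns to TiFlash, so that the result reflects the address rather than TiFlash itself.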

Thank you for your answer. I tried switching to other ports, and that failed as well.

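Please check the user and directory permissions. If those are all fine, please post the TiFlash log. In addition, it is still worth checking whether the ports between TiFlash and the cluster's TiDB/PD and other components are reachable: [FAQ] bind: cannot assign requested address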

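Below is an excerpt from tiflash.log. I also tried opening the directory permissions up as far as possible, and that did not help either.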

[2021/02/10 10:54:07.215 +08:00] [INFO] [] ["SchemaBuilder: No schema change detected for table mysql(3).global_priv(7), not altering"] [thread_id=1]
[2021/02/10 10:54:07.215 +08:00] [INFO] [] ["SchemaBuilder: Altering table mysql(3).db(9)"] [thread_id=1]
[2021/02/10 10:54:07.215 +08:00] [INFO] [] ["SchemaBuilder: No schema change detected for table mysql(3).db(9), not altering"] [thread_id=1]
[2021/02/10 10:54:07.215 +08:00] [INFO] [] ["SchemaBuilder: Loaded all schemas."] [thread_id=1]
[2021/02/10 10:54:07.215 +08:00] [INFO] [] ["SchemaSyncer: end sync schema, version has been updated to 24"] [thread_id=1]
[2021/02/10 10:54:07.216 +08:00] [INFO] [] ["RegionPersister: RegionPersister running in normal mode"] [thread_id=1]
[2021/02/10 10:54:07.216 +08:00] [INFO] [] ["PageStorage: RegionPersister begin to restore data from disk. [path=/tidb-data/tiflash-9000/kvstore] [num_writers=4]"] [thread_id=1]
[2021/02/10 10:54:07.216 +08:00] [INFO] [] ["PageStorage: RegionPersister restore 0 pages, write batch sequence: 0, 0 puts and 0 refs and 0 deletes and 0 upserts"] [thread_id=1]
[2021/02/10 10:54:07.216 +08:00] [INFO] [] ["KVStore: Restored 0 regions. "] [thread_id=1]
[2021/02/10 10:54:07.216 +08:00] [INFO] [] ["RegionTable: Start to restore"] [thread_id=1]
[2021/02/10 10:54:07.216 +08:00] [INFO] [] ["RegionTable: Restore 0 tables"] [thread_id=1]
[2021/02/10 10:54:07.216 +08:00] [INFO] [] ["BackgroundService: Configuration raft.disable_bg_flush is set to true, background flush tasks are disabled."] [thread_id=1]
[2021/02/10 10:54:07.216 +08:00] [INFO] [] ["FlashService: Use a thread pool with 16 threads to handle cop requests."] [thread_id=1]
[2021/02/10 10:54:07.220 +08:00] [INFO] [] ["FlashService: Use a thread pool with 16 threads to handle batch cop requests."] [thread_id=1]
[2021/02/10 10:54:07.224 +08:00] [INFO] [] ["Application: Flash service registered"] [thread_id=1]
[2021/02/10 10:54:07.224 +08:00] [INFO] [] ["Application: Diagnostics service registered"] [thread_id=1]
[2021/02/10 10:54:07.225 +08:00] [INFO] [] ["grpc: /root/grpc/src/cpp/server/server_builder.cc, line number : 309, log msg : Synchronous server. Num CQs: 1, Min pollers: 1, Max Pollers: 2, CQ timeout (msec): 10000"] [thread_id=1]
[2021/02/10 10:54:07.225 +08:00] [ERROR] [] ["grpc: /root/grpc/src/core/ext/transport/chttp2/server/insecure/server_chttp2.cc, line number : 40, log msg : {"created":"@1612925647.225161693","description":"No address added out of total 1 resolved","file":"/root/grpc/src/core/ext/transport/chttp2/server/chttp2_server.cc","file_line":394,"referenced_errors":[{"created":"@1612925647.225160064","description":"Unable to configure socket","fd":57,"file":"/root/grpc/src/core/lib/iomgr/tcp_server_utils_posix_common.cc","file_line":217,"referenced_errors":[{"created":"@1612925647.225156584","description":"Cannot assign requested address","errno":99,"file":"/root/grpc/src/core/lib/iomgr/tcp_server_utils_posix_common.cc","file_line":190,"os_error":"Cannot assign requested address","syscall":"bind"}]}]}"] [thread_id=1]
[2021/02/10 10:54:07.225 +08:00] [INFO] [] ["Application: Flash grpc server listening on [xxx.xxx.xxx.xxx:4005]"] [thread_id=1]
[2021/02/10 10:54:07.225 +08:00] [INFO] [] ["Application: Listening http://0.0.0.0:8123"] [thread_id=1]
[2021/02/10 10:54:07.227 +08:00] [INFO] [] ["Application: Listening tcp: 0.0.0.0:9000"] [thread_id=1]
[2021/02/10 10:54:07.228 +08:00] [INFO] [] ["Application: Available RAM = 31.26 GiB; physical cores = 8; threads = 8."] [thread_id=1]
[2021/02/10 10:54:07.228 +08:00] [INFO] [] ["Application: Ready for connections."] [thread_id=1]
[2021/02/10 10:54:07.229 +08:00] [INFO] [] ["Prometheus: Config: status.metrics_interval = 15"] [thread_id=1]
[2021/02/10 10:54:07.229 +08:00] [INFO] [] ["Prometheus: Disable prometheus push mode, cause status.metrics_addr is not set!"] [thread_id=1]
[2021/02/10 10:54:07.230 +08:00] [INFO] [] ["Prometheus: Enable prometheus pull mode; Metrics Port = 8234"] [thread_id=1]
[2021/02/10 10:54:07.231 +08:00] [INFO] [] ["ClusterManagerService: Registered timed cluster manager task at rate 5 seconds"] [thread_id=1]
[2021/02/10 10:54:07.231 +08:00] [INFO] [] ["Application: let tiflash proxy start all services"] [thread_id=1]
[2021/02/10 10:54:07.831 +08:00] [INFO] [] ["Application: proxy is ready to serve, try to wake up all region leader by sending read index request"] [thread_id=1]
[2021/02/10 10:54:07.832 +08:00] [INFO] [] ["Application: start to wait for terminal signal"] [thread_id=1]
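
The single ERROR entry in the excerpt above carries its most useful detail in the nested error object: the failing syscall and the errno. As a rough illustration, the Python sketch below pulls those two fields out of such a line. The function name `summarize_bind_error` is hypothetical, and the patterns are only assumed to fit lines shaped like the one above; they do not describe an official log format.

```python
import errno
import re
from typing import Optional

# Accept straight or curly quotes, since pasted logs often have them rewritten.
_Q = '["\u201c\u201d]'
_ERRNO_RE = re.compile(_Q + "errno" + _Q + r":(\d+)")
_SYSCALL_RE = re.compile(_Q + "syscall" + _Q + ":" + _Q + r"(\w+)" + _Q)


def summarize_bind_error(log_line: str) -> Optional[dict]:
    """Extract errno and syscall from a gRPC error line, or None if absent."""
    errno_match = _ERRNO_RE.search(log_line)
    syscall_match = _SYSCALL_RE.search(log_line)
    if not errno_match or not syscall_match:
        return None
    code = int(errno_match.group(1))
    return {
        "errno": code,
        # Symbolic names are platform-dependent; this uses the local OS table.
        "name": errno.errorcode.get(code, "unknown"),
        "syscall": syscall_match.group(1),
    }
```

Looking up the symbolic name on the same kind of OS that produced the log makes it easier to tell an address problem apart from a port conflict.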

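At the moment, the port in question is the only one that has not come up; all the others can be reached with telnet. As suggested above, I changed it to another port, 4005, and it still does not come up. Thanks for your reply.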

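Please check whether the cluster's servers have a firewall or something similar enabled. By checking ports, I mean checking whether the ports from TiFlash to TiDB, PD, and TiKV are reachable.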

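Thanks for your reply. Firewalls and the like are all disabled in this test environment. How should I check whether the ports between TiFlash and TiDB are reachable? These ports are all on the same machine, and at the moment they can be reached with telnet.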
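
A telnet-style reachability check can also be scripted so that every port is tested in one pass. The following is a minimal Python sketch under stated assumptions: the helper name `can_connect` is hypothetical, the host is a placeholder, and the port list simply reuses the TiFlash ports that appear in this thread (3930, 9000, 8123, 8234). Add the TiDB, PD, and TiKV ports from your own topology file.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Placeholder host; use the IP written in your topology file.
    host = "127.0.0.1"
    for port in (3930, 9000, 8123, 8234):
        state = "reachable" if can_connect(host, port) else "NOT reachable"
        print(f"{host}:{port} {state}")
```

A successful connection only shows that something is listening on that address and port; it does not explain why a bind on a different address failed.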

The error message in this log is quite explicit, so I suggest checking the system log, or checking whether a tiflash_tikv.log file exists and what it contains.

tiflash_tikv.log contains nothing error-related, only some operations connecting to PD.

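:joy: Going by your answers, the network ports between TiFlash and TiDB/PD/TiKV are all reachable, and permissions should not be a problem either. Is this a newly deployed TiFlash? Could you summarize your deployment environment and the configuration file used for the deployment?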

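Thanks for your answer.
Tencent Cloud host, simulating a cluster on a single machine
8 cores, 32 GB RAM, 200 GB SSD
Architecture: x86_64
centos-release-7-5.1804.el7.centos.x86_64
The firewall is disabled, all ports are opened in the security configuration, and the host is reachable from the public network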

The topo.yaml configuration is identical to the one in the official deployment document at https://docs.pingcap.com/zh/tidb/v5.0/quick-start-with-tidb#第二种使用-tiup-cluster-在单机上模拟生产环境部署步骤
Only the IP was changed.

I previously deployed 4.0.9 on a virtual machine with even lower specs than this one, and that worked.

Could you take a look at the system log? Does the messages log contain any warnings?

Thanks for your reply. Judging from the logs, there do not seem to be any warnings or errors.

Please use TiDB to check the replica information currently recorded for TiFlash, and whether it includes metadata under the mysql schema.

SELECT * FROM information_schema.tiflash_replica WHERE TABLE_SCHEMA = 'mysql';

Thanks for your reply. The table replicas are normal.

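I can provide a small cloud host to make troubleshooting easier on your side. The problem is reproducible. If needed, please add me on WeChat: MOHN581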

Just to confirm: is test.student the only table with a TiFlash replica configured? Are there any other schemas?

Yes. I am only testing and created just one table.