TiDB binary deployment keeps failing

To get help faster, please provide as much background information as you can; clearly described problems get priority responses. Please try to cover the following points:

  • OS version & kernel version

Ubuntu 16.04.6 LTS, kernel 4.4.0-166-generic

  • TiDB version: 2.1.6, 3.0.0-beta.1, latest
  • Disk model: NVMe SSD
  • Cluster node layout: all-in-one machine, not a cluster
  • Data size & region count & replica count
  • Problem description (what I did)
  • Keywords: I started the servers with these three commands:

./bin/pd-server --client-urls=http://127.0.0.1:2379 --data-dir=/data/pd --log-file=/data/logs/pd.log &
./bin/tikv-server --pd=127.0.0.1:2379 -A=127.0.0.1:20160 -s=/data/tikv --log-file=/data/logs/tikv.log &
./bin/tidb-server -P=4000 --store=tikv --path=127.0.0.1:2379 --host=127.0.0.1 --advertise-address=127.0.0.1 --log-file=/data/logs/tidb.log &

The servers started by these three commands keep failing; they cannot connect to one another.

vim /data/logs/tikv.log
[2019/11/06 16:07:35.869 +08:00] [INFO] [util.rs:397] ["connecting to PD endpoint"] [endpoints=127.0.0.1:2379]
[2019/11/06 16:07:35.874 +08:00] [INFO] [util.rs:357] ["PD failed to respond"] [err="Grpc(RpcFailure(RpcStatus { status: Unavailable, details: Some(\"Trying to connect an http1.x server\") }))"] [endpoints=127.0.0.1:2379]
(the same "connecting to PD endpoint" / "PD failed to respond" pair repeats roughly every 300 ms, from 16:07:36.174 through 16:07:40.448)
[2019/11/06 16:07:38.007 +08:00] [WARN] [client.rs:55] ["validate PD endpoints failed"] [err="Other(\"[src/pd/util.rs:388]: PD cluster failed to respond\")"]

2. tidb.log:
[2019/11/06 16:02:41.286 +08:00] [WARN] [backoff.go:313] ["pdRPC backoffer.maxSleep 20000ms is exceeded, errors:\nregion not found for key \"mBootstra\xffpKey\x00\x00\x00\x00\xfb\x00\x00\x00\x00\x00\x00\x00s\" at 2019-11-06T16:02:36.890887658+08:00\nregion not found for key \"mBootstra\xffpKey\x00\x00\x00\x00\xfb\x00\x00\x00\x00\x00\x00\x00s\" at 2019-11-06T16:02:38.707355819+08:00\nregion not found for key \"mBootstra\xffpKey\x00\x00\x00\x00\xfb\x00\x00\x00\x00\x00\x00\x00s\" at 2019-11-06T16:02:41.286892515+08:00"]
[2019/11/06 16:02:41.287 +08:00] [FATAL] [session.go:1657] ["check bootstrapped failed"] [error="[tikv:9001]PD server timeout"] [errorVerbose="[tikv:9001]PD server timeout\ngithub.com/pingcap/errors.AddStack\n\t/home/jenkins/workspace/build_tidb_master/go/pkg/mod/github.com/pingcap/errors@v0.11.4/errors.go:174\ngithub.com/pingcap/errors.Trace\n\t/home/jenkins/workspace/build_tidb_master/go/pkg/mod/github.com/pingcap/errors@v0.11.4/juju_adaptor.go:15\ngithub.com/pingcap/tidb/store/tikv.(*RegionCache).loadRegion\n\t/home/jenkins/workspace/build_tidb_master/go/src/github.com/pingcap/tidb/store/tikv/region_cache.go:585\ngithub.com/pingcap/tidb/store/tikv.(*RegionCache).findRegionByKey\n\t/home/jenkins/workspace/build_tidb_master/go/src/github.com/pingcap/tidb/store/tikv/region_cache.go:339\ngithub.com/pingcap/tidb/store/tikv.(*RegionCache).LocateKey\n\t/home/jenkins/workspace/build_tidb_master/go/src/github.com/pingcap/tidb/store/tikv/region_cache.go:299\ngithub.com/pingcap/tidb/store/tikv.(*tikvSnapshot).get\n\t/home/jenkins/workspace/build_tidb_master/go/src/github.com/pingcap/tidb/store/tikv/snapshot.go:251\ngithub.com/pingcap/tidb/store/tikv.(*tikvSnapshot).Get\n\t/home/jenkins/workspace/build_tidb_master/go/src/github.com/pingcap/tidb/store/tikv/snapshot.go:226\ngithub.com/pingcap/tidb/kv.(*unionStore).Get\n\t/home/jenkins/workspace/build_tidb_master/go/src/github.com/pingcap/tidb/kv/union_store.go:194\ngithub.com/pingcap/tidb/store/tikv.(*tikvTxn).Get\n\t/home/jenkins/workspace/build_tidb_master/go/src/github.com/pingcap/tidb/store/tikv/txn.go:133\ngithub.com/pingcap/tidb/structure.(*TxStructure).Get\n\t/home/jenkins/workspace/build_tidb_master/go/src/github.com/pingcap/tidb/structure/string.go:35\ngithub.com/pingcap/tidb/structure.(*TxStructure).GetInt64\n\t/home/jenkins/workspace/build_tidb_master/go/src/github.com/pingcap/tidb/structure/string.go:44\ngithub.com/pingcap/tidb/meta.(*Meta).GetBootstrapVersion\n\t/home/jenkins/workspace/build_tidb_master/go/src/github.com/pingcap/tidb/meta/meta.go:691\ngithub.com/pingcap/tidb/session.getStoreBootstrapVersion.func1\n\t/home/jenkins/workspace/build_tidb_master/go/src/github.com/pingcap/tidb/session/session.go:1652\ngithub.com/pingcap/tidb/kv.RunInNewTxn\n\t/home/jenkins/workspace/build_tidb_master/go/src/github.com/pingcap/tidb/kv/txn.go:50\ngithub.com/pingcap/tidb/session.getStoreBootstrapVersion\n\t/home/jenkins/workspace/build_tidb_master/go/src/github.com/pingcap/tidb/session/session.go:1649\ngithub.com/pingcap/tidb/session.BootstrapSession\n\t/home/jenkins/workspace/build_tidb_master/go/src/github.com/pingcap/tidb/session/session.go:1493\nmain.createStoreAndDomain\n\t/home/jenkins/workspace/build_tidb_master/go/src/github.com/pingcap/tidb/tidb-server/main.go:205\nmain.main\n\t/home/jenkins/workspace/build_tidb_master/go/src/github.com/pingcap/tidb/tidb-server/main.go:171\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:200\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1337"]
[stack="github.com/pingcap/tidb/session.getStoreBootstrapVersion\n\t/home/jenkins/workspace/build_tidb_master/go/src/github.com/pingcap/tidb/session/session.go:1657\ngithub.com/pingcap/tidb/session.BootstrapSession\n\t/home/jenkins/workspace/build_tidb_master/go/src/github.com/pingcap/tidb/session/session.go:1493\nmain.createStoreAndDomain\n\t/home/jenkins/workspace/build_tidb_master/go/src/github.com/pingcap/tidb/tidb-server/main.go:205\nmain.main\n\t/home/jenkins/workspace/build_tidb_master/go/src/github.com/pingcap/tidb/tidb-server/main.go:171\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:200"]

3. PD starts up normally.

How can this be fixed? I tried all three versions listed above, and none of them work.

What does the cluster topology look like? How many PD/TiKV/TiDB nodes? From the error, TiKV cannot connect to PD. Please paste the PD log (and could you format the logs when pasting? Thanks.) Also, regarding the versions you listed (2.1.6, 3.0.0-beta.1, latest): 3.0.0-beta.1 is a beta build, not a release build. For testing, we suggest using the latest version.

1. I reinstalled the OS: Ubuntu 16.04.6 64-bit, kernel 4.15.0-45-generic (stock). I am using the latest version, 4.0.0-alpha. The commands I ran directly are as follows:
wget http://download.pingcap.org/tidb-latest-linux-amd64.tar.gz
wget http://download.pingcap.org/tidb-latest-linux-amd64.sha256
sha256sum -c tidb-latest-linux-amd64.sha256
tar -xzf tidb-latest-linux-amd64.tar.gz
cd tidb-latest-linux-amd64
Start PD:
./bin/pd-server --data-dir=/data/pd --log-file=/data/logs/pd.log &
Start TiKV:
./bin/tikv-server --pd="127.0.0.1:2379" --data-dir=/data/tikv --log-file=/data/logs/tikv.log &
Start TiDB:
./bin/tidb-server --store=tikv --path="127.0.0.1:2379" --log-file=/data/logs/tidb.log &
This is an all-in-one setup, not a cluster: one PD, one TiKV, and one TiDB started on a single PC.
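As a quick sanity check on a single-machine setup like this, PD's HTTP status API can be probed before starting TiKV. A sketch, assuming the default client port from the commands above (the curl call is left commented so the snippet is safe to run anywhere):

```shell
# Build the health-check URL for PD's default client endpoint (assumption:
# PD is listening on 127.0.0.1:2379 as in the startup command above).
PD="127.0.0.1:2379"
HEALTH_URL="http://${PD}/pd/api/v1/health"
echo "$HEALTH_URL"
# curl -s "$HEALTH_URL"   # a healthy PD returns a JSON health report
```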
2. PD shows no errors; its log is as follows:
[2019/11/07 11:17:32.398 +08:00] [INFO] [server.go:175] ["create etcd v3 client"] [endpoints="[http://127.0.0.1:2379]"]
[2019/11/07 11:17:32.398 +08:00] [INFO] [capability.go:75] ["enabled capabilities for version"] [cluster-version=3.3]
[2019/11/07 11:17:32.398 +08:00] [INFO] [server.go:2327] ["cluster version is updated"] [cluster-version=3.3]
[2019/11/07 11:17:32.398 +08:00] [INFO] [serve.go:139] ["serving client traffic insecurely; this is strongly discouraged!"] [address=127.0.0.1:2379]
[2019/11/07 11:17:32.399 +08:00] [INFO] [server.go:215] ["init cluster id"] [cluster-id=6756398677212733607]
[2019/11/07 11:17:32.401 +08:00] [WARN] [history_buffer.go:138] ["load history index failed"] [error="leveldb: not found"]
[2019/11/07 11:17:32.401 +08:00] [INFO] [history_buffer.go:146] ["start from history index"] [start-index=0]
[2019/11/07 11:17:32.401 +08:00] [INFO] [namespace_classifier.go:461] ["load namespaces information"] [namespace-count=0] [cost=190.616µs]
[2019/11/07 11:17:32.401 +08:00] [INFO] [server.go:847] ["start to campaign leader"] [campaign-leader-name=pd-sqa-OptiPlex-9020]
[2019/11/07 11:17:32.402 +08:00] [INFO] [server.go:864] ["campaign leader ok"] [campaign-leader-name=pd-sqa-OptiPlex-9020]
[2019/11/07 11:17:32.402 +08:00] [INFO] [tso.go:131] ["sync and save timestamp"] [last=1754/08/30 22:43:41.128 +00:00] [save=2019/11/07 11:17:35.402 +08:00] [next=2019/11/07 11:17:32.402 +08:00]
[2019/11/07 11:17:32.402 +08:00] [INFO] [server.go:946] ["server enable region storage"]
[2019/11/07 11:17:32.402 +08:00] [INFO] [util.go:90] ["load cluster version"] [cluster-version=0.0.0]
[2019/11/07 11:17:32.402 +08:00] [INFO] [server.go:890] ["PD cluster leader is ready to serve"] [leader-name=pd-sqa-OptiPlex-9020]
3. The tikv-server log is as follows:
[2019/11/07 11:17:47.970 +08:00] [INFO] [util.rs:396] ["connecting to PD endpoint"] [endpoints=127.0.0.1:2379]
[2019/11/07 11:17:47.970 +08:00] [INFO] [] ["Connecting to server 127.0.0.1:2379 via HTTP proxy ipv4:109.105.113.200:8080"]
[2019/11/07 11:17:47.973 +08:00] [INFO] [] ["New connected subchannel at 0x7fcd4f036600 for subchannel 0x7fcd4ec27000"]
[2019/11/07 11:17:47.974 +08:00] [INFO] [util.rs:356] ["PD failed to respond"] [err="Grpc(RpcFailure(RpcStatus { status: RpcStatusCode(14), details: Some(\"Trying to connect an http1.x server\") }))"] [endpoints=127.0.0.1:2379]
4. I have tried many times, and several of my colleagues hit exactly the same problem. Hoping for an answer, thanks.

I did format it; your forum apparently does not handle newline characters, so it gets mangled as soon as I submit.

Your forum only recognizes spaces, not newlines; I had to type the spacing out by hand. Please ask your front-end engineers to fix this.

In your deployment, does the TiDB cluster contain only one PD node, one TiKV node, and one TiDB node?

If pasting logs is cumbersome, you can upload files instead. Could you upload the complete tidb, tikv, and pd logs? Thanks.

Yes, we deployed everything on one machine.
Our company's security policy does not allow uploading files.

Could you tell me what the problem above is? Everything is on default configuration; we did not change anything.

1. A TiDB cluster with one PD node, one TiKV node, and one TiDB node is fine for testing. For production we recommend 3 PD, 3 TiKV, and 2 TiDB nodes.

2. Version 4.0 has not been officially released yet. We recommend version 3.0.5, deployed via ansible. If you have network-security requirements, you can download the installation package on a machine with internet access and then install offline.
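For reference, the ansible-based deployment mentioned above is roughly the following sequence (a sketch based on the tidb-ansible playbooks; the tag value is an assumption, and the commands are left commented since they need a real inventory and target hosts):

```shell
# Assumption: deploying the recommended 3.0.5 release via tidb-ansible.
TIDB_TAG="v3.0.5"
echo "clone tidb-ansible at tag ${TIDB_TAG}"
# git clone -b "$TIDB_TAG" https://github.com/pingcap/tidb-ansible.git
# cd tidb-ansible
# ansible-playbook local_prepare.yml  # downloads binaries; run this step on a connected host for offline installs
# ansible-playbook bootstrap.yml      # pre-flight checks on target machines
# ansible-playbook deploy.yml
# ansible-playbook start.yml
```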

3. Please use pd-ctl to check health and leader show; see the link below for pd-ctl usage:
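A minimal sketch of those two checks, assuming pd-ctl ships in ./bin alongside the other binaries and PD listens on its default client port (the calls are guarded so the snippet is harmless to run when the binary is absent):

```shell
# Assumption: PD's client URL matches the single-machine setup in this thread.
PD_ADDR="http://127.0.0.1:2379"
if [ -x ./bin/pd-ctl ]; then
  ./bin/pd-ctl -u "$PD_ADDR" health               # health status of each PD member
  ./bin/pd-ctl -u "$PD_ADDR" member leader show   # which PD member currently holds leadership
fi
echo "$PD_ADDR"
```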

How do I download the 3.0.5 binary package?

Online I can only find the link to the latest binary package.

You can download a specific version as needed; in the screenshot below, $tag can be replaced with the desired version, e.g. v3.0.2 for version 3.0.2.
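A sketch of what that looks like, assuming the versioned package follows the same URL pattern as the tidb-latest package downloaded earlier in this thread, with "latest" replaced by the tag (the download itself is left commented):

```shell
# Assumption: versioned packages mirror the tidb-latest URL pattern used above.
tag="v3.0.5"   # replace with the desired release tag
URL="http://download.pingcap.org/tidb-${tag}-linux-amd64.tar.gz"
echo "$URL"
# wget "$URL"
# wget "${URL%.tar.gz}.sha256" && sha256sum -c "tidb-${tag}-linux-amd64.sha256"
```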

1. I tried version 3.0.5 and hit the same problem:
[2019/11/07 15:04:45.557 +08:00] [INFO] [mod.rs:26] ["Welcome to TiKV."]
[2019/11/07 15:04:45.557 +08:00] [INFO] [mod.rs:28] []
[2019/11/07 15:04:45.557 +08:00] [INFO] [mod.rs:28] ["Release Version: 3.0.5"]
[2019/11/07 15:04:45.557 +08:00] [INFO] [mod.rs:28] ["Git Commit Hash: 01c872bf105dc68dda346ceda087e994c56e2702"]
[2019/11/07 15:04:45.557 +08:00] [INFO] [mod.rs:28] ["Git Commit Branch: HEAD"]
[2019/11/07 15:04:45.557 +08:00] [INFO] [mod.rs:28] ["UTC Build Time: 2019-10-25 01:03:08"]
[2019/11/07 15:04:45.557 +08:00] [INFO] [mod.rs:28] ["Rust Version: rustc 1.37.0-nightly (0e4a56b4b 2019-06-13)"]
[2019/11/07 15:04:45.557 +08:00] [INFO] [mod.rs:30] []
[2019/11/07 15:09:29.770 +08:00] [INFO] [util.rs:397] ["connecting to PD endpoint"] [endpoints=127.0.0.1:2379]
[2019/11/07 15:09:29.772 +08:00] [INFO] [http_connect_handshaker.cc:300] ["Connecting to server 127.0.0.1:2379 via HTTP proxy ipv4:109.105.113.200:8080"]
[2019/11/07 15:09:29.774 +08:00] [INFO] [subchannel.cc:841] ["New connected subchannel at 0x7f27a1c352d0 for subchannel 0x7f27a1825800"]
[2019/11/07 15:09:29.776 +08:00] [INFO] [util.rs:357] ["PD failed to respond"] [err="Grpc(RpcFailure(RpcStatus { status: Unavailable, details: Some(\"Trying to connect an http1.x server\") }))"] [endpoints=127.0.0.1:2379]

2. The link you sent (https://pingcap.com/docs-cn/v3.0/how-to/deploy/orchestrated/offline-ansible/) says the offline package only supports CentOS 7,
but we are running Ubuntu.
3. Still hoping for a solution.

Not only for offline deployment; for online deployment we also strongly recommend CentOS 7.3 or later. TiDB has not been tested on Ubuntu and has not been specifically optimized for that platform, so performance may fall short of expectations.

If you still get that error, please use pd-ctl to check health and leader show; refer to the pd-ctl usage link above, and take a look at the earlier replies in this thread.
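One more detail worth checking, inferred from the tikv logs pasted above: gRPC reports "Connecting to server 127.0.0.1:2379 via HTTP proxy ipv4:109.105.113.200:8080". gRPC honors the standard proxy environment variables, and tunneling loopback traffic through an HTTP/1.x proxy would match the "Trying to connect an http1.x server" error shown. A sketch of how to check for this and exempt localhost (this diagnosis is an inference from the log line, not something confirmed in the thread):

```shell
# List any proxy variables that gRPC might pick up.
env | grep -i '_proxy' || echo "no proxy variables set"
# If one is set, exempt loopback addresses before restarting tikv-server:
export no_proxy="127.0.0.1,localhost${no_proxy:+,$no_proxy}"
echo "$no_proxy"
```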