k8s version: v1.12.2
operator: v1.1.8
tidb version: v4.0.2
I went through the operator-controller log, the PD log, and the discovery log step by step, but I still can't tell what is wrong. Could you please take a look? Thanks!
controller.log
E1224 10:48:10.832995 1 pd_member_manager.go:183] failed to sync TidbCluster: [gejb-test/test-001-01]'s status, error: Get "http://test-001-01-pd.gejb-test:2379/pd/health": dial tcp 10.233.36.113:2379: connect: connection refused
E1224 10:48:11.458573 1 tidb_cluster_controller.go:123] TidbCluster: gejb-test/test-001-01, sync failed TidbCluster: gejb-test/test-001-01's pd status sync failed, can't failover, requeuing
pd.log
Address 1: 10.233.95.97 test-001-01-pd-0.test-001-01-pd-peer.gejb-test.svc.cluster.local
nslookup domain test-001-01-pd-0.test-001-01-pd-peer.gejb-test.svc.svc success
waiting for discovery service to return start args ...
waiting for discovery service to return start args ...
waiting for discovery service to return start args ...
waiting for discovery service to return start args ...
waiting for discovery service to return start args ...
waiting for discovery service to return start args ...
waiting for discovery service to return start args ...
discovery.log
I1224 09:44:57.897042 1 discovery.go:62] advertisePeerUrl is: test-001-01-pd-0.test-001-01-pd-peer.gejb-test.svc:2380
E1224 09:45:00.902199 1 server.go:68] failed to discover: test-001-01-pd-0.test-001-01-pd-peer.gejb-test.svc:2380
, Get "https://10.233.0.1:443/apis/pingcap.com/v1alpha1/namespaces/gejb-test/tidbclusters/test-001-01": dial tcp 10.233.0.1:443: connect: connection timed out
E1224 09:45:00.902258 1 server.go:70] failed to writeError: Get "https://10.233.0.1:443/apis/pingcap.com/v1alpha1/namespaces/gejb-test/tidbclusters/test-001-01": dial tcp 10.233.0.1:443: connect: connection timed out
I logged in to the discovery pod and ran:
]# kubectl exec -it -n gejb-test test-001-01-discovery-5d9c476d87-hzxtw -- /bin/sh
/ # nc -nvv 10.233.0.1:443
nc: 10.233.0.1:443 (10.233.0.1:443): Operation timed out
sent 0, rcvd 0
/ # nc -nvv 10.226.132.106 6443
10.226.132.106 (10.226.132.106:6443) open # 10.226.132.106 is one of the endpoints behind the 10.233.0.1 ClusterIP
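The same comparison (ClusterIP versus a backing endpoint) can be scripted so it is easy to repeat from several pods or nodes. This is only a minimal sketch: it assumes a pod that has Python 3 available (the discovery image may not), and `tcp_probe` and `probe_all` are hypothetical helper names, not part of TiDB Operator or kubectl.

```python
import socket


def tcp_probe(host, port, timeout=3.0):
    """Try a plain TCP connect and classify the result.

    Returns "open", "refused", "timeout", or "error: <reason>".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except (socket.timeout, TimeoutError):
        # No answer at all: packets are likely dropped or not routed/NATed.
        return "timeout"
    except ConnectionRefusedError:
        # Host reachable, but nothing is listening on that port.
        return "refused"
    except OSError as exc:
        return "error: {}".format(exc)


def probe_all(targets, timeout=3.0):
    """Probe every (host, port) pair and return {"host:port": result}."""
    return {
        "{}:{}".format(host, port): tcp_probe(host, port, timeout)
        for host, port in targets
    }
```

For example, `probe_all([("10.233.0.1", 443), ("10.226.132.106", 6443)])` probes the two addresses tested above. A "timeout" on the ClusterIP combined with "open" on the endpoint would match the nc results and would point at the ClusterIP-to-endpoint translation on that node rather than at the API server itself.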
Kubernetes service information:
# kubectl describe service kubernetes
Name: kubernetes
Namespace: default
Labels: component=apiserver
provider=kubernetes
Annotations: <none>
Selector: <none>
Type: ClusterIP
IP: 10.233.0.1
Port: https 443/TCP
TargetPort: 6443/TCP
Endpoints: 10.226.132.106:6443,10.226.132.107:6443,10.226.132.108:6443
Session Affinity: None
Events: <none>
Is something misconfigured that is causing instance creation to hang?