h5n1
(H5n1)
January 10, 2023, 13:21
1
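Continuing from my previous thread, "PVC cannot be bound when deploying TiDB on k8s" (#4, question from h5n1):
After applying the cluster configuration file, the PD and TiKV pods are Running, but no TiDB pod has been created. How should I analyze this?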
apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: tidb-test-cluster
  namespace: default
spec:
  timezone: UTC
  configUpdateStrategy: RollingUpdate
  imagePullPolicy: Always
  helper:
    image: alpine:3.16.0
  pvReclaimPolicy: Retain
  discovery: {}
  enableDynamicConfiguration: true
  pd:
    baseImage: 10.172.49.246/zongbu-sre/pd-arm64:v6.1.2
    config: |
      [dashboard]
        internal-proxy = true
    replicas: 3
    maxFailoverCount: 0
    requests:
      cpu: 2000m
      memory: 12Gi
      storage: 100Gi
    limits:
      cpu: 2000m
      memory: 12Gi
      storage: 100Gi
    storageClassName: "pd-ssd-storage"
  tidb:
    baseImage: 10.172.49.246/zongbu-sre/tidb-arm64:v6.1.2
    config: |
      [performance]
        tcp-keep-alive = true
    replicas: 3
    maxFailoverCount: 0
    requests:
      cpu: 2000m
      memory: 16Gi
      storage: 100Gi
    limits:
      cpu: 2000m
      memory: 16Gi
      storage: 100Gi
    service:
      type: "ClusterIP"
    storageClassName: "tidb-storage"
  tikv:
    baseImage: 10.172.49.246/zongbu-sre/tikv-arm64:v6.1.2
    config: |
      log-level = "info"
    replicas: 6
    maxFailoverCount: 0
    requests:
      cpu: 2000m
      memory: 32Gi
      storage: 500Gi
    limits:
      cpu: 2000m
      memory: 32Gi
      storage: 500Gi
    storageClassName: "tikv-ssd-storage"
I0110 13:49:42.667328 1 tidb_cluster_controller.go:131] TidbCluster: default/tidb-test-cluster, still need sync: TidbCluster: [default/tidb-test-cluster], waiting for TiKV cluster running, requeuing
I0110 13:49:49.794007 1 tikv_member_manager.go:834] TiKV of Cluster default/tidb-test-cluster not bootstrapped yet
I0110 13:49:49.799414 1 tikv_member_manager.go:938] TiKV of Cluster default/tidb-test-cluster is not bootstrapped yet, no need to set store labels
I0110 13:49:49.800033 1 tidb_cluster_controller.go:131] TidbCluster: default/tidb-test-cluster, still need sync: TidbCluster: [default/tidb-test-cluster], waiting for TiKV cluster running, requeuing
I0110 13:50:19.811670 1 tikv_member_manager.go:834] TiKV of Cluster default/tidb-test-cluster not bootstrapped yet
I0110 13:50:19.818649 1 tikv_member_manager.go:938] TiKV of Cluster default/tidb-test-cluster is not bootstrapped yet, no need to set store labels
I0110 13:50:19.819283 1 tidb_cluster_controller.go:131] TidbCluster: default/tidb-test-cluster, still need sync: TidbCluster: [default/tidb-test-cluster], waiting for TiKV cluster running, requeuing
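The operator log above shows the controller requeuing while it waits for TiKV, which is why no TiDB pod exists yet. A small filter along these lines can pull that reason out of a longer log. This is only a sketch: `waiting_reasons` is a hypothetical helper name, and the deployment name and namespace in the usage comment are assumptions (common Helm chart defaults), not something stated in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print each distinct "waiting for ..." reason
# found in tidb-operator controller logs read from stdin.
waiting_reasons() {
  # Take the text from "waiting for" up to the next comma, de-duplicated.
  grep -o 'waiting for [^,]*' | sort -u
}

# Usage (assumption: the operator runs as deployment
# tidb-controller-manager in namespace tidb-admin; adjust to your install):
#   kubectl logs -n tidb-admin deploy/tidb-controller-manager --tail=200 | waiting_reasons
```

If the only reason printed is the TiKV one, the next place to look is the TiKV pods themselves rather than the TiDB spec.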
xfworld
(魔幻之翼)
January 10, 2023, 14:14
2
kubectl describe tidbclusters -n ${namespace} ${cluster_name}
h5n1
(H5n1)
January 11, 2023, 01:25
4
It looks like TiKV cannot reach PD.
TiKV pod log:
[2023/01/11 01:16:20.611 +00:00] [INFO] [util.rs:587] ["connecting to PD endpoint"] [endpoints=http://tidb-test-cluster-pd:2379]
[2023/01/11 01:16:22.611 +00:00] [INFO] [util.rs:549] ["PD failed to respond"] [err="Grpc(RpcFailure(RpcStatus { code: 4-DEADLINE_EXCEEDED, message: \"Deadline Exceeded\", details: [] }))"] [endpoints=http://tidb-test-cluster-pd:2379]
[2023/01/11 01:16:22.611 +00:00] [WARN] [client.rs:163] ["validate PD endpoints failed"] [err="Other(\"[components/pd_client/src/util.rs:582]: PD cluster failed to respond\")"]
[2023/01/11 01:16:22.912 +00:00] [INFO] [util.rs:587] ["connecting to PD endpoint"] [endpoints=http://tidb-test-cluster-pd:2379]
./pd-ctl store
Failed to get store: [500] "[PD:cluster:ErrNotBootstrapped]TiKV cluster not bootstrapped, please start TiKV first"
ps -ef|grep tikv
root 1 0 0 01:13 ? 00:00:00 /tikv-server --pd=http://tidb-test-cluster-pd:2379 --advertise-addr=tidb-test-cluster-tikv-5.tidb-test-cluster-tikv-peer.default.svc:20160 --addr=0.0.0.0:20160 --status-addr=0.0.0.0:20180 --advertise-status-addr=tidb-test-cluster-tikv-5.tidb-test-cluster-tikv-peer.default.svc:20180 --data-dir=/var/lib/tikv --capacity=500GB --config=/etc/tikv/tikv.toml
kubectl get svc -A
NAMESPACE     NAME                          TYPE        CLUSTER-IP        EXTERNAL-IP   PORT(S)                  AGE
default       kubernetes                    ClusterIP   192.168.0.1       <none>        443/TCP                  2d17h
default       tidb-test-cluster-discovery   ClusterIP   192.168.104.90    <none>        10261/TCP,10262/TCP      11h
default       tidb-test-cluster-pd          ClusterIP   192.168.153.231   <none>        2379/TCP                 11h
default       tidb-test-cluster-pd-peer     ClusterIP   None              <none>        2380/TCP,2379/TCP        11h
default       tidb-test-cluster-tikv-peer   ClusterIP   None              <none>        20160/TCP                11h
kube-system   kube-dns                      ClusterIP   192.168.0.222     <none>        53/UDP,53/TCP,9153/TCP   5d13h
kubectl describe svc tidb-test-cluster-pd
Name:              tidb-test-cluster-pd
Namespace:         default
Labels:            app.kubernetes.io/component=pd
                   app.kubernetes.io/instance=tidb-test-cluster
                   app.kubernetes.io/managed-by=tidb-operator
                   app.kubernetes.io/name=tidb-cluster
                   app.kubernetes.io/used-by=end-user
Annotations:       pingcap.com/last-applied-configuration:
                     {"ports":[{"name":"client","protocol":"TCP","port":2379,"targetPort":2379}],"selector":{"app.kubernetes.io/component":"pd","app.kubernetes...
Selector:          app.kubernetes.io/component=pd,app.kubernetes.io/instance=tidb-test-cluster,app.kubernetes.io/managed-by=tidb-operator,app.kubernetes.io/name=tidb-cluster
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                192.168.153.231
IPs:               192.168.153.231
Port:              client  2379/TCP
TargetPort:        2379/TCP
Endpoints:         172.16.112.142:2379,172.16.228.152:2379,172.16.252.53:2379
Session Affinity:  None
Events:            <none>
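The Service lists three endpoints, yet TiKV times out on the service name. One way to narrow this down is to fetch PD's health API once through the Service and once through a pod endpoint directly, then compare. The sketch below assumes a busybox image can be pulled in the cluster (override it with `PROBE_IMAGE` if only a private registry is reachable), and that `kubectl run` passes the container's exit status through. `probe` and `classify` are hypothetical helper names, and the interpretation is a heuristic rather than a proof.

```shell
#!/bin/sh
# Hypothetical helper: fetch PD's health API from a throwaway pod.
# Assumption: PROBE_IMAGE (default busybox:1.36) is pullable in your cluster.
probe() {
  # $1 = namespace, $2 = host:port to test; prints "ok" or "fail"
  if kubectl run "pd-probe-$$" --rm -i --restart=Never \
       --image="${PROBE_IMAGE:-busybox:1.36}" -n "$1" -- \
       wget -q -O /dev/null -T 5 "http://$2/pd/api/v1/health" >/dev/null 2>&1
  then echo ok; else echo fail; fi
}

# Rough reading of the two probe results.
classify() {
  # $1 = result via the Service, $2 = result via a pod endpoint
  case "$1/$2" in
    ok/*)    echo "service reachable" ;;
    fail/ok) echo "suspect Service routing or DNS" ;;
    *)       echo "suspect pod network or PD itself" ;;
  esac
}

# Usage with values from the output above:
#   classify "$(probe default tidb-test-cluster-pd:2379)" \
#            "$(probe default 172.16.112.142:2379)"
```

A "fail via Service, ok via endpoint" result would point toward kube-proxy or iptables rules on the nodes rather than PD.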
xfworld
(魔幻之翼)
January 11, 2023, 05:10
5
Take a look at the Kubernetes material on network configuration and troubleshooting. That area is quite complex in its own right.
h5n1
(H5n1)
January 11, 2023, 08:32
6
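Solved, though it took drastic measures: I flushed iptables, rebooted the hosts, then deleted the previous PVs and cluster and redeployed from scratch. It was most likely a network problem after all.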
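Before flushing anything, it may be worth checking whether the node actually has a rule for the PD ClusterIP. The sketch below assumes kube-proxy runs in iptables mode, where Service rules appear in the `KUBE-SERVICES` chain of the nat table; in IPVS mode this check does not apply. `has_service_rule` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if iptables-save output on stdin contains
# a KUBE-SERVICES rule for ClusterIP $1 and destination port $2.
# Assumption: kube-proxy is in iptables mode.
has_service_rule() {
  grep -- '-A KUBE-SERVICES' \
    | grep -F -- "-d $1/32" \
    | grep -q -- "--dport $2"
}

# Usage on a node, as root, with the PD ClusterIP from this thread:
#   iptables-save -t nat | has_service_rule 192.168.153.231 2379 \
#     && echo "rule present" || echo "rule missing"
```

A missing rule on some nodes but not others would be consistent with the stale iptables state that the flush and reboot cleared.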
h5n1
(H5n1)
Closed
March 12, 2023, 08:32
7
This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.