Manually scaled out TiKV on K8s, but the new TiKV is Running and takes no write traffic

  • System version & kernel version: Linux gke-tidb-default-pool-c644815e-mfdm 4.14.137+ #1 SMP Thu Aug 8 02:47:02 PDT 2019 x86_64 Intel® Xeon® CPU @ 2.20GHz GenuineIntel GNU/Linux

  • TiDB version: Container image "pingcap/tikv:v3.0.4"

  • Disk type: each VM has one 100G boot disk and two SSD disks of 1G and 10G

  • Cluster node distribution: scaled from 3 VM nodes to 4 VM nodes

    NAME                              READY   STATUS    RESTARTS   AGE
    demo-discovery-76d999b748-bgsk9   1/1     Running   0          25h
    demo-monitor-648447d8f-sj7pp      3/3     Running   0          25h
    demo-pd-0                         1/1     Running   0          25h
    demo-pd-1                         1/1     Running   1          25h
    demo-pd-2                         1/1     Running   0          25h
    demo-tidb-0                       2/2     Running   0          25h
    demo-tikv-0                       1/1     Running   0          17h
    demo-tikv-1                       1/1     Running   0          6h15m
    demo-tikv-2                       1/1     Running   0          25h
    demo-tikv-3                       1/1     Running   0          12h

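  • Data volume & region count & replica count: a single database with a single table holding 300 million rows

  • Problem description (what I did): In GKE I manually increased the tikv replica count by one (3->4), then loaded new data with loader. The new tikv receives no data, and its storage directory uses almost no space.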

You can use pd-ctl to check the store information and the scheduler information.

pd-ctl user manual:

kubectl exec -it demo-pd-0 -n tidb ./pd-ctl scheduler show

[ "balance-region-scheduler", "balance-leader-scheduler", "balance-hot-region-scheduler", "label-scheduler" ]

kubectl exec -it demo-pd-0 -n tidb ./pd-ctl store

Only 3 stores are listed; the store for the new tikv does not appear. What should I do?

{
  "count": 3,
  "stores": [
    {
      "store": {
        "id": 1,
        "address": "demo-tikv-1.demo-tikv-peer.tidb.svc:20160",
        "labels": [ { "key": "host", "value": "gke-tidb-default-pool-6dbef19b-m8c3" } ],
        "version": "3.0.4",
        "state_name": "Up"
      },
      "status": {
        "capacity": "98 GiB",
        "available": "37 GiB",
        "leader_count": 391,
        "leader_weight": 1,
        "leader_score": 32127,
        "leader_size": 32127,
        "region_count": 1119,
        "region_weight": 1,
        "region_score": 99938371.61398745,
        "region_size": 97473,
        "start_ts": "2019-10-30T19:49:04Z",
        "last_heartbeat_ts": "2019-10-31T02:22:37.393123213Z",
        "uptime": "6h33m33.393123213s"
      }
    },
    {
      "store": {
        "id": 4,
        "address": "demo-tikv-2.demo-tikv-peer.tidb.svc:20160",
        "labels": [ { "key": "host", "value": "gke-tidb-default-pool-6dbef19b-tf9s" } ],
        "version": "3.0.4",
        "state_name": "Up"
      },
      "status": {
        "capacity": "98 GiB",
        "available": "36 GiB",
        "leader_count": 384,
        "leader_weight": 1,
        "leader_score": 32326,
        "leader_size": 32326,
        "region_count": 1119,
        "region_weight": 1,
        "region_score": 171986969.03162336,
        "region_size": 97473,
        "start_ts": "2019-10-30T00:15:48Z",
        "last_heartbeat_ts": "2019-10-31T02:22:41.437817882Z",
        "uptime": "26h6m53.437817882s"
      }
    },
    {
      "store": {
        "id": 5,
        "address": "demo-tikv-0.demo-tikv-peer.tidb.svc:20160",
        "labels": [ { "key": "host", "value": "gke-tidb-default-pool-6dbef19b-scd3" } ],
        "version": "3.0.4",
        "state_name": "Up"
      },
      "status": {
        "capacity": "98 GiB",
        "available": "36 GiB",
        "leader_count": 344,
        "leader_weight": 1,
        "leader_score": 33020,
        "leader_size": 33020,
        "region_count": 1119,
        "region_weight": 1,
        "region_score": 193298218.4277382,
        "region_size": 97473,
        "start_ts": "2019-10-30T08:50:35Z",
        "last_heartbeat_ts": "2019-10-31T02:22:37.871190962Z",
        "uptime": "17h32m2.871190962s"
      }
    }
  ]
}
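
For anyone who wants to automate this check, here is a minimal Python sketch that compares the stores PD reports against the TiKV pods expected for a given replica count. It assumes the pod naming pattern `<cluster>-tikv-<n>` seen in the pod list above and that the first DNS label of each store address is the pod name. The function name `missing_tikv_stores` and the trimmed sample JSON are made up for illustration and are not part of pd-ctl.

```python
import json
from typing import List


def missing_tikv_stores(pd_store_json: str, cluster: str, replicas: int) -> List[str]:
    """Return TiKV pod names that have no store registered in PD.

    pd_store_json is the JSON text printed by `pd-ctl store`.
    Assumption: pods are named "<cluster>-tikv-<n>" for n in 0..replicas-1.
    """
    stores = json.loads(pd_store_json)["stores"]
    # An address looks like "demo-tikv-1.demo-tikv-peer.tidb.svc:20160";
    # the part before the first dot is taken as the pod name.
    registered = {s["store"]["address"].split(".", 1)[0] for s in stores}
    expected = [f"{cluster}-tikv-{i}" for i in range(replicas)]
    return [pod for pod in expected if pod not in registered]


# Trimmed-down sample with only the fields the check needs (illustrative).
SAMPLE = """
{
  "count": 3,
  "stores": [
    {"store": {"id": 1, "address": "demo-tikv-1.demo-tikv-peer.tidb.svc:20160"}},
    {"store": {"id": 4, "address": "demo-tikv-2.demo-tikv-peer.tidb.svc:20160"}},
    {"store": {"id": 5, "address": "demo-tikv-0.demo-tikv-peer.tidb.svc:20160"}}
  ]
}
"""

if __name__ == "__main__":
    # With 4 replicas expected, demo-tikv-3 has no store in PD.
    print(missing_tikv_stores(SAMPLE, "demo", 4))
```

With the sample above and 4 expected replicas, the script reports `demo-tikv-3` as missing, which matches the situation here: the pod is Running but PD has no store for it, so the balance schedulers have nothing to move regions onto.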

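Can you describe the exact steps you used to scale out? Was it done through helm upgrade?

No, I did not use helm.

In the GKE console I first scaled out the k8s hosts (3->4), then increased the tikv replicas by one (3->4).

Is it mandatory to scale out through helm?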

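We recommend deploying and scaling the TiDB cluster with helm, since helm makes the cluster easier to manage. A normal TiKV scale-out involves some rolling_update operations. Directly increasing replicas by one only increases the pod count; the new TiKV node does not actually join the cluster.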

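Following the advice, I used helm upgrade, and after some adjustments the new tikv became usable.

Steps:
1. Change the replicas value in values.yaml and run helm upgrade
2. Run helm delete
3. Run helm rollback
4. Delete the new tikv's PVC
5. Delete the new tikv's pod and wait for the system to finish recreating it; the cluster then returns to normal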