【TiDB Usage Environment】Production / Testing / PoC
Testing, single-machine playground
【TiDB Version】
8.1.1
tiup --version
1.16.1 tiup
【Reproduction Path】Operations performed that led to the issue
- Set up an environment with 3 PDs and 3 TiKVs using tiup playground:
sudo tiup playground v8.1.1 --tag firsttest --db 2 --pd 3 --kv 3
- Used pfctl to create filter rules to block tikv-1 from sending heartbeats to pd-0
【Problem Encountered: Symptoms and Impact】
- tikv-1 is still able to send heartbeats to PD
【Details】
- pd-0 is the leader
"leader": {
"name": "pd-0",
"member_id": 3474484975246189105,
"peer_urls": [
"http://127.0.0.1:2380"
],
"client_urls": [
"http://127.0.0.1:2379"
],
"binary_version": "v8.1.1",
},
"etcd_leader": {
"name": "pd-0",
"member_id": 3474484975246189105,
"peer_urls": [
"http://127.0.0.1:2380"
],
"client_urls": [
"http://127.0.0.1:2379"
],
"binary_version": "v8.1.1",
}
- tikv-1 details
"store": {
"id": 1,
"address": "127.0.0.1:20161",
"version": "8.1.1",
"peer_address": "127.0.0.1:20161",
"status_address": "127.0.0.1:20181",
"start_timestamp": 1736194461,
"last_heartbeat": 1736320282740911000,
"node_state": 1,
"state_name": "Up"
},
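A note on the two timestamps in the store JSON above: judging by the digit counts, `start_timestamp` is a Unix timestamp in seconds while `last_heartbeat` is in nanoseconds, so they have to be converted to the same unit before comparing. A minimal sketch:

```shell
#!/bin/sh
# last_heartbeat from the store JSON above is in nanoseconds;
# integer-divide by 10^9 to get seconds, comparable to start_timestamp.
ns=1736320282740911000
echo $((ns / 1000000000))   # prints 1736320282
```

Comparing that value against `start_timestamp` (1736194461) shows the heartbeat is indeed much more recent than the store's start time.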
- Used ps aux to find the process ID, then lsof to list all its connections:
ps aux | grep tikv-1
user 33428 0.4 0.3 412985600 92144 s009 S+ Mon12PM 30:04.88 /.tiup/components/tikv/v8.1.1/tikv-server --addr=127.0.0.1:20161 --advertise-addr=127.0.0.1:20161 --status-addr=127.0.0.1:20181 --pd-endpoints=http://127.0.0.1:2379,http://127.0.0.1:2382,http://127.0.0.1:2384 --config=/.tiup/data/firsttest/tikv-1/tikv.toml ... (some output omitted)
lsof -i -n -P | grep 33428
tikv-serv 33428 user 18u IPv4 0x629a0579e5999c86 0t0 TCP 127.0.0.1:20181->127.0.0.1:49222 (ESTABLISHED)
tikv-serv 33428 user 55u IPv6 0xc9b1c270dedb1bec 0t0 TCP 127.0.0.1:20161 (LISTEN)
tikv-serv 33428 user 56u IPv6 0x452e11a7665d5336 0t0 TCP 127.0.0.1:20161 (LISTEN)
tikv-serv 33428 user 57u IPv6 0x48f1bfb5c9155498 0t0 TCP 127.0.0.1:20161 (LISTEN)
tikv-serv 33428 user 58u IPv6 0x376bbcbb10ebeca8 0t0 TCP 127.0.0.1:20161 (LISTEN)
tikv-serv 33428 user 59u IPv6 0xeeab296337f6f7fd 0t0 TCP 127.0.0.1:20161 (LISTEN)
tikv-serv 33428 user 64u IPv4 0x91c8d6e2d83f2dea 0t0 TCP 127.0.0.1:20181 (LISTEN)
tikv-serv 33428 user 69u IPv6 0xeeb0f9a688bd19e0 0t0 TCP 127.0.0.1:20161->127.0.0.1:51040 (ESTABLISHED)
tikv-serv 33428 user 70u IPv6 0x4498c360963fc658 0t0 TCP 127.0.0.1:20161->127.0.0.1:51011 (ESTABLISHED)
tikv-serv 33428 user 71u IPv6 0x3f857b732fd2d83a 0t0 TCP 127.0.0.1:20161->127.0.0.1:51044 (ESTABLISHED)
tikv-serv 33428 user 72u IPv6 0xc4ebeabfe8984179 0t0 TCP 127.0.0.1:20161->127.0.0.1:51042 (ESTABLISHED)
tikv-serv 33428 user 73u IPv6 0x1e34342da2a27806 0t0 TCP 127.0.0.1:20161->127.0.0.1:51073 (ESTABLISHED)
tikv-serv 33428 user 74u IPv6 0xbda78c9566cad99d 0t0 TCP 127.0.0.1:51074->127.0.0.1:20162 (ESTABLISHED)
tikv-serv 33428 user 75u IPv6 0x54ffb107eabfdb11 0t0 TCP 127.0.0.1:20161->127.0.0.1:51047 (ESTABLISHED)
tikv-serv 33428 user 76u IPv6 0x69c9fa27d1819051 0t0 TCP 127.0.0.1:20161->127.0.0.1:51048 (ESTABLISHED)
tikv-serv 33428 user 77u IPv6 0xd8b93c8691c59183 0t0 TCP 127.0.0.1:51004->127.0.0.1:20162 (ESTABLISHED)
tikv-serv 33428 user 78u IPv6 0x4b2902bac3ed041b 0t0 TCP 127.0.0.1:51075->127.0.0.1:20160 (ESTABLISHED)
tikv-serv 33428 user 79u IPv6 0xb93bb62a7bd84782 0t0 TCP 127.0.0.1:20161->127.0.0.1:58136 (ESTABLISHED)
tikv-serv 33428 user 80u IPv6 0x97b4f9c124e3c08d 0t0 TCP 127.0.0.1:20161->127.0.0.1:58137 (ESTABLISHED)
tikv-serv 33428 user 81u IPv6 0x9ce90a5d190d55be 0t0 TCP 127.0.0.1:20161->127.0.0.1:51013 (ESTABLISHED)
tikv-serv 33428 user 82u IPv6 0x3d06e64321070d6e 0t0 TCP 127.0.0.1:20161->127.0.0.1:51035 (ESTABLISHED)
tikv-serv 33428 user 83u IPv6 0xc6da2051c9b5676f 0t0 TCP 127.0.0.1:51000->127.0.0.1:20160 (ESTABLISHED)
tikv-serv 33428 user 84u IPv6 0xf3b8d024c4578210 0t0 TCP 127.0.0.1:20161->127.0.0.1:51076 (ESTABLISHED)
tikv-serv 33428 user 85u IPv6 0x884a2e0e9d5e6bc1 0t0 TCP 127.0.0.1:20161->127.0.0.1:50998 (ESTABLISHED)
tikv-serv 33428 user 86u IPv4 0xe4888f09ea030252 0t0 TCP 127.0.0.1:20181->127.0.0.1:65178 (ESTABLISHED)
tikv-serv 33428 user 87u IPv6 0x3e240794137fe7f1 0t0 TCP 127.0.0.1:63606->127.0.0.1:2379 (ESTABLISHED)
tikv-serv 33428 user 88u IPv6 0xfc76d549b5e98997 0t0 TCP 127.0.0.1:20161->127.0.0.1:50999 (ESTABLISHED)
- Brute-forced it by adding filter rules for every ESTABLISHED connection, for example:
block drop quick on lo0 inet proto tcp from 127.0.0.1 port = 63606 to 127.0.0.1 port = 2379
block drop quick on lo0 inet proto tcp from 127.0.0.1 port = 2379 to 127.0.0.1 port = 63606
block drop quick on lo0 inet6 proto tcp from ::1 port = 63606 to ::1 port = 2379
block drop quick on lo0 inet6 proto tcp from ::1 port = 2379 to ::1 port = 63606
(not all rules are copied here, for brevity)
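For reference, a rule pair like the one above can be derived mechanically from a single `lsof` line; the sketch below assumes the field layout shown in the `lsof -i -n -P` output earlier (field 9 is the `src->dst` endpoint pair), and uses one of those lines as sample input:

```shell
#!/bin/sh
# Sketch: turn one ESTABLISHED line from `lsof -i -n -P` into a pair of
# pf block rules (one per direction). In practice, pipe real lsof output in.
sample='tikv-serv 33428 user 87u IPv6 0x3e240794137fe7f1 0t0 TCP 127.0.0.1:63606->127.0.0.1:2379 (ESTABLISHED)'
echo "$sample" | awk '/ESTABLISHED/ {
    split($9, ep, "->");                        # "src->dst"
    split(ep[1], s, ":"); split(ep[2], d, ":"); # host:port halves
    fmt = "block drop quick on lo0 inet proto tcp from %s port = %s to %s port = %s\n";
    printf fmt, s[1], s[2], d[1], d[2];
    printf fmt, d[1], d[2], s[1], s[2];
}'
```

This emits exactly the IPv4 rule pair shown above for the 63606→2379 connection.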
【Description】
I learned about TiKV state transitions from other posts on this forum, so I wanted to study behaviors like Disconnected, Down, and rejoining the cluster in depth by simulating a TiKV disconnection in a local environment. But for some reason the pf rules do not seem to take effect: the store still shows as Up in the dashboard. My guess is that either pf rules simply do not work in this scenario, or some TiKV/PD gRPC behavior I am not yet familiar with automatically recovers the connection, so the disconnection is never observed. What am I getting wrong here?
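One thing that may be worth checking (an assumption about pf behavior, not a confirmed diagnosis): pf is stateful, and packets matching a state entry created before the rules were loaded can bypass newly added block rules, so flushing the state table with `sudo pfctl -F states` after loading the rules might be needed. `sudo pfctl -v -s rules` prints per-rule counters; if `Packets` stays at 0, the rule never matched anything. A sketch of reading such a counter line (the sample line imitates pfctl's verbose output format):

```shell
#!/bin/sh
# `sudo pfctl -v -s rules` follows each rule with a counter line like the
# sample below; Packets (field 5) == 0 means the rule never matched a packet.
sample='  [ Evaluations: 120       Packets: 0         Bytes: 0           States: 0     ]'
echo "$sample" | awk '{ print (($5 == 0) ? "rule never matched" : "matched " $5 " packets") }'
```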
I also wanted to look at how the gRPC connections are configured, but I am not familiar enough with the codebase to locate the relevant source quickly. If there is recommended code to read, please let me know.