TiProxy high-availability configuration problem

CAI001 · October 14, 2025, 09:28

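[TiDB environment] Production / Test / PoC
Servers 101 and 102 each run a TiProxy node, with a VIP (124) providing high availability. The servers are hosted on Alibaba Cloud.
When connecting through the VIP, the following happens from time to time: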

mysql> show databases;
ERROR 2013 (HY000): Lost connection to MySQL server during query
No connection. Trying to reconnect...
Connection id:    40
Current database: *** NONE ***

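TiProxy is deployed and keepalived is installed on both 101 and 102. Their configurations are shown below.
The following was added to the TiProxy configuration on each node: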

[ha]
interface = "eth0"
virtual-ip = "******.124/24"

keepalived configuration
master

[root@MWDB-tiproxy1 ~]# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived

global_defs {
   notification_email {
     acassen@firewall.loc
     failover@firewall.loc
     sysadmin@firewall.loc
   }
   notification_email_from Alexandre.Cassen@firewall.loc
   smtp_server 192.168.200.1
   smtp_connect_timeout 30
   router_id LVS_DEVEL
   vrrp_skip_check_adv_addr
   #vrrp_strict
   vrrp_garp_interval 0
   vrrp_gna_interval 0
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    nopreempt
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
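    unicast_src_ip ******.101     # Private IP address of this instance; 192.168.0.25 in this example
    unicast_peer {
        ******.102          # Private IP address of the peer instance; 192.168.0.26 in this example. With multiple backup ECS instances, list every peer IP, one address per line, without commas or other separators.
    }
    virtual_ipaddress {
         ******.124         # Virtual IP address, set to the HaVip IP address; 192.168.0.24 in this example
    }
    garp_master_delay 1       # Delay, in seconds, before refreshing the ARP cache after becoming master
    garp_master_refresh 5     # Interval, in seconds, between ARP announcements

    track_interface {
        eth0                  # Network interface the VIP is bound to; eth0 in this example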
    }
}

backup

[root@MWDB-tiproxy2 ~]# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived

global_defs {
   notification_email {
     acassen@firewall.loc
     failover@firewall.loc
     sysadmin@firewall.loc
   }
   notification_email_from Alexandre.Cassen@firewall.loc
   smtp_server 192.168.200.1
   smtp_connect_timeout 30
   router_id LVS_DEVEL
   vrrp_skip_check_adv_addr
   #vrrp_strict
   vrrp_garp_interval 0
   vrrp_gna_interval 0
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 55
    nopreempt
    priority 50
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
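    unicast_src_ip ******.102     # Private IP address of this instance; 192.168.0.25 in this example
    unicast_peer {
        ******.101          # Private IP address of the peer instance; 192.168.0.26 in this example. With multiple backup ECS instances, list every peer IP, one address per line, without commas or other separators.
    }
    virtual_ipaddress {
        ******.124          # Virtual IP address, set to the HaVip IP address; 192.168.0.24 in this example
    }
    garp_master_delay 1       # Delay, in seconds, before refreshing the ARP cache after becoming master
    garp_master_refresh 5     # Interval, in seconds, between ARP announcements

    track_interface {
        eth0                  # Network interface the VIP is bound to; eth0 in this example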
    }
}
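
When comparing two keepalived files like the ones above, it can help to diff the settings that both peers of one VRRP group are expected to share. keepalived nodes only treat each other as members of the same virtual router when their virtual_router_id values match, and here the master file has 51 while the backup file has 55. Whether that is related to the dropped connections is not confirmed in this thread. The following is a minimal sketch, not an official keepalived tool: the helper names vrrp_settings and mismatches are hypothetical, and the two sample strings are trimmed copies of the vrrp_instance blocks above.

```python
import re


def vrrp_settings(conf_text):
    """Return the top-level `key value` settings of the first vrrp_instance block."""
    settings = {}
    depth = 0
    inside = False
    for raw in conf_text.splitlines():
        # keepalived treats both '#' and '!' as comment markers.
        line = re.split(r"[#!]", raw, maxsplit=1)[0].strip()
        if not line:
            continue
        if not inside:
            if line.startswith("vrrp_instance") and line.endswith("{"):
                inside, depth = True, 1
            continue
        if line.endswith("{"):
            depth += 1          # nested block such as authentication { ... }
        elif line == "}":
            depth -= 1
            if depth == 0:
                break           # end of the vrrp_instance block
        elif depth == 1:
            parts = line.split(None, 1)
            settings[parts[0]] = parts[1] if len(parts) > 1 else ""
    return settings


def mismatches(a, b, keys=("virtual_router_id", "advert_int")):
    """Return {key: (value_a, value_b)} for keys whose values differ between peers."""
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Trimmed copies of the two vrrp_instance blocks posted above.
MASTER_CONF = """
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    nopreempt
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
}
"""

BACKUP_CONF = """
vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 55
    nopreempt
    priority 50
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
}
"""

if __name__ == "__main__":
    diff = mismatches(vrrp_settings(MASTER_CONF), vrrp_settings(BACKUP_CONF))
    for key, (master_value, backup_value) in diff.items():
        print(f"{key}: master={master_value} backup={backup_value}")
```

In practice the two strings would be replaced by the contents of /etc/keepalived/keepalived.conf read from each node. Settings such as state and priority are expected to differ, so they are left out of the comparison.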

TiDBer_HErMeXDz · October 14, 2025, 09:39

Java, .NET, and Go clients these days all use long-lived connections. Enabling session persistence in keepalived should solve this problem.

CAI001 · October 14, 2025, 10:50

That had no effect. Is there something wrong with my configuration?

virtual_server ********.124 6000 {
    delay_loop 6
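    lb_algo rr                    # Load-balancing algorithm: round robin
    lb_kind DR                    # DR mode recommended (high performance); see the notes below
    persistence_timeout 300       # <<< Enables session persistence: the same client IP is sent to the same server for 5 minutes

    # Backend real servers: the TiProxy service on this node and on the peer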
    real_server ********.101 6000 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
            retry 3
            delay_before_retry 3
            connect_port 6000
        }
    }

    real_server ********.102 6000 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
            retry 3
            delay_before_retry 3
            connect_port 6000
        }
    }
}

Miracle · October 14, 2025, 12:34

This is probably idle connections being killed automatically on timeout. Check the values of the following two variables:
show variables like 'interactive_timeout';
show variables like 'wait_timeout';
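
One way to use those two values is to compare them with how long the session had been idle before the "Lost connection" error appeared. The sketch below assumes MySQL-style semantics, where a client that connects as interactive (the mysql command-line client does) is governed by interactive_timeout and other clients by wait_timeout. The function name would_be_closed and the sample numbers are hypothetical and are not taken from this cluster.

```python
def would_be_closed(idle_seconds, variables, interactive=True):
    """Return True if a session idle for `idle_seconds` would hit its idle timeout.

    `variables` maps variable names to values as returned by SHOW VARIABLES
    (values may be strings). Assumption: interactive sessions follow
    interactive_timeout, all other sessions follow wait_timeout.
    """
    key = "interactive_timeout" if interactive else "wait_timeout"
    return idle_seconds >= int(variables[key])


if __name__ == "__main__":
    # Hypothetical values; substitute the output of the SHOW VARIABLES queries.
    variables = {"interactive_timeout": "28800", "wait_timeout": "28800"}
    for idle in (60, 30000):
        print(idle, would_be_closed(idle, variables))
```

If the disconnects show up after idle periods much shorter than both values, the idle timeout is unlikely to be the cause and the VIP or keepalived setup deserves a closer look.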