Pump keeps trying to connect to a drainer that has already been taken offline

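To help us resolve this faster, please provide the following information; a clear problem description gets a quicker answer:
[TiDB environment]

[Overview] Scenario and problem summary

[Background] Operations performed
Took the drainer node (10.16.41.115:8249) offline, deployed a new drainer node on a new machine (10.16.40.242:8249), and then reloaded the cluster configuration file.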

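[Symptoms] Business and database symptoms
`tiup cluster display xxx` shows that the old drainer node is offline and the new node is online.
However, pump keeps failing to start. The pump log shows that it still tries to connect to the drainer node that was taken offline.

[Business impact]
Pump cannot start, so drainer cannot replicate any data.
[TiDB version]
4.0.12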

[Attachments]
image

  • Logs of the relevant modules (covering one hour before and after the problem)
    pump log
    [2021/07/13 23:23:02.815 +08:00] [INFO] [version.go:50] [“Welcome to Pump”] [“Release Version”=v4.0.12] [“Git Commit Hash”=e28b75cac81bea82c2a89ad024d1a37bf3c9bee9] [“Build TS”=“2021-04-02 03:26:43”] [“Go Version”=go1.13] [“Go OS/Arch”=linux/amd64]
    [2021/07/13 23:23:02.815 +08:00] [INFO] [main.go:48] [“start pump…”] [config=“{"log-level":"info","node-id":"10.16.40.229:8250","addr":"http://0.0.0.0:8250","advertise-addr":"http://10.16.40.229:8250","socket":"","pd-urls":"http://10.16.40.229:2379,http://10.16.40.241:2379,http://10.16.41.206:2379","EtcdDialTimeout":5000000000,"data-dir":"/DATA/tidb/pump-8250","heartbeat-interval":2,"gc":"7","log-file":"/opt/tidb/deploy/pump-8250/log/pump.log","security":{"ssl-ca":"","ssl-cert":"","ssl-key":"","cert-allowed-cn":null},"gen-binlog-interval":3,"MetricsAddr":"","MetricsInterval":15,"storage":{"sync-log":null,"kv_chan_cap":0,"slow_write_threshold":0,"kv":null,"stop-write-at-available-space":null}}”]
    [2021/07/13 23:23:02.815 +08:00] [INFO] [client.go:193] [“[pd] create pd client with endpoints”] [pd-address=“[http://10.16.40.229:2379,http://10.16.40.241:2379,http://10.16.41.206:2379]”]
    [2021/07/13 23:23:02.818 +08:00] [INFO] [base_client.go:308] [“[pd] switch leader”] [new-leader=http://10.16.40.229:2379] [old-leader=]
    [2021/07/13 23:23:02.818 +08:00] [INFO] [base_client.go:112] [“[pd] init cluster id”] [cluster-id=6960939055123431353]
    [2021/07/13 23:23:02.818 +08:00] [INFO] [server.go:131] [“get clusterID success”] [clusterID=6960939055123431353]
    [2021/07/13 23:23:02.818 +08:00] [INFO] [client.go:193] [“[pd] create pd client with endpoints”] [pd-address=“[http://10.16.40.229:2379,http://10.16.40.241:2379,http://10.16.41.206:2379]”]
    [2021/07/13 23:23:02.822 +08:00] [INFO] [base_client.go:308] [“[pd] switch leader”] [new-leader=http://10.16.40.229:2379] [old-leader=]
    [2021/07/13 23:23:02.822 +08:00] [INFO] [base_client.go:112] [“[pd] init cluster id”] [cluster-id=6960939055123431353]
    [2021/07/13 23:23:02.823 +08:00] [INFO] [store.go:68] [“new store”] [path=“tikv://10.16.40.229:2379,10.16.40.241:2379,10.16.41.206:2379?disableGC=true”]
    [2021/07/13 23:23:02.823 +08:00] [INFO] [client.go:193] [“[pd] create pd client with endpoints”] [pd-address=“[10.16.40.229:2379,10.16.40.241:2379,10.16.41.206:2379]”]
    [2021/07/13 23:23:02.825 +08:00] [INFO] [base_client.go:308] [“[pd] switch leader”] [new-leader=http://10.16.40.229:2379] [old-leader=]
    [2021/07/13 23:23:02.825 +08:00] [INFO] [base_client.go:112] [“[pd] init cluster id”] [cluster-id=6960939055123431353]
    [2021/07/13 23:23:02.828 +08:00] [INFO] [store.go:74] [“new store with retry success”]
    [2021/07/13 23:23:02.828 +08:00] [INFO] [storage.go:135] [NewAppendWithResolver] [options=“{"ValueLogFileSize":524288000,"Sync":true,"KVChanCapacity":1048576,"SlowWriteThreshold":1,"StopWriteAtAvailableSpace":10737418240,"KVConfig":null}”]
    [2021/07/13 23:23:02.830 +08:00] [INFO] [storage.go:1397] [“open metadata db”] [config=“{"block-cache-capacity":8388608,"block-restart-interval":16,"block-size":4096,"compaction-L0-trigger":8,"compaction-table-size":67108864,"compaction-total-size":536870912,"compaction-total-size-multiplier":8,"write-buffer":67108864,"write-L0-pause-trigger":24,"write-L0-slowdown-trigger":17}”]
    [2021/07/13 23:23:02.839 +08:00] [INFO] [storage.go:217] [“Append info”] [gcTS=0] [maxCommitTS=426295890185551874] [headPointer=“{"Fid":0,"Offset":95466}”] [handlePointer=“{"Fid":0,"Offset":95466}”]
    [2021/07/13 23:23:02.862 +08:00] [INFO] [server.go:438] [“register success”] [NodeID=10.16.40.229:8250]
    [2021/07/13 23:23:02.864 +08:00] [INFO] [node.go:197] [“Start trying to notify drainer”] [addr=10.16.41.115:8249]
    [2021/07/13 23:23:02.864 +08:00] [INFO] [node.go:200] [“Connecting drainer”] [addr=10.16.41.115:8249]
    [2021/07/13 23:23:12.843 +08:00] [INFO] [storage.go:384] [DBStats] [DBStats=“{"WriteDelayCount":0,"WriteDelayDuration":0,"WritePaused":false,"AliveSnapshots":0,"AliveIterators":0,"IOWrite":1423,"IORead":3159,"BlockCacheSize":3037,"OpenedTablesCount":6,"LevelSizes":[2044,36326],"LevelTablesCounts":[6,1],"LevelRead":[0,0],"LevelWrite":[0,0],"LevelDurations":[0,0]}”]
    [2021/07/13 23:23:12.848 +08:00] [INFO] [server.go:567] [“server info tick”] [writeBinlogCount=0] [alivePullerCount=0] [MaxCommitTS=426295896791580674]
    [2021/07/13 23:23:12.865 +08:00] [ERROR] [main.go:75] [“start pump server failed”] [error=“fail to notify all living drainer: connect drainer(10.16.41.115:8249): context deadline exceeded”] [errorVerbose=“context deadline exceeded
    connect drainer(10.16.41.115:8249)
    github.com/pingcap/tidb-binlog/pump.notifyDrainer
    \t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-binlog/pump/node.go:206
    github.com/pingcap/tidb-binlog/pump.(*pumpNode).Notify
    \t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-binlog/pump/node.go:188
    github.com/pingcap/tidb-binlog/pump.(*Server).Start
    \t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-binlog/pump/server.go:443
    main.main
    \t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-binlog/cmd/pump/main.go:74
    runtime.main
    \t/usr/local/go/src/runtime/proc.go:203
    runtime.goexit
    \t/usr/local/go/src/runtime/asm_amd64.s:1357
    fail to notify all living drainer”]
    [2021/07/13 23:23:28.064 +08:00] [INFO] [version.go:50] [“Welcome to Pump”] [“Release Version”=v4.0.12] [“Git Commit Hash”=e28b75cac81bea82c2a89ad024d1a37bf3c9bee9] [“Build TS”=“2021-04-02 03:26:43”] [“Go Version”=go1.13] [“Go OS/Arch”=linux/amd64]
    [2021/07/13 23:23:28.065 +08:00] [INFO] [main.go:48] [“start pump…”] [config=“{"log-level":"info","node-id":"10.16.40.229:8250","addr":"http://0.0.0.0:8250","advertise-addr":"http://10.16.40.229:8250","socket":"","pd-urls":"http://10.16.40.229:2379,http://10.16.40.241:2379,http://10.16.41.206:2379","EtcdDialTimeout":5000000000,"data-dir":"/DATA/tidb/pump-8250","heartbeat-interval":2,"gc":"7","log-file":"/opt/tidb/deploy/pump-8250/log/pump.log","security":{"ssl-ca":"","ssl-cert":"","ssl-key":"","cert-allowed-cn":null},"gen-binlog-interval":3,"MetricsAddr":"","MetricsInterval":15,"storage":{"sync-log":null,"kv_chan_cap":0,"slow_write_threshold":0,"kv":null,"stop-write-at-available-space":null}}”]
    [2021/07/13 23:23:28.065 +08:00] [INFO] [client.go:193] [“[pd] create pd client with endpoints”] [pd-address=“[http://10.16.40.229:2379,http://10.16.40.241:2379,http://10.16.41.206:2379]”]
    [2021/07/13 23:23:28.068 +08:00] [INFO] [base_client.go:308] [“[pd] switch leader”] [new-leader=http://10.16.40.229:2379] [old-leader=]
    [2021/07/13 23:23:28.068 +08:00] [INFO] [base_client.go:112] [“[pd] init cluster id”] [cluster-id=6960939055123431353]
    [2021/07/13 23:23:28.068 +08:00] [INFO] [server.go:131] [“get clusterID success”] [clusterID=6960939055123431353]
    [2021/07/13 23:23:28.068 +08:00] [INFO] [client.go:193] [“[pd] create pd client with endpoints”] [pd-address=“[http://10.16.40.229:2379,http://10.16.40.241:2379,http://10.16.41.206:2379]”]
    [2021/07/13 23:23:28.070 +08:00] [INFO] [base_client.go:308] [“[pd] switch leader”] [new-leader=http://10.16.40.229:2379] [old-leader=]
    [2021/07/13 23:23:28.070 +08:00] [INFO] [base_client.go:112] [“[pd] init cluster id”] [cluster-id=6960939055123431353]
    [2021/07/13 23:23:28.071 +08:00] [INFO] [store.go:68] [“new store”] [path=“tikv://10.16.40.229:2379,10.16.40.241:2379,10.16.41.206:2379?disableGC=true”]
    [2021/07/13 23:23:28.071 +08:00] [INFO] [client.go:193] [“[pd] create pd client with endpoints”] [pd-address=“[10.16.40.229:2379,10.16.40.241:2379,10.16.41.206:2379]”]
    [2021/07/13 23:23:28.073 +08:00] [INFO] [base_client.go:308] [“[pd] switch leader”] [new-leader=http://10.16.40.229:2379] [old-leader=]
    [2021/07/13 23:23:28.073 +08:00] [INFO] [base_client.go:112] [“[pd] init cluster id”] [cluster-id=6960939055123431353]
    [2021/07/13 23:23:28.075 +08:00] [INFO] [store.go:74] [“new store with retry success”]
    [2021/07/13 23:23:28.075 +08:00] [INFO] [storage.go:135] [NewAppendWithResolver] [options=“{"ValueLogFileSize":524288000,"Sync":true,"KVChanCapacity":1048576,"SlowWriteThreshold":1,"StopWriteAtAvailableSpace":10737418240,"KVConfig":null}”]
    [2021/07/13 23:23:28.077 +08:00] [INFO] [storage.go:1397] [“open metadata db”] [config=“{"block-cache-capacity":8388608,"block-restart-interval":16,"block-size":4096,"compaction-L0-trigger":8,"compaction-table-size":67108864,"compaction-total-size":536870912,"compaction-total-size-multiplier":8,"write-buffer":67108864,"write-L0-pause-trigger":24,"write-L0-slowdown-trigger":17}”]
    [2021/07/13 23:23:28.085 +08:00] [INFO] [storage.go:217] [“Append info”] [gcTS=0] [maxCommitTS=426295896791580674] [headPointer=“{"Fid":0,"Offset":95592}”] [handlePointer=“{"Fid":0,"Offset":95592}”]
    [2021/07/13 23:23:28.096 +08:00] [INFO] [server.go:438] [“register success”] [NodeID=10.16.40.229:8250]
    [2021/07/13 23:23:28.097 +08:00] [INFO] [node.go:197] [“Start trying to notify drainer”] [addr=10.16.40.242:8249]
    [2021/07/13 23:23:28.097 +08:00] [INFO] [node.go:200] [“Connecting drainer”] [addr=10.16.40.242:8249]
    [2021/07/13 23:23:28.097 +08:00] [INFO] [node.go:213] [“Notifying drainer”] [addr=10.16.40.242:8249]
    [2021/07/13 23:23:28.100 +08:00] [INFO] [node.go:197] [“Start trying to notify drainer”] [addr=10.16.41.115:8249]
    [2021/07/13 23:23:28.100 +08:00] [INFO] [node.go:200] [“Connecting drainer”] [addr=10.16.41.115:8249]
    [2021/07/13 23:23:38.088 +08:00] [INFO] [storage.go:384] [DBStats] [DBStats=“{"WriteDelayCount":0,"WriteDelayDuration":0,"WritePaused":false,"AliveSnapshots":0,"AliveIterators":0,"IOWrite":1482,"IORead":3555,"BlockCacheSize":3870,"OpenedTablesCount":7,"LevelSizes":[2384,36326],"LevelTablesCounts":[7,1],"LevelRead":[0,0],"LevelWrite":[0,0],"LevelDurations":[0,0]}”]
    [2021/07/13 23:23:38.092 +08:00] [INFO] [server.go:567] [“server info tick”] [writeBinlogCount=0] [alivePullerCount=0] [MaxCommitTS=426295903410716673]
    [2021/07/13 23:23:38.098 +08:00] [ERROR] [main.go:75] [“start pump server failed”] [error=“fail to notify all living drainer: connect drainer(10.16.41.115:8249): context deadline exceeded”] [errorVerbose=“context deadline exceeded
    connect drainer(10.16.41.115:8249)
    github.com/pingcap/tidb-binlog/pump.notifyDrainer
    \t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-binlog/pump/node.go:206
    github.com/pingcap/tidb-binlog/pump.(*pumpNode).Notify
    \t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-binlog/pump/node.go:188
    github.com/pingcap/tidb-binlog/pump.(*Server).Start
    \t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-binlog/pump/server.go:443
    main.main
    \t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-binlog/cmd/pump/main.go:74
    runtime.main
    \t/usr/local/go/src/runtime/proc.go:203
    runtime.goexit
    \t/usr/local/go/src/runtime/asm_amd64.s:1357
    fail to notify all living drainer”]
    [2021/07/13 23:23:53.315 +08:00] [INFO] [version.go:50] [“Welcome to Pump”] [“Release Version”=v4.0.12] [“Git Commit Hash”=e28b75cac81bea82c2a89ad024d1a37bf3c9bee9] [“Build TS”=“2021-04-02 03:26:43”] [“Go Version”=go1.13] [“Go OS/Arch”=linux/amd64]
    [2021/07/13 23:23:53.315 +08:00] [INFO] [main.go:48] [“start pump…”] [config=“{"log-level":"info","node-id":"10.16.40.229:8250","addr":"http://0.0.0.0:8250","advertise-addr":"http://10.16.40.229:8250","socket":"","pd-urls":"http://10.16.40.229:2379,http://10.16.40.241:2379,http://10.16.41.206:2379","EtcdDialTimeout":5000000000,"data-dir":"/DATA/tidb/pump-8250","heartbeat-interval":2,"gc":"7","log-file":"/opt/tidb/deploy/pump-8250/log/pump.log","security":{"ssl-ca":"","ssl-cert":"","ssl-key":"","cert-allowed-cn":null},"gen-binlog-interval":3,"MetricsAddr":"","MetricsInterval":15,"storage":{"sync-log":null,"kv_chan_cap":0,"slow_write_threshold":0,"kv":null,"stop-write-at-available-space":null}}”]
    [2021/07/13 23:23:53.315 +08:00] [INFO] [client.go:193] [“[pd] create pd client with endpoints”] [pd-address=“[http://10.16.40.229:2379,http://10.16.40.241:2379,http://10.16.41.206:2379]”]
    [2021/07/13 23:23:53.318 +08:00] [INFO] [base_client.go:308] [“[pd] switch leader”] [new-leader=http://10.16.40.229:2379] [old-leader=]
    [2021/07/13 23:23:53.318 +08:00] [INFO] [base_client.go:112] [“[pd] init cluster id”] [cluster-id=6960939055123431353]
    [2021/07/13 23:23:53.318 +08:00] [INFO] [server.go:131] [“get clusterID success”] [clusterID=6960939055123431353]
    [2021/07/13 23:23:53.318 +08:00] [INFO] [client.go:193] [“[pd] create pd client with endpoints”] [pd-address=“[http://10.16.40.229:2379,http://10.16.40.241:2379,http://10.16.41.206:2379]”]
    [2021/07/13 23:23:53.321 +08:00] [INFO] [base_client.go:308] [“[pd] switch leader”] [new-leader=http://10.16.40.229:2379] [old-leader=]
    [2021/07/13 23:23:53.321 +08:00] [INFO] [base_client.go:112] [“[pd] init cluster id”] [cluster-id=6960939055123431353]
    [2021/07/13 23:23:53.321 +08:00] [INFO] [store.go:68] [“new store”] [path=“tikv://10.16.40.229:2379,10.16.40.241:2379,10.16.41.206:2379?disableGC=true”]
    [2021/07/13 23:23:53.321 +08:00] [INFO] [client.go:193] [“[pd] create pd client with endpoints”] [pd-address=“[10.16.40.229:2379,10.16.40.241:2379,10.16.41.206:2379]”]
    [2021/07/13 23:23:53.323 +08:00] [INFO] [base_client.go:308] [“[pd] switch leader”] [new-leader=http://10.16.40.229:2379] [old-leader=]
    [2021/07/13 23:23:53.323 +08:00] [INFO] [base_client.go:112] [“[pd] init cluster id”] [cluster-id=6960939055123431353]
    [2021/07/13 23:23:53.325 +08:00] [INFO] [store.go:74] [“new store with retry success”]
    [2021/07/13 23:23:53.325 +08:00] [INFO] [storage.go:135] [NewAppendWithResolver] [options=“{"ValueLogFileSize":524288000,"Sync":true,"KVChanCapacity":1048576,"SlowWriteThreshold":1,"StopWriteAtAvailableSpace":10737418240,"KVConfig":null}”]
    [2021/07/13 23:23:53.327 +08:00] [INFO] [storage.go:1397] [“open metadata db”] [config=“{"block-cache-capacity":8388608,"block-restart-interval":16,"block-size":4096,"compaction-L0-trigger":8,"compaction-table-size":67108864,"compaction-total-size":536870912,"compaction-total-size-multiplier":8,"write-buffer":67108864,"write-L0-pause-trigger":24,"write-L0-slowdown-trigger":17}”]
    [2021/07/13 23:23:53.335 +08:00] [INFO] [storage.go:217] [“Append info”] [gcTS=0] [maxCommitTS=426295903410716673] [headPointer=“{"Fid":0,"Offset":95718}”] [handlePointer=“{"Fid":0,"Offset":95718}”]
    [2021/07/13 23:23:53.348 +08:00] [INFO] [server.go:438] [“register success”] [NodeID=10.16.40.229:8250]
    [2021/07/13 23:23:53.349 +08:00] [INFO] [node.go:197] [“Start trying to notify drainer”] [addr=10.16.40.242:8249]
    [2021/07/13 23:23:53.349 +08:00] [INFO] [node.go:200] [“Connecting drainer”] [addr=10.16.40.242:8249]
    [2021/07/13 23:23:53.349 +08:00] [INFO] [node.go:213] [“Notifying drainer”] [addr=10.16.40.242:8249]
    [2021/07/13 23:23:53.352 +08:00] [INFO] [node.go:197] [“Start trying to notify drainer”] [addr=10.16.41.115:8249]
    [2021/07/13 23:23:53.352 +08:00] [INFO] [node.go:200] [“Connecting drainer”] [addr=10.16.41.115:8249]
    [2021/07/13 23:24:03.339 +08:00] [INFO] [storage.go:384] [DBStats] [DBStats=“{"WriteDelayCount":0,"WriteDelayDuration":0,"WritePaused":false,"AliveSnapshots":0,"AliveIterators":0,"IOWrite":38347,"IORead":40278,"BlockCacheSize":4704,"OpenedTablesCount":1,"LevelSizes":[0,36683],"LevelTablesCounts":[0,1],"LevelRead":[0,39051],"LevelWrite":[0,36683],"LevelDurations":[0,7837989]}”]
    [2021/07/13 23:24:03.345 +08:00] [INFO] [server.go:567] [“server info tick”] [writeBinlogCount=0] [alivePullerCount=0] [MaxCommitTS=426295910029852674]
    [2021/07/13 23:24:03.351 +08:00] [ERROR] [main.go:75] [“start pump server failed”] [error=“fail to notify all living drainer: connect drainer(10.16.41.115:8249): context deadline exceeded”] [errorVerbose=“context deadline exceeded
    connect drainer(10.16.41.115:8249)
    github.com/pingcap/tidb-binlog/pump.notifyDrainer
    \t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-binlog/pump/node.go:206
    github.com/pingcap/tidb-binlog/pump.(*pumpNode).Notify
    \t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-binlog/pump/node.go:188
    github.com/pingcap/tidb-binlog/pump.(*Server).Start
    \t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-binlog/pump/server.go:443
    main.main
    \t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-binlog/cmd/pump/main.go:74
    runtime.main
    \t/usr/local/go/src/runtime/proc.go:203
    runtime.goexit
    \t/usr/local/go/src/runtime/asm_amd64.s:1357
    fail to notify all living drainer”]
    [2021/07/13 23:24:18.565 +08:00] [INFO] [version.go:50] [“Welcome to Pump”] [“Release Version”=v4.0.12] [“Git Commit Hash”=e28b75cac81bea82c2a89ad024d1a37bf3c9bee9] [“Build TS”=“2021-04-02 03:26:43”] [“Go Version”=go1.13] [“Go OS/Arch”=linux/amd64]
    [2021/07/13 23:24:18.565 +08:00] [INFO] [main.go:48] [“start pump…”] [config=“{"log-level":"info","node-id":"10.16.40.229:8250","addr":"http://0.0.0.0:8250","advertise-addr":"http://10.16.40.229:8250","socket":"","pd-urls":"http://10.16.40.229:2379,http://10.16.40.241:2379,http://10.16.41.206:2379","EtcdDialTimeout":5000000000,"data-dir":"/DATA/tidb/pump-8250","heartbeat-interval":2,"gc":"7","log-file":"/opt/tidb/deploy/pump-8250/log/pump.log","security":{"ssl-ca":"","ssl-cert":"","ssl-key":"","cert-allowed-cn":null},"gen-binlog-interval":3,"MetricsAddr":"","MetricsInterval":15,"storage":{"sync-log":null,"kv_chan_cap":0,"slow_write_threshold":0,"kv":null,"stop-write-at-available-space":null}}”]
    [2021/07/13 23:24:18.565 +08:00] [INFO] [client.go:193] [“[pd] create pd client with endpoints”] [pd-address=“[http://10.16.40.229:2379,http://10.16.40.241:2379,http://10.16.41.206:2379]”]
    [2021/07/13 23:24:18.568 +08:00] [INFO] [base_client.go:308] [“[pd] switch leader”] [new-leader=http://10.16.40.229:2379] [old-leader=]
    [2021/07/13 23:24:18.568 +08:00] [INFO] [base_client.go:112] [“[pd] init cluster id”] [cluster-id=6960939055123431353]
    [2021/07/13 23:24:18.568 +08:00] [INFO] [server.go:131] [“get clusterID success”] [clusterID=6960939055123431353]
    [2021/07/13 23:24:18.568 +08:00] [INFO] [client.go:193] [“[pd] create pd client with endpoints”] [pd-address=“[http://10.16.40.229:2379,http://10.16.40.241:2379,http://10.16.41.206:2379]”]
    [2021/07/13 23:24:18.571 +08:00] [INFO] [base_client.go:308] [“[pd] switch leader”] [new-leader=http://10.16.40.229:2379] [old-leader=]
    [2021/07/13 23:24:18.571 +08:00] [INFO] [base_client.go:112] [“[pd] init cluster id”] [cluster-id=6960939055123431353]
    [2021/07/13 23:24:18.572 +08:00] [INFO] [store.go:68] [“new store”] [path=“tikv://10.16.40.229:2379,10.16.40.241:2379,10.16.41.206:2379?disableGC=true”]
    [2021/07/13 23:24:18.572 +08:00] [INFO] [client.go:193] [“[pd] create pd client with endpoints”] [pd-address=“[10.16.40.229:2379,10.16.40.241:2379,10.16.41.206:2379]”]
    [2021/07/13 23:24:18.574 +08:00] [INFO] [base_client.go:308] [“[pd] switch leader”] [new-leader=http://10.16.40.229:2379] [old-leader=]
    [2021/07/13 23:24:18.574 +08:00] [INFO] [base_client.go:112] [“[pd] init cluster id”] [cluster-id=6960939055123431353]
    [2021/07/13 23:24:18.576 +08:00] [INFO] [store.go:74] [“new store with retry success”]
    [2021/07/13 23:24:18.577 +08:00] [INFO] [storage.go:135] [NewAppendWithResolver] [options=“{"ValueLogFileSize":524288000,"Sync":true,"KVChanCapacity":1048576,"SlowWriteThreshold":1,"StopWriteAtAvailableSpace":10737418240,"KVConfig":null}”]
    [2021/07/13 23:24:18.579 +08:00] [INFO] [storage.go:1397] [“open metadata db”] [config=“{"block-cache-capacity":8388608,"block-restart-interval":16,"block-size":4096,"compaction-L0-trigger":8,"compaction-table-size":67108864,"compaction-total-size":536870912,"compaction-total-size-multiplier":8,"write-buffer":67108864,"write-L0-pause-trigger":24,"write-L0-slowdown-trigger":17}”]
    [2021/07/13 23:24:18.587 +08:00] [INFO] [storage.go:217] [“Append info”] [gcTS=0] [maxCommitTS=426295910029852674] [headPointer=“{"Fid":0,"Offset":95844}”] [handlePointer=“{"Fid":0,"Offset":95844}”]
    [2021/07/13 23:24:18.598 +08:00] [INFO] [server.go:438] [“register success”] [NodeID=10.16.40.229:8250]
    [2021/07/13 23:24:18.599 +08:00] [INFO] [node.go:197] [“Start trying to notify drainer”] [addr=10.16.40.242:8249]
    [2021/07/13 23:24:18.599 +08:00] [INFO] [node.go:200] [“Connecting drainer”] [addr=10.16.40.242:8249]
    [2021/07/13 23:24:18.600 +08:00] [INFO] [node.go:213] [“Notifying drainer”] [addr=10.16.40.242:8249]
    [2021/07/13 23:24:18.602 +08:00] [INFO] [node.go:197] [“Start trying to notify drainer”] [addr=10.16.41.115:8249]
    [2021/07/13 23:24:18.602 +08:00] [INFO] [node.go:200] [“Connecting drainer”] [addr=10.16.41.115:8249]
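The failing address can be picked out of a pump log like the one above with a small script. This is only a minimal sketch: it assumes that a "Connecting drainer" line with no later "Notifying drainer" line for the same address means the connection never succeeded, which matches the excerpts here but is not a documented contract. The function name and the sample lines (shortened from the log above) are illustrative.

```python
import re

# Matches the [addr=host:port] field on pump's drainer-notification log lines.
ADDR_FIELD = re.compile(r"\[addr=([^\]]+)\]")


def unreachable_drainers(log_lines):
    """Return addresses pump tried to connect to but never reached the notify step for."""
    connecting = []   # addresses seen in "Connecting drainer", in first-seen order
    notified = set()  # addresses seen in "Notifying drainer"
    for line in log_lines:
        match = ADDR_FIELD.search(line)
        if match is None:
            continue
        addr = match.group(1)
        if "Connecting drainer" in line:
            if addr not in connecting:
                connecting.append(addr)
        elif "Notifying drainer" in line:
            notified.add(addr)
    return [addr for addr in connecting if addr not in notified]


# Shortened sample lines taken from one pump start attempt in the log above.
sample = [
    "[INFO] [node.go:200] [Connecting drainer] [addr=10.16.40.242:8249]",
    "[INFO] [node.go:213] [Notifying drainer] [addr=10.16.40.242:8249]",
    "[INFO] [node.go:197] [Start trying to notify drainer] [addr=10.16.41.115:8249]",
    "[INFO] [node.go:200] [Connecting drainer] [addr=10.16.41.115:8249]",
]
print(unreachable_drainers(sample))
```

On the sample, only the old drainer address 10.16.41.115:8249 is reported, which is the node named in the "start pump server failed" error.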

drainer log
[2021/07/13 23:29:38.831 +08:00] [INFO] [merge.go:222] [“merger add source”] [“source id”=10.16.41.206:8250]
[2021/07/13 23:29:38.831 +08:00] [INFO] [pump.go:138] [“pump create pull binlogs client”] [id=10.16.41.206:8250]
[2021/07/13 23:29:44.742 +08:00] [INFO] [merge.go:222] [“merger add source”] [“source id”=10.16.40.241:8250]
[2021/07/13 23:29:44.743 +08:00] [INFO] [pump.go:138] [“pump create pull binlogs client”] [id=10.16.40.241:8250]
[2021/07/13 23:29:46.836 +08:00] [INFO] [merge.go:222] [“merger add source”] [“source id”=10.16.40.229:8250]
[2021/07/13 23:29:46.836 +08:00] [INFO] [pump.go:138] [“pump create pull binlogs client”] [id=10.16.40.229:8250]
[2021/07/13 23:29:48.837 +08:00] [ERROR] [pump.go:234] [“pump create PullBinlogs client failed”] [id=10.16.41.206:8250] [error=“rpc error: code = Unavailable desc = connection closed”]
[2021/07/13 23:29:48.837 +08:00] [ERROR] [pump.go:140] [“pump create pull binlogs client failed”] [id=10.16.41.206:8250] [error=“rpc error: code = Unavailable desc = connection closed”] [errorVerbose=“rpc error: code = Unavailable desc = connection closed
github.com/pingcap/errors.AddStack
\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/errors.go:174
github.com/pingcap/errors.Trace
\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/juju_adaptor.go:15
github.com/pingcap/tidb-binlog/drainer.(*Pump).createPullBinlogsClient
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-binlog/drainer/pump.go:238
github.com/pingcap/tidb-binlog/drainer.(*Pump).PullBinlog.func1
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-binlog/drainer/pump.go:139
runtime.goexit
\t/usr/local/go/src/runtime/asm_amd64.s:1357”]
[2021/07/13 23:29:49.838 +08:00] [INFO] [pump.go:138] [“pump create pull binlogs client”] [id=10.16.41.206:8250]
[2021/07/13 23:29:49.838 +08:00] [ERROR] [pump.go:234] [“pump create PullBinlogs client failed”] [id=10.16.41.206:8250] [error=“rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 10.16.41.206:8250: connect: connection refused"”]
[2021/07/13 23:29:49.838 +08:00] [ERROR] [pump.go:140] [“pump create pull binlogs client failed”] [id=10.16.41.206:8250] [error=“rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 10.16.41.206:8250: connect: connection refused"”] [errorVerbose=“rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 10.16.41.206:8250: connect: connection refused"
github.com/pingcap/errors.AddStack
\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/errors.go:174
github.com/pingcap/errors.Trace
\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/juju_adaptor.go:15
github.com/pingcap/tidb-binlog/drainer.(*Pump).createPullBinlogsClient
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-binlog/drainer/pump.go:238
github.com/pingcap/tidb-binlog/drainer.(*Pump).PullBinlog.func1
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-binlog/drainer/pump.go:139
runtime.goexit
\t/usr/local/go/src/runtime/asm_amd64.s:1357”]
[2021/07/13 23:29:50.838 +08:00] [INFO] [pump.go:138] [“pump create pull binlogs client”] [id=10.16.41.206:8250]
[2021/07/13 23:29:50.839 +08:00] [ERROR] [pump.go:234] [“pump create PullBinlogs client failed”] [id=10.16.41.206:8250] [error=“rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 10.16.41.206:8250: connect: connection refused"”]
[2021/07/13 23:29:50.839 +08:00] [ERROR] [pump.go:140] [“pump create pull binlogs client failed”] [id=10.16.41.206:8250] [error=“rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 10.16.41.206:8250: connect: connection refused"”] [errorVerbose=“rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 10.16.41.206:8250: connect: connection refused"
github.com/pingcap/errors.AddStack
\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/errors.go:174
github.com/pingcap/errors.Trace
\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/juju_adaptor.go:15
github.com/pingcap/tidb-binlog/drainer.(*Pump).createPullBinlogsClient
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-binlog/drainer/pump.go:238
github.com/pingcap/tidb-binlog/drainer.(*Pump).PullBinlog.func1
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-binlog/drainer/pump.go:139
runtime.goexit
\t/usr/local/go/src/runtime/asm_amd64.s:1357”]
[2021/07/13 23:29:51.837 +08:00] [INFO] [merge.go:231] [“merger remove source”] [“source id”=10.16.41.206:8250]
[2021/07/13 23:29:51.838 +08:00] [INFO] [pump.go:77] [“pump is closing”] [id=10.16.41.206:8250]
[2021/07/13 23:29:51.838 +08:00] [INFO] [collector.go:354] [“node of cluster has been removed and release the connection to it”] [nodeID=10.16.41.206:8250] [clusterID=6960939055123431353]
[2021/07/13 23:29:51.839 +08:00] [WARN] [merge.go:284] [“can’t read binlog from pump”] [“source id”=10.16.41.206:8250]
[2021/07/13 23:29:54.749 +08:00] [ERROR] [pump.go:234] [“pump create PullBinlogs client failed”] [id=10.16.40.241:8250] [error=“rpc error: code = Unavailable desc = connection closed”]
[2021/07/13 23:29:54.749 +08:00] [ERROR] [pump.go:140] [“pump create pull binlogs client failed”] [id=10.16.40.241:8250] [error=“rpc error: code = Unavailable desc = connection closed”] [errorVerbose=“rpc error: code = Unavailable desc = connection closed
github.com/pingcap/errors.AddStack
\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/errors.go:174
github.com/pingcap/errors.Trace
\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/juju_adaptor.go:15
github.com/pingcap/tidb-binlog/drainer.(*Pump).createPullBinlogsClient
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-binlog/drainer/pump.go:238
github.com/pingcap/tidb-binlog/drainer.(*Pump).PullBinlog.func1
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-binlog/drainer/pump.go:139
runtime.goexit
\t/usr/local/go/src/runtime/asm_amd64.s:1357”]
[2021/07/13 23:29:55.749 +08:00] [INFO] [pump.go:138] [“pump create pull binlogs client”] [id=10.16.40.241:8250]
[2021/07/13 23:29:55.750 +08:00] [ERROR] [pump.go:234] [“pump create PullBinlogs client failed”] [id=10.16.40.241:8250] [error=“rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 10.16.40.241:8250: connect: connection refused"”]
[2021/07/13 23:29:55.750 +08:00] [ERROR] [pump.go:140] [“pump create pull binlogs client failed”] [id=10.16.40.241:8250] [error=“rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 10.16.40.241:8250: connect: connection refused"”] [errorVerbose=“rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 10.16.40.241:8250: connect: connection refused"
github.com/pingcap/errors.AddStack
\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/errors.go:174
github.com/pingcap/errors.Trace
\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/juju_adaptor.go:15
github.com/pingcap/tidb-binlog/drainer.(*Pump).createPullBinlogsClient
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-binlog/drainer/pump.go:238
github.com/pingcap/tidb-binlog/drainer.(*Pump).PullBinlog.func1
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-binlog/drainer/pump.go:139
runtime.goexit
\t/usr/local/go/src/runtime/asm_amd64.s:1357”]
[2021/07/13 23:29:56.750 +08:00] [INFO] [pump.go:138] [“pump create pull binlogs client”] [id=10.16.40.241:8250]
[2021/07/13 23:29:56.750 +08:00] [ERROR] [pump.go:234] [“pump create PullBinlogs client failed”] [id=10.16.40.241:8250] [error=“rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 10.16.40.241:8250: connect: connection refused"”]
[2021/07/13 23:29:56.750 +08:00] [ERROR] [pump.go:140] [“pump create pull binlogs client failed”] [id=10.16.40.241:8250] [error=“rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 10.16.40.241:8250: connect: connection refused"”] [errorVerbose=“rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 10.16.40.241:8250: connect: connection refused"
github.com/pingcap/errors.AddStack
\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/errors.go:174
github.com/pingcap/errors.Trace
\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/juju_adaptor.go:15
github.com/pingcap/tidb-binlog/drainer.(*Pump).createPullBinlogsClient
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-binlog/drainer/pump.go:238
github.com/pingcap/tidb-binlog/drainer.(*Pump).PullBinlog.func1
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-binlog/drainer/pump.go:139
runtime.goexit
\t/usr/local/go/src/runtime/asm_amd64.s:1357”]
[2021/07/13 23:29:56.839 +08:00] [INFO] [merge.go:231] [“merger remove source”] [“source id”=10.16.40.229:8250]
[2021/07/13 23:29:56.839 +08:00] [INFO] [pump.go:77] [“pump is closing”] [id=10.16.40.229:8250]
[2021/07/13 23:29:56.839 +08:00] [INFO] [collector.go:354] [“node of cluster has been removed and release the connection to it”] [nodeID=10.16.40.229:8250] [clusterID=6960939055123431353]
[2021/07/13 23:29:56.839 +08:00] [INFO] [merge.go:231] [“merger remove source”] [“source id”=10.16.40.241:8250]
[2021/07/13 23:29:56.839 +08:00] [INFO] [pump.go:77] [“pump is closing”] [id=10.16.40.241:8250]
[2021/07/13 23:29:56.839 +08:00] [INFO] [collector.go:354] [“node of cluster has been removed and release the connection to it”] [nodeID=10.16.40.241:8250] [clusterID=6960939055123431353]
[2021/07/13 23:29:56.843 +08:00] [ERROR] [pump.go:234] [“pump create PullBinlogs client failed”] [id=10.16.40.229:8250] [error=“rpc error: code = Unavailable desc = connection closed”]
[2021/07/13 23:29:56.843 +08:00] [ERROR] [pump.go:140] [“pump create pull binlogs client failed”] [id=10.16.40.229:8250] [error=“rpc error: code = Unavailable desc = connection closed”] [errorVerbose=“rpc error: code = Unavailable desc = connection closed
github.com/pingcap/errors.AddStack
\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/errors.go:174
github.com/pingcap/errors.Trace
\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/juju_adaptor.go:15
github.com/pingcap/tidb-binlog/drainer.(*Pump).createPullBinlogsClient
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-binlog/drainer/pump.go:238
github.com/pingcap/tidb-binlog/drainer.(*Pump).PullBinlog.func1
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-binlog/drainer/pump.go:139
runtime.goexit
\t/usr/local/go/src/runtime/asm_amd64.s:1357”]
[2021/07/13 23:29:57.843 +08:00] [WARN] [merge.go:284] [“can’t read binlog from pump”] [“source id”=10.16.40.229:8250]
[2021/07/13 23:29:57.843 +08:00] [WARN] [merge.go:284] [“can’t read binlog from pump”] [“source id”=10.16.40.241:8250]


Please restart all pump nodes and the drainer node, and check whether things return to normal.

After restarting pump and drainer, pump still tries to connect to the old drainer.



Please confirm how you took the drainer offline. You can run the following statement to check the current drainer status.

SHOW DRAINER STATUS;

I stopped the drainer process directly on the old drainer node and then updated the cluster configuration file.
image

Your screenshot above shows that the old drainer was not taken offline successfully; it is still in the online state.
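The check being described can be sketched as follows: compare the `SHOW DRAINER STATUS` rows against the drainer addresses that are actually deployed, and flag any node that is still `online` but no longer part of the deployment. This is only an illustration. The rows below are hypothetical sample data (the real screenshot is not reproduced in the thread), and the column names `NodeID`, `Address` and `State` are assumed to match the statement's output.

```python
def stale_online_drainers(status_rows, deployed_addrs):
    """Return NodeIDs that are reported online but whose address is no longer deployed."""
    deployed = set(deployed_addrs)
    return [
        row["NodeID"]
        for row in status_rows
        if row["State"] == "online" and row["Address"] not in deployed
    ]


# Hypothetical rows shaped like SHOW DRAINER STATUS output (assumed column names).
rows = [
    {"NodeID": "10.16.41.115:8249", "Address": "10.16.41.115:8249", "State": "online"},
    {"NodeID": "10.16.40.242:8249", "Address": "10.16.40.242:8249", "State": "online"},
]

# Only the new drainer is part of the current topology.
print(stale_online_drainers(rows, ["10.16.40.242:8249"]))
```

Any NodeID this returns is a drainer that pump will still treat as living and try to notify at startup, which is what the "fail to notify all living drainer" error in the pump log reflects.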

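Also, the way you took the drainer offline is incorrect. Either use TiUP scale-in so that TiUP takes the node offline, or take the drainer offline manually with binlogctl. See the official documentation for details.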
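For reference, the two routes mentioned here correspond roughly to the command lines built below. This is a hedged sketch and only prints the commands; it does not run them. The cluster name `xxx` is the placeholder used earlier in this thread, the PD URL is taken from the pump log, and using `ip:port` as the drainer node ID is an assumption based on how the pump node ID appears in that log. The flags are written as I recall them from the TiUP and TiDB Binlog documentation, so verify them against the docs for your version before use.

```python
import shlex


def scale_in_command(cluster_name, node):
    """TiUP route: let TiUP take the drainer offline and remove it from the topology."""
    return ["tiup", "cluster", "scale-in", cluster_name, "--node", node]


def offline_drainer_command(pd_urls, node_id):
    """binlogctl route: ask a running drainer to go offline gracefully."""
    return ["binlogctl", f"-pd-urls={pd_urls}", "-cmd", "offline-drainer", "-node-id", node_id]


def update_drainer_state_command(pd_urls, node_id, state):
    """binlogctl route for abnormal cases: overwrite the state stored in PD directly."""
    return ["binlogctl", f"-pd-urls={pd_urls}", "-cmd", "update-drainer",
            "-node-id", node_id, "-state", state]


PD_URLS = "http://10.16.40.229:2379"  # from the pump log in this thread
OLD_DRAINER = "10.16.41.115:8249"     # assumed node ID format: ip:port

for argv in (
    scale_in_command("xxx", OLD_DRAINER),
    offline_drainer_command(PD_URLS, OLD_DRAINER),
    update_drainer_state_command(PD_URLS, OLD_DRAINER, "offline"),
):
    print(shlex.join(argv))
```

`offline-drainer` needs the drainer process to be reachable. When the process has already been stopped by hand, as in this thread, the state left in PD stays `online`, and `update-drainer` is the variant intended for correcting it.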

I have now taken the drainer offline correctly by following the documentation, and pump starts normally. Thanks!
