Drainer crashes and cannot start

【TiDB Environment】Production
【TiDB Version】2.1.2
【Problem: Symptoms and Impact】
After recently enabling the binlog component for the cluster via ansible, drainer keeps restarting with the error log below. Following the documentation, I have already set these Kafka broker parameters:
message.max.bytes=1073741824
replica.fetch.max.bytes=1073741824
fetch.message.max.bytes=1073741824
But after a while, drainer crashes again. Kafka itself shows no anomalies and no OOM. Is there a way to control the maximum message size on the producer side?
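On the producer side, the limit lives in drainer's downstream Kafka settings rather than in the broker parameters above. A sketch of the relevant section of drainer.toml follows; the key names are taken from the tidb-binlog config template, and whether each one (in particular `kafka-max-message-size`) exists in v2.1.2 should be verified against the template shipped with your release:

```toml
# drainer.toml -- downstream Kafka settings (sketch; verify key names
# against the drainer.toml template in your tidb-binlog release)
[syncer]
db-type = "kafka"

[syncer.to]
kafka-addrs = "172.23.5.156:9092"
kafka-version = "0.8.2.0"
# number of messages batched per broadcast
kafka-max-messages = 1024
# upper bound on a single produced message; available in newer releases only
# kafka-max-message-size = 1073741824
```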

【Resource Configuration】1 drainer instance (8c16g); 3 pump instances, each 8c16g
【Attachments: Screenshots/Logs/Monitoring】
Log:
2023/03/28 11:12:01 server.go:97: [info] clusterID of drainer server is 6575739456418311228
2023/03/28 11:12:01 checkpoint.go:49: [info] initialize kafka type checkpoint binlog commitTS = 440393664168984615 with config &{Db:0xc0001472c0 Schema: Table: ClusterID:6575739456418311228 InitialCommitTS:440338529740390402 CheckPointFile:/data1/deploy/data.drainer/savepoint}
2023/03/28 11:12:01 server.go:291: [info] register success, this drainer's node id is tidb-drainer-kafka:8249
2023/03/28 11:12:02 server.go:342: [info] start to server request on http://172.23.5.156:8249
2023/03/28 11:12:03 merge.go:208: [info] merger add source tidb-pump-01:8250
2023/03/28 11:12:03 merge.go:208: [info] merger add source tidb-pump-03:8250
2023/03/28 11:12:03 merge.go:208: [info] merger add source tidb-pump-02:8250
2023/03/28 11:12:03 pump.go:115: [info] [pump tidb-pump-02:8250] create pull binlogs client
2023/03/28 11:12:03 pump.go:115: [info] [pump tidb-pump-03:8250] create pull binlogs client
2023/03/28 11:12:03 pump.go:115: [info] [pump tidb-pump-01:8250] create pull binlogs client
2023/03/28 11:12:03 client.go:120: [sarama] Initializing new client
2023/03/28 11:12:03 config.go:361: [sarama] Producer.MaxMessageBytes must be smaller than MaxRequestSize; it will be ignored.
2023/03/28 11:12:03 config.go:382: [sarama] ClientID is the default of 'sarama', you should consider setting it to something application-specific.
2023/03/28 11:12:03 client.go:167: [sarama] Successfully initialized new client
2023/03/28 11:12:03 config.go:361: [sarama] Producer.MaxMessageBytes must be smaller than MaxRequestSize; it will be ignored.
2023/03/28 11:12:03 config.go:382: [sarama] ClientID is the default of 'sarama', you should consider setting it to something application-specific.
2023/03/28 11:12:03 client.go:699: [sarama] client/metadata fetching metadata for [tidb] from broker 172.23.5.156:9092
2023/03/28 11:12:03 broker.go:148: [sarama] Connected to broker at 172.23.5.156:9092 (unregistered)
2023/03/28 11:12:33 client.go:726: [sarama] client/metadata got error from broker while fetching metadata: read tcp 172.23.5.156:34540->172.23.5.156:9092: i/o timeout
2023/03/28 11:12:33 broker.go:191: [sarama] Closed connection to broker 172.23.5.156:9092
2023/03/28 11:12:33 client.go:732: [sarama] client/metadata no available broker to send metadata request to
2023/03/28 11:12:33 client.go:508: [sarama] client/brokers resurrecting 1 dead seed brokers
2023/03/28 11:12:33 client.go:690: [sarama] client/metadata retrying after 500ms... (10000 attempts remaining)
2023/03/28 11:12:33 config.go:361: [sarama] Producer.MaxMessageBytes must be smaller than MaxRequestSize; it will be ignored.
2023/03/28 11:12:33 config.go:382: [sarama] ClientID is the default of 'sarama', you should consider setting it to something application-specific.
2023/03/28 11:12:33 client.go:699: [sarama] client/metadata fetching metadata for [tidb] from broker 172.23.5.156:9092
2023/03/28 11:12:33 broker.go:148: [sarama] Connected to broker at 172.23.5.156:9092 (unregistered)
2023/03/28 11:12:34 syncer.go:383: [fatal] /home/jenkins/workspace/build_tidb_binlog_master/go/src/github.com/pingcap/tidb-binlog/drainer/executor/kafka.go:193: fail to push msg to kafka after 30s, check if kafka is up and working
/home/jenkins/workspace/build_tidb_binlog_master/go/src/github.com/pingcap/tidb-binlog/drainer/executor/kafka.go:163:
/home/jenkins/workspace/build_tidb_binlog_master/go/src/github.com/pingcap/tidb-binlog/drainer/executor/kafka.go:134:

This looks like a downstream Kafka parameter problem that keeps the sync from ever completing. Note the repeated sarama warning in your log: "Producer.MaxMessageBytes must be smaller than MaxRequestSize; it will be ignored". sarama's request-size cap defaults to 100 MiB, so a 1 GiB producer message limit is discarded rather than applied. Beyond that, check the network: from the drainer node, use telnet to verify connectivity to the downstream Kafka.

Drainer and Kafka are deployed on the same machine, and telnet connects fine.
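Worth noting: telnet only proves the TCP handshake succeeds, while the error in the log is a 30 s read timeout on the metadata response, i.e. the broker accepted the connection but never answered. A small Python sketch (the host/port are placeholders for your broker) that separates those two cases:

```python
import socket

def check_broker(host: str, port: int, timeout: float = 5.0) -> str:
    """Distinguish 'TCP connect works' (all that telnet shows) from
    'the broker actually answers' (what sarama's metadata fetch needs)."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            # telnet stops here: the three-way handshake succeeded.
            s.settimeout(timeout)
            # A Kafka broker never sends bytes unsolicited, so write a
            # minimal frame and wait for any reaction; a stuck or
            # overloaded broker times out here, exactly like the
            # drainer's "i/o timeout" on the metadata fetch.
            s.sendall(b"\x00\x00\x00\x00")
            try:
                data = s.recv(1)
                return "responded" if data else "closed connection"
            except socket.timeout:
                return "connected, but no response within timeout"
    except OSError:
        return "connect failed"
```

If this reports "connected, but no response within timeout" for your broker, the problem is on the broker side (overload, GC pauses, or a misconfigured `advertised.listeners`) rather than basic network reachability.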