TiFlash crashes after manually cancelling an ADD PARTITION on a hash-partitioned table

【TiDB Environment】Production
【Steps to Reproduce】
Table schema:

CREATE TABLE `leader_audit_activity_log` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'primary key',
  `tenant_id` int(11) NOT NULL COMMENT 'group leader ID',
  `third_activity_id` varchar(100) DEFAULT NULL COMMENT 'third-party activity ID',
  `third_activity_name` varchar(100) DEFAULT NULL COMMENT 'third-party activity name',
  `platform` tinyint(4) DEFAULT '1' COMMENT 'promotion platform: 1 Douyin, 2 Youzan, 3 Kuaishou, 6 WeChat Channels',
  `apply_source` int(2) DEFAULT '1' COMMENT 'application source (1: merchant, 2: secondary group)',
  `partner_shop_id` varchar(50) DEFAULT NULL COMMENT 'application source ID (shop ID for merchant cooperation, original leader org ID for leader cooperation)',
  `partner_shop_name` varchar(50) DEFAULT NULL COMMENT 'application source name',
  `relate_apply_id` bigint(20) DEFAULT '0' COMMENT 'third-party application ID (used for activity audit)',
  `link_id` bigint(20) DEFAULT '0' COMMENT 'primary key of table leader_relate_link',
  `status` int(2) DEFAULT '1' COMMENT 'audit status: 1 approved, 2 rejected, 3 failed',
  `fail_reason` varchar(2000) DEFAULT NULL COMMENT 'failure reason',
  `audit_time` datetime DEFAULT CURRENT_TIMESTAMP COMMENT 'audit time',
  `update_time` datetime DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'update time',
  PRIMARY KEY (`id`,`tenant_id`) /*T![clustered_index] NONCLUSTERED */,
  UNIQUE KEY `relate_apply_id` (`relate_apply_id`,`tenant_id`,`platform`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin AUTO_INCREMENT=577393737 /*T! SHARD_ROW_ID_BITS=4 PRE_SPLIT_REGIONS=2 */ COMMENT='group leader automatic audit records'
PARTITION BY KEY (`tenant_id`) PARTITIONS 16;

The table has TiFlash replicas configured:
(screenshot: TiFlash replica count)

The statement executed:
ALTER TABLE leader_audit_activity_log ADD PARTITION PARTITIONS 8;
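For reference, the partition layout before and after the DDL can be checked with a query like this (a sketch against information_schema; nothing beyond the table name comes from the post):

SELECT PARTITION_NAME, TABLE_ROWS
FROM information_schema.partitions
WHERE TABLE_NAME = 'leader_audit_activity_log';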

Table size: (screenshot omitted)

The DDL was cancelled a few minutes after it started, and TiFlash began to crash.
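The post doesn't say how the cancel was issued; presumably it went through the usual route of looking up the DDL job and cancelling it (the job_id below is a placeholder):

-- find the running ADD PARTITION job and note its JOB_ID
ADMIN SHOW DDL JOBS;
-- cancel it; the rollback that follows is what is suspected below
ADMIN CANCEL DDL JOBS 123456;  -- 123456 is a placeholder JOB_ID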


TiFlash monitoring: (screenshot omitted)

I suspect the TiFlash rollback ran it into the ground. Is there any parameter that makes TiFlash delay syncing when it starts up?

TiFlash logs:
server.log.2025-07-30-11_32_36.354.gz (6.4 MB)
server.log.2025-07-30-11_01_10.955.gz (6.6 MB)

[2025/07/30 10:59:13.419 +08:00] [INFO] [SegmentReader.cpp:60] [“Pop fail, stop=true”] [thread_id=865]
[2025/07/30 10:59:13.419 +08:00] [INFO] [SegmentReader.cpp:60] [“Pop fail, stop=true”] [thread_id=888]
[2025/07/30 10:59:13.419 +08:00] [INFO] [SegmentReader.cpp:60] [“Pop fail, stop=true”] [thread_id=692]
[2025/07/30 10:59:13.419 +08:00] [INFO] [SegmentReader.cpp:60] [“Pop fail, stop=true”] [thread_id=863]
[2025/07/30 10:59:13.419 +08:00] [INFO] [SegmentReader.cpp:60] [“Pop fail, stop=true”] [thread_id=775]
[2025/07/30 10:59:13.419 +08:00] [INFO] [SegmentReader.cpp:60] [“Pop fail, stop=true”] [thread_id=721]
[2025/07/30 10:59:13.524 +08:00] [INFO] [Server.cpp:385] [“/data/tiflash_misc/tiflash-7.5.4/contrib/grpc/src/core/ext/filters/client_channel/subchannel.cc, line number: 945, log msg : subchannel 0x7f0b98fb0c00 {address=ipv4:192.168.5.69:3930, args=grpc.client_channel_factory=0x7f18f54ae3c0, grpc.default_authority=192.168.5.69:3930, grpc.initial_reconnect_backoff_ms=1000, grpc.internal.subchannel_pool=0x7f18fff60f20, grpc.max_receive_message_length=-1, grpc.max_reconnect_backoff_ms=3000, grpc.min_reconnect_backoff_ms=1000, grpc.primary_user_agent=grpc-c++/1.44.0, grpc.resource_quota=0x7f18fff611f0, grpc.server_uri=dns:///192.168.5.69:3930}: connect failed: {"created":"@1753844353.524489006","description":"Failed to connect to remote host: FD Shutdown","file":"/data/tiflash_misc/tiflash-7.5.4/contrib/grpc/src/core/lib/iomgr/lockfree_event.cc","file_line":217,"os_error":"Timeout occurred","referenced_errors":[{"created":"@1753844353.524482444","description":"connect() timed out","file":"/data/tiflash_misc/tiflash-7.5.4/contrib/grpc/src/core/lib/iomgr/tcp_client_posix.cc","file_line":114}],"target_address":"ipv4:192.168.5.69:3930"}”] [source=grpc] [thread_id=146380]
[2025/07/30 10:59:13.524 +08:00] [INFO] [Server.cpp:385] [“/data/tiflash_misc/tiflash-7.5.4/contrib/grpc/src/core/ext/filters/client_channel/subchannel.cc, line number: 884, log msg : subchannel 0x7f0b98fb0c00 {address=ipv4:192.168.5.69:3930, args=grpc.client_channel_factory=0x7f18f54ae3c0, grpc.default_authority=192.168.5.69:3930, grpc.initial_reconnect_backoff_ms=1000, grpc.internal.subchannel_pool=0x7f18fff60f20, grpc.max_receive_message_length=-1, grpc.max_reconnect_backoff_ms=3000, grpc.min_reconnect_backoff_ms=1000, grpc.primary_user_agent=grpc-c++/1.44.0, grpc.resource_quota=0x7f18fff611f0, grpc.server_uri=dns:///192.168.5.69:3930}: Retry immediately”] [source=grpc] [thread_id=146380]
[2025/07/30 10:59:13.524 +08:00] [INFO] [Server.cpp:385] [“/data/tiflash_misc/tiflash-7.5.4/contrib/grpc/src/core/ext/filters/client_channel/subchannel.cc, line number: 910, log msg : subchannel 0x7f0b98fb0c00 {address=ipv4:192.168.5.69:3930, args=grpc.client_channel_factory=0x7f18f54ae3c0, grpc.default_authority=192.168.5.69:3930, grpc.initial_reconnect_backoff_ms=1000, grpc.internal.subchannel_pool=0x7f18fff60f20, grpc.max_receive_message_length=-1, grpc.max_reconnect_backoff_ms=3000, grpc.min_reconnect_backoff_ms=1000, grpc.primary_user_agent=grpc-c++/1.44.0, grpc.resource_quota=0x7f18fff611f0, grpc.server_uri=dns:///192.168.5.69:3930}: failed to connect to channel, retrying”] [source=grpc] [thread_id=146380]
[2025/07/30 10:59:13.592 +08:00] [INFO] [TiDBSchemaSyncer.cpp:261] [“Sync table schema begin, table_id=202534”] [source=“keyspace=4294967295”] [thread_id=407]
[2025/07/30 10:59:14.277 +08:00] [INFO] [TiDBSchemaSyncer.cpp:261] [“Sync table schema begin, table_id=202526”] [source=“keyspace=4294967295”] [thread_id=1691]
[2025/07/30 10:59:14.304 +08:00] [INFO] [Server.cpp:385] [“/data/tiflash_misc/tiflash-7.5.4/contrib/grpc/src/core/ext/filters/client_channel/subchannel.cc, line number: 945, log msg : subchannel 0x7f096ef4b400 {address=ipv4:192.168.5.137:20160, args=grpc.client_channel_factory=0x7f18f54ae3c0, grpc.default_authority=192.168.5.137:20160, grpc.initial_reconnect_backoff_ms=1000, grpc.internal.subchannel_pool=0x7f18fff60f20, grpc.max_receive_message_length=-1, grpc.max_reconnect_backoff_ms=3000, grpc.min_reconnect_backoff_ms=1000, grpc.primary_user_agent=grpc-c++/1.44.0, grpc.resource_quota=0x7f18fff611f0, grpc.server_uri=dns:///192.168.5.137:20160}: connect failed: {"created":"@1753844354.304551992","description":"Failed to connect to remote host: Connection refused","errno":111,"file":"/data/tiflash_misc/tiflash-7.5.4/contrib/grpc/src/core/lib/iomgr/tcp_client_posix.cc","file_line":200,"os_error":"Connection refused","syscall":"connect","target_address":"ipv4:192.168.5.137:20160"}”] [source=grpc] [thread_id=146381]
[2025/07/30 10:59:14.304 +08:00] [INFO] [Server.cpp:385] [“/data/tiflash_misc/tiflash-7.5.4/contrib/grpc/src/core/ext/filters/client_channel/subchannel.cc, line number: 887, log msg : subchannel 0x7f096ef4b400 {address=ipv4:192.168.5.137:20160, args=grpc.client_channel_factory=0x7f18f54ae3c0, grpc.default_authority=192.168.5.137:20160, grpc.initial_reconnect_backoff_ms=1000, grpc.internal.subchannel_pool=0x7f18fff60f20, grpc.max_receive_message_length=-1, grpc.max_reconnect_backoff_ms=3000, grpc.min_reconnect_backoff_ms=1000, grpc.primary_user_agent=grpc-c++/1.44.0, grpc.resource_quota=0x7f18fff611f0, grpc.server_uri=dns:///192.168.5.137:20160}: Retry in 967 milliseconds”] [source=grpc] [thread_id=146381]
[2025/07/30 10:59:14.305 +08:00] [INFO] [Server.cpp:385] [“/data/tiflash_misc/tiflash-7.5.4/contrib/grpc/src/core/ext/filters/client_channel/subchannel.cc, line number: 945, log msg : subchannel 0x7f17e04d2c00 {address=ipv4:192.168.5.30:3930, args=grpc.client_channel_factory=0x7f18f54ae3c0, grpc.default_authority=192.168.5.30:3930, grpc.initial_reconnect_backoff_ms=1000, grpc.internal.subchannel_pool=0x7f18fff60f20, grpc.max_receive_message_length=-1, grpc.max_reconnect_backoff_ms=3000, grpc.min_reconnect_backoff_ms=1000, grpc.primary_user_agent=grpc-c++/1.44.0, grpc.resource_quota=0x7f18fff611f0, grpc.server_uri=dns:///192.168.5.30:3930}: connect failed: {"created":"@1753844354.304616563","description":"Failed to connect to remote host: Connection refused","errno":111,"file":"/data/tiflash_misc/tiflash-7.5.4/contrib/grpc/src/core/lib/iomgr/tcp_client_posix.cc","file_line":200,"os_error":"Connection refused","syscall":"connect","target_address":"ipv4:192.168.5.30:3930"}”] [source=grpc] [thread_id=146381]
[2025/07/30 10:59:14.305 +08:00] [INFO] [Server.cpp:385] [“/data/tiflash_misc/tiflash-7.5.4/contrib/grpc/src/core/ext/filters/client_channel/subchannel.cc, line number: 887, log msg : subchannel 0x7f17e04d2c00 {address=ipv4:192.168.5.30:3930, args=grpc.client_channel_factory=0x7f18f54ae3c0, grpc.default_authority=192.168.5.30:3930, grpc.initial_reconnect_backoff_ms=1000, grpc.internal.subchannel_pool=0x7f18fff60f20, grpc.max_receive_message_length=-1, grpc.max_reconnect_backoff_ms=3000, grpc.min_reconnect_backoff_ms=1000, grpc.primary_user_agent=grpc-c++/1.44.0, grpc.resource_quota=0x7f18fff611f0, grpc.server_uri=dns:///192.168.5.30:3930}: Retry in 1492 milliseconds”] [source=grpc] [thread_id=146381]
[2025/07/30 10:59:14.305 +08:00] [INFO] [Server.cpp:385] [“/data/tiflash_misc/tiflash-7.5.4/contrib/grpc/src/core/ext/filters/client_channel/subchannel.cc, line number: 945, log msg : subchannel 0x7f0b98fb0c00 {address=ipv4:192.168.5.69:3930, args=grpc.client_channel_factory=0x7f18f54ae3c0, grpc.default_authority=192.168.5.69:3930, grpc.initial_reconnect_backoff_ms=1000, grpc.internal.subchannel_pool=0x7f18fff60f20, grpc.max_receive_message_length=-1, grpc.max_reconnect_backoff_ms=3000, grpc.min_reconnect_backoff_ms=1000, grpc.primary_user_agent=grpc-c++/1.44.0, grpc.resource_quota=0x7f18fff611f0, grpc.server_uri=dns:///192.168.5.69:3930}: connect failed: {"created":"@1753844354.304648140","description":"Failed to connect to remote host: Connection refused","errno":111,"file":"/data/tiflash_misc/tiflash-7.5.4/contrib/grpc/src/core/lib/iomgr/tcp_client_posix.cc","file_line":200,"os_error":"Connection refused","syscall":"connect","target_address":"ipv4:192.168.5.69:3930"}”] [source=grpc] [thread_id=146381]
[2025/07/30 10:59:14.305 +08:00] [INFO] [Server.cpp:385] [“/data/tiflash_misc/tiflash-7.5.4/contrib/grpc/src/core/ext/filters/client_channel/subchannel.cc, line number: 887, log msg : subchannel 0x7f0b98fb0c00 {address=ipv4:192.168.5.69:3930, args=grpc.client_channel_factory=0x7f18f54ae3c0, grpc.default_authority=192.168.5.69:3930, grpc.initial_reconnect_backoff_ms=1000, grpc.internal.subchannel_pool=0x7f18fff60f20, grpc.max_receive_message_length=-1, grpc.max_reconnect_backoff_ms=3000, grpc.min_reconnect_backoff_ms=1000, grpc.primary_user_agent=grpc-c++/1.44.0, grpc.resource_quota=0x7f18fff611f0, grpc.server_uri=dns:///192.168.5.69:3930}: Retry in 1796 milliseconds”] [source=grpc] [thread_id=146381]
[2025/07/30 11:00:31.680 +08:00] [INFO] [BaseDaemon.cpp:1178] [“Welcome to TiFlash”] [thread_id=1]
[2025/07/30 11:00:31.680 +08:00] [INFO] [BaseDaemon.cpp:1179] [“Starting daemon with revision 54381”] [thread_id=1]
[2025/07/30 11:00:31.680 +08:00] [INFO] [BaseDaemon.cpp:1182] [“TiFlash build info: TiFlash\nRelease Version: v7.5.4-emar\nEdition: Community\nGit Commit Hash: 85341773736131ac06d7644b47d66b8f00d36739\nGit Branch: HEAD\nUTC Build Time: 2024-10-16 08:32:24\nEnable Features: jemalloc sm4(GmSSL) unwind thinlto\nProfile: RELWITHDEBINFO\n”] [thread_id=1]
[2025/07/30 11:00:31.681 +08:00] [INFO] [] [“starting up”] [source=Application] [thread_id=1]
[2025/07/30 11:00:31.722 +08:00] [INFO] [Server.cpp:432] [“Got jemalloc version: 5.3-RC”] [thread_id=1]
[2025/07/30 11:00:31.722 +08:00] [INFO] [Server.cpp:441] [“Not found environment variable MALLOC_CONF”] [thread_id=1]
[2025/07/30 11:00:31.724 +08:00] [INFO] [Server.cpp:447] [“Got jemalloc config: opt.background_thread false, opt.max_background_threads 4”] [thread_id=1]
[2025/07/30 11:00:31.724 +08:00] [INFO] [Server.cpp:451] [“Try to use background_thread of jemalloc to handle purging asynchronously”] [thread_id=1]
[2025/07/30 11:00:31.724 +08:00] [INFO] [Server.cpp:454] [“Set jemalloc.max_background_threads 1”] [thread_id=1]
[2025/07/30 11:00:31.724 +08:00] [INFO] [Server.cpp:457] [“Set jemalloc.background_thread true”] [thread_id=1]
[2025/07/30 11:00:31.733 +08:00] [INFO] [ScanContext.cpp:235] [“flash_server_addr=0.0.0.0:3930, current_instance_id=yz-epbd-017073:3930”] [thread_id=1]
[2025/07/30 11:00:31.737 +08:00] [INFO] [StorageConfigParser.cpp:261] [“format_version 0 lazily_init_store true”] [thread_id=1]

[2025/07/30 11:00:31.680 +08:00] [INFO] [BaseDaemon.cpp:1178] [“Welcome to TiFlash”] [thread_id=1]

Nothing useful shows up around this restart log line. This is a tough one.

Adding partitions to tables with TiFlash replicas has always been a headache.

Check the second log file; the log volume is huge. TiFlash kept crash-looping for half an hour and only stabilized after the node was forcibly taken offline.


When a TiFlash node goes bad, generally the only option is to rebuild it. Since TiFlash data is all pulled from TiKV, there is no good way to repair it in place; you can only resync.
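While a table resyncs, its replica status can be watched from information_schema (a sketch, using the table from this thread):

SELECT TABLE_SCHEMA, TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS
FROM information_schema.tiflash_replica
WHERE TABLE_NAME = 'leader_audit_activity_log';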

https://docs.pingcap.com/zh/tidb/stable/release-7.5.5/

  • Fix the issue that TiFlash might panic when TiDB executes concurrent DDLs that encounter conflicts #8578 @JaySon-Huang
  • Fix the issue that TiFlash fails to parse the table schema when a table contains Bit-type columns with a default value that contains invalid characters #9461 @Lloyd-Pottiger

7.5.5 fixed these two issues; check whether either is related.

Not quite the same: that fix is for concurrent DDL, and re-partitioning a hash table isn't a concurrent operation, is it?

The standard procedure is to set the TiFlash replica count to 0 first, add the partitions, and then restore the TiFlash replica count.
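In SQL the suggested sequence would look roughly like this (a sketch; the replica count of 2 is an assumption, restore whatever the table had before):

-- 1. drop the TiFlash replica before touching the partition layout
ALTER TABLE leader_audit_activity_log SET TIFLASH REPLICA 0;
-- 2. add the partitions
ALTER TABLE leader_audit_activity_log ADD PARTITION PARTITIONS 8;
-- 3. restore the replica count (2 is a placeholder)
ALTER TABLE leader_audit_activity_log SET TIFLASH REPLICA 2;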

I don't think so. If the big tables in production all serve AP requests and you set the replica count to 0, wouldn't that traffic crush TiDB?

AP requests are controllable in principle; they are not consumer-facing and don't have the high-concurrency demands of TP requests.