TiFlash query error

The version is v7.1.5. I ran the query twice and it failed both times:

ERROR 1105 (HY000): other error for mpp stream: From MPP<query:<query_ts:1746581081328516297, local_query_id:43306, server_id:1741530, start_ts:457855750975782930>,task_id:1>: Code: 0, e.displayText() = DB::TiFlashException: Region 38136561 is unavailable at 57993: (while doing learner read for table, logical table_id: 24704), e.what() = DB::TiFlashException,


ERROR 1105 (HY000): other error for mpp stream: From MPP<query:<query_ts:1746581673243438756, local_query_id:43316, server_id:1741530, start_ts:457855906139078672>,task_id:2>: Code: 0, e.displayText() = DB::TiFlashException: Region 36237694 is unavailable at 8: (while doing learner read for table, logical table_id: 24704), e.what() = DB::TiFlashException,  
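For anyone collecting these errors from logs, here is a minimal sketch that pulls the region ID and logical table ID out of the message so they can be fed to pd-ctl. The regular expression is an assumption based only on the two messages quoted above; other versions may word the error differently, and `parse_unavailable_region` is a hypothetical helper name.

```python
import re

# Pattern assumed from the two error messages in this thread; not an official format.
ERROR_RE = re.compile(
    r"Region (?P<region_id>\d+) is unavailable at \d+.*?"
    r"logical table_id: (?P<table_id>\d+)"
)

def parse_unavailable_region(message: str):
    """Return (region_id, table_id) from a 'Region ... is unavailable' error, or None."""
    m = ERROR_RE.search(message)
    if m is None:
        return None
    return int(m.group("region_id")), int(m.group("table_id"))

# Shortened copy of the first error above.
msg = ("DB::TiFlashException: Region 38136561 is unavailable at 57993: "
       "(while doing learner read for table, logical table_id: 24704)")
print(parse_unavailable_region(msg))
```

The region ID it returns is the one to look up with the pd-ctl `region` command.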

The region status as shown by pd-ctl:

» region 38136561
{"id":38136561,"start_key":"7480000000000060FF805F728000000003FF8E3BEA0000000000FA","end_key":"7480000000000060FF8100000000000000F8","epoch":{"conf_ver":17545,"version":108562},"peers":[{"role_name":"Learner","is_learner":true,"id":82424261,"store_id":8717408,"role":1},{"role_name":"Learner","is_learner":true,"id":90809417,"store_id":8032862,"role":1},{"role_name":"Voter","id":94678865,"store_id":8033247},{"role_name":"Voter","id":94690109,"store_id":8717523},{"role_name":"Voter","id":94690236,"store_id":8959052}],"leader":{"role_name":"Voter","id":94690236,"store_id":8959052},"pending_peers":[{"role_name":"Learner","is_learner":true,"id":90809417,"store_id":8032862,"role":1}],"cpu_usage":0,"written_bytes":215474,"read_bytes":228116,"written_keys":790,"read_keys":768,"approximate_size":138,"approximate_keys":87878}
» region 36237694
{"id":36237694,"start_key":"7480000000000060FF805F728000000003FF87EEA60000000000FA","end_key":"7480000000000060FF805F728000000003FF88E7FF0000000000FA","epoch":{"conf_ver":17355,"version":108555},"peers":[{"role_name":"Voter","id":36237695,"store_id":8033245},{"role_name":"Voter","id":36237696,"store_id":8717523},{"role_name":"Voter","id":36237697,"store_id":8959052},{"role_name":"Learner","is_learner":true,"id":64046784,"store_id":8910051,"role":1},{"role_name":"Learner","is_learner":true,"id":90647471,"store_id":8717408,"role":1}],"leader":{"role_name":"Voter","id":36237697,"store_id":8959052},"cpu_usage":0,"written_bytes":195,"read_bytes":0,"written_keys":2,"read_keys":0,"approximate_size":95,"approximate_keys":85837}
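One thing that stands out in the output above: in region 38136561, learner peer 90809417 on store 8032862 is listed under `pending_peers`, while region 36237694 has no `pending_peers` at all. TiFlash replicas join a region as Raft learners, but whether these particular stores are the TiFlash nodes is an assumption, since the store list is not shown here. A rough sketch for checking this across many regions (`learner_report` is a hypothetical helper, and the sample is a trimmed copy of the JSON above):

```python
import json

def learner_report(region_json: str):
    """List the learner peers of a region and whether each one is in pending_peers."""
    region = json.loads(region_json)
    pending_ids = {p["id"] for p in region.get("pending_peers", [])}
    return [
        {"peer_id": p["id"], "store_id": p["store_id"], "pending": p["id"] in pending_ids}
        for p in region["peers"]
        if p.get("is_learner")
    ]

# Trimmed copy of the pd-ctl output for region 38136561 (only the fields used here).
sample = """{"id":38136561,
 "peers":[{"role_name":"Learner","is_learner":true,"id":82424261,"store_id":8717408},
          {"role_name":"Learner","is_learner":true,"id":90809417,"store_id":8032862},
          {"role_name":"Voter","id":94690236,"store_id":8959052}],
 "pending_peers":[{"role_name":"Learner","is_learner":true,"id":90809417,"store_id":8032862}]}"""

for row in learner_report(sample):
    print(row)
```

A pending learner only says that the peer is lagging behind the leader; it does not by itself explain the error, and the second region fails without one.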

Yesterday I queried this table:

select count(1) from ug_al_da_dic_hive_planting_video_hot_v2_r2;

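It reported 3 regions in an abnormal state, and pd-ctl showed that 2 of them were empty regions. So I first set the table's TiFlash replica count to 0, then set it back to 2 to resync. Today the replica status looks like this: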

+-----------------+--------------------------------------------+----------+---------------+-----------------+-----------+----------+
| TABLE_SCHEMA    | TABLE_NAME                                 | TABLE_ID | REPLICA_COUNT | LOCATION_LABELS | AVAILABLE | PROGRESS |
+-----------------+--------------------------------------------+----------+---------------+-----------------+-----------+----------+
| lego_statistics | ug_al_da_dic_hive_planting_video_hot_v2_r2 |    24704 |             2 |                 |         1 |     0.99 |
+-----------------+--------------------------------------------+----------+---------------+-----------------+-----------+----------+

After setting it to 0, it's best to wait a while and only set it back once disk usage on the TiFlash side has stopped dropping.

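What usually causes a region to become unavailable? From the business side, these two tables are frequently truncated and then reloaded.
Is there a faster way to fix this kind of region-unavailable problem?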

Something looks off here. I'll ask one of the R&D engineers to take a look :thinking:

@Mwkk

Take a look at this. Check whether disk usage on the TiFlash machines has exceeded PD's low-space-ratio setting.

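I checked: low-space-ratio is at the default 0.8. The disks are 5T, there are 3 kv nodes with capacity set to 1.2T each, and monitoring shows each kv node using less than 200G, so it shouldn't be a low-space-ratio problem.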
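The arithmetic behind that conclusion, as a quick sketch. It assumes the check is simply used space divided by configured capacity compared against low-space-ratio, and it uses the kv node figures from the post (1.2T capacity, 200G as an upper bound on usage). The same check would have to be repeated with the TiFlash stores' own numbers, which are not given here.

```python
def exceeds_low_space_ratio(used_gb: float, capacity_gb: float,
                            low_space_ratio: float = 0.8) -> bool:
    """True if used/capacity is above the ratio (simplified reading of low-space-ratio)."""
    return used_gb / capacity_gb > low_space_ratio

capacity_gb = 1.2 * 1024   # 1.2T capacity per kv node, from the post
used_gb = 200              # "less than 200G" per kv node, taken as an upper bound

print(round(used_gb / capacity_gb, 3))                # usage ratio, far below 0.8
print(exceeds_low_space_ratio(used_gb, capacity_gb))
```

With these numbers the usage ratio is roughly 16%, so the kv nodes are nowhere near the threshold.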

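Later I ran into region unavailable once more. This time I set the replica count to 0 for all tables and then reinstalled the TiFlash component. I haven't seen the error since, and I'll keep watching for a while.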


OK, let's keep an eye on it for a while then. :thinking: