Queries on large tables that normally go through TiFlash either return no results or stay in a waiting state indefinitely. TiFlash logs this error: [2024/04/10 17:45:52.169 +08:00] [ERROR] [<unknown>] [" Failed4: Deadline Exceeded"] [source=pingcap.tikv] [thread_id=40939]

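[TiDB environment] Production
[TiDB version] v6.5.0
[Reproduction path] TiFlash logs this error: [2024/04/10 17:45:52.169 +08:00] [ERROR] [<unknown>] [" Failed4: Deadline Exceeded"] [source=pingcap.tikv] [thread_id=40939]
[Problem: symptoms and impact] Business impact: queries on large tables that normally go through TiFlash return no results.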

[Resource configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of that page
[Attachments: screenshots/logs/monitoring]

Check in PD whether the learner replicas of this Region are healthy.

# tiup ctl:v6.5.0 pd -u http://10.3.8.231:2379  region 16381494
Starting component `ctl`: /root/.tiup/components/ctl/v6.5.0/ctl pd -u http://10.3.8.231:2379 region 16381494
{
  "id": 16381494,
  "start_key": "7480000000000039FFFE5F728000000007FFA013A40000000000FA",
  "end_key": "7480000000000039FFFE5F728000000008FF9CE29A0000000000FA",
  "epoch": {
    "conf_ver": 7,
    "version": 53144
  },
  "peers": [
    {
      "id": 16381495,
      "store_id": 1,
      "role_name": "Voter"
    },
    {
      "id": 16381496,
      "store_id": 4,
      "role_name": "Voter"
    },
    {
      "id": 16381497,
      "store_id": 5,
      "role_name": "Voter"
    },
    {
      "id": 16386237,
      "store_id": 2516002,
      "role": 1,
      "role_name": "Learner",
      "is_learner": true
    },
    {
      "id": 16386273,
      "store_id": 3733643,
      "role": 1,
      "role_name": "Learner",
      "is_learner": true
    }
  ],
  "leader": {
    "id": 16381497,
    "store_id": 5,
    "role_name": "Voter"
  },
  "cpu_usage": 0,
  "written_bytes": 2365,
  "read_bytes": 0,
  "written_keys": 13,
  "read_keys": 0,
  "approximate_size": 210,
  "approximate_keys": 246922
}

Are the other TiFlash tables working fine? Please post the schema of the table with table_id 14815.
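For anyone following along, a minimal sketch of how a table_id could be mapped back to a table name before pulling its schema. It assumes the ID is either a logical table ID or, for a partitioned table, a partition ID; `db_name` and `tbl_name` are placeholders, not names from this cluster.

```sql
-- Case 1: the ID is a logical table ID.
SELECT table_schema, table_name
  FROM information_schema.tables
 WHERE tidb_table_id = 14815;

-- Case 2: the ID is the physical ID of one partition.
SELECT table_schema, table_name, partition_name
  FROM information_schema.partitions
 WHERE tidb_partition_id = 14815;

-- Then print the schema of whichever table was found
-- (db_name.tbl_name is a placeholder).
SHOW CREATE TABLE db_name.tbl_name;
```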

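From the TiFlash logs I collected the region_id values reported at the time of the errors, then grouped them with the query below. They are concentrated mainly in these 3 partitioned tables that are replicated to TiFlash: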

select 
-- region_id,
table_id,
-- DB_NAME,
table_name,
count(*)
  from  information_schema.TIKV_REGION_STATUS
  where REGION_ID in 
  (
16373822
,16373518
,16373766
,16373814
,16383394
,16374068
,16373682
,18008487
,16383378
,16380622
,16380046
,16377630
,16374240
,16374124
,16373802
,16373654
,16377758
,17970704
,16374502
,17985767
,16381494
,18048017
,18051349
,18045645
,18031829
,18027579
,18026189
,18022247
,18011675
,17988865
,17988747
,17987687
,17987231
,17985179
,17981389
,17980745
,17977629
,17976025
,17975955
,17975009
,17969976
,17954114
,17156500
,17156484
,16384950
,16383386
,16382962
,16382818
,16382730
,16382610
,16382510
,16382410
,16381954
,16381770
,16381566
,16381562
,16381554
,16380946
,16380750
,16380694
,16380514
,16380458
,16380150
,16380094
,16380022
,16379874
,16379302
,16378838
,16378490
,16378074
,16377362
,16377062
,16377006
,16376118
,16375982
,16375774
,16375598
,16375058
,16374806
,16374650
,16374252
,16373968
,16373892
,16373838
,16373830
,16373810
,16373750
,16373738
,16373730
,16373718
,16373706
,16373666
,16373642
,16373638
,16373630
,16373626
,16373618
,16373570
,16373546
,16373526
,16373770
,16380710
,17156488
,16381778
,17978185
,18051565
,17977763
,17980695
,16374152
,18050309
,17969982
,16382154
,16383698
,16373510
,17983577
,16380734
  )
  group by table_id, table_name;

Some Regions are in an abnormal state. You could first disable TiFlash for the table and then enable it again.

If nothing else works, set the TiFlash replica count of the affected table to 0, then back to 1, and see whether the error still occurs.
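The two suggestions above could be carried out roughly as follows. This is only a sketch: `db_name` and `tbl_name` are placeholders for each affected table, and the restored replica count is an assumption. The PD output earlier in the thread shows two Learner peers for the Region, so the original count may have been 2 rather than 1; check it before dropping the replicas.

```sql
-- Record the current replica count and sync state first.
SELECT table_schema, table_name, replica_count, available, progress
  FROM information_schema.tiflash_replica
 WHERE table_schema = 'db_name' AND table_name = 'tbl_name';

-- Remove the TiFlash replicas of the table.
-- Queries fall back to TiKV while no replica is available.
ALTER TABLE db_name.tbl_name SET TIFLASH REPLICA 0;

-- Recreate the replicas (use the count recorded above; 1 is an assumption).
ALTER TABLE db_name.tbl_name SET TIFLASH REPLICA 1;

-- Re-run until available = 1 and progress = 1 before retrying the query.
SELECT table_schema, table_name, replica_count, available, progress
  FROM information_schema.tiflash_replica
 WHERE table_schema = 'db_name' AND table_name = 'tbl_name';
```

Rebuilding replicas for large partitioned tables re-syncs all of their data, so it can take a while and adds load on TiKV and TiFlash during the sync.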

Following along to learn.