TiFlash service abnormal after upgrading to 6.1

The execution plans themselves look normal.
This is the plan with dynamic partition pruning enabled:

 Limit_14	1000.00	root	""	offset:0, count:1000
└─IndexJoin_21	1000.00	root	""	inner join, inner:IndexLookUp_20, outer key:mbase_test. d_media.date, mbase_test. d_media.aweme_id, inner key:mbase_test. d_video.date, mbase_test. d_video.aweme_id, equal cond:eq(mbase_test. d_media.aweme_id, mbase_test. d_video.aweme_id), eq(mbase_test. d_media.date, mbase_test. d_video.date)
  ├─TableReader_55(Build)	999.98	root	partition:all	data:Selection_54
  │ └─Selection_54	999.98	cop[tiflash]	""	eq(mbase_test. d_media.download_status, 0)
  │   └─TableFullScan_53	2081.51	cop[tiflash]	table:m	keep order:false
  └─IndexLookUp_20(Probe)	1.00	root	partition:all	""
    ├─IndexRangeScan_18(Build)	1.00	cop[tikv]	table:v, index:PRIMARY(date, aweme_id)	range: decided by [eq(mbase_test. d_video.date, mbase_test. d_media.date) eq(mbase_test. d_video.aweme_id, mbase_test. d_media.aweme_id)], keep order:false
    └─TableRowIDScan_19(Probe)	1.00	cop[tikv]	table:v	keep order:false

This is the plan with dynamic partition pruning disabled:

|Limit_154|1000.00|root|""|offset:0, count:1000|
|---|---|---|---|---|
|└─HashJoin_158|1000.00|root|""|inner join, equal:[eq(mbase_test. d_media.date, mbase_test. d_video.date) eq(mbase_test. d_media.aweme_id, mbase_test. d_video.aweme_id)]|
|  ├─PartitionUnion_859(Build)|8167108.25|root|""|""|
|  │ ├─TableReader_863|0.00|root|""|data:Selection_862|
|  │ │ └─Selection_862|0.00|cop[tikv]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_861|9.00|cop[tikv]|table:m, partition:p2017|keep order:false|
|  │ ├─TableReader_872|0.00|root|""|data:Selection_871|
|  │ │ └─Selection_871|0.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_870|15.00|cop[tiflash]|table:m, partition:p201801|keep order:false|
|  │ ├─TableReader_875|0.00|root|""|data:Selection_874|
|  │ │ └─Selection_874|0.00|cop[tikv]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_873|8.00|cop[tikv]|table:m, partition:p201802|keep order:false|
|  │ ├─TableReader_881|0.00|root|""|data:Selection_880|
|  │ │ └─Selection_880|0.00|cop[tikv]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_879|6.00|cop[tikv]|table:m, partition:p201803|keep order:false|
|  │ ├─TableReader_890|0.00|root|""|data:Selection_889|
|  │ │ └─Selection_889|0.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_888|19.00|cop[tiflash]|table:m, partition:p201804|keep order:false|
|  │ ├─TableReader_896|0.00|root|""|data:Selection_895|
|  │ │ └─Selection_895|0.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_894|16.00|cop[tiflash]|table:m, partition:p201805|keep order:false|
|  │ ├─TableReader_902|0.00|root|""|data:Selection_901|
|  │ │ └─Selection_901|0.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_900|41.00|cop[tiflash]|table:m, partition:p201806|keep order:false|
|  │ ├─TableReader_908|0.00|root|""|data:Selection_907|
|  │ │ └─Selection_907|0.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_906|43.00|cop[tiflash]|table:m, partition:p201807|keep order:false|
|  │ ├─TableReader_914|0.00|root|""|data:Selection_913|
|  │ │ └─Selection_913|0.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_912|55.00|cop[tiflash]|table:m, partition:p201808|keep order:false|
|  │ ├─TableReader_920|0.00|root|""|data:Selection_919|
|  │ │ └─Selection_919|0.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_918|42.00|cop[tiflash]|table:m, partition:p201809|keep order:false|
|  │ ├─TableReader_926|0.00|root|""|data:Selection_925|
|  │ │ └─Selection_925|0.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_924|35.00|cop[tiflash]|table:m, partition:p201810|keep order:false|
|  │ ├─TableReader_932|0.00|root|""|data:Selection_931|
|  │ │ └─Selection_931|0.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_930|56.00|cop[tiflash]|table:m, partition:p201811|keep order:false|
|  │ ├─TableReader_938|0.00|root|""|data:Selection_937|
|  │ │ └─Selection_937|0.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_936|35.00|cop[tiflash]|table:m, partition:p201812|keep order:false|
|  │ ├─TableReader_944|4.00|root|""|data:Selection_943|
|  │ │ └─Selection_943|4.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_942|71.00|cop[tiflash]|table:m, partition:p201901|keep order:false|
|  │ ├─TableReader_950|5.00|root|""|data:Selection_949|
|  │ │ └─Selection_949|5.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_948|54.00|cop[tiflash]|table:m, partition:p201902|keep order:false|
|  │ ├─TableReader_956|9.31|root|""|data:Selection_955|
|  │ │ └─Selection_955|9.31|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_954|91.00|cop[tiflash]|table:m, partition:p201903|keep order:false|
|  │ ├─TableReader_962|4.00|root|""|data:Selection_961|
|  │ │ └─Selection_961|4.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_960|108.00|cop[tiflash]|table:m, partition:p201904|keep order:false|
|  │ ├─TableReader_968|6.00|root|""|data:Selection_967|
|  │ │ └─Selection_967|6.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_966|115.00|cop[tiflash]|table:m, partition:p201905|keep order:false|
|  │ ├─TableReader_974|10.00|root|""|data:Selection_973|
|  │ │ └─Selection_973|10.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_972|122.00|cop[tiflash]|table:m, partition:p201906|keep order:false|
|  │ ├─TableReader_980|8.00|root|""|data:Selection_979|
|  │ │ └─Selection_979|8.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_978|116.00|cop[tiflash]|table:m, partition:p201907|keep order:false|
|  │ ├─TableReader_986|5.00|root|""|data:Selection_985|
|  │ │ └─Selection_985|5.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_984|138.00|cop[tiflash]|table:m, partition:p201908|keep order:false|
|  │ ├─TableReader_992|7.00|root|""|data:Selection_991|
|  │ │ └─Selection_991|7.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_990|182.00|cop[tiflash]|table:m, partition:p201909|keep order:false|
|  │ ├─TableReader_998|9.00|root|""|data:Selection_997|
|  │ │ └─Selection_997|9.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_996|161.00|cop[tiflash]|table:m, partition:p201910|keep order:false|
|  │ ├─TableReader_1004|8.00|root|""|data:Selection_1003|
|  │ │ └─Selection_1003|8.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1002|175.00|cop[tiflash]|table:m, partition:p201911|keep order:false|
|  │ ├─TableReader_1010|13.00|root|""|data:Selection_1009|
|  │ │ └─Selection_1009|13.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1008|134.00|cop[tiflash]|table:m, partition:p201912|keep order:false|
|  │ ├─TableReader_1016|8.00|root|""|data:Selection_1015|
|  │ │ └─Selection_1015|8.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1014|157.00|cop[tiflash]|table:m, partition:p202001|keep order:false|
|  │ ├─TableReader_1022|16.00|root|""|data:Selection_1021|
|  │ │ └─Selection_1021|16.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1020|225.00|cop[tiflash]|table:m, partition:p202002|keep order:false|
|  │ ├─TableReader_1028|20.00|root|""|data:Selection_1027|
|  │ │ └─Selection_1027|20.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1026|341.00|cop[tiflash]|table:m, partition:p202003|keep order:false|
|  │ ├─TableReader_1034|18.00|root|""|data:Selection_1033|
|  │ │ └─Selection_1033|18.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1032|379.00|cop[tiflash]|table:m, partition:p202004|keep order:false|
|  │ ├─TableReader_1040|101.00|root|""|data:Selection_1039|
|  │ │ └─Selection_1039|101.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1038|476.00|cop[tiflash]|table:m, partition:p202005|keep order:false|
|  │ ├─TableReader_1046|187.31|root|""|data:Selection_1045|
|  │ │ └─Selection_1045|187.31|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1044|607.00|cop[tiflash]|table:m, partition:p202006|keep order:false|
|  │ ├─TableReader_1052|195.00|root|""|data:Selection_1051|
|  │ │ └─Selection_1051|195.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1050|645.00|cop[tiflash]|table:m, partition:p202007|keep order:false|
|  │ ├─TableReader_1058|164.00|root|""|data:Selection_1057|
|  │ │ └─Selection_1057|164.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1056|759.00|cop[tiflash]|table:m, partition:p202008|keep order:false|
|  │ ├─TableReader_1064|170.00|root|""|data:Selection_1063|
|  │ │ └─Selection_1063|170.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1062|889.00|cop[tiflash]|table:m, partition:p202009|keep order:false|
|  │ ├─TableReader_1070|178.00|root|""|data:Selection_1069|
|  │ │ └─Selection_1069|178.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1068|986.00|cop[tiflash]|table:m, partition:p202010|keep order:false|
|  │ ├─TableReader_1076|160.00|root|""|data:Selection_1075|
|  │ │ └─Selection_1075|160.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1074|851.00|cop[tiflash]|table:m, partition:p202011|keep order:false|
|  │ ├─TableReader_1082|206.00|root|""|data:Selection_1081|
|  │ │ └─Selection_1081|206.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1080|979.00|cop[tiflash]|table:m, partition:p202012|keep order:false|
|  │ ├─TableReader_1088|11576.86|root|""|data:Selection_1087|
|  │ │ └─Selection_1087|11576.86|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1086|139636.00|cop[tiflash]|table:m, partition:p202101|keep order:false|
|  │ ├─TableReader_1094|17124.49|root|""|data:Selection_1093|
|  │ │ └─Selection_1093|17124.49|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1092|151482.00|cop[tiflash]|table:m, partition:p202102|keep order:false|
|  │ ├─TableReader_1100|28086.96|root|""|data:Selection_1099|
|  │ │ └─Selection_1099|28086.96|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1098|249083.00|cop[tiflash]|table:m, partition:p202103|keep order:false|
|  │ ├─TableReader_1106|16253.86|root|""|data:Selection_1105|
|  │ │ └─Selection_1105|16253.86|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1104|294474.00|cop[tiflash]|table:m, partition:p202104|keep order:false|
|  │ ├─TableReader_1112|17711.15|root|""|data:Selection_1111|
|  │ │ └─Selection_1111|17711.15|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1110|373435.00|cop[tiflash]|table:m, partition:p202105|keep order:false|
|  │ ├─TableReader_1118|19308.34|root|""|data:Selection_1117|
|  │ │ └─Selection_1117|19308.34|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1116|377256.00|cop[tiflash]|table:m, partition:p202106|keep order:false|
|  │ ├─TableReader_1124|20365.70|root|""|data:Selection_1123|
|  │ │ └─Selection_1123|20365.70|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1122|387199.00|cop[tiflash]|table:m, partition:p202107|keep order:false|
|  │ ├─TableReader_1130|22012.12|root|""|data:Selection_1129|
|  │ │ └─Selection_1129|22012.12|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1128|363670.00|cop[tiflash]|table:m, partition:p202108|keep order:false|
|  │ ├─TableReader_1136|35380.45|root|""|data:Selection_1135|
|  │ │ └─Selection_1135|35380.45|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1134|498532.00|cop[tiflash]|table:m, partition:p202109|keep order:false|
|  │ ├─TableReader_1142|33707.10|root|""|data:Selection_1141|
|  │ │ └─Selection_1141|33707.10|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1140|864763.00|cop[tiflash]|table:m, partition:p202110|keep order:false|
|  │ ├─TableReader_1148|57156.55|root|""|data:Selection_1147|
|  │ │ └─Selection_1147|57156.55|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1146|682646.00|cop[tiflash]|table:m, partition:p202111|keep order:false|
|  │ ├─TableReader_1154|85609.08|root|""|data:Selection_1153|
|  │ │ └─Selection_1153|85609.08|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1152|620890.00|cop[tiflash]|table:m, partition:p202112|keep order:false|
|  │ ├─TableReader_1160|422219.38|root|""|data:Selection_1159|
|  │ │ └─Selection_1159|422219.38|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1158|1065752.00|cop[tiflash]|table:m, partition:p202201|keep order:false|
|  │ ├─TableReader_1166|721447.58|root|""|data:Selection_1165|
|  │ │ └─Selection_1165|721447.58|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1164|2498566.00|cop[tiflash]|table:m, partition:p202202|keep order:false|
|  │ ├─TableReader_1172|4960878.37|root|""|data:Selection_1171|
|  │ │ └─Selection_1171|4960878.37|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1170|6322449.00|cop[tiflash]|table:m, partition:p202203|keep order:false|
|  │ ├─TableReader_1178|1696568.48|root|""|data:Selection_1177|
|  │ │ └─Selection_1177|1696568.48|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1176|2644530.00|cop[tiflash]|table:m, partition:p202204|keep order:false|
|  │ ├─TableReader_1184|0.03|root|""|data:Selection_1183|
|  │ │ └─Selection_1183|0.03|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1182|33.00|cop[tiflash]|table:m, partition:p202205|keep order:false, stats:pseudo|
|  │ ├─TableReader_1190|0.12|root|""|data:Selection_1189|
|  │ │ └─Selection_1189|0.12|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1188|124.00|cop[tiflash]|table:m, partition:p202206|keep order:false, stats:pseudo|
|  │ ├─TableReader_1196|10.00|root|""|data:Selection_1195|
|  │ │ └─Selection_1195|10.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1194|10000.00|cop[tiflash]|table:m, partition:p202207|keep order:false, stats:pseudo|
|  │ ├─TableReader_1202|10.00|root|""|data:Selection_1201|
|  │ │ └─Selection_1201|10.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1200|10000.00|cop[tiflash]|table:m, partition:p202208|keep order:false, stats:pseudo|
|  │ ├─TableReader_1208|10.00|root|""|data:Selection_1207|
|  │ │ └─Selection_1207|10.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1206|10000.00|cop[tiflash]|table:m, partition:p202209|keep order:false, stats:pseudo|
|  │ ├─TableReader_1214|10.00|root|""|data:Selection_1213|
|  │ │ └─Selection_1213|10.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1212|10000.00|cop[tiflash]|table:m, partition:p202210|keep order:false, stats:pseudo|
|  │ ├─TableReader_1220|10.00|root|""|data:Selection_1219|
|  │ │ └─Selection_1219|10.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1218|10000.00|cop[tiflash]|table:m, partition:p202211|keep order:false, stats:pseudo|
|  │ ├─TableReader_1226|10.00|root|""|data:Selection_1225|
|  │ │ └─Selection_1225|10.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1224|10000.00|cop[tiflash]|table:m, partition:p202212|keep order:false, stats:pseudo|
|  │ ├─TableReader_1232|10.00|root|""|data:Selection_1231|
|  │ │ └─Selection_1231|10.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1230|10000.00|cop[tiflash]|table:m, partition:p202301|keep order:false, stats:pseudo|
|  │ ├─TableReader_1238|10.00|root|""|data:Selection_1237|
|  │ │ └─Selection_1237|10.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1236|10000.00|cop[tiflash]|table:m, partition:p202302|keep order:false, stats:pseudo|
|  │ ├─TableReader_1244|10.00|root|""|data:Selection_1243|
|  │ │ └─Selection_1243|10.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1242|10000.00|cop[tiflash]|table:m, partition:p202303|keep order:false, stats:pseudo|
|  │ ├─TableReader_1250|10.00|root|""|data:Selection_1249|
|  │ │ └─Selection_1249|10.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1248|10000.00|cop[tiflash]|table:m, partition:p202304|keep order:false, stats:pseudo|
|  │ ├─TableReader_1256|10.00|root|""|data:Selection_1255|
|  │ │ └─Selection_1255|10.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1254|10000.00|cop[tiflash]|table:m, partition:p202305|keep order:false, stats:pseudo|
|  │ ├─TableReader_1262|10.00|root|""|data:Selection_1261|
|  │ │ └─Selection_1261|10.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1260|10000.00|cop[tiflash]|table:m, partition:p202306|keep order:false, stats:pseudo|
|  │ ├─TableReader_1268|10.00|root|""|data:Selection_1267|
|  │ │ └─Selection_1267|10.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1266|10000.00|cop[tiflash]|table:m, partition:p202307|keep order:false, stats:pseudo|
|  │ ├─TableReader_1274|10.00|root|""|data:Selection_1273|
|  │ │ └─Selection_1273|10.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1272|10000.00|cop[tiflash]|table:m, partition:p202308|keep order:false, stats:pseudo|
|  │ ├─TableReader_1280|10.00|root|""|data:Selection_1279|
|  │ │ └─Selection_1279|10.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1278|10000.00|cop[tiflash]|table:m, partition:p202309|keep order:false, stats:pseudo|
|  │ ├─TableReader_1286|10.00|root|""|data:Selection_1285|
|  │ │ └─Selection_1285|10.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1284|10000.00|cop[tiflash]|table:m, partition:p202310|keep order:false, stats:pseudo|
|  │ ├─TableReader_1292|10.00|root|""|data:Selection_1291|
|  │ │ └─Selection_1291|10.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1290|10000.00|cop[tiflash]|table:m, partition:p202311|keep order:false, stats:pseudo|
|  │ ├─TableReader_1298|10.00|root|""|data:Selection_1297|
|  │ │ └─Selection_1297|10.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │ │   └─TableFullScan_1296|10000.00|cop[tiflash]|table:m, partition:p202312|keep order:false, stats:pseudo|
|  │ └─TableReader_1304|10.00|root|""|data:Selection_1303|
|  │   └─Selection_1303|10.00|cop[tiflash]|""|eq(mbase_test. d_media.download_status, 0)|
|  │     └─TableFullScan_1302|10000.00|cop[tiflash]|table:m, partition:p299913|keep order:false, stats:pseudo|
|  └─PartitionUnion_1305(Probe)|2010.17|root|""|""|
|    ├─TableReader_1310|2010.17|root|""|data:TableFullScan_1309|
|    │ └─TableFullScan_1309|2010.17|cop[tiflash]|table:v, partition:p2018|keep order:false, stats:pseudo|
|    ├─TableReader_1314|337.00|root|""|data:TableFullScan_1313|
|    │ └─TableFullScan_1313|337.00|cop[tiflash]|table:v, partition:p201901|keep order:false|
|    ├─TableReader_1318|311.00|root|""|data:TableFullScan_1317|
|    │ └─TableFullScan_1317|311.00|cop[tiflash]|table:v, partition:p201902|keep order:false|
|    ├─TableReader_1322|371.00|root|""|data:TableFullScan_1321|
|    │ └─TableFullScan_1321|371.00|cop[tiflash]|table:v, partition:p201903|keep order:false|
|    ├─TableReader_1326|464.00|root|""|data:TableFullScan_1325|
|    │ └─TableFullScan_1325|464.00|cop[tiflash]|table:v, partition:p201904|keep order:false|
|    ├─TableReader_1330|526.00|root|""|data:TableFullScan_1329|
|    │ └─TableFullScan_1329|526.00|cop[tiflash]|table:v, partition:p201905|keep order:false|
|    ├─TableReader_1334|496.00|root|""|data:TableFullScan_1333|
|    │ └─TableFullScan_1333|496.00|cop[tiflash]|table:v, partition:p201906|keep order:false|
|    ├─TableReader_1338|546.00|root|""|data:TableFullScan_1337|
|    │ └─TableFullScan_1337|546.00|cop[tiflash]|table:v, partition:p201907|keep order:false|
|    ├─TableReader_1342|630.00|root|""|data:TableFullScan_1341|
|    │ └─TableFullScan_1341|630.00|cop[tiflash]|table:v, partition:p201908|keep order:false|
|    ├─TableReader_1346|2010.17|root|""|data:TableFullScan_1345|
|    │ └─TableFullScan_1345|2010.17|cop[tiflash]|table:v, partition:p201909|keep order:false|
|    ├─TableReader_1350|815.00|root|""|data:TableFullScan_1349|
|    │ └─TableFullScan_1349|815.00|cop[tiflash]|table:v, partition:p201910|keep order:false|
|    ├─TableReader_1354|850.00|root|""|data:TableFullScan_1353|
|    │ └─TableFullScan_1353|850.00|cop[tiflash]|table:v, partition:p201911|keep order:false|
|    ├─TableReader_1358|920.00|root|""|data:TableFullScan_1357|
|    │ └─TableFullScan_1357|920.00|cop[tiflash]|table:v, partition:p201912|keep order:false|
|    ├─TableReader_1362|869.00|root|""|data:TableFullScan_1361|
|    │ └─TableFullScan_1361|869.00|cop[tiflash]|table:v, partition:p202001|keep order:false|
|    ├─TableReader_1366|946.00|root|""|data:TableFullScan_1365|
|    │ └─TableFullScan_1365|946.00|cop[tiflash]|table:v, partition:p202002|keep order:false|
|    ├─TableReader_1370|1022.00|root|""|data:TableFullScan_1369|
|    │ └─TableFullScan_1369|1022.00|cop[tiflash]|table:v, partition:p202003|keep order:false|
|    ├─TableReader_1374|1219.00|root|""|data:TableFullScan_1373|
|    │ └─TableFullScan_1373|1219.00|cop[tiflash]|table:v, partition:p202004|keep order:false|
|    ├─TableReader_1378|1839.00|root|""|data:TableFullScan_1377|
|    │ └─TableFullScan_1377|1839.00|cop[tiflash]|table:v, partition:p202005|keep order:false|
|    ├─TableReader_1382|2010.17|root|""|data:TableFullScan_1381|
|    │ └─TableFullScan_1381|2010.17|cop[tiflash]|table:v, partition:p202006|keep order:false|
|    ├─TableReader_1386|2010.17|root|""|data:TableFullScan_1385|
|    │ └─TableFullScan_1385|2010.17|cop[tiflash]|table:v, partition:p202007|keep order:false|
|    ├─TableReader_1390|1785.00|root|""|data:TableFullScan_1389|
|    │ └─TableFullScan_1389|1785.00|cop[tiflash]|table:v, partition:p202008|keep order:false|
|    ├─TableReader_1394|2010.17|root|""|data:TableFullScan_1393|
|    │ └─TableFullScan_1393|2010.17|cop[tiflash]|table:v, partition:p202009|keep order:false|
|    ├─TableReader_1398|2010.17|root|""|data:TableFullScan_1397|
|    │ └─TableFullScan_1397|2010.17|cop[tiflash]|table:v, partition:p202010|keep order:false|
|    ├─TableReader_1402|2010.17|root|""|data:TableFullScan_1401|
|    │ └─TableFullScan_1401|2010.17|cop[tiflash]|table:v, partition:p202011|keep order:false|
|    ├─TableReader_1406|2010.17|root|""|data:TableFullScan_1405|
|    │ └─TableFullScan_1405|2010.17|cop[tiflash]|table:v, partition:p202012|keep order:false|
|    ├─TableReader_1410|2010.17|root|""|data:TableFullScan_1409|
|    │ └─TableFullScan_1409|2010.17|cop[tiflash]|table:v, partition:p202101|keep order:false|
|    ├─TableReader_1414|2010.17|root|""|data:TableFullScan_1413|
|    │ └─TableFullScan_1413|2010.17|cop[tiflash]|table:v, partition:p202102|keep order:false|
|    ├─TableReader_1418|2010.17|root|""|data:TableFullScan_1417|
|    │ └─TableFullScan_1417|2010.17|cop[tiflash]|table:v, partition:p202103|keep order:false|
|    ├─TableReader_1422|2010.17|root|""|data:TableFullScan_1421|
|    │ └─TableFullScan_1421|2010.17|cop[tiflash]|table:v, partition:p202104|keep order:false|
|    ├─TableReader_1426|2010.17|root|""|data:TableFullScan_1425|
|    │ └─TableFullScan_1425|2010.17|cop[tiflash]|table:v, partition:p202105|keep order:false|
|    ├─TableReader_1430|2010.17|root|""|data:TableFullScan_1429|
|    │ └─TableFullScan_1429|2010.17|cop[tiflash]|table:v, partition:p202106|keep order:false|
|    ├─TableReader_1434|2010.17|root|""|data:TableFullScan_1433|
|    │ └─TableFullScan_1433|2010.17|cop[tiflash]|table:v, partition:p202107|keep order:false|
|    ├─TableReader_1438|2010.17|root|""|data:TableFullScan_1437|
|    │ └─TableFullScan_1437|2010.17|cop[tiflash]|table:v, partition:p202108|keep order:false|
|    ├─TableReader_1442|2010.17|root|""|data:TableFullScan_1441|
|    │ └─TableFullScan_1441|2010.17|cop[tiflash]|table:v, partition:p202109|keep order:false|
|    ├─TableReader_1446|2010.17|root|""|data:TableFullScan_1445|
|    │ └─TableFullScan_1445|2010.17|cop[tiflash]|table:v, partition:p202110|keep order:false|
|    ├─TableReader_1450|2010.17|root|""|data:TableFullScan_1449|
|    │ └─TableFullScan_1449|2010.17|cop[tiflash]|table:v, partition:p202111|keep order:false|
|    ├─TableReader_1454|2010.17|root|""|data:TableFullScan_1453|
|    │ └─TableFullScan_1453|2010.17|cop[tiflash]|table:v, partition:p202112|keep order:false|
|    ├─TableReader_1458|2010.17|root|""|data:TableFullScan_1457|
|    │ └─TableFullScan_1457|2010.17|cop[tiflash]|table:v, partition:p202201|keep order:false|
|    ├─TableReader_1462|2010.17|root|""|data:TableFullScan_1461|
|    │ └─TableFullScan_1461|2010.17|cop[tiflash]|table:v, partition:p202202|keep order:false|
|    ├─TableReader_1466|2010.17|root|""|data:TableFullScan_1465|
|    │ └─TableFullScan_1465|2010.17|cop[tiflash]|table:v, partition:p202203|keep order:false|
|    ├─TableReader_1470|2010.17|root|""|data:TableFullScan_1469|
|    │ └─TableFullScan_1469|2010.17|cop[tiflash]|table:v, partition:p202204|keep order:false|
|    ├─TableReader_1474|75.00|root|""|data:TableFullScan_1473|
|    │ └─TableFullScan_1473|75.00|cop[tiflash]|table:v, partition:p202205|keep order:false, stats:pseudo|
|    ├─TableReader_1478|270.00|root|""|data:TableFullScan_1477|
|    │ └─TableFullScan_1477|270.00|cop[tiflash]|table:v, partition:p202206|keep order:false, stats:pseudo|
|    ├─TableReader_1482|2010.17|root|""|data:TableFullScan_1481|
|    │ └─TableFullScan_1481|2010.17|cop[tiflash]|table:v, partition:p202207|keep order:false, stats:pseudo|
|    ├─TableReader_1486|2010.17|root|""|data:TableFullScan_1485|
|    │ └─TableFullScan_1485|2010.17|cop[tiflash]|table:v, partition:p202208|keep order:false, stats:pseudo|
|    ├─TableReader_1490|2010.17|root|""|data:TableFullScan_1489|
|    │ └─TableFullScan_1489|2010.17|cop[tiflash]|table:v, partition:p202209|keep order:false, stats:pseudo|
|    ├─TableReader_1494|2010.17|root|""|data:TableFullScan_1493|
|    │ └─TableFullScan_1493|2010.17|cop[tiflash]|table:v, partition:p202210|keep order:false, stats:pseudo|
|    ├─TableReader_1498|2010.17|root|""|data:TableFullScan_1497|
|    │ └─TableFullScan_1497|2010.17|cop[tiflash]|table:v, partition:p202211|keep order:false, stats:pseudo|
|    ├─TableReader_1502|2010.17|root|""|data:TableFullScan_1501|
|    │ └─TableFullScan_1501|2010.17|cop[tiflash]|table:v, partition:p202212|keep order:false, stats:pseudo|
|    ├─TableReader_1506|2010.17|root|""|data:TableFullScan_1505|
|    │ └─TableFullScan_1505|2010.17|cop[tiflash]|table:v, partition:p202301|keep order:false, stats:pseudo|
|    ├─TableReader_1510|2010.17|root|""|data:TableFullScan_1509|
|    │ └─TableFullScan_1509|2010.17|cop[tiflash]|table:v, partition:p202302|keep order:false, stats:pseudo|
|    ├─TableReader_1514|2010.17|root|""|data:TableFullScan_1513|
|    │ └─TableFullScan_1513|2010.17|cop[tiflash]|table:v, partition:p202303|keep order:false, stats:pseudo|
|    ├─TableReader_1518|2010.17|root|""|data:TableFullScan_1517|
|    │ └─TableFullScan_1517|2010.17|cop[tiflash]|table:v, partition:p202304|keep order:false, stats:pseudo|
|    ├─TableReader_1522|2010.17|root|""|data:TableFullScan_1521|
|    │ └─TableFullScan_1521|2010.17|cop[tiflash]|table:v, partition:p202305|keep order:false, stats:pseudo|
|    ├─TableReader_1526|2010.17|root|""|data:TableFullScan_1525|
|    │ └─TableFullScan_1525|2010.17|cop[tiflash]|table:v, partition:p202306|keep order:false, stats:pseudo|
|    ├─TableReader_1530|2010.17|root|""|data:TableFullScan_1529|
|    │ └─TableFullScan_1529|2010.17|cop[tiflash]|table:v, partition:p202307|keep order:false, stats:pseudo|
|    ├─TableReader_1534|2010.17|root|""|data:TableFullScan_1533|
|    │ └─TableFullScan_1533|2010.17|cop[tiflash]|table:v, partition:p202308|keep order:false, stats:pseudo|
|    ├─TableReader_1538|2010.17|root|""|data:TableFullScan_1537|
|    │ └─TableFullScan_1537|2010.17|cop[tiflash]|table:v, partition:p202309|keep order:false, stats:pseudo|
|    ├─TableReader_1542|2010.17|root|""|data:TableFullScan_1541|
|    │ └─TableFullScan_1541|2010.17|cop[tiflash]|table:v, partition:p202310|keep order:false, stats:pseudo|
|    ├─TableReader_1546|2010.17|root|""|data:TableFullScan_1545|
|    │ └─TableFullScan_1545|2010.17|cop[tiflash]|table:v, partition:p202311|keep order:false, stats:pseudo|
|    ├─TableReader_1550|2010.17|root|""|data:TableFullScan_1549|
|    │ └─TableFullScan_1549|2010.17|cop[tiflash]|table:v, partition:p202312|keep order:false, stats:pseudo|
|    └─TableReader_1554|2010.17|root|""|data:TableFullScan_1553|
|      └─TableFullScan_1553|2010.17|cop[tiflash]|table:v, partition:p299913|keep order:false, stats:pseudo|

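Apart from the data-corruption entries, do the other error messages in the logs above point to any further problems? Also, the monitoring shows the thread count surging past 30,000, which presumably is not normal either. What could be causing it?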

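The 30,000+ threads seen in monitoring come from TiFlash hitting a large number of region not found errors within a short time while processing requests, which generates many internal retry requests (shown as "cop" in the monitoring). Each retry request of this kind spawns a new thread, hence the 30,000+ threads. Once the volume of retry requests exceeds what TiFlash can process, tasks pile up and the instance stops responding to external requests.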

There is a similar issue: https://github.com/pingcap/tiflash/issues/3696


Please run cat /proc/sys/vm/max_map_count on the server and confirm the current value.

65530

Has the fix described in that issue been released?

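A few more questions:

  1. In the issue, TiFlash memory usage climbed very high, but on our servers TiFlash peaked at only a little over 30 GB, with at least 100 GB of memory still free. Why were those resources not fully used? Is it related to the setting raftstore-proxy.memory-usage-limit=29471976106?

  2. Why were so many region not found errors triggered in such a short time?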

On issue #3696 and memory usage in this case

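Every thread requests its stack memory from the operating system via mmap when it is created, so under the default vm.max_map_count=65530 a process can create roughly 30,000 threads. Beyond that number, thread creation fails with errno=12 and reports "Cannot allocate memory", even when plenty of memory is still free. (This can be verified with a small program.)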
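
The thread ceiling described above can be estimated with a few lines of Python. This is only a rough sketch, not TiFlash code. It assumes each thread costs about two memory mappings (its mmap'ed stack plus a guard region), which is an assumption and not something stated in this thread. The helper names `read_max_map_count` and `estimate_max_threads` are hypothetical.

```python
from pathlib import Path

# Assumption: each thread consumes roughly two memory mappings
# (its mmap'ed stack plus a guard region). A real process also holds
# mappings for the binary, shared libraries, and heap arenas, so the
# practical ceiling is somewhat lower than this estimate.
MAPPINGS_PER_THREAD = 2


def read_max_map_count(path="/proc/sys/vm/max_map_count"):
    """Return vm.max_map_count read from procfs, or None if the file is missing."""
    p = Path(path)
    if not p.exists():
        return None
    return int(p.read_text().strip())


def estimate_max_threads(max_map_count, mappings_per_thread=MAPPINGS_PER_THREAD):
    """Rough upper bound on threads per process under a mapping-count limit."""
    return max_map_count // mappings_per_thread
```

With the value reported in this thread, `estimate_max_threads(65530)` gives 32765, which is in line with the thread count of just over 30,000 seen in monitoring.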

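In issue #3696 each task used a lot of memory, so both the number of "cop" tasks and memory usage rose quickly until memory hit the available limit and the process ran out of memory (OOM). In your environment memory usage grows slowly, but the thread count has already reached 30,000. That is why memory usage looks low while errors such as "Cannot malloc 1.00 MiB., errno: 12, strerror: Cannot allocate memory" still appear. It is unrelated to the raftstore-proxy.memory-usage-limit=29471976106 setting.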

That issue was not given a high priority, so the corresponding optimization has not been implemented yet.

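To confirm whether this is the problem described above, please open the TiFlash-Summary dashboard in Grafana, set the time range (top right) to 2022-06-23 02:00:00 through 2022-06-23 13:00:00, select 192.168.14.25 and 192.168.14.28 in turn (top left), and export a JSON file for each instance with the tool linked here. We will then take another look. https://metricstool.pingcap.com/#backup-with-dev-tools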



The TiFlash failure itself

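Issue #3696 is about reducing the impact on other TiFlash instances and on business queries after one TiFlash instance has failed. Why the first TiFlash instance failed, or why so many region not found errors were triggered in a short time, still requires more information to determine.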

SQ-cluster-TiFlash-Summary_192-168-14-25.json (135.7 KB) SQ-cluster-TiFlash-Summary_192-168-14-28.json (135.7 KB)

Here is the TiFlash Summary monitoring for those two hosts over that time range.

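The JSON files contain no data once restored. It looks like the monitoring data had not finished loading when the two dashboards were exported. After selecting an instance, could you wait about 1 to 2 minutes for the data to load and then export the JSON again?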

SQ-cluster-TiFlash-Summary_14-25.json (4.2 MB) SQ-cluster-TiFlash-Summary_14-28.json (4.4 MB)

Sorry about that, I have re-exported them.


I found the following error logs in dmesg:

[Jun23 08:37] cop-pool[164685]: segfault at 0 ip 00000000080c6268 sp 00007fa86c0247f0 error 4 in tiflash[1d0f000+6c46000]
[Jun23 08:41] cop-pool[236789]: segfault at 0 ip 00000000080c6268 sp 00007f38696be7f0 error 4 in tiflash[1d0f000+6c46000]
[Jun23 09:03] perf: interrupt took too long (5054 > 4997), lowering kernel.perf_event_max_sample_rate to 39000
[Jun23 09:13] cop-pool[188418]: segfault at 0 ip 00000000080c6268 sp 00007f6980f757f0 error 4 in tiflash[1d0f000+6c46000]
[Jun23 09:28] cop-pool[75284]: segfault at 0 ip 00000000080c6268 sp 00007fbd163c27f0 error 4 in tiflash[1d0f000+6c46000]
[  +0.047594] cop-pool[75286]: segfault at 0 ip 00000000080c6268 sp 00007fbd14fc07f0 error 4 in tiflash[1d0f000+6c46000]
[Jun23 09:31] BkgPool1[143837]: segfault at 0 ip 00000000080c6268 sp 00007fac54ab98c0 error 4 in tiflash[1d0f000+6c46000]
[  +0.000017] BkgPool8[143844]: segfault at 0 ip 00000000080c6268 sp 00007fac4edc98c0 error 4
[  +0.000000] BkgPool0[143836]: segfault at 0 ip 00000000080c6268 sp 00007fac554ba8c0 error 4 in tiflash[1d0f000+6c46000]
[  +0.000006]  in tiflash[1d0f000+6c46000]


[  +0.010451] BkgPool9[143845]: segfault at 0 ip 00000000080c6268 sp 00007fac4e3c88c0 error 4 in tiflash[1d0f000+6c46000]
[  +0.000928] BkgPool2[143838]: segfault at 0 ip 00000000080c6268 sp 00007fac53dca770 error 4 in tiflash[1d0f000+6c46000]
[Jun23 10:33] perf: interrupt took too long (6321 > 6317), lowering kernel.perf_event_max_sample_rate to 31000
[Jun23 10:53] TiFlashMain[143215]: segfault at 8ad5488 ip 0000000008ad5488 sp 00007fa4a8a13e08 error 15 in tiflash[8955000+181000]

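From the monitoring of the two instances above, the number of queued "cop" requests correlates strongly with the tiflash instance restarts. We conclude that tiflash restarting while using only 30 GB is caused by a problem similar to issue#3696.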

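As for the error logs in dmesg, in our experience they result from system stalls caused by unwinding stacks on error when there are too many threads, which also supports this conclusion.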



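From the monitoring, around the times cop reaches 30,000 there are generally region not found errors. But a region not found error does not necessarily cause the cop count to spike. Let me bring in another colleague to help analyze this.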

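For the region not found problem, search the tiflash log for "RegionException: region ", pick out the regions reported as region not found, and note the timestamps. Then search the tikv logs for "region_id=xxx", where xxx is the region id. What matters most is the log entries around the same time.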

Could you pick a few regions and post the tiflash and tikv logs for them?
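The log search described above could be scripted roughly as follows. This is only a sketch: the exact tiflash log line layout is not shown in this thread, so the sample lines and the optional marker filter are hypothetical. Only the "RegionException: region " prefix and the "region_id=xxx" search term come from the post above.

```python
import re
from collections import Counter
from typing import Iterable, List, Optional

# Only this prefix is taken from the thread; the text after the id is unknown.
REGION_RE = re.compile(r"RegionException: region (\d+)")


def count_region_exceptions(lines: Iterable[str],
                            marker: Optional[str] = None) -> Counter:
    """Count RegionException occurrences per region id.

    marker is an optional case-insensitive substring (for example the text
    your logs use for "not found") used to keep only matching lines.
    """
    counts: Counter = Counter()
    for line in lines:
        match = REGION_RE.search(line)
        if match is None:
            continue
        if marker is not None and marker.lower() not in line.lower():
            continue
        counts[int(match.group(1))] += 1
    return counts


def tikv_search_terms(counts: Counter, top: int = 5) -> List[str]:
    """Build the region_id=... strings to search for in the tikv logs."""
    return [f"region_id={rid}" for rid, _ in counts.most_common(top)]


# Hypothetical sample lines, for illustration only.
sample = [
    "[08:36:58] RegionException: region 101 not found",
    "[08:36:59] RegionException: region 101 not found",
    "[08:37:00] RegionException: region 205 epoch not match",
    "[08:37:01] some unrelated line",
]
hits = count_region_exceptions(sample, marker="not found")
print(tikv_search_terms(hits))
```

In practice the lines would come from the real log, for example `open("tiflash.log", errors="replace")`, with the file name adjusted to your deployment.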

Could you also provide the values of these tidb variables:
tidb_index_lookup_join_concurrency
tidb_distsql_scan_concurrency
tidb_executor_concurrency
and
the number of partitions in the d_media table

tidb_index_lookup_join_concurrency=-1
tidb_distsql_scan_concurrency=32
tidb_executor_concurrency=16

The d_media table is partitioned by month, 74 partitions in total:

PARTITION BY RANGE (TO_DAYS(date))
(PARTITION p2017 VALUES LESS THAN (737060),
PARTITION p201801 VALUES LESS THAN (737091),
PARTITION p201802 VALUES LESS THAN (737119),
PARTITION p201803 VALUES LESS THAN (737150),
PARTITION p201804 VALUES LESS THAN (737180),
PARTITION p201805 VALUES LESS THAN (737211),
PARTITION p201806 VALUES LESS THAN (737241),
PARTITION p201807 VALUES LESS THAN (737272),
PARTITION p201808 VALUES LESS THAN (737303),
PARTITION p201809 VALUES LESS THAN (737333),
PARTITION p201810 VALUES LESS THAN (737364),
PARTITION p201811 VALUES LESS THAN (737394),
PARTITION p201812 VALUES LESS THAN (737425),
PARTITION p201901 VALUES LESS THAN (737456),
PARTITION p201902 VALUES LESS THAN (737484),
PARTITION p201903 VALUES LESS THAN (737515),
PARTITION p201904 VALUES LESS THAN (737545),
PARTITION p201905 VALUES LESS THAN (737576),
PARTITION p201906 VALUES LESS THAN (737606),
PARTITION p201907 VALUES LESS THAN (737637),
PARTITION p201908 VALUES LESS THAN (737668),
PARTITION p201909 VALUES LESS THAN (737698),
PARTITION p201910 VALUES LESS THAN (737729),
PARTITION p201911 VALUES LESS THAN (737759),
PARTITION p201912 VALUES LESS THAN (737790),
PARTITION p202001 VALUES LESS THAN (737821),
PARTITION p202002 VALUES LESS THAN (737850),
PARTITION p202003 VALUES LESS THAN (737881),
PARTITION p202004 VALUES LESS THAN (737911),
PARTITION p202005 VALUES LESS THAN (737942),
PARTITION p202006 VALUES LESS THAN (737972),
PARTITION p202007 VALUES LESS THAN (738003),
PARTITION p202008 VALUES LESS THAN (738034),
PARTITION p202009 VALUES LESS THAN (738064),
PARTITION p202010 VALUES LESS THAN (738095),
PARTITION p202011 VALUES LESS THAN (738125),
PARTITION p202012 VALUES LESS THAN (738156),
PARTITION p202101 VALUES LESS THAN (738187),
PARTITION p202102 VALUES LESS THAN (738215),
PARTITION p202103 VALUES LESS THAN (738246),
PARTITION p202104 VALUES LESS THAN (738276),
PARTITION p202105 VALUES LESS THAN (738307),
PARTITION p202106 VALUES LESS THAN (738337),
PARTITION p202107 VALUES LESS THAN (738368),
PARTITION p202108 VALUES LESS THAN (738399),
PARTITION p202109 VALUES LESS THAN (738429),
PARTITION p202110 VALUES LESS THAN (738460),
PARTITION p202111 VALUES LESS THAN (738490),
PARTITION p202112 VALUES LESS THAN (738521),
PARTITION p202201 VALUES LESS THAN (738552),
PARTITION p202202 VALUES LESS THAN (738580),
PARTITION p202203 VALUES LESS THAN (738611),
PARTITION p202204 VALUES LESS THAN (738641),
PARTITION p202205 VALUES LESS THAN (738672),
PARTITION p202206 VALUES LESS THAN (738702),
PARTITION p202207 VALUES LESS THAN (738733),
PARTITION p202208 VALUES LESS THAN (738764),
PARTITION p202209 VALUES LESS THAN (738794),
PARTITION p202210 VALUES LESS THAN (738825),
PARTITION p202211 VALUES LESS THAN (738855),
PARTITION p202212 VALUES LESS THAN (738886),
PARTITION p202301 VALUES LESS THAN (738917),
PARTITION p202302 VALUES LESS THAN (738945),
PARTITION p202303 VALUES LESS THAN (738976),
PARTITION p202304 VALUES LESS THAN (739006),
PARTITION p202305 VALUES LESS THAN (739037),
PARTITION p202306 VALUES LESS THAN (739067),
PARTITION p202307 VALUES LESS THAN (739098),
PARTITION p202308 VALUES LESS THAN (739129),
PARTITION p202309 VALUES LESS THAN (739159),
PARTITION p202310 VALUES LESS THAN (739190),
PARTITION p202311 VALUES LESS THAN (739220),
PARTITION p202312 VALUES LESS THAN (739251),
PARTITION p299913 VALUES LESS THAN (MAXVALUE));
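For anyone reading the boundaries above, a small Python sketch can turn the TO_DAYS values back into calendar dates. It relies on MySQL's TO_DAYS being Python's proleptic Gregorian ordinal plus 365. The helper names are made up for illustration.

```python
from datetime import date

# MySQL TO_DAYS(d) equals Python's d.toordinal() + 365.
MYSQL_TO_DAYS_OFFSET = 365


def to_days(d: date) -> int:
    """Equivalent of MySQL TO_DAYS for a calendar date."""
    return d.toordinal() + MYSQL_TO_DAYS_OFFSET


def from_days(n: int) -> date:
    """Inverse of to_days: the date whose TO_DAYS value is n."""
    return date.fromordinal(n - MYSQL_TO_DAYS_OFFSET)


print(from_days(737060))           # upper bound of p2017   -> 2018-01-01
print(from_days(739251))           # upper bound of p202312 -> 2024-01-01
print(to_days(date(2022, 6, 23)))  # 738694, inside p202206 (738672 <= x < 738702)
```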

exception.log (193.6 KB) region_id.log (175.2 KB)

The first is from the tiflash log, the second from the tiflash_tikv log.

Here is why it crashes under dynamic partition mode but not after switching to static partition prune:
In static partition prune mode, TiDB accesses TiFlash through the PartitionUnion operator. When a union operator has many children (the problematic SQL should have 74), it limits concurrency: it ensures that no more than tidb_executor_concurrency children, 16 here, run at the same time. Under dynamic partition prune, TiDB accesses all partitions concurrently, so for this SQL 74 partitions are accessed at once. The load on TiFlash in dynamic partition prune mode is therefore nearly 5 times that in static mode, but TiFlash's processing capacity is limited, so a large number of cop requests pile up on the TiFlash side. If a cop request does not return within a certain time (20s by default), TiDB retries it. The whole system in effect enters a positive feedback loop in which cop requests keep increasing until TiFlash crashes.
To solve this, TiDB needs to limit the concurrency (Unlimited high concurrent access to TiFlash may make TiFlash crash when using dynamic partition prune mode · Issue #35864 · pingcap/tidb · GitHub), and when TiFlash is overwhelmed by a backlog of cop requests it needs to return an explicit error so that TiDB stops retrying (TiFlash cop thread pool can not handle request with high QPS · Issue #3696 · pingcap/tiflash · GitHub).
This should be fixed in later versions.
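To make the "nearly 5 times" figure concrete, here is a tiny Python sketch of the concurrency difference between the two modes, using the numbers from this thread (74 partitions, tidb_executor_concurrency=16). It is a simplified model of the explanation above, not TiDB's actual scheduling code, and the function name is made up.

```python
def concurrent_partition_reads(partitions: int,
                               executor_concurrency: int,
                               mode: str) -> int:
    """Partitions read at the same time under each partition prune mode.

    Simplified model: static mode caps PartitionUnion children at
    executor_concurrency; dynamic mode reads every partition at once.
    """
    if mode == "static":
        return min(partitions, executor_concurrency)
    if mode == "dynamic":
        return partitions
    raise ValueError(f"unknown mode: {mode!r}")


static_load = concurrent_partition_reads(74, 16, "static")
dynamic_load = concurrent_partition_reads(74, 16, "dynamic")
print(static_load, dynamic_load, dynamic_load / static_load)  # 16 74 4.625
```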

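Understood, thank you very much. In fact our database has many other partitioned tables with even more partitions than d_media, and accessing any of them would trigger this problem.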

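One more question: our TiFlash resource utilization is normally not high. Which configuration items or parameters can be tuned for TiFlash performance? The performance tuning section of the official documentation currently focuses mainly on TiKV, and I am not sure whether it applies to TiFlash.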

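Take a look at how TiDB's SQL statements access TiFlash. Ideally many of them should access TiFlash in MPP mode, with all of the SQL's computation done inside TiFlash. The SQL you posted, however, chose the least efficient way to access TiFlash (cop mode). For targeted TiFlash optimization, the first step is to make sure most SQL is pushed down to TiFlash in MPP form.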
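One way to check this is to look at the task column of EXPLAIN output. The sketch below is a hypothetical helper, not a TiDB tool: it counts how many operators in a plan run as cop[tiflash], batchCop[tiflash], or mpp[tiflash]. The sample rows are taken from the dynamic-pruning plan at the top of this thread.

```python
import re
from collections import Counter

# Task types that TiDB prints for operators executed on TiFlash.
TASK_RE = re.compile(r"\b(cop|batchCop|mpp)\[tiflash\]")


def tiflash_access_modes(plan_text: str) -> Counter:
    """Count TiFlash operators in an EXPLAIN result by task type."""
    return Counter(m.group(1) for m in TASK_RE.finditer(plan_text))


# Rows copied from the plan posted earlier in this thread.
plan = """
TableReader_55(Build)\t999.98\troot\tpartition:all\tdata:Selection_54
Selection_54\t999.98\tcop[tiflash]\t""\teq(mbase_test. d_media.download_status, 0)
TableFullScan_53\t2081.51\tcop[tiflash]\ttable:m\tkeep order:false
IndexRangeScan_18(Build)\t1.00\tcop[tikv]\ttable:v, index:PRIMARY(date, aweme_id)
"""
modes = tiflash_access_modes(plan)
print(dict(modes))  # {'cop': 2}
```

A plan that shows only cop entries and no mpp entries, as here, is using the access mode described above as the least efficient.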

However, if you mainly use partitioned tables: because TiFlash does not support static partition prune mode, if partition prune mode is set to static then in most cases TiFlash is probably just being used as a faster TiKV :joy: