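TiDB version: 6.1.7
Background: our cluster has four TiDB nodes. Three of them serve normal business reads and writes and go through TiKV. The fourth is dedicated to the big-data team's extraction and analysis jobs, and all of its traffic goes to TiFlash, so the two workloads are effectively physically isolated. We set it up this way because, when the optimizer was left to choose the engine, the extraction queries (range queries, each pulling tens of thousands of rows) were sent to TiKV and slowed down normal business reads and writes (higher latency). That is why we deployed a separate TiDB node for big-data extraction and analysis.
Symptom: we recently found that one class of LIMIT 1 queries from the big-data analysis side fails with a memory limit error.
The TiFlash configuration is as follows: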
tiflash:
log.file.max-days: 30
profiles.default.max_memory_usage: 4294967296
profiles.default.max_memory_usage_for_all_queries: 8589934592
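For reference, a minimal Python sketch that converts the two byte limits above into GiB. It only restates the configured values; the helper name `to_gib` is made up for illustration. The per-query limit works out to 4 GiB, which matches the "maximum: 4.00 GiB" in the error reported below.

```python
# Byte values copied from the TiFlash configuration above.
GIB = 1024 ** 3  # bytes per GiB

limits = {
    "profiles.default.max_memory_usage": 4294967296,
    "profiles.default.max_memory_usage_for_all_queries": 8589934592,
}


def to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB (hypothetical helper, not a TiFlash API)."""
    return n_bytes / GIB


for name, value in limits.items():
    print(f"{name}: {to_gib(value):.2f} GiB")
```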
The MPP settings are at their defaults.
mysql> show variables like '%mpp%';
+------------------------------------------+-------+
| Variable_name | Value |
+------------------------------------------+-------+
| tidb_allow_mpp | ON |
| tidb_enforce_mpp | OFF |
| tidb_mpp_store_fail_ttl | 60s |
| tidb_opt_mpp_outer_join_fixed_build_side | OFF |
+------------------------------------------+-------+
4 rows in set (0.00 sec)
mysql>
Connecting to that TiDB node and running the query gives the following:
ERROR 1105 (HY000): other error for mpp stream: DB::Exception: Memory limit (for query) exceeded: would use 4.00 GiB (attempt to allocate chunk of 10512061 bytes), maximum: 4.00 GiB: (while reading from DTFile: /work/tidb-oltp-145-v6.1.7/data/tiflash-34145/data/t_77/stable/dmf_1252)
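To make the numbers in this error easier to compare across failures, here is a rough Python sketch that pulls the fields out of the message with a regular expression. The pattern is written against this one message only, so treating it as the general format of TiFlash memory errors is an assumption; `parse_memory_error` is a made-up helper name.

```python
import re

# Error text copied verbatim from the client output above.
ERROR = (
    "ERROR 1105 (HY000): other error for mpp stream: DB::Exception: "
    "Memory limit (for query) exceeded: would use 4.00 GiB "
    "(attempt to allocate chunk of 10512061 bytes), maximum: 4.00 GiB: "
    "(while reading from DTFile: "
    "/work/tidb-oltp-145-v6.1.7/data/tiflash-34145/data/t_77/stable/dmf_1252)"
)

# Assumption: other occurrences of this error follow the same layout.
PATTERN = re.compile(
    r"would use (?P<would_use>[\d.]+) GiB "
    r"\(attempt to allocate chunk of (?P<chunk_bytes>\d+) bytes\), "
    r"maximum: (?P<maximum>[\d.]+) GiB: "
    r"\(while reading from DTFile: (?P<dtfile>[^)]+)\)"
)


def parse_memory_error(message):
    """Return the memory figures and DTFile path, or None if nothing matches."""
    m = PATTERN.search(message)
    if m is None:
        return None
    return {
        "would_use_gib": float(m["would_use"]),
        "chunk_bytes": int(m["chunk_bytes"]),
        "maximum_gib": float(m["maximum"]),
        "dtfile": m["dtfile"],
    }


info = parse_memory_error(ERROR)
print(info)
```

The single allocation that pushed the query over the limit is only about 10 MB, so the query was already sitting at roughly 4 GiB while reading the DTFile.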
The TiFlash log shows the following error:
[2025/02/19 14:46:02.378 +08:00] [INFO] [MPPTaskStatistics.cpp:127] ["mpp_task_tracing:MPP<query:456116813736706056,task:1> {\"query_tso\":456116813736706056,\"task_id\":1,\"is_root\":true,\"sender_executor_id\":\"ExchangeSender_20\",\"
executors\":[{\"id\":\"ExchangeSender_20\",\"type\":\"ExchangeSender\",\"children\":[\"Limit_19\"],\"outbound_rows\":0,\"outbound_blocks\":0,\"outbound_bytes\":0,\"execution_time_ns\":0,\"partition_num\":1,\"sender_target_task_ids\":[-1
],\"exchange_type\":\"PassThrough\",\"connection_details\":[{\"tunnel_id\":\"tunnel1+-1\",\"sender_target_task_id\":-1,\"sender_target_host\":\"10.45.164.116:43480\",\"is_local\":false,\"packets\":0,\"bytes\":0}]},{\"id\":\"Limit_19\",\"t
ype\":\"Limit\",\"children\":[\"Selection_18\"],\"outbound_rows\":0,\"outbound_blocks\":0,\"outbound_bytes\":0,\"execution_time_ns\":0},{\"id\":\"Selection_18\",\"type\":\"Selection\",\"children\":[\"TableFullScan_17\"],\"outbound_rows\
":0,\"outbound_blocks\":0,\"outbound_bytes\":0,\"execution_time_ns\":0},{\"id\":\"TableFullScan_17\",\"type\":\"TableScan\",\"children\":[],\"outbound_rows\":0,\"outbound_blocks\":0,\"outbound_bytes\":0,\"execution_time_ns\":0,\"connect
ion_details\":[{\"is_local\":true,\"packets\":0,\"bytes\":0},{\"is_local\":false,\"packets\":0,\"bytes\":0}]}],\"host\":\"10.45.165.143:36145\",\"task_init_timestamp\":1739947562184573000,\"task_start_timestamp\":1739947562209934000,\"ta
sk_end_timestamp\":1739947562378048000,\"compile_start_timestamp\":1739947562185304000,\"compile_end_timestamp\":1739947562209834000,\"read_wait_index_start_timestamp\":1739947562186673000,\"read_wait_index_end_timestamp\":1739947562193
046000,\"local_input_bytes\":0,\"remote_input_bytes\":0,\"output_bytes\":0,\"status\":\"CANCELLED\",\"error_message\":\"DB::Exception: Memory limit (for query) exceeded: would use 4.01 GiB (attempt to allocate chunk of 10078218 bytes),
maximum: 4.00 GiB: (while reading from DTFile: /work/tidb-oltp-145-v6.1.7/data/tiflash-34145/data/t_77/stable/dmf_1030)\",\"working_time\":0,\"m
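As shown, a simple SELECT ... LIMIT 1 fails with a memory limit error, but the same query returns a result once a WHERE condition is added. TiFlash has no concept of indexes, so in principle both statements should be full table scans, and I don't understand why they behave differently. Is this a bug or expected behavior? If it is expected, what is the cause?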