- 【TiDB 版本】:5.7.25-TiDB-v3.0.2
- 【问题描述】:程序在进行分页批量任务时,耗时越来越久 一天数据大概在1.5千万左右,(group by,order by)完数据量在1.1千万 ,通过一系列group by,order by ,发现在开始前半小时内,数据统计很快,能统计完70%左右的数据,但是后面的30%越来越慢,一直延迟到9~10小时,甚至更长,在此过程中程序没有报错,tidb内存占用很高,没有释放,tidb相关日志也没有发现错误信息,不明白为啥执行到后面越来越慢。?可能是因为数据都汇总到了同一个时间点(零点),导致tidb不断要对已汇总完的数据重新分片,做一致性处理吗,所以越到后面数据量越多,处理的速度越慢?
-
测试过先把一天数据汇总到临时表,再到数据表 ,防止源数据表数据量过大,导致查询慢,结果也是出现上述情况。
-
测试过把一天数据按6小时分段处理,第一段6小时处理很快10分钟,第二段又出现上述情况,最后20%~30%的数据处理很慢。
-
统计逻辑
先删除该时间段的表数据-> select (分页10w) -> insert (批量2w)
24小时的小时数据->汇总当天的零点数据
-
sql语句
SELECT camp_id,aff_id, aff_m_id, adv_id,adv_m_id,country_id,source_id, SUM(clicks) AS clicks,SUM(unique_clicks) AS unique_clicks, SUM(conversions) AS conversions, SUM(pay_out) AS pay_out,SUM(revenue) AS revenue,SUM(amount) as amount, SUM(pending_conversions) AS pending_conversions,0 AS pending_pay_out, SUM(pending_revenue) AS pending_revenue,SUM(pending_amount) AS pending_amount, SUM(rejected_conversions) AS rejected_conversions,0 AS rejected_pay_out, SUM(rejected_revenue) AS rejected_revenue,SUM(rejected_amount) AS rejected_amount, #{startZeroTime} AS create_time, is_back, currency,platform_id,event_id, is_private, SUM(impression) as impression,revenue_type,payout_type,sl_id,carrier_id,url_id,ai_id,async_clicks FROM ${reportDB}.${operationAllHourTB} WHERE create_time=]]>#{start} AND create_time#{end} AND is_tmp = 0 GROUP BY camp_id,aff_id, aff_m_id, adv_id,adv_m_id,country_id,source_id, currency,platform_id,event_id, is_private,revenue_type,payout_type,sl_id,carrier_id,url_id,ai_id,async_clicks ORDER BY camp_id,aff_id, aff_m_id, adv_id,adv_m_id,country_id,source_id, currency,platform_id,event_id, is_private,revenue_type,payout_type,sl_id,carrier_id,url_id,ai_id,async_clicks LIMIT #{_skiprows}, #{_pagesize}