With over 20 million rows in a single table, a DELETE of roughly 60,000 rows takes about 60s

To help us respond faster, please provide the following information when asking a question; clearly described problems will be prioritized.

  • [TiDB version]: 4.0
  • [Problem description]:

The table holds over 20 million rows; deleting about 60,000 rows takes around 60s.
The cluster has 4 machines, 3 of which run TiKV, on SSD disks.

Total row count of db_1.flow: 21451468

SQL executed:
delete from db_1.flow where timestamp>=1595332800 and timestamp <1595336400 and cid=20 and did=200602;

EXPLAIN result:

Delete_4	N/A	root		N/A
└─IndexLookUp_12	0.00	root
  ├─IndexRangeScan_9(Build)	10.00	cop[tikv]	table:flow, index:did(did)	range:[200602,200602], keep order:false, stats:pseudo
  └─Selection_11(Probe)	0.00	cop[tikv]		eq(db_1.flow.cid, 20), ge(db_1.flow.timestamp, 1595332800), lt(db_1.flow.timestamp, 1595336400)
    └─TableRowIDScan_10	10.00	cop[tikv]	table:flow	keep order:false, stats:pseudo

Could you help analyze what exactly is causing the poor delete performance?

Please share the table structure of db_1.flow and the output of select tidb_version();

select tidb_version();

Release Version: v4.0.0
Edition: Community
Git Commit Hash: 689a6b6439ae7835947fcaccf329a3fc303986cb
Git Branch: heads/refs/tags/v4.0.0
UTC Build Time: 2020-05-28 01:37:40
GoVersion: go1.13
Race Enabled: false
TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
Check Table Before Drop: false

desc db_1.flow;

Field	Type	Null	Key	Default	Extra
id	bigint(20)	NO	PRI		auto_increment
timestamp	bigint(20)	NO	MUL		
date	int(11)	NO	MUL		
cid	int(11)	NO	MUL	0	
did	int(10) unsigned	NO	MUL		
isid	int(10) unsigned	NO			
flow	bigint(20)	NO			
pv	bigint(20)	NO			
spd	bigint(20)	NO			
rede	int(10) unsigned	NO			
sotype	int(10) unsigned	NO			
no	varchar(64)	NO			
teal	varchar(255)	NO			
ph	varchar(255)	NO			
pro	varchar(64)	NO			
update_timestamp	timestamp	NO		CURRENT_TIMESTAMP	DEFAULT_GENERATED on update CURRENT_TIMESTAMP

select count(*) from db_1.flow;

21596217

show create table db_1.flow;


CREATE TABLE flow (
  id bigint(20) NOT NULL AUTO_INCREMENT,
  timestamp bigint(20) NOT NULL COMMENT '',
  date int(11) NOT NULL COMMENT '',
  cid int(11) NOT NULL DEFAULT 0 COMMENT '',
  did int(10) unsigned NOT NULL COMMENT '',
  ... some non-key columns omitted ...
  PRIMARY KEY (id),
  KEY date (date),
  KEY did (did),
  KEY cid (cid),
  KEY idx_flow (timestamp,cid,did)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=12800586460 COMMENT=''

Please run analyze table db_1.flow;, then run explain delete from db_1.flow where timestamp>=1595332800 and timestamp <1595336400 and cid=20 and did=200602; again and share the execution plan.

analyze table db_1.flow;

Delete_4	N/A	root		N/A
└─IndexLookUp_20	73136.28	root		
  ├─Selection_19(Build)	73136.28	cop[tikv]		eq(db_1.flow.cid, 20), eq(db_1.flow.did, 200602)
  │ └─IndexRangeScan_17	205131.24	cop[tikv]	table:flow, index:idx_flow(timestamp, cid, did)	range:[1595332800,1595336400), keep order:false
  └─TableRowIDScan_18(Probe)	73136.28	cop[tikv]	table:flow	keep order:false

Done. I just ran the same delete statement again and it took a little over 3 seconds, much faster than before.
What exactly caused this? And can it be made faster still? (The ideal target is around 1s.)

The query predicates match the key idx_flow, yet the original execution plan only used the single-column did index via an index range scan, which is not what we would expect. The stats:pseudo in that plan indicates the statistics were stale and needed to be collected.

For this problem, you can tune when automatic statistics collection is triggered:
https://docs.pingcap.com/zh/tidb/stable/statistics#自动更新
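For reference, a minimal sketch of the global variables that control automatic statistics collection in TiDB; the values below are illustrative rather than recommendations, so check the linked documentation for the defaults in your version:

-- Trigger auto analyze after a smaller fraction of the table's rows has changed
-- (0.5 is the default ratio; 0.3 here is only an example value).
SET GLOBAL tidb_auto_analyze_ratio = 0.3;
-- Widen the time window in which auto analyze is allowed to run.
SET GLOBAL tidb_auto_analyze_start_time = '00:00 +0000';
SET GLOBAL tidb_auto_analyze_end_time = '23:59 +0000';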

Thanks, I'll take a look.

:ok_hand:

I think this plan is about as good as it gets. If you want more speed, you could change the column order of idx_flow to (cid, did, timestamp), as sketched below; for this query's predicates the scan might be somewhat faster, but it may hurt other queries that filter only on timestamp. Overall I don't expect a large improvement.
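A hypothetical DDL sketch for the reordered index (the name idx_flow_new is made up; adding the new index before dropping the old one keeps the query covered during the change):

-- idx_flow_new is a placeholder name for the reordered index.
ALTER TABLE db_1.flow ADD INDEX idx_flow_new (cid, did, timestamp);
-- Once queries are confirmed to use the new index, the old one can be dropped.
ALTER TABLE db_1.flow DROP INDEX idx_flow;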

Also, while the data is still in place, could you run the following SQL and share the results?
explain select count(*) from db_1.flow where timestamp>=1595332800 and timestamp <1595336400 and cid=20 and did=200602; and select count(*) from db_1.flow where timestamp>=1595332800 and timestamp <1595336400;

Right. The main issue is that this table keeps growing, currently by about 5 million rows per day.
If we keep one month of data, we'd be storing roughly 150 million rows. At that point, will the delete and query times grow linearly with the row count, or stay roughly constant? (The hot data is from the most recent 1-3 days.)

If you keep one month of data, what does the deletion frequency look like? Deleting once per day would mean about 5 million rows per run. A single statement won't really work for that; you'd need to page through the rows and delete in batches in a loop, as sketched below.
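A minimal sketch of looped batch deletion; the batch size of 10000 and the predicate values are illustrative, and the statement is simply repeated until it reports 0 affected rows:

DELETE FROM db_1.flow
WHERE timestamp >= 1595332800 AND timestamp < 1595336400
  AND cid = 20 AND did = 200602
LIMIT 10000;
-- Re-run until "0 rows affected", i.e. the time range is empty.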

With 150 million rows, both deletion and query cost depend on how many rows are touched; a point get, for example, would still be fast.
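For illustration, a point get is a lookup by the primary key handle, which reads a single row regardless of table size (the id value below is hypothetical):

-- With id as the integer primary key, this lookup is executed as a Point_Get.
SELECT * FROM db_1.flow WHERE id = 12800586459;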

Also, could you share the results of the SQL above?

The deletion currently runs every 5 minutes, processed in a loop.

explain select count(1) from db_1.flow where timestamp>=1595332800 and timestamp <1595336400 and cid=20 and did=200602;

StreamAgg_36	1.00	root		funcs:count(Column#24)->Column#17
└─IndexReader_37	1.00	root		index:StreamAgg_9
  └─StreamAgg_9	1.00	cop[tikv]		funcs:count(1)->Column#24
    └─Selection_35	73406.02	cop[tikv]		eq(db_1.flow.cid, 20), eq(db_1.flow.did, 200602)
      └─IndexRangeScan_34	205888.08	cop[tikv]	table:flow, index:idx_flow(timestamp, cid, did)	range:[1595332800,1595336400), keep order:false

explain select count(1) from db_1.flow where timestamp>=1595332800 and timestamp <1595336400;

StreamAgg_17	1.00	root		funcs:count(Column#25)->Column#17
└─IndexReader_18	1.00	root		index:StreamAgg_9
  └─StreamAgg_9	1.00	cop[tikv]		funcs:count(1)->Column#25
    └─IndexRangeScan_16	205835.77	cop[tikv]	table:flow, index:idx_flow(timestamp, cid, did)	range:[1595332800,1595336400), keep order:false

select count(1) from db_1.flow where timestamp>=1595332800 and timestamp <1595336400;

205721

explain analyze select count(1) from db_1.flow where timestamp>=1595332800 and timestamp <1595336400 and cid=20 and did=200602;

Try it like this and see roughly how long that extra Selection takes.

Execution result of explain analyze:
explain analyze select count(1) from db_1.flow where timestamp>=1595332800 and timestamp <1595336400 and cid=20 and did=200602;

| id | estRows | actRows | task | access object | execution info | operator info | memory | disk |
|---|---|---|---|---|---|---|---|---|
|StreamAgg_36|1.00|1|root||time:86.385499ms, loops:2|funcs:count(Column#24)->Column#17|372 Bytes|N/A|
|└─IndexReader_37|1.00|2|root||time:86.380409ms, loops:2, rpc num: 2, rpc max:86.289834ms, min:43.541723ms, avg:64.915778ms, p80:86.289834ms, p95:86.289834ms, proc keys max:94348, p95:94348|index:StreamAgg_9|310 Bytes|N/A|
|  └─StreamAgg_9|1.00|2|cop[tikv]||proc max:86ms, min:43ms, p80:86ms, p95:86ms, iters:141, tasks:2|funcs:count(1)->Column#24|N/A|N/A|
|    └─Selection_35|33134.27|63231|cop[tikv]||proc max:86ms, min:43ms, p80:86ms, p95:86ms, iters:141, tasks:2|eq(db_1.flow.cid, 20), eq(db_1.flow.did, 200602)|N/A|N/A|
|      └─IndexRangeScan_34|76172.66|143181|cop[tikv]|table:flow, index:idx_flow(timestamp, cid, did)|proc max:82ms, min:41ms, p80:82ms, p95:82ms, iters:141, tasks:2|range:[1595332800,1595336400), keep order:false|N/A|N/A|

OK, thanks for the feedback; we've received it. The original question in this thread has essentially been resolved above. We'll take another look at the information collected here to see whether there is still room for optimization, and will reply as soon as there is progress.

This topic was automatically closed 1 minute after the last reply. New replies are no longer allowed.