Does TiDB support streaming output of huge query results?

With MySQL you can simply run mysql -uroot -p -hhost dbname -e "select * from mytab" > mytab.tsv to capture a query result. No matter how large the table or how slow the query is, the MySQL client keeps writing rows out to the local file.
This process appears to be streaming: both the client and the transport keep memory usage stable.
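The buffered-vs-streaming difference described above can be sketched in Python. This is a toy model, not MySQL's actual implementation: `fake_result_set` stands in for rows arriving over the wire, and `dump_rows` returns the peak number of rows held in memory so the two modes can be compared.

```python
import io

def fake_result_set(n):
    """Stand-in for rows arriving from the server one at a time."""
    for i in range(n):
        yield (i, f"member_{i}")

def dump_rows(rows, out, buffered):
    """Write rows as TSV; return the peak number of rows held in memory."""
    if buffered:
        batch = list(rows)          # whole result set materialized at once
        peak = len(batch)
        for row in batch:
            out.write("\t".join(map(str, row)) + "\n")
    else:
        peak = 1                    # streaming: hold one row at a time
        for row in rows:
            out.write("\t".join(map(str, row)) + "\n")
    return peak
```

Both modes produce identical output; only the peak memory differs. With a real Python client, PyMySQL's unbuffered `SSCursor` gives the streaming behavior; with the stock `mysql` CLI, the `--quick` option asks it to print each row as received instead of caching the result. Note also that a bare `Killed` from the shell usually means the operating system's OOM killer terminated the client process itself, which can happen when a client buffers an entire huge result set.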

But running a large query like that against TiDB gets killed, and the docs say a query cannot exceed certain configured memory limits.

mysql -uroot -h mytidbserver -p -P 4000 mydb -e 'set @@tidb_mem_quota_query = 8 << 30; select member_id destinct from orders;' > all_member_ids

Killed

So it seems TiDB cannot simply stream results, is that right?
Is the only option a script that loops over batches, along the lines of where id > 123 / order by id / limit 1000?
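That batching idea can be sketched like this (Python; `fetch_batch` and every name here are stand-ins — against TiDB each call would be a real query such as `SELECT id, member_id FROM orders WHERE id > %s ORDER BY id LIMIT %s`). The key point is keyset pagination: each batch resumes after the last id seen, rather than using OFFSET, so every batch is a cheap indexed range scan.

```python
def fetch_batch(table, last_id, limit):
    """Stand-in for:
       SELECT id, member_id FROM orders
       WHERE id > %(last_id)s ORDER BY id LIMIT %(limit)s"""
    rows = sorted((r for r in table if r["id"] > last_id),
                  key=lambda r: r["id"])
    return rows[:limit]

def scan_all(table, limit=1000):
    """Keyset pagination: resume each batch after the last id seen."""
    last_id, out = float("-inf"), []
    while True:
        batch = fetch_batch(table, last_id, limit)
        if not batch:
            break
        out.extend(batch)
        last_id = batch[-1]["id"]   # advance the cursor
    return out
```

In a real script, `out.extend(...)` would instead append each batch to the output file, so client memory stays bounded by the batch size.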

Update:
The query above sets an 8 GB memory quota, but it seems to have exited before actually using 8 GB?
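One possibility is that limits other than the session variable take effect first. In TiDB 4.0, the server-side memory knobs in tidb.toml include the following (option names as of v4.0; verify against your version's docs before relying on them):

```toml
# tidb.toml (TiDB v4.0-era option names)
mem-quota-query = 8589934592    # per-query memory quota in bytes (8 GB)
oom-action = "cancel"           # "cancel" kills over-quota queries; "log" only logs
oom-use-tmp-storage = true      # spill eligible operators to disk when over quota
tmp-storage-path = "/tmp/tidb/tmp-storage"
```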

From the TiDB Dashboard:

	id           	task     	estRows  	operator info                                                        	actRows  	execution info                                                                                                                                                                                                     	memory              	disk
	IndexReader_7	root     	371209009	index:IndexFullScan_6                                                	138611679	time:399.0117ms, loops:135437, cop_task: {num: 149, max: 1.135885144s, min: 244.668247ms, avg: 528.72626ms, p95: 829.713853ms, max_proc_keys: 1465354, p95_proc_keys: 1398024, tot_proc: 1m9.672s, tot_wait: 184ms}	547.2192544937134 MB	N/A
	└─IndexScan_6	cop[tikv]	371209009	table:order_entry, index:index_member_id(member_id), keep order:false	138969968	proc max:980ms, min:176ms, p80:508ms, p95:684ms, iters:136385, tasks:149                                                                                                                                           	N/A                 	N/A

Testing Dumpling

tidb-toolkit/bin/dumpling --host $hostname -uroot -p$pw -P 4000 --filetype csv -S "select member_id destinct from mydb.orders"

Release version: v4.0.7
...

[2021/01/06 02:52:02.510 +00:00] [INFO] [config.go:158] ["detect server type"] [type=TiDB]
...
[2021/01/06 02:52:12.549 +00:00] [WARN] [dump.go:104] ["If the amount of data to dump is large, criteria: (data more than 60GB or dumped time more than 10 minutes)\
you'd better adjust the tikv_gc_life_time to avoid export failure due to TiDB GC during the dump process.\
Before dumping: run sql `update mysql.tidb set VARIABLE_VALUE = '720h' where VARIABLE_NAME = 'tikv_gc_life_time';` in tidb.\
After dumping: run sql `update mysql.tidb set VARIABLE_VALUE = '10m' where VARIABLE_NAME = 'tikv_gc_life_time';` in tidb.\
"]
[2021/01/06 02:52:13.479 +00:00] [INFO] [sql.go:168] ["dump data from sql"] [sql="select member_id destinct from mydb.orders"]

[2021/01/06 02:55:26.919 +00:00] [INFO] [main.go:216] ["dump data successfully, dumpling will exit now"]
[ec2-user@ip-10-113-208-150 ~]$ cd export-2021-01-06T02\:52\:02Z/
[ec2-user@ip-10-113-208-150 export-2021-01-06T02:52:02Z]$ ls -l
total 5436388
-rwxr-xr-x 1 ec2-user ec2-user        139 Jan  6 02:55 metadata
-rwxr-xr-x 1 ec2-user ec2-user 5566853696 Jan  6 02:55 result.0.csv

The GC timed out here. Take a look at the GC-related information in the docs; it's recommended to adjust the tikv_gc_life_time setting.

