A question about exporting data with Dumpling

【TiDB Environment】Testing
1. So the -r option is not actually used to specify how many rows go into each file, but only to enable intra-table concurrency and speed up the export? If so, does the row count given after -r not matter at all, as long as it is greater than 0?
[kevinsna@tidb01 ~]$ tiup dumpling -udumpusr -p'123456' -h 192.168.90.30 -P 4000 --filetype sql -B test -t 8 -r 1 -F256MiB -o /tmp/test
[kevinsna@tidb01 ~]$ cd /tmp/test
[kevinsna@tidb01 test]$ ls
metadata test-schema-create.sql test.t.0000000010000.sql test.t-schema.sql
[kevinsna@tidb01 test]$ cat test.t.0000000010000.sql
/*!40014 SET FOREIGN_KEY_CHECKS=0*/;
/*!40101 SET NAMES binary*/;
INSERT INTO t VALUES
(1,'a'),
(2,'b'),
(3,'c'),
(4,'d'),
(5,'e');
[kevinsna@tidb01 test]$

2. Also, when exporting MySQL data with Dumpling, is specifying --consistency snapshot equivalent to mysqldump --single-transaction? Or is the snapshot setting ineffective against MySQL, so it still falls back to flush, leaving the whole database readable but not writable during the export?

By the way, the 303 course describes Dumpling's -r 200000 -F256MiB like this: whichever of the two limits is reached first is the one that caps the file; if 200,000 rows are reached before the file grows to 256 MiB, the 200,000-row limit applies. But in my test above, -r 1 did not actually take effect.

If -r did not take effect, you can check the Dumpling log; it contains messages that help determine why it did not take effect.

Setting the -r option to a value greater than 0 means TiDB region information is used to split intra-table concurrency, and the specific value does not affect the splitting algorithm. When the upstream is MySQL and the table's primary key is of type int, this option also enables intra-table concurrency.
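Since the chunk boundaries come from region information, you can check in advance how many chunks a table can possibly be split into. A minimal check against the table from the post above (each returned row is one region, and a table that fits in a single region can only ever yield a single chunk, whatever -r is set to):

SHOW TABLE `test`.`t` REGIONS;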

Hi, I don't see any log message hinting at why it did not take effect:
[kevinsna@tidb01 tmp]$ tiup dumpling -udumpusr -p'123456' -h 192.168.90.30 -P 4000 --filetype sql -B test -t 8 -r 1 -F256MiB -o /tmp/test
Starting component dumpling: /home/kevinsna/.tiup/components/dumpling/v8.1.0/dumpling -udumpusr -p123456 -h 192.168.90.30 -P 4000 --filetype sql -B test -t 8 -r 1 -F256MiB -o /tmp/test
Release version: v8.1.0
Git commit hash: 945d07c5d5c7a1ae212f6013adfb187f2de24b23
Git branch: HEAD
Build timestamp: 2024-05-21 03:51:57Z
Go version: go version go1.21.10 linux/amd64

[2024/11/28 14:13:12.443 +08:00] [INFO] [versions.go:54] ["Welcome to dumpling"] ["Release Version"=v8.1.0] ["Git Commit Hash"=945d07c5d5c7a1ae212f6013adfb187f2de24b23] ["Git Branch"=HEAD] ["Build timestamp"="2024-05-21 03:51:57"] ["Go Version"="go version go1.21.10 linux/amd64"]
[2024/11/28 14:13:12.450 +08:00] [INFO] [version.go:440] ["detect server version"] [type=TiDB] [version=8.1.0]
[2024/11/28 14:13:12.504 +08:00] [INFO] [pd_service_discovery.go:991] ["[pd] update member urls"] [old-urls="[http://192.168.90.32:2379,http://192.168.90.31:2379,http://192.168.90.30:2379]"] [new-urls="[http://192.168.90.30:2379,http://192.168.90.31:2379,http://192.168.90.32:2379]"]
[2024/11/28 14:13:12.505 +08:00] [INFO] [pd_service_discovery.go:1016] ["[pd] switch leader"] [new-leader=http://192.168.90.30:2379] [old-leader=]
[2024/11/28 14:13:12.505 +08:00] [INFO] [pd_service_discovery.go:498] ["[pd] init cluster id"] [cluster-id=7388826782122131320]
[2024/11/28 14:13:12.506 +08:00] [INFO] [client.go:613] ["[pd] changing service mode"] [old-mode=UNKNOWN_SVC_MODE] [new-mode=PD_SVC_MODE]
[2024/11/28 14:13:12.506 +08:00] [INFO] [tso_client.go:293] ["[tso] switch dc tso global allocator serving url"] [dc-location=global] [new-url=http://192.168.90.30:2379]
[2024/11/28 14:13:12.508 +08:00] [INFO] [client.go:619] ["[pd] service mode changed"] [old-mode=UNKNOWN_SVC_MODE] [new-mode=PD_SVC_MODE]
[2024/11/28 14:13:12.509 +08:00] [INFO] [tso_dispatcher.go:119] ["[tso] start tso deadline watcher"] [dc-location=global]
[2024/11/28 14:13:12.510 +08:00] [INFO] [tso_dispatcher.go:168] ["[tso] tso dispatcher created"] [dc-location=global]
[2024/11/28 14:13:12.510 +08:00] [INFO] [tso_dispatcher.go:336] ["[tso] start tso connection contexts updater"] [dc-location=global]
[2024/11/28 14:13:12.510 +08:00] [INFO] [tso_client.go:130] ["[tso] start tso dispatcher check loop"]
[2024/11/28 14:13:12.515 +08:00] [INFO] [dump.go:1518] ["generate dumpling gc safePoint id"] [id=dumpling_1732774392515062409]
[2024/11/28 14:13:12.588 +08:00] [INFO] [dump.go:151] ["begin to run Dump"] [conf="{"s3":{"endpoint":"","region":"","storage-class":"","sse":"","sse-kms-key-id":"","acl":"","access-key":"","secret-access-key":"","session-token":"","provider":"","force-path-style":true,"use-accelerate-endpoint":false,"role-arn":"","external-id":"","object-lock-enabled":false},"gcs":{"endpoint":"","storage-class":"","predefined-acl":"","credentials-file":""},"azblob":{"endpoint":"","account-name":"","account-key":"","access-tier":"","sas-token":"","encryption-scope":"","encryption-key":""},"SpecifiedTables":false,"AllowCleartextPasswords":false,"SortByPk":true,"NoViews":false,"NoSequences":false,"NoHeader":false,"NoSchemas":false,"NoData":false,"CompleteInsert":false,"TransactionalConsistency":true,"EscapeBackslash":true,"DumpEmptyDatabase":true,"PosAfterConnect":false,"CompressType":0,"Host":"192.168.90.30","Port":4000,"Threads":8,"User":"dumpusr","Security":{"CAPath":"","CertPath":"","KeyPath":""},"LogLevel":"info","LogFile":"","LogFormat":"text","OutputDirPath":"/tmp/test","StatusAddr":":8281","Snapshot":"454236410348830724","Consistency":"snapshot","CsvNullValue":"\\N","SQL":"","CsvSeparator":",","CsvDelimiter":"\"","CsvLineTerminator":"\r\n","Databases":["test"],"Where":"","FileType":"sql","ServerInfo":{"ServerType":3,"ServerVersion":"8.1.0","HasTiKV":true},"Rows":1,"ReadTimeout":900000000000,"TiDBMemQuotaQuery":0,"FileSize":268435456,"StatementSize":1000000,"SessionParams":{"tidb_snapshot":"454236410348830724"},"Tables":{},"CollationCompatible":"loose","CsvOutputDialect":0,"IOTotalBytes":null,"Net":""}"]
[2024/11/28 14:13:12.782 +08:00] [INFO] [writer.go:272] ["no data written in table chunk"] [database=test] [table=t] [chunkIdx=0]
[2024/11/28 14:13:12.808 +08:00] [INFO] [collector.go:264] ["backup success summary"] [total-ranges=6] [ranges-succeed=6] [ranges-failed=0] [total-take=60.624993ms] [total-kv-size=134B] [average-speed=2.21kB/s] [total-rows=5]
[2024/11/28 14:13:12.813 +08:00] [INFO] [main.go:82] ["dump data successfully, dumpling will exit now"]
[kevinsna@tidb01 tmp]$ cat /tmp/test/test.t.0000000010000.sql
/*!40014 SET FOREIGN_KEY_CHECKS=0*/;
/*!40101 SET NAMES binary*/;
INSERT INTO t VALUES
(1,'a'),
(2,'b'),
(3,'c'),
(4,'d'),
(5,'e');

I simulated it locally:
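The chunk-division messages below only show up at debug level; a sketch of the command, assuming Dumpling's --loglevel flag (it matches the LogLevel entry in the conf dump above):

tiup dumpling -udumpusr -p'123456' -h 192.168.90.30 -P 4000 --filetype sql -B test -t 8 -r 1 -F256MiB -o /tmp/test --loglevel debug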
[2024/11/28 14:34:02.489 +08:00] [DEBUG] [dump.go:878] ["dumping TiDB tables with TABLESAMPLE"] [database=test] [table=dump_test]
[2024/11/28 14:34:02.490 +08:00] [DEBUG] [dump.go:825] ["send task to writer"] [task="data of table 'test'.'dump_test'(0/2)"]
[2024/11/28 14:34:02.490 +08:00] [DEBUG] [writer.go:200] ["trying to dump table chunk"] [retryTime=1] [db=test] [table=dump_test] [chunkIndex=1]
[2024/11/28 14:34:02.490 +08:00] [DEBUG] [dump.go:825] ["send task to writer"] [task="data of table 'test'.'dump_test'(1/2)"]
[2024/11/28 14:34:02.490 +08:00] [DEBUG] [ir_impl.go:205] ["try to start tableData"] [query="SELECT * FROM `test`.`dump_test` WHERE `_tidb_rowid`>=1 ORDER BY `_tidb_rowid`"]
[2024/11/28 14:34:02.490 +08:00] [DEBUG] [writer.go:200] ["trying to dump table chunk"] [retryTime=1] [db=test] [table=dump_test] [chunkIndex=0]
[2024/11/28 14:34:02.490 +08:00] [DEBUG] [ir_impl.go:205] ["try to start tableData"] [query="SELECT * FROM `test`.`dump_test` WHERE `_tidb_rowid`<1 ORDER BY `_tidb_rowid`"]
[2024/11/28 14:34:02.493 +08:00] [INFO] [writer.go:272] ["no data written in table chunk"] [database=test] [table=dump_test] [chunkIdx=0]
[2024/11/28 14:34:02.494 +08:00] [DEBUG] [dump.go:369] ["finish dumping table data task"] [database=test] [table=dump_test] [chunkIdx=0]
[2024/11/28 14:34:02.494 +08:00] [DEBUG] [writer_util.go:498] ["opened file"] [path=file:///home/tidb/ghy/test.dump_test.0000000010000.csv]
[2024/11/28 14:34:02.494 +08:00] [DEBUG] [writer_util.go:351] ["finish dumping table(chunk)"] [database=test] [table=dump_test] ["finished rows"=5] ["finished size"=48]
[2024/11/28 14:34:02.494 +08:00] [DEBUG] [writer_util.go:508] ["tear down lazy file writer..."] [path=file:///home/tidb/ghy/test.dump_test.0000000010000.csv]
[2024/11/28 14:34:02.494 +08:00] [DEBUG] [writer.go:256] ["finish dumping table(chunk)"] [database=test] [table=dump_test] [chunkIdx=1] ["total rows"=5]
[2024/11/28 14:34:02.494 +08:00] [DEBUG] [dump.go:369] ["finish dumping table data task"] [database=test] [table=dump_test] [chunkIdx=1]
[2024/11/28 14:34:02.494 +08:00] [INFO] [collector.go:264] ["backup success summary"] [total-ranges=4] [ranges-succeed=4] [ranges-failed=0] [total-take=9.529024ms] [total-kv-size=48B] [average-speed=5.037kB/s] [total-rows=5]

The log shows that for TiDB, Dumpling first uses TABLESAMPLE to divide the table into chunks; you can see that on the database side it executed

select `_tidb_rowid` from `test` . `dump_test` tablesample regions ( ) order by `_tidb_rowid`

which returned a single split point, `_tidb_rowid` = 1.

Two export queries were then generated from that boundary:

[query="SELECT * FROM `test`.`dump_test` WHERE `_tidb_rowid`>=1  ORDER BY `_tidb_rowid`"]

[query="SELECT * FROM `test`.`dump_test` WHERE `_tidb_rowid`<1  ORDER BY `_tidb_rowid`"]  # 此语句是没有结果返回的
# ["no data written in table chunk"] [database=test] [table=dump_test] [chunkIdx=0]

So in the end the data was exported into just one file. The export was in fact concurrent; it is just that chunk division cannot be that precise, which creates the surface impression that there was no concurrency. The larger the data volume, the better it behaves (the algorithm that divides chunks from region information becomes more accurate).
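To verify this even on the small table, you can pre-split it into several regions and re-run the same export. A sketch using TiDB's SPLIT TABLE statement, assuming the five rows have `_tidb_rowid` 1 through 5 as above:

-- Put each row into (roughly) its own region, so TABLESAMPLE returns more split points
SPLIT TABLE `test`.`dump_test` BETWEEN (0) AND (5) REGIONS 5;

Re-running the same dumpling command afterwards should produce more than one data file.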

To sum up, a couple of conclusions:
1. The size of the number after -r really does seem to matter very little, which matches the official wording: "Setting the -r option to a value greater than 0 means TiDB region information is used to split intra-table concurrency, and the specific value does not affect the splitting algorithm."
2. The number of files generated depends on the TABLESAMPLE result.

  • The -r option specifies the maximum number of records (that is, rows in the database) per file. When it is enabled, Dumpling uses intra-table concurrency to speed up the export of large tables. When the upstream is TiDB v3.0 or later, setting -r to a value greater than 0 means TiDB region information is used to split intra-table concurrency, and the specific value does not affect the splitting algorithm. For a MySQL upstream where the table's primary key is of type int, this option also enables intra-table concurrency. Quoted from the official documentation: Export Data Using Dumpling | TiDB Docs. So the documentation does say it specifies the maximum number of records per file, but it is probably just as you said: chunk division cannot be that precise. Thanks!

MySQL's mysqldump does take locks; it just depends on the situation, and the lock is held only briefly. MySQL 8.0 supports a backup lock, which is more lightweight.
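For reference, the MySQL 8.0 backup lock mentioned here is taken with LOCK INSTANCE FOR BACKUP: it blocks DDL and file-level operations during the backup but lets ordinary DML keep running, which is why it is lighter than FLUSH TABLES WITH READ LOCK:

LOCK INSTANCE FOR BACKUP;   -- requires the BACKUP_ADMIN privilege
-- run the backup here; DML continues, DDL is blocked
UNLOCK INSTANCE;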

Using --single-transaction keeps the read consistent. In my test of exporting MySQL data with Dumpling, the lock time was also quite short. At first I thought the whole database would be locked for the entire export, but once this message appears the tables are unlocked, and only then is the data exported: [2024/11/28 18:04:45.534 +08:00] [INFO] [dump.go:275] ["All the dumping transactions have started. Start to unlock tables"]
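For comparison, a rough sketch of the two start-up sequences (the mysqldump side is its well-known behavior; the Dumpling side is inferred from the log line above, not a verbatim trace):

-- mysqldump --single-transaction: no global lock, one consistent snapshot
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION /*!40100 WITH CONSISTENT SNAPSHOT */;

-- Dumpling with flush consistency: a brief global read lock held only
-- until all worker transactions have started ("Start to unlock tables")
FLUSH TABLES WITH READ LOCK;
-- each worker thread: START TRANSACTION WITH CONSISTENT SNAPSHOT
UNLOCK TABLES;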

Later I grew the table to 1,000,000 rows and set -r 20000 with -F256MiB; each exported SQL data file had 22,379 lines and was about 65 MB. So, as you said, -r does still roughly cap the number of rows per file, just not precisely, and that is good enough for what we want; the difference is small. Thanks, marking this as the best answer.
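A quick way to double-check how -r and -F played out is to inspect the generated chunk files (paths from the runs above), e.g.:

ls -lh /tmp/test/*.sql
wc -l /tmp/test/test.t.*.sql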
