Dumpling export error (1)

[TiDB Environment] Production

[TiDB Version] v5.1.0

[Problem] When exporting data with the dumpling command, the error below is reported (tikv_gc_life_time has already been set to 10h). The same data can be queried via Navicat.
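For reference, the GC retention window mentioned above can be set like this on v5.1 (a sketch; host and credentials are placeholders, the 10h value matches what the poster already applied):

```shell
# Extend the GC life time so a long-running dumpling export does not hit
# "GC life time is shorter than transaction duration" errors.
# Host/port/user below are placeholders -- substitute your own.
mysql -h 127.0.0.1 -P 4000 -u root -e "SET GLOBAL tidb_gc_life_time = '10h';"
# Verify the new value:
mysql -h 127.0.0.1 -P 4000 -u root -e "SHOW GLOBAL VARIABLES LIKE 'tidb_gc_life_time';"
```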
1. Export command:

2. Error message:

I suspect the export thread count is too high and it crashed the tidb node.
1. Try lowering the thread count.
2. Check whether the tidb node restarted.


The cluster status is normal. I'll try lowering the thread count.

The thread count has been lowered to 16, but the error persists.

Please upload the tidb log from around the time of the error.

I ran the dumpling command on an intermediate (jump) server. I'm attaching the tidb log now; please help analyze the cause, thanks! tidb.rar (2.0 MB)

What is the spec of the machine dumpling runs on? If it is not powerful, the default of 4 threads is recommended.

Also, how many rows match the export condition? Try adding the -r 10000 parameter to export in batches.
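The advice above can be sketched as a dumpling invocation like the following (host, credentials, database, table, and output directory are all placeholders, not the poster's actual values):

```shell
# Hypothetical dumpling command combining the suggestions:
# a modest thread count (-t 4) and row-based chunking (-r 10000)
# so each file/transaction stays small.
dumpling -h 127.0.0.1 -P 4000 -u root \
  -t 4 -r 10000 \
  --filetype sql \
  -B mydb -T mydb.mytable \
  -o /data/export
```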

From the logs so far, it is the dumpling side that closed the connection.

Intermediate server spec: 32 CPU cores, 64 GB RAM. The thread count is now set to 4 (it had always been 32 before), and -r is set to 1000, but the problem persists.
Figure 1: export command

Figure 2: export log (1)

Figure 3: export log (2)

Judging from the error, tidb probably restarted during the dumpling export (e.g. a tidb OOM). You can confirm this via the Grafana tidb -> uptime and tidb memory panels. How large is your tidb memory?
Also, the dumpling version should be >= the tidb version.
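The version comparison suggested above can be checked like this (connection details are placeholders):

```shell
# Print the local dumpling version.
dumpling --version
# Query the server-side TiDB version over the MySQL protocol.
mysql -h 127.0.0.1 -P 4000 -u root -e "SELECT tidb_version()\G"
```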


TiDB memory: three nodes, each with 16 cores / 32 GB.
Figure 1: uptime

TiDB version: v5.1.0
dumpling version: v5.1.0

Moreover, in my tests even exporting a single row reports the same error, so I wonder whether it is related to some cluster parameter.
Figure 1: export command

Figure 2: error message

Figure 3: cluster parameters.

config show all
{
"client-urls": "http://0.0.0.0:2379",
"peer-urls": "http://0.0.0.0:2380",
"advertise-client-urls": "http://x.x.x.x:2379",
"advertise-peer-urls": "http://x.x.x.x:2380",
"name": "pd-x.x.x.x-2379",
"data-dir": "/data/tidb-data/pd-2379",
"force-new-cluster": false,
"enable-grpc-gateway": true,
"initial-cluster": "pd-x.x.x.x-2379=http://x.x.x.x:2380,pd-x.x.x.x-2379=http://x.x.x.x:2380,pd-x.x.x.x-2379=http://x.x.x.x:2380",
"initial-cluster-state": "new",
"initial-cluster-token": "pd-cluster",
"join": "",
"lease": 3,
"log": {
"level": "",
"format": "text",
"disable-timestamp": false,
"file": {
"filename": "/data/tidb-deploy/pd-2379/log/pd.log",
"max-size": 300,
"max-days": 0,
"max-backups": 0
},
"development": false,
"disable-caller": false,
"disable-stacktrace": false,
"disable-error-verbose": true,
"sampling": null
},
"tso-save-interval": "3s",
"tso-update-physical-interval": "50ms",
"enable-local-tso": false,
"metric": {
"job": "pd-x.x.x.x-2379",
"address": "",
"interval": "15s"
},
"schedule": {
"max-snapshot-count": 48,
"max-pending-peer-count": 3,
"max-merge-region-size": 20,
"max-merge-region-keys": 200000,
"split-merge-interval": "1h0m0s",
"enable-one-way-merge": "false",
"enable-cross-table-merge": "true",
"patrol-region-interval": "100ms",
"max-store-down-time": "30m0s",
"leader-schedule-limit": 4,
"leader-schedule-policy": "count",
"region-schedule-limit": 2048,
"replica-schedule-limit": 64,
"merge-schedule-limit": 8,
"hot-region-schedule-limit": 4,
"hot-region-cache-hits-threshold": 3,
"store-limit": {
"1": {
"add-peer": 15,
"remove-peer": 15
},
"2": {
"add-peer": 15,
"remove-peer": 15
},
"4376258": {
"add-peer": 15,
"remove-peer": 15
},
"5": {
"add-peer": 15,
"remove-peer": 15
},
"56": {
"add-peer": 30,
"remove-peer": 30
},
"57": {
"add-peer": 30,
"remove-peer": 30
},
"6": {
"add-peer": 15,
"remove-peer": 15
},
"7": {
"add-peer": 15,
"remove-peer": 15
}
},
"tolerant-size-ratio": 0,
"low-space-ratio": 0.8,
"high-space-ratio": 0.7,
"region-score-formula-version": "v2",
"scheduler-max-waiting-operator": 5,
"enable-remove-down-replica": "true",
"enable-replace-offline-replica": "true",
"enable-make-up-replica": "true",
"enable-remove-extra-replica": "true",
"enable-location-replacement": "true",
"enable-debug-metrics": "false",
"enable-joint-consensus": "true",
"schedulers-v2": [
{
"type": "balance-region",
"args": null,
"disable": false,
"args-payload": ""
},
{
"type": "balance-leader",
"args": null,
"disable": false,
"args-payload": ""
},
{
"type": "hot-region",
"args": null,
"disable": false,
"args-payload": ""
},
{
"type": "label",
"args": null,
"disable": false,
"args-payload": ""
}
],
"schedulers-payload": {
"balance-hot-region-scheduler": null,
"balance-leader-scheduler": {
"name": "balance-leader-scheduler",
"ranges": [
{
"end-key": "",
"start-key": ""
}
]
},
"balance-region-scheduler": {
"name": "balance-region-scheduler",
"ranges": [
{
"end-key": "",
"start-key": ""
}
]
},
"label-scheduler": {
"name": "label-scheduler",
"ranges": [
{
"end-key": "",
"start-key": ""
}
]
}
},
"store-limit-mode": "manual"
},
"replication": {
"max-replicas": 3,
"location-labels": "",
"strictly-match-label": "false",
"enable-placement-rules": "true",
"isolation-level": ""
},
"pd-server": {
"use-region-storage": "true",
"max-gap-reset-ts": "24h0m0s",
"key-type": "table",
"runtime-services": "",
"metric-storage": "",
"dashboard-address": "http://x.x.x.x:2379",
"flow-round-by-digit": 3
},
"cluster-version": "5.1.0",
"labels": {},
"quota-backend-bytes": "8GiB",
"auto-compaction-mode": "periodic",
"auto-compaction-retention-v2": "1h",
"TickInterval": "500ms",
"ElectionInterval": "3s",
"PreVote": true,
"security": {
"cacert-path": "",
"cert-path": "",
"key-path": "",
"cert-allowed-cn": null,
"redact-info-log": false,
"encryption": {
"data-encryption-method": "plaintext",
"data-key-rotation-period": "168h0m0s",
"master-key": {
"type": "plaintext",
"key-id": "",
"region": "",
"endpoint": "",
"path": ""
}
}
},
"label-property": {},
"WarningMsgs": null,
"DisableStrictReconfigCheck": false,
"HeartbeatStreamBindInterval": "1m0s",
"LeaderPriorityCheckInterval": "1m0s",
"dashboard": {
"tidb-cacert-path": "",
"tidb-cert-path": "",
"tidb-key-path": "",
"public-path-prefix": "",
"internal-proxy": false,
"enable-telemetry": true,
"enable-experimental": false
},
"replication-mode": {
"replication-mode": "majority",
"dr-auto-sync": {
"label-key": "",
"primary": "",
"dr": "",
"primary-replicas": 0,
"dr-replicas": 0,
"wait-store-timeout": "1m0s",
"wait-sync-timeout": "1m0s",
"wait-async-timeout": "2m0s"
}
}
}

The error says the connection failed. On the dumpling machine, check whether you can connect to the database normally using the same connection info dumpling uses.
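A quick connectivity check from the dumpling host could look like this (host and port are placeholders):

```shell
# Check that the tidb-server MySQL port is reachable at the TCP level.
nc -vz 127.0.0.1 4000
# Then verify a full MySQL-protocol login works, not just the handshake.
mysql -h 127.0.0.1 -P 4000 -u root -e "SELECT 1;"
```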


Remote login and telnet to the database server both work fine.


Exporting another database works fine (even with threads set to 32 and -F set to 10240Mi), and I also tried running dumpling from a different server, with the same result.

Could you copy the table to be exported into another database, then run the same command against both and compare the behavior? This situation really shouldn't happen.

I ran the same command against the old environment and the new environment (same table, different data volume): the old one works, the new one fails. Could the problem be that this table's data volume is too large?

Yes, the table is too large. A few guesses:
1. An OOM occurred during the query. Check the tidb log for the "Welcome" startup banner (I don't have a rar extractor installed, so I can't read your attachment).
2. The query ran so long that dumpling timed out. Run the corresponding statement on tidb and see how long it takes.
3. The table data is somehow corrupted, but checking a table of 10-odd TB is painful...

Try mydumper instead.
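A mydumper invocation equivalent to the dumpling export above might look like this (a sketch only; host, credentials, database, table, and output directory are placeholders):

```shell
# Hypothetical mydumper export: same modest concurrency (-t 4)
# and row chunking (-r 10000) as the dumpling attempt.
mydumper -h 127.0.0.1 -P 4000 -u root \
  -t 4 -r 10000 \
  -B mydb -T mytable \
  -o /data/export
```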


Hello, please share the tidb-server log from when dumpling exported a single row. If there are multiple tidb-server instances and the export goes through a load balancer, please send the logs of all tidb-servers.


Exporting with mydumper works.