【TiDB environment】Production / Test / PoC
- Production
【TiDB version】
- v6.1.0
【Reproduction path】What operations led to the problem
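- All infrastructure runs on AWS EC2.
- Deployed a TiDB v6.1.0 cluster with tiup (I wanted to use TiUniManager, and TiUniManager supports TiDB v6.1.0 at most).
- Following the official documentation, deployed TiUniManager 1.0.2 on a separate EC2 instance and successfully took over the cluster (all workflow tasks succeeded).
- Since the cluster is deployed on AWS, I also wanted to back up the data directly to AWS S3, so I created a bucket in AWS and then changed the following settings through the OpenAPI (checked repeatedly, the values are correct):
  - BackupStorageType: s3 (default value, unchanged)
  - BackupStoragePath: my-db-bak/rel
  - BackupS3AccessKey: xxxxx
  - BackupS3SecretAccessKey: xxxxx
  - BackupS3Endpoint: https://s3.ap-southeast-1.amazonaws.com
- Set up a backup schedule (open the cluster -> Backup Management -> Backup Schedule). Once the scheduled time had passed, the backup turned out to have failed.
- Opened the cluster and clicked manual backup; that backup failed as well. The log output is included below.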
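To make the relationship between these settings and the storage address concrete, here is a minimal sketch of how such values could be combined into an S3 URL of the shape that appears in the logged SQL statement later in this post. It is not TiUniManager's actual code: the function name build_storage_url is hypothetical, the cluster directory and backup name are copied from the logs, and force-path-style=true is included only because it appears in the logged address.

```python
from urllib.parse import urlencode

def build_storage_url(storage_path, cluster_id, backup_name,
                      access_key, secret_key, endpoint):
    """Hypothetical helper: join backup settings into an s3:// storage URL."""
    params = {
        "access-key": access_key,
        "secret-access-key": secret_key,
        "endpoint": endpoint,
        # Assumption: copied from the logged address, not from the settings above.
        "force-path-style": "true",
    }
    # Sort the keys and percent-encode the values, as in the logged SQL statement.
    query = urlencode(sorted(params.items()))
    return f"s3://{storage_path}/{cluster_id}/{backup_name}/?{query}"

url = build_storage_url(
    storage_path="my-db-bak/rel",
    cluster_id="my-cluster",
    backup_name="2023-11-02-23-16-05-full",
    access_key="xxxxxx",
    secret_key="xxxxxx",
    endpoint="https://s3.ap-southeast-1.amazonaws.com",
)
print(url)
```

With the masked keys used here, the result has the same form as the address in the logged BACKUP statement, including the percent-encoded endpoint.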
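【Problem encountered: symptoms and impact】
- Screenshot of the failed TiUniManager workflow task:
- The backup never succeeds. The key log lines are as follows: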
{"Em-X-Trace-Id":"NsvRJ76rS6q5-wI0lHvhlg","level":"info","msg":"begin execute workflow BackupCluster, id fPZILNxURBWYmbBikcqM4w, node name backup","time":"2023-11-01T13:34:00+08:00"}
{"Em-X-Trace-Id":"NsvRJ76rS6q5-wI0lHvhlg","level":"info","msg":"begin backupCluster","time":"2023-11-01T13:34:00+08:00"}
{"Em-X-Trace-Id":"NsvRJ76rS6q5-wI0lHvhlg","level":"info","msg":"get cluster my-cluster tidb address from meta, [{IP:172.31.8.220 Port:4000} {IP:172.31.3.72 Port:4000}]","time":"2023-11-01T13:34:00+08:00"}
{"Em-X-Trace-Id":"NsvRJ76rS6q5-wI0lHvhlg","level":"info","msg":"get cluster my-cluster user info from meta","time":"2023-11-01T13:34:00+08:00"}
{"Em-X-Trace-Id":"NsvRJ76rS6q5-wI0lHvhlg","level":"info","msg":"begin do backup sql, request[{NodeID:kiHba6k-RYCf4Z5hCbqQug DbName: TableName: StorageAddress:s3://my-db-bak/rel/my-cluster/2023-11-01-13-33-56-full/?access-key=XXXXXXXXXXXXXXXXXXXX\\\u0026secret-access-key=aaaaaaaaaaaaaaaaaaaaaaaaaaa\\\u0026endpoint=https://s3.ap-southeast-1.amazonaws.com\\\u0026force-path-style=true DbConnParameter:{Username:EM_Backup_Restore Password:duw819273a IP:172.31.8.220 Port:4000} RateLimitM: Concurrency: CheckSum:}]","time":"2023-11-01T13:34:00+08:00"}
{"Em-X-Trace-Id":"NsvRJ76rS6q5-wI0lHvhlg","level":"info","msg":"begin exec backup sql, request: {NodeID:kiHba6k-RYCf4Z5hCbqQug DbName: TableName: StorageAddress:s3://my-db-bak/rel/my-cluster/2023-11-01-13-33-56-full/?access-key=XXXXXXXXXXXXXXXXXXXX\\\u0026secret-access-key=aaaaaaaaaaaaaaaaaaaaaaaaaaa\\\u0026endpoint=https://s3.ap-southeast-1.amazonaws.com\\\u0026force-path-style=true DbConnParameter:{Username:EM_Backup_Restore Password:duw819273a IP:172.31.8.220 Port:4000} RateLimitM: Concurrency: CheckSum:}, bizId: kiHba6k-RYCf4Z5hCbqQug","time":"2023-11-01T13:34:00+08:00"}
{"Em-X-Trace-Id":"NsvRJ76rS6q5-wI0lHvhlg","level":"error","msg":"query backup sql cmd failed Error 8124: Backup failed: error occurred when checking backupmeta file: BadRequest: Bad Request\n\tstatus code: 400, request id: S0VPEZTCA2M9P318, host id: GvO7VNT6L6Z/H7noQU7SwOaslyF+wpafwtavzgipLQIzBQBJ5xjZ+zuWFQIbgu11gfg0SfPO7OzTLsRhkVyoiw==","time":"2023-11-01T13:34:00+08:00"}
{"Em-X-Trace-Id":"NsvRJ76rS6q5-wI0lHvhlg","level":"error","msg":"call backup api failed, Error 8124: Backup failed: error occurred when checking backupmeta file: BadRequest: Bad Request\n\tstatus code: 400, request id: S0VPEZTCA2M9P318, host id: GvO7VNT6L6Z/H7noQU7SwOaslyF+wpafwtavzgipLQIzBQBJ5xjZ+zuWFQIbgu11gfg0SfPO7OzTLsRhkVyoiw==","time":"2023-11-01T13:34:00+08:00"}
{"Em-X-Trace-Id":"NsvRJ76rS6q5-wI0lHvhlg","level":"info","msg":"end backupCluster","time":"2023-11-01T13:34:00+08:00"}
{"Em-X-Trace-Id":"NsvRJ76rS6q5-wI0lHvhlg","level":"info","msg":"workflow fPZILNxURBWYmbBikcqM4w of bizId my-cluster do node backup failed, Error 8124: Backup failed: error occurred when checking backupmeta file: BadRequest: Bad Request\n\tstatus code: 400, request id: S0VPEZTCA2M9P318, host id: GvO7VNT6L6Z/H7noQU7SwOaslyF+wpafwtavzgipLQIzBQBJ5xjZ+zuWFQIbgu11gfg0SfPO7OzTLsRhkVyoiw==","time":"2023-11-01T13:34:00+08:00"}
{"Em-X-Trace-Id":"NsvRJ76rS6q5-wI0lHvhlg","level":"info","msg":"end execute workflow BackupCluster, id fPZILNxURBWYmbBikcqM4w, node name backup","time":"2023-11-01T13:34:00+08:00"}
{"Em-X-Trace-Id":"","level":"info","msg":"delete flow id fPZILNxURBWYmbBikcqM4w","time":"2023-11-01T13:34:00+08:00"}
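Because every log entry is a single JSON object per line, the failing entries can be pulled out mechanically. The following is a small sketch of that idea. The function name error_messages is my own, and the sample holds two entries taken from the log above, with the error message abridged (the status code, request id and host id tail is left out).

```python
import json

def error_messages(log_text):
    """Return the msg field of every error-level entry in JSON-lines log text."""
    messages = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if isinstance(entry, dict) and entry.get("level") == "error":
            messages.append(entry.get("msg", ""))
    return messages

# Two entries from the log above; the error message is abridged.
SAMPLE = "\n".join([
    '{"Em-X-Trace-Id":"NsvRJ76rS6q5-wI0lHvhlg","level":"info",'
    '"msg":"begin backupCluster","time":"2023-11-01T13:34:00+08:00"}',
    '{"Em-X-Trace-Id":"NsvRJ76rS6q5-wI0lHvhlg","level":"error",'
    '"msg":"call backup api failed, Error 8124: Backup failed: error occurred '
    'when checking backupmeta file: BadRequest: Bad Request",'
    '"time":"2023-11-01T13:34:00+08:00"}',
])

for msg in error_messages(SAMPLE):
    print(msg)
```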
【Resource configuration】
- TiDB * 2
- PD * 3
- TiKV * 3
- Control * 1
- Haproxy * 1
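【Troubleshooting】
This is what I found on my own. There are parts I don't understand, and this project has only just started using TiDB, so everything below is personal guesswork. Please go easy on me.
First, a note:
I have already backed up to AWS S3 successfully with br from the TiUniManager machine, which shows that the access-key and secret-access-key are fine and that the permissions are in place.
Here is the TiUniManager source repository, which I refer to below: TiUniManager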
Question 1: in the printed StorageAddress, every parameter is followed by an extra backslash
- The structure looks like this:
access-key=xxxxx\&secret-access-key=xxaaddww\&endpoint=xxaa\
- The original log entry:
StorageAddress:s3://my-db-bak/rel/my-cluster/2023-11-01-13-33-56-full/?access-key=XXXXXXXXXXXXXXXXXXXX\&secret-access-key=aaaaaaaaaaaaaaaaaaaaaaaaaaa\&endpoint=https://s3.ap-southeast-1.amazonaws.com\&force-path-style=true
- I looked at the corresponding place in the source (/micro-cluster/cluster/backuprestore/executor.go, line 314):
- I can't tell whether concatenating this backslash has any effect.
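One thing that can be checked independently of the source code is what the raw JSON log text decodes to. In the raw log line the separator is written as two backslashes followed by \u0026. The sketch below decodes a fragment of that line with a standard JSON parser. Only the two parameters shown are included, and the variable names are my own.

```python
import json

# Fragment of the "msg" value, character for character as in the raw JSON log line.
# A raw string literal is used so Python itself does not interpret the backslashes.
raw = r'"access-key=XXXXXXXXXXXXXXXXXXXX\\\u0026secret-access-key=aaaaaaaaaaaaaaaaaaaaaaaaaaa"'

decoded = json.loads(raw)
print(decoded)

# In JSON, \\ decodes to one literal backslash and \u0026 decodes to '&'.
print("\\&" in decoded)
```

The \u0026 is only the JSON escape for & (Go's encoding/json escapes & this way by default), so it is a logging detail. The backslash, however, survives decoding, which means it really is part of the StorageAddress string at the point where it was logged.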
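Question 2: is there a problem with the backup SQL?
- I looked at where the error is logged and found that it is raised while executing the SQL (db.QueryRow(brSQLCmd).Scan()). The corresponding source location is /util/api/tidb/sql/backuprestore.go, line 113:
- I wanted to print the SQL statement but had no way to do so, so I went through the logs in TiUniManager and found the SQL that had just been executed (the xxxx in it was not substituted by me; it was already like that when I copied it):
The SQL is: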
BACKUP DATABASE * TO 's3://my-db-bak/rel/my-cluster/2023-11-02-23-16-05-full/?access-key=xxxxxx&endpoint=https%3A%2F%2Fs3.ap-southeast-1.amazonaws.com&force-path-style=true&secret-access-key=xxxxxx'
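- Newbie question: can this SQL be executed in mysql? I copied it to the command line and ran it, and got the following error, which is the same as the one in the log: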
Error Code: 8124. Backup failed: error occurred when checking backupmeta file: BadRequest: Bad Request status code: 400, request id: QHW82FH2KB55JRTV, host id: +iAXuNsdn7AxxdS5WDMj6OYhHiySwUlwrVat33DnV2RSZYYhyEGI6UwkR1YRoVXO9EBpih6AWPs=
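As a sanity check on the statement itself, the storage URL can be taken apart with a standard URL parser to see which parameters it carries and what the percent-encoded endpoint decodes to. This is only a sketch using the masked URL from the statement above. It checks that the URL is well formed and says nothing about whether S3 accepts the request.

```python
from urllib.parse import urlsplit, parse_qs

# Storage URL copied from the logged BACKUP statement (keys are masked in the log).
STORAGE_URL = (
    "s3://my-db-bak/rel/my-cluster/2023-11-02-23-16-05-full/"
    "?access-key=xxxxxx&endpoint=https%3A%2F%2Fs3.ap-southeast-1.amazonaws.com"
    "&force-path-style=true&secret-access-key=xxxxxx"
)

parts = urlsplit(STORAGE_URL)
# parse_qs percent-decodes values and returns a list per key; keep the first value.
params = {key: values[0] for key, values in parse_qs(parts.query).items()}

print("bucket:", parts.netloc)
print("prefix:", parts.path)
for key in sorted(params):
    print(key, "=", params[key])
```

The parameters present are access-key, endpoint, force-path-style and secret-access-key, and the endpoint decodes back to the plain https URL that was configured.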
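If there is really no way to make this work, I will have to take a different route:
- stop backing up to s3 and try nfs instead;
- give up on TiUniManager backups and use br instead;
- build and deploy from source myself, then look for a fix there.
Does anyone in the community have a solution? Am I using it wrong, or does TiUniManager not support backing up to S3?
I have been struggling with this for two days and am stuck. I would be grateful if someone could take a look.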