TiUniManager backup to AWS S3 fails

[TiDB environment] Production / Test / PoC

  • Production

[TiDB version]

  • v6.1.0

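[Steps to reproduce] What operations led to the problem

  • The whole environment runs on Amazon AWS EC2.
  • Deployed a TiDB v6.1.0 cluster with tiup (because I wanted to use TiUniManager, and TiUniManager supports TiDB v6.1.0 at most).
  • Following the official documentation, deployed TiUniManager 1.0.2 on a separate EC2 instance and successfully took over the cluster (all workflow tasks succeeded).
  • Since the cluster is deployed on AWS, I wanted to back up the data directly to AWS S3 as well, so I created a bucket in AWS and then used the OpenAPI to modify the following configuration (I checked it repeatedly and it is correct):
  • Set up a backup schedule under the cluster –> Backup Management –> Backup Schedule; when the scheduled time arrived, the backup failed.
  • Opened the cluster and clicked manual backup; the backup failed the same way. The log output is below.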

[Problem: symptoms and impact]

  • Screenshot of the failed TiUniManager workflow task:

  • The backup does not succeed; the key logs are as follows:

{"Em-X-Trace-Id":"NsvRJ76rS6q5-wI0lHvhlg","level":"info","msg":"begin execute workflow BackupCluster, id fPZILNxURBWYmbBikcqM4w, node name backup","time":"2023-11-01T13:34:00+08:00"}
{"Em-X-Trace-Id":"NsvRJ76rS6q5-wI0lHvhlg","level":"info","msg":"begin backupCluster","time":"2023-11-01T13:34:00+08:00"}
{"Em-X-Trace-Id":"NsvRJ76rS6q5-wI0lHvhlg","level":"info","msg":"get cluster my-cluster tidb address from meta, [{IP:172.31.8.220 Port:4000} {IP:172.31.3.72 Port:4000}]","time":"2023-11-01T13:34:00+08:00"}
{"Em-X-Trace-Id":"NsvRJ76rS6q5-wI0lHvhlg","level":"info","msg":"get cluster my-cluster user info from meta","time":"2023-11-01T13:34:00+08:00"}
{"Em-X-Trace-Id":"NsvRJ76rS6q5-wI0lHvhlg","level":"info","msg":"begin do backup sql, request[{NodeID:kiHba6k-RYCf4Z5hCbqQug DbName: TableName: StorageAddress:s3://my-db-bak/rel/my-cluster/2023-11-01-13-33-56-full/?access-key=XXXXXXXXXXXXXXXXXXXX\\\u0026secret-access-key=aaaaaaaaaaaaaaaaaaaaaaaaaaa\\\u0026endpoint=https://s3.ap-southeast-1.amazonaws.com\\\u0026force-path-style=true DbConnParameter:{Username:EM_Backup_Restore Password:duw819273a IP:172.31.8.220 Port:4000} RateLimitM: Concurrency: CheckSum:}]","time":"2023-11-01T13:34:00+08:00"}
{"Em-X-Trace-Id":"NsvRJ76rS6q5-wI0lHvhlg","level":"info","msg":"begin exec backup sql, request: {NodeID:kiHba6k-RYCf4Z5hCbqQug DbName: TableName: StorageAddress:s3://my-db-bak/rel/my-cluster/2023-11-01-13-33-56-full/?access-key=XXXXXXXXXXXXXXXXXXXX\\\u0026secret-access-key=aaaaaaaaaaaaaaaaaaaaaaaaaaa\\\u0026endpoint=https://s3.ap-southeast-1.amazonaws.com\\\u0026force-path-style=true DbConnParameter:{Username:EM_Backup_Restore Password:duw819273a IP:172.31.8.220 Port:4000} RateLimitM: Concurrency: CheckSum:}, bizId: kiHba6k-RYCf4Z5hCbqQug","time":"2023-11-01T13:34:00+08:00"}
{"Em-X-Trace-Id":"NsvRJ76rS6q5-wI0lHvhlg","level":"error","msg":"query backup sql cmd failed Error 8124: Backup failed: error occurred when checking backupmeta file: BadRequest: Bad Request\n\tstatus code: 400, request id: S0VPEZTCA2M9P318, host id: GvO7VNT6L6Z/H7noQU7SwOaslyF+wpafwtavzgipLQIzBQBJ5xjZ+zuWFQIbgu11gfg0SfPO7OzTLsRhkVyoiw==","time":"2023-11-01T13:34:00+08:00"}
{"Em-X-Trace-Id":"NsvRJ76rS6q5-wI0lHvhlg","level":"error","msg":"call backup api failed, Error 8124: Backup failed: error occurred when checking backupmeta file: BadRequest: Bad Request\n\tstatus code: 400, request id: S0VPEZTCA2M9P318, host id: GvO7VNT6L6Z/H7noQU7SwOaslyF+wpafwtavzgipLQIzBQBJ5xjZ+zuWFQIbgu11gfg0SfPO7OzTLsRhkVyoiw==","time":"2023-11-01T13:34:00+08:00"}
{"Em-X-Trace-Id":"NsvRJ76rS6q5-wI0lHvhlg","level":"info","msg":"end backupCluster","time":"2023-11-01T13:34:00+08:00"}
{"Em-X-Trace-Id":"NsvRJ76rS6q5-wI0lHvhlg","level":"info","msg":"workflow fPZILNxURBWYmbBikcqM4w of bizId my-cluster do node backup failed, Error 8124: Backup failed: error occurred when checking backupmeta file: BadRequest: Bad Request\n\tstatus code: 400, request id: S0VPEZTCA2M9P318, host id: GvO7VNT6L6Z/H7noQU7SwOaslyF+wpafwtavzgipLQIzBQBJ5xjZ+zuWFQIbgu11gfg0SfPO7OzTLsRhkVyoiw==","time":"2023-11-01T13:34:00+08:00"}
{"Em-X-Trace-Id":"NsvRJ76rS6q5-wI0lHvhlg","level":"info","msg":"end execute workflow BackupCluster, id fPZILNxURBWYmbBikcqM4w, node name backup","time":"2023-11-01T13:34:00+08:00"}
{"Em-X-Trace-Id":"","level":"info","msg":"delete flow id fPZILNxURBWYmbBikcqM4w","time":"2023-11-01T13:34:00+08:00"}

[Resource configuration]

  • TiDB * 2
  • PD * 3
  • TiKV * 3
  • Control * 1
  • Haproxy * 1

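[Troubleshooting so far]

This is what I found on my own. There are parts I don't understand; this project has only just started using TiDB, so the following is purely my own guesswork. Please go easy on me…

First, a note:
I have already backed up to AWS S3 successfully with br from the TiUniManager machine, which shows that the access-key and secret-access-key are fine and have the required permissions.

Here is the TiUniManager source repository, which I refer to below: TiUniManager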

Question 1: every parameter in the printed StorageAddress is followed by an extra backslash

  • The structure looks like this:
access-key=xxxxx\&secret-access-key=xxaaddww\&endpoint=xxaa\
  • The original log line:
StorageAddress:s3://my-db-bak/rel/my-cluster/2023-11-01-13-33-56-full/?access-key=XXXXXXXXXXXXXXXXXXXX\&secret-access-key=aaaaaaaaaaaaaaaaaaaaaaaaaaa\&endpoint=https://s3.ap-southeast-1.amazonaws.com\&force-path-style=true  

  • I looked at the corresponding source (/micro-cluster/cluster/backuprestore/executor.go, line 314):

  • I'm not sure whether appending this backslash has any effect.
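For reading the raw log: in the JSON file the separator appears as `\\\u0026`, which is a JSON-escaped backslash (`\\`) followed by a JSON-escaped ampersand (`\u0026`). Go's `encoding/json` escapes `&` this way by default; that TiUniManager's logger relies on it is an assumption here. The short Python check below, using placeholder values rather than the real URL, shows that the decoded message really does contain a literal backslash before each `&`. It says nothing about whether that backslash causes the failure.

```python
import json

# Raw JSON text as it would appear in the log file (shortened, placeholder values).
raw = r'{"msg":"s3://bucket/prefix/?access-key=AK\\\u0026secret-access-key=SK"}'

# json.loads turns "\\" into one backslash and "\u0026" into "&".
msg = json.loads(raw)["msg"]
print(msg)

# Count the literal backslash-ampersand separators in the decoded string.
print(msg.count("\\&"))
```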

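Question 2: is there something wrong with the backup SQL?

  1. I looked at where the log reports the error and found that the failure occurs while executing the SQL (db.QueryRow(brSQLCmd).Scan()); the corresponding source location is /util/api/tidb/sql/backuprestore.go, line 113:

  2. I wanted to print the SQL statement but had no way to do so, so I went through the logs from TiUniManager and found the SQL that had just been executed. Its content is below (the xxxx placeholders are not my edits; they were already there when I copied it):

The SQL is: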

BACKUP DATABASE * TO 's3://my-db-bak/rel/my-cluster/2023-11-02-23-16-05-full/?access-key=xxxxxx&endpoint=https%3A%2F%2Fs3.ap-southeast-1.amazonaws.com&force-path-style=true&secret-access-key=xxxxxx'
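  3. Beginner question: can this SQL be executed from a mysql client? I pasted it into the command line and ran it, and got the following error, the same as the one in the logs: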
Error Code: 8124. Backup failed: error occurred when checking backupmeta file: BadRequest: Bad Request  status code: 400, request id: QHW82FH2KB55JRTV, host id: +iAXuNsdn7AxxdS5WDMj6OYhHiySwUlwrVat33DnV2RSZYYhyEGI6UwkR1YRoVXO9EBpih6AWPs=
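To see exactly what the logged statement would send, the storage URL can be split into its parts. The sketch below uses only Python's standard `urllib.parse` on the URL copied from the SQL above. It shows that the percent-encoded `endpoint` decodes to the expected regional S3 endpoint, and that `access-key` and `secret-access-key` are literally `xxxxxx` in the logged text. If that is log redaction (an assumption, not confirmed in this thread), running the copied statement by hand would send placeholder credentials rather than the real ones, so the manual test may not reproduce what TiUniManager actually executed.

```python
from urllib.parse import parse_qs, urlsplit

# Storage URL copied from the logged BACKUP statement above.
storage = (
    "s3://my-db-bak/rel/my-cluster/2023-11-02-23-16-05-full/"
    "?access-key=xxxxxx"
    "&endpoint=https%3A%2F%2Fs3.ap-southeast-1.amazonaws.com"
    "&force-path-style=true"
    "&secret-access-key=xxxxxx"
)

parts = urlsplit(storage)
# parse_qs percent-decodes values; each key appears once, so take the first value.
params = {key: values[0] for key, values in parse_qs(parts.query).items()}

print("bucket:", parts.netloc)
print("prefix:", parts.path)
for key, value in sorted(params.items()):
    print(f"{key} = {value}")
```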

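If there is really no way around this, I'll have to take another route:

  1. Don't back up to S3; try NFS instead;
  2. Give up on TiUniManager backups and switch to br;
  3. Build and deploy from source myself and look for a fix;

Could the community experts advise: is there a solution? Am I using it wrong, or does TiUniManager not support backing up to S3?

I've been struggling with this for two days and I'm stuck. I'd appreciate it if someone could take a look :sob: :sob: :sob: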

Try using br directly

See this:
https://docs.pingcap.com/zh/tidb/stable/br-snapshot-guide

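I haven't really paid attention to TiUniManager… :rofl:

I've really never used TiUniManager...

The tiunimanager repository was last updated in April this year, with only 4 commits this year; not sure whether it is still maintained.

If you need a management platform of some kind, take a look at the commercial edition of TiDB, which includes TEM.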

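:joy: :joy: Thanks for the reply. I downgraded from 7.1.2 to 6.1.0 precisely so that I could use TiUniManager, and didn't expect this... In principle an officially released tool shouldn't have problems. I've spent two days on this and I'm stuck :sweat_smile: :sweat_smile:

Thanks for the reply. One question: without TiUniManager, what do you use for cluster monitoring?

Dashboard and Grafana are enough for cluster monitoring.

Thanks for the reply :pray: :pray:. Surely the official team wouldn't release a half-finished product... The documentation devotes a lot of space to this tool; if that's the case I'll be furious, since I downgraded specifically for it :sob: :sob:. Also, is there a link describing TEM? I couldn't find anything when I searched.

Bumping my own thread. Does nobody need TiUniManager? :sob: :sob: :sob: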

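The commercial edition may better match your expectations: tiem

I don't see any introduction to TIEM for the commercial edition either? Everything is deployed now; the only thing missing is backup :sob: :sob:

See here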


The community edition's features are only those described in the link above…