TiDB real-time profiling

鲁班看海 · October 20, 2021, 01:52

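To help us work efficiently, please provide the following information; a clearly described problem gets resolved faster:
[TiDB environment]
TiDB cluster version 5.2.1, deployed with TiUP
2 TiDB machines
3 PD machines
4 TiKV machines: three 8C32G and one 8C64G (the higher-spec machine was added in a later scale-out)
2 TiFlash machines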

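[Overview] Scenario + problem summary
In the Dashboard, under Advanced Debugging ==> Real-Time Profiling, three of the TiKV machines show an error status. The error is: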
{"error":true,"message":"error.api.other: record not found","code":"error.api.other","full_text":"error.api.other: record not found\n at github.com/pingcap/tidb-dashboard/pkg/apiserver/utils.NewAPIError()\n \t/nfs/cache/mod/github.com/pingcap/tidb-dashboard@v0.0.0-20210826074103-29034af68525/pkg/apiserver/utils/error.go:67\n at github.com/pingcap/tidb-dashboard/pkg/apiserver/utils.MWHandleErrors.func1()\n \t/nfs/cache/mod/github.com/pingcap/tidb-dashboard@v0.0.0-20210826074103-29034af68525/pkg/apiserver/utils/error.go:96\n at github.com/gin-gonic/gin.(*Context).Next()\n \t/nfs/cache/mod/github.com/gin-gonic/gin@v1.5.0/context.go:147\n at github.com/gin-contrib/gzip.Gzip.func2()\n \t/nfs/cache/mod/github.com/gin-contrib/gzip@v0.0.1/gzip.go:47\n at github.com/gin-gonic/gin.(*Context).Next()\n \t/nfs/cache/mod/github.com/gin-gonic/gin@v1.5.0/context.go:147\n at github.com/gin-gonic/gin.RecoveryWithWriter.func1()\n \t/nfs/cache/mod/github.com/gin-gonic/gin@v1.5.0/recovery.go:83\n at github.com/gin-gonic/gin.(*Context).Next()\n \t/nfs/cache/mod/github.com/gin-gonic/gin@v1.5.0/context.go:147\n at github.com/gin-gonic/gin.(*Engine).handleHTTPRequest()\n \t/nfs/cache/mod/github.com/gin-gonic/gin@v1.5.0/gin.go:403\n at github.com/gin-gonic/gin.(*Engine).ServeHTTP()\n \t/nfs/cache/mod/github.com/gin-gonic/gin@v1.5.0/gin.go:364\n at github.com/pingcap/tidb-dashboard/pkg/apiserver.(*Service).handler()\n \t/nfs/cache/mod/github.com/pingcap/tidb-dashboard@v0.0.0-20210826074103-29034af68525/pkg/apiserver/apiserver.go:208\n at net/http.HandlerFunc.ServeHTTP()\n \t/usr/local/go/src/net/http/server.go:2069\n at github.com/pingcap/tidb-dashboard/pkg/utils.(*ServiceStatus).NewStatusAwareHandler.func1()\n \t/nfs/cache/mod/github.com/pingcap/tidb-dashboard@v0.0.0-20210826074103-29034af68525/pkg/utils/service_status.go:79\n at net/http.HandlerFunc.ServeHTTP()\n \t/usr/local/go/src/net/http/server.go:2069\n at net/http.(*ServeMux).ServeHTTP()\n \t/usr/local/go/src/net/http/server.go:2448\n at go.etcd.io/etcd/embed.(*accessController).ServeHTTP()\n \t/nfs/cache/mod/go.etcd.io/etcd@v0.5.0-alpha.5.0.20191023171146-3cf2f69b5738/embed/serve.go:359\n at net/http.serverHandler.ServeHTTP()\n \t/usr/local/go/src/net/http/server.go:2887\n at net/http.(*conn).serve()\n \t/usr/local/go/src/net/http/server.go:1952\n at runtime.goexit()\n \t/usr/local/go/src/runtime/asm_amd64.s:1371"}
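
(Editor's sketch) The error body above is hard to read as a single line. A minimal Python sketch of how such a response could be split into its message, code, and stack frames follows. It assumes the body is valid JSON with straight quotes and that frames in `full_text` are separated by escaped newlines (`\n`); the function name `summarize_dashboard_error` and the abbreviated `SAMPLE` (only the first two frames of the trace) are illustrative, not part of TiDB Dashboard.

```python
import json

# Abbreviated copy of the error body from this post (first two frames only).
# Assumption: frames are separated by "\n" escapes inside "full_text".
SAMPLE = (
    r'{"error":true,"message":"error.api.other: record not found",'
    r'"code":"error.api.other",'
    r'"full_text":"error.api.other: record not found'
    r'\n at github.com/pingcap/tidb-dashboard/pkg/apiserver/utils.NewAPIError()'
    r'\n \t/nfs/cache/mod/github.com/pingcap/tidb-dashboard@v0.0.0-20210826074103-29034af68525/pkg/apiserver/utils/error.go:67'
    r'\n at github.com/pingcap/tidb-dashboard/pkg/apiserver/utils.MWHandleErrors.func1()'
    r'\n \t/nfs/cache/mod/github.com/pingcap/tidb-dashboard@v0.0.0-20210826074103-29034af68525/pkg/apiserver/utils/error.go:96"}'
)


def summarize_dashboard_error(raw: str) -> dict:
    """Split an error body into message, code, and a list of stack frames."""
    body = json.loads(raw)
    # Drop empty lines and surrounding spaces/tabs from each trace line.
    lines = [ln.strip() for ln in body.get("full_text", "").split("\n") if ln.strip()]
    frames = []
    current = None
    for ln in lines[1:]:  # the first line repeats the message
        if ln.startswith("at "):
            # "at <function>" opens a new frame.
            current = {"function": ln[3:], "location": None}
            frames.append(current)
        elif current is not None and current["location"] is None:
            # The line after "at ..." holds "<file>:<line>".
            current["location"] = ln
    return {"message": body.get("message"), "code": body.get("code"), "frames": frames}


if __name__ == "__main__":
    summary = summarize_dashboard_error(SAMPLE)
    print(summary["code"], "|", summary["message"])
    for frame in summary["frames"]:
        print(frame["function"], "->", frame["location"])
```

Read this way, the topmost frames (`NewAPIError`, `MWHandleErrors`) appear to belong to the Dashboard's error-handling middleware, so the trace likely shows where the error was wrapped rather than where "record not found" originated; the TiKV logs would still be needed to explain the hang itself.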

[Background] Operations performed
Started real-time profiling from the Dashboard console, within 60s

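[Symptoms] Application and database symptoms
On 3 of the TiKV machines the API request failed; the exact error is the one shown in the overview above.
Those 3 TiKV nodes then went offline, which made the cluster unavailable and brought down large parts of the application.
At the time, resource usage on TiKV was low across the board (by the look of it, the processes were hung rather than overloaded).
Restarting the TiDB cluster resolved the problem.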

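[Problem] Current issue
Every time real-time profiling is started, the TiDB cluster's performance jitters. Real-time profiling evidently performs operations on the TiDB cluster, and its impact on the cluster itself is severe.
[Business impact]
Application queries time out (the timeout is 5s), and large parts of the business go down.
[TiDB version]
5.2.1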

yilong · October 21, 2021, 06:37

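Which directory is your TiKV deployed in? Is that directory on an LVM volume?
@Hacker_BpBoDoOJ Could you please provide the TiKV logs covering the period from before the restart to after it? Thank you very much.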

Billmay表妹 · October 25, 2021, 09:11

Has the problem been resolved?

system · October 31, 2022, 19:16

This topic was automatically closed 1 minute after the last reply. New replies are no longer allowed.