v6.5.1 Dashboard errors

【TiDB environment】Production
【TiDB version】v6.5.1
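After upgrading our clusters to 6.5.1, two of them started having Dashboard problems. Top SQL was switched off automatically and cannot be turned back on.
Screenshot of the Dashboard error: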

Logging into the database, I found that the variable tidb_enable_top_sql is OFF. Even after setting it to ON, Top SQL still cannot be enabled.

After the upgrade, ng.log shows that Top SQL was actively shut down. Log:

[2023/04/07 15:56:22.504 +08:00] [INFO] [pdvariable.go:110] ["load global config"] [cfg="{\"EnableTopSQL\":true}"]
[2023/04/07 15:57:22.504 +08:00] [INFO] [pdvariable.go:110] ["load global config"] [cfg="{\"EnableTopSQL\":true}"]
[2023/04/07 15:58:22.504 +08:00] [INFO] [pdvariable.go:110] ["load global config"] [cfg="{\"EnableTopSQL\":true}"]
[2023/04/07 15:58:52.517 +08:00] [WARN] [client.go:107] ["Request failed"] [kindTag=PD] [url=http://10.105.129.19:2429/pd/api/v1/members] [responseStatus="503 Service Unavailable"] [responseBody="no leader"] [error="http_client.server_error: GET http://10.105.129.19:2429/pd/api/v1/members (PD): Response status 503"] [errorVerbose="http_client.server_error: GET http://10.105.129.19:2429/pd/api/v1/members (PD): Response status 503\n at github.com/pingcap/tidb-dashboard/util/client/httpclient.(*Client).handleAfterResponseHook()\n\t/go/pkg/mod/github.com/pingcap/tidb-dashboard/util@v0.0.0-20211014081729-82f8b809f5ae/client/httpclient/client.go:81\n at github.com/go-resty/resty/v2.(*Client).execute()\n\t/go/pkg/mod/github.com/go-resty/resty/v2@v2.6.0/client.go:947\n at github.com/go-resty/resty/v2.(*Request).Execute()\n\t/go/pkg/mod/github.com/go-resty/resty/v2@v2.6.0/request.go:729\n at github.com/pingcap/tidb-dashboard/util/client/httpclient.(*Request).Execute()\n\t/go/pkg/mod/github.com/pingcap/tidb-dashboard/util@v0.0.0-20211014081729-82f8b809f5ae/client/httpclient/request.go:102\n at github.com/pingcap/tidb-dashboard/util/client/httpclient.(*Request).Get()\n\t/go/pkg/mod/github.com/pingcap/tidb-dashboard/util@v0.0.0-20211014081729-82f8b809f5ae/client/httpclient/request.go:76\n at github.com/pingcap/tidb-dashboard/util/client/pdclient.(*APIClient).GetMembers()\n\t/go/pkg/mod/github.com/pingcap/tidb-dashboard/util@v0.0.0-20211014081729-82f8b809f5ae/client/pdclient/pd_api.go:43\n at github.com/pingcap/tidb-dashboard/util/topo.GetPDInstances()\n\t/go/pkg/mod/github.com/pingcap/tidb-dashboard/util@v0.0.0-20211014081729-82f8b809f5ae/topo/pd.go:28\n at github.com/pingcap/ng-monitoring/component/topology.(*TopologyDiscoverer).getPDComponents()\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/ng-monitoring/component/topology/discovery.go:168\n at github.com/pingcap/ng-monitoring/component/topology.(*TopologyDiscoverer).fetchAllScrapeTargets()\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/ng-monitoring/component/topology/discovery.go:130\n at github.com/pingcap/ng-monitoring/component/topology.(*TopologyDiscoverer).fetchTopology()\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/ng-monitoring/component/topology/discovery.go:95\n at github.com/pingcap/ng-monitoring/component/topology.(*TopologyDiscoverer).loadTopologyLoop()\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/ng-monitoring/component/topology/discovery.go:81\n at github.com/pingcap/ng-monitoring/utils.GoWithRecovery()\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/ng-monitoring/utils/misc.go:26\n at runtime.goexit()\n\t/usr/local/go/src/runtime/asm_amd64.s:1571"]
[2023/04/07 15:58:52.517 +08:00] [ERROR] [discovery.go:83] ["load topology failed"] [error="http_client.server_error: GET http://10.105.129.19:2429/pd/api/v1/members (PD): Response status 503"] [errorVerbose="http_client.server_error: GET http://10.105.129.19:2429/pd/api/v1/members (PD): Response status 503\n at github.com/pingcap/tidb-dashboard/util/client/httpclient.(*Client).handleAfterResponseHook()\n\t/go/pkg/mod/github.com/pingcap/tidb-dashboard/util@v0.0.0-20211014081729-82f8b809f5ae/client/httpclient/client.go:81\n at github.com/go-resty/resty/v2.(*Client).execute()\n\t/go/pkg/mod/github.com/go-resty/resty/v2@v2.6.0/client.go:947\n at github.com/go-resty/resty/v2.(*Request).Execute()\n\t/go/pkg/mod/github.com/go-resty/resty/v2@v2.6.0/request.go:729\n at github.com/pingcap/tidb-dashboard/util/client/httpclient.(*Request).Execute()\n\t/go/pkg/mod/github.com/pingcap/tidb-dashboard/util@v0.0.0-20211014081729-82f8b809f5ae/client/httpclient/request.go:102\n at github.com/pingcap/tidb-dashboard/util/client/httpclient.(*Request).Get()\n\t/go/pkg/mod/github.com/pingcap/tidb-dashboard/util@v0.0.0-20211014081729-82f8b809f5ae/client/httpclient/request.go:76\n at github.com/pingcap/tidb-dashboard/util/client/pdclient.(*APIClient).GetMembers()\n\t/go/pkg/mod/github.com/pingcap/tidb-dashboard/util@v0.0.0-20211014081729-82f8b809f5ae/client/pdclient/pd_api.go:43\n at github.com/pingcap/tidb-dashboard/util/topo.GetPDInstances()\n\t/go/pkg/mod/github.com/pingcap/tidb-dashboard/util@v0.0.0-20211014081729-82f8b809f5ae/topo/pd.go:28\n at github.com/pingcap/ng-monitoring/component/topology.(*TopologyDiscoverer).getPDComponents()\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/ng-monitoring/component/topology/discovery.go:168\n at github.com/pingcap/ng-monitoring/component/topology.(*TopologyDiscoverer).fetchAllScrapeTargets()\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/ng-monitoring/component/topology/discovery.go:130\n at github.com/pingcap/ng-monitoring/component/topology.(*TopologyDiscoverer).fetchTopology()\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/ng-monitoring/component/topology/discovery.go:95\n at github.com/pingcap/ng-monitoring/component/topology.(*TopologyDiscoverer).loadTopologyLoop()\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/ng-monitoring/component/topology/discovery.go:81\n at github.com/pingcap/ng-monitoring/utils.GoWithRecovery()\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/ng-monitoring/utils/misc.go:26\n at runtime.goexit()\n\t/usr/local/go/src/runtime/asm_amd64.s:1571"] [stack="github.com/pingcap/ng-monitoring/component/topology.(*TopologyDiscoverer).loadTopologyLoop\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/ng-monitoring/component/topology/discovery.go:83\ngithub.com/pingcap/ng-monitoring/utils.GoWithRecovery\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/ng-monitoring/utils/misc.go:26"]
[2023/04/07 15:58:56.038 +08:00] [INFO] [pdvariable.go:116] ["global config watch channel closed"]
[2023/04/07 15:59:22.505 +08:00] [INFO] [pdvariable.go:110] ["load global config"] [cfg="{\"EnableTopSQL\":true}"]
[2023/04/07 16:00:11.063 +08:00] [WARN] [scraper.go:248] ["failed to dial scrape target"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.129.94\",\"port\":20213,\"status_port\":20232}"] [error="context deadline exceeded"]
[2023/04/07 16:00:13.064 +08:00] [WARN] [scraper.go:236] ["retry to scrape component"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.129.94\",\"port\":20213,\"status_port\":20232}"] [retried=1]
[2023/04/07 16:00:18.065 +08:00] [WARN] [scraper.go:248] ["failed to dial scrape target"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.129.94\",\"port\":20213,\"status_port\":20232}"] [error="context deadline exceeded"]
[2023/04/07 16:00:22.066 +08:00] [WARN] [scraper.go:236] ["retry to scrape component"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.129.94\",\"port\":20213,\"status_port\":20232}"] [retried=2]
[2023/04/07 16:00:22.504 +08:00] [INFO] [pdvariable.go:110] ["load global config"] [cfg="{\"EnableTopSQL\":true}"]
[2023/04/07 16:00:22.530 +08:00] [WARN] [scraper.go:248] ["failed to dial scrape target"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.129.94\",\"port\":20213,\"status_port\":20232}"] [error="context canceled"]
[2023/04/07 16:00:22.530 +08:00] [INFO] [scraper.go:71] ["stop scraping Top SQL from the component"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.129.94\",\"port\":20213,\"status_port\":20232}"]
[2023/04/07 16:00:52.531 +08:00] [INFO] [scraper.go:68] ["starting to scrape Top SQL from the component"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.129.94\",\"port\":20213,\"status_port\":20232}"]
[2023/04/07 16:00:57.241 +08:00] [WARN] [scraper.go:248] ["failed to dial scrape target"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.128.77\",\"port\":20213,\"status_port\":20232}"] [error="context deadline exceeded"]
[2023/04/07 16:00:59.242 +08:00] [WARN] [scraper.go:236] ["retry to scrape component"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.128.77\",\"port\":20213,\"status_port\":20232}"] [retried=1]
[2023/04/07 16:01:04.242 +08:00] [WARN] [scraper.go:248] ["failed to dial scrape target"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.128.77\",\"port\":20213,\"status_port\":20232}"] [error="context deadline exceeded"]
[2023/04/07 16:01:08.242 +08:00] [WARN] [scraper.go:236] ["retry to scrape component"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.128.77\",\"port\":20213,\"status_port\":20232}"] [retried=2]
[2023/04/07 16:01:22.505 +08:00] [INFO] [pdvariable.go:110] ["load global config"] [cfg="{\"EnableTopSQL\":true}"]
[2023/04/07 16:01:48.965 +08:00] [WARN] [scraper.go:248] ["failed to dial scrape target"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.129.9\",\"port\":20213,\"status_port\":20232}"] [error="context deadline exceeded"]
[2023/04/07 16:01:50.966 +08:00] [WARN] [scraper.go:236] ["retry to scrape component"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.129.9\",\"port\":20213,\"status_port\":20232}"] [retried=1]
[2023/04/07 16:01:55.966 +08:00] [WARN] [scraper.go:248] ["failed to dial scrape target"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.129.9\",\"port\":20213,\"status_port\":20232}"] [error="context deadline exceeded"]
[2023/04/07 16:01:59.967 +08:00] [WARN] [scraper.go:236] ["retry to scrape component"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.129.9\",\"port\":20213,\"status_port\":20232}"] [retried=2]
[2023/04/07 16:02:22.505 +08:00] [INFO] [pdvariable.go:110] ["load global config"] [cfg="{\"EnableTopSQL\":true}"]
[2023/04/07 16:02:22.976 +08:00] [WARN] [scraper.go:248] ["failed to dial scrape target"] [component="{\"name\":\"tidb\",\"ip\":\"10.105.128.164\",\"port\":5740,\"status_port\":10132}"] [error="context deadline exceeded"]
[2023/04/07 16:02:24.977 +08:00] [WARN] [scraper.go:236] ["retry to scrape component"] [component="{\"name\":\"tidb\",\"ip\":\"10.105.128.164\",\"port\":5740,\"status_port\":10132}"] [retried=1]
[2023/04/07 16:02:29.977 +08:00] [WARN] [scraper.go:248] ["failed to dial scrape target"] [component="{\"name\":\"tidb\",\"ip\":\"10.105.128.164\",\"port\":5740,\"status_port\":10132}"] [error="context deadline exceeded"]
[2023/04/07 16:02:33.978 +08:00] [WARN] [scraper.go:236] ["retry to scrape component"] [component="{\"name\":\"tidb\",\"ip\":\"10.105.128.164\",\"port\":5740,\"status_port\":10132}"] [retried=2]
[2023/04/07 16:02:38.979 +08:00] [WARN] [scraper.go:248] ["failed to dial scrape target"] [component="{\"name\":\"tidb\",\"ip\":\"10.105.128.164\",\"port\":5740,\"status_port\":10132}"] [error="context deadline exceeded"]
[2023/04/07 16:02:46.979 +08:00] [WARN] [scraper.go:236] ["retry to scrape component"] [component="{\"name\":\"tidb\",\"ip\":\"10.105.128.164\",\"port\":5740,\"status_port\":10132}"] [retried=3]
[2023/04/07 16:02:51.101 +08:00] [WARN] [scraper.go:248] ["failed to dial scrape target"] [component="{\"name\":\"tidb\",\"ip\":\"10.105.129.127\",\"port\":5740,\"status_port\":10132}"] [error="context deadline exceeded"]
[2023/04/07 16:02:53.102 +08:00] [WARN] [scraper.go:236] ["retry to scrape component"] [component="{\"name\":\"tidb\",\"ip\":\"10.105.129.127\",\"port\":5740,\"status_port\":10132}"] [retried=1]
[2023/04/07 16:02:58.103 +08:00] [WARN] [scraper.go:248] ["failed to dial scrape target"] [component="{\"name\":\"tidb\",\"ip\":\"10.105.129.127\",\"port\":5740,\"status_port\":10132}"] [error="context deadline exceeded"]
[2023/04/07 16:03:02.103 +08:00] [WARN] [scraper.go:236] ["retry to scrape component"] [component="{\"name\":\"tidb\",\"ip\":\"10.105.129.127\",\"port\":5740,\"status_port\":10132}"] [retried=2]
[2023/04/07 16:03:06.278 +08:00] [WARN] [scraper.go:265] ["failed to call Subscribe"] [component="{\"name\":\"tidb\",\"ip\":\"10.33.32.47\",\"port\":5740,\"status_port\":10132}"] [error="rpc error: code = Unavailable desc = transport is closing"]
[2023/04/07 16:03:08.279 +08:00] [WARN] [scraper.go:236] ["retry to scrape component"] [component="{\"name\":\"tidb\",\"ip\":\"10.33.32.47\",\"port\":5740,\"status_port\":10132}"] [retried=1]
[2023/04/07 16:03:11.982 +08:00] [INFO] [main.go:108] ["received signal"] [sig=terminated]
[2023/04/07 16:03:11.982 +08:00] [INFO] [http.go:79] ["shutting down http server"]
[2023/04/07 16:03:11.983 +08:00] [INFO] [http.go:81] ["http server is down"]
[2023/04/07 16:03:11.983 +08:00] [INFO] [subscriber.go:48] ["stopping Top SQL scrapers"]
[2023/04/07 16:03:11.983 +08:00] [WARN] [scraper.go:265] ["failed to call Subscribe"] [component="{\"name\":\"tidb\",\"ip\":\"10.105.129.127\",\"port\":5740,\"status_port\":10132}"] [error="rpc error: code = Canceled desc = context canceled"]
[2023/04/07 16:03:11.983 +08:00] [INFO] [scraper.go:71] ["stop scraping Top SQL from the component"] [component="{\"name\":\"tidb\",\"ip\":\"10.105.129.127\",\"port\":5740,\"status_port\":10132}"]
[2023/04/07 16:03:11.983 +08:00] [WARN] [scraper.go:265] ["failed to call Subscribe"] [component="{\"name\":\"tidb\",\"ip\":\"10.105.128.164\",\"port\":5740,\"status_port\":10132}"] [error="rpc error: code = Canceled desc = context canceled"]
[2023/04/07 16:03:11.983 +08:00] [WARN] [scraper.go:248] ["failed to dial scrape target"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.128.77\",\"port\":20213,\"status_port\":20232}"] [error="context canceled"]
[2023/04/07 16:03:11.983 +08:00] [WARN] [scraper.go:248] ["failed to dial scrape target"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.129.9\",\"port\":20213,\"status_port\":20232}"] [error="context canceled"]
[2023/04/07 16:03:11.983 +08:00] [WARN] [scraper.go:248] ["failed to dial scrape target"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.129.94\",\"port\":20213,\"status_port\":20232}"] [error="context canceled"]
[2023/04/07 16:03:11.983 +08:00] [INFO] [scraper.go:71] ["stop scraping Top SQL from the component"] [component="{\"name\":\"tidb\",\"ip\":\"10.105.128.164\",\"port\":5740,\"status_port\":10132}"]
[2023/04/07 16:03:11.983 +08:00] [INFO] [scraper.go:71] ["stop scraping Top SQL from the component"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.129.94\",\"port\":20213,\"status_port\":20232}"]
[2023/04/07 16:03:11.983 +08:00] [INFO] [scraper.go:71] ["stop scraping Top SQL from the component"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.128.77\",\"port\":20213,\"status_port\":20232}"]
[2023/04/07 16:03:11.983 +08:00] [INFO] [scraper.go:71] ["stop scraping Top SQL from the component"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.129.9\",\"port\":20213,\"status_port\":20232}"]
[2023/04/07 16:03:11.983 +08:00] [WARN] [scraper.go:265] ["failed to call Subscribe"] [component="{\"name\":\"tidb\",\"ip\":\"10.33.32.48\",\"port\":5740,\"status_port\":10132}"] [error="rpc error: code = Canceled desc = context canceled"]
[2023/04/07 16:03:11.983 +08:00] [INFO] [scraper.go:71] ["stop scraping Top SQL from the component"] [component="{\"name\":\"tidb\",\"ip\":\"10.33.32.48\",\"port\":5740,\"status_port\":10132}"]
[2023/04/07 16:03:11.983 +08:00] [WARN] [scraper.go:265] ["failed to call Subscribe"] [component="{\"name\":\"tidb\",\"ip\":\"10.33.32.47\",\"port\":5740,\"status_port\":10132}"] [error="rpc error: code = Canceled desc = context canceled"]
[2023/04/07 16:03:11.983 +08:00] [INFO] [scraper.go:71] ["stop scraping Top SQL from the component"] [component="{\"name\":\"tidb\",\"ip\":\"10.33.32.47\",\"port\":5740,\"status_port\":10132}"]
[2023/04/07 16:03:11.983 +08:00] [INFO] [subscriber.go:51] ["stop Top SQL scrapers successfully"]
[2023/04/07 16:03:11.983 +08:00] [INFO] [database.go:20] ["Stopping timeseries database"]
[2023/04/07 16:03:12.031 +08:00] [INFO] [database.go:22] ["Stop timeseries database successfully"]
[2023/04/07 16:03:12.031 +08:00] [INFO] [database.go:24] ["Stopping document database"]
[2023/04/07 16:03:12.031 +08:00] [INFO] [gc.go:23] ["badger stop running value log gc loop"]
[2023/04/07 16:03:12.052 +08:00] [INFO] [database.go:26] ["Stop document database successfully"]
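To see at a glance which instances ng-monitoring keeps failing to reach in a log like the one above, the "failed to dial scrape target" and "failed to call Subscribe" warnings can be tallied per instance. This is only a rough Python sketch that assumes the exact line format shown in this log; count_scrape_failures is a hypothetical helper, not part of TiDB, TiUP, or ng-monitoring.

```python
import json
import re
from collections import Counter

# Hypothetical helper. Assumes the scraper.go warning format seen in this ng.log:
#   ["failed to dial scrape target"] [component="{\"name\":...}"] ...
FAILURE_RE = re.compile(
    r'\["(?:failed to dial scrape target|failed to call Subscribe)"\] '
    r'\[component="(?P<comp>\{.*?\})"\]'
)


def count_scrape_failures(log_lines):
    """Count scrape failures per (component name, ip, status_port)."""
    counts = Counter()
    for line in log_lines:
        m = FAILURE_RE.search(line)
        if not m:
            continue  # not a scrape-failure line (e.g. "retry to scrape component")
        # The component JSON is embedded with escaped quotes (\"), so unescape it first.
        comp = json.loads(m.group("comp").replace('\\"', '"'))
        counts[(comp["name"], comp["ip"], comp["status_port"])] += 1
    return counts


# Usage: with open("ng.log") as f: print(count_scrape_failures(f).most_common())
```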

How can this be fixed?

Also, on another cluster, the Dashboard keeps picking up a very old Prometheus address, so the monitoring data fails to load, as shown below:


The list of online instances also shows the very old addresses.

How can this be fixed?

https://docs.pingcap.com/zh/tidb/stable/dashboard-faq#界面提示-集群中未启动必要组件-ngmonitoring
Try deploying it by following this.

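I did, and after a lot of fiddling it still doesn't work; ng_port and everything else is already configured. The point is that it worked fine on the previous version, so why was it turned off after the upgrade?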

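So it is installed? Is the ng process running? Check the ng log for errors; normally it just works once deployed.
Maybe clusters upgraded from an older version run into problems here, I'm not sure.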

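Already configured, but still getting "The required component NgMonitoring is not started in the cluster, some features will be unavailable." - TiDB Technical Issues - TiDB Q&A Community (asktug.com)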

NgMonitoring fails to start - TiDB Technical Issues / Deployment & Ops Management - TiDB Q&A Community (asktug.com)
Take a look at this.

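Looks like it's solved. On every cluster upgraded to 6.5.1, the ng-monitoring config file was generated incorrectly: the elements of endpoint were not separated by commas. Add the commas and restart Prometheus (with TiUP, NgMonitoring is deployed as part of the Prometheus/monitoring_servers role), and it works again.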
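If you have many clusters to check, a small script can flag the broken array before you restart. This is a minimal Python sketch under assumptions: the thread does not show the generated file, so the shape endpoints = ["host:port" "host:port"] is assumed, the addresses in the test are made up, and both helpers are hypothetical. An empty-string element ("") would be reported as a false positive.

```python
import re

# Assumed broken shape (not confirmed by the thread):
#   endpoints = ["10.0.0.1:2379" "10.0.0.2:2379"]
ENDPOINTS_RE = re.compile(
    r'(?m)^(?P<head>\s*endpoints\s*=\s*\[)(?P<body>[^\]]*)(?P<tail>\])'
)
# A closing quote followed (after optional whitespace) directly by an opening quote.
MISSING_COMMA_RE = re.compile(r'"\s*"')


def has_missing_comma(config_text: str) -> bool:
    """Return True if any endpoints = [...] array lacks a comma between elements."""
    return any(MISSING_COMMA_RE.search(m.group("body"))
               for m in ENDPOINTS_RE.finditer(config_text))


def add_missing_commas(config_text: str) -> str:
    """Return a copy of the config with commas inserted between adjacent string elements."""
    def fix(m: re.Match) -> str:
        body = MISSING_COMMA_RE.sub('", "', m.group("body"))
        return m.group("head") + body + m.group("tail")
    return ENDPOINTS_RE.sub(fix, config_text)
```

Editing the file by hand this way is only a stopgap; the fix from TiUP itself (see the next reply) is the cleaner route, since a later reload would regenerate the file.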

TiUP v1.12.1 has been released and fixes this issue. After upgrading TiUP, run tiup cluster reload xxx -R prometheus to resolve it.


OK, thanks a lot.
