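[TiDB environment] Production
[TiDB version] v6.5.1
After upgrading our clusters to 6.5.1, two of them show abnormal dashboard behavior: Top SQL was turned off automatically and cannot be turned back on.
The dashboard error screenshot is as follows:
After logging in to the database, I found that the variable tidb_enable_top_sql was OFF. Setting it to ON still does not enable Top SQL.
After the upgrade finished, I found that ng.log shows Top SQL being shut down on its own. The log is as follows: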
[2023/04/07 15:56:22.504 +08:00] [INFO] [pdvariable.go:110] ["load global config"] [cfg="{\"EnableTopSQL\":true}"]
[2023/04/07 15:57:22.504 +08:00] [INFO] [pdvariable.go:110] ["load global config"] [cfg="{\"EnableTopSQL\":true}"]
[2023/04/07 15:58:22.504 +08:00] [INFO] [pdvariable.go:110] ["load global config"] [cfg="{\"EnableTopSQL\":true}"]
[2023/04/07 15:58:52.517 +08:00] [WARN] [client.go:107] ["Request failed"] [kindTag=PD] [url=http://10.105.129.19:2429/pd/api/v1/members] [responseStatus="503 Service Unavailable"] [responseBody="no leader"] [error="http_client.server_error: GET http://10.105.129.19:2429/pd/api/v1/members (PD): Response status 503"] [errorVerbose="http_client.server_error: GET http://10.105.129.19:2429/pd/api/v1/members (PD): Response status 503\n at github.com/pingcap/tidb-dashboard/util/client/httpclient.(*Client).handleAfterResponseHook()\n\t/go/pkg/mod/github.com/pingcap/tidb-dashboard/util@v0.0.0-20211014081729-82f8b809f5ae/client/httpclient/client.go:81\n at github.com/go-resty/resty/v2.(*Client).execute()\n\t/go/pkg/mod/github.com/go-resty/resty/v2@v2.6.0/client.go:947\n at github.com/go-resty/resty/v2.(*Request).Execute()\n\t/go/pkg/mod/github.com/go-resty/resty/v2@v2.6.0/request.go:729\n at github.com/pingcap/tidb-dashboard/util/client/httpclient.(*Request).Execute()\n\t/go/pkg/mod/github.com/pingcap/tidb-dashboard/util@v0.0.0-20211014081729-82f8b809f5ae/client/httpclient/request.go:102\n at github.com/pingcap/tidb-dashboard/util/client/httpclient.(*Request).Get()\n\t/go/pkg/mod/github.com/pingcap/tidb-dashboard/util@v0.0.0-20211014081729-82f8b809f5ae/client/httpclient/request.go:76\n at github.com/pingcap/tidb-dashboard/util/client/pdclient.(*APIClient).GetMembers()\n\t/go/pkg/mod/github.com/pingcap/tidb-dashboard/util@v0.0.0-20211014081729-82f8b809f5ae/client/pdclient/pd_api.go:43\n at github.com/pingcap/tidb-dashboard/util/topo.GetPDInstances()\n\t/go/pkg/mod/github.com/pingcap/tidb-dashboard/util@v0.0.0-20211014081729-82f8b809f5ae/topo/pd.go:28\n at github.com/pingcap/ng-monitoring/component/topology.(*TopologyDiscoverer).getPDComponents()\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/ng-monitoring/component/topology/discovery.go:168\n at 
github.com/pingcap/ng-monitoring/component/topology.(*TopologyDiscoverer).fetchAllScrapeTargets()\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/ng-monitoring/component/topology/discovery.go:130\n at github.com/pingcap/ng-monitoring/component/topology.(*TopologyDiscoverer).fetchTopology()\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/ng-monitoring/component/topology/discovery.go:95\n at github.com/pingcap/ng-monitoring/component/topology.(*TopologyDiscoverer).loadTopologyLoop()\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/ng-monitoring/component/topology/discovery.go:81\n at github.com/pingcap/ng-monitoring/utils.GoWithRecovery()\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/ng-monitoring/utils/misc.go:26\n at runtime.goexit()\n\t/usr/local/go/src/runtime/asm_amd64.s:1571"]
[2023/04/07 15:58:52.517 +08:00] [ERROR] [discovery.go:83] ["load topology failed"] [error="http_client.server_error: GET http://10.105.129.19:2429/pd/api/v1/members (PD): Response status 503"] [errorVerbose="http_client.server_error: GET http://10.105.129.19:2429/pd/api/v1/members (PD): Response status 503\n at github.com/pingcap/tidb-dashboard/util/client/httpclient.(*Client).handleAfterResponseHook()\n\t/go/pkg/mod/github.com/pingcap/tidb-dashboard/util@v0.0.0-20211014081729-82f8b809f5ae/client/httpclient/client.go:81\n at github.com/go-resty/resty/v2.(*Client).execute()\n\t/go/pkg/mod/github.com/go-resty/resty/v2@v2.6.0/client.go:947\n at github.com/go-resty/resty/v2.(*Request).Execute()\n\t/go/pkg/mod/github.com/go-resty/resty/v2@v2.6.0/request.go:729\n at github.com/pingcap/tidb-dashboard/util/client/httpclient.(*Request).Execute()\n\t/go/pkg/mod/github.com/pingcap/tidb-dashboard/util@v0.0.0-20211014081729-82f8b809f5ae/client/httpclient/request.go:102\n at github.com/pingcap/tidb-dashboard/util/client/httpclient.(*Request).Get()\n\t/go/pkg/mod/github.com/pingcap/tidb-dashboard/util@v0.0.0-20211014081729-82f8b809f5ae/client/httpclient/request.go:76\n at github.com/pingcap/tidb-dashboard/util/client/pdclient.(*APIClient).GetMembers()\n\t/go/pkg/mod/github.com/pingcap/tidb-dashboard/util@v0.0.0-20211014081729-82f8b809f5ae/client/pdclient/pd_api.go:43\n at github.com/pingcap/tidb-dashboard/util/topo.GetPDInstances()\n\t/go/pkg/mod/github.com/pingcap/tidb-dashboard/util@v0.0.0-20211014081729-82f8b809f5ae/topo/pd.go:28\n at github.com/pingcap/ng-monitoring/component/topology.(*TopologyDiscoverer).getPDComponents()\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/ng-monitoring/component/topology/discovery.go:168\n at github.com/pingcap/ng-monitoring/component/topology.(*TopologyDiscoverer).fetchAllScrapeTargets()\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/ng-monitoring/component/topology/discovery.go:130\n at 
github.com/pingcap/ng-monitoring/component/topology.(*TopologyDiscoverer).fetchTopology()\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/ng-monitoring/component/topology/discovery.go:95\n at github.com/pingcap/ng-monitoring/component/topology.(*TopologyDiscoverer).loadTopologyLoop()\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/ng-monitoring/component/topology/discovery.go:81\n at github.com/pingcap/ng-monitoring/utils.GoWithRecovery()\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/ng-monitoring/utils/misc.go:26\n at runtime.goexit()\n\t/usr/local/go/src/runtime/asm_amd64.s:1571"] [stack="github.com/pingcap/ng-monitoring/component/topology.(*TopologyDiscoverer).loadTopologyLoop\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/ng-monitoring/component/topology/discovery.go:83\ngithub.com/pingcap/ng-monitoring/utils.GoWithRecovery\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/ng-monitoring/utils/misc.go:26"]
[2023/04/07 15:58:56.038 +08:00] [INFO] [pdvariable.go:116] ["global config watch channel closed"]
[2023/04/07 15:59:22.505 +08:00] [INFO] [pdvariable.go:110] ["load global config"] [cfg="{\"EnableTopSQL\":true}"]
[2023/04/07 16:00:11.063 +08:00] [WARN] [scraper.go:248] ["failed to dial scrape target"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.129.94\",\"port\":20213,\"status_port\":20232}"] [error="context deadline exceeded"]
[2023/04/07 16:00:13.064 +08:00] [WARN] [scraper.go:236] ["retry to scrape component"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.129.94\",\"port\":20213,\"status_port\":20232}"] [retried=1]
[2023/04/07 16:00:18.065 +08:00] [WARN] [scraper.go:248] ["failed to dial scrape target"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.129.94\",\"port\":20213,\"status_port\":20232}"] [error="context deadline exceeded"]
[2023/04/07 16:00:22.066 +08:00] [WARN] [scraper.go:236] ["retry to scrape component"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.129.94\",\"port\":20213,\"status_port\":20232}"] [retried=2]
[2023/04/07 16:00:22.504 +08:00] [INFO] [pdvariable.go:110] ["load global config"] [cfg="{\"EnableTopSQL\":true}"]
[2023/04/07 16:00:22.530 +08:00] [WARN] [scraper.go:248] ["failed to dial scrape target"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.129.94\",\"port\":20213,\"status_port\":20232}"] [error="context canceled"]
[2023/04/07 16:00:22.530 +08:00] [INFO] [scraper.go:71] ["stop scraping Top SQL from the component"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.129.94\",\"port\":20213,\"status_port\":20232}"]
[2023/04/07 16:00:52.531 +08:00] [INFO] [scraper.go:68] ["starting to scrape Top SQL from the component"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.129.94\",\"port\":20213,\"status_port\":20232}"]
[2023/04/07 16:00:57.241 +08:00] [WARN] [scraper.go:248] ["failed to dial scrape target"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.128.77\",\"port\":20213,\"status_port\":20232}"] [error="context deadline exceeded"]
[2023/04/07 16:00:59.242 +08:00] [WARN] [scraper.go:236] ["retry to scrape component"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.128.77\",\"port\":20213,\"status_port\":20232}"] [retried=1]
[2023/04/07 16:01:04.242 +08:00] [WARN] [scraper.go:248] ["failed to dial scrape target"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.128.77\",\"port\":20213,\"status_port\":20232}"] [error="context deadline exceeded"]
[2023/04/07 16:01:08.242 +08:00] [WARN] [scraper.go:236] ["retry to scrape component"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.128.77\",\"port\":20213,\"status_port\":20232}"] [retried=2]
[2023/04/07 16:01:22.505 +08:00] [INFO] [pdvariable.go:110] ["load global config"] [cfg="{\"EnableTopSQL\":true}"]
[2023/04/07 16:01:48.965 +08:00] [WARN] [scraper.go:248] ["failed to dial scrape target"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.129.9\",\"port\":20213,\"status_port\":20232}"] [error="context deadline exceeded"]
[2023/04/07 16:01:50.966 +08:00] [WARN] [scraper.go:236] ["retry to scrape component"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.129.9\",\"port\":20213,\"status_port\":20232}"] [retried=1]
[2023/04/07 16:01:55.966 +08:00] [WARN] [scraper.go:248] ["failed to dial scrape target"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.129.9\",\"port\":20213,\"status_port\":20232}"] [error="context deadline exceeded"]
[2023/04/07 16:01:59.967 +08:00] [WARN] [scraper.go:236] ["retry to scrape component"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.129.9\",\"port\":20213,\"status_port\":20232}"] [retried=2]
[2023/04/07 16:02:22.505 +08:00] [INFO] [pdvariable.go:110] ["load global config"] [cfg="{\"EnableTopSQL\":true}"]
[2023/04/07 16:02:22.976 +08:00] [WARN] [scraper.go:248] ["failed to dial scrape target"] [component="{\"name\":\"tidb\",\"ip\":\"10.105.128.164\",\"port\":5740,\"status_port\":10132}"] [error="context deadline exceeded"]
[2023/04/07 16:02:24.977 +08:00] [WARN] [scraper.go:236] ["retry to scrape component"] [component="{\"name\":\"tidb\",\"ip\":\"10.105.128.164\",\"port\":5740,\"status_port\":10132}"] [retried=1]
[2023/04/07 16:02:29.977 +08:00] [WARN] [scraper.go:248] ["failed to dial scrape target"] [component="{\"name\":\"tidb\",\"ip\":\"10.105.128.164\",\"port\":5740,\"status_port\":10132}"] [error="context deadline exceeded"]
[2023/04/07 16:02:33.978 +08:00] [WARN] [scraper.go:236] ["retry to scrape component"] [component="{\"name\":\"tidb\",\"ip\":\"10.105.128.164\",\"port\":5740,\"status_port\":10132}"] [retried=2]
[2023/04/07 16:02:38.979 +08:00] [WARN] [scraper.go:248] ["failed to dial scrape target"] [component="{\"name\":\"tidb\",\"ip\":\"10.105.128.164\",\"port\":5740,\"status_port\":10132}"] [error="context deadline exceeded"]
[2023/04/07 16:02:46.979 +08:00] [WARN] [scraper.go:236] ["retry to scrape component"] [component="{\"name\":\"tidb\",\"ip\":\"10.105.128.164\",\"port\":5740,\"status_port\":10132}"] [retried=3]
[2023/04/07 16:02:51.101 +08:00] [WARN] [scraper.go:248] ["failed to dial scrape target"] [component="{\"name\":\"tidb\",\"ip\":\"10.105.129.127\",\"port\":5740,\"status_port\":10132}"] [error="context deadline exceeded"]
[2023/04/07 16:02:53.102 +08:00] [WARN] [scraper.go:236] ["retry to scrape component"] [component="{\"name\":\"tidb\",\"ip\":\"10.105.129.127\",\"port\":5740,\"status_port\":10132}"] [retried=1]
[2023/04/07 16:02:58.103 +08:00] [WARN] [scraper.go:248] ["failed to dial scrape target"] [component="{\"name\":\"tidb\",\"ip\":\"10.105.129.127\",\"port\":5740,\"status_port\":10132}"] [error="context deadline exceeded"]
[2023/04/07 16:03:02.103 +08:00] [WARN] [scraper.go:236] ["retry to scrape component"] [component="{\"name\":\"tidb\",\"ip\":\"10.105.129.127\",\"port\":5740,\"status_port\":10132}"] [retried=2]
[2023/04/07 16:03:06.278 +08:00] [WARN] [scraper.go:265] ["failed to call Subscribe"] [component="{\"name\":\"tidb\",\"ip\":\"10.33.32.47\",\"port\":5740,\"status_port\":10132}"] [error="rpc error: code = Unavailable desc = transport is closing"]
[2023/04/07 16:03:08.279 +08:00] [WARN] [scraper.go:236] ["retry to scrape component"] [component="{\"name\":\"tidb\",\"ip\":\"10.33.32.47\",\"port\":5740,\"status_port\":10132}"] [retried=1]
[2023/04/07 16:03:11.982 +08:00] [INFO] [main.go:108] ["received signal"] [sig=terminated]
[2023/04/07 16:03:11.982 +08:00] [INFO] [http.go:79] ["shutting down http server"]
[2023/04/07 16:03:11.983 +08:00] [INFO] [http.go:81] ["http server is down"]
[2023/04/07 16:03:11.983 +08:00] [INFO] [subscriber.go:48] ["stopping Top SQL scrapers"]
[2023/04/07 16:03:11.983 +08:00] [WARN] [scraper.go:265] ["failed to call Subscribe"] [component="{\"name\":\"tidb\",\"ip\":\"10.105.129.127\",\"port\":5740,\"status_port\":10132}"] [error="rpc error: code = Canceled desc = context canceled"]
[2023/04/07 16:03:11.983 +08:00] [INFO] [scraper.go:71] ["stop scraping Top SQL from the component"] [component="{\"name\":\"tidb\",\"ip\":\"10.105.129.127\",\"port\":5740,\"status_port\":10132}"]
[2023/04/07 16:03:11.983 +08:00] [WARN] [scraper.go:265] ["failed to call Subscribe"] [component="{\"name\":\"tidb\",\"ip\":\"10.105.128.164\",\"port\":5740,\"status_port\":10132}"] [error="rpc error: code = Canceled desc = context canceled"]
[2023/04/07 16:03:11.983 +08:00] [WARN] [scraper.go:248] ["failed to dial scrape target"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.128.77\",\"port\":20213,\"status_port\":20232}"] [error="context canceled"]
[2023/04/07 16:03:11.983 +08:00] [WARN] [scraper.go:248] ["failed to dial scrape target"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.129.9\",\"port\":20213,\"status_port\":20232}"] [error="context canceled"]
[2023/04/07 16:03:11.983 +08:00] [WARN] [scraper.go:248] ["failed to dial scrape target"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.129.94\",\"port\":20213,\"status_port\":20232}"] [error="context canceled"]
[2023/04/07 16:03:11.983 +08:00] [INFO] [scraper.go:71] ["stop scraping Top SQL from the component"] [component="{\"name\":\"tidb\",\"ip\":\"10.105.128.164\",\"port\":5740,\"status_port\":10132}"]
[2023/04/07 16:03:11.983 +08:00] [INFO] [scraper.go:71] ["stop scraping Top SQL from the component"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.129.94\",\"port\":20213,\"status_port\":20232}"]
[2023/04/07 16:03:11.983 +08:00] [INFO] [scraper.go:71] ["stop scraping Top SQL from the component"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.128.77\",\"port\":20213,\"status_port\":20232}"]
[2023/04/07 16:03:11.983 +08:00] [INFO] [scraper.go:71] ["stop scraping Top SQL from the component"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.129.9\",\"port\":20213,\"status_port\":20232}"]
[2023/04/07 16:03:11.983 +08:00] [WARN] [scraper.go:265] ["failed to call Subscribe"] [component="{\"name\":\"tidb\",\"ip\":\"10.33.32.48\",\"port\":5740,\"status_port\":10132}"] [error="rpc error: code = Canceled desc = context canceled"]
[2023/04/07 16:03:11.983 +08:00] [INFO] [scraper.go:71] ["stop scraping Top SQL from the component"] [component="{\"name\":\"tidb\",\"ip\":\"10.33.32.48\",\"port\":5740,\"status_port\":10132}"]
[2023/04/07 16:03:11.983 +08:00] [WARN] [scraper.go:265] ["failed to call Subscribe"] [component="{\"name\":\"tidb\",\"ip\":\"10.33.32.47\",\"port\":5740,\"status_port\":10132}"] [error="rpc error: code = Canceled desc = context canceled"]
[2023/04/07 16:03:11.983 +08:00] [INFO] [scraper.go:71] ["stop scraping Top SQL from the component"] [component="{\"name\":\"tidb\",\"ip\":\"10.33.32.47\",\"port\":5740,\"status_port\":10132}"]
[2023/04/07 16:03:11.983 +08:00] [INFO] [subscriber.go:51] ["stop Top SQL scrapers successfully"]
[2023/04/07 16:03:11.983 +08:00] [INFO] [database.go:20] ["Stopping timeseries database"]
[2023/04/07 16:03:12.031 +08:00] [INFO] [database.go:22] ["Stop timeseries database successfully"]
[2023/04/07 16:03:12.031 +08:00] [INFO] [database.go:24] ["Stopping document database"]
[2023/04/07 16:03:12.031 +08:00] [INFO] [gc.go:23] ["badger stop running value log gc loop"]
[2023/04/07 16:03:12.052 +08:00] [INFO] [database.go:26] ["Stop document database successfully"]
How can this problem be resolved?
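For anyone reading through a log like the one above, here is a rough Python sketch that tallies the WARN/ERROR entries per component and collects the `EnableTopSQL` value reported by each `load global config` line. It only assumes the line layout visible in this excerpt (`[time] [LEVEL] [file:line] ["message"] [key="value"] ...`); the names `summarize`, `SAMPLE`, and the regexes are made up for illustration and are not part of ng-monitoring. The sample lines are copied from the log above.

```python
import json
import re
from collections import Counter

# Assumed layout, taken from the excerpt: [time] [LEVEL] [file:line] ["message"] ...
HEADER = re.compile(
    r'^\[(?P<ts>[^\]]+)\] \[(?P<level>[A-Z]+)\] \[(?P<src>[^\]]+)\] \["(?P<msg>[^"]*)"\]'
)
# Field values are JSON objects stored as escaped strings, e.g. {\"name\":\"tikv\",...}
COMPONENT = re.compile(r'\[component="((?:[^"\\]|\\.)*)"\]')
CFG = re.compile(r'\[cfg="((?:[^"\\]|\\.)*)"\]')


def _decode(raw):
    """Undo the log's string escaping, then parse the embedded JSON object."""
    return json.loads(json.loads('"' + raw + '"'))


def summarize(lines):
    """Return (failure counts per (name, ip, message), list of (time, EnableTopSQL))."""
    failures = Counter()
    top_sql_states = []
    for line in lines:
        head = HEADER.match(line)
        if head is None:
            continue  # wrapped stack-trace continuation or unrelated text
        if head["msg"] == "load global config":
            cfg = CFG.search(line)
            if cfg:
                value = _decode(cfg.group(1)).get("EnableTopSQL")
                top_sql_states.append((head["ts"], value))
        elif head["level"] in ("WARN", "ERROR"):
            comp = COMPONENT.search(line)
            if comp:
                c = _decode(comp.group(1))
                failures[(c["name"], c["ip"], head["msg"])] += 1
    return failures, top_sql_states


# Four lines copied verbatim from the ng.log excerpt above.
SAMPLE = [
    r'[2023/04/07 16:00:11.063 +08:00] [WARN] [scraper.go:248] ["failed to dial scrape target"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.129.94\",\"port\":20213,\"status_port\":20232}"] [error="context deadline exceeded"]',
    r'[2023/04/07 16:00:13.064 +08:00] [WARN] [scraper.go:236] ["retry to scrape component"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.129.94\",\"port\":20213,\"status_port\":20232}"] [retried=1]',
    r'[2023/04/07 16:00:18.065 +08:00] [WARN] [scraper.go:248] ["failed to dial scrape target"] [component="{\"name\":\"tikv\",\"ip\":\"10.105.129.94\",\"port\":20213,\"status_port\":20232}"] [error="context deadline exceeded"]',
    r'[2023/04/07 16:00:22.504 +08:00] [INFO] [pdvariable.go:110] ["load global config"] [cfg="{\"EnableTopSQL\":true}"]',
]

if __name__ == "__main__":
    failures, states = summarize(SAMPLE)
    for (name, ip, msg), count in failures.most_common():
        print(f"{name} {ip}: {msg} x{count}")
    for ts, enabled in states:
        print(f"{ts} EnableTopSQL={enabled}")
```

To run it against the real file, something like `summarize(open("ng.log", encoding="utf-8"))` should work. Lines that do not start with a timestamp, such as the wrapped stack traces, are simply skipped.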
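Separately, on another cluster the dashboard always picks up a very old Prometheus address, so the monitoring information fails to display, as shown in the screenshot below.
The list of online instances also shows the very old address.
How can this be resolved?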