TiDB Dashboard reports an error when querying Top SQL: "when searching tsids: the number of matching unique timeseries exceeds 300000"

dba-kit · January 23, 2024 08:09

TiDB Dashboard reports the following error when querying Top SQL:

API: /topsql/summary
{"status":"error","errorType":"422","error":"error when executing query=\"sum_over_time(sql_exec_count{instance=\\\"172.18.243.39:20160\\\", instance_type=\\\"tikv\\\"}[52s])\" for (time=1705993080000, step=300000): cannot evaluate \"sum_over_time(sql_exec_count{instance=\\\"172.18.243.39:20160\\\", instance_type=\\\"tikv\\\"}[52s])\": search error after reading 0 data blocks: error when searching for tagFilters=[{__name__=\"sql_exec_count\", instance=\"172.18.243.39:20160\", instance_type=\"tikv\"}] on the time range [2024-01-23 06:48:00 +0000 UTC - 2024-01-23 06:58:00 +0000 UTC]: error when searching tsids: the number of matching unique timeseries exceeds 300000; either narrow down the search or increase -search.maxUniqueTimeseries"}

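I checked the on-disk data size with du and found that the tsdb directory alone holds 17G, which is much larger than on our other clusters. Is the error caused by the data volume being too large?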

1.6G	docdb
17G	tsdb
641M	wal
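
For anyone who wants to repeat this check, here is a minimal sketch of how a listing like the one above can be produced. The function name ngm_data_usage and the example path are hypothetical; point it at the actual ng-monitoring data directory of your deployment.

```shell
#!/bin/sh
# Print the size of every entry in an ng-monitoring data directory,
# in KiB, largest first.
# $1 = data directory (assumption: it contains docdb, tsdb and wal,
#      as in the listing above).
ngm_data_usage() {
    du -sk "$1"/* | sort -rn
}

# Example call (hypothetical path, adjust before use):
# ngm_data_usage /path/to/ng-monitoring/data
```

If tsdb comes out far larger than on comparable clusters, that points to a high number of stored time series.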

db_user · January 23, 2024 08:18

This looks like a limit being hit because of the amount of data. See:

maxUniqueTimeseries error persists after using delete_series · Issue #706 · VictoriaMetrics/VictoriaMetrics (github.com)

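dba-kit · January 23, 2024 08:18

After I manually cleaned out the tsdb directory, the query did work again at first, but the same error came back the next day. That is why I suspect the data volume.
The error message suggests: error when searching tsids: the number of matching unique timeseries exceeds 300000; either narrow down the search or increase -search.maxUniqueTimeseries
Is there a parameter that lets me raise -search.maxUniqueTimeseries?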

db_user · January 23, 2024 08:20

Top SQL: The number of matching unique timeseries exceeds 300000 · Issue #212 · pingcap/ng-monitoring (github.com)

dba远航 · January 24, 2024 02:09

The number of matching time series exceeded the limit.

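dba-kit · January 26, 2024 06:39

After upgrading to 7.5, two new parameters are available. For now they still have to be set by editing the configuration by hand; they cannot be configured through TiUP.
Here is how to change them:

Copy the ngmonitoring configuration to a new file. (The original configuration file is overwritten every time prometheus is reloaded.)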

cd /data/tidb-deploy/prometheus-9090
cp conf/ngmonitoring.toml conf/ngmonitoring-new.toml

Add the two tsdb settings to conf/ngmonitoring-new.toml. In my case I shortened the tsdb retention period and raised the value of search-max-unique-timeseries.

[tsdb]
# Data with timestamps outside the retentionPeriod is automatically deleted
# The following optional suffixes are supported: h (hour), d (day), w (week), y (year).
# If suffix isn't set, then the duration is counted in months.
retention-period = "7d"
# `search-max-unique-timeseries` limits the number of unique time series a single query can find and process.
# VictoriaMetrics(tsdb) keeps in memory some metainformation about the time series located by each query
# and spends some CPU time for processing the found time series. This means that the maximum memory usage
# and CPU usage a single query can use is proportional to `search-max-unique-timeseries`.
search-max-unique-timeseries = 9000000
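
Since the file is edited by hand, a quick check that the values really sit under [tsdb] can save a restart cycle. Below is a minimal sketch that reads one key from the [tsdb] section with awk. The helper name tsdb_setting is hypothetical, and it only handles simple one-line `key = value` entries, not the full TOML syntax.

```shell
#!/bin/sh
# Print the value of one key from the [tsdb] section of a config file.
# $1 = config file, $2 = key name.
# Comment lines are skipped; surrounding quotes are stripped.
tsdb_setting() {
    awk -v key="$2" '
        # Track whether the current section header is [tsdb].
        /^\[/ { in_tsdb = ($0 ~ /^\[tsdb\][ \t]*$/); next }
        in_tsdb && $0 !~ /^[ \t]*#/ {
            split($0, kv, "=")
            k = kv[1]
            gsub(/[ \t]/, "", k)
            if (k == key) {
                v = substr($0, index($0, "=") + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", v)
                gsub(/"/, "", v)
                print v
                exit
            }
        }
    ' "$1"
}

# Example calls against the file edited above:
# tsdb_setting conf/ngmonitoring-new.toml retention-period
# tsdb_setting conf/ngmonitoring-new.toml search-max-unique-timeseries
```

An empty result means the key is missing, commented out, or sits outside the [tsdb] section.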

Edit scripts/ng-wrapper.sh so that it uses the new configuration file. (Files under scripts are only overwritten by tiup cluster upgrade; regular scale-out/scale-in/reload operations do not overwrite them.)

#!/bin/bash

# WARNING: This file was auto-generated to restart ng-monitoring when fail.
#          Do not edit! All your edit might be overwritten!

while true
do
    bin/ng-monitoring-server --config /home/tidb/tidb-deploy/prometheus-9090/conf/ngmonitoring-new.toml >/dev/null 2>&1
    sleep 15s
done
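
Because tiup cluster upgrade regenerates the wrapper, this edit has to be redone after every upgrade. Here is a minimal sketch of a repeatable patch. The function name point_wrapper_at_new_config is hypothetical, and it assumes the generated script references conf/ngmonitoring.toml in its --config argument.

```shell
#!/bin/sh
# Point the ng-monitoring wrapper at conf/ngmonitoring-new.toml.
# $1 = path to scripts/ng-wrapper.sh.
# Safe to run repeatedly: an already patched file is left untouched.
point_wrapper_at_new_config() {
    wrapper="$1"
    if grep -q 'conf/ngmonitoring-new\.toml' "$wrapper"; then
        return 0
    fi
    # Keep a copy of the generated script, then rewrite it in place.
    cp "$wrapper" "$wrapper.bak"
    sed 's#conf/ngmonitoring\.toml#conf/ngmonitoring-new.toml#' "$wrapper.bak" > "$wrapper"
}

# Example call from the prometheus deploy directory:
# point_wrapper_at_new_config scripts/ng-wrapper.sh
```

A wrapper loop that is already running keeps the command line it was started with, so the change presumably only takes effect once the wrapper itself has been restarted. Also check that the --config path matches your own deploy directory.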

dba-kit · March 26, 2024 06:39

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.