Removing a specified runaway watch has no effect

Bug report
[TiDB version] 8.0.11-TiDB-v8.1.0
[Impact of the bug] query watch remove <runaway_id> has no effect

[Possible steps to reproduce]

ALTER RESOURCE GROUP default QUERY_LIMIT=(EXEC_ELAPSED='1h', ACTION=KILL,  WATCH=PLAN DURATION='10m');
QUERY WATCH ADD RESOURCE GROUP rg1 SQL DIGEST '299bb8320165dc1a1ab39861d5ed2826189ba7e30c3eb3aceace6ac7a4b78354';
SELECT * FROM INFORMATION_SCHEMA.RUNAWAY_WATCHES ORDER BY id;
query watch remove <id>;

As the screenshot shows, query watch remove 30005 failed to remove the runaway watch with ID 30005. The only way to clear RUNAWAY_WATCHES was to restart, which clears all of them.

This is a case of TiDB's Query Watch feature not working as expected: specifically, the query watch remove command fails to remove the specified watch item.

I could not reproduce this. Do you still have the tidb logs from that time?

My cluster was upgraded from 7.5.1, and the problem reproduces every time. The steps are:

ALTER RESOURCE GROUP default QUERY_LIMIT=(EXEC_ELAPSED='1h', ACTION=KILL,  WATCH=PLAN DURATION='10m');
QUERY WATCH ADD RESOURCE GROUP default SQL text SIMILAR to 'select count(*) from tpch1.customer';
SELECT * FROM INFORMATION_SCHEMA.RUNAWAY_WATCHES ORDER BY id;
query watch remove <id>;
SELECT * FROM INFORMATION_SCHEMA.RUNAWAY_WATCHES ORDER BY id;

The log is short, so I am pasting it here directly (I cleared the log and restarted tidb-server before running the steps above):

[tidb@localhost log]$ cat tidb.log
[2024/06/17 20:32:01.531 +08:00] [INFO] [meminfo.go:196] ["use physical memory hook"] [cgroupMemorySize=9223372036854771712] [physicalMemorySize=67240763392]
[2024/06/17 20:32:01.532 +08:00] [INFO] [cpuprofile.go:113] ["parallel cpu profiler started"]
[2024/06/17 20:32:01.532 +08:00] [INFO] [cgmon.go:60] ["cgroup monitor started"]
[2024/06/17 20:32:01.532 +08:00] [INFO] [printer.go:47] ["Welcome to TiDB."] ["Release Version"=v8.1.0] [Edition=Community] ["Git Commit Hash"=945d07c5d5c7a1ae212f6013adfb187f2de24b23] ["Git Branch"=HEAD] ["UTC Build Time"="2024-05-21 03:51:57"] [GoVersion=go1.21.10] ["Race Enabled"=false] ["Check Table Before Drop"=false]
[2024/06/17 20:32:01.532 +08:00] [INFO] [cgmon.go:130] ["set the maxprocs"] [quota=2]
[2024/06/17 20:32:01.532 +08:00] [INFO] [printer.go:52] ["loaded config"] [config="{\"host\":\"0.0.0.0\",\"advertise-address\":\"192.168.31.201\",\"port\":4000,\"cors\":\"\",\"store\":\"tikv\",\"path\":\"192.168.31.201:2379\",\"socket\":\"/tmp/tidb-4000.sock\",\"lease\":\"45s\",\"split-table\":true,\"token-limit\":1000,\"temp-dir\":\"/data/tidb-data\",\"tmp-storage-path\":\"/data/tidb-data/1000_tidb/MC4wLjAuMDo0MDAwLzAuMC4wLjA6MTAwODA=/tmp-storage\",\"tmp-storage-quota\":-1,\"server-version\":\"\",\"version-comment\":\"\",\"tidb-edition\":\"\",\"tidb-release-version\":\"\",\"keyspace-name\":\"\",\"log\":{\"level\":\"info\",\"format\":\"text\",\"disable-timestamp\":null,\"enable-timestamp\":null,\"disable-error-stack\":null,\"enable-error-stack\":null,\"file\":{\"filename\":\"/data/tidb-deploy/tidb-4000/log/tidb.log\",\"max-size\":300,\"max-days\":0,\"max-backups\":0,\"compression\":\"\"},\"slow-query-file\":\"/data/tidb-deploy/tidb-4000/log/tidb_slow_query.log\",\"expensive-threshold\":10000,\"general-log-file\":\"\",\"query-log-max-len\":4096,\"enable-slow-log\":true,\"slow-threshold\":300,\"record-plan-in-slow-log\":1,\"timeout\":0},\"instance\":{\"tidb_general_log\":false,\"tidb_pprof_sql_cpu\":false,\"ddl_slow_threshold\":300,\"tidb_expensive_query_time_threshold\":60,\"tidb_expensive_txn_time_threshold\":600,\"tidb_stmt_summary_enable_persistent\":false,\"tidb_stmt_summary_filename\":\"tidb-statements.log\",\"tidb_stmt_summary_file_max_days\":3,\"tidb_stmt_summary_file_max_size\":64,\"tidb_stmt_summary_file_max_backups\":0,\"tidb_enable_slow_log\":true,\"tidb_slow_log_threshold\":300,\"tidb_record_plan_in_slow_log\":1,\"tidb_check_mb4_value_in_utf8\":true,\"tidb_force_priority\":\"NO_PRIORITY\",\"tidb_memory_usage_alarm_ratio\":0.8,\"tidb_enable_collect_execution_info\":true,\"plugin_dir\":\"/data/deploy/plugin\",\"plugin_load\":\"\",\"max_connections\":0,\"tidb_enable_ddl\":true,\"tidb_rc_read_check_ts\":false,\"tidb_service_scope\":\"\"},\"security\":{\"skip-
grant-table\":false,\"ssl-ca\":\"\",\"ssl-cert\":\"\",\"ssl-key\":\"\",\"cluster-ssl-ca\":\"\",\"cluster-ssl-cert\":\"\",\"cluster-ssl-key\":\"\",\"cluster-verify-cn\":null,\"session-token-signing-cert\":\"\",\"session-token-signing-key\":\"\",\"spilled-file-encryption-method\":\"plaintext\",\"enable-sem\":false,\"auto-tls\":false,\"tls-version\":\"\",\"rsa-key-size\":4096,\"secure-bootstrap\":false,\"auth-token-jwks\":\"\",\"auth-token-refresh-interval\":\"1h0m0s\",\"disconnect-on-expired-password\":true},\"status\":{\"status-host\":\"0.0.0.0\",\"metrics-addr\":\"\",\"status-port\":10080,\"metrics-interval\":15,\"report-status\":true,\"record-db-qps\":false,\"record-db-label\":false,\"grpc-keepalive-time\":10,\"grpc-keepalive-timeout\":3,\"grpc-concurrent-streams\":1024,\"grpc-initial-window-size\":2097152,\"grpc-max-send-msg-size\":2147483647},\"performance\":{\"max-procs\":0,\"max-memory\":0,\"server-memory-quota\":0,\"stats-lease\":\"3s\",\"stmt-count-limit\":5000,\"pseudo-estimate-ratio\":0.8,\"bind-info-lease\":\"3s\",\"txn-entry-size-limit\":6291456,\"txn-total-size-limit\":104857600,\"tcp-keep-alive\":true,\"tcp-no-delay\":true,\"cross-join\":true,\"distinct-agg-push-down\":false,\"projection-push-down\":false,\"max-txn-ttl\":3600000,\"index-usage-sync-lease\":\"\",\"plan-replayer-gc-lease\":\"10m\",\"gogc\":100,\"enforce-mpp\":false,\"stats-load-concurrency\":5,\"stats-load-queue-size\":1000,\"analyze-partition-concurrency-quota\":16,\"plan-replayer-dump-worker-concurrency\":1,\"enable-stats-cache-mem-quota\":true,\"committer-concurrency\":128,\"run-auto-analyze\":true,\"force-priority\":\"NO_PRIORITY\",\"memory-usage-alarm-ratio\":0.8,\"enable-load-fmsketch\":false,\"lite-init-stats\":true,\"force-init-stats\":true,\"concurrently-init-stats\":false},\"prepared-plan-cache\":{\"enabled\":true,\"capacity\":100,\"memory-guard-ratio\":0.1},\"opentracing\":{\"enable\":false,\"rpc-metrics\":false,\"sampler\":{\"type\":\"const\",\"param\":1,\"sampling-server-url\"
:\"\",\"max-operations\":0,\"sampling-refresh-interval\":0},\"reporter\":{\"queue-size\":0,\"buffer-flush-interval\":0,\"log-spans\":false,\"local-agent-host-port\":\"\"}},\"proxy-protocol\":{\"networks\":\"\",\"header-timeout\":5,\"fallbackable\":false},\"pd-client\":{\"pd-server-timeout\":3},\"tikv-client\":{\"grpc-connection-count\":4,\"grpc-keepalive-time\":10,\"grpc-keepalive-timeout\":3,\"grpc-compression-type\":\"none\",\"grpc-shared-buffer-pool\":false,\"grpc-initial-window-size\":134217728,\"grpc-initial-conn-window-size\":134217728,\"commit-timeout\":\"41s\",\"async-commit\":{\"keys-limit\":256,\"total-key-size-limit\":4096,\"safe-window\":2000000000,\"allowed-clock-drift\":500000000},\"max-batch-size\":128,\"overload-threshold\":200,\"max-batch-wait-time\":0,\"batch-wait-size\":8,\"enable-chunk-rpc\":true,\"region-cache-ttl\":600,\"store-limit\":0,\"store-liveness-timeout\":\"1s\",\"copr-cache\":{\"capacity-mb\":0},\"copr-req-timeout\":60000000000,\"ttl-refreshed-txn-size\":33554432,\"resolve-lock-lite-threshold\":16,\"max-concurrency-request-limit\":9223372036854775807,\"enable-replica-selector-v2\":true},\"binlog\":{\"enable\":false,\"ignore-error\":false,\"write-timeout\":\"15s\",\"binlog-socket\":\"\",\"strategy\":\"range\"},\"compatible-kill-query\":false,\"pessimistic-txn\":{\"max-retry-count\":256,\"deadlock-history-capacity\":10,\"deadlock-history-collect-retryable\":false,\"pessimistic-auto-commit\":false,\"constraint-check-in-place-pessimistic\":true},\"max-index-length\":3072,\"index-limit\":64,\"table-column-count-limit\":1017,\"graceful-wait-before-shutdown\":0,\"alter-primary-key\":false,\"treat-old-version-utf8-as-utf8mb4\":true,\"enable-table-lock\":false,\"delay-clean-table-lock\":0,\"split-region-max-num\":1000,\"top-sql\":{\"receiver-address\":\"\"},\"repair-mode\":false,\"repair-table-list\":[],\"isolation-read\":{\"engines\":[\"tikv\",\"tiflash\",\"tidb\"]},\"new_collations_enabled_on_first_bootstrap\":true,\"experimental\":{\"allow-e
xpression-index\":false},\"skip-register-to-dashboard\":false,\"enable-telemetry\":false,\"labels\":{},\"enable-global-index\":false,\"deprecate-integer-display-length\":false,\"enable-enum-length-limit\":true,\"stores-refresh-interval\":60,\"enable-tcp4-only\":false,\"enable-forwarding\":false,\"max-ballast-object-size\":0,\"ballast-object-size\":0,\"transaction-summary\":{\"transaction-summary-capacity\":500,\"transaction-id-digest-min-duration\":2147483647},\"enable-global-kill\":true,\"enable-32bits-connection-id\":true,\"initialize-sql-file\":\"\",\"enable-batch-dml\":false,\"mem-quota-query\":1073741824,\"oom-action\":\"cancel\",\"oom-use-tmp-storage\":true,\"check-mb4-value-in-utf8\":true,\"enable-collect-execution-info\":true,\"plugin\":{\"dir\":\"/data/deploy/plugin\",\"load\":\"\"},\"max-server-connections\":0,\"run-ddl\":true,\"disaggregated-tiflash\":false,\"autoscaler-type\":\"aws\",\"autoscaler-addr\":\"tiflash-autoscale-lb.tiflash-autoscale.svc.cluster.local:8081\",\"is-tiflashcompute-fixed-pool\":false,\"autoscaler-cluster-id\":\"\",\"use-autoscaler\":false,\"tidb-max-reuse-chunk\":64,\"tidb-max-reuse-column\":256,\"tidb-enable-exit-check\":false,\"in-mem-slow-query-topn-num\":30,\"in-mem-slow-query-recent-num\":500}"]
[2024/06/17 20:32:01.532 +08:00] [INFO] [main.go:467] ["disable Prometheus push client"]
[2024/06/17 20:32:01.532 +08:00] [INFO] [store.go:76] ["new store"] [path=tikv://192.168.31.201:2379]
[2024/06/17 20:32:01.532 +08:00] [INFO] [cgmon.go:154] ["set the memory limit"] [memLimit=67240763392]
[2024/06/17 20:32:01.532 +08:00] [INFO] [systime_mon.go:26] ["start system time monitor"]
[2024/06/17 20:32:01.534 +08:00] [INFO] [pd_service_discovery.go:1016] ["[pd] switch leader"] [new-leader=http://192.168.31.201:2379] [old-leader=]
[2024/06/17 20:32:01.534 +08:00] [INFO] [pd_service_discovery.go:498] ["[pd] init cluster id"] [cluster-id=7375541095005751330]
[2024/06/17 20:32:01.535 +08:00] [INFO] [client.go:613] ["[pd] changing service mode"] [old-mode=UNKNOWN_SVC_MODE] [new-mode=PD_SVC_MODE]
[2024/06/17 20:32:01.535 +08:00] [INFO] [tso_client.go:293] ["[tso] switch dc tso global allocator serving url"] [dc-location=global] [new-url=http://192.168.31.201:2379]
[2024/06/17 20:32:01.535 +08:00] [INFO] [client.go:619] ["[pd] service mode changed"] [old-mode=UNKNOWN_SVC_MODE] [new-mode=PD_SVC_MODE]
[2024/06/17 20:32:01.535 +08:00] [INFO] [tso_dispatcher.go:119] ["[tso] start tso deadline watcher"] [dc-location=global]
[2024/06/17 20:32:01.535 +08:00] [INFO] [tso_dispatcher.go:168] ["[tso] tso dispatcher created"] [dc-location=global]
[2024/06/17 20:32:01.535 +08:00] [INFO] [tso_dispatcher.go:336] ["[tso] start tso connection contexts updater"] [dc-location=global]
[2024/06/17 20:32:01.535 +08:00] [INFO] [tso_client.go:130] ["[tso] start tso dispatcher check loop"]
[2024/06/17 20:32:01.535 +08:00] [INFO] [tikv_driver.go:197] ["using API V1."]
[2024/06/17 20:32:01.536 +08:00] [INFO] [store.go:82] ["new store with retry success"]
[2024/06/17 20:32:01.537 +08:00] [INFO] [store_cache.go:510] ["change store resolve state"] [store=1] [addr=192.168.31.201:20160] [from=unresolved] [to=resolved] [liveness-state=reachable]
[2024/06/17 20:32:01.542 +08:00] [INFO] [tidb.go:80] ["new domain"] [store=tikv-7375541095005751330] ["ddl lease"=45s] ["stats lease"=3s]
[2024/06/17 20:32:01.550 +08:00] [WARN] [info.go:316] ["init TiFlashReplicaManager"]
[2024/06/17 20:32:01.554 +08:00] [INFO] [domain.go:2796] [acquireServerID] [serverID=1287] ["lease id"=6a9b8ffd9bb68972]
[2024/06/17 20:32:01.555 +08:00] [INFO] [controller.go:185] ["load resource controller config"] [config="{\"degraded-mode-wait-duration\":\"0s\",\"ltb-max-wait-duration\":\"30s\",\"wait-retry-interval\":\"50ms\",\"wait-retry-times\":10,\"request-unit\":{\"read-base-cost\":0.125,\"read-per-batch-base-cost\":0.5,\"read-cost-per-byte\":0.0000152587890625,\"write-base-cost\":1,\"write-per-batch-base-cost\":1,\"write-cost-per-byte\":0.0009765625,\"read-cpu-ms-cost\":0.3333333333333333},\"enable-controller-trace-log\":\"false\"}"] [ru-config="{\"ReadBaseCost\":0.125,\"ReadPerBatchBaseCost\":0.5,\"ReadBytesCost\":0.0000152587890625,\"WriteBaseCost\":1,\"WritePerBatchBaseCost\":1,\"WriteBytesCost\":0.0009765625,\"CPUMsCost\":0.3333333333333333,\"LTBMaxWaitDuration\":30000000000,\"WaitRetryInterval\":50000000,\"WaitRetryTimes\":10,\"DegradedModeWaitDuration\":0}"]
[2024/06/17 20:32:01.588 +08:00] [INFO] [domain.go:315] ["full load InfoSchema success"] [currentSchemaVersion=0] [neededSchemaVersion=1204] ["start time"=31.77388ms]
[2024/06/17 20:32:01.589 +08:00] [INFO] [domain.go:635] ["full load and reset schema validator"]
[2024/06/17 20:32:01.589 +08:00] [INFO] [ddl.go:817] ["start DDL"] [category=ddl] [ID=aeb5e4cb-2b2a-4b0c-ac3f-4a8b9437b392] [runWorker=true]
[2024/06/17 20:32:01.589 +08:00] [INFO] [ddl.go:776] ["start delRangeManager OK"] [category=ddl] ["is a emulator"=false]
[2024/06/17 20:32:01.590 +08:00] [INFO] [manager.go:188] ["start campaign owner"] [ownerInfo="[ddl] /tidb/ddl/fg/owner"]
[2024/06/17 20:32:01.591 +08:00] [INFO] [env.go:101] ["the ingest sorted directory"] [category=ddl-ingest] ["data path"=/data/tidb-data/tmp_ddl-4000]
[2024/06/17 20:32:01.591 +08:00] [INFO] [env.go:74] ["init global ingest backend environment finished"] [category=ddl-ingest] ["memory limitation"=33620381696] ["disk usage info"="disk usage: 251069968384/418659696640, backend usage: 0"] ["max open file number"=1000000] ["lightning is initialized"=true]
[2024/06/17 20:32:01.591 +08:00] [INFO] [wait_group_wrapper.go:133] ["background process started"] [source=domain] [process=loadSchemaInLoop]
[2024/06/17 20:32:01.591 +08:00] [INFO] [wait_group_wrapper.go:133] ["background process started"] [source=domain] [process=mdlCheckLoop]
[2024/06/17 20:32:01.591 +08:00] [INFO] [wait_group_wrapper.go:133] ["background process started"] [source=domain] [process=topNSlowQueryLoop]
[2024/06/17 20:32:01.591 +08:00] [INFO] [wait_group_wrapper.go:133] ["background process started"] [source=domain] [process=infoSyncerKeeper]
[2024/06/17 20:32:01.591 +08:00] [INFO] [wait_group_wrapper.go:133] ["background process started"] [source=domain] [process=globalConfigSyncerKeeper]
[2024/06/17 20:32:01.591 +08:00] [INFO] [wait_group_wrapper.go:133] ["background process started"] [source=domain] [process=runawayRecordFlushLoop]
[2024/06/17 20:32:01.591 +08:00] [INFO] [wait_group_wrapper.go:133] ["background process started"] [source=domain] [process=runawayWatchSyncLoop]
[2024/06/17 20:32:01.591 +08:00] [INFO] [wait_group_wrapper.go:133] ["background process started"] [source=domain] [process=requestUnitsWriterLoop]
[2024/06/17 20:32:01.591 +08:00] [INFO] [wait_group_wrapper.go:133] ["background process started"] [source=domain] [process=topologySyncerKeeper]
[2024/06/17 20:32:01.591 +08:00] [INFO] [wait_group_wrapper.go:133] ["background process started"] [source=domain] [process=closestReplicaReadCheckLoop]
[2024/06/17 20:32:01.591 +08:00] [INFO] [owner_daemon.go:70] ["begin advancer daemon"] [daemon-id=LogBackup::Advancer]
[2024/06/17 20:32:01.591 +08:00] [INFO] [manager.go:188] ["start campaign owner"] [ownerInfo="[log-backup] /tidb/br-stream/owner"]
[2024/06/17 20:32:01.591 +08:00] [INFO] [job_table.go:334] ["get global state and global state change"] [category=ddl] [oldState=false] [currState=false]
[2024/06/17 20:32:01.592 +08:00] [INFO] [manager.go:354] ["get owner"] ["owner info"="[ddl] /tidb/ddl/fg/owner ownerManager aeb5e4cb-2b2a-4b0c-ac3f-4a8b9437b392"] ["owner key"=/tidb/ddl/fg/owner/6a9b8ffd9bb6897f] [ownerID=aeb5e4cb-2b2a-4b0c-ac3f-4a8b9437b392] [op=none]
[2024/06/17 20:32:01.592 +08:00] [INFO] [wait_group_wrapper.go:133] ["background process started"] [source=domain] [process=logBackupAdvancer]
[2024/06/17 20:32:01.592 +08:00] [INFO] [owner_daemon.go:81] ["begin running daemon"] [id=f6eac5ed-33eb-4e9c-90b5-26e8d2fd01b4] [daemon-id=LogBackup::Advancer]
[2024/06/17 20:32:01.596 +08:00] [INFO] [manager.go:188] ["start campaign owner"] [ownerInfo="[bindinfo] /tidb/bindinfo/owner"]
[2024/06/17 20:32:01.596 +08:00] [INFO] [wait_group_wrapper.go:133] ["background process started"] [source=domain] [process=globalBindHandleWorkerLoop]
[2024/06/17 20:32:01.597 +08:00] [WARN] [sysvar_cache.go:49] ["sysvar cache is empty, triggering rebuild"]
[2024/06/17 20:32:01.598 +08:00] [INFO] [manager.go:354] ["get owner"] ["owner info"="[bindinfo] /tidb/bindinfo/owner ownerManager aeb5e4cb-2b2a-4b0c-ac3f-4a8b9437b392"] ["owner key"=/tidb/bindinfo/owner/6a9b8ffd9bb6898a] [ownerID=aeb5e4cb-2b2a-4b0c-ac3f-4a8b9437b392] [op=none]
[2024/06/17 20:32:01.599 +08:00] [INFO] [sysvar.go:2748] ["set resource control"] [enable=true]
[2024/06/17 20:32:01.599 +08:00] [INFO] [controller.go:447] ["[resource group controller] create resource group cost controller"] [name=default]
[2024/06/17 20:32:01.607 +08:00] [INFO] [wait_group_wrapper.go:133] ["background process started"] [source=domain] [process=loadPrivilegeInLoop]
[2024/06/17 20:32:01.609 +08:00] [INFO] [wait_group_wrapper.go:133] ["background process started"] [source=domain] [process=LoadSysVarCacheLoop]
[2024/06/17 20:32:01.610 +08:00] [INFO] [wait_group_wrapper.go:133] ["background process started"] [source=domain] [process=PlanReplayerTaskCollectHandle]
[2024/06/17 20:32:01.610 +08:00] [INFO] [wait_group_wrapper.go:133] ["background process started"] [source=domain] [process=PlanReplayerTaskDumpHandle]
[2024/06/17 20:32:01.610 +08:00] [INFO] [wait_group_wrapper.go:133] ["background process started"] [source=domain] [process=dumpFileGcChecker]
[2024/06/17 20:32:01.610 +08:00] [INFO] [wait_group_wrapper.go:133] ["background process started"] [source=domain] [process=HistoricalStatsWorker]
[2024/06/17 20:32:01.611 +08:00] [INFO] [domain.go:2016] ["PlanReplayerTaskCollectHandle started"]
[2024/06/17 20:32:01.610 +08:00] [INFO] [domain.go:2038] ["PlanReplayerTaskDumpHandle started"]
[2024/06/17 20:32:01.611 +08:00] [INFO] [plan_replayer.go:409] ["planReplayerTaskDumpWorker started."]
[2024/06/17 20:32:01.611 +08:00] [INFO] [domain.go:2070] ["dumpFileGcChecker started"]
[2024/06/17 20:32:01.611 +08:00] [INFO] [domain.go:2103] ["HistoricalStatsWorker started"]
[2024/06/17 20:32:01.611 +08:00] [INFO] [wait_group_wrapper.go:133] ["background process started"] [source=domain] [process=loadStatsWorker]
[2024/06/17 20:32:01.612 +08:00] [INFO] [manager.go:188] ["start campaign owner"] [ownerInfo="[stats] /tidb/stats/owner"]
[2024/06/17 20:32:01.615 +08:00] [INFO] [wait_group_wrapper.go:133] ["background process started"] [source=domain] [process=indexUsageWorker]
[2024/06/17 20:32:01.615 +08:00] [INFO] [wait_group_wrapper.go:133] ["background process started"] [source=domain] [process=updateStatsWorker]
[2024/06/17 20:32:01.615 +08:00] [INFO] [wait_group_wrapper.go:133] ["background process started"] [source=domain] [process=autoAnalyzeWorker]
[2024/06/17 20:32:01.615 +08:00] [INFO] [wait_group_wrapper.go:133] ["background process started"] [source=domain] [process=analyzeJobsCleanupWorker]
[2024/06/17 20:32:01.615 +08:00] [INFO] [wait_group_wrapper.go:133] ["background process started"] [source=domain] [process=loadSigningCertLoop]
[2024/06/17 20:32:01.615 +08:00] [INFO] [domain.go:2389] ["updateStatsWorker started."]
[2024/06/17 20:32:01.616 +08:00] [INFO] [notifier.go:225] ["etcd notify loop to watch timer events started"] [EtcdKey=/tidb/timer/cluster/1/notify/94dd819d-059b-4ac7-9c16-c29e684370c6]
[2024/06/17 20:32:01.616 +08:00] [INFO] [gc_worker.go:195] [start] [category="gc worker"] [uuid=64098bb51400001]
[2024/06/17 20:32:01.617 +08:00] [INFO] [task_manager.go:217] ["scale ttl worker"] [ttl-worker=job-manager] [ttl-worker=task-manager] [originalCount=0] [newCount=4]
[2024/06/17 20:32:01.617 +08:00] [INFO] [task_manager.go:217] ["scale ttl worker"] [ttl-worker=job-manager] [ttl-worker=task-manager] [originalCount=0] [newCount=4]
[2024/06/17 20:32:01.618 +08:00] [INFO] [manager.go:354] ["get owner"] ["owner info"="[stats] /tidb/stats/owner ownerManager aeb5e4cb-2b2a-4b0c-ac3f-4a8b9437b392"] ["owner key"=/tidb/stats/owner/6a9b8ffd9bb6898f] [ownerID=aeb5e4cb-2b2a-4b0c-ac3f-4a8b9437b392] [op=none]
[2024/06/17 20:32:01.621 +08:00] [INFO] [domain.go:2308] ["init stats info time"] [lite=true] ["take time"=9.027951ms]
[2024/06/17 20:32:01.622 +08:00] [INFO] [manager.go:113] ["build task executor manager"] [total-cpu=2] [total-mem=62.62GiB]
[2024/06/17 20:32:01.624 +08:00] [INFO] [wait_group_wrapper.go:133] ["background process started"] [source=domain] [process=distTaskFrameworkLoop]
[2024/06/17 20:32:01.624 +08:00] [WARN] [misc.go:464] ["Automatic TLS Certificate creation is disabled"] []
[2024/06/17 20:32:01.624 +08:00] [INFO] [manager.go:146] ["task executor manager start"]
[2024/06/17 20:32:01.624 +08:00] [INFO] [domain.go:1525] ["dist task executor manager started"]
[2024/06/17 20:32:01.625 +08:00] [INFO] [cpu.go:83] ["sql cpu collector started"]
[2024/06/17 20:32:01.625 +08:00] [INFO] [http_status.go:101] ["for status and metrics report"] ["listening on addr"=0.0.0.0:10080]
[2024/06/17 20:32:01.625 +08:00] [INFO] [server.go:308] ["server is running MySQL protocol"] [addr=0.0.0.0:4000]
[2024/06/17 20:32:01.625 +08:00] [INFO] [server.go:322] ["server is running MySQL protocol"] [socket=/tmp/tidb-4000.sock]
[2024/06/17 20:32:01.625 +08:00] [INFO] [store.go:76] ["new store"] [path=tikv://192.168.31.201:2379]
[2024/06/17 20:32:01.627 +08:00] [INFO] [pd_service_discovery.go:1016] ["[pd] switch leader"] [new-leader=http://192.168.31.201:2379] [old-leader=]
[2024/06/17 20:32:01.627 +08:00] [INFO] [pd_service_discovery.go:498] ["[pd] init cluster id"] [cluster-id=7375541095005751330]
[2024/06/17 20:32:01.627 +08:00] [INFO] [client.go:613] ["[pd] changing service mode"] [old-mode=UNKNOWN_SVC_MODE] [new-mode=PD_SVC_MODE]
[2024/06/17 20:32:01.627 +08:00] [INFO] [tso_client.go:293] ["[tso] switch dc tso global allocator serving url"] [dc-location=global] [new-url=http://192.168.31.201:2379]
[2024/06/17 20:32:01.627 +08:00] [INFO] [client.go:619] ["[pd] service mode changed"] [old-mode=UNKNOWN_SVC_MODE] [new-mode=PD_SVC_MODE]
[2024/06/17 20:32:01.627 +08:00] [INFO] [tso_dispatcher.go:119] ["[tso] start tso deadline watcher"] [dc-location=global]
[2024/06/17 20:32:01.627 +08:00] [INFO] [tso_dispatcher.go:137] ["[tso] exit tso deadline watcher"] [dc-location=global]
[2024/06/17 20:32:01.627 +08:00] [INFO] [pd_service_discovery.go:550] ["[pd] exit member loop due to context canceled"]
[2024/06/17 20:32:01.627 +08:00] [INFO] [tso_dispatcher.go:168] ["[tso] tso dispatcher created"] [dc-location=global]
[2024/06/17 20:32:01.627 +08:00] [INFO] [tso_dispatcher.go:171] ["[tso] exit tso dispatcher"] [dc-location=global]
[2024/06/17 20:32:01.627 +08:00] [WARN] [resource_manager_client.go:307] ["[resource_manager] get token stream error"] [error="rpc error: code = Canceled desc = context canceled"]
[2024/06/17 20:32:01.627 +08:00] [INFO] [tso_batch_controller.go:160] ["[pd] clear the tso batch controller"] [max-batch-size=10000] [best-batch-size=8] [collected-request-count=0] [pending-request-count=0]
[2024/06/17 20:32:01.627 +08:00] [INFO] [resource_manager_client.go:295] ["[resource manager] exit resource token dispatcher"]
[2024/06/17 20:32:01.627 +08:00] [INFO] [tso_client.go:155] ["[tso] closing tso client"]
[2024/06/17 20:32:01.627 +08:00] [INFO] [tso_client.go:130] ["[tso] start tso dispatcher check loop"]
[2024/06/17 20:32:01.627 +08:00] [INFO] [tso_client.go:145] ["[tso] exit tso dispatcher check loop"]
[2024/06/17 20:32:01.627 +08:00] [INFO] [tso_client.go:160] ["[tso] close tso client"]
[2024/06/17 20:32:01.627 +08:00] [INFO] [tso_batch_controller.go:160] ["[pd] clear the tso batch controller"] [max-batch-size=10000] [best-batch-size=8] [collected-request-count=0] [pending-request-count=0]
[2024/06/17 20:32:01.627 +08:00] [INFO] [tso_dispatcher.go:336] ["[tso] start tso connection contexts updater"] [dc-location=global]
[2024/06/17 20:32:01.627 +08:00] [INFO] [tso_client.go:162] ["[tso] tso client is closed"]
[2024/06/17 20:32:01.627 +08:00] [INFO] [pd_service_discovery.go:637] ["[pd] close pd service discovery client"]
[2024/06/17 20:32:01.627 +08:00] [ERROR] [tso_client.go:365] ["[tso] update connection contexts failed"] [dc=global] [error="rpc error: code = Canceled desc = context canceled"]
[2024/06/17 20:32:01.627 +08:00] [INFO] [tso_dispatcher.go:350] ["[tso] exit tso connection contexts updater"] [dc-location=global]
[2024/06/17 20:32:01.627 +08:00] [INFO] [store.go:82] ["new store with retry success"]
[2024/06/17 20:32:01.628 +08:00] [INFO] [manager.go:188] ["start campaign owner"] [ownerInfo="[autoid] tidb/autoid/leader"]
[2024/06/17 20:32:01.628 +08:00] [INFO] [http_status.go:509] ["register auto service at"] [addr=192.168.31.201:10080]
[2024/06/17 20:32:01.629 +08:00] [INFO] [manager.go:354] ["get owner"] ["owner info"="[autoid] tidb/autoid/leader ownerManager 192.168.31.201:10080"] ["owner key"=tidb/autoid/leader/6a9b8ffd9bb68995] [ownerID=192.168.31.201:10080] [op=none]
[2024/06/17 20:32:01.629 +08:00] [INFO] [autoid.go:333] ["leader change of autoid service, this node become owner"] [addr=192.168.31.201:10080] [category="autoid service"]
[2024/06/17 20:32:02.593 +08:00] [INFO] [job_table.go:334] ["get global state and global state change"] [category=ddl] [oldState=false] [currState=false]
[2024/06/17 20:32:02.593 +08:00] [INFO] [manager.go:354] ["get owner"] ["owner info"="[ddl] /tidb/ddl/fg/owner ownerManager aeb5e4cb-2b2a-4b0c-ac3f-4a8b9437b392"] ["owner key"=/tidb/ddl/fg/owner/6a9b8ffd9bb6897f] [ownerID=aeb5e4cb-2b2a-4b0c-ac3f-4a8b9437b392] [op=none]
[2024/06/17 20:32:02.593 +08:00] [INFO] [manager.go:394] ["set owner op is the same as the original, so do nothing."] ["owner info"="[ddl] /tidb/ddl/fg/owner ownerManager aeb5e4cb-2b2a-4b0c-ac3f-4a8b9437b392"] [op=none]
[2024/06/17 20:32:02.593 +08:00] [INFO] [job_table.go:349] ["the owner sets owner operator value"] [category=ddl] [ownerOp=none]
[2024/06/17 20:32:02.618 +08:00] [INFO] [runtime.go:160] ["TimerGroupRuntime loop started"] [groupID=ttl]
[2024/06/17 20:32:02.618 +08:00] [INFO] [runtime.go:402] ["create watch chan if possible for timer runtime"] [groupID=ttl] [storeSupportWatch=true]
[2024/06/17 20:32:02.619 +08:00] [INFO] [notifier.go:140] ["new etcd watcher created to watch timer events"] [EtcdKey=/tidb/timer/cluster/1/notify/94dd819d-059b-4ac7-9c16-c29e684370c6] [watcherID=91b0fe5d-ffb2-4375-9ec5-ef146e4ce76a]
[2024/06/17 20:32:02.626 +08:00] [INFO] [slots.go:205] ["initialize slot capacity"] [capacity=2]
[2024/06/17 20:32:02.627 +08:00] [INFO] [scheduler_manager.go:200] ["schedule task loop start"]
[2024/06/17 20:32:02.627 +08:00] [INFO] [scheduler_manager.go:312] ["subtask table gc loop start"]
[2024/06/17 20:32:02.627 +08:00] [INFO] [scheduler_manager.go:373] ["cleanup loop start"]
[2024/06/17 20:32:02.627 +08:00] [INFO] [scheduler_manager.go:453] ["collect loop start"]
[2024/06/17 20:32:03.592 +08:00] [INFO] [info.go:1142] [SetTiFlashGroupConfig]
[2024/06/17 20:32:04.623 +08:00] [INFO] [refresher.go:120] ["No table to analyze"] [category=stats]
[2024/06/17 20:32:46.199 +08:00] [INFO] [manager.go:354] ["get owner"] ["owner info"="[log-backup] /tidb/br-stream/owner ownerManager f6eac5ed-33eb-4e9c-90b5-26e8d2fd01b4"] ["owner key"=/tidb/br-stream/owner/6a9b8ffd9bb68983] [ownerID=f6eac5ed-33eb-4e9c-90b5-26e8d2fd01b4] [op=none]
[2024/06/17 20:32:49.592 +08:00] [INFO] [owner_daemon.go:56] ["daemon became owner"] [id=f6eac5ed-33eb-4e9c-90b5-26e8d2fd01b4] [daemon-id=LogBackup::Advancer]
[2024/06/17 20:32:49.592 +08:00] [INFO] [advancer.go:514] ["Subscription handler spawned."] [category="log backup subscription manager"]
[tidb@localhost log]$ 
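Most of the log above is ordinary startup output. As a minimal sketch for narrowing it down, assuming that WARN/ERROR entries and lines mentioning "runaway" are the ones worth a first look (that criterion is my assumption, not an official filter), a small Go program can read tidb.log from stdin and print only those lines. The function name isRelevant is made up for this example.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// isRelevant reports whether a TiDB log line is a WARN/ERROR entry or
// mentions "runaway" (case-insensitive). The criteria are an assumption
// chosen for this thread, not something defined by TiDB.
func isRelevant(line string) bool {
	lower := strings.ToLower(line)
	return strings.Contains(line, "[WARN]") ||
		strings.Contains(line, "[ERROR]") ||
		strings.Contains(lower, "runaway")
}

func main() {
	scanner := bufio.NewScanner(os.Stdin)
	// The "loaded config" entry is far longer than the default 64 KiB
	// token limit might allow on other logs, so raise the limit.
	scanner.Buffer(make([]byte, 0, 1024*1024), 16*1024*1024)
	for scanner.Scan() {
		if line := scanner.Text(); isRelevant(line) {
			fmt.Println(line)
		}
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "read error:", err)
	}
}
```

Saved under a hypothetical name such as filter.go, it could be run as `go run filter.go < tidb.log`. On the log above it would keep, for example, the runawayRecordFlushLoop and runawayWatchSyncLoop lines and the few WARN/ERROR entries.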

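I did a fresh install of 8.1 and ran the test again. If I add a single QUERY WATCH, it can be removed. After repeating this several times, the problem reproduces. Once it has occurred, no further watches can be removed.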

I reproduced it again from scratch:
1. Load the tpch1 data:

tiup bench tpch -H 192.168.31.201 -P 4000 -U root -p123  -D  tpch1 prepare --sf=1

2. Modify the default resource group:

ALTER RESOURCE GROUP default QUERY_LIMIT=(EXEC_ELAPSED='100ms', ACTION=KILL,  WATCH=EXACT DURATION='10m');

3. Run the concurrent request program (see the attachment main.go, 1.8 KB):

go run main.go -host 192.168.31.201 -port 4000 -user root -password "123" -database tpch1

4. Inspect RUNAWAY_WATCHES, remove one entry at random, and check whether the removal takes effect:

mysql> SELECT * FROM INFORMATION_SCHEMA.RUNAWAY_WATCHES ORDER BY id desc limit 10;
+-----+---------------------+---------------------+---------------------+-------+---------------------------------------------------------------+---------------------+--------+
| ID  | RESOURCE_GROUP_NAME | START_TIME          | END_TIME            | WATCH | WATCH_TEXT                                                    | SOURCE              | ACTION |
+-----+---------------------+---------------------+---------------------+-------+---------------------------------------------------------------+---------------------+--------+
| 470 | default             | 2024-06-17 22:27:05 | 2024-06-17 22:37:05 | Exact | select  count(*) from orders where o_orderdate = '1997-08-13' | 192.168.31.201:4000 | Kill   |
| 469 | default             | 2024-06-17 22:27:05 | 2024-06-17 22:37:05 | Exact | select  count(*) from orders where o_orderdate = '1996-02-03' | 192.168.31.201:4000 | Kill   |
| 468 | default             | 2024-06-17 22:27:05 | 2024-06-17 22:37:05 | Exact | select  count(*) from orders where o_orderdate = '1996-08-03' | 192.168.31.201:4000 | Kill   |
| 467 | default             | 2024-06-17 22:27:05 | 2024-06-17 22:37:05 | Exact | select  count(*) from orders where o_orderdate = '1996-02-26' | 192.168.31.201:4000 | Kill   |
| 466 | default             | 2024-06-17 22:27:05 | 2024-06-17 22:37:05 | Exact | select  count(*) from orders where o_orderdate = '1995-07-19' | 192.168.31.201:4000 | Kill   |
| 465 | default             | 2024-06-17 22:27:05 | 2024-06-17 22:37:05 | Exact | select  count(*) from orders where o_orderdate = '1994-09-25' | 192.168.31.201:4000 | Kill   |
| 464 | default             | 2024-06-17 22:27:05 | 2024-06-17 22:37:05 | Exact | select  count(*) from orders where o_orderdate = '1994-09-18' | 192.168.31.201:4000 | Kill   |
| 463 | default             | 2024-06-17 22:27:05 | 2024-06-17 22:37:05 | Exact | select  count(*) from orders where o_orderdate = '1995-03-30' | 192.168.31.201:4000 | Kill   |
| 462 | default             | 2024-06-17 22:27:05 | 2024-06-17 22:37:05 | Exact | select  count(*) from orders where o_orderdate = '1994-11-20' | 192.168.31.201:4000 | Kill   |
| 461 | default             | 2024-06-17 22:27:05 | 2024-06-17 22:37:05 | Exact | select  count(*) from orders where o_orderdate = '1997-06-29' | 192.168.31.201:4000 | Kill   |
+-----+---------------------+---------------------+---------------------+-------+---------------------------------------------------------------+---------------------+--------+
10 rows in set (0.01 sec)


mysql> query watch remove 461;
Query OK, 0 rows affected (0.01 sec)

mysql> SELECT * FROM INFORMATION_SCHEMA.RUNAWAY_WATCHES ORDER BY id desc limit 10;
+-----+---------------------+---------------------+---------------------+-------+---------------------------------------------------------------+---------------------+--------+
| ID  | RESOURCE_GROUP_NAME | START_TIME          | END_TIME            | WATCH | WATCH_TEXT                                                    | SOURCE              | ACTION |
+-----+---------------------+---------------------+---------------------+-------+---------------------------------------------------------------+---------------------+--------+
| 470 | default             | 2024-06-17 22:27:05 | 2024-06-17 22:37:05 | Exact | select  count(*) from orders where o_orderdate = '1997-08-13' | 192.168.31.201:4000 | Kill   |
| 469 | default             | 2024-06-17 22:27:05 | 2024-06-17 22:37:05 | Exact | select  count(*) from orders where o_orderdate = '1996-02-03' | 192.168.31.201:4000 | Kill   |
| 468 | default             | 2024-06-17 22:27:05 | 2024-06-17 22:37:05 | Exact | select  count(*) from orders where o_orderdate = '1996-08-03' | 192.168.31.201:4000 | Kill   |
| 467 | default             | 2024-06-17 22:27:05 | 2024-06-17 22:37:05 | Exact | select  count(*) from orders where o_orderdate = '1996-02-26' | 192.168.31.201:4000 | Kill   |
| 466 | default             | 2024-06-17 22:27:05 | 2024-06-17 22:37:05 | Exact | select  count(*) from orders where o_orderdate = '1995-07-19' | 192.168.31.201:4000 | Kill   |
| 465 | default             | 2024-06-17 22:27:05 | 2024-06-17 22:37:05 | Exact | select  count(*) from orders where o_orderdate = '1994-09-25' | 192.168.31.201:4000 | Kill   |
| 464 | default             | 2024-06-17 22:27:05 | 2024-06-17 22:37:05 | Exact | select  count(*) from orders where o_orderdate = '1994-09-18' | 192.168.31.201:4000 | Kill   |
| 463 | default             | 2024-06-17 22:27:05 | 2024-06-17 22:37:05 | Exact | select  count(*) from orders where o_orderdate = '1995-03-30' | 192.168.31.201:4000 | Kill   |
| 462 | default             | 2024-06-17 22:27:05 | 2024-06-17 22:37:05 | Exact | select  count(*) from orders where o_orderdate = '1994-11-20' | 192.168.31.201:4000 | Kill   |
| 460 | default             | 2024-06-17 22:27:05 | 2024-06-17 22:37:05 | Exact | select  count(*) from orders where o_orderdate = '1993-12-25' | 192.168.31.201:4000 | Kill   |
+-----+---------------------+---------------------+---------------------+-------+---------------------------------------------------------------+---------------------+--------+
10 rows in set (0.00 sec)

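At this point REMOVE works without any problem: ID 461 is gone from the list.
5. Run the concurrent request program again until the RUNAWAY_WATCHES IDs go above 1000 (running it for ten-odd seconds and then stopping it with Ctrl+C is enough).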

go run main.go -host 192.168.31.201 -port 4000 -user root -password "123" -database tpch1

6. Remove an ID again:

mysql> SELECT * FROM INFORMATION_SCHEMA.RUNAWAY_WATCHES ORDER BY id desc limit 10;
+------+---------------------+---------------------+---------------------+-------+---------------------------------------------------------------+---------------------+--------+
| ID   | RESOURCE_GROUP_NAME | START_TIME          | END_TIME            | WATCH | WATCH_TEXT                                                    | SOURCE              | ACTION |
+------+---------------------+---------------------+---------------------+-------+---------------------------------------------------------------+---------------------+--------+
| 1310 | default             | 2024-06-17 22:29:07 | 2024-06-17 22:39:07 | Exact | select  count(*) from orders where o_orderdate = '1996-09-16' | 192.168.31.201:4000 | Kill   |
| 1309 | default             | 2024-06-17 22:29:07 | 2024-06-17 22:39:07 | Exact | select  count(*) from orders where o_orderdate = '1994-08-25' | 192.168.31.201:4000 | Kill   |
| 1308 | default             | 2024-06-17 22:29:07 | 2024-06-17 22:39:07 | Exact | select  count(*) from orders where o_orderdate = '1997-12-13' | 192.168.31.201:4000 | Kill   |
| 1307 | default             | 2024-06-17 22:29:07 | 2024-06-17 22:39:07 | Exact | select  count(*) from orders where o_orderdate = '1995-08-03' | 192.168.31.201:4000 | Kill   |
| 1306 | default             | 2024-06-17 22:29:07 | 2024-06-17 22:39:07 | Exact | select  count(*) from orders where o_orderdate = '1998-07-03' | 192.168.31.201:4000 | Kill   |
| 1305 | default             | 2024-06-17 22:29:07 | 2024-06-17 22:39:07 | Exact | select  count(*) from orders where o_orderdate = '1998-05-29' | 192.168.31.201:4000 | Kill   |
| 1304 | default             | 2024-06-17 22:29:07 | 2024-06-17 22:39:07 | Exact | select  count(*) from orders where o_orderdate = '1996-02-20' | 192.168.31.201:4000 | Kill   |
| 1303 | default             | 2024-06-17 22:29:07 | 2024-06-17 22:39:07 | Exact | select  count(*) from orders where o_orderdate = '1997-12-06' | 192.168.31.201:4000 | Kill   |
| 1302 | default             | 2024-06-17 22:29:07 | 2024-06-17 22:39:07 | Exact | select  count(*) from orders where o_orderdate = '1997-03-26' | 192.168.31.201:4000 | Kill   |
| 1301 | default             | 2024-06-17 22:29:07 | 2024-06-17 22:39:07 | Exact | select  count(*) from orders where o_orderdate = '1995-08-23' | 192.168.31.201:4000 | Kill   |
+------+---------------------+---------------------+---------------------+-------+---------------------------------------------------------------+---------------------+--------+
10 rows in set (0.01 sec)

mysql> query watch remove 1310;
Query OK, 0 rows affected (0.01 sec)

mysql> SELECT * FROM INFORMATION_SCHEMA.RUNAWAY_WATCHES ORDER BY id desc limit 10;
+------+---------------------+---------------------+---------------------+-------+---------------------------------------------------------------+---------------------+--------+
| ID   | RESOURCE_GROUP_NAME | START_TIME          | END_TIME            | WATCH | WATCH_TEXT                                                    | SOURCE              | ACTION |
+------+---------------------+---------------------+---------------------+-------+---------------------------------------------------------------+---------------------+--------+
| 1310 | default             | 2024-06-17 22:29:07 | 2024-06-17 22:39:07 | Exact | select  count(*) from orders where o_orderdate = '1996-09-16' | 192.168.31.201:4000 | Kill   |
| 1309 | default             | 2024-06-17 22:29:07 | 2024-06-17 22:39:07 | Exact | select  count(*) from orders where o_orderdate = '1994-08-25' | 192.168.31.201:4000 | Kill   |
| 1308 | default             | 2024-06-17 22:29:07 | 2024-06-17 22:39:07 | Exact | select  count(*) from orders where o_orderdate = '1997-12-13' | 192.168.31.201:4000 | Kill   |
| 1307 | default             | 2024-06-17 22:29:07 | 2024-06-17 22:39:07 | Exact | select  count(*) from orders where o_orderdate = '1995-08-03' | 192.168.31.201:4000 | Kill   |
| 1306 | default             | 2024-06-17 22:29:07 | 2024-06-17 22:39:07 | Exact | select  count(*) from orders where o_orderdate = '1998-07-03' | 192.168.31.201:4000 | Kill   |
| 1305 | default             | 2024-06-17 22:29:07 | 2024-06-17 22:39:07 | Exact | select  count(*) from orders where o_orderdate = '1998-05-29' | 192.168.31.201:4000 | Kill   |
| 1304 | default             | 2024-06-17 22:29:07 | 2024-06-17 22:39:07 | Exact | select  count(*) from orders where o_orderdate = '1996-02-20' | 192.168.31.201:4000 | Kill   |
| 1303 | default             | 2024-06-17 22:29:07 | 2024-06-17 22:39:07 | Exact | select  count(*) from orders where o_orderdate = '1997-12-06' | 192.168.31.201:4000 | Kill   |
| 1302 | default             | 2024-06-17 22:29:07 | 2024-06-17 22:39:07 | Exact | select  count(*) from orders where o_orderdate = '1997-03-26' | 192.168.31.201:4000 | Kill   |
| 1301 | default             | 2024-06-17 22:29:07 | 2024-06-17 22:39:07 | Exact | select  count(*) from orders where o_orderdate = '1995-08-23' | 192.168.31.201:4000 | Kill   |
+------+---------------------+---------------------+---------------------+-------+---------------------------------------------------------------+---------------------+--------+
10 rows in set (0.01 sec)

As you can see, remove was executed on 1310 but did not take effect.

After an ID is removed, it still shows up in INFORMATION_SCHEMA.RUNAWAY_WATCHES, but it no longer exists in mysql.tidb_runaway_watch.

My guess is that remove operates on mysql.tidb_runaway_watch, while the watch itself is enforced from what INFORMATION_SCHEMA.RUNAWAY_WATCHES shows.
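
The mismatch can be checked by looking up the removed ID in both places. This is only a sketch: it assumes the removed ID is 1310 as in the output above, and that the key column of mysql.tidb_runaway_watch is named id (the full schema of that table is not shown in this thread, hence the SELECT *).

```sql
-- Assumption: 1310 is the ID that was passed to QUERY WATCH REMOVE.

-- System table: in the failing case this returns no row after the remove.
SELECT * FROM mysql.tidb_runaway_watch WHERE id = 1310;

-- Watch list view: in the failing case the row is still returned here.
SELECT ID, RESOURCE_GROUP_NAME, END_TIME, WATCH_TEXT
FROM INFORMATION_SCHEMA.RUNAWAY_WATCHES
WHERE ID = 1310;
```

If the first query is empty and the second still returns the row, the remove reached the system table but the watch list did not pick it up.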

Thanks. We will try to reproduce it first and then track down the problem.