GC is not working properly and disk space is not being released

[TiDB environment] Production
[TiDB version] v3.1.2
[Steps to reproduce]

Following the document below, I set region-cache-ttl = 86400, but it had no effect.

I restarted the TiKV, TiDB, and PD nodes.

2023-04-13 17:14:29

tmp.log (94.8 KB)

I adjusted the GC life time.

I enabled merging of empty Regions.

The config show output is as follows:

{
  "replication": {
    "enable-placement-rules": "false",
    "location-labels": "host",
    "max-replicas": 3,
    "strictly-match-label": "false"
  },
  "schedule": {
    "enable-cross-table-merge": "false",
    "enable-debug-metrics": "false",
    "enable-location-replacement": "true",
    "enable-make-up-replica": "true",
    "enable-one-way-merge": "false",
    "enable-remove-down-replica": "true",
    "enable-remove-extra-replica": "true",
    "enable-replace-offline-replica": "true",
    "high-space-ratio": 0.6,
    "hot-region-cache-hits-threshold": 3,
    "hot-region-schedule-limit": 8,
    "leader-schedule-limit": 10,
    "leader-schedule-policy": "count",
    "low-space-ratio": 0.8,
    "max-merge-region-keys": 500000,
    "max-merge-region-size": 100,
    "max-pending-peer-count": 64,
    "max-snapshot-count": 3,
    "max-store-down-time": "30m0s",
    "merge-schedule-limit": 8,
    "patrol-region-interval": "100ms",
    "region-schedule-limit": 10,
    "replica-schedule-limit": 6,
    "scheduler-max-waiting-operator": 3,
    "schedulers-payload": {
      "balance-hot-region-scheduler": "null",
      "balance-leader-scheduler": "{\"name\":\"balance-leader-scheduler\",\"ranges\":[{\"start-key\":\"\",\"end-key\":\"\"}]}",
      "balance-region-scheduler": "{\"name\":\"balance-region-scheduler\",\"ranges\":[{\"start-key\":\"\",\"end-key\":\"\"}]}",
      "label-scheduler": "{\"name\":\"label-scheduler\",\"ranges\":[{\"start-key\":\"\",\"end-key\":\"\"}]}"
    },
    "schedulers-v2": [
      {
        "args": null,
        "args-payload": "",
        "disable": false,
        "type": "balance-region"
      },
      {
        "args": null,
        "args-payload": "",
        "disable": false,
        "type": "balance-leader"
      },
      {
        "args": null,
        "args-payload": "",
        "disable": false,
        "type": "hot-region"
      },
      {
        "args": null,
        "args-payload": "",
        "disable": false,
        "type": "label"
      }
    ],
    "split-merge-interval": "1h0m0s",
    "store-balance-rate": 15,
    "store-limit-mode": "manual",
    "tolerant-size-ratio": 2.5
  }
}

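[Problem: symptoms and impact]
After rows are removed from a table with DELETE, the disk space is not released.

[Resource configuration]

[Attachments: screenshots / logs / monitoring]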

[2023/04/11 21:15:13.227 +08:00] [INFO] [gc_worker.go:274] ["[gc worker] starts the whole job"] [uuid=61dbd8fad1c001c] [safePoint=440721293418692608] [concurrency=20]
[2023/04/11 21:15:13.227 +08:00] [INFO] [gc_worker.go:818] ["[gc worker] start resolve locks"] [uuid=61dbd8fad1c001c] [safePoint=440721293418692608] [concurrency=20]
[2023/04/11 21:16:12.957 +08:00] [INFO] [gc_worker.go:243] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=61dbd8fad1c001c]
[2023/04/11 21:17:12.931 +08:00] [INFO] [gc_worker.go:243] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=61dbd8fad1c001c]
[2023/04/11 21:18:12.906 +08:00] [INFO] [gc_worker.go:243] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=61dbd8fad1c001c]
[2023/04/11 21:19:13.010 +08:00] [INFO] [gc_worker.go:243] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=61dbd8fad1c001c]
[2023/04/11 21:20:13.038 +08:00] [INFO] [gc_worker.go:243] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=61dbd8fad1c001c]
[2023/04/11 21:21:12.884 +08:00] [INFO] [gc_worker.go:243] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=61dbd8fad1c001c]
[2023/04/11 21:22:12.912 +08:00] [INFO] [gc_worker.go:243] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=61dbd8fad1c001c]
[2023/04/11 21:23:12.836 +08:00] [INFO] [gc_worker.go:243] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=61dbd8fad1c001c]
[2023/04/11 21:23:25.030 +08:00] [ERROR] [gc_worker.go:832] ["[gc worker] resolve locks failed"] [uuid=61dbd8fad1c001c] [safePoint=440721293418692608] [error="loadRegion from PD failed, key: \"t\\x80\\x00\\x00\\x00\\x00\\x02\\x13q_i\\x80\\x00\\x00\\x00\\x00\\x00\\x00\\x03\\x01{       \\xff        \\xff  apiVer\\xffsionCode\\xff:4251,  \\xff        \\xff       p\\xffackageNa\\xffme:com.b\\xffyted.pan\\xffgle,    \\xff        \\xff     min\\xffPluginVe\\xffrsion:42\\xff51,     \\xff        \\xff    inte\\xffrnalPath\\xff:6644354\\xff0,      \\xff        \\xff   inter\\xffnalVersi\\xffonCode:4\\xff251,    \\xff        \\xff     sig\\xffnature:'\\xffMIIDfTCC\\xffAmWgAwIB\\xffAgIEfRwY\\xffPjANBgkq\\xffhkiG9w0B\\xffAQsFADBv\\xffMQswCQYD\\xffVQQGEwJD\\xffTjEQMA4G\\xffA1UECBMH\\xffQmVpamlu\\xffZzEQMA4G\\xffA1UEBxMH\\xffQmVpamlu\\xffZzESMBAG\\xffA1UEChMJ\\xffQnl0ZURh\\xffbmNlMQ8w\\xffDQYDVQQL\\xffEwZQYW5n\\xffbGUxFzAV\\xffBgNVBAMT\\xffDkNodWFu\\xffIFNoYW4g\\xffSmlhMB4X\\xffDTIxMTEw\\xffODA2MjQz\\xffOVoXDTQ2\\xffMTEwMjA2\\xffMjQzOVow\\xffbzELMAkG\\xffA1UEBhMC\\xffQ04xEDAO\\xffBgNVBAgT\\xffB0JlaWpp\\xffbmcxEDAO\\xffBgNVBAcT\\xffB0JlaWpp\\xffbmcxEjAQ\\xffBgNVBAoT\\xffCUJ5dGVE\\xffYW5jZTEP\\xffMA0GA1UE\\xffCxMGUGFu\\xffZ2xlMRcw\\xffFQYDVQQD\\xffEw5DaHVh\\xffbiBTaGFu\\xffIEppYTCC\\xffASIwDQYJ\\xffKoZIhvcN\\xffAQEBBQAD\\xffggEPADCC\\xffAQoCggEB\\xffAIBKeRL+\\xff4mfCn1SL\\xffYv6Oemfw\\xffwItkjlLP\\xffyqOEugkV\\xff6lanFTcZ\\xffgLwEl5LI\\xffkL0y28Un\\xffcPtMX1Mi\\xffi6DzCdJ/\\xffplw7S9+R\\xffT/hYDneu\\xff339IK...\\xff\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\xf7\\x03\\x80\\x00\\x00\\x00\\xd9\\x03\\xc4\\x00\", err: rpc error: code = Canceled desc = context canceled"]
[2023/04/11 21:23:25.030 +08:00] [ERROR] [gc_worker.go:492] ["[gc worker] resolve locks returns an error"] [uuid=61dbd8fad1c001c] [error="loadRegion from PD failed, key: \"t\\x80\\x00\\x00\\x00\\x00\\x02\\x13q_i\\x80\\x00\\x00\\x00\\x00\\x00\\x00\\x03\\x01{       \\xff        \\xff  apiVer\\xffsionCode\\xff:4251,  \\xff        \\xff       p\\xffackageNa\\xffme:com.b\\xffyted.pan\\xffgle,    \\xff        \\xff     min\\xffPluginVe\\xffrsion:42\\xff51,     \\xff        \\xff    inte\\xffrnalPath\\xff:6644354\\xff0,      \\xff        \\xff   inter\\xffnalVersi\\xffonCode:4\\xff251,    \\xff        \\xff     sig\\xffnature:'\\xffMIIDfTCC\\xffAmWgAwIB\\xffAgIEfRwY\\xffPjANBgkq\\xffhkiG9w0B\\xffAQsFADBv\\xffMQswCQYD\\xffVQQGEwJD\\xffTjEQMA4G\\xffA1UECBMH\\xffQmVpamlu\\xffZzEQMA4G\\xffA1UEBxMH\\xffQmVpamlu\\xffZzESMBAG\\xffA1UEChMJ\\xffQnl0ZURh\\xffbmNlMQ8w\\xffDQYDVQQL\\xffEwZQYW5n\\xffbGUxFzAV\\xffBgNVBAMT\\xffDkNodWFu\\xffIFNoYW4g\\xffSmlhMB4X\\xffDTIxMTEw\\xffODA2MjQz\\xffOVoXDTQ2\\xffMTEwMjA2\\xffMjQzOVow\\xffbzELMAkG\\xffA1UEBhMC\\xffQ04xEDAO\\xffBgNVBAgT\\xffB0JlaWpp\\xffbmcxEDAO\\xffBgNVBAcT\\xffB0JlaWpp\\xffbmcxEjAQ\\xffBgNVBAoT\\xffCUJ5dGVE\\xffYW5jZTEP\\xffMA0GA1UE\\xffCxMGUGFu\\xffZ2xlMRcw\\xffFQYDVQQD\\xffEw5DaHVh\\xffbiBTaGFu\\xffIEppYTCC\\xffASIwDQYJ\\xffKoZIhvcN\\xffAQEBBQAD\\xffggEPADCC\\xffAQoCggEB\\xffAIBKeRL+\\xff4mfCn1SL\\xffYv6Oemfw\\xffwItkjlLP\\xffyqOEugkV\\xff6lanFTcZ\\xffgLwEl5LI\\xffkL0y28Un\\xffcPtMX1Mi\\xffi6DzCdJ/\\xffplw7S9+R\\xffT/hYDneu\\xff339IK...\\xff\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\xf7\\x03\\x80\\x00\\x00\\x00\\xd9\\x03\\xc4\\x00\", err: rpc error: code = Canceled desc = context canceled"]
[2023/04/11 21:23:25.031 +08:00] [ERROR] [gc_worker.go:180] ["[gc worker] runGCJob"] [error="loadRegion from PD failed, key: \"t\\x80\\x00\\x00\\x00\\x00\\x02\\x13q_i\\x80\\x00\\x00\\x00\\x00\\x00\\x00\\x03\\x01{       \\xff        \\xff  apiVer\\xffsionCode\\xff:4251,  \\xff        \\xff       p\\xffackageNa\\xffme:com.b\\xffyted.pan\\xffgle,    \\xff        \\xff     min\\xffPluginVe\\xffrsion:42\\xff51,     \\xff        \\xff    inte\\xffrnalPath\\xff:6644354\\xff0,      \\xff        \\xff   inter\\xffnalVersi\\xffonCode:4\\xff251,    \\xff        \\xff     sig\\xffnature:'\\xffMIIDfTCC\\xffAmWgAwIB\\xffAgIEfRwY\\xffPjANBgkq\\xffhkiG9w0B\\xffAQsFADBv\\xffMQswCQYD\\xffVQQGEwJD\\xffTjEQMA4G\\xffA1UECBMH\\xffQmVpamlu\\xffZzEQMA4G\\xffA1UEBxMH\\xffQmVpamlu\\xffZzESMBAG\\xffA1UEChMJ\\xffQnl0ZURh\\xffbmNlMQ8w\\xffDQYDVQQL\\xffEwZQYW5n\\xffbGUxFzAV\\xffBgNVBAMT\\xffDkNodWFu\\xffIFNoYW4g\\xffSmlhMB4X\\xffDTIxMTEw\\xffODA2MjQz\\xffOVoXDTQ2\\xffMTEwMjA2\\xffMjQzOVow\\xffbzELMAkG\\xffA1UEBhMC\\xffQ04xEDAO\\xffBgNVBAgT\\xffB0JlaWpp\\xffbmcxEDAO\\xffBgNVBAcT\\xffB0JlaWpp\\xffbmcxEjAQ\\xffBgNVBAoT\\xffCUJ5dGVE\\xffYW5jZTEP\\xffMA0GA1UE\\xffCxMGUGFu\\xffZ2xlMRcw\\xffFQYDVQQD\\xffEw5DaHVh\\xffbiBTaGFu\\xffIEppYTCC\\xffASIwDQYJ\\xffKoZIhvcN\\xffAQEBBQAD\\xffggEPADCC\\xffAQoCggEB\\xffAIBKeRL+\\xff4mfCn1SL\\xffYv6Oemfw\\xffwItkjlLP\\xffyqOEugkV\\xff6lanFTcZ\\xffgLwEl5LI\\xffkL0y28Un\\xffcPtMX1Mi\\xffi6DzCdJ/\\xffplw7S9+R\\xffT/hYDneu\\xff339IK...\\xff\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\xf7\\x03\\x80\\x00\\x00\\x00\\xd9\\x03\\xc4\\x00\", err: rpc error: code = Canceled desc = context canceled"]

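GC monitoring graph

My main problem is that I do not know how to locate the data behind the "loadRegion from PD failed" error. If I can find it, that should solve the problem, right?

Update: logs from around 10:32 on April 12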

[2023/04/12 10:26:06.973 +08:00] [INFO] [range_task.go:254] ["canceling range task because of error"] [name=resolve-locks-runner] ["failed startKey"=dIAAAAAAAXDj] ["failed endKey"=dIAAAAAAAXEB] [error="[tikv:9001]PD server timeout"]
[2023/04/12 10:26:06.973 +08:00] [INFO] [range_task.go:254] ["canceling range task because of error"] [name=resolve-locks-runner] ["failed startKey"="dIAAAAAAAhNxX2mAAAAAAAAAAgQZr6l5owAAAAOAAAABGZ7Asg=="] ["failed endKey"="dIAAAAAAAhNxX2mAAAAAAAAAAgQZr6oEcgAAAAOAAAABGaZwaw=="] [error="context canceled"]
[2023/04/12 10:26:06.973 +08:00] [INFO] [range_task.go:254] ["canceling range task because of error"] [name=resolve-locks-runner] ["failed startKey"="dIAAAAAAAhNxX2mAAAAAAAAAAgQZr509+QAAAAOAAAABF2Y7bQ=="] ["failed endKey"="dIAAAAAAAhNxX2mAAAAAAAAAAgQZr51etwAAAAOAAAABF3MWCA=="] [error="context canceled"]
[2023/04/12 10:26:06.973 +08:00] [INFO] [range_task.go:254] ["canceling range task because of error"] [name=resolve-locks-runner] ["failed startKey"="dIAAAAAAAhNxX2mAAAAAAAAAAgQZr57JbAAAAAOAAAABF81B9Q=="] ["failed endKey"="dIAAAAAAAhNxX2mAAAAAAAAAAgQZr58oeQAAAAOAAAABF9pa1g=="] [error="context canceled"]
[2023/04/12 10:26:06.973 +08:00] [INFO] [range_task.go:254] ["canceling range task because of error"] [name=resolve-locks-runner] ["failed startKey"="dIAAAAAAAhNxX2mAAAAAAAAAAgQZr5k07gAAAAOAAAABFo879Q=="] ["failed endKey"="dIAAAAAAAhNxX2mAAAAAAAAAAgQZr5oh9QAAAAOAAAABFrb+CQ=="] [error="context canceled"]
[2023/04/12 10:26:06.973 +08:00] [INFO] [range_task.go:254] ["canceling range task because of error"] [name=resolve-locks-runner] ["failed startKey"="dIAAAAAAAhNxX2mAAAAAAAAAAgQZr6gVmwAAAAOAAAABGUY8FA=="] ["failed endKey"="dIAAAAAAAhNxX2mAAAAAAAAAAgQZr6hUWgAAAAOAAAABGVJ9kA=="] [error="context canceled"]
[2023/04/12 10:26:06.973 +08:00] [INFO] [range_task.go:254] ["canceling range task because of error"] [name=resolve-locks-runner] ["failed startKey"="dIAAAAAAAhNxX2mAAAAAAAAAAgQZr6qMnAAAAAOAAAABGce+kg=="] ["failed endKey"="dIAAAAAAAhNxX2mAAAAAAAAAAgQZr6rHuAAAAAOAAAABGdR0cw=="] [error="context canceled"]
[2023/04/12 10:26:06.973 +08:00] [INFO] [range_task.go:254] ["canceling range task because of error"] [name=resolve-locks-runner] ["failed startKey"="dIAAAAAAAhNxX2mAAAAAAAAAAgQZr56x1QAAAAOAAAABF8Nlag=="] ["failed endKey"="dIAAAAAAAhNxX2mAAAAAAAAAAgQZr57JbAAAAAOAAAABF81B9Q=="] [error="context canceled"]
[2023/04/12 10:26:06.973 +08:00] [INFO] [range_task.go:254] ["canceling range task because of error"] [name=resolve-locks-runner] ["failed startKey"="dIAAAAAAAhNxX2mAAAAAAAAAAgQZr6RY2AAAAAOAAAABGMB2rw=="] ["failed endKey"="dIAAAAAAAhNxX2mAAAAAAAAAAgQZr6SaUAAAAAOAAAABGM1ktw=="] [error="context canceled"]
[2023/04/12 10:26:06.973 +08:00] [INFO] [range_task.go:254] ["canceling range task because of error"] [name=resolve-locks-runner] ["failed startKey"="dIAAAAAAAhNxX2mAAAAAAAAAAgQZr6SaUAAAAAOAAAABGM1ktw=="] ["failed endKey"="dIAAAAAAAhNxX2mAAAAAAAAAAgQZr6Tb0AAAAAOAAAABGNsgCw=="] [error="context canceled"]
[2023/04/12 10:26:06.973 +08:00] [INFO] [range_task.go:254] ["canceling range task because of error"] [name=resolve-locks-runner] ["failed startKey"="dIAAAAAAAhNxX2mAAAAAAAAAAgQZr6oEcgAAAAOAAAABGaZwaw=="] ["failed endKey"="dIAAAAAAAhNxX2mAAAAAAAAAAgQZr6oQegAAAAOAAAABGa20dQ=="] [error="context canceled"]
[2023/04/12 10:26:06.973 +08:00] [INFO] [range_task.go:254] ["canceling range task because of error"] [name=resolve-locks-runner] ["failed startKey"="dIAAAAAAAhNxX2mAAAAAAAAAAgQZr6Iq5gAAAAOAAAABGFauBQ=="] ["failed endKey"="dIAAAAAAAhNxX2mAAAAAAAAAAgQZr6KJTgAAAAOAAAABGGQGSw=="] [error="context canceled"]
[2023/04/12 10:26:06.973 +08:00] [INFO] [range_task.go:149] ["range task failed"] [name=resolve-locks-runner] [startKey=] [endKey=] ["cost time"=7m36.569428109s] [error="loadRegion from PD failed, key: \"t\\x80\\x00\\x00\\x00\\x00\\x02\\x13q_i\\x80\\x00\\x00\\x00\\x00\\x00\\x00\\x02\\x04\\x19\\xaf\\xafV\\xdb\\x00\\x00\\x00\\x03\\x80\\x00\\x00\\x01\\x1a\\xda`4\", err: rpc error: code = Canceled desc = context canceled"]

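Update: status as of 10:59 on April 13

Screenshot taken after enabling cross-table merge

TiDB GC shows no progress at all; every GC run still fails with an error.

The logs are as follows: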

[2023/04/13 10:51:30.045 +08:00] [INFO] [gc_worker.go:274] ["[gc worker] starts the whole job"] [uuid=61dcec5d244000b] [safePoint=440756780481118208] [concurrency=20]
[2023/04/13 10:51:30.045 +08:00] [INFO] [gc_worker.go:818] ["[gc worker] start resolve locks"] [uuid=61dcec5d244000b] [safePoint=440756780481118208] [concurrency=20]
[2023/04/13 10:51:30.045 +08:00] [INFO] [range_task.go:90] ["range task started"] [name=resolve-locks-runner] [startKey=] [endKey=] [concurrency=20]
[2023/04/13 10:52:22.608 +08:00] [INFO] [gc_worker.go:243] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=61dcec5d244000b]
[2023/04/13 10:53:22.679 +08:00] [INFO] [gc_worker.go:243] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=61dcec5d244000b]
[2023/04/13 10:54:22.540 +08:00] [INFO] [gc_worker.go:243] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=61dcec5d244000b]
[2023/04/13 10:55:22.396 +08:00] [INFO] [gc_worker.go:243] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=61dcec5d244000b]
[2023/04/13 10:56:22.397 +08:00] [INFO] [gc_worker.go:243] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=61dcec5d244000b]
[2023/04/13 10:56:46.762 +08:00] [INFO] [range_task.go:254] ["canceling range task because of error"] [name=resolve-locks-runner] ["failed startKey"=dIAAAAAAAXDj] ["failed endKey"=dIAAAAAAAXEB] [error="[tikv:9001]PD server timeout"]
[2023/04/13 10:56:46.762 +08:00] [INFO] [range_task.go:254] ["canceling range task because of error"] [name=resolve-locks-runner] ["failed startKey"="dIAAAAAAAlL7X3KAAAAAA/oalg=="] ["failed endKey"="dIAAAAAAAlL7X3KAAAAAA/ocvw=="] [error="context canceled"]
[2023/04/13 10:56:46.762 +08:00] [INFO] [range_task.go:254] ["canceling range task because of error"] [name=resolve-locks-runner] ["failed startKey"="dIAAAAAAAlL7X3KAAAAAA/oofw=="] ["failed endKey"="dIAAAAAAAlL7X3KAAAAAA/oqVA=="] [error="context canceled"]
[2023/04/13 10:56:46.762 +08:00] [INFO] [range_task.go:254] ["canceling range task because of error"] [name=resolve-locks-runner] ["failed startKey"="dIAAAAAAAlL7X3KAAAAAA/ndIA=="] ["failed endKey"="dIAAAAAAAlL7X3KAAAAAA/ne9A=="] [error="context canceled"]
[2023/04/13 10:56:46.762 +08:00] [INFO] [range_task.go:254] ["canceling range task because of error"] [name=resolve-locks-runner] ["failed startKey"="dIAAAAAAAlL7X3KAAAAAA/m05A=="] ["failed endKey"="dIAAAAAAAlL7X3KAAAAAA/m4yw=="] [error="context canceled"]
[2023/04/13 10:56:46.762 +08:00] [INFO] [range_task.go:254] ["canceling range task because of error"] [name=resolve-locks-runner] ["failed startKey"="dIAAAAAAAlL7X3KAAAAAA/l2NQ=="] ["failed endKey"="dIAAAAAAAlL7X3KAAAAAA/l4ow=="] [error="context canceled"]
[2023/04/13 10:56:46.762 +08:00] [INFO] [range_task.go:254] ["canceling range task because of error"] [name=resolve-locks-runner] ["failed startKey"="dIAAAAAAAlL7X3KAAAAAA/os+w=="] ["failed endKey"="dIAAAAAAAlL7X3KAAAAAA/ouWw=="] [error="context canceled"]
[2023/04/13 10:56:46.762 +08:00] [INFO] [range_task.go:254] ["canceling range task because of error"] [name=resolve-locks-runner] ["failed startKey"="dIAAAAAAAlL7X3KAAAAAA/adZw=="] ["failed endKey"="dIAAAAAAAlL7X3KAAAAAA/ag6w=="] [error="context canceled"]
[2023/04/13 10:56:46.762 +08:00] [INFO] [range_task.go:254] ["canceling range task because of error"] [name=resolve-locks-runner] ["failed startKey"="dIAAAAAAAlL7X3KAAAAAA/oflg=="] ["failed endKey"="dIAAAAAAAlL7X3KAAAAAA/oh0Q=="] [error="context canceled"]
[2023/04/13 10:56:46.762 +08:00] [INFO] [range_task.go:254] ["canceling range task because of error"] [name=resolve-locks-runner] ["failed startKey"="dIAAAAAAAlL7X3KAAAAAA/oqVA=="] ["failed endKey"="dIAAAAAAAlL7X3KAAAAAA/os+w=="] [error="context canceled"]
[2023/04/13 10:56:46.762 +08:00] [INFO] [range_task.go:254] ["canceling range task because of error"] [name=resolve-locks-runner] ["failed startKey"="dIAAAAAAAlL7X3KAAAAAA/ouWw=="] ["failed endKey"="dIAAAAAAAlL7X3KAAAAAA/oxrQ=="] [error="context canceled"]
[2023/04/13 10:56:46.762 +08:00] [INFO] [range_task.go:254] ["canceling range task because of error"] [name=resolve-locks-runner] ["failed startKey"="dIAAAAAAAlL7X3KAAAAAA/WXiA=="] ["failed endKey"="dIAAAAAAAlL7X3KAAAAAA/W9pw=="] [error="context canceled"]
[2023/04/13 10:56:46.762 +08:00] [INFO] [range_task.go:254] ["canceling range task because of error"] [name=resolve-locks-runner] ["failed startKey"="dIAAAAAAAlL7X3KAAAAAA/fYhw=="] ["failed endKey"="dIAAAAAAAlL7X3KAAAAAA/fgCA=="] [error="context canceled"]
[2023/04/13 10:56:46.762 +08:00] [INFO] [range_task.go:254] ["canceling range task because of error"] [name=resolve-locks-runner] ["failed startKey"="dIAAAAAAAlL7X3KAAAAAA/KwzQ=="] ["failed endKey"="dIAAAAAAAlL7X3KAAAAAA/KzqA=="] [error="context canceled"]
[2023/04/13 10:56:46.763 +08:00] [INFO] [range_task.go:254] ["canceling range task because of error"] [name=resolve-locks-runner] ["failed startKey"="dIAAAAAAAlL7X3KAAAAAA/ocvw=="] ["failed endKey"="dIAAAAAAAlL7X3KAAAAAA/oflg=="] [error="context canceled"]
[2023/04/13 10:56:46.763 +08:00] [INFO] [range_task.go:254] ["canceling range task because of error"] [name=resolve-locks-runner] ["failed startKey"="dIAAAAAAAlL7X3KAAAAAA/SlJw=="] ["failed endKey"="dIAAAAAAAlL7X3KAAAAAA/SqtQ=="] [error="context canceled"]
[2023/04/13 10:56:46.762 +08:00] [INFO] [range_task.go:254] ["canceling range task because of error"] [name=resolve-locks-runner] ["failed startKey"="dIAAAAAAAlL7X3KAAAAAA/IiyA=="] ["failed endKey"="dIAAAAAAAlL7X3KAAAAAA/IkTw=="] [error="context canceled"]
[2023/04/13 10:56:46.763 +08:00] [INFO] [range_task.go:254] ["canceling range task because of error"] [name=resolve-locks-runner] ["failed startKey"="dIAAAAAAAlL7X3KAAAAAA/oO9w=="] ["failed endKey"="dIAAAAAAAlL7X3KAAAAAA/oQ8Q=="] [error="context canceled"]
[2023/04/13 10:56:46.762 +08:00] [INFO] [range_task.go:254] ["canceling range task because of error"] [name=resolve-locks-runner] ["failed startKey"="dIAAAAAAAlL7X3KAAAAAA/m9wQ=="] ["failed endKey"="dIAAAAAAAlL7X3KAAAAAA/m/og=="] [error="context canceled"]
[2023/04/13 10:56:46.763 +08:00] [INFO] [range_task.go:254] ["canceling range task because of error"] [name=resolve-locks-runner] ["failed startKey"="dIAAAAAAAlL7X3KAAAAAA/om6A=="] ["failed endKey"="dIAAAAAAAlL7X3KAAAAAA/oofw=="] [error="context canceled"]
[2023/04/13 10:56:46.801 +08:00] [INFO] [range_task.go:149] ["range task failed"] [name=resolve-locks-runner] [startKey=] [endKey=] ["cost time"=5m16.755437286s] [error="loadRegion from PD failed, key: \"t\\x80\\x00\\x00\\x00\\x00\\x02jX_r\\x80\\x00\\x00\\x00\\fI\\x8a\\n\", err: rpc error: code = Canceled desc = context canceled"]
[2023/04/13 10:56:46.801 +08:00] [ERROR] [gc_worker.go:832] ["[gc worker] resolve locks failed"] [uuid=61dcec5d244000b] [safePoint=440756780481118208] [error="loadRegion from PD failed, key: \"t\\x80\\x00\\x00\\x00\\x00\\x02jX_r\\x80\\x00\\x00\\x00\\fI\\x8a\\n\", err: rpc error: code = Canceled desc = context canceled"]
[2023/04/13 10:56:46.801 +08:00] [ERROR] [gc_worker.go:492] ["[gc worker] resolve locks returns an error"] [uuid=61dcec5d244000b] [error="loadRegion from PD failed, key: \"t\\x80\\x00\\x00\\x00\\x00\\x02jX_r\\x80\\x00\\x00\\x00\\fI\\x8a\\n\", err: rpc error: code = Canceled desc = context canceled"]
[2023/04/13 10:56:46.801 +08:00] [ERROR] [gc_worker.go:180] ["[gc worker] runGCJob"] [error="loadRegion from PD failed, key: \"t\\x80\\x00\\x00\\x00\\x00\\x02jX_r\\x80\\x00\\x00\\x00\\fI\\x8a\\n\", err: rpc error: code = Canceled desc = context canceled"]

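Merging the empty Regions might bring the space usage down.

Isn't Region merge still immature in your version? I have never operated a 3.x cluster, so let's wait for others to take a look.

It is not a merge problem; GC is not running.
We were planning to upgrade step by step to 4.x and then 5.x this quarter, and then this happened... sigh.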

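Your cluster's Region merge parameters are on the large side. Try reducing them to the officially recommended values; that should speed up the merging of empty Regions.

Space not being released can have several contributing causes: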

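  • The empty Region problem mentioned above.
  • GC may also be running slowly. The log line "[gc worker] there's already a gc job running, skipped" shows that GC is being triggered normally, but the background GC is slow: the current GC job has not finished within 10 minutes when the next one is due, so the system detects a running job and skips the new one.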
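
To tell a slow GC job from a failing one, it may help to pair each "starts the whole job" line with the next runGCJob error and count the skipped ticks in between. Below is a minimal Python sketch, assuming the log layout shown in this thread; summarize_gc_jobs and its output fields are hypothetical names used only for illustration.

```python
import re
from datetime import datetime

# Leading timestamp of a TiDB log line, e.g. "[2023/04/11 21:15:13.227 +08:00]".
TS_RE = re.compile(r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d+) ")


def summarize_gc_jobs(lines):
    """Pair each GC job start with the runGCJob error that ends it.

    Returns one dict per failed job: start time, run time in seconds,
    and how many leader ticks were skipped while the job was running.
    Jobs that finish without a runGCJob error are not reported.
    """
    jobs = []
    start = None
    skipped = 0
    for line in lines:
        match = TS_RE.match(line)
        if not match:
            continue
        ts = datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S.%f")
        if "starts the whole job" in line:
            start, skipped = ts, 0
        elif start is None:
            continue
        elif "already a gc job running" in line:
            skipped += 1
        elif "[ERROR]" in line and '"[gc worker] runGCJob"' in line:
            jobs.append({
                "start": start,
                "seconds": (ts - start).total_seconds(),
                "skipped_ticks": skipped,
                "outcome": "failed",
            })
            start = None
    return jobs
```

Applied to the 2023/04/11 excerpt in this thread, the job that starts at 21:15:13 ends in a runGCJob error about eight minutes later, which suggests the job is failing rather than merely running slowly.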

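Roughly how many TiKV nodes does your cluster have?
I ask because tikv_gc_concurrency is set to 4; if you have more than 4 TiKV nodes, you can raise this concurrency accordingly.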

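There is also a bigger issue: TiDB 3.x stopped being maintained long ago. Our production environments have all been upgraded to 6.1 and have been stable and reliable for more than half a year.

I strongly recommend that you evaluate an upgrade. In our production experience, newer TiDB versions are a qualitative improvement across the board: performance, cluster stability, bug fixes, and features.

To give an imperfect analogy, purely for illustration: if 3.x has the knowledge of a middle-school student, 6.x is at the level of a master's or PhD graduate.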

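The cluster has 20 TiKV instances on 7 physical machines, with multiple instances per machine.
Following your suggestion, I raised the concurrency to 10.

However, the GC log still contains this error: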
"[gc worker] runGCJob"] [error="loadRegion from PD failed

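According to the document I referenced, a few abnormal Regions cause resolve locks to loop repeatedly, so GC cannot move on to the next step. So far, though, I cannot tell which Region is the problem.

We are indeed preparing to upgrade, which is why we cleaned up space in early March to get ready, only to find that GC no longer runs properly... so forcing the upgrade through now does not seem wise either.

I previously thought that too many empty Regions were preventing space from being released, so I increased the merge parameters, but that did not reduce space usage. I can see the Region count falling, yet space usage keeps rising.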

https://docs.pingcap.com/zh/tidb/stable/pd-control#region-key---formatrawencodehex-key
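See the link above: use the key to find the Region, then find the corresponding table.
That said, this looks like a region cache problem, so try that first.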
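
Before querying PD, it can help to decode the keys in the GC log to see which table they belong to. The sketch below assumes the standard TiDB key layout (the byte 't', an 8-byte big-endian table ID with the sign bit flipped, then '_r' plus a row handle or '_i' plus an index ID) and assumes that the "failed startKey" / "failed endKey" values in the range_task.go lines are Base64 of the raw key. The function names are hypothetical.

```python
import base64
import struct

SIGN_MASK = 1 << 63


def _decode_int64(chunk: bytes) -> int:
    """Decode an 8-byte big-endian integer whose sign bit was flipped."""
    (value,) = struct.unpack(">Q", chunk)
    value ^= SIGN_MASK
    # Map back into the signed 64-bit range.
    return value - (1 << 64) if value >= SIGN_MASK else value


def decode_table_key(raw: bytes) -> dict:
    """Extract the table ID and, if present, the row handle or index ID."""
    if len(raw) < 9 or raw[:1] != b"t":
        raise ValueError("not a TiDB table key")
    info = {"table_id": _decode_int64(raw[1:9])}
    separator = raw[9:11]
    if separator == b"_r" and len(raw) >= 19:
        info["kind"] = "row"
        info["handle"] = _decode_int64(raw[11:19])
    elif separator == b"_i" and len(raw) >= 19:
        info["kind"] = "index"
        info["index_id"] = _decode_int64(raw[11:19])
    else:
        info["kind"] = "table-prefix"
    return info


def decode_base64_key(text: str) -> dict:
    """Decode a Base64 key as printed in the range_task.go log lines."""
    return decode_table_key(base64.b64decode(text))


if __name__ == "__main__":
    # The range that fails with "PD server timeout" in the logs above.
    for key in ("dIAAAAAAAXDj", "dIAAAAAAAXEB"):
        print(key, decode_base64_key(key))
```

For example, the range that fails with "PD server timeout" in both the April 12 and April 13 logs (dIAAAAAAAXDj to dIAAAAAAAXEB) decodes to table IDs 94435 and 94465, so those tables may be a better starting point than the keys that only report "context canceled". The table ID can then be mapped to a table name, for instance through TiDB's HTTP status API, assuming that endpoint is available in this version.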

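It is a merge issue; once the empty Regions are merged, space usage will come down.

To me it still looks like the lock-resolving step fails, and that is what keeps GC from working.

Thanks! I'll try it. If I find the corresponding table, what should I do next?

Yes, exactly. Based on the article I cited, that is my impression too. Do you have any ideas on how to solve it?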

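If you increase the values, more Regions qualify and PD will merge more of them. Larger Regions usually involve moving and reorganizing more data, so merging may actually be slower.

Our goal is to merge empty or small Regions as quickly as possible. Those usually merge fast and free space sooner, so relatively large Regions should be left out of merging for now. By that reasoning, reducing the parameters makes more sense; give it a try first.

Thanks. I just enabled cross-table merging of empty Regions; hopefully it helps.

First adjust the low-space store alert so that it turns green; this value slows down scheduling across the whole cluster.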
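
To see which merge thresholds sit above the usual values, the config show JSON posted above can be compared against a reference. This is a rough Python sketch: the reference numbers 20 (MiB) and 200000 (keys) are assumed here to be the PD defaults and should be checked against the documentation for your version, and oversized_merge_settings is a hypothetical helper.

```python
import json

# Assumed reference values (believed to be the PD defaults); verify them
# against the documentation for the PD version you are running.
REFERENCE = {
    "max-merge-region-size": 20,      # MiB
    "max-merge-region-keys": 200000,  # number of keys
}


def oversized_merge_settings(config_json: str, reference=None) -> dict:
    """Return {name: (current, reference)} for settings above the reference."""
    reference = REFERENCE if reference is None else reference
    schedule = json.loads(config_json).get("schedule", {})
    return {
        name: (schedule[name], limit)
        for name, limit in reference.items()
        if name in schedule and schedule[name] > limit
    }


if __name__ == "__main__":
    # Values copied from the "schedule" section of the config shown above.
    posted = '{"schedule": {"max-merge-region-keys": 500000, "max-merge-region-size": 100}}'
    for name, (current, limit) in oversized_merge_settings(posted).items():
        print(f"{name}: current={current}, reference={limit}")
```

With the values posted in this thread (size 100, keys 500000), both settings are reported as above the assumed reference, which matches the advice to lower them.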