Leader Balance and Region Balance at 100%

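To improve efficiency, please provide the following information when asking a question. Clearly described questions get priority responses.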

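  • [TiDB version]: 3.0.4
  • [Problem description]: We recently imported a large amount of data into TiDB. It has been almost a week since the import finished, yet Leader Balance and Region Balance are still at 100%.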

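Most queries have now become very slow: queries that used to finish in under a second now take several seconds, some more than ten.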

What should we do in this situation? Can you suggest where to start?

Hi, you can follow these troubleshooting steps:

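  1. Use the leader score and region score in the PD monitoring to check whether the scores differ significantly across Stores. For PD's balance scheduling strategy, see "TiDB Best Practices Series: PD Scheduling Best Practices", in particular the analysis and remedies in its section on uneven Leader/Region distribution.
  2. For the slow queries, start with the slow log to make a first judgment on whether the problem lies in the SQL execution plan or in TiDB/TiKV. For how to interpret the slow log, see: Slow Query Log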
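Step 1 can also be checked outside Grafana by reading the same scores from PD. The sketch below is a minimal example, assuming PD's `GET /pd/api/v1/stores` HTTP endpoint returns entries with `store.id`, `store.address`, and `status.leader_score` / `status.region_score`; the PD address is a placeholder, and the field names should be checked against your PD version.

```python
import json
import urllib.request

# Placeholder PD address (assumption); replace with one of your PD endpoints.
PD_STORES_URL = "http://127.0.0.1:2379/pd/api/v1/stores"


def fetch_stores(url=PD_STORES_URL):
    """Fetch the store list from PD's HTTP API."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def ranked(stores_json, field):
    """Return [(store_id, address, value)] sorted by `field`, highest first.

    Assumes entries shaped like {"store": {"id", "address"}, "status": {field: ...}};
    stores whose status lacks the field (e.g. down stores) are skipped.
    """
    rows = []
    for s in stores_json.get("stores", []):
        status = s.get("status") or {}
        if field in status:
            meta = s.get("store") or {}
            rows.append((meta.get("id"), meta.get("address"), status[field]))
    return sorted(rows, key=lambda r: r[2], reverse=True)


def score_spread(stores_json, field):
    """Return (min, max, max/min) of `field` across stores, or None if absent."""
    values = [value for _, _, value in ranked(stores_json, field)]
    if not values:
        return None
    lo, hi = min(values), max(values)
    return lo, hi, (hi / lo if lo > 0 else float("inf"))


def main():
    data = fetch_stores()
    for field in ("leader_score", "region_score"):
        spread = score_spread(data, field)
        if spread is None:
            continue
        lo, hi, ratio = spread
        print(f"{field}: min={lo:.1f} max={hi:.1f} max/min={ratio:.2f}")
        # Show the three highest-scoring stores for this score.
        for store_id, address, value in ranked(data, field)[:3]:
            print(f"  store {store_id} ({address}): {value:.1f}")
```

Calling `main()` against a live PD prints the spread of each score and the stores at the top; a large max/min ratio points at the stores the imbalance section of the article is concerned with.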

That helps. I'm troubleshooting and trying things out following the article.

I noticed that read operations spend a long time waiting. What could this be related to?
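One way to check whether reads are waiting rather than working is to compare the `Wait_time` and `Process_time` fields of the TiDB slow log, which the TiDB slow-log documentation describes as the total waiting time and total processing time of a statement in TiKV. The sketch below is a minimal parser, assuming the 3.0-style layout (an entry starts with `# Time:`, metadata lines begin with `# ` and hold `Key: value` pairs, and the SQL text follows); the default file name is a placeholder.

```python
import re

# "Key: value" pairs on a slow-log metadata line, e.g. "Process_time: 0.07 Wait_time: 0.002".
PAIR = re.compile(r"(\w+): (\S+)")


def parse_slow_log(lines):
    """Group TiDB slow-log lines into dicts of {field: value, "sql": text}.

    Assumes each entry starts with a "# Time:" line, metadata lines start
    with "# ", and the SQL text follows on lines without that prefix.
    """
    entries, current = [], None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("# Time:"):
            current = {"sql": ""}
            entries.append(current)
        if current is None:
            continue  # skip anything before the first entry
        if line.startswith("# "):
            for key, value in PAIR.findall(line[2:]):
                current[key] = value
        elif line.strip():
            current["sql"] = (current["sql"] + " " + line).strip()
    return entries


def seconds(entry, key):
    """Read a duration field as float seconds; missing or malformed -> 0.0."""
    try:
        return float(entry.get(key, 0))
    except ValueError:
        return 0.0


def wait_dominated(entries):
    """Return (query_time, wait_time, process_time, sql) where TiKV wait > process,
    slowest query first."""
    rows = []
    for e in entries:
        wait, proc = seconds(e, "Wait_time"), seconds(e, "Process_time")
        if wait > proc:
            rows.append((seconds(e, "Query_time"), wait, proc, e["sql"]))
    return sorted(rows, key=lambda r: r[0], reverse=True)


def main(path="tidb_slow_query.log"):  # placeholder; use the file set by slow-query-file
    with open(path, errors="replace") as f:
        rows = wait_dominated(parse_slow_log(f))
    for q, w, p, sql in rows[:20]:
        print(f"query={q:.3f}s wait={w:.3f}s process={p:.3f}s  {sql[:80]}")
```

Statements where TiKV wait time clearly exceeds processing time hint that requests are queuing in TiKV rather than running an expensive plan, which is the distinction step 2 above asks for.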

It feels like it's related to hotspots. Any suggestions on that?

I'm making adjustments following this article: "TiDB Common Problem Handling - Hotspots". Let's observe for a while and see how it turns out.

OK.

Manually splitting the hot regions doesn't seem to have much effect. Something is holding that machine up, and I don't know what.

What situations can cause query requests to queue up waiting at a store? So far, write operations don't seem to be affected.

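  1. If convenient, please select the problematic instances in the TiKV-Detail dashboard, export the monitoring, and upload it for further analysis.
    To export the monitoring as a PDF:
    1) In the Chrome browser, install the "Full Page Screen Capture" extension: https://chrome.google.com/webstore/detail/full-page-screen-capture/fdpohaocaechififmbbbbbknoalclacl

    2) In Grafana, expand all dashboards of "cluster-name-overview" (press d, then E, to open the Panels of every Row; wait a while for the page to finish loading).

    3) Export the PDF with the extension.

  2. If a particular machine is confirmed to be the bottleneck, check whether the TiKV log on that machine shows any abnormal output.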

http://note.youdao.com/noteshare?id=da2ee7ee5cc08c731f09a431d64d3536

Thanks!

Also, I can confirm that TiKV's err log output is empty.

  1. Check whether tikv.log contains any entries at warn or error level.
  2. Also, the shared link can't be opened.
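A minimal sketch of item 1 that filters tikv.log by the level field instead of by substring. It assumes the unified log layout seen in the log excerpt later in this thread (`[time] [LEVEL] [file:line] ...`) with uppercase level tags; a case-sensitive `grep "warn"` does not match a `[WARN]` tag, while `grep "error"` also matches lines that merely contain the word.

```python
import re
from collections import Counter

# Leading "[timestamp] [LEVEL]" of a TiKV unified-format log line,
# e.g. "[2020/03/24 19:27:53.088 +08:00] [ERROR] [endpoint.rs:454] ...".
LINE = re.compile(r"^\[(?P<ts>[^\]]+)\] \[(?P<level>[A-Z]+)\]")


def filter_levels(lines, levels=("WARN", "ERROR")):
    """Yield (level, line) for lines whose level tag is one of `levels`."""
    for line in lines:
        m = LINE.match(line)
        if m and m.group("level") in levels:
            yield m.group("level"), line.rstrip("\n")


def main(path="log/tikv.log", last=100):
    with open(path, errors="replace") as f:
        hits = list(filter_levels(f))
    print(Counter(level for level, _ in hits))  # how many WARN vs ERROR lines
    for _, line in hits[-last:]:
        print(line)
```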

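The shared link can be viewed now. Also, I found some errors like these in tikv.log.

But wait, the slow, stuck machine seems to have moved.

Give me a moment; I'll upload the monitoring and the abnormal logs for this machine.

This is the monitoring for the 172.16.150.121:20180 instance.

These are the errors found in tikv.log: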

```
[root@data-tikv5 deploy]# grep -E "warn|error" log/tikv.log | tail -n 100
[2020/03/24 19:27:53.088 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "peer is not leader for region 108409, leader may Some(id: 975129 store_id: 947344)" not_leader { region_id: 108409 leader { id: 975129 store_id: 947344 } }"]
[2020/03/24 20:11:05.785 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "peer is not leader for region 2612595, leader may Some(id: 2877694 store_id: 280006)" not_leader { region_id: 2612595 leader { id: 2877694 store_id: 280006 } }"]
[2020/03/24 20:16:31.256 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="locked primary_lock: 7480000000000007EF5F698000000000000001013439656432396438FF2D386266332D3433FF65352D623037322DFF3765653265383333FF3036363700000000FB038000000014A8B876 lock_version: 415511917240254473 key: 7480000000000007EF5F698000000000000003013037353533363536FF3235313400000000FB038000000014A8B896 lock_ttl: 19641 txn_size: 1"]
[2020/03/24 20:18:05.610 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="locked primary_lock: 7480000000000007EF5F698000000000000001013938373063333838FF2D396235352D3436FF35332D393432632DFF6138613732346363FF6138643200000000FB038000000014A8C30C lock_version: 415511946246488129 key: 7480000000000007EF5F698000000000000003013135313132363130FF3238390000000000FA038000000014A8C31B lock_ttl: 3283 txn_size: 1"]
[2020/03/24 20:18:50.398 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="locked primary_lock: 7480000000000007EF5F698000000000000001013732343031366663FF2D646330392D3434FF32322D383862642DFF3535333136303961FF3633396100000000FB038000000014A8C6B4 lock_version: 415511957977432067 key: 7480000000000007EF5F698000000000000003013037353536313838FF3734303200000000FB038000000014A8C6FD lock_ttl: 3384 txn_size: 1"]
[2020/03/24 20:18:53.321 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="locked primary_lock: 7480000000000007EF5F698000000000000001016235656337643730FF2D376565322D3431FF30352D623135352DFF6237323562383264FF3735313500000000FB038000000014A8CF23 lock_version: 415511953717592086 key: 7480000000000007EF5F698000000000000003013133393135373839FF3039380000000000FA038000000014A8CF6C lock_ttl: 22501 txn_size: 1"]
[2020/03/24 20:18:53.322 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="locked primary_lock: 7480000000000007EF5F698000000000000001016235656337643730FF2D376565322D3431FF30352D623135352DFF6237323562383264FF3735313500000000FB038000000014A8CF23 lock_version: 415511953717592086 key: 7480000000000007EF5F698000000000000003013136353534323938FF3233300000000000FA038000000014A8CF44 lock_ttl: 22501 txn_size: 2"]
[2020/03/24 20:18:53.324 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="locked primary_lock: 7480000000000007EF5F698000000000000001016235656337643730FF2D376565322D3431FF30352D623135352DFF6237323562383264FF3735313500000000FB038000000014A8CF23 lock_version: 415511953717592086 key: 7480000000000007EF5F698000000000000003013138313134353937FF3036350000000000FA038000000014A8CF23 lock_ttl: 22501 txn_size: 1"]
[2020/03/24 20:19:41.858 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="locked primary_lock: 7480000000000007EF5F698000000000000001016465663265626638FF2D623136622D3438FF30352D393131622DFF3566636438646238FF6431663500000000FB038000000014A8D23E lock_version: 415511969236516890 key: 7480000000000007EF5F698000000000000003013035373138383233FF3730303000000000FB038000000014A8D244 lock_ttl: 11930 txn_size: 1"]
[2020/03/24 20:20:45.462 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="locked primary_lock: 7480000000000007EF5F698000000000000001013361356136653233FF2D313138392D3464FF30322D626637302DFF3162316435653036FF3832616100000000FB038000000014A8E6C9 lock_version: 415511983169470572 key: 7480000000000007EF5F698000000000000003013133383137343531FF3832370000000000FA038000000014A8E6DA lock_ttl: 22334 txn_size: 1"]
[2020/03/24 20:21:30.820 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="locked primary_lock: 7480000000000007EF5F698000000000000001016366313662653038FF2D646532642D3435FF33332D626661642DFF3336363431613932FF6566393500000000FB038000000014A8E5E6 lock_version: 415511996761636885 key: 7480000000000007EF5F698000000000000003013133333630303232FF3839340000000000FA038000000014A8E627 lock_ttl: 15872 txn_size: 1"]
[2020/03/24 20:21:30.825 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="locked primary_lock: 7480000000000007EF5F698000000000000001016366313662653038FF2D646532642D3435FF33332D626661642DFF3336363431613932FF6566393500000000FB038000000014A8E5E6 lock_version: 415511996761636885 key: 7480000000000007EF5F698000000000000003013133383232363037FF3332300000000000FA038000000014A8E614 lock_ttl: 15872 txn_size: 5"]
[2020/03/24 20:21:30.825 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="locked primary_lock: 7480000000000007EF5F698000000000000001016366313662653038FF2D646532642D3435FF33332D626661642DFF3336363431613932FF6566393500000000FB038000000014A8E5E6 lock_version: 415511996761636885 key: 7480000000000007EF5F698000000000000003013135313138323336FF3133380000000000FA038000000014A8E600 lock_ttl: 15872 txn_size: 1"]
[2020/03/24 20:21:30.830 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="locked primary_lock: 7480000000000007EF5F698000000000000001016366313662653038FF2D646532642D3435FF33332D626661642DFF3336363431613932FF6566393500000000FB038000000014A8E5E6 lock_version: 415511996761636885 key: 7480000000000007EF5F698000000000000003013138313232373731FF3539320000000000FA038000000014A8E5ED lock_ttl: 15872 txn_size: 1"]
[2020/03/24 20:21:30.831 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="locked primary_lock: 7480000000000007EF5F698000000000000001016366313662653038FF2D646532642D3435FF33332D626661642DFF3336363431613932FF6566393500000000FB038000000014A8E5E6 lock_version: 415511996761636885 key: 7480000000000007EF5F698000000000000003013133383234383532FF3233320000000000FA038000000014A8E601 lock_ttl: 15872 txn_size: 1"]
[2020/03/24 20:34:53.333 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="locked primary_lock: 7480000000000007EF5F698000000000000001016533333564323835FF2D373433322D3434FF66392D613362302DFF3265656562636266FF6464373800000000FB038000000014A98F00 lock_version: 415512208324952101 key: 7480000000000007EF5F698000000000000003013133383530313236FF3132380000000000FA038000000014A98F01 lock_ttl: 11301 txn_size: 1"]
[2020/03/24 20:35:43.876 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="locked primary_lock: 7480000000000007EF5F698000000000000001013762363630383838FF2D316633662D3437FF31382D383937372DFF3734333133646137FF3365303500000000FB038000000014A994E6 lock_version: 415512212742078472 key: 7480000000000007EF5F698000000000000003013037353533363631FF3238333700000000FB038000000014A99A76 lock_ttl: 44889 txn_size: 1"]
[2020/03/24 20:36:16.706 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="locked primary_lock: 7480000000000007EF5F698000000000000001013639353161616564FF2D373933642D3466FF39612D613364382DFF6137316535363461FF6162633700000000FB038000000014A999D5 lock_version: 415512228719755352 key: 7480000000000007EF5F698000000000000003013037353538313233FF3435363700000000FB038000000014A999E3 lock_ttl: 16891 txn_size: 1"]
[2020/03/24 20:36:16.707 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="locked primary_lock: 7480000000000007EF5F698000000000000001013639353161616564FF2D373933642D3466FF39612D613364382DFF6137316535363461FF6162633700000000FB038000000014A999D5 lock_version: 415512228719755352 key: 7480000000000007EF5F698000000000000003013135313138313033FF3035300000000000FA038000000014A99A04 lock_ttl: 16891 txn_size: 1"]
[2020/03/24 20:36:36.158 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="locked primary_lock: 7480000000000007EF5F698000000000000001016337313837303031FF2D356233302D3438FF38392D626337352DFF6661653435373764FF6530366200000000FB038000000014A9A347 lock_version: 415512232547057788 key: 7480000000000007EF5F698000000000000003013032383637383730FF3734340000000000FA038000000014A9A386 lock_ttl: 21649 txn_size: 1"]
[2020/03/24 20:36:36.164 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="locked primary_lock: 7480000000000007EF5F698000000000000001016337313837303031FF2D356233302D3438FF38392D626337352DFF6661653435373764FF6530366200000000FB038000000014A9A347 lock_version: 415512232547057788 key: 7480000000000007EF5F698000000000000003013133333231323036FF3931320000000000FA038000000014A9A471 lock_ttl: 21649 txn_size: 4"]
[2020/03/24 20:36:36.164 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="locked primary_lock: 7480000000000007EF5F698000000000000001016337313837303031FF2D356233302D3438FF38392D626337352DFF6661653435373764FF6530366200000000FB038000000014A9A347 lock_version: 415512232547057788 key: 7480000000000007EF5F698000000000000003013133333539343233FF3532300000000000FA038000000014A9A416 lock_ttl: 21649 txn_size: 4"]
[2020/03/24 20:36:36.172 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="locked primary_lock: 7480000000000007EF5F698000000000000001016337313837303031FF2D356233302D3438FF38392D626337352DFF6661653435373764FF6530366200000000FB038000000014A9A347 lock_version: 415512232547057788 key: 7480000000000007EF5F698000000000000003013133303338373135FF3435350000000000FA038000000014A9A445 lock_ttl: 21649 txn_size: 4"]
[2020/03/24 20:37:34.693 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="locked primary_lock: 7480000000000007EF5F698000000000000001013739346430393531FF2D303039382D3438FF32392D396634392DFF3137343731613238FF3533386400000000FB038000000014A9B096 lock_version: 415512249114558470 key: 7480000000000007EF5F698000000000000003013135313230313231FF3837370000000000FA038000000014A9B0D2 lock_ttl: 17047 txn_size: 2"]
[2020/03/24 20:38:12.732 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="locked primary_lock: 7480000000000007EF5F698000000000000001013637636231323531FF2D663863642D3465FF65382D393432322DFF3835646135366639FF3530336600000000FB038000000014A9B84C lock_version: 415512255327371311 key: 7480000000000007EF5F698000000000000003013133333233333632FF3338360000000000FA038000000014A9B8E0 lock_ttl: 31311 txn_size: 1"]
[2020/03/24 20:38:12.732 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="locked primary_lock: 7480000000000007EF5F698000000000000001013637636231323531FF2D663863642D3465FF65382D393432322DFF3835646135366639FF3530336600000000FB038000000014A9B84C lock_version: 415512255327371311 key: 7480000000000007EF5F698000000000000003013133383230323532FF3436380000000000FA038000000014A9B8CB lock_ttl: 31311 txn_size: 3"]
[2020/03/24 20:38:12.735 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="locked primary_lock: 7480000000000007EF5F698000000000000001013637636231323531FF2D663863642D3465FF65382D393432322DFF3835646135366639FF3530336600000000FB038000000014A9B84C lock_version: 415512255327371311 key: 7480000000000007EF5F698000000000000003013136353537313532FF3136300000000000FA038000000014A9B89A lock_ttl: 31311 txn_size: 2"]
[2020/03/24 20:38:57.340 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="locked primary_lock: 7480000000000007EF5F698000000000000001013736643665616231FF2D663365352D3431FF66642D393639622DFF3038386161343431FF6265343400000000FB038000000014A9C583 lock_version: 415512270033649743 key: 7480000000000007EF5F698000000000000003013133383139363630FF3933370000000000FA038000000014A9C593 lock_ttl: 19856 txn_size: 1"]
[2020/03/24 20:52:20.784 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "region 2695131 is missing" region_not_found { region_id: 2695131 }"]
[2020/03/24 21:28:04.756 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="locked primary_lock: 7480000000000007EF5F698000000000000001013338333864326232FF2D316133612D3439FF33322D613235332DFF6538663133386163FF6338393600000000FB038000000014AA43A4 lock_version: 415513041575870473 key: 7480000000000007EF5F698000000000000003013133303431343530FF3537380000000000FA038000000014AA43BF lock_ttl: 24112 txn_size: 1"]
[2020/03/24 21:56:32.826 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="locked primary_lock: 7480000000000007EF5F698000000000000001013362383961353739FF2D396263612D3465FF37312D383363622DFF3033313730303862FF3537393600000000FB038000000014AB4EB8 lock_version: 415513488688676892 key: 7480000000000007EF5F698000000000000003013138313537393939FF3031360000000000FA038000000014AB4ED3 lock_ttl: 26536 txn_size: 1"]
[2020/03/24 21:56:55.750 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="locked primary_lock: 7480000000000007EF5F698000000000000001016363373235643131FF2D306366382D3432FF66322D386533652DFF6164396362633231FF3266613700000000FB038000000014AB5893 lock_version: 415513496474877994 key: 7480000000000007EF5F698000000000000003013133383137323836FF3931340000000000FA038000000014AB58FC lock_ttl: 19752 txn_size: 4"]
[2020/03/24 21:57:36.153 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="locked primary_lock: 7480000000000007EF5F698000000000000001016135353336373762FF2D323234302D3465FF63362D613631372DFF6436366530353863FF6530316200000000FB038000000014AB5D67 lock_version: 415513502765809676 key: 7480000000000007EF5F698000000000000003013133303334383435FF3832360000000000FA038000000014AB5DAD lock_ttl: 36184 txn_size: 1"]
[2020/03/24 21:59:47.630 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="locked primary_lock: 7480000000000007EF5F698000000000000001013061393463333732FF2D343033332D3465FF63352D626561382DFF3335386639313863FF6234666300000000FB038000000014AB7C6B lock_version: 415513535049105438 key: 7480000000000007EF5F698000000000000003013138313438373335FF3836350000000000FA038000000014AB7C6F lock_ttl: 44540 txn_size: 1"]
[2020/03/24 21:59:47.630 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="locked primary_lock: 7480000000000007EF5F698000000000000001013061393463333732FF2D343033332D3465FF63352D626561382DFF3335386639313863FF6234666300000000FB038000000014AB7C6B lock_version: 415513535049105438 key: 7480000000000007EF5F698000000000000003013133333532383338FF3034350000000000FA038000000014AB7CA3 lock_ttl: 44540 txn_size: 1"]
[2020/03/24 21:59:47.630 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="locked primary_lock: 7480000000000007EF5F698000000000000001013061393463333732FF2D343033332D3465FF63352D626561382DFF3335386639313863FF6234666300000000FB038000000014AB7C6B lock_version: 415513535049105438 key: 7480000000000007EF5F698000000000000003013037353536313832FF3936313400000000FB038000000014AB7C6D lock_ttl: 44540 txn_size: 2"]
[2020/03/24 22:16:05.236 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="locked primary_lock: 7480000000000007EF5F698000000000000001016134656638623839FF2D653036642D3436FF34352D396334642DFF3430663766653934FF6261663300000000FB038000000014ABB035 lock_version: 415513798975422479 key: 7480000000000007EF5F698000000000000003013133333136323036FF3831350000000000FA038000000014ABB0AF lock_ttl: 15334 txn_size: 2"]
[2020/03/24 22:16:05.245 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="locked primary_lock: 7480000000000007EF5F698000000000000001016134656638623839FF2D653036642D3436FF34352D396334642DFF3430663766653934FF6261663300000000FB038000000014ABB035 lock_version: 415513798975422479 key: 7480000000000007EF5F698000000000000003013138313338383439FF3830350000000000FA038000000014ABB08D lock_ttl: 15334 txn_size: 1"]
[2020/03/24 22:26:09.021 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="locked primary_lock: 7480000000000007EF5F698000000000000001013466353438373036FF2D346536342D3463FF39312D393534302DFF3765333565376336FF3239613000000000FB038000000014AC13AF lock_version: 415513957310398482 key: 7480000000000007EF5F698000000000000003013035353132313935FF3532380000000000FA038000000014AC13AF lock_ttl: 15107 txn_size: 3"]
[2020/03/24 22:28:16.081 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="locked primary_lock: 7480000000000007EF5F698000000000000001016432393661343963FF2D646361652D3434FF64652D613431662DFF3663353837353832FF3035396500000000FB038000000014AC24A3 lock_version: 415513991847870491 key: 7480000000000007EF5F698000000000000003013135313535323638FF3832310000000000FA038000000014AC24BB lock_ttl: 10433 txn_size: 2"]
[2020/03/24 22:28:16.081 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="locked primary_lock: 7480000000000007EF5F698000000000000001016432393661343963FF2D646361652D3434FF64652D613431662DFF3663353837353832FF3035396500000000FB038000000014AC24A3 lock_version: 415513991847870491 key: 7480000000000007EF5F698000000000000003013133303334363030FF3738330000000000FA038000000014AC24B9 lock_ttl: 10433 txn_size: 5"]
[2020/03/24 22:42:04.576 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "peer is not leader for region 147331, leader may Some(id: 2878948 store_id: 947337)" not_leader { region_id: 147331 leader { id: 2878948 store_id: 947337 } }"]
[2020/03/24 22:42:06.493 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "peer is not leader for region 147331, leader may Some(id: 2878948 store_id: 947337)" not_leader { region_id: 147331 leader { id: 2878948 store_id: 947337 } }"]
[2020/03/24 22:54:58.610 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "peer is not leader for region 147331, leader may Some(id: 2878961 store_id: 280012)" not_leader { region_id: 147331 leader { id: 2878961 store_id: 280012 } }"]
[2020/03/24 23:09:49.186 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "region 2844084 is missing" region_not_found { region_id: 2844084 }"]
[2020/03/24 23:09:49.303 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "region 2844084 is missing" region_not_found { region_id: 2844084 }"]
[2020/03/24 23:09:50.800 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "region 2844084 is missing" region_not_found { region_id: 2844084 }"]
[2020/03/24 23:16:55.880 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "region 2298602 is missing" region_not_found { region_id: 2298602 }"]
[2020/03/24 23:27:50.785 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "region 2612595 is missing" region_not_found { region_id: 2612595 }"]
[2020/03/24 23:58:33.141 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "region 162243 is missing" region_not_found { region_id: 162243 }"]
[2020/03/25 00:47:53.616 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "region 117039 is missing" region_not_found { region_id: 117039 }"]
[2020/03/25 00:59:59.986 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "peer is not leader for region 155199, leader may Some(id: 1873319 store_id: 280010)" not_leader { region_id: 155199 leader { id: 1873319 store_id: 280010 } }"]
[2020/03/25 02:25:15.296 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "stale command""]
[2020/03/25 02:25:16.429 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "peer is not leader for region 100169, leader may Some(id: 2201832 store_id: 280008)" not_leader { region_id: 100169 leader { id: 2201832 store_id: 280008 } }"]
[2020/03/25 02:25:18.453 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "peer is not leader for region 100169, leader may Some(id: 2201832 store_id: 280008)" not_leader { region_id: 100169 leader { id: 2201832 store_id: 280008 } }"]
[2020/03/25 02:25:18.979 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "peer is not leader for region 100169, leader may Some(id: 2201832 store_id: 280008)" not_leader { region_id: 100169 leader { id: 2201832 store_id: 280008 } }"]
[2020/03/25 02:25:30.353 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "peer is not leader for region 2440706, leader may Some(id: 2440709 store_id: 280006)" not_leader { region_id: 2440706 leader { id: 2440709 store_id: 280006 } }"]
[2020/03/25 02:29:38.383 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "region 2839846 is missing" region_not_found { region_id: 2839846 }"]
[2020/03/25 02:48:12.516 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "region 66211 is missing" region_not_found { region_id: 66211 }"]
[2020/03/25 02:48:13.057 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "region 66211 is missing" region_not_found { region_id: 66211 }"]
[2020/03/25 02:48:13.249 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "region 66211 is missing" region_not_found { region_id: 66211 }"]
[2020/03/25 03:11:54.278 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "peer is not leader for region 2878269, leader may Some(id: 2879064 store_id: 280004)" not_leader { region_id: 2878269 leader { id: 2879064 store_id: 280004 } }"]
[2020/03/25 03:23:55.037 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "peer is not leader for region 2871520, leader may Some(id: 2879861 store_id: 6)" not_leader { region_id: 2871520 leader { id: 2879861 store_id: 6 } }"]
[2020/03/25 03:23:55.541 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "peer is not leader for region 2871520, leader may Some(id: 2879861 store_id: 6)" not_leader { region_id: 2871520 leader { id: 2879861 store_id: 6 } }"]
[2020/03/25 03:23:55.541 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "peer is not leader for region 2871520, leader may Some(id: 2879861 store_id: 6)" not_leader { region_id: 2871520 leader { id: 2879861 store_id: 6 } }"]
[2020/03/25 03:28:22.772 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "region 2803666 is missing" region_not_found { region_id: 2803666 }"]
[2020/03/25 04:20:52.745 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "region 2865759 is missing" region_not_found { region_id: 2865759 }"]
[2020/03/25 04:20:54.087 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "peer is not leader for region 2865759, leader may Some(id: 2879886 store_id: 280010)" not_leader { region_id: 2865759 leader { id: 2879886 store_id: 280010 } }"]
[2020/03/25 04:20:54.525 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "region 2865759 is missing" region_not_found { region_id: 2865759 }"]
[2020/03/25 04:25:47.000 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "peer is not leader for region 2392610, leader may Some(id: 2554709 store_id: 947344)" not_leader { region_id: 2392610 leader { id: 2554709 store_id: 947344 } }"]
[2020/03/25 04:25:47.510 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "peer is not leader for region 2392610, leader may Some(id: 2554709 store_id: 947344)" not_leader { region_id: 2392610 leader { id: 2554709 store_id: 947344 } }"]
[2020/03/25 04:37:05.300 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "peer is not leader for region 117787, leader may Some(id: 2880119 store_id: 947342)" not_leader { region_id: 117787 leader { id: 2880119 store_id: 947342 } }"]
[2020/03/25 05:22:22.539 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "region 2675974 is missing" region_not_found { region_id: 2675974 }"]
[2020/03/25 05:30:10.619 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "peer is not leader for region 2878265, leader may Some(id: 2880205 store_id: 947344)" not_leader { region_id: 2878265 leader { id: 2880205 store_id: 947344 } }"]
[2020/03/25 06:05:53.875 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "peer is not leader for region 2878269, leader may Some(id: 2880332 store_id: 6)" not_leader { region_id: 2878269 leader { id: 2880332 store_id: 6 } }"]
[2020/03/25 06:24:59.926 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "peer is not leader for region 134379, leader may Some(id: 2685764 store_id: 947337)" not_leader { region_id: 134379 leader { id: 2685764 store_id: 947337 } }"]
[2020/03/25 06:26:24.508 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "region 89089 is missing" region_not_found { region_id: 89089 }"]
[2020/03/25 06:26:26.248 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "region 89089 is missing" region_not_found { region_id: 89089 }"]
[2020/03/25 06:26:26.565 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "region 89089 is missing" region_not_found { region_id: 89089 }"]
[2020/03/25 06:54:13.374 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "peer is not leader for region 123487, leader may Some(id: 2518831 store_id: 280005)" not_leader { region_id: 123487 leader { id: 2518831 store_id: 280005 } }"]
[2020/03/25 07:30:09.651 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "peer is not leader for region 2878265, leader may Some(id: 2879994 store_id: 4)" not_leader { region_id: 2878265 leader { id: 2879994 store_id: 4 } }"]
[2020/03/25 07:35:59.599 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "peer is not leader for region 2379726, leader may Some(id: 2379728 store_id: 947342)" not_leader { region_id: 2379726 leader { id: 2379728 store_id: 947342 } }"]
[2020/03/25 07:37:18.729 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "peer is not leader for region 107637, leader may Some(id: 2551241 store_id: 280006)" not_leader { region_id: 107637 leader { id: 2551241 store_id: 280006 } }"]
[2020/03/25 07:59:44.299 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "peer is not leader for region 2432551, leader may Some(id: 2880583 store_id: 280002)" not_leader { region_id: 2432551 leader { id: 2880583 store_id: 280002 } }"]
[2020/03/25 08:07:38.415 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "region 2695131 is missing" region_not_found { region_id: 2695131 }"]
[2020/03/25 08:13:00.765 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="locked primary_lock: 74800000000000286F5F6980000000000000010380000000007C49C6038000000000000001 lock_version: 415523190454616101 key: 74800000000000286F5F72800000000000B2A7 lock_ttl: 13143 txn_size: 9601"]
[2020/03/25 08:13:00.865 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="locked primary_lock: 74800000000000286F5F6980000000000000010380000000007C49C6038000000000000001 lock_version: 415523190454616101 key: 74800000000000286F5F72800000000000C03E lock_ttl: 13143 txn_size: 9601"]
[2020/03/25 08:18:26.801 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "EpochNotMatch current epoch of region 1266677 is conf_ver: 50 version: 2944, but you sent conf_ver: 50 version: 2943" epoch_not_match { current_regions { id: 1266677 start_key: 7480000000000028FF715F698000000000FF0000020380000000FF009C5C6E03800000FF000052087E000000FC end_key: 7480000000000028FF715F698000000000FF0000020380000000FF009CAD6003800000FF000032D95B000000FC region_epoch { conf_ver: 50 version: 2944 } peers { id: 1713658 store_id: 280007 } peers { id: 2118612 store_id: 4 } peers { id: 2277403 store_id: 947338 } } current_regions { id: 2880721 start_key: 7480000000000028FF715F698000000000FF0000020380000000FF009BDA3F03800000FF000036BE64000000FC end_key: 7480000000000028FF715F698000000000FF0000020380000000FF009C5C6E03800000FF000052087E000000FC region_epoch { conf_ver: 50 version: 2944 } peers { id: 2880722 store_id: 280007 } peers { id: 2880723 store_id: 4 } peers { id: 2880724 store_id: 947338 } } }"]
[2020/03/25 08:44:23.204 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "region 126515 is missing" region_not_found { region_id: 126515 }"]
[2020/03/25 08:47:18.507 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="locked primary_lock: 7480000000000027D85F69800000000000000103800000000018E893038000000000000B11 lock_version: 415523729999396907 key: 7480000000000027D85F728000000000000298 lock_ttl: 9494 txn_size: 3667"]
[2020/03/25 08:47:18.560 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="locked primary_lock: 7480000000000027D85F69800000000000000103800000000018E893038000000000000B11 lock_version: 415523729999396907 key: 7480000000000027D85F728000000000000298 lock_ttl: 9494 txn_size: 3667"]
[2020/03/25 08:53:24.545 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "peer is not leader for region 61, leader may Some(id: 2869239 store_id: 280008)" not_leader { region_id: 61 leader { id: 2869239 store_id: 280008 } }"]
[2020/03/25 08:56:53.427 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "region 61 is missing" region_not_found { region_id: 61 }"]
[2020/03/25 08:58:28.453 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "region 2479786 is missing" region_not_found { region_id: 2479786 }"]
[2020/03/25 09:01:02.780 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "peer is not leader for region 2482851, leader may Some(id: 2881671 store_id: 947344)" not_leader { region_id: 2482851 leader { id: 2881671 store_id: 947344 } }"]
[2020/03/25 09:04:16.838 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "EpochNotMatch current epoch of region 2877933 is conf_ver: 4004 version: 974, but you sent conf_ver: 4004 version: 973" epoch_not_match { current_regions { id: 2877933 start_key: 7480000000000000FF155F728000000000FF585FB40000000000FA end_key: 7480000000000000FF155F728000000000FF585FC60000000000FA region_epoch { conf_ver: 4004 version: 974 } peers { id: 2877934 store_id: 947337 } peers { id: 2878848 store_id: 947343 } peers { id: 2880528 store_id: 947338 } } current_regions { id: 2881790 start_key: 7480000000000000FF155F728000000000FF585FB20000000000FA end_key: 7480000000000000FF155F728000000000FF585FB40000000000FA region_epoch { conf_ver: 4004 version: 974 } peers { id: 2881791 store_id: 947337 } peers { id: 2881792 store_id: 947343 } peers { id: 2881793 store_id: 947338 } } }"]
[2020/03/25 09:04:17.083 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "EpochNotMatch current epoch of region 2877933 is conf_ver: 4004 version: 974, but you sent conf_ver: 4004 version: 973" epoch_not_match { current_regions { id: 2877933 start_key: 7480000000000000FF155F728000000000FF585FB40000000000FA end_key: 7480000000000000FF155F728000000000FF585FC60000000000FA region_epoch { conf_ver: 4004 version: 974 } peers { id: 2877934 store_id: 947337 } peers { id: 2878848 store_id: 947343 } peers { id: 2880528 store_id: 947338 } } current_regions { id: 2881790 start_key: 7480000000000000FF155F728000000000FF585FB20000000000FA end_key: 7480000000000000FF155F728000000000FF585FB40000000000FA region_epoch { conf_ver: 4004 version: 974 } peers { id: 2881791 store_id: 947337 } peers { id: 2881792 store_id: 947343 } peers { id: 2881793 store_id: 947338 } } }"]
[2020/03/25 09:04:20.200 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "EpochNotMatch current epoch of region 2877933 is conf_ver: 4004 version: 974, but you sent conf_ver: 4004 version: 973" epoch_not_match { current_regions { id: 2877933 start_key: 7480000000000000FF155F728000000000FF585FB40000000000FA end_key: 7480000000000000FF155F728000000000FF585FC60000000000FA region_epoch { conf_ver: 4004 version: 974 } peers { id: 2877934 store_id: 947337 } peers { id: 2878848 store_id: 947343 } peers { id: 2880528 store_id: 947338 } } current_regions { id: 2881790 start_key: 7480000000000000FF155F728000000000FF585FB20000000000FA end_key: 7480000000000000FF155F728000000000FF585FB40000000000FA region_epoch { conf_ver: 4004 version: 974 } peers { id: 2881791 store_id: 947337 } peers { id: 2881792 store_id: 947343 } peers { id: 2881793 store_id: 947338 } } }"]
[2020/03/25 09:06:43.046 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "peer is not leader for region 2878265, leader may Some(id: 2881479 store_id: 280010)" not_leader { region_id: 2878265 leader { id: 2881479 store_id: 280010 } }"]
[2020/03/25 09:48:48.112 +08:00] [ERROR] [endpoint.rs:454] [error-response] [err="region message: "peer is not leader for region 95893, leader may Some(id: 2678075 store_id: 947341)" not_leader { region_id: 95893 leader { id: 2678075 store_id: 947341 } }"]
```

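  1. Judging from this TiKV's log, there are a small number of read-write conflict errors and some region errors caused by scheduling. These are probably not what is making this region respond slowly.
  2. The monitoring shows 4 TiKV instances on this machine. Have the configurations of these four TiKV instances been tuned, or are they running with the default configuration?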
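To put numbers on the reading in item 1, the error-response lines can be tallied by type. A minimal sketch: the markers are copied from the messages in the log excerpt above; grouping `locked` as read-write conflicts and the region errors as scheduling side effects follows the interpretation in item 1; and counting distinct `lock_version` values (the start timestamp of the transaction holding the lock) gives a rough count of conflicting transactions. `summarize` is a hypothetical helper name.

```python
import re
from collections import Counter

# Markers copied from the error-response messages in the tikv.log excerpt.
CATEGORIES = [
    ("lock_conflict", "locked primary_lock:"),   # read hit a lock of an uncommitted txn
    ("not_leader", "not_leader {"),              # leader is on another store
    ("region_not_found", "region_not_found {"),  # region (no longer) on this store
    ("epoch_not_match", "epoch_not_match {"),    # region changed (e.g. split) since the request
    ("stale_command", "stale command"),
]
LOCK_VERSION = re.compile(r"lock_version: (\d+)")


def classify(line):
    """Return the first category whose marker appears in the line, else 'other'."""
    for name, marker in CATEGORIES:
        if marker in line:
            return name
    return "other"


def summarize(lines):
    """Count error-response lines per category and distinct locking transactions."""
    counts = Counter()
    lock_versions = set()
    for line in lines:
        if "[error-response]" not in line:
            continue
        category = classify(line)
        counts[category] += 1
        if category == "lock_conflict":
            m = LOCK_VERSION.search(line)
            if m:
                lock_versions.add(m.group(1))
    return counts, len(lock_versions)


def main(path="log/tikv.log"):
    with open(path, errors="replace") as f:
        counts, txns = summarize(f)
    for category, n in counts.most_common():
        print(f"{category}: {n}")
    print(f"distinct lock_version values among lock conflicts: {txns}")
```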

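Mainly, memory, CPU, and disk capacity were each set to about 1/4 of the machine to avoid resource contention; nothing else was specially configured.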

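  1. Compare the gRPC message count and coprocessor CPU against the other instances to see whether this node is consistently the highest.
  2. Use pd-ctl hot read to see how many hot read regions land on this store.
  3. The monitoring shows coprocessor CPU peaking above 300%. Also use the slow log to check whether the regions of the tables involved in the slow queries all land on the problem store. A table's region information can be obtained with curl http://{TiDBIP}:10080/tables/{db}/{table}/regions.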
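Item 3 can be scripted. A minimal sketch that counts, per store, the region leaders returned by the `curl` endpoint above. It assumes the response carries `record_regions` plus `indices[].regions`, each region with `region_id` and `leader.store_id`, which matches TiDB's HTTP API around 3.0 as far as we know; verify against your version. Store IDs still need to be mapped to addresses (for example via PD) to tell whether they belong to the problem instance.

```python
import json
import urllib.request
from collections import Counter


def leader_stores(table_regions):
    """Count distinct regions per leader store for one table.

    Assumes the JSON shape {"record_regions": [...], "indices": [{"regions": [...]}]}
    where each region has "region_id" and "leader": {"store_id": ...}.
    A region covering both record and index ranges is counted once.
    """
    regions = list(table_regions.get("record_regions") or [])
    for index in table_regions.get("indices") or []:
        regions.extend(index.get("regions") or [])
    leader_of = {}
    for region in regions:
        leader = region.get("leader") or {}
        if "store_id" in leader:
            leader_of[region.get("region_id")] = leader["store_id"]
    return Counter(leader_of.values())


def fetch_table_regions(tidb_host, db, table):
    """GET the table's regions from the TiDB status port (10080, as in the thread)."""
    url = f"http://{tidb_host}:10080/tables/{db}/{table}/regions"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def main(tidb_host, db, table):
    counts = leader_stores(fetch_table_regions(tidb_host, db, table))
    total = sum(counts.values()) or 1
    for store_id, n in counts.most_common():
        print(f"store {store_id}: {n} region leaders ({100 * n / total:.1f}%)")
```

Running `main("<TiDBIP>", "<db>", "<table>")` for each table that appears in the slow queries shows whether their leaders cluster on a single store.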

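By comparison, the other store instances on the same machine do not have a higher gRPC message count or coprocessor CPU than instances on other machines; every machine has instances that are higher. Only that one store stands out.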
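The comparison above (one store standing out while its neighbours on the same machine look normal) can be made slightly more systematic with a median-based check. A minimal sketch; the instance names and readings in it are made up, and the 2x threshold is arbitrary.

```python
from statistics import median


def outliers(values, factor=2.0):
    """Return {instance: value} for instances above `factor` x the median.

    `values` maps an instance name to one metric reading, e.g. coprocessor
    CPU (%) or gRPC messages/s taken from Grafana. `factor` is arbitrary.
    """
    if not values:
        return {}
    mid = median(values.values())
    return {inst: v for inst, v in values.items() if v > factor * mid}


# Hypothetical coprocessor CPU readings (%) for four TiKV instances.
cop_cpu = {"tikv-a": 60.0, "tikv-b": 55.0, "tikv-c": 70.0, "tikv-d": 310.0}
print(outliers(cop_cpu))
```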

The pd-ctl hot read results so far don't show any obvious hotspot either.

P.S. The region I looked up later is not on that store.

We have a down store here. Can it be set to offline, or brought back online?
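Before choosing between taking the store offline and restoring it, it helps to confirm which store PD considers down. A minimal sketch, assuming each entry of PD's `GET /pd/api/v1/stores` has `store.id`, `store.address`, and `store.state_name` (such as `Up`, `Down`, `Offline`, `Tombstone`); the PD address is a placeholder. It only prints the pd-ctl `store delete <store_id>` subcommand, which asks PD to set the store Offline and move its replicas elsewhere; nothing is executed.

```python
import json
import urllib.request

# Placeholder PD address (assumption); replace with your own.
PD_STORES_URL = "http://127.0.0.1:2379/pd/api/v1/stores"


def down_stores(stores_json):
    """Return [(store_id, address)] for stores PD reports as Down.

    Assumes entries shaped like {"store": {"id", "address", "state_name"}, "status": {...}}.
    """
    result = []
    for s in stores_json.get("stores", []):
        meta = s.get("store") or {}
        if meta.get("state_name") == "Down":
            result.append((meta.get("id"), meta.get("address")))
    return result


def main(url=PD_STORES_URL):
    with urllib.request.urlopen(url, timeout=10) as resp:
        data = json.load(resp)
    for store_id, address in down_stores(data):
        # Printed only; review before running it inside pd-ctl.
        print(f"down store {store_id} at {address}; to take it offline: store delete {store_id}")
```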