5.4.1 drop database hangs

【TiDB environment】Test environment
【TiDB version】v5.4.1
【Problem encountered】drop database hangs
【Reproduction path】No specific operations; reproduced twice
【Symptoms and impact】

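  1. The cluster was running normally with no other operations in progress. drop database hung; kill <sid> could not terminate it, and admin cancel also hung. When I issued the command to shut down the cluster, drop database reported success at the moment the cluster shut down.
  2. After the cluster was restarted, create database and drop database both worked normally. I then stopped the cluster with the shutdown command and powered off the machines.
  3. On the third day I started the cluster, and drop database hung again.
  4. Monitoring shows no activity on PD or TiKV, and the regions did not change at all.
    [image]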
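When a DDL statement hangs like this, it can help to look at the DDL job queue before trying to cancel anything. The following is a minimal sketch, not something the original poster ran: it assumes a DB-API style cursor (for example from a MySQL client library) connected to TiDB, and the helper name unfinished_ddl_jobs and the set of "finished" state strings are assumptions made for illustration. ADMIN SHOW DDL JOBS is the TiDB statement that lists recent DDL jobs.

```python
# Job states treated as "finished" here. This list is an assumption for
# illustration; check it against the states your TiDB version reports.
FINISHED_STATES = {"synced", "cancelled", "rollback done"}


def unfinished_ddl_jobs(cursor):
    """Run ADMIN SHOW DDL JOBS and return the rows that are not finished.

    `cursor` is any DB-API style cursor connected to a TiDB server.
    Rows are returned as dicts keyed by upper-cased column name, so the
    code does not depend on the column order of a particular version.
    """
    cursor.execute("ADMIN SHOW DDL JOBS")
    columns = [col[0].upper() for col in cursor.description]
    jobs = [dict(zip(columns, row)) for row in cursor.fetchall()]
    return [
        job for job in jobs
        if str(job.get("STATE", "")).lower() not in FINISHED_STATES
    ]
```

A job for the hung drop database that stays in the same state across several calls would suggest that the DDL owner is not making progress on it, which is the situation the owner checks later in this thread try to address.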

【Attachments】

Please provide the version information of each component, such as cdc/tikv, which can be obtained by running cdc version/tikv-server --version.
-------------------tidb log-----------------------------------------------------------------------------------------------------------
[2022/05/27 14:32:40.225 +08:00] [WARN] [expensivequery.go:179] [expensive_query] [cost_time=60.091859531s] [conn_id=7] [user=root] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="drop database tpcc"]
[2022/05/27 14:32:58.432 +08:00] [INFO] [gc_worker.go:300] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=6040cc8a9280004]
[2022/05/27 14:33:38.676 +08:00] [INFO] [gc_worker.go:701] ["[gc worker] start delete ranges"] [uuid=6040cc8a9280004] [ranges=0]
[2022/05/27 14:33:38.676 +08:00] [INFO] [gc_worker.go:750] ["[gc worker] finish delete ranges"] [uuid=6040cc8a9280004] ["num of ranges"=0] ["cost time"=261ns]
[2022/05/27 14:33:38.677 +08:00] [INFO] [gc_worker.go:773] ["[gc worker] start redo-delete ranges"] [uuid=6040cc8a9280004] ["num of ranges"=0]
[2022/05/27 14:33:38.677 +08:00] [INFO] [gc_worker.go:802] ["[gc worker] finish redo-delete ranges"] [uuid=6040cc8a9280004] ["num of ranges"=0] ["cost time"=246ns]
[2022/05/27 14:33:38.682 +08:00] [INFO] [gc_worker.go:1562] ["[gc worker] sent safe point to PD"] [uuid=6040cc8a9280004] ["safe point"=433489842907906048]


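Every time, I run the cluster shutdown, and drop database returns its result at the moment TiDB is stopped. After the cluster is started again, running drop deletes the database successfully.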

You can start by troubleshooting with this FAQ: [FAQ] DDL 卡住排查经验 (troubleshooting experience for stuck DDL)

Thanks, but it did not help much. I have three TiDB nodes: 192.168.206.11, .12, and .13.

  1. curl http://192.168.206.11:10080/info/all shows that 206.11 is the owner.
  2. curl -X POST http://192.168.206.11:10080/ddl/owner/resign: I ran this command against 11, 12, and 13, and to my surprise every one of them replied that it is not the owner.
    [tidb@tidb01 ~]$ curl -X POST http://192.168.206.12:10080/ddl/owner/resign
    This node is not a ddl owner, can't be resigned.
    [tidb@tidb01 ~]$ curl -X POST http://192.168.206.13:10080/ddl/owner/resign
    This node is not a ddl owner, can't be resigned.
    [tidb@tidb01 ~]$ curl -X POST http://192.168.206.11:10080/ddl/owner/resign
    This node is not a ddl owner, can't be resigned.
    [tidb@tidb01 ~]$
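Since /info/all names an owner but every node refuses to resign, one thing worth checking is whether all three nodes report the same owner. The sketch below is only an illustration: it assumes the /info/all response has the top-level fields owner_id and all_servers_info (keyed by DDL ID, each entry carrying ip and status_port), as described in the TiDB HTTP API documentation, and the helper names fetch_info_all and find_ddl_owner are made up for this example.

```python
import json
from urllib.request import urlopen


def fetch_info_all(host, status_port=10080, timeout=5):
    """GET /info/all from one TiDB status port and parse the JSON body."""
    url = f"http://{host}:{status_port}/info/all"
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def find_ddl_owner(info):
    """Return (owner_id, ip, status_port) from a parsed /info/all response.

    Assumes the response layout described in the lead-in. If the owner ID
    is not present in all_servers_info, ip and status_port are None.
    """
    owner_id = info.get("owner_id")
    server = info.get("all_servers_info", {}).get(owner_id)
    if server is None:
        return owner_id, None, None
    return owner_id, server.get("ip"), server.get("status_port")
```

Calling find_ddl_owner(fetch_info_all(host)) for each of 192.168.206.11, .12, and .13 and comparing the results would show whether the nodes agree on the owner, and whether the reported owner address is one of the nodes the resign request was actually sent to.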


You can try kicking out the owner so that a new one gets elected.

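OK, I'll try again. With the first method, every node said it is not the owner when I tried to kick it out, so I'll try the second method.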