DDL is stuck: a CREATE DATABASE sits in the queue and never completes

To help us respond faster, please provide the following information when asking a question; clearly described issues get priority.

  • [TiDB version]: v4.0.4
  • [Issue description]: Stuck on a DDL; there is currently only one job in the queue.

From the TiDB log, the database creation never completes:

[2020/09/21 11:38:54.505 +00:00] [INFO] [session.go:2130] ["CRUCIAL OPERATION"] [conn=719] [schemaVersion=258] [cur_db=] [sql="create database tt"] [user=root@10.204.11.89]
[2020/09/21 11:38:54.515 +00:00] [INFO] [ddl_worker.go:261] ["[ddl] add DDL jobs"] ["batch count"=1] [jobs="ID:226, Type:create schema, State:none, SchemaState:none, SchemaID:225, TableID:0, RowCount:0, ArgLen:1, start time: 2020-09-21 11:38:54.487 +0000 UTC, Err:<nil>, ErrCount:0, SnapshotVersion:0; "]
[2020/09/21 11:38:54.515 +00:00] [INFO] [ddl.go:477] ["[ddl] start DDL job"] [job="ID:226, Type:create schema, State:none, SchemaState:none, SchemaID:225, TableID:0, RowCount:0, ArgLen:1, start time: 2020-09-21 11:38:54.487 +0000 UTC, Err:<nil>, ErrCount:0, SnapshotVersion:0"] [query="create database tt"]
[2020/09/21 11:39:54.551 +00:00] [WARN] [expensivequery.go:168] [expensive_query] [cost_time=60.045631501s] [conn_id=719] [user=root] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="create database tt"]
[2020/09/21 11:40:01.016 +00:00] [INFO] [client_batch.go:633] ["recycle idle connection"] [target=basic1-tikv-1.basic1-tikv-peer.test-namespace1.svc:20160]

At the moment there is only one job in the system:

MySQL [(none)]> admin show ddl jobs
    -> ;
+--------+------------+------------+---------------+--------------+-----------+----------+-----------+---------------------+---------------------+-----------+
| JOB_ID | DB_NAME    | TABLE_NAME | JOB_TYPE      | SCHEMA_STATE | SCHEMA_ID | TABLE_ID | ROW_COUNT | START_TIME          | END_TIME            | STATE     |
+--------+------------+------------+---------------+--------------+-----------+----------+-----------+---------------------+---------------------+-----------+
|    226 | tt         |            | create schema | none         |       225 |        0 |         0 | 2020-09-21 11:38:54 | NULL                | none      |
|    224 | tt         |            | create schema | none         |       223 |        0 |         0 | 2020-09-21 11:24:15 | 2020-09-21 11:26:27 | cancelled |
|    222 | tpcc       |            | create schema | none         |       221 |        0 |         0 | 2020-09-21 11:21:43 | 2020-09-21 11:22:17 | cancelled |
|    220 | warehouses |            | create schema | none         |       219 |        0 |         0 | 2020-09-21 11:16:14 | 2020-09-21 11:21:13 | cancelled |
|    218 | tpcc       |            | create schema | none         |       217 |        0 |         0 | 2020-09-21 11:13:23 | 2020-09-21 11:21:11 | cancelled |
|    216 | tpcc       |            | create schema | none         |       215 |        0 |         0 | 2020-09-21 11:12:18 | 2020-09-21 11:20:49 | cancelled |
|    214 | tpcc       |            | create schema | none         |       213 |        0 |         0 | 2020-09-21 10:15:04 | 2020-09-21 11:20:41 | cancelled |
|    212 | tpcc       |            | create schema | none         |       211 |        0 |         0 | 2020-09-21 10:12:32 | 2020-09-21 11:20:34 | cancelled |
|    210 | tpcc       |            | create schema | none         |       209 |        0 |         0 | 2020-09-21 10:12:14 | 2020-09-21 11:20:12 | cancelled |
|    208 | sbtest     | sbtest9    | add index     | public       |       111 |      173 |  10000000 | 2020-09-21 05:57:57 | 2020-09-21 07:06:57 | synced    |
|    207 | sbtest     | sbtest6    | add index     | public       |       111 |      191 |  10000000 | 2020-09-21 05:57:57 | 2020-09-21 07:02:38 | synced    |
+--------+------------+------------+---------------+--------------+-----------+----------+-----------+---------------------+---------------------+-----------+
11 rows in set (0.02 sec)
MySQL [(none)]> admin show ddl \G;
*************************** 1. row ***************************
   SCHEMA_VER: 258
     OWNER_ID: 766dd74f-bf39-4677-ae50-834c1c03845c
OWNER_ADDRESS: basic1-tidb-1.basic1-tidb-peer.test-namespace1.svc:4000
 RUNNING_JOBS: ID:226, Type:create schema, State:none, SchemaState:none, SchemaID:225, TableID:0, RowCount:0, ArgLen:0, start time: 2020-09-21 11:38:54.487 +0000 UTC, Err:<nil>, ErrCount:0, SnapshotVersion:0
      SELF_ID: 766dd74f-bf39-4677-ae50-834c1c03845c
        QUERY: create database tt
1 row in set (0.00 sec)

Checking the DDL jobs, there is only this one create job, and I can't tell where it is stuck:

MySQL [(none)]> admin show ddl jobs 10000 WHERE STATE not in ("cancelled","synced") ;
+--------+---------+------------+---------------+--------------+-----------+----------+-----------+---------------------+----------+-------+
| JOB_ID | DB_NAME | TABLE_NAME | JOB_TYPE      | SCHEMA_STATE | SCHEMA_ID | TABLE_ID | ROW_COUNT | START_TIME          | END_TIME | STATE |
+--------+---------+------------+---------------+--------------+-----------+----------+-----------+---------------------+----------+-------+
|    226 | tt      |            | create schema | none         |       225 |        0 |         0 | 2020-09-21 11:38:54 | NULL     | none  |
+--------+---------+------------+---------------+--------------+-----------+----------+-----------+---------------------+----------+-------+
1 row in set (0.04 sec)

This problem was caused by taking all the Pumps offline with the pump offline command and setting the replica count to 0 so the operator would remove the Pump pods; once they were gone, binlog data could no longer be written.

Did you turn off enable-binlog on the TiDB servers afterwards?

  1. If you are sure you want to take Pump offline, set binlog to false; in tiup this is written as binlog.enable: false. After the change, reload tidb (see the YAML sketch after this list).

  2. What do you mean by setting the replica count to 0? What exact command did you run?
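
For reference, a minimal sketch of what step 1 could look like for a tiup-managed cluster; the cluster name mycluster and the exact topology layout are assumptions for illustration, not taken from this thread:

server_configs:
  tidb:
    # Stop tidb-server from writing binlog to Pump; edit this via:
    #   tiup cluster edit-config mycluster
    binlog.enable: false
# Then roll the change out to the tidb nodes only:
#   tiup cluster reload mycluster -R tidb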

Bringing Pump back fixed it.

:+1:

This cluster is managed by tidb-operator on Kubernetes. I took Pump offline, and after it was automatically shut down the problem appeared.

Later I will take Pump fully offline again and set replicas to 0 to see what happens.

OK. If you run into problems, feel free to report back.

I've tested several rounds: Pump cannot be taken offline; as soon as it goes down, everything gets stuck.

I added this section as well, but it still has no effect; the configuration on the TiDB side is not picked up.

Before removing the Pump nodes, you must first run `kubectl edit tc ${cluster_name} -n ${namespace}` and **set** `spec.tidb.binlogEnabled` to `false`; wait for the tidb pods to finish restarting and updating before removing the Pump nodes.

Removing the Pump nodes directly leaves TiDB with no Pump to write to, which makes it unusable.
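
For clarity, a minimal sketch of that edit; only the relevant field of the TidbCluster spec is shown and everything else is omitted:

spec:
  tidb:
    # With this set to false, the operator restarts the tidb pods with binlog
    # disabled; only after that is it safe to remove the Pump nodes.
    binlogEnabled: false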

Neither adding it under config nor setting spec.tidb.binlogEnabled solves it; putting it into config does not automatically restart tidb.

I also tried adding it under config, but tidb does not pick the configuration up.

One more question while we're at it: how do I change the tidb configuration and then restart it?

Is this related to the stuck DDL? If not, please open a new topic in the private cloud / public cloud category so the k8s team can take a look. Thanks.

It's all the same problem: nothing can be written.

After changing the tidb configuration, to restart it please follow the steps in this doc:

https://docs.pingcap.com/zh/tidb-in-kubernetes/stable/restart-a-tidb-cluster/
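
In short, that doc triggers a graceful rolling restart by changing an annotation value on the component's spec. A minimal sketch for the tidb component, assuming a tidb-operator version that supports the restartedAt annotation; the timestamp value itself is arbitrary, it is the change of the value that triggers the restart:

spec:
  tidb:
    annotations:
      # Any change to this value makes the operator perform a graceful rolling
      # restart of the tidb pods.
      tidb.pingcap.com/restartedAt: "2020-09-28T10:00"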

I configured it as above and found a problem:

  pd:
    baseImage: harbor.fcbox.com/tidb/pingcap/pd
    replicas: 3
    storageClassName: pd-storage
    configUpdateStrategy: RollingUpdate
    enableDashboardInternalProxy: true
    requests:
      storage: "50Gi"
    config: {}
    annotations:
      tidb.pingcap.com/restartedAt: "202009271800"

Is the plan for the pd pods to restart but it did not take effect, or is there some other problem?

1. tidb-cluster.yaml

apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: push-tidb
spec:
  version: v4.0.6
  timezone: Asia/Shanghai
  pvReclaimPolicy: Retain
  enableDynamicConfiguration: true
  discovery: {}
  pd:
    baseImage: harbor.fcbox.com/tidb/pingcap/pd
    replicas: 3
    storageClassName: pd-storage
    configUpdateStrategy: RollingUpdate
    enableDashboardInternalProxy: true
    requests:
      storage: "50Gi"
    config: {}
    annotations:
      tidb.pingcap.com/restartedAt: "202009271800"
  tikv:
    baseImage: harbor.fcbox.com/tidb/pingcap/tikv
    replicas: 3
    storageClassName: kv-storage
    requests:
      storage: "50Gi"
      cpu: 8
      memory: "45GB"
    limit:
      cpu: 8
      memory: "45GB"
    config:
      storage:
        block-cache:
          capacity: "32GB"
      readpool:
        storage:
          high-concurrency: 8
          normal-concurrency: 8
          low-concurrency: 8
  pump:
    baseImage: harbor.fcbox.com/tidb/pingcap/tidb-binlog
    version: v4.0.6
    replicas: 3
    storageClassName: pump-storage
    requests:
      storage: 10Gi
    schedulerName: default-scheduler
    config:
      addr: 0.0.0.0:8250
      gc: 7
      heartbeat-interval: 2
  tidb:
    baseImage: harbor.fcbox.com/tidb/pingcap/tidb
    replicas: 3
    slowLogTailer:
      image: harbor.fcbox.com/tidb/busybox:1.26.2
    storageClassName: tidb-storage
    binlogEnabled: false
    annotations:
      tidb.pingcap.com/restartedAt: "202009271800"
    requests:
      storage: "1Gi"
    service:
      type: ClusterIP
    config:
      binlog:
        enable: true

2. I need the pd restart to take effect; I modified it according to the doc, but it did not take effect.

Following the configuration from the official docs, it errors out right away... a syntax error again.

To confirm: which version of tidb-operator are you currently using?

1.14

Right now I'm stuck on the problem of getting the rolling update to happen after modifying an existing cluster.

OK, understood.

Also, regarding disabling the tidb server's binlog parameter mentioned above: has it been disabled successfully now, and can you restart a group of pods, such as pd, via restartedAt?