TiDB 4.0-RC: CREATE DATABASE statement never returns!

Today I found that on TiDB 4.0-RC a create-database statement (create database XXXX;) never returns.

Hello,

  1. If the cluster was deployed with tiup, please run display to check the cluster status.
  2. Please upload tikv.log, tidb.log, and pd.log.

Which log level do you need?

TiKV - logs.tar (3.8 MB)

Could you run admin show ddl jobs and share the result?

Should I post a screenshot?

3218 ubehaviordb create table none 339 3217 0 2020-04-22 11:14:14 none
3220 ubehaviordb create table none 339 3219 0 2020-04-23 11:36:05 none
3222 ubehaviordb create table none 339 3221 0 2020-04-24 10:58:23 none
3224 ubehaviordb create table none 339 3223 0 2020-04-25 10:36:08 none
3226 ubehaviordb create table none 339 3225 0 2020-04-26 09:57:15 none
3228 ubehaviordb create table none 339 3227 0 2020-04-26 09:58:29 none
3230 ubehaviordb create table none 339 3229 0 2020-04-26 10:01:13 none
3232 ubehaviordb create table none 339 3231 0 2020-04-26 10:03:33 none
3234 ubehaviordb create table none 339 3233 0 2020-04-26 10:05:52 none
3236 ubehaviordb create table none 339 3235 0 2020-04-26 10:06:43 none
3237 ubehaviordb user_behavior_acc_20200327 drop table none 339 470 0 2020-04-26 10:10:25 none
3238 ubehaviordb user_behavior_acc_20200327 drop table none 339 470 0 2020-04-26 10:11:21 none
3239 ubehaviordb user_behavior_acc_20200327 drop table none 339 470 0 2020-04-26 10:12:23 none
3240 ubehaviordb user_behavior_acc_20200327 drop table none 339 470 0 2020-04-26 10:12:49 none
3241 ubehaviordb user_behavior_acc_20200321 drop table none 339 483 0 2020-04-26 10:13:11 none
3242 ubehaviordb user_behavior_acc_20200321 drop table none 339 483 0 2020-04-26 10:20:52 none
3243 ubehaviordb user_behavior_acc_20200327 drop table none 339 470 0 2020-04-26 10:56:59 none
3244 ubehaviordb user_behavior_acc_20200327 drop table none 339 470 0 2020-04-26 10:59:45 none
3245 ubehaviordb user_behavior_acc_20200321 drop table none 339 483 0 2020-04-26 11:15:03 none
3246 ubehaviordb user_behavior_acc_20200321 drop table none 339 483 0 2020-04-26 11:23:31 none
3248 ubehaviordb user_behavior_acc_20200321 truncate table none 339 483 0 2020-04-26 11:25:24 none
3250 ubehaviordb create table none 339 3249 0 2020-04-26 11:27:20 none
3252 ubehaviordb create table none 339 3251 0 2020-04-26 11:29:17 none
3254 cdeldca create schema none 3253 0 0 2020-04-26 12:08:09 none
3256 cdeldca create schema none 3255 0 0 2020-04-26 12:15:21 none
3258 cdeldca create schema none 3257 0 0 2020-04-26 12:17:15 none
3260 cdeldca create schema none 3259 0 0 2020-04-26 12:31:35 none
3216 ubehaviordb user_behavior_acc_20200420 create table public 339 3215 0 2020-04-21 10:38:47 2020-04-21 10:38:47 synced
3214 cdel_piwik news_content_data_mobile_week_2015 create table public 2541 3213 0 2020-04-20 15:22:25 2020-04-20 15:22:25 synced
3212 cdel_piwik dailyreport_base create table public 2541 3211 0 2020-04-20 15:22:24 2020-04-20 15:22:25 synced
3210 cdel_piwik transformation_url_info create table public 2541 3209 0 2020-04-20 15:22:24 2020-04-20 15:22:24 synced
3208 cdel_piwik news_content_data_week_zikao365_2018 create table public 2541 3207 0 2020-04-20 15:22:24 2020-04-20 15:22:24 synced
3206 cdel_piwik indicator_orderpay create table public 2541 3205 0 2020-04-20 15:22:24 2020-04-20 15:22:24 synced
3204 cdel_piwik news_content_data_month_zikao365_2020 create table public 2541 3203 0 2020-04-20 15:22:24 2020-04-20 15:22:24 synced
3202 cdel_piwik news_content_data_week_chinatat_2015 create table public 2541 3201 0 2020-04-20 15:22:24 2020-04-20 15:22:24 synced
3200 cdel_piwik news_content_data_month_for68_2020 create table public 2541 3199 0 2020-04-20 15:22:24 2020-04-20 15:22:24 synced
3198 cdel_piwik news_content_data_week_law_2015 create table public 2541 3197 0 2020-04-20 15:22:24 2020-04-20 15:22:24 synced

It looks like DDL is stuck. When you hit this kind of problem, first check the current DDL state:

admin show ddl jobs

In the output, not every job is in the synced state.
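For a long job list it can be easier to filter than to read by eye. Below is a minimal Python sketch that assumes the whitespace-flattened format of the output pasted in this thread, where the first field is the job ID and the last field is the job state. The function name `unfinished_ddl_jobs` is made up for illustration and is not part of any TiDB tooling.

```python
from typing import List, Tuple


def unfinished_ddl_jobs(pasted_output: str) -> List[Tuple[int, str]]:
    """Return (job_id, state) for every row whose last field is not 'synced'."""
    unfinished = []
    for line in pasted_output.splitlines():
        fields = line.split()
        # Skip blank lines and anything that does not start with a numeric job ID.
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        state = fields[-1]
        if state != "synced":
            unfinished.append((int(fields[0]), state))
    return unfinished


# Three rows copied from the output posted above.
sample = """\
3258 cdeldca create schema none 3257 0 0 2020-04-26 12:17:15 none
3260 cdeldca create schema none 3259 0 0 2020-04-26 12:31:35 none
3216 ubehaviordb user_behavior_acc_20200420 create table public 339 3215 0 2020-04-21 10:38:47 2020-04-21 10:38:47 synced
"""

print(unfinished_ddl_jobs(sample))
```

Jobs that stay in a state other than synced for a long time, like the create schema jobs here, are the ones waiting in the queue.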

Then check which machine the DDL owner is on.

curl http://{TiDBIP}:10080/info

curl http://{TiDBIP}:10080/info/all

https://github.com/pingcap/tidb/blob/master/docs/tidb_http_api.md

I think there is also a SQL statement that can do this, but I don't remember it.

Then you can capture a goroutine dump to see what the owner is doing.

curl 'http://{TiDBIP}:10080/debug/pprof/goroutine?debug=2'
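A full goroutine dump is long, so it helps to keep only the goroutines that touch DDL code. The sketch below assumes the dump was saved as text from the debug=2 endpoint, where each goroutine starts with a `goroutine N [state]:` header and blocks are separated by blank lines. The helper name `goroutines_mentioning` and the two-goroutine sample dump are invented for illustration.

```python
import re
from typing import List


def goroutines_mentioning(dump: str, needle: str) -> List[str]:
    """Split a debug=2 goroutine dump into blocks and keep those containing needle."""
    # Blocks are separated by one or more blank lines.
    blocks = re.split(r"\n\s*\n", dump.strip())
    return [b for b in blocks if b.startswith("goroutine ") and needle in b]


# Made-up dump for illustration only; a real one comes from the curl command above.
sample_dump = (
    "goroutine 1 [chan receive, 5 minutes]:\n"
    "main.main()\n"
    "\t/app/main.go:10 +0x20\n"
    "\n"
    "goroutine 42 [select]:\n"
    "github.com/pingcap/tidb/ddl.(*worker).start(0xc000123456)\n"
    "\t/root/go/src/github.com/pingcap/tidb/ddl/ddl_worker.go:150 +0x1a0\n"
)

for block in goroutines_mentioning(sample_dump, "tidb/ddl."):
    print(block)
    print("-" * 40)
```

Run it against the dump taken from the DDL owner, since that is the node that processes the job queue.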

Please collect that information @zyw8136

I can't cancel any of these jobs now. (admin cancel ddl jobs)

That's because DDL is blocked. Cancel operations have to queue as well, so we need to find out why it is blocked.

Could you collect the information listed above?

info: { "is_owner": false, "version": "5.7.25-TiDB-v4.0.0-rc-1-g56bb75b", "git_hash": "56bb75b4fc86c2edd524edfb9a2b3723bdf7a735", "ddl_id": "14b155e8-b391-4e3f-b52b-be666c822a7b", "ip": "10.42.50.159", "listening_port": 4000, "status_port": 10080, "lease": "45s", "binlog_status": "Off", "start_timestamp": 1587366542 }

info/all

{ "servers_num": 2, "owner_id": "16231dab-7f52-4b48-88a0-fc80d9c239d5", "is_all_server_version_consistent": true, "all_servers_info": { "14b155e8-b391-4e3f-b52b-be666c822a7b": { "version": "5.7.25-TiDB-v4.0.0-rc-1-g56bb75b", "git_hash": "56bb75b4fc86c2edd524edfb9a2b3723bdf7a735", "ddl_id": "14b155e8-b391-4e3f-b52b-be666c822a7b", "ip": "10.42.50.159", "listening_port": 4000, "status_port": 10080, "lease": "45s", "binlog_status": "Off", "start_timestamp": 1587366542 }, "16231dab-7f52-4b48-88a0-fc80d9c239d5": { "version": "5.7.25-TiDB-v4.0.0-rc-1-g56bb75b", "git_hash": "56bb75b4fc86c2edd524edfb9a2b3723bdf7a735", "ddl_id": "16231dab-7f52-4b48-88a0-fc80d9c239d5", "ip": "10.42.50.160", "listening_port": 4000, "status_port": 10080, "lease": "45s", "binlog_status": "Off", "start_timestamp": 1587104965 } } }
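The owner is the entry in all_servers_info whose key equals owner_id. Here that is 10.42.50.160, while the /info output above came from 10.42.50.159 (is_owner is false). A small Python sketch of that lookup follows; the function name `ddl_owner_address` is hypothetical, and the sample is the response above trimmed to the fields the function reads.

```python
import json
from typing import Tuple


def ddl_owner_address(info_all_json: str) -> Tuple[str, int]:
    """Return (ip, status_port) of the server whose ddl_id equals owner_id."""
    info = json.loads(info_all_json)
    owner = info["all_servers_info"][info["owner_id"]]
    return owner["ip"], owner["status_port"]


# The /info/all response from this thread, trimmed to the relevant fields.
sample = json.dumps({
    "servers_num": 2,
    "owner_id": "16231dab-7f52-4b48-88a0-fc80d9c239d5",
    "all_servers_info": {
        "14b155e8-b391-4e3f-b52b-be666c822a7b": {
            "ip": "10.42.50.159",
            "status_port": 10080,
        },
        "16231dab-7f52-4b48-88a0-fc80d9c239d5": {
            "ip": "10.42.50.160",
            "status_port": 10080,
        },
    },
})

ip, port = ddl_owner_address(sample)
print(f"DDL owner status address: {ip}:{port}")
```

The goroutine dump and the TiDB log should then be taken from that address rather than from whichever node happened to be queried first.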

tidb (463.1 KB)

Please also grab the log from the TiDB DDL owner, thanks.

info:
{
  "is_owner": true,
  "version": "5.7.25-TiDB-v4.0.0-rc-1-g56bb75b",
  "git_hash": "56bb75b4fc86c2edd524edfb9a2b3723bdf7a735",
  "ddl_id": "16231dab-7f52-4b48-88a0-fc80d9c239d5",
  "ip": "10.42.50.160",
  "listening_port": 4000,
  "status_port": 10080,
  "lease": "45s",
  "binlog_status": "Off",
  "start_timestamp": 1587104965
}

info/all

{
  "servers_num": 2,
  "owner_id": "16231dab-7f52-4b48-88a0-fc80d9c239d5",
  "is_all_server_version_consistent": true,
  "all_servers_info": {
    "14b155e8-b391-4e3f-b52b-be666c822a7b": {
      "version": "5.7.25-TiDB-v4.0.0-rc-1-g56bb75b",
      "git_hash": "56bb75b4fc86c2edd524edfb9a2b3723bdf7a735",
      "ddl_id": "14b155e8-b391-4e3f-b52b-be666c822a7b",
      "ip": "10.42.50.159",
      "listening_port": 4000,
      "status_port": 10080,
      "lease": "45s",
      "binlog_status": "Off",
      "start_timestamp": 1587366542
    },
    "16231dab-7f52-4b48-88a0-fc80d9c239d5": {
      "version": "5.7.25-TiDB-v4.0.0-rc-1-g56bb75b",
      "git_hash": "56bb75b4fc86c2edd524edfb9a2b3723bdf7a735",
      "ddl_id": "16231dab-7f52-4b48-88a0-fc80d9c239d5",
      "ip": "10.42.50.160",
      "listening_port": 4000,
      "status_port": 10080,
      "lease": "45s",
      "binlog_status": "Off",
      "start_timestamp": 1587104965
    }
  }
}

tidb160 (293.5 KB)

TiDB - logs.tar (2.2 MB)

github.com/pingcap/tidb/util.WithRecovery.func1
		/root/go/src/github.com/pingcap/tidb/util/misc.go:90
runtime.gopanic
		/usr/local/go/src/runtime/panic.go:679
runtime.panicmem
		/usr/local/go/src/runtime/panic.go:199
runtime.sigpanic
		/usr/local/go/src/runtime/signal_unix.go:394
github.com/pingcap/tidb/store/tikv.(*rpcClient).recycleDieConnArray
		/root/go/src/github.com/pingcap/tidb/store/tikv/client_batch.go:658
github.com/pingcap/tidb/store/tikv.(*rpcClient).recycleDieConnArray
		/root/go/src/github.com/pingcap/tidb/store/tikv/client_batch.go:658
github.com/pingcap/tidb/store/tikv.(*rpcClient).SendRequest
		/root/go/src/github.com/pingcap/tidb/store/tikv/client.go:319
github.com/pingcap/tidb/store/tikv.(*RegionRequestSender).sendReqToRegion
		/root/go/src/github.com/pingcap/tidb/store/tikv/region_request.go:199
github.com/pingcap/tidb/store/tikv.(*RegionRequestSender).SendReqCtx
		/root/go/src/github.com/pingcap/tidb/store/tikv/region_request.go:162
github.com/pingcap/tidb/store/tikv.(*clientHelper).SendReqCtx
		/root/go/src/github.com/pingcap/tidb/store/tikv/coprocessor.go:814
github.com/pingcap/tidb/store/tikv.(*tikvSnapshot).get
		/root/go/src/github.com/pingcap/tidb/store/tikv/snapshot.go:347
github.com/pingcap/tidb/store/tikv.(*tikvSnapshot).Get
		/root/go/src/github.com/pingcap/tidb/store/tikv/snapshot.go:298
github.com/pingcap/tidb/kv.(*unionStore).Get
		/root/go/src/github.com/pingcap/tidb/kv/union_store.go:201
github.com/pingcap/tidb/store/tikv.(*tikvTxn).Get
		/root/go/src/github.com/pingcap/tidb/store/tikv/txn.go:122
github.com/pingcap/tidb/structure.(*TxStructure).loadListMeta
		/root/go/src/github.com/pingcap/tidb/structure/list.go:217
github.com/pingcap/tidb/structure.(*TxStructure).LIndex
		/root/go/src/github.com/pingcap/tidb/structure/list.go:163
github.com/pingcap/tidb/meta.(*Meta).getDDLJob
		/root/go/src/github.com/pingcap/tidb/meta/meta.go:603
github.com/pingcap/tidb/meta.(*Meta).GetDDLJobByIdx
		/root/go/src/github.com/pingcap/tidb/meta/meta.go:632
github.com/pingcap/tidb/ddl.(*worker).getFirstDDLJob
		/root/go/src/github.com/pingcap/tidb/ddl/ddl_worker.go:272
github.com/pingcap/tidb/ddl.(*worker).handleDDLJobQueue.func1
		/root/go/src/github.com/pingcap/tidb/ddl/ddl_worker.go:435
github.com/pingcap/tidb/kv.RunInNewTxn
		/root/go/src/github.com/pingcap/tidb/kv/txn.go:47
github.com/pingcap/tidb/ddl.(*worker).handleDDLJobQueue
		/root/go/src/github.com/pingcap/tidb/ddl/ddl_worker.go:426
github.com/pingcap/tidb/ddl.(*worker).start
		/root/go/src/github.com/pingcap/tidb/ddl/ddl_worker.go:150
github.com/pingcap/tidb/ddl.(*ddl).start.func2
		/root/go/src/github.com/pingcap/tidb/ddl/ddl.go:353
github.com/pingcap/tidb/util.WithRecovery
		/root/go/src/github.com/pingcap/tidb/util/misc.go:93
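To read a trace like this quickly, look for the first non-runtime frame below the runtime.* panic frames: that is where the nil-pointer access happened. The Python sketch below does this mechanically, assuming the two-lines-per-frame layout used in the log above (function name, then file:line). `parse_frames` and `panic_site` are illustrative helper names, not existing tools.

```python
from typing import List, Optional, Tuple

Frame = Tuple[str, str]  # (function, "file:line")


def parse_frames(stack: str) -> List[Frame]:
    """Pair each function line with the source location line that follows it."""
    # Dropping empty lines also tolerates a location that wrapped onto its own line.
    lines = [ln.strip() for ln in stack.splitlines() if ln.strip()]
    return list(zip(lines[0::2], lines[1::2]))


def panic_site(frames: List[Frame]) -> Optional[Frame]:
    """Return the first non-runtime frame that follows the runtime.* frames."""
    seen_runtime = False
    for func, location in frames:
        if func.startswith("runtime."):
            seen_runtime = True
        elif seen_runtime:
            return func, location
    return None


# The top of the stack trace posted above.
sample_stack = """
github.com/pingcap/tidb/util.WithRecovery.func1
        /root/go/src/github.com/pingcap/tidb/util/misc.go:90
runtime.gopanic
        /usr/local/go/src/runtime/panic.go:679
runtime.panicmem
        /usr/local/go/src/runtime/panic.go:199
runtime.sigpanic
        /usr/local/go/src/runtime/signal_unix.go:394
github.com/pingcap/tidb/store/tikv.(*rpcClient).recycleDieConnArray
        /root/go/src/github.com/pingcap/tidb/store/tikv/client_batch.go:658
"""

print(panic_site(parse_frames(sample_stack)))
```

For this trace the result points at recycleDieConnArray in client_batch.go:658, reached from the DDL worker's handleDDLJobQueue further down the stack.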

This is most likely caused by the following bug:

https://github.com/pingcap/tidb/pull/16299

The DDL worker panicked while executing a request and then hung.

The latest released version (4.0.0-rc.1) does not include this fix yet.

OK, I restarted TiDB and it is working again.

As a temporary fix, you could try building a binary from the latest 4.0 branch.

Or try changing the configuration. Take

# Max batch size in gRPC.
max-batch-size = 128

and set it to 0. This will cost some performance, but it may get things working again.
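If the same change has to be applied to several TiDB config files, it can be scripted. This is a minimal Python sketch that rewrites the max-batch-size line in a config file's text. The helper name `set_max_batch_size` is made up, and the `[tikv-client]` section header in the sample reflects my understanding of where this option lives, so please check it against your own tidb.toml.

```python
import re


def set_max_batch_size(config_text: str, value: int) -> str:
    """Return config_text with the max-batch-size setting changed to value."""
    # Match only an active (uncommented) setting at the start of a line.
    pattern = re.compile(r"^([ \t]*max-batch-size[ \t]*=[ \t]*)\d+", re.MULTILINE)
    new_text, count = pattern.subn(lambda m: m.group(1) + str(value), config_text)
    if count == 0:
        raise ValueError("no max-batch-size setting found")
    return new_text


# Sample config text; the section header is an assumption, see the note above.
sample_config = """\
[tikv-client]
# Max batch size in gRPC.
max-batch-size = 128
"""

print(set_max_batch_size(sample_config, 0))
```

TiDB reads this file at startup, so the change needs a restart of the TiDB instances to take effect.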

I suspect you will hit the problem again, so I'd suggest changing the configuration first.

OK, thank you!

:handshake: