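Today I found that on TiDB 4.0-RC a create-database statement (create database XXXX;) never returns.
Hello,
- If the cluster was deployed with tiup, please run tiup cluster display to check the cluster status.
- Please upload tikv.log, tidb.log, and pd.log.
What log level do you need?
Could you run admin show ddl jobs and share the result?
Should I post a screenshot?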
JOB_ID | DB_NAME | TABLE_NAME | JOB_TYPE | SCHEMA_STATE | SCHEMA_ID | TABLE_ID | ROW_COUNT | START_TIME | END_TIME | STATE
---|---|---|---|---|---|---|---|---|---|---
3218 | ubehaviordb |  | create table | none | 339 | 3217 | 0 | 2020-04-22 11:14:14 |  | none
3220 | ubehaviordb |  | create table | none | 339 | 3219 | 0 | 2020-04-23 11:36:05 |  | none
3222 | ubehaviordb |  | create table | none | 339 | 3221 | 0 | 2020-04-24 10:58:23 |  | none
3224 | ubehaviordb |  | create table | none | 339 | 3223 | 0 | 2020-04-25 10:36:08 |  | none
3226 | ubehaviordb |  | create table | none | 339 | 3225 | 0 | 2020-04-26 09:57:15 |  | none
3228 | ubehaviordb |  | create table | none | 339 | 3227 | 0 | 2020-04-26 09:58:29 |  | none
3230 | ubehaviordb |  | create table | none | 339 | 3229 | 0 | 2020-04-26 10:01:13 |  | none
3232 | ubehaviordb |  | create table | none | 339 | 3231 | 0 | 2020-04-26 10:03:33 |  | none
3234 | ubehaviordb |  | create table | none | 339 | 3233 | 0 | 2020-04-26 10:05:52 |  | none
3236 | ubehaviordb |  | create table | none | 339 | 3235 | 0 | 2020-04-26 10:06:43 |  | none
3237 | ubehaviordb | user_behavior_acc_20200327 | drop table | none | 339 | 470 | 0 | 2020-04-26 10:10:25 |  | none
3238 | ubehaviordb | user_behavior_acc_20200327 | drop table | none | 339 | 470 | 0 | 2020-04-26 10:11:21 |  | none
3239 | ubehaviordb | user_behavior_acc_20200327 | drop table | none | 339 | 470 | 0 | 2020-04-26 10:12:23 |  | none
3240 | ubehaviordb | user_behavior_acc_20200327 | drop table | none | 339 | 470 | 0 | 2020-04-26 10:12:49 |  | none
3241 | ubehaviordb | user_behavior_acc_20200321 | drop table | none | 339 | 483 | 0 | 2020-04-26 10:13:11 |  | none
3242 | ubehaviordb | user_behavior_acc_20200321 | drop table | none | 339 | 483 | 0 | 2020-04-26 10:20:52 |  | none
3243 | ubehaviordb | user_behavior_acc_20200327 | drop table | none | 339 | 470 | 0 | 2020-04-26 10:56:59 |  | none
3244 | ubehaviordb | user_behavior_acc_20200327 | drop table | none | 339 | 470 | 0 | 2020-04-26 10:59:45 |  | none
3245 | ubehaviordb | user_behavior_acc_20200321 | drop table | none | 339 | 483 | 0 | 2020-04-26 11:15:03 |  | none
3246 | ubehaviordb | user_behavior_acc_20200321 | drop table | none | 339 | 483 | 0 | 2020-04-26 11:23:31 |  | none
3248 | ubehaviordb | user_behavior_acc_20200321 | truncate table | none | 339 | 483 | 0 | 2020-04-26 11:25:24 |  | none
3250 | ubehaviordb |  | create table | none | 339 | 3249 | 0 | 2020-04-26 11:27:20 |  | none
3252 | ubehaviordb |  | create table | none | 339 | 3251 | 0 | 2020-04-26 11:29:17 |  | none
3254 | cdeldca |  | create schema | none | 3253 | 0 | 0 | 2020-04-26 12:08:09 |  | none
3256 | cdeldca |  | create schema | none | 3255 | 0 | 0 | 2020-04-26 12:15:21 |  | none
3258 | cdeldca |  | create schema | none | 3257 | 0 | 0 | 2020-04-26 12:17:15 |  | none
3260 | cdeldca |  | create schema | none | 3259 | 0 | 0 | 2020-04-26 12:31:35 |  | none
3216 | ubehaviordb | user_behavior_acc_20200420 | create table | public | 339 | 3215 | 0 | 2020-04-21 10:38:47 | 2020-04-21 10:38:47 | synced
3214 | cdel_piwik | news_content_data_mobile_week_2015 | create table | public | 2541 | 3213 | 0 | 2020-04-20 15:22:25 | 2020-04-20 15:22:25 | synced
3212 | cdel_piwik | dailyreport_base | create table | public | 2541 | 3211 | 0 | 2020-04-20 15:22:24 | 2020-04-20 15:22:25 | synced
3210 | cdel_piwik | transformation_url_info | create table | public | 2541 | 3209 | 0 | 2020-04-20 15:22:24 | 2020-04-20 15:22:24 | synced
3208 | cdel_piwik | news_content_data_week_zikao365_2018 | create table | public | 2541 | 3207 | 0 | 2020-04-20 15:22:24 | 2020-04-20 15:22:24 | synced
3206 | cdel_piwik | indicator_orderpay | create table | public | 2541 | 3205 | 0 | 2020-04-20 15:22:24 | 2020-04-20 15:22:24 | synced
3204 | cdel_piwik | news_content_data_month_zikao365_2020 | create table | public | 2541 | 3203 | 0 | 2020-04-20 15:22:24 | 2020-04-20 15:22:24 | synced
3202 | cdel_piwik | news_content_data_week_chinatat_2015 | create table | public | 2541 | 3201 | 0 | 2020-04-20 15:22:24 | 2020-04-20 15:22:24 | synced
3200 | cdel_piwik | news_content_data_month_for68_2020 | create table | public | 2541 | 3199 | 0 | 2020-04-20 15:22:24 | 2020-04-20 15:22:24 | synced
3198 | cdel_piwik | news_content_data_week_law_2015 | create table | public | 2541 | 3197 | 0 | 2020-04-20 15:22:24 | 2020-04-20 15:22:24 | synced
It looks like DDL is stuck. When you hit this kind of problem, first check the current DDL state:
admin show ddl jobs
In the output, not every job is in the synced state.
Next, find out which machine the DDL owner is on.
curl http://{TiDBIP}:10080/info
curl http://{TiDBIP}:10080/info/all
https://github.com/pingcap/tidb/blob/master/docs/tidb_http_api.md
I think there is also a SQL statement for this, but I don't remember it.
Then you can grab a goroutine dump to see what the owner is doing.
curl 'http://{TiDBIP}:10080/debug/pprof/goroutine?debug=2'
Please collect this information, @zyw8136.
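As a rough illustration of the first check, the following Python sketch filters the rows of admin show ddl jobs down to the jobs that are not yet synced. The column order is taken from the output posted in this thread; the helper name unfinished_jobs and the sample rows are illustrative only, and treating every state other than synced as unfinished is a simplification.

```python
from typing import Dict, List, Tuple

# Column order as it appears in the `admin show ddl jobs` output in this thread.
COLUMNS = [
    "JOB_ID", "DB_NAME", "TABLE_NAME", "JOB_TYPE", "SCHEMA_STATE",
    "SCHEMA_ID", "TABLE_ID", "ROW_COUNT", "START_TIME", "END_TIME", "STATE",
]

# A few rows copied from the thread (hypothetical subset, as plain strings).
SAMPLE_ROWS: List[Tuple[str, ...]] = [
    ("3254", "cdeldca", "", "create schema", "none", "3253", "0", "0",
     "2020-04-26 12:08:09", "", "none"),
    ("3218", "ubehaviordb", "", "create table", "none", "339", "3217", "0",
     "2020-04-22 11:14:14", "", "none"),
    ("3216", "ubehaviordb", "user_behavior_acc_20200420", "create table",
     "public", "339", "3215", "0", "2020-04-21 10:38:47",
     "2020-04-21 10:38:47", "synced"),
]


def unfinished_jobs(rows: List[Tuple[str, ...]]) -> List[Dict[str, str]]:
    """Return jobs whose STATE is not 'synced', lowest JOB_ID first."""
    jobs = [dict(zip(COLUMNS, row)) for row in rows]
    pending = [job for job in jobs if job["STATE"] != "synced"]
    return sorted(pending, key=lambda job: int(job["JOB_ID"]))


if __name__ == "__main__":
    for job in unfinished_jobs(SAMPLE_ROWS):
        print(job["JOB_ID"], job["JOB_TYPE"], job["START_TIME"])
```

If this list keeps growing and the lowest job ID in it never changes, that is consistent with a blocked queue rather than a slow one.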
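I can't cancel any of these jobs right now (admin cancel ddl jobs).
That's because the queue is blocked. Cancel operations also have to wait in the queue, so we need to find out why it is blocked.
Could you collect the information listed above?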
info: { "is_owner": false, "version": "5.7.25-TiDB-v4.0.0-rc-1-g56bb75b", "git_hash": "56bb75b4fc86c2edd524edfb9a2b3723bdf7a735", "ddl_id": "14b155e8-b391-4e3f-b52b-be666c822a7b", "ip": "10.42.50.159", "listening_port": 4000, "status_port": 10080, "lease": "45s", "binlog_status": "Off", "start_timestamp": 1587366542 }
info/all
{ "servers_num": 2, "owner_id": "16231dab-7f52-4b48-88a0-fc80d9c239d5", "is_all_server_version_consistent": true, "all_servers_info": { "14b155e8-b391-4e3f-b52b-be666c822a7b": { "version": "5.7.25-TiDB-v4.0.0-rc-1-g56bb75b", "git_hash": "56bb75b4fc86c2edd524edfb9a2b3723bdf7a735", "ddl_id": "14b155e8-b391-4e3f-b52b-be666c822a7b", "ip": "10.42.50.159", "listening_port": 4000, "status_port": 10080, "lease": "45s", "binlog_status": "Off", "start_timestamp": 1587366542 }, "16231dab-7f52-4b48-88a0-fc80d9c239d5": { "version": "5.7.25-TiDB-v4.0.0-rc-1-g56bb75b", "git_hash": "56bb75b4fc86c2edd524edfb9a2b3723bdf7a735", "ddl_id": "16231dab-7f52-4b48-88a0-fc80d9c239d5", "ip": "10.42.50.160", "listening_port": 4000, "status_port": 10080, "lease": "45s", "binlog_status": "Off", "start_timestamp": 1587104965 } } }
tidb (463.1 KB)
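For reference, the owner can be read straight out of the /info/all response: owner_id is a key into all_servers_info. The Python sketch below assumes only the fields visible in the response above; the function name find_ddl_owner and the trimmed sample string are illustrative.

```python
import json
from typing import Tuple

# Trimmed copy of the /info/all response posted in this thread.
SAMPLE_INFO_ALL = """
{
  "servers_num": 2,
  "owner_id": "16231dab-7f52-4b48-88a0-fc80d9c239d5",
  "all_servers_info": {
    "14b155e8-b391-4e3f-b52b-be666c822a7b": {
      "ddl_id": "14b155e8-b391-4e3f-b52b-be666c822a7b",
      "ip": "10.42.50.159",
      "status_port": 10080
    },
    "16231dab-7f52-4b48-88a0-fc80d9c239d5": {
      "ddl_id": "16231dab-7f52-4b48-88a0-fc80d9c239d5",
      "ip": "10.42.50.160",
      "status_port": 10080
    }
  }
}
"""


def find_ddl_owner(info_all_json: str) -> Tuple[str, int]:
    """Return (ip, status_port) of the server whose ddl_id equals owner_id."""
    info = json.loads(info_all_json)
    owner = info["all_servers_info"][info["owner_id"]]
    return owner["ip"], owner["status_port"]


if __name__ == "__main__":
    ip, port = find_ddl_owner(SAMPLE_INFO_ALL)
    # The goroutine dump and the log should be taken from this instance.
    print(f"DDL owner: {ip}:{port}")
```

With the response from this cluster, the owner is the instance at 10.42.50.160, not the 10.42.50.159 node that answered the first /info request.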
Could you also grab the log from the TiDB DDL owner? Thanks.
info:
{
"is_owner": true,
"version": "5.7.25-TiDB-v4.0.0-rc-1-g56bb75b",
"git_hash": "56bb75b4fc86c2edd524edfb9a2b3723bdf7a735",
"ddl_id": "16231dab-7f52-4b48-88a0-fc80d9c239d5",
"ip": "10.42.50.160",
"listening_port": 4000,
"status_port": 10080,
"lease": "45s",
"binlog_status": "Off",
"start_timestamp": 1587104965
}
info/all
{
"servers_num": 2,
"owner_id": "16231dab-7f52-4b48-88a0-fc80d9c239d5",
"is_all_server_version_consistent": true,
"all_servers_info": {
"14b155e8-b391-4e3f-b52b-be666c822a7b": {
"version": "5.7.25-TiDB-v4.0.0-rc-1-g56bb75b",
"git_hash": "56bb75b4fc86c2edd524edfb9a2b3723bdf7a735",
"ddl_id": "14b155e8-b391-4e3f-b52b-be666c822a7b",
"ip": "10.42.50.159",
"listening_port": 4000,
"status_port": 10080,
"lease": "45s",
"binlog_status": "Off",
"start_timestamp": 1587366542
},
"16231dab-7f52-4b48-88a0-fc80d9c239d5": {
"version": "5.7.25-TiDB-v4.0.0-rc-1-g56bb75b",
"git_hash": "56bb75b4fc86c2edd524edfb9a2b3723bdf7a735",
"ddl_id": "16231dab-7f52-4b48-88a0-fc80d9c239d5",
"ip": "10.42.50.160",
"listening_port": 4000,
"status_port": 10080,
"lease": "45s",
"binlog_status": "Off",
"start_timestamp": 1587104965
}
}
}
tidb160 (293.5 KB)
github.com/pingcap/tidb/util.WithRecovery.func1
/root/go/src/github.com/pingcap/tidb/util/misc.go:90
runtime.gopanic
/usr/local/go/src/runtime/panic.go:679
runtime.panicmem
/usr/local/go/src/runtime/panic.go:199
runtime.sigpanic
/usr/local/go/src/runtime/signal_unix.go:394
github.com/pingcap/tidb/store/tikv.(*rpcClient).recycleDieConnArray
/root/go/src/github.com/pingcap/tidb/store/tikv/client_batch.go:658
github.com/pingcap/tidb/store/tikv.(*rpcClient).recycleDieConnArray
/root/go/src/github.com/pingcap/tidb/store/tikv/client_batch.go:658
github.com/pingcap/tidb/store/tikv.(*rpcClient).SendRequest
/root/go/src/github.com/pingcap/tidb/store/tikv/client.go:319
github.com/pingcap/tidb/store/tikv.(*RegionRequestSender).sendReqToRegion
/root/go/src/github.com/pingcap/tidb/store/tikv/region_request.go:199
github.com/pingcap/tidb/store/tikv.(*RegionRequestSender).SendReqCtx
/root/go/src/github.com/pingcap/tidb/store/tikv/region_request.go:162
github.com/pingcap/tidb/store/tikv.(*clientHelper).SendReqCtx
/root/go/src/github.com/pingcap/tidb/store/tikv/coprocessor.go:814
github.com/pingcap/tidb/store/tikv.(*tikvSnapshot).get
/root/go/src/github.com/pingcap/tidb/store/tikv/snapshot.go:347
github.com/pingcap/tidb/store/tikv.(*tikvSnapshot).Get
/root/go/src/github.com/pingcap/tidb/store/tikv/snapshot.go:298
github.com/pingcap/tidb/kv.(*unionStore).Get
/root/go/src/github.com/pingcap/tidb/kv/union_store.go:201
github.com/pingcap/tidb/store/tikv.(*tikvTxn).Get
/root/go/src/github.com/pingcap/tidb/store/tikv/txn.go:122
github.com/pingcap/tidb/structure.(*TxStructure).loadListMeta
/root/go/src/github.com/pingcap/tidb/structure/list.go:217
github.com/pingcap/tidb/structure.(*TxStructure).LIndex
/root/go/src/github.com/pingcap/tidb/structure/list.go:163
github.com/pingcap/tidb/meta.(*Meta).getDDLJob
/root/go/src/github.com/pingcap/tidb/meta/meta.go:603
github.com/pingcap/tidb/meta.(*Meta).GetDDLJobByIdx
/root/go/src/github.com/pingcap/tidb/meta/meta.go:632
github.com/pingcap/tidb/ddl.(*worker).getFirstDDLJob
/root/go/src/github.com/pingcap/tidb/ddl/ddl_worker.go:272
github.com/pingcap/tidb/ddl.(*worker).handleDDLJobQueue.func1
/root/go/src/github.com/pingcap/tidb/ddl/ddl_worker.go:435
github.com/pingcap/tidb/kv.RunInNewTxn
/root/go/src/github.com/pingcap/tidb/kv/txn.go:47
github.com/pingcap/tidb/ddl.(*worker).handleDDLJobQueue
/root/go/src/github.com/pingcap/tidb/ddl/ddl_worker.go:426
github.com/pingcap/tidb/ddl.(*worker).start
/root/go/src/github.com/pingcap/tidb/ddl/ddl_worker.go:150
github.com/pingcap/tidb/ddl.(*ddl).start.func2
/root/go/src/github.com/pingcap/tidb/ddl/ddl.go:353
github.com/pingcap/tidb/util.WithRecovery
/root/go/src/github.com/pingcap/tidb/util/misc.go:93
This is most likely caused by the following bug:
https://github.com/pingcap/tidb/pull/16299
The DDL worker panicked while executing a request and then got stuck.
The latest released version (4.0.0-rc.1) does not include this fix yet.
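The diagnosis rests on spotting a panic whose stack runs through the DDL worker. A rough Python sketch of that check on a pasted trace follows; the function names and the two-condition heuristic are assumptions for illustration, not part of any TiDB tooling, and the sample is a shortened copy of the trace in this thread.

```python
from typing import List

# Shortened copy of the stack trace posted in this thread.
SAMPLE_TRACE = """\
github.com/pingcap/tidb/util.WithRecovery.func1
/root/go/src/github.com/pingcap/tidb/util/misc.go:90
runtime.gopanic
/usr/local/go/src/runtime/panic.go:679
github.com/pingcap/tidb/store/tikv.(*rpcClient).recycleDieConnArray
/root/go/src/github.com/pingcap/tidb/store/tikv/client_batch.go:658
github.com/pingcap/tidb/ddl.(*worker).getFirstDDLJob
/root/go/src/github.com/pingcap/tidb/ddl/ddl_worker.go:272
github.com/pingcap/tidb/ddl.(*worker).handleDDLJobQueue
/root/go/src/github.com/pingcap/tidb/ddl/ddl_worker.go:426
"""


def stack_functions(trace: str) -> List[str]:
    """Return the function-name lines of a trace, skipping file:line lines."""
    frames = []
    for raw in trace.splitlines():
        line = raw.strip()
        if line and not line.startswith("/"):
            frames.append(line)
    return frames


def looks_like_ddl_worker_panic(trace: str) -> bool:
    """Heuristic: the trace contains runtime.gopanic and a ddl worker frame."""
    frames = stack_functions(trace)
    has_panic = any(f.startswith("runtime.gopanic") for f in frames)
    in_ddl_worker = any("/ddl.(*worker)" in f for f in frames)
    return has_panic and in_ddl_worker


if __name__ == "__main__":
    print(looks_like_ddl_worker_panic(SAMPLE_TRACE))
```

A match only says where the panic surfaced; whether it is the bug fixed in the linked pull request still has to be judged from the frames themselves, here recycleDieConnArray in client_batch.go.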
OK, I restarted TiDB and it works now.
As a temporary workaround, you could build a binary from the latest 4.0 branch.
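Alternatively, try a config change. In the TiDB config file (the tikv-client section), find
# Max batch size in gRPC.
max-batch-size = 128
and change the value to 0. This will have some impact on performance, but it may get the cluster working again.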
I suspect you will run into the problem again, so I recommend changing the config first.
OK, thanks!
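If the change has to be applied to several TiDB config files, a small script can do the edit. This Python sketch rewrites the max-batch-size line in a config text and leaves everything else untouched; the function name set_max_batch_size and the sample text are assumptions, and the edited file still has to be deployed and TiDB restarted by whatever means the cluster normally uses.

```python
import re

# Sample fragment; the comment and key are copied from the config quoted above.
SAMPLE_CONFIG = """\
[tikv-client]
# Max batch size in gRPC.
max-batch-size = 128
"""

# Matches the key at the start of a line and captures everything up to the value.
_MAX_BATCH_RE = re.compile(r"^([ \t]*max-batch-size[ \t]*=[ \t]*)\d+", re.MULTILINE)


def set_max_batch_size(config_text: str, value: int) -> str:
    """Return config_text with max-batch-size set to value.

    Raises ValueError if the key is missing, so a silent no-op cannot
    be mistaken for a successful change.
    """
    new_text, count = _MAX_BATCH_RE.subn(
        lambda m: f"{m.group(1)}{value}", config_text
    )
    if count == 0:
        raise ValueError("max-batch-size not found in config text")
    return new_text


if __name__ == "__main__":
    print(set_max_batch_size(SAMPLE_CONFIG, 0))
```

Raising on a missing key is deliberate: some config files omit the line and rely on the default, in which case the key has to be added by hand under the right section.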