PD crash

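Bug report
Describe the problem you found clearly and accurately; providing any steps that might reproduce it helps the engineers handle it promptly.
The PD node crashed suddenly. The log shows: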
[2021/10/19 16:26:07.913 +08:00] [WARN] [cluster.go:427] ["store does not have enough disk space"] [store-id=4] [capacity=2147483648000] [available=0]
[2021/10/19 16:26:08.063 +08:00] [WARN] [cluster.go:427] ["store does not have enough disk space"] [store-id=15876626] [capacity=2147483648000] [available=0]
[2021/10/19 16:26:08.164 +08:00] [INFO] [operator_controller.go:419] ["add operator"] [region-id=1018289] [operator="\"balance-region {mv peer: store [11] to [102912322]} (kind:region,balance, region:1018289(64,2084), createAt:2021-10-19 16:26:08.163947292 +0800 CST m=+28958657.847528480, startAt:0001-01-01 00:00:00 +0000 UTC, currentStep:0, steps:[add learner peer 102912490 on store 102912322, promote learner peer 102912490 on store 102912322 to voter, remove peer on store 11])\""]
[2021/10/19 16:26:08.164 +08:00] [INFO] [operator_controller.go:598] ["send schedule command"] [region-id=1018289] [step="add learner peer 102912490 on store 102912322"] [source=create]
[2021/10/19 16:26:08.169 +08:00] [INFO] [cluster.go:487] ["region ConfVer changed"] [region-id=1018289] [detail="Add peer:{id:102912490 store_id:102912322 is_learner:true }"] [old-confver=2084] [new-confver=2085]
[2021/10/19 16:26:08.169 +08:00] [INFO] [operator_controller.go:598] ["send schedule command"] [region-id=1018289] [step="add learner peer 102912490 on store 102912322"] [source=heartbeat]
[2021/10/19 16:26:08.179 +08:00] [FATAL] [log.go:292] [panic] [recover="\"invalid memory address or nil pointer dereference\""] [stack="github.com/pingcap/log.Fatal\n\t/home/jenkins/agent/workspace/build_pd_multi_branch_v4.0.1/go/pkg/mod/github.com/pingcap/log@v0.0.0-20200117041106-d28c14d3b1cd/global.go:59\ngithub.com/pingcap/pd/v4/pkg/logutil.LogPanic\n\t/home/jenkins/agent/workspace/build_pd_multi_branch_v4.0.1/go/src/github.com/pingcap/pd/pkg/logutil/log.go:292\nruntime.gopanic\n\t/usr/local/go/src/runtime/panic.go:679\nruntime.panicmem\n\t/usr/local/go/src/runtime/panic.go:199\nruntime.sigpanic\n\t/usr/local/go/src/runtime/signal_unix.go:394\ngithub.com/pingcap/pd/v4/server/schedulers.(*balanceSolver).filterDstStores\n\t/home/jenkins/agent/workspace/build_pd_multi_branch_v4.0.1/go/src/github.com/pingcap/pd/server/schedulers/hot_region.go:818\ngithub.com/pingcap/pd/v4/server/schedulers.(*balanceSolver).solve\n\t/home/jenkins/agent/workspace/build_pd_multi_branch_v4.0.1/go/src/github.com/pingcap/pd/server/schedulers/hot_region.go:605\ngithub.com/pingcap/pd/v4/server/schedulers.(*hotScheduler).balanceHotWriteRegions\n\t/home/jenkins/agent/workspace/build_pd_multi_branch_v4.0.1/go/src/github.com/pingcap/pd/server/schedulers/hot_region.go:468\ngithub.com/pingcap/pd/v4/server/schedulers.(*hotScheduler).dispatch\n\t/home/jenkins/agent/workspace/build_pd_multi_branch_v4.0.1/go/src/github.com/pingcap/pd/server/schedulers/hot_region.go:188\ngithub.com/pingcap/pd/v4/server/schedulers.(*hotScheduler).Schedule\n\t/home/jenkins/agent/workspace/build_pd_multi_branch_v4.0.1/go/src/github.com/pingcap/pd/server/schedulers/hot_region.go:175\ngithub.com/pingcap/pd/v4/server/cluster.(*scheduleController).Schedule\n\t/home/jenkins/agent/workspace/build_pd_multi_branch_v4.0.1/go/src/github.com/pingcap/pd/server/cluster/coordinator.go:601\ngithub.com/pingcap/pd/v4/server/cluster.(*coordinator).runScheduler\n\t/home/jenkins/agent/workspace/build_pd_multi_branch_v4.0.1/go/src/github.com/pingcap/pd/server/cluster/coordinator.go:552"]
[2021/10/19 16:52:41.525 +08:00] [INFO] [util.go:50] ["Welcome to Placement Driver (PD)"]
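The stack field in the FATAL entry is logged on one line with escaped newlines and tabs, which makes the call chain hard to read. A minimal Go sketch for expanding those escapes is shown here; `expandStack` is a hypothetical helper name, not part of PD or pingcap/log, and the sample input is shortened from the log above.

```go
package main

import (
	"fmt"
	"strings"
)

// expandStack (hypothetical helper) turns the literal \n and \t escape
// sequences found in a logged stack field into real newlines and tabs.
func expandStack(field string) string {
	return strings.NewReplacer(`\n`, "\n", `\t`, "\t").Replace(field)
}

func main() {
	// Shortened sample taken from the runtime frames in the log above.
	field := `runtime.gopanic\n\t/usr/local/go/src/runtime/panic.go:679\nruntime.panicmem\n\t/usr/local/go/src/runtime/panic.go:199`
	fmt.Println(expandStack(field))
}
```

Read top to bottom, the expanded frames show the panic being raised inside `(*balanceSolver).filterDstStores`, reached from the hot region scheduler's `balanceHotWriteRegions`.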
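[Impact of the bug]

[Possible steps to reproduce]

[Unexpected behavior observed]

[Expected behavior]

[Related components and versions]
v4.0.1
[Other background information or screenshots]
For example, cluster topology, OS and kernel versions, application information, etc. If the problem is related to SQL, please provide the SQL statements and the schema of the relevant tables. If the node logs contain critical errors, please provide the relevant log content or files. If some business-sensitive information cannot be shared publicly, please leave your contact details and we will follow up with you privately.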

Did this appear after scaling out TiKV? It may be related to https://github.com/tikv/pd/issues/3868, which was fixed in 4.0.12.
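For readers unfamiliar with this class of panic, here is a generic Go sketch of how a lookup for a store that has no record yet can yield a nil pointer, and how a guard avoids dereferencing it. This is not PD's code and does not claim to be the actual cause or fix in the linked issue; `storeLoad`, `pickDstStores`, and the load values are all hypothetical, and only the store IDs are borrowed from the log.

```go
package main

import "fmt"

// storeLoad is a hypothetical per-store statistics record (not a PD type).
type storeLoad struct {
	byteRate float64
}

// pickDstStores (hypothetical) returns the candidate store IDs whose load
// record exists and is below limit. Stores without a record yet, such as
// freshly added ones, are skipped instead of being dereferenced.
func pickDstStores(candidates []uint64, loads map[uint64]*storeLoad, limit float64) []uint64 {
	var dst []uint64
	for _, id := range candidates {
		load, ok := loads[id]
		if !ok || load == nil {
			// Without this guard, load.byteRate below would panic with
			// "invalid memory address or nil pointer dereference".
			continue
		}
		if load.byteRate < limit {
			dst = append(dst, id)
		}
	}
	return dst
}

func main() {
	loads := map[uint64]*storeLoad{
		4:  {byteRate: 10},
		11: {byteRate: 90},
	}
	// Store 102912322 is a candidate but has no load record in the map.
	fmt.Println(pickDstStores([]uint64{4, 11, 102912322}, loads, 50))
}
```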

Yes, it appeared after the scale-out.

I recommend upgrading to a newer patch release, up to 4.0.15…
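Since v4.0.1 and 4.0.12 look alike at a glance, it can help to compare the version components numerically when checking whether a cluster already includes the fix. A minimal Go sketch follows, assuming plain `major.minor.patch` versions with an optional `v` prefix; `parseVersion` and `atLeast` are hypothetical helpers, not part of any TiDB tooling.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion (hypothetical) splits "v4.0.1" or "4.0.12" into three numbers.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("unexpected version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, err
		}
		out[i] = n
	}
	return out, nil
}

// atLeast (hypothetical) reports whether version v is >= want,
// comparing major, minor, and patch numerically in that order.
func atLeast(v, want string) (bool, error) {
	a, err := parseVersion(v)
	if err != nil {
		return false, err
	}
	b, err := parseVersion(want)
	if err != nil {
		return false, err
	}
	for i := range a {
		if a[i] != b[i] {
			return a[i] > b[i], nil
		}
	}
	return true, nil
}

func main() {
	for _, v := range []string{"v4.0.1", "4.0.15"} {
		ok, err := atLeast(v, "4.0.12")
		fmt.Println(v, ok, err)
	}
}
```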