【TiDB Environment】Production
【TiDB Version】TiKV v6.1.0
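【Reproduction Path】The environment suffered a power outage. Some TiKV nodes failed and keep restarting. The PD node panics and restarts at irregular intervals.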
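【Problem Encountered: Symptoms and Impact】
After the power outage the TiKV cluster was restarted, and some nodes could not recover. PD can still serve requests, but it restarts at irregular intervals (roughly every ten to twenty minutes), and the panic message points to hot-region handling. During this period, running tikv-ctl bad-ssts to detect corrupted SST files fails whenever PD restarts, so the TiKV nodes cannot be repaired any further.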
【Attachments: Screenshots/Logs/Monitoring】PD panic log:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1bc477e]
goroutine 488 [running]:
github.com/tikv/pd/server.(*Handler).packHotRegions(0xc0005ada70, 0x1a38d66?, {0x2708d13, 0x4})
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/pd/server/handler.go:1050 +0x37e
github.com/tikv/pd/server.(*Handler).PackHistoryHotReadRegions(0xc001f3ee30?)
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/pd/server/handler.go:1005 +0x3e
github.com/tikv/pd/server/storage.(*HotRegionStorage).pullHotRegionInfo(0xc0008b8980)
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/pd/server/storage/hot_region_storage.go:258 +0x2e
github.com/tikv/pd/server/storage.(*HotRegionStorage).backgroundFlush(0xc0008b8980)
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/pd/server/storage/hot_region_storage.go:218 +0x195
created by github.com/tikv/pd/server/storage.NewHotRegionsStorage
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/pd/server/storage/hot_region_storage.go:159 +0x21b
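
For readers unfamiliar with this class of panic: the trace shows a nil pointer dereference inside packHotRegions, reached from the background flush of hot-region history. The sketch below is not PD's code. It assumes, without confirmation from the trace, that a lookup returned nil for a region that hot-peer statistics still referenced (plausible after an unclean shutdown), and it shows how a nil check avoids the crash. All type and function names here (regionInfo, hotPeerStat, packHotPeers, and so on) are hypothetical.

```go
package main

import "fmt"

// regionInfo is a hypothetical stand-in for region metadata.
type regionInfo struct {
	ID       uint64
	StartKey []byte
}

// hotPeerStat is a hypothetical stand-in for one hot-peer statistic.
type hotPeerStat struct {
	RegionID uint64
	ByteRate float64
}

// historyEntry is a hypothetical record written to hot-region history.
type historyEntry struct {
	RegionID uint64
	StartKey string
	ByteRate float64
}

// regionLookup returns nil when the region is unknown (assumed behavior).
type regionLookup func(id uint64) *regionInfo

// packHotPeers converts statistics into history entries. Without the nil
// check, reading r.StartKey on a missing region would panic with
// "invalid memory address or nil pointer dereference".
func packHotPeers(stats []hotPeerStat, lookup regionLookup) []historyEntry {
	out := make([]historyEntry, 0, len(stats))
	for _, s := range stats {
		r := lookup(s.RegionID)
		if r == nil {
			// Stale statistic: the region no longer exists, so skip it.
			continue
		}
		out = append(out, historyEntry{
			RegionID: r.ID,
			StartKey: string(r.StartKey),
			ByteRate: s.ByteRate,
		})
	}
	return out
}

func main() {
	regions := map[uint64]*regionInfo{1: {ID: 1, StartKey: []byte("a")}}
	lookup := func(id uint64) *regionInfo { return regions[id] }

	// Region 2 has a statistic but no metadata; it is skipped, not dereferenced.
	stats := []hotPeerStat{{RegionID: 1, ByteRate: 10}, {RegionID: 2, ByteRate: 20}}
	fmt.Println(len(packHotPeers(stats, lookup)))
}
```

Whether this matches the actual cause at handler.go:1050 would need to be checked against the PD v6.1.0 source.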