Shards unassigned after restarting a node in an ES cluster

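After restarting one node in an ES cluster, many of the shards on that node became unassigned.
The restart was only to add some configuration (settings for using MinIO as the ES snapshot store; this should be unrelated, since the same change had already been made on other nodes without any problems).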


Cluster allocation and rebalance are both set to all.

The explain output is as follows:

By default, replicas are not allocated to a node whose disk usage exceeds 85%. Check the disk usage on the nodes.
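One quick way to check this is to parse the plain-text output of `GET _cat/allocation?h=node,disk.percent` and list the nodes above the low watermark. The sketch below assumes the 85% default mentioned above; the sample node names and percentages are invented for illustration.

```python
def nodes_over_watermark(cat_allocation_text, low_watermark_pct=85):
    """Return (node, disk_percent) pairs above the low watermark.

    Expects the output of GET _cat/allocation?h=node,disk.percent
    (two columns per line, no header row).
    """
    flagged = []
    for line in cat_allocation_text.splitlines():
        parts = line.split()
        # Skip blank lines and rows without a disk value (e.g. UNASSIGNED).
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        node, pct = parts[0], int(parts[1])
        if pct > low_watermark_pct:
            flagged.append((node, pct))
    return flagged


# Hypothetical sample; these node names and numbers are not from the thread.
sample = """\
a1 62
b1 41
c3 88
UNASSIGNED
"""
print(nodes_over_watermark(sample))
```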

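I tried raising that threshold to 95% and it made no difference, so that is probably not the cause. If it were, explain would report insufficient disk space directly.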
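To see which allocation decider is actually blocking a shard, the response of `GET _cluster/allocation/explain` can be filtered for NO decisions. This is a minimal sketch that assumes the response shape of recent Elasticsearch versions (`node_allocation_decisions` with a `deciders` list per node); older releases may return a different structure. The sample response is hypothetical, not the one from this thread.

```python
def blocking_deciders(explain_response):
    """Map node name -> list of (decider, explanation) pairs that voted NO."""
    blocked = {}
    for node in explain_response.get("node_allocation_decisions", []):
        noes = [
            (d.get("decider"), d.get("explanation"))
            for d in node.get("deciders", [])
            if d.get("decision") == "NO"
        ]
        if noes:
            blocked[node.get("node_name")] = noes
    return blocked


# Hypothetical response fragment; index name and explanation text are invented.
sample = {
    "index": "demo-index",
    "shard": 0,
    "primary": False,
    "current_state": "unassigned",
    "node_allocation_decisions": [
        {
            "node_name": "b1",
            "node_decision": "no",
            "deciders": [
                {
                    "decider": "disk_threshold",
                    "decision": "NO",
                    "explanation": "illustrative explanation text",
                }
            ],
        },
        {"node_name": "a1", "node_decision": "yes", "deciders": []},
    ],
}
print(blocking_deciders(sample))
```

If disk space were the problem, a disk-related decider would show up in this list for the affected nodes.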

Then it may be that the total number of shards exceeds the number of nodes. Run GET _cat/shards?h=index,shard,prirep,state,unassigned.reason to confirm.
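With many indices, the output of that request is long, so it can help to tally the unassigned reasons. A small sketch, assuming the column order given in the request above (index, shard, prirep, state, unassigned.reason); the sample rows are made up.

```python
from collections import Counter


def unassigned_reasons(cat_shards_text):
    """Count unassigned shards per reason.

    Expects the output of
    GET _cat/shards?h=index,shard,prirep,state,unassigned.reason
    """
    counts = Counter()
    for line in cat_shards_text.splitlines():
        parts = line.split()
        if len(parts) >= 4 and parts[3] == "UNASSIGNED":
            # The reason column is empty for assigned shards.
            reason = parts[4] if len(parts) > 4 else "UNKNOWN"
            counts[reason] += 1
    return counts


# Hypothetical sample rows; the index name is invented.
sample = """\
orders 0 p STARTED
orders 0 r UNASSIGNED NODE_LEFT
orders 1 p STARTED
orders 1 r UNASSIGNED NODE_LEFT
"""
print(unassigned_reasons(sample))
```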

They all show UNASSIGNED NODE_LEFT.
That is probably not the cause either: our cluster has 22 data nodes, and the index with the most shards has only 11.

Each shard also has one replica, so that makes 22 shards. With one node down, there is nowhere left to put them.

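That node is already back up.
I don't think that's it either: shards can live on the same node as long as they are not the primary and replica of the same shard. Many of the nodes that were not assigned shards still have plenty of space.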
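The rule described here can be checked with simple arithmetic: because only copies of the same shard must be kept apart, an index needs at least `number_of_replicas + 1` data nodes, regardless of how many primary shards it has. The sketch below encodes just that rule and deliberately ignores other constraints (disk watermarks, allocation filters, per-node shard limits).

```python
def min_nodes_for_all_copies(number_of_replicas):
    """Each shard has 1 primary + N replicas, and no two copies of the
    same shard may share a node, so N + 1 nodes are the minimum."""
    return number_of_replicas + 1


def same_shard_rule_satisfiable(data_nodes, number_of_replicas):
    """True if the same-shard rule alone allows every copy to be assigned."""
    return data_nodes >= min_nodes_for_all_copies(number_of_replicas)


# Numbers from the thread: 22 data nodes, one of them down, 1 replica.
print(same_shard_rule_satisfiable(data_nodes=21, number_of_replicas=1))
```

With 21 nodes still available and one replica per shard, this rule is easily satisfied, which matches the reasoning above that the shard count is not what blocks allocation.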

Was B1 the node you restarted? In the screenshot, B1 has no shards allocated to it at all.

Yes. The data is all still on B1, but nothing gets allocated to it. Strange.

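Then something is wrong: the cluster thinks the node has left, which is why it reports that the shards cannot be allocated. Check es.log on B1 to see exactly why nothing is being allocated.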


I've looked at the logs on both b1 and the master, and there don't seem to be any abnormal messages.
[2022-12-06T10:37:37,704][WARN ][o.e.l.LicenseService ] [b1]
LICENSE [EXPIRED] ON [WEDNESDAY, FEBRUARY 21, 2018]. IF YOU HAVE A NEW LICENSE, PLEASE UPDATE IT.
OTHERWISE, PLEASE REACH OUT TO YOUR SUPPORT CONTACT.
COMMERCIAL PLUGINS OPERATING WITH REDUCED FUNCTIONALITY
- security
  - Cluster health, cluster stats and indices stats operations are blocked
  - All data operations (read and write) continue to work
- watcher
  - PUT / GET watch APIs are disabled, DELETE watch API continues to work
  - Watches execute and write to the history
  - The actions of the watches don't execute
- monitoring
  - The agent will stop collecting cluster and indices metrics
  - The agent will stop automatically cleaning indices older than [xpack.monitoring.history.duration]
- graph
  - Graph explore APIs are disabled

[2022-12-06T10:37:50,168][INFO ][o.e.i.a.JiebaWordDictionary] Load user dict done. Content: 粉水 100
[2022-12-06T10:38:50,167][INFO ][o.e.i.a.JiebaWordDictionary] Load user dict done. Content: 粉水 100
[2022-12-06T10:39:50,165][INFO ][o.e.i.a.JiebaWordDictionary] Load user dict done. Content: 粉水 100
[2022-12-06T10:40:50,166][INFO ][o.e.i.a.JiebaWordDictionary] Load user dict done. Content: 粉水 100
[2022-12-06T10:41:50,166][INFO ][o.e.i.a.JiebaWordDictionary] Load user dict done. Content: 粉水 100
[2022-12-06T10:42:50,166][INFO ][o.e.i.a.JiebaWordDictionary] Load user dict done. Content: 粉水 100
[2022-12-06T10:43:50,165][INFO ][o.e.i.a.JiebaWordDictionary] Load user dict done. Content: 粉水 100
[2022-12-06T10:44:50,168][INFO ][o.e.i.a.JiebaWordDictionary] Load user dict done. Content: 粉水 100
[2022-12-06T10:45:50,169][INFO ][o.e.i.a.JiebaWordDictionary] Load user dict done. Content: 粉水 100
[2022-12-06T10:46:50,168][INFO ][o.e.i.a.JiebaWordDictionary] Load user dict done. Content: 粉水 100
[2022-12-06T10:47:37,704][WARN ][o.e.l.LicenseService ] [b1]
LICENSE [EXPIRED] ON [WEDNESDAY, FEBRUARY 21, 2018]. IF YOU HAVE A NEW LICENSE, PLEASE UPDATE IT.
OTHERWISE, PLEASE REACH OUT TO YOUR SUPPORT CONTACT.
COMMERCIAL PLUGINS OPERATING WITH REDUCED FUNCTIONALITY
- security
  - Cluster health, cluster stats and indices stats operations are blocked
  - All data operations (read and write) continue to work
- watcher
  - PUT / GET watch APIs are disabled, DELETE watch API continues to work
  - Watches execute and write to the history
  - The actions of the watches don't execute
- monitoring
  - The agent will stop collecting cluster and indices metrics
  - The agent will stop automatically cleaning indices older than [xpack.monitoring.history.duration]
- graph
  - Graph explore APIs are disabled

[2022-12-06T10:47:50,167][INFO ][o.e.i.a.JiebaWordDictionary] Load user dict done. Content: 粉水 100
[2022-12-06T10:48:50,167][INFO ][o.e.i.a.JiebaWordDictionary] Load user dict done. Content: 粉水 100
[2022-12-06T10:49:50,167][INFO ][o.e.i.a.JiebaWordDictionary] Load user dict done. Content: 粉水 100
[2022-12-06T10:50:50,167][INFO ][o.e.i.a.JiebaWordDictionary] Load user dict done. Content: 粉水 100
[2022-12-06T10:51:50,168][INFO ][o.e.i.a.JiebaWordDictionary] Load user dict done. Content: 粉水 100
[2022-12-06T10:52:50,168][INFO ][o.e.i.a.JiebaWordDictionary] Load user dict done. Content: 粉水 100
[2022-12-06T10:53:50,169][INFO ][o.e.i.a.JiebaWordDictionary] Load user dict done. Content: 粉水 100
[hadoop@b1 ~]$

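Isn't that the anomaly right there? It says the license has expired, so the node can't join the cluster.
License Expired - #12 by TimV - Elasticsearch - Discuss the Elastic Stack. Read that carefully for the risks of this operation.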

That expiry is for X-Pack and has no effect here; every node in our cluster reports this error.

The problem is solved. It was an issue with transient versus persistent settings. Thanks to @wakaka for the answer.
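The thread does not say which setting was involved, so the following is only an illustration of how such a conflict can arise. Elasticsearch resolves a cluster setting in the order transient, persistent, `elasticsearch.yml`, built-in default, so a leftover transient value silently overrides a persistent one. The sketch resolves a key from the response of `GET _cluster/settings?flat_settings=true`; the sample values are hypothetical.

```python
def effective_setting(key, cluster_settings, yml=None, default=None):
    """Resolve a setting with precedence:
    transient > persistent > elasticsearch.yml > default.

    Returns (value, source).
    """
    for source in ("transient", "persistent"):
        value = cluster_settings.get(source, {}).get(key)
        if value is not None:
            return value, source
    if yml and key in yml:
        return yml[key], "elasticsearch.yml"
    return default, "default"


# Hypothetical response: the persistent value says "all",
# but a transient value left over from earlier maintenance says "none".
sample = {
    "persistent": {"cluster.routing.allocation.enable": "all"},
    "transient": {"cluster.routing.allocation.enable": "none"},
}
print(effective_setting("cluster.routing.allocation.enable", sample))
```

When a setting appears to be correct but is not taking effect, comparing the transient and persistent sections of that response side by side is a reasonable first check.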

After reading through all of this, it turns out the discussion is about ES.