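Disk Threshold Allocation Settings

Purpose

The cluster.routing.allocation.disk.threshold_enabled setting controls whether shard allocation takes node disk usage into account. When it is enabled, the cluster checks each node's disk usage before allocating new shards to it. It also relocates shards away from nodes that are running out of space and, as a last resort, blocks writes. This keeps nodes from failing because their disks are full.

Setting type

Dynamic: it can be changed at runtime through the cluster settings API.

Default value

true (disk threshold checks enabled)

Required

Optional (a default value is provided)

Configuration format

# Default (recommended)
cluster.routing.allocation.disk.threshold_enabled: true

# Disable disk threshold checks
cluster.routing.allocation.disk.threshold_enabled: false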

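Related settings

Setting	Default	Description
`cluster.routing.allocation.disk.watermark.low`	85%	Low watermark: no new shards are allocated to a node whose disk usage exceeds this value
`cluster.routing.allocation.disk.watermark.high`	90%	High watermark: shards are relocated away from a node whose disk usage exceeds this value
`cluster.routing.allocation.disk.watermark.flood_stage`	95%	Flood-stage watermark: indices with a shard on a node above this value are made read-only (deletes still allowed)
`cluster.routing.allocation.disk.reroute_interval`	60s	Minimum interval between reroutes triggered by disk usage
`cluster.routing.allocation.disk.watermark.enable_for_single_data_node`	false	Whether watermarks are applied to allocation in a cluster with a single data node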

How it works

Disk-based allocation controls shard allocation with three watermarks:

Disk usage watermarks (default values)

0%                         85%         90%         95%        100%
├──────────────────────────┼───────────┼───────────┼──────────┤
│          normal          │    low    │   high    │  flood   │

Behavior per usage band
0% - 85%    Normal
            ✓ New shards can be allocated to the node
            ✓ Shards can be relocated to the node
            ✓ All operations work normally
85% - 90%   Warning (low watermark exceeded)
            ✗ No new shards are allocated to the node
            ✓ Existing shards stay on the node
            ✓ Writes are allowed
90% - 95%   Critical (high watermark exceeded)
            ✗ No new shards are allocated to the node
            ✗ Shards are relocated to other nodes
            ✓ Writes are allowed
95% - 100%  Emergency (flood-stage watermark exceeded)
            ✗ No new shards are allocated to the node
            ✗ Shards continue to be relocated away
            ✗ Writes to indices with a shard on the node are blocked
            ✓ Reads and index deletion are still allowed
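The band boundaries above can be expressed as a small classification function. This is a minimal Python sketch, assuming percentage watermarks with the default values and treating a watermark as exceeded only when usage is strictly above it; `DiskState` and `classify` are hypothetical names, not part of Easysearch.

```python
from enum import Enum


class DiskState(Enum):
    NORMAL = "normal"            # new shards may be allocated
    LOW = "low"                  # no new shards are allocated to the node
    HIGH = "high"                # shards are relocated away from the node
    FLOOD_STAGE = "flood_stage"  # writes to indices with a shard here are blocked


def classify(used_percent: float, low: float = 85.0, high: float = 90.0,
             flood_stage: float = 95.0) -> DiskState:
    """Return the watermark band that a node's disk usage falls into."""
    if not low <= high <= flood_stage:
        raise ValueError("expected low <= high <= flood_stage")
    # Check the most severe band first; each band keeps the restrictions of the lower ones.
    if used_percent > flood_stage:
        return DiskState.FLOOD_STAGE
    if used_percent > high:
        return DiskState.HIGH
    if used_percent > low:
        return DiskState.LOW
    return DiskState.NORMAL


for usage in (70.0, 87.5, 92.0, 96.0):
    print(f"{usage:5.1f}% -> {classify(usage).value}")
```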

Use cases

1. Production (enabling is strongly recommended)

cluster.routing.allocation.disk.threshold_enabled: true
cluster.routing.allocation.disk.watermark.low: 85%
cluster.routing.allocation.disk.watermark.high: 90%
cluster.routing.allocation.disk.watermark.flood_stage: 95%

2. Nodes with small disks

# Lower thresholds so that protection kicks in earlier
cluster.routing.allocation.disk.threshold_enabled: true
cluster.routing.allocation.disk.watermark.low: 75%
cluster.routing.allocation.disk.watermark.high: 80%
cluster.routing.allocation.disk.watermark.flood_stage: 90%

3. Large-capacity storage

# Absolute values (minimum free space) instead of percentages
cluster.routing.allocation.disk.threshold_enabled: true
cluster.routing.allocation.disk.watermark.low: 500gb
cluster.routing.allocation.disk.watermark.high: 300gb
cluster.routing.allocation.disk.watermark.flood_stage: 100gb

4. Temporarily disabling (not recommended)

# Use only in exceptional situations
cluster.routing.allocation.disk.threshold_enabled: false

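Value formats

Percentages (recommended), measured as disk space used:

cluster.routing.allocation.disk.watermark.low: 85%
cluster.routing.allocation.disk.watermark.high: 90%
cluster.routing.allocation.disk.watermark.flood_stage: 95%

Byte values, measured as disk space that must remain free:

# Minimum free space to keep on the disk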
cluster.routing.allocation.disk.watermark.low: 500gb
cluster.routing.allocation.disk.watermark.high: 300gb
cluster.routing.allocation.disk.watermark.flood_stage: 100gb
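Because byte values describe free space, the equivalent usage percentage depends on the disk size. The sketch below uses hypothetical helpers `parse_bytes` and `equivalent_used_percent` to convert a free-space watermark into the usage percentage at which it is crossed. It assumes binary units (1gb = 1024^3 bytes) and recognizes only the b/kb/mb/gb/tb/pb suffixes.

```python
# Binary multipliers: 1kb = 1024 bytes (assumed to match the cluster's byte-size parsing).
UNITS = {"pb": 1024**5, "tb": 1024**4, "gb": 1024**3,
         "mb": 1024**2, "kb": 1024, "b": 1}


def parse_bytes(value: str) -> int:
    """Parse a byte-size string such as '500gb' into a number of bytes."""
    text = value.strip().lower()
    # Two-letter suffixes come first in UNITS, so '500gb' never matches plain 'b'.
    for suffix, factor in UNITS.items():
        if text.endswith(suffix):
            return int(float(text[: -len(suffix)]) * factor)
    raise ValueError(f"unrecognized byte size: {value!r}")


def equivalent_used_percent(free_watermark: str, disk_total: str) -> float:
    """Disk usage (percent) at which a free-space watermark is crossed on a disk of this size."""
    total = parse_bytes(disk_total)
    free = parse_bytes(free_watermark)
    if not 0 <= free < total:
        raise ValueError("the free-space watermark must be smaller than the disk")
    return 100.0 * (total - free) / total


for name, free in (("low", "500gb"), ("high", "300gb"), ("flood_stage", "100gb")):
    print(f"{name}: {equivalent_used_percent(free, '10tb'):.1f}% used on a 10tb disk")
```

On a 10tb disk, the 500gb/300gb/100gb example corresponds to roughly 95.1%, 97.1% and 99.0% used. This shows why byte values suit very large disks better than the default percentages.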

Mixed formats (not supported):

# Do not do this: all watermarks must use the same format (all percentages or all byte values)
cluster.routing.allocation.disk.watermark.low: 85%
cluster.routing.allocation.disk.watermark.high: 200gb
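Examples

Changing the settings at runtime through the API. Settings under "transient" are lost after a full cluster restart; use "persistent" to keep them:

# Disable disk threshold checks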
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.threshold_enabled": false
  }
}

# Re-enable and adjust the thresholds
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.threshold_enabled": true,
    "cluster.routing.allocation.disk.watermark.low": "80%",
    "cluster.routing.allocation.disk.watermark.high": "85%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "90%"
  }
}

Using byte values:

# Large-capacity storage
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "1tb",
    "cluster.routing.allocation.disk.watermark.high": "500gb",
    "cluster.routing.allocation.disk.watermark.flood_stage": "200gb"
  }
}
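When settings are changed from a script rather than a console, the request can be assembled with the Python standard library. This is a sketch that assumes the cluster is reachable at the hypothetical address `http://localhost:9200` without authentication. The helper name `cluster_settings_request` is illustrative, and the request is only built here, not sent.

```python
import json
import urllib.request

# Hypothetical address; authentication and TLS are left out of this sketch.
BASE_URL = "http://localhost:9200"


def cluster_settings_request(settings: dict, scope: str = "persistent",
                             base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a PUT /_cluster/settings request."""
    if scope not in ("persistent", "transient"):
        raise ValueError("scope must be 'persistent' or 'transient'")
    body = json.dumps({scope: settings}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/_cluster/settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = cluster_settings_request({
    "cluster.routing.allocation.disk.watermark.low": "1tb",
    "cluster.routing.allocation.disk.watermark.high": "500gb",
    "cluster.routing.allocation.disk.watermark.flood_stage": "200gb",
}, scope="transient")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To send it for real: urllib.request.urlopen(req)
```

Using the default "persistent" scope keeps the change across full cluster restarts. A value of `None` is serialized as JSON `null`, which the cluster settings API treats as resetting that setting to its default.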

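Watermarks in detail

Low watermark

Default: 85%
Trigger: disk usage exceeds the configured value
Effect:
- New shards are no longer allocated to the node (depending on the version, primary shards of newly created indices may be exempt)
- Existing shards stay on the node
- Normal reads and writes are unaffected
Purpose: early warning that keeps the disk from filling up further

High watermark

Default: 90%
Trigger: disk usage exceeds the configured value
Effect:
- No new shards are allocated to the node
- Some shards are relocated to nodes with enough free disk space
Purpose: active recovery, freeing disk space by moving shards away

Flood-stage watermark

Default: 95%
Trigger: disk usage exceeds the configured value
Effect:
- Every index with a shard on the node gets the index.blocks.read_only_allow_delete block, so writes to it are rejected
- Reads and deleting indices remain possible
- The node is in the "flood stage"
Purpose: last line of defense, preventing a completely full disk from crashing the node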

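Recovering from the flood stage

Once a node reaches the flood stage, free disk space first. Depending on the version, the block is either released automatically when usage falls back below the high watermark or must be removed manually:

# Option 1: free disk space and let the block be released automatically (if supported)
# Delete data or clean up logs until disk usage is below the high watermark

# Option 2: remove the block manually (use with care, and only after freeing space)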
PUT /_all/_settings
{
  "index.blocks.read_only_allow_delete": null
}
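Before clearing the block manually, confirm that usage has actually come down on every node. If a node is still above the flood-stage watermark, the block is applied again at the next check. This is a minimal sketch with made-up sample values. It assumes per-node usage percentages have already been read (for example from `GET /_cat/nodes?v&h=name,disk.used_percent`) and uses the high watermark as the threshold. `nodes_still_above` is a hypothetical helper.

```python
def nodes_still_above(used_percent_by_node: dict, threshold: float) -> list:
    """Names of nodes whose disk usage is still above `threshold` percent, fullest first."""
    above = [(name, used) for name, used in used_percent_by_node.items() if used > threshold]
    above.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in above]


# Made-up sample values, e.g. read from GET /_cat/nodes?v&h=name,disk.used_percent
usage = {"node-1": 96.4, "node-2": 88.0, "node-3": 91.2}
high_watermark = 90.0

blocking = nodes_still_above(usage, high_watermark)
if blocking:
    print("free more disk space first on:", ", ".join(blocking))
else:
    print("all nodes are below the high watermark; clearing the block is reasonable")
```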

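Recommended settings

Scenario	Low watermark	High watermark	Flood stage	Notes
Default / production	85%	90%	95%	General-purpose recommendation
Small disks	75%	80%	90%	Protection triggers earlier
Large capacity	500gb free	300gb free	100gb free	Absolute values (minimum free space)
Test environment	90%	95%	98%	Looser limits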

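Single data node clusters

In a cluster with a single data node, disk watermarks are ignored for allocation decisions by default:

# Default for single-data-node clusters
cluster.routing.allocation.disk.watermark.enable_for_single_data_node: false

To apply them: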

cluster.routing.allocation.disk.watermark.enable_for_single_data_node: true
cluster.routing.allocation.disk.threshold_enabled: true

Monitoring

# Disk usage per node
GET /_cat/nodes?v&h=name,disk.used_percent,disk.used,disk.total

# Explain why a shard is or is not allocated
GET /_cluster/allocation/explain?pretty

# Current disk settings
GET /_cluster/settings?filter_path=*.cluster.routing.allocation.disk*
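For scripted checks, the node listing is easier to consume as JSON. This sketch assumes the cat API accepts `format=json`, as Elasticsearch-compatible cat APIs do, and that the response is a list of objects whose `disk.used_percent` values are strings. The sample data is invented, and `report` is a hypothetical helper that applies the default percentage watermarks.

```python
import json

# Invented sample shaped like GET /_cat/nodes?format=json&h=name,disk.used_percent
# (assumption: values come back as strings, as cat APIs typically return them).
SAMPLE = """[
  {"name": "node-1", "disk.used_percent": "72.35"},
  {"name": "node-2", "disk.used_percent": "86.10"},
  {"name": "node-3", "disk.used_percent": "91.80"}
]"""


def report(cat_nodes_json: str, low: float = 85.0, high: float = 90.0,
           flood_stage: float = 95.0) -> list:
    """Return (name, used_percent, band) tuples sorted from fullest to emptiest node."""
    rows = []
    for node in json.loads(cat_nodes_json):
        raw = node.get("disk.used_percent")
        if raw is None:  # skip rows that carry no usage value
            continue
        used = float(raw)
        if used > flood_stage:
            band = "above flood_stage"
        elif used > high:
            band = "above high"
        elif used > low:
            band = "above low"
        else:
            band = "ok"
        rows.append((node["name"], used, band))
    return sorted(rows, key=lambda row: row[1], reverse=True)


for name, used, band in report(SAMPLE):
    print(f"{name:<8} {used:6.2f}%  {band}")
```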

常见问题 #

Problem 1: shards remain unassigned

Find out why allocation failed:

GET /_cluster/allocation/explain?pretty

The per-node decisions may include an entry like this (excerpt):

"deciders" : [ {
  "decider" : "disk_threshold",
  "decision" : "NO",
  "explanation" : "disk usage is above the high watermark"
} ]

Solutions:

Free disk space
Adjust the watermark thresholds temporarily (high must not exceed flood_stage):

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.high": "95%"
  }
}
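To see which nodes the disk decider rejects, the explain response can be filtered for `disk_threshold` deciders that returned `NO`. This sketch assumes a response layout with a `node_allocation_decisions` array whose entries each hold a `deciders` list. The sample below is trimmed, its explanation strings are illustrative, and `disk_blocked_nodes` is a hypothetical helper.

```python
import json

# Trimmed, invented sample: only the fields read below are kept, and the explanation
# strings are placeholders rather than real messages.
SAMPLE = """{
  "index": "logs-1",
  "shard": 0,
  "primary": false,
  "node_allocation_decisions": [
    {"node_name": "node-1",
     "deciders": [{"decider": "disk_threshold", "decision": "NO",
                   "explanation": "disk usage is above the high watermark"}]},
    {"node_name": "node-2",
     "deciders": [{"decider": "same_shard", "decision": "NO",
                   "explanation": "placeholder: a copy of this shard is already on this node"}]}
  ]
}"""


def disk_blocked_nodes(explain_json: str) -> dict:
    """Map node name -> explanation for each NO decision made by the disk_threshold decider."""
    blocked = {}
    for node in json.loads(explain_json).get("node_allocation_decisions", []):
        for decider in node.get("deciders", []):
            if decider.get("decider") == "disk_threshold" and decider.get("decision") == "NO":
                blocked[node.get("node_name")] = decider.get("explanation")
    return blocked


print(disk_blocked_nodes(SAMPLE))
```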

Problem 2: writes to an index are blocked

Error message:

ClusterBlockException[index blocked by: [FORBIDDEN/12/index read-only / allow delete (api)]];

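Cause: disk usage on a node that holds a shard of the index exceeded the flood-stage watermark

Solutions:

Free disk space
Check disk usage

GET /_cat/nodes?v&h=name,disk.used_percent

Remove the block (after disk space has been freed)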

PUT /_all/_settings
{
  "index.blocks.read_only_allow_delete": null
}

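Problem 3: should disk thresholds be disabled?

Disabling is not recommended, except in special cases:

Thorough disk monitoring and alerting are already in place
An external system is responsible for disk management
Temporary debugging (re-enable immediately afterwards)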

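Performance impact

Setting	Pros	Cons
Enabled (default)	Automatic protection against full disks	Disk usage must be checked periodically
Disabled	No extra checking overhead	Disks may fill up completely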

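Other disk-related settings

cluster.routing.allocation.disk.include_relocations

Purpose: controls whether the size of shards currently being relocated is taken into account when computing disk usage.

Default: true

Type: dynamic

Format:

# Default: include shards that are being relocated
cluster.routing.allocation.disk.include_relocations: true

# Exclude shards that are being relocated
cluster.routing.allocation.disk.include_relocations: false

Description:

true: shards relocating to or from the node are accounted for in its disk usage
false: only shards already in place are counted

Recommendation:

The default, true, suits most scenarios
If frequent relocations make the watermark checks misjudge disk usage, consider setting it to false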
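The effect of the setting can be illustrated with a simplified usage estimate. This sketch only adds the size of shards still being copied onto the node; the real decider may also account for shards moving away. Treat `projected_used_percent`, a hypothetical helper, as an approximation.

```python
def projected_used_percent(used: float, total: float, incoming: float = 0.0,
                           include_relocations: bool = True) -> float:
    """Simplified disk-usage estimate for one node, all sizes in the same unit.

    With include_relocations=True, shards still being copied onto the node are
    counted as if they had already arrived; with False only data on disk counts.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    counted = used + (incoming if include_relocations else 0.0)
    return 100.0 * counted / total


LOW_WATERMARK = 85.0
# Hypothetical node: 800 of 1000 GB used, with an 80 GB shard relocating onto it.
for include in (True, False):
    pct = projected_used_percent(800, 1000, incoming=80, include_relocations=include)
    status = "above" if pct > LOW_WATERMARK else "below"
    print(f"include_relocations={include}: {pct:.1f}% ({status} the low watermark)")
```

With 800 of 1000 GB used and an 80 GB shard arriving, the estimate is 88% with relocations counted, which is above the default 85% low watermark, and 80% without them.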

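cluster.routing.allocation.disk.reroute_interval

Purpose: sets the minimum interval between shard reroutes triggered by disk-watermark checks.

Default: 60s (60 seconds)

Type: dynamic

Format:

# Default
cluster.routing.allocation.disk.reroute_interval: 60s

# Reroute more frequently
cluster.routing.allocation.disk.reroute_interval: 30s

# Reroute less frequently
cluster.routing.allocation.disk.reroute_interval: 120s

Recommendation:

Scenario	Recommended value	Notes
Default	60s	Standard setting
Fast response	30s	Triggers reallocation sooner
Stable environment	120s	Fewer reroutes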
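A small throttle shows what a minimum reroute interval means. This is a conceptual Python sketch, not the actual implementation. The hypothetical `RerouteThrottle` allows a disk-triggered reroute only if at least `interval_s` seconds have passed since the previous one. The 30-second check cadence in the example is an assumption.

```python
class RerouteThrottle:
    """Conceptual sketch: allow a disk-triggered reroute at most once per interval."""

    def __init__(self, interval_s: float = 60.0):
        self.interval_s = interval_s
        self._last = None  # time of the most recent reroute; None before the first one

    def should_reroute(self, now_s: float, watermark_exceeded: bool) -> bool:
        """Return True if a reroute should run now, and remember when it ran."""
        if not watermark_exceeded:
            return False
        if self._last is not None and now_s - self._last < self.interval_s:
            return False  # too soon after the previous reroute
        self._last = now_s
        return True


throttle = RerouteThrottle(interval_s=60.0)
# Hypothetical 30 s check cadence while a node stays above the high watermark.
decisions = [throttle.should_reroute(t, watermark_exceeded=True) for t in (0, 30, 60, 90, 120)]
print(decisions)
```

With a 60-second interval, checks at 0, 30, 60, 90 and 120 seconds produce reroutes only at 0, 60 and 120 seconds.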

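cluster.routing.allocation.disk.watermark.enable_for_single_data_node

Purpose: controls whether disk watermarks are applied in a cluster with a single data node.

Default: false

Type: dynamic

Format:

# Default: watermarks are not applied on a single data node
cluster.routing.allocation.disk.watermark.enable_for_single_data_node: false

# Apply disk checks on a single data node as well
cluster.routing.allocation.disk.watermark.enable_for_single_data_node: true

Description:

A single-node cluster usually has nowhere to relocate shards to, so the check is disabled by default
Enable it manually if a single node should also be protected

Use case:

# Single-node development/test environment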
cluster.routing.allocation.disk.watermark.enable_for_single_data_node: true
cluster.routing.allocation.disk.threshold_enabled: true
cluster.routing.allocation.disk.watermark.low: 85%
cluster.routing.allocation.disk.watermark.high: 90%
cluster.routing.allocation.disk.watermark.flood_stage: 95%

Complete disk configuration examples

Multi-node production cluster

# easysearch.yml

cluster.routing.allocation.disk.threshold_enabled: true
cluster.routing.allocation.disk.watermark.low: 85%
cluster.routing.allocation.disk.watermark.high: 90%
cluster.routing.allocation.disk.watermark.flood_stage: 95%
cluster.routing.allocation.disk.include_relocations: true
cluster.routing.allocation.disk.reroute_interval: 60s
cluster.routing.allocation.disk.watermark.enable_for_single_data_node: false

Large-capacity storage cluster

# Byte values (minimum free space) instead of percentages
cluster.routing.allocation.disk.threshold_enabled: true
cluster.routing.allocation.disk.watermark.low: 1tb
cluster.routing.allocation.disk.watermark.high: 500gb
cluster.routing.allocation.disk.watermark.flood_stage: 200gb
cluster.routing.allocation.disk.include_relocations: true
cluster.routing.allocation.disk.reroute_interval: 120s

Single-node development environment

# Enable disk protection on a single-node cluster as well
cluster.routing.allocation.disk.threshold_enabled: true
cluster.routing.allocation.disk.watermark.low: 85%
cluster.routing.allocation.disk.watermark.high: 90%
cluster.routing.allocation.disk.watermark.flood_stage: 95%
cluster.routing.allocation.disk.include_relocations: false
cluster.routing.allocation.disk.watermark.enable_for_single_data_node: true

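Notes

Dynamic updates: the disk threshold settings are dynamic and can be changed while the cluster is running
Threshold order: with percentages, low must not exceed high and high must not exceed flood_stage; with byte values (free space) the order is reversed
Consistent format: all three watermarks must use the same format (all percentages or all byte values)
Recovery: after disk space is freed, the flood-stage block is either released automatically or must be removed manually, depending on the version
Monitoring and alerting: pair these settings with an external monitoring system that alerts on disk usage
Single data node: watermarks are not applied on a single data node by default and must be enabled explicitly
Reroute interval: do not set reroute_interval too short, to avoid frequent reroutes that consume resources
Relocating shards: with include_relocations enabled, large-scale relocations can make the computed disk usage higher than the data actually on disk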

Tags

Cluster configuration, Disk management, Shard allocation
