---
title: "Disk Threshold Allocation Settings"
date: 2026-01-08
lastmod: 2026-01-08
description: "Reference for the settings that control disk-usage-based shard allocation"
tags: ["Cluster settings", "Disk management", "Shard allocation"]
summary: "The cluster.routing.allocation.disk.threshold_enabled setting controls whether shard allocation takes disk usage into account. It is a dynamic setting that defaults to true and works together with the low (85%), high (90%), and flood-stage (95%) watermarks."
---


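## Purpose

The `cluster.routing.allocation.disk.threshold_enabled` setting controls whether the disk-usage-based shard allocation policy is enabled. When it is enabled, the cluster checks each node's disk usage before deciding whether to allocate new shards to that node, which prevents nodes from failing because they run out of disk space.

## Setting type

This is a **dynamic setting** and can be changed at runtime through the cluster settings API.

## Default value

```
true (disk threshold checks enabled)
```

## Required

**Optional** (a default value is provided)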

## Configuration format

```yaml
# Default (recommended)
cluster.routing.allocation.disk.threshold_enabled: true

# Disable disk threshold checks
cluster.routing.allocation.disk.threshold_enabled: false
```

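## Related settings

| Setting | Default | Description |
|-------|-------|------|
| `cluster.routing.allocation.disk.watermark.low` | 85% | Low watermark; no new shards are allocated to a node above this value |
| `cluster.routing.allocation.disk.watermark.high` | 90% | High watermark; shards are relocated away from a node above this value |
| `cluster.routing.allocation.disk.watermark.flood_stage` | 95% | Flood-stage watermark; index writes are blocked above this value |
| `cluster.routing.allocation.disk.reroute_interval` | 60s | Interval between disk usage checks |
| `cluster.routing.allocation.disk.watermark.enable_for_single_data_node` | false | Whether the watermarks apply to a cluster with a single data node |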

## How it works

The disk threshold mechanism controls shard allocation through three watermark levels:

```
Disk usage
0%                           85%          90%          95%     100%
├─────────────────────────────┼────────────┼────────────┼────────┤
  normal                        low          high         flood
                                watermark    watermark    stage

0% - 85%: Normal
    ✓ Accepts new shard allocations
    ✓ Accepts shard relocations
    ✓ All operations proceed normally

85% - 90%: Warning (low watermark exceeded)
    ✗ No new shards are allocated to the node
    ✓ Existing shards stay in place
    ✓ Writes are allowed

90% - 95%: Critical (high watermark exceeded)
    ✗ No new shards are allocated to the node
    ✗ Shards are forced to relocate to other nodes
    ✓ Writes are allowed

95% - 100%: Emergency (flood-stage watermark exceeded)
    ✗ No new shards are allocated to the node
    ✗ Shards continue to be relocated away
    ✗ Index writes are blocked
    ✓ Only delete operations are allowed
```
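
The stages above can be sketched as a small classification function. This is a simplified illustration, not the cluster's actual decider logic: the function name `classify_disk_usage` and the stage labels are hypothetical, and it assumes a watermark is "exceeded" only when usage is strictly greater than the configured percentage.

```python
def classify_disk_usage(used_percent, low=85.0, high=90.0, flood_stage=95.0):
    """Return the watermark stage for a node's disk usage percentage.

    Simplified model: a watermark counts as exceeded only when usage is
    strictly above it (an assumption, not the cluster's exact comparison).
    """
    if not 0.0 <= used_percent <= 100.0:
        raise ValueError("used_percent must be between 0 and 100")
    if used_percent > flood_stage:
        return "flood_stage"  # index writes blocked, deletes still allowed
    if used_percent > high:
        return "high"         # shards are relocated away from the node
    if used_percent > low:
        return "low"          # no new shards are allocated to the node
    return "normal"


for pct in (50, 87, 92, 97):
    print(f"{pct}% -> {classify_disk_usage(pct)}")
```

Passing custom `low`, `high`, and `flood_stage` values lets the same check be applied to any of the threshold sets shown later on this page.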

## Use cases

### 1. Production (strongly recommended to keep enabled)

```yaml
cluster.routing.allocation.disk.threshold_enabled: true
cluster.routing.allocation.disk.watermark.low: 85%
cluster.routing.allocation.disk.watermark.high: 90%
cluster.routing.allocation.disk.watermark.flood_stage: 95%
```

### 2. Nodes with small disks

```yaml
# Stricter thresholds that trigger protection earlier
cluster.routing.allocation.disk.threshold_enabled: true
cluster.routing.allocation.disk.watermark.low: 75%
cluster.routing.allocation.disk.watermark.high: 80%
cluster.routing.allocation.disk.watermark.flood_stage: 90%
```

### 3. Large-capacity storage

```yaml
# Absolute values (free space to keep) instead of percentages
cluster.routing.allocation.disk.threshold_enabled: true
cluster.routing.allocation.disk.watermark.low: 500gb
cluster.routing.allocation.disk.watermark.high: 300gb
cluster.routing.allocation.disk.watermark.flood_stage: 100gb
```

### 4. Temporarily disabled (not recommended)

```yaml
# Use only in exceptional situations
cluster.routing.allocation.disk.threshold_enabled: false
```

## Value formats in detail

**Percentage format (recommended):**

```yaml
cluster.routing.allocation.disk.watermark.low: 85%
cluster.routing.allocation.disk.watermark.high: 90%
cluster.routing.allocation.disk.watermark.flood_stage: 95%
```

**Byte-value format:**

```yaml
# Byte values are the minimum free space to keep, so they decrease from low to flood_stage
cluster.routing.allocation.disk.watermark.low: 500gb
cluster.routing.allocation.disk.watermark.high: 300gb
cluster.routing.allocation.disk.watermark.flood_stage: 100gb
```

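**Mixed formats (avoid):**

```yaml
# Not recommended: keep all three watermarks in the same format.
# Elasticsearch-compatible validation typically rejects a mix of
# percentages and byte values, so verify this on your version first.
cluster.routing.allocation.disk.watermark.low: 85%
cluster.routing.allocation.disk.watermark.high: 200gb
```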
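
A set of watermark values can be checked for consistency before it is applied. The sketch below is a hypothetical pre-flight helper, not part of any client library: `parse_watermark` and `validate_watermarks` are illustrative names, and the code assumes binary units (1kb = 1024 bytes), that all three values must share one format, and that byte values represent free space.

```python
import re

# Binary multipliers, assumed here for byte-size values (1kb = 1024 bytes).
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3,
          "tb": 1024**4, "pb": 1024**5}


def parse_watermark(value):
    """Parse '85%' into ('percent', 85.0) or '500gb' into ('bytes', 536870912000)."""
    text = value.strip().lower()
    if text.endswith("%"):
        return ("percent", float(text[:-1]))
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(b|kb|mb|gb|tb|pb)", text)
    if match is None:
        raise ValueError(f"unrecognized watermark value: {value!r}")
    return ("bytes", int(float(match.group(1)) * _UNITS[match.group(2)]))


def validate_watermarks(low, high, flood_stage):
    """Raise ValueError if the three values mix formats or are out of order."""
    parsed = [parse_watermark(v) for v in (low, high, flood_stage)]
    kinds = {kind for kind, _ in parsed}
    if len(kinds) != 1:
        raise ValueError("use percentages or byte values for all three watermarks")
    lo, hi, fl = (amount for _, amount in parsed)
    if kinds == {"percent"}:
        ordered = lo <= hi <= fl  # used-space percentages must not decrease
    else:
        ordered = lo >= hi >= fl  # free-space byte values must not increase
    if not ordered:
        raise ValueError("watermarks are out of order")
    return parsed
```

For example, `validate_watermarks("85%", "90%", "95%")` and `validate_watermarks("500gb", "300gb", "100gb")` both pass, while `validate_watermarks("85%", "200gb", "95%")` raises a `ValueError`.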

## Usage examples

**Changing the settings dynamically through the API:**

```bash
# Disable disk threshold checks
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.threshold_enabled": false
  }
}

# Re-enable the checks and adjust the thresholds
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.threshold_enabled": true,
    "cluster.routing.allocation.disk.watermark.low": "80%",
    "cluster.routing.allocation.disk.watermark.high": "85%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "90%"
  }
}
```

**Configuring byte values:**

```bash
# Large-capacity storage scenario
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "1tb",
    "cluster.routing.allocation.disk.watermark.high": "500gb",
    "cluster.routing.allocation.disk.watermark.flood_stage": "200gb"
  }
}
```
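
The same requests can be issued from a script. The sketch below only builds the request with Python's standard library and does not send it; `build_settings_request` is a hypothetical helper, `http://localhost:9200` is a placeholder endpoint, and authentication and TLS are left out. It defaults to the `persistent` scope on the assumption that the change should survive a full cluster restart, whereas the `transient` scope used above does not.

```python
import json
import urllib.request


def build_settings_request(base_url, settings, scope="persistent"):
    """Build (but do not send) a PUT /_cluster/settings request."""
    if scope not in ("persistent", "transient"):
        raise ValueError("scope must be 'persistent' or 'transient'")
    body = json.dumps({scope: settings}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/_cluster/settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_settings_request(
    "http://localhost:9200",  # placeholder endpoint, adjust for your cluster
    {
        "cluster.routing.allocation.disk.watermark.low": "80%",
        "cluster.routing.allocation.disk.watermark.high": "85%",
        "cluster.routing.allocation.disk.watermark.flood_stage": "90%",
    },
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Calling `urllib.request.urlopen(request)` would send the request; a real deployment usually also needs credentials and certificate handling.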

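## Watermark reference

### Low watermark

- **Default**: 85%
- **Trigger**: disk usage exceeds the configured value
- **Effect**:
  - No new primary or replica shards are allocated to the node
  - Existing shards stay in place
  - Normal reads and writes are unaffected
- **Purpose**: early warning that keeps the disk from filling up

### High watermark

- **Default**: 90%
- **Trigger**: disk usage exceeds the configured value
- **Effect**:
  - No new shards are allocated
  - Some shards are forced to relocate to nodes with enough disk space
  - The node stays ineligible for new allocations while usage remains above the watermark
- **Purpose**: active recovery that frees disk space by relocating shards

### Flood-stage watermark

- **Default**: 95%
- **Trigger**: disk usage exceeds the configured value
- **Effect**:
  - Index writes are blocked
  - Only index deletion is allowed
  - The cluster enters the flood stage
- **Purpose**: last line of defense that keeps a completely full disk from crashing the node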
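
To make the percentages concrete, the following sketch converts the default watermarks into the amount of free space left when each one is reached. `free_space_at_watermarks` is a hypothetical helper used only for this arithmetic, and the disk sizes are made-up examples.

```python
def free_space_at_watermarks(total_bytes, low=85, high=90, flood_stage=95):
    """Return the free bytes remaining when each percentage watermark is reached."""
    return {
        name: total_bytes * (100 - pct) // 100
        for name, pct in (("low", low), ("high", high), ("flood_stage", flood_stage))
    }


GIB = 1024**3
for size_gib in (100, 10_000):  # hypothetical disk sizes
    marks = free_space_at_watermarks(size_gib * GIB)
    print(size_gib, "GiB disk:", {name: free // GIB for name, free in marks.items()})
```

On the 100 GiB disk the low watermark is reached with 15 GiB free, while on the 10,000 GiB disk it is reached with 1,500 GiB still free. That gap is the usual reason for switching to byte values on large disks.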

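## Flood-stage recovery

Once a node enters the flood stage, the affected indices stay write-blocked until the block is released. Depending on the version, the block clears automatically after disk usage drops back, or it has to be removed by hand:

```bash
# Option 1: free disk space and let the block clear
# Delete data or clean up logs so that disk usage falls back below the flood-stage watermark

# Option 2: remove the block manually (use with caution)
PUT /_all/_settings
{
  "index.blocks.read_only_allow_delete": null
}
```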

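## Recommended settings

| Scenario | Low watermark | High watermark | Flood stage | Notes |
|-----|---------|---------|--------|------|
| Default / production | 85% | 90% | 95% | General-purpose recommendation |
| Small disks | 75% | 80% | 90% | Triggers protection earlier |
| Large capacity | 500GB | 300GB | 100GB | Absolute free-space values |
| Test environment | 90% | 95% | 98% | Looser limits |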

## Single-data-node configuration

On a single-node cluster, the disk thresholds are not applied by default:

```yaml
# Default for a single-node cluster
cluster.routing.allocation.disk.watermark.enable_for_single_data_node: false
```

To enable them:

```yaml
cluster.routing.allocation.disk.watermark.enable_for_single_data_node: true
cluster.routing.allocation.disk.threshold_enabled: true
```

## Monitoring

```bash
# Show disk usage for each node
GET /_cat/nodes?v&h=name,disk.used_percent,disk.used,disk.total

# Show the allocation explanation
GET /_cluster/allocation/explain?pretty

# Show the disk settings that are currently set
GET /_cluster/settings?filter_path=*.cluster.routing.allocation.disk*
```
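
The node listing above can feed a simple alert check. The sketch below assumes the `_cat/nodes` call is made with `format=json`, which returns one object per node with string values; `nodes_above` is a hypothetical helper and the sample rows are invented for illustration.

```python
def nodes_above(rows, threshold_percent=85.0):
    """Return (name, used_percent) pairs above the threshold, fullest first."""
    flagged = []
    for row in rows:
        raw = row.get("disk.used_percent")
        if raw is None:  # skip rows that carry no disk statistics
            continue
        used = float(raw)
        if used > threshold_percent:
            flagged.append((row["name"], used))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Invented sample rows, shaped like the assumed JSON output of
# GET /_cat/nodes?format=json&h=name,disk.used_percent
sample = [
    {"name": "node-1", "disk.used_percent": "72.10"},
    {"name": "node-2", "disk.used_percent": "91.35"},
    {"name": "node-3", "disk.used_percent": "86.00"},
]
print(nodes_above(sample))
```

Running the check with `threshold_percent` set to the low watermark gives early notice before allocation is restricted.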

## Common problems

**Problem 1: shards are unassigned**

Check why allocation failed:

```bash
GET /_cluster/allocation/explain?pretty
```

The response may include:
```
"deciders" : [ {
  "decision" : "NO",
  "explanation" : "disk usage is above the high watermark"
} ]
```

**Solutions:**

1. **Free disk space**
2. **Adjust the watermark thresholds**
```bash
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.high": "95%"
  }
}
```

**Problem 2: index writes are blocked**

Error message:
```
ClusterBlockException[index blocked by: [FORBIDDEN/5/index read-only / allow delete]];
```

**Cause:** disk usage exceeded the flood-stage watermark

**Solutions:**

1. **Free disk space**
2. **Check disk usage**
```bash
GET /_cat/nodes?v&h=name,disk.used_percent
```
3. **Remove the block** (after disk space has been freed)
```bash
PUT /_all/_settings
{
  "index.blocks.read_only_allow_delete": null
}
```

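**Problem 3: should the disk thresholds be disabled?**

**Disabling them is not recommended** unless one of the following applies:

- Comprehensive disk monitoring and alerting is already in place
- An external system is responsible for disk management
- Temporary debugging (re-enable immediately afterwards)

## Performance impact

| Setting | Advantages | Disadvantages |
|-----|------|------|
| Enabled (default) | Automatic protection against full disks | Disk usage is checked periodically |
| Disabled | No overhead from the checks | Disks may fill up |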

---

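## Other disk-related settings

### cluster.routing.allocation.disk.include_relocations

**Purpose**: controls whether the size of shards that are currently relocating is included when disk usage is calculated.

**Default**: `true`

**Setting type**: dynamic

**Configuration format**:
```yaml
# Default: include relocating shards
cluster.routing.allocation.disk.include_relocations: true

# Exclude relocating shards
cluster.routing.allocation.disk.include_relocations: false
```

**Details**:
- `true`: disk usage includes shards that are relocating to and away from the node
- `false`: only shards that are already settled on the node are counted

**Recommendations**:
- The default `true` suits most scenarios
- If frequent relocations cause the watermarks to be misjudged, set it to `false`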
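
The effect of the two values can be pictured with a deliberately simplified model. This is an assumption for illustration only, not the cluster's real accounting: `projected_used_bytes` is a hypothetical function that adds the size of incoming shards and subtracts the size of outgoing shards when relocations are included.

```python
def projected_used_bytes(used_bytes, incoming_bytes=0, outgoing_bytes=0,
                         include_relocations=True):
    """Estimate the disk usage that is compared against the watermarks.

    Simplified model (an assumption): with relocations included, shards
    moving to the node are added and shards moving away are subtracted.
    """
    if not include_relocations:
        return used_bytes
    return max(0, used_bytes + incoming_bytes - outgoing_bytes)


total = 1000  # hypothetical disk size in arbitrary units
used = 800
with_reloc = projected_used_bytes(used, incoming_bytes=100, outgoing_bytes=20)
without_reloc = projected_used_bytes(used, incoming_bytes=100, outgoing_bytes=20,
                                     include_relocations=False)
print(with_reloc * 100 / total, without_reloc * 100 / total)
```

In this model an incoming shard that is already partly copied is counted in both `used_bytes` and `incoming_bytes`, which is one way the estimate can end up on the high side during heavy relocation.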

---

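### cluster.routing.allocation.disk.reroute_interval

**Purpose**: controls how often disk usage is checked in order to trigger watermark-based shard reallocation.

**Default**: `60s` (60 seconds)

**Setting type**: dynamic

**Configuration format**:
```yaml
# Default
cluster.routing.allocation.disk.reroute_interval: 60s

# Check more often
cluster.routing.allocation.disk.reroute_interval: 30s

# Check less often
cluster.routing.allocation.disk.reroute_interval: 120s
```

**Recommendations**:

| Scenario | Recommended value | Notes |
|------|--------|------|
| Default | 60s | Standard configuration |
| Fast response | 30s | Triggers reallocation sooner |
| Stable environment | 120s | Fewer checks |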

---

### cluster.routing.allocation.disk.watermark.enable_for_single_data_node

**Purpose**: controls whether the disk watermark checks apply to a cluster with a single data node.

**Default**: `false`

**Setting type**: dynamic

**Configuration format**:
```yaml
# Default: not applied on a single node
cluster.routing.allocation.disk.watermark.enable_for_single_data_node: false

# Apply the disk checks on a single node as well
cluster.routing.allocation.disk.watermark.enable_for_single_data_node: true
```

**Details**:
- A single-node cluster usually has nowhere to relocate shards, so the checks are off by default
- Enable the setting manually if a single node should also have disk protection

**Use case**:
```yaml
# Single-node development or test environment
cluster.routing.allocation.disk.watermark.enable_for_single_data_node: true
cluster.routing.allocation.disk.threshold_enabled: true
cluster.routing.allocation.disk.watermark.low: 85%
cluster.routing.allocation.disk.watermark.high: 90%
cluster.routing.allocation.disk.watermark.flood_stage: 95%
```

---

## Complete disk configuration examples

### Production multi-node cluster

```yaml
# easysearch.yml

cluster.routing.allocation.disk.threshold_enabled: true
cluster.routing.allocation.disk.watermark.low: 85%
cluster.routing.allocation.disk.watermark.high: 90%
cluster.routing.allocation.disk.watermark.flood_stage: 95%
cluster.routing.allocation.disk.include_relocations: true
cluster.routing.allocation.disk.reroute_interval: 60s
cluster.routing.allocation.disk.watermark.enable_for_single_data_node: false
```

### Large-capacity storage cluster

```yaml
# Byte values (free space to keep) instead of percentages
cluster.routing.allocation.disk.threshold_enabled: true
cluster.routing.allocation.disk.watermark.low: 1tb
cluster.routing.allocation.disk.watermark.high: 500gb
cluster.routing.allocation.disk.watermark.flood_stage: 200gb
cluster.routing.allocation.disk.include_relocations: true
cluster.routing.allocation.disk.reroute_interval: 120s
```

### Single-node development environment

```yaml
# Disk protection enabled on a single-node cluster as well
cluster.routing.allocation.disk.threshold_enabled: true
cluster.routing.allocation.disk.watermark.low: 85%
cluster.routing.allocation.disk.watermark.high: 90%
cluster.routing.allocation.disk.watermark.flood_stage: 95%
cluster.routing.allocation.disk.include_relocations: false
cluster.routing.allocation.disk.watermark.enable_for_single_data_node: true
```

---

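## Notes

1. **Dynamic updates**: the disk threshold settings are dynamic and can be changed on a running cluster
2. **Threshold order**: percentages must satisfy low < high < flood_stage; byte values denote free space, so their order is reversed (low > high > flood_stage)
3. **Consistent format**: all three watermarks should use the same format (all percentages or all byte values)
4. **Flood-stage recovery**: after disk space is freed, confirm that the write block has been lifted, and remove `index.blocks.read_only_allow_delete` manually if it persists
5. **Monitoring and alerting**: pair these settings with an external monitoring system that alerts on disk usage
6. **Single data node**: the watermarks are not applied to a single data node by default and must be enabled explicitly
7. **Reroute interval**: do not set `reroute_interval` too low, because frequent checks consume resources
8. **Relocating shards**: with `include_relocations` enabled, disk usage may be overestimated during large-scale relocations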