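Slow Task Logging Threshold Configuration

Purpose #

The cluster.service.slow_task_logging_threshold setting controls the logging threshold for the execution time of cluster management tasks, such as cluster state updates and shard allocation. When a task runs longer than this threshold, a warning is written to the log, which helps identify performance bottlenecks.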

Setting Type #

This is a dynamic setting. It can be changed at runtime through the cluster settings API.

Default Value #

30s (30 seconds)

Required #

Optional (a default value applies)

Value Range #

> 0 (a positive time value)

Configuration Format #

# Default configuration
cluster.service.slow_task_logging_threshold: 30s

# More sensitive threshold
cluster.service.slow_task_logging_threshold: 10s

# More lenient threshold
cluster.service.slow_task_logging_threshold: 60s
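
To make the value format concrete, here is a minimal Python sketch that converts a time value such as 30s into milliseconds and rejects values that are not greater than 0. The function name parse_threshold_millis and the unit table are hypothetical and only illustrate the format; they are not Easysearch's own parser.

```python
import re

# Hypothetical unit table (milliseconds per unit); not taken from Easysearch itself.
_UNIT_MILLIS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}


def parse_threshold_millis(value: str) -> int:
    """Convert a time value such as '30s' into milliseconds.

    Raises ValueError when the unit is missing or the value is not positive.
    """
    match = re.fullmatch(r"(\d+)(ms|s|m|h|d)", value.strip())
    if match is None:
        raise ValueError(f"not a time value with a unit: {value!r}")
    millis = int(match.group(1)) * _UNIT_MILLIS[match.group(2)]
    if millis <= 0:
        raise ValueError("threshold must be greater than 0")
    return millis
```

For example, parse_threshold_millis("30s") yields 30000, while "0s" or a bare "30" is rejected.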

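How It Works #

Slow task detection mechanism:

┌─────────────────────────────────────────────────────────────────┐
│                    Slow task detection flow                     │
└─────────────────────────────────────────────────────────────────┘

Cluster task starts
    │
    ▼
Record the start time
    │
    ▼
Run the task (for example, a cluster state update)
    │
    ▼
Record the end time
    │
    ▼
Compute the execution time
    │
    ├── execution time > threshold
    │   │
    │   └── Log a warning (exact wording may vary by version), for example:
    │       "task took [Xs] which is longer than the threshold [Ys]"
    │
    └── execution time <= threshold
        │
        └── No additional log entry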
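
The flow above can be sketched in a few lines of Python. This is an illustration of the timing-and-compare pattern only, not Easysearch's implementation: the function name run_with_slow_task_check, the logger name, and the injectable clock parameter are all assumptions made for the example. It warns when the elapsed time exceeds the threshold, matching the description in the Purpose section.

```python
import logging
import time

logger = logging.getLogger("slow_task_sketch")  # hypothetical logger name


def run_with_slow_task_check(name, task, threshold_seconds, clock=time.monotonic):
    """Run task() and warn if it took longer than threshold_seconds.

    Returns (result, elapsed_seconds, was_slow). The clock argument exists
    only so the timing can be replaced in tests.
    """
    started = clock()              # record the start time
    result = task()                # run the task
    elapsed = clock() - started    # end time minus start time
    was_slow = elapsed > threshold_seconds
    if was_slow:
        # Message wording follows this document; the real log line may differ.
        logger.warning(
            "task [%s] took [%.1fs] which is longer than the threshold [%.1fs]",
            name, elapsed, threshold_seconds,
        )
    return result, elapsed, was_slow
```

A monotonic clock is used so that wall-clock adjustments do not distort the measured duration.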

Usage Scenarios #

1. Default configuration (recommended) #

cluster.service.slow_task_logging_threshold: 30s

Suitable for most clusters.

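2. Performance-sensitive environments #

cluster.service.slow_task_logging_threshold: 10s

When to use:

Performance problems need to be detected quickly
The environment has low-latency requirements
The cluster is small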

3. Large clusters #

cluster.service.slow_task_logging_threshold: 60s

When to use:

The cluster has many nodes
Cluster state updates are frequent
Slower task execution is acceptable

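Cluster size	Recommended threshold	Notes
Small (< 10 nodes)	10-20s	Detect performance problems quickly
Medium (10-50 nodes)	30s	Default configuration
Large (> 50 nodes)	60s	Reduce log noise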
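
The table can also be expressed as a small lookup, which may be handy in provisioning scripts. The helper name recommended_threshold is hypothetical, and the boundaries simply mirror the table rows above; treat the result as a starting point rather than a rule.

```python
def recommended_threshold(node_count: int) -> str:
    """Return the threshold suggested by the sizing table for a node count."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    if node_count < 10:       # small cluster
        return "10-20s"
    if node_count <= 50:      # medium cluster, default configuration
        return "30s"
    return "60s"              # large cluster, fewer warnings
```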

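Monitoring Recommendations #

# View the current value (include_defaults=true also returns the default when the setting has not been overridden)
GET /_cluster/settings?include_defaults=true&filter_path=*.cluster.service.slow_task_logging_threshold

# Find slow tasks in the log
grep "task took.*longer than the threshold" logs/easysearch.log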
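
Where grep output needs further processing, a short Python sketch can pull the measured duration and the threshold out of each matching line. It assumes the message wording quoted in this document ("task took [Xs] which is longer than the threshold [Ys]"); if the log lines of your version are worded differently, the pattern has to be adjusted. The names SLOW_TASK_PATTERN and find_slow_tasks are hypothetical.

```python
import re

# Assumed message format, taken from this document; adjust it to your actual log lines.
SLOW_TASK_PATTERN = re.compile(
    r"task took \[(?P<took>[^\]]+)\] which is longer than the threshold \[(?P<threshold>[^\]]+)\]"
)


def find_slow_tasks(lines):
    """Return a list of (took, threshold) string pairs, one per matching line."""
    hits = []
    for line in lines:
        match = SLOW_TASK_PATTERN.search(line)
        if match:
            hits.append((match.group("took"), match.group("threshold")))
    return hits
```

A typical call would pass an open file object, for example find_slow_tasks(open("logs/easysearch.log", encoding="utf-8")).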

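Common Problems #

Problem 1: The log contains a large number of slow task warnings

Possible causes:

The threshold is set too low
The cluster is overloaded
Node performance is insufficient

Solution (if the threshold is the cause; overload or underpowered nodes need to be addressed directly, since a higher threshold only hides the warnings):

# Raise the threshold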
PUT /_cluster/settings
{
  "transient": {
    "cluster.service.slow_task_logging_threshold": "60s"
  }
}
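
The same request can be prepared from a script. The sketch below only builds the HTTP request with Python's standard library and does not send it; the helper name build_threshold_update is hypothetical, and the base URL, TLS settings, and authentication are deployment-specific details left out on purpose.

```python
import json
import urllib.request

SETTING = "cluster.service.slow_task_logging_threshold"


def build_threshold_update(base_url: str, threshold: str, persistent: bool = False):
    """Build (but do not send) a PUT /_cluster/settings request for the threshold."""
    scope = "persistent" if persistent else "transient"
    body = {scope: {SETTING: threshold}}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Passing the returned object to urllib.request.urlopen would send it. The persistent flag switches the request body from the "transient" scope used above to "persistent".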

Problem 2: No performance problems are detected

Possible cause: The threshold is set too high, so slow tasks are not logged.