Rollover Trigger Configuration

Rollover trigger configuration defines the conditions under which OpenSearch Index State Management (ISM) rolls an active write index over to a new one, after which the previous index can move to a read-only or archived state. In high-throughput log pipelines and distributed search architectures, poorly calibrated triggers cause unbounded shard growth, replication lag, or a proliferation of undersized indices. Effective deployment requires aligning trigger thresholds with storage capacity, shard allocation limits, and Cross-Cluster Replication (CCR) synchronization windows. This guide covers the API payloads, a threshold calibration matrix, and the automation patterns needed for predictable rollover behavior. Teams standardizing policy deployment pipelines should integrate these patterns into their broader ISM Policy Implementation & Python Automation workflows.

flowchart TD
    A["Active write index"] --> B{"Any rollover condition met?"}
    B -- "min_size / min_index_age / min_doc_count / min_primary_shard_size" --> R["Roll over to new write index"]
    B -- "none met" --> W["Keep writing; re-check next job cycle"]
    R --> T["Transition rolled index to next phase"]

API Payload Structure for Deterministic Triggers

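OpenSearch ISM evaluates rollover conditions through a declarative JSON policy submitted via the _plugins/_ism/policies/ endpoint. The payload must explicitly define state transitions and retry logic, and its thresholds should leave room for CCR to keep up. A production-grade configuration avoids implicit defaults and sets explicit boundaries to prevent race conditions during index lifecycle shifts. The rollover action also depends on the managed index carrying the plugins.index_state_management.rollover_alias setting (or belonging to a data stream); that setting normally comes from the index template, not from the policy.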

JSON
PUT _plugins/_ism/policies/log_rollover_policy
{
  "policy": {
    "description": "Production log rollover with CCR alignment",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [
          {
            "retry": {
              "count": 5,
              "backoff": "exponential",
              "delay": "2m"
            },
            "rollover": {
              "min_size": "50gb",
              "min_index_age": "1d",
              "min_doc_count": 50000000
            }
          }
        ],
        "transitions": [
          {
            "state_name": "warm",
            "conditions": {
              "min_rollover_age": "12h"
            }
          }
        ]
      },
      {
        "name": "warm",
        "actions": [
          {
            "replica_count": { "number_of_replicas": 1 }
          }
        ],
        "transitions": []
      }
    ],
    "ism_template": {
      "index_patterns": ["logs-*"],
      "priority": 100
    }
  }
}

Critical implementation rules:

  • min_rollover_age in the transition block prevents premature state shifts before background replication tasks finalize.
  • The retry block must reside inside the action scope, not at the policy root, ensuring ISM respects exponential backoff during transient cluster pressure.
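  • ism_template.priority resolves conflicts between ISM policies whose index_patterns match the same index: the policy with the highest priority is attached. It is independent of the priority field on index templates, so raise it only above other ISM policies that cover logs-*.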
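The structural rules above can be checked before a policy ever reaches the cluster. The following is a minimal sketch, not part of ISM itself: check_policy is a hypothetical helper, and treating an empty rollover block as a problem is a hygiene choice made here to avoid implicit defaults, not an ISM requirement.

```python
def check_policy(body):
    """Return a list of structural problems found in an ISM policy body."""
    problems = []
    policy = body.get("policy", {})
    states = policy.get("states", [])
    names = {state.get("name") for state in states}

    # retry belongs inside an action, never at the policy root.
    if "retry" in policy:
        problems.append("retry is defined at the policy root; move it into an action")

    if policy.get("default_state") not in names:
        problems.append("default_state does not name a defined state")

    for state in states:
        label = state.get("name")
        for action in state.get("actions", []):
            # Hygiene rule (assumption of this sketch): require explicit conditions.
            if "rollover" in action and not action["rollover"]:
                problems.append(f"state {label!r}: rollover has no explicit conditions")
        for transition in state.get("transitions", []):
            if transition.get("state_name") not in names:
                problems.append(f"state {label!r}: transition targets an undefined state")
    return problems
```

Running it in a deployment pipeline and refusing to submit a policy when the returned list is non-empty keeps malformed payloads out of the cluster.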

Threshold Calibration Matrix

Trigger thresholds require continuous calibration against shard sizing, JVM heap pressure, and replication throughput. The following matrix outlines operational boundaries for high-throughput ingestion clusters:

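| Metric           | Conservative (Low Risk) | Aggressive (High Throughput) | CCR Consideration                              |
|------------------|-------------------------|------------------------------|------------------------------------------------|
| min_size         | 30 GB                   | 50–75 GB                     | Must not exceed follower node storage headroom |
| min_index_age    | 6h                      | 12–24h                       | Align with CCR checkpoint intervals            |
| min_doc_count    | 25M                     | 50M+                         | Irrelevant for binary-heavy logs               |
| min_rollover_age | 1h                      | 4–6h                         | Prevents warm-state transition mid-sync        |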

Calibrating these values requires understanding how Phase Transition Logic evaluates cluster health before executing state changes. Size thresholds that are too large for the available JVM heap can lead to sustained garbage collection pressure, while document-count thresholds that are too low produce many small indices, which adds shard overhead and increases query latency across distributed nodes.
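Because a rollover fires as soon as any one condition is met, it helps to know which threshold a given workload will hit first. The sketch below is illustrative arithmetic only: first_trigger is a hypothetical helper, and it assumes a constant ingest rate measured as growth in primary storage.

```python
def first_trigger(size_gb_per_hour, docs_per_hour,
                  min_size_gb, min_age_hours, min_doc_count):
    """Return (condition_name, hours) for the rollover condition reached first."""
    hours_to = {
        "min_size": min_size_gb / size_gb_per_hour,
        "min_index_age": float(min_age_hours),
        "min_doc_count": min_doc_count / docs_per_hour,
    }
    # The smallest time-to-threshold decides when the index rolls over.
    name = min(hours_to, key=hours_to.get)
    return name, hours_to[name]


# 5 GB/h and 4M docs/h against the 50 GB / 1 day / 50M-document policy above.
print(first_trigger(5, 4_000_000, 50, 24, 50_000_000))
```

If the condition that fires first is not the one the policy was designed around, the other thresholds are effectively dead configuration for that workload.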

Deterministic Execution & Async Handling

ISM does not evaluate triggers synchronously. The ISM background job polls index metadata at a configurable interval (plugins.index_state_management.job_interval, default: 5 minutes). When a condition is met, the rollover action is carried out by that background job, not by the write request that crossed the threshold, so an index can keep growing past its thresholds until the next job cycle runs. In CCR environments, this introduces a critical synchronization dependency: the follower cluster must complete its replication checkpoint before the leader transitions the index to a read-only state, or risk data divergence.
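The polling interval translates directly into threshold overshoot. As a rough sketch (size_at_rollover_gb is a hypothetical helper, and it assumes the threshold is crossed just after one check and the very next cycle performs the rollover, so real overshoot may be larger):

```python
def size_at_rollover_gb(min_size_gb, ingest_gb_per_hour, job_interval_minutes=5):
    """Estimate primary size when the rollover actually runs, one interval late."""
    overshoot = ingest_gb_per_hour * job_interval_minutes / 60
    return min_size_gb + overshoot


# A 50 GB threshold at 12 GB/h of ingest with the default 5-minute interval.
print(size_at_rollover_gb(50, 12))
```

Subtracting the expected overshoot from the target size when choosing min_size keeps the rolled-over index close to the size that was actually intended.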

To manage this, engineers should leverage Async Execution Patterns that decouple trigger evaluation from downstream orchestration. Implementing explicit polling against the _plugins/_ism/explain/ endpoint allows automation frameworks to verify deterministic state confirmation rather than relying on immediate HTTP response codes. This approach is essential when coordinating policy updates across geographically distributed data centers.
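The explicit polling described above might look like the following sketch. wait_for_rollover is a hypothetical helper; fetch_explain is any callable that returns the parsed explain response for an index, and the rolled_over field it reads is an assumption about that response that should be verified against the cluster's ISM version.

```python
import time


def wait_for_rollover(fetch_explain, index, timeout=900, interval=30,
                      sleep=time.sleep):
    """Poll an explain response until `index` reports rolled_over, or time out."""
    deadline = time.monotonic() + timeout
    while True:
        entry = fetch_explain(index).get(index, {})
        # Assumed field: the per-index explain entry exposes "rolled_over".
        if entry.get("rolled_over"):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

With opensearch-py, fetch_explain could be a small lambda around client.transport.perform_request(method="GET", url=f"/_plugins/_ism/explain/{index}"). Passing the fetcher and the sleep function in as arguments keeps the loop testable without a live cluster.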

Python Automation Integration

Manual policy deployment does not scale across multi-cluster environments. Python automation builders should wrap ISM API interactions in idempotent, retry-aware clients. The following script demonstrates how to deploy a rollover policy and confirm that ISM is managing matching indices, using the official opensearch-py client:

Python
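import os
import time
from opensearchpy import NotFoundError, OpenSearch, RequestsHttpConnection
from requests.auth import HTTPBasicAuth

def deploy_rollover_policy(client, policy_id, payload):
    """Create the policy, or update it in place if it already exists."""
    url = f"/_plugins/_ism/policies/{policy_id}"
    params = {}
    try:
        # Updating an existing policy requires its current sequence number
        # and primary term; a bare PUT on an existing ID is rejected.
        existing = client.transport.perform_request(method="GET", url=url)
        params = {
            "if_seq_no": existing["_seq_no"],
            "if_primary_term": existing["_primary_term"],
        }
    except NotFoundError:
        pass  # First deployment: create the policy.
    try:
        response = client.transport.perform_request(
            method="PUT", url=url, params=params, body=payload
        )
    except Exception as e:
        raise RuntimeError(f"Policy deployment failed: {e}") from e
    # A successful response echoes the stored policy together with its _id.
    return "_id" in response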

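def verify_trigger_execution(client, index_pattern, timeout=300):
    """Poll the ISM explain endpoint until at least one matching index is managed.

    This confirms policy attachment only; it does not prove a rollover has run.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        explain = client.transport.perform_request(
            method="GET",
            url=f"/_plugins/_ism/explain/{index_pattern}"
        )
        # ism_template attaches the policy only to indices created after the
        # policy exists, so this stays at 0 until a new matching index appears.
        if explain.get("total_managed_indices", 0) > 0:
            return True
        time.sleep(10)
    return False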

# Production usage
client = OpenSearch(
    hosts=[{"host": os.getenv("OPENSEARCH_HOST", "localhost"), "port": 9200}],
    http_auth=HTTPBasicAuth(os.getenv("OS_USER"), os.getenv("OS_PASS")),
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection
)

policy_payload = {
    "policy": {
        "description": "Automated log rollover",
        "default_state": "hot",
        "states": [{
            "name": "hot",
            "actions": [{"rollover": {"min_size": "50gb", "min_index_age": "1d"}}],
            "transitions": [{"state_name": "warm", "conditions": {"min_rollover_age": "12h"}}]
        }, {"name": "warm", "actions": [], "transitions": []}],
        "ism_template": {"index_patterns": ["logs-*"], "priority": 100}
    }
}

if deploy_rollover_policy(client, "log_rollover_policy", policy_payload):
    print("Policy deployed successfully. Waiting for ISM to manage matching indices...")
    if verify_trigger_execution(client, "logs-*"):
        print("Policy attached; ISM is managing matching indices.")
    else:
        print("No managed indices found before the timeout.")

For teams building continuous deployment pipelines, this approach integrates seamlessly with Writing Python scripts for automated ISM rollover triggers to enforce version-controlled policy rollouts and automated drift detection.

Operational Validation & Troubleshooting

Before promoting a Rollover Trigger Configuration to production, validate the following:

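  1. Shard Size Alignment: min_size is evaluated against the index's total primary storage, so the resulting shard size is roughly min_size divided by the number of primary shards. Keep that figure at or below the commonly recommended 50 GB per shard, or use min_primary_shard_size to cap shard size directly. Oversized shards lengthen recovery times and increase CCR replication latency.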
  2. CCR Checkpoint Sync: Monitor _plugins/_replication/follower_stats to confirm follower nodes are not lagging behind leader rollover events.
  3. Stuck State Resolution: If an index fails to transition, query GET _plugins/_ism/explain/<index> and inspect the retry_info and info.message fields for the failed step. Once the underlying cause is fixed, re-run the failed step via POST _plugins/_ism/retry/<index>.
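  4. Policy Attachment Conflicts: ism_template priority is compared only against other ISM policies, so list them with GET _plugins/_ism/policies and confirm that no higher-priority policy matches the same index patterns. Separately, use GET _index_template/ to verify that the matching index template sets plugins.index_state_management.rollover_alias, which the rollover action requires.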
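The stuck-state check in step 3 is easy to automate across many indices. The sketch below works on an already parsed explain response; failed_indices is a hypothetical helper, and the retry_info.failed and info.message fields it reads are assumptions about the response shape that should be confirmed on the target cluster.

```python
def failed_indices(explain):
    """Map index name -> failure message for managed indices whose step failed."""
    failures = {}
    for name, entry in explain.items():
        if not isinstance(entry, dict):
            continue  # skip summary keys such as total_managed_indices
        if entry.get("retry_info", {}).get("failed"):
            message = entry.get("info", {}).get("message", "no message reported")
            failures[name] = message
    return failures


sample = {
    "logs-000001": {
        "retry_info": {"failed": True, "consumed_retries": 5},
        "info": {"message": "Missing rollover_alias"},
    },
    "logs-000002": {"retry_info": {"failed": False, "consumed_retries": 0}},
    "total_managed_indices": 2,
}
print(failed_indices(sample))
```

The returned mapping can feed an alert, or a loop that calls the retry endpoint only for indices whose failure message indicates a transient cause.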

Reference the official OpenSearch ISM API documentation for endpoint specifications and Cross-Cluster Replication architecture guidelines for synchronization best practices. Properly configured triggers eliminate manual index management overhead while maintaining predictable storage and query performance across distributed clusters.