Rollover Trigger Configuration

Rollover trigger configuration decides the exact moment OpenSearch Index State Management (ISM) closes an active write index and bootstraps a fresh one behind the same write alias. Get the thresholds wrong and the failure modes are severe and specific: a min_size set too high grows single shards past the recovery-safe ceiling until a node restart takes hours to rejoin; a min_index_age set too low fragments a day of logs across dozens of tiny indices that waste heap on cluster state; and in a Cross-Cluster Replication (CCR) topology a rollover the follower has not yet caught up to leaves the write alias pointing at an index the follower cannot serve. This guide covers how the rollover action is evaluated on the background sweep, the tier the write index must live on, the exact template and policy payloads that make rollover deterministic, an opensearch-py deploy-and-verify workflow, and the failure modes you will actually hit in production. It builds on the ISM Policy Implementation & Python Automation execution model and is the first link in the Phase Transition Logic chain that carries an index from hot through to delete.

Tier alignment for rollover cadence

Rollover only ever happens on the hot tier — it is the action that ends an index’s hot life. That makes hot-node hardware the real constraint on how aggressively you can set thresholds: the node has to absorb full-rate ingest into the current write shard and handle the brief bootstrap of the next index without the indexing thread pool backing up. The table below maps the tiers a rolled index touches to their storage profile, compute ratio, routing attribute, and the role each plays in the rollover lifecycle. The node-role mechanics behind these attributes are covered under Node Role Allocation, and how the tier ratios are sized is the subject of Hot-Warm-Cold Tier Design.

Lifecycle role	Storage profile	vCPU : RAM ratio	Routing attribute	Rollover relevance
Hot (write index)	Local NVMe SSD	1 : 4 (compute-heavy)	`node.attr.data: hot`	Absorbs ingest; `rollover` evaluated here every sweep
Warm (post-roll)	SATA/SAS SSD	1 : 6	`node.attr.data: warm`	Receives the rolled, read-only index via `allocation`
Cold (long-term)	High-density HDD	1 : 8 (storage-heavy)	`node.attr.data: cold`	Age-based archive of rolled indices; watermark-sensitive
CCR follower (hot)	Mirrors leader hot	Matches leader	`node.attr.data: hot`	Must replicate the new write index before it serves reads

The cadence takeaway is that min_size and min_primary_shard_size must be sized against the hot node’s disk headroom, not against an abstract “50 GB shard” rule of thumb. A single node hosting several write shards can roll several indices in the same window, and each new shard lands on the same disk the outgoing one still occupies during relocation.

How the rollover action evaluates its conditions

The rollover action is declarative: you list one or more conditions, and ISM rolls the index when any condition is satisfied — the conditions are OR-ed, not AND-ed. Evaluation is not real-time. A background job scheduler polls index metadata on a fixed interval (plugins.index_state_management.job_interval, default 5 minutes), and the roll happens on the first sweep after a threshold is crossed, never at the instant it is crossed. That lag is the root of most “why did my index grow past min_size?” tickets — between two sweeps a hot index under heavy ingest can overshoot its size target substantially, which is why Threshold Tuning Strategies treat the threshold as a floor with headroom, not an exact ceiling.
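The sweep lag described above can be put into rough numbers. The sketch below is a minimal illustration, not part of ISM: the function names and the example ingest rate are hypothetical, and it assumes a steady ingest rate and ignores scheduler jitter and the time the rollover itself takes.

```python
def worst_case_overshoot_gb(ingest_gb_per_min: float, job_interval_min: float = 5.0) -> float:
    """Data that can land after a threshold is crossed but before the next sweep."""
    if ingest_gb_per_min < 0 or job_interval_min <= 0:
        raise ValueError("ingest rate must be >= 0 and the interval must be > 0")
    return ingest_gb_per_min * job_interval_min


def worst_case_size_at_roll_gb(min_size_gb: float, ingest_gb_per_min: float,
                               job_interval_min: float = 5.0) -> float:
    """Configured threshold plus one full sweep interval of ingest."""
    return min_size_gb + worst_case_overshoot_gb(ingest_gb_per_min, job_interval_min)


if __name__ == "__main__":
    # Hypothetical workload: 0.5 GB/min of primary data against a 50 GB min_size.
    print(worst_case_size_at_roll_gb(50, 0.5))     # default 5-minute sweep
    print(worst_case_size_at_roll_gb(50, 0.5, 2))  # tightened 2-minute sweep
```

Comparing the two results gives a feel for how much headroom a shorter sweep interval buys at a given ingest rate.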

The four conditions map to distinct operational goals. min_size targets total index size across all primaries and is the usual driver for recovery-safe sizing. min_primary_shard_size is the more precise control on multi-primary indices, because it caps the individual shard rather than the sum. min_index_age guarantees a predictable roll boundary (one index per day, for example) regardless of volume, which keeps retention math simple. min_doc_count is useful for uniform, small documents but is a poor control for binary- or blob-heavy logs where document size varies wildly. A production policy usually combines a size guard with an age guard so an index rolls on whichever comes first.
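To make the OR semantics concrete, here is a small model of how the four conditions combine. It is a sketch of the decision rule only, not the plugin's implementation: the `IndexStats` and `RolloverConditions` classes and their field names are hypothetical, and thresholds are expressed in bytes and seconds for simplicity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IndexStats:
    total_primary_bytes: int          # sum of all primary shards
    largest_primary_shard_bytes: int  # biggest single primary shard
    age_seconds: float                # time since index creation
    doc_count: int


@dataclass
class RolloverConditions:
    min_size_bytes: Optional[int] = None
    min_primary_shard_size_bytes: Optional[int] = None
    min_index_age_seconds: Optional[float] = None
    min_doc_count: Optional[int] = None


def met_conditions(stats: IndexStats, cond: RolloverConditions) -> List[str]:
    """Return the names of every configured condition the index currently satisfies."""
    checks = [
        ("min_size", cond.min_size_bytes, stats.total_primary_bytes),
        ("min_primary_shard_size", cond.min_primary_shard_size_bytes,
         stats.largest_primary_shard_bytes),
        ("min_index_age", cond.min_index_age_seconds, stats.age_seconds),
        ("min_doc_count", cond.min_doc_count, stats.doc_count),
    ]
    return [name for name, threshold, actual in checks
            if threshold is not None and actual >= threshold]


def should_roll(stats: IndexStats, cond: RolloverConditions) -> bool:
    """OR semantics: a single satisfied condition is enough."""
    return bool(met_conditions(stats, cond))
```

With a size guard and an age guard both configured, a quiet index that never reaches the size threshold still rolls once its age condition is met.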

Step-by-step rollover configuration

Rollover has a hard prerequisite that trips up first-time deployments: the action operates on a write alias, not on a raw index name, and the alias must be bootstrapped with a numeric suffix before the policy can ever roll it. The four steps below stand up that alias, attach the policy through a template, and verify the trigger is live.

1. Node configuration

Confirm the hot nodes that will host the write index carry the routing attribute the write path targets. Rollover itself needs no special node setting, but the index it bootstraps must be able to allocate on hot hardware, and the fallback behaviour when hot capacity is short is governed by Fallback Routing Strategies.

YAML

# opensearch.yml on each hot data node
node.roles: [ data, ingest ]
node.attr.data: hot
# The ISM sweep interval bounds rollover latency. It is a dynamic, cluster-wide
# setting in minutes (5 is the default), applied through the API rather than this file:
# PUT _cluster/settings { "persistent": { "plugins.index_state_management.job_interval": 5 } }

2. Index template

The template wires three things together: the index pattern, the rollover_alias setting that tells ISM which alias to roll, and the ISM policy attachment. Bootstrap the very first index with is_write_index so the alias has a concrete target to roll from.

JSON

PUT _index_template/logs-rollover-template
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "index.number_of_shards": 3,
      "index.number_of_replicas": 1,
      "plugins.index_state_management.rollover_alias": "logs-write"
    }
  },
  "priority": 100
}

JSON

# Bootstrap the first backing index and point the write alias at it.
PUT logs-000001
{
  "aliases": {
    "logs-write": { "is_write_index": true }
  }
}

The template priority must exceed any legacy template matching the same pattern (commonly > 50), or the older template silently wins and the rollover_alias setting never lands on new indices — the single most common reason rollover “does nothing”.
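The numeric-suffix requirement on the bootstrap index can be checked before anything is deployed. The helper below is a hypothetical pre-flight check, assuming the default naming scheme in which rollover increments the trailing number and zero-pads it to six digits; it does not handle date-math index names.

```python
import re

# An index name that rollover can increment: any prefix, a dash, then digits.
_ROLLOVER_NAME = re.compile(r"^(?P<prefix>.+)-(?P<gen>\d+)$")


def next_rollover_name(index_name: str) -> str:
    """Return the name the next generation would get, or raise if the name cannot roll."""
    match = _ROLLOVER_NAME.match(index_name)
    if match is None:
        raise ValueError(
            f"{index_name!r} has no numeric suffix, so there is nothing to increment"
        )
    generation = int(match.group("gen")) + 1
    return f"{match.group('prefix')}-{generation:06d}"
```

Calling it on a bootstrap name such as `logs-000001` confirms the name is rollable, while a name such as `logs-write` raises immediately instead of failing later inside the policy.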

3. Policy JSON

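The policy defines the hot state whose rollover action holds the trigger conditions, wraps it in a retry block so transient cluster pressure does not strand the action, and transitions the rolled index onward. Attach it to matching indices through the ism_template block so every future rollover-bootstrapped index inherits it automatically. An ism_template only applies to indices created after the policy exists, so if logs-000001 was bootstrapped before the policy was created, attach the policy to it explicitly with POST _plugins/_ism/add/logs-000001 and a body of {"policy_id": "log_rollover_policy"}.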

JSON

PUT _plugins/_ism/policies/log_rollover_policy
{
  "policy": {
    "description": "Production log rollover with CCR-safe transition",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [
          {
            "retry": { "count": 5, "backoff": "exponential", "delay": "2m" },
            "rollover": {
              "min_size": "50gb",
              "min_primary_shard_size": "25gb",
              "min_index_age": "1d",
              "min_doc_count": 50000000
            }
          }
        ],
        "transitions": [
          {
            "state_name": "warm",
            "conditions": { "min_rollover_age": "12h" }
          }
        ]
      },
      {
        "name": "warm",
        "actions": [ { "replica_count": { "number_of_replicas": 1 } } ],
        "transitions": []
      }
    ],
    "ism_template": [
      { "index_patterns": ["logs-*"], "priority": 100 }
    ]
  }
}

Three rules make this deterministic rather than merely valid. The retry block must live inside the action scope, not at the policy root, so ISM honours exponential backoff on transient failures. min_rollover_age in the transition (distinct from min_index_age in the rollover action) measures time since the roll, so a just-rolled index becomes eligible for warm only after background replication has had time to settle. And ism_template.priority decides which policy wins when more than one policy's ism_template matches the same index pattern, so it must exceed the priority of any competing policy or this one never auto-attaches.
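The rules above lend themselves to a pre-deploy check. The linter below is a hypothetical sketch that inspects only the policy body as a Python dict; it knows nothing about the cluster, and its checks and messages are illustrative assumptions rather than anything ISM itself enforces.

```python
from typing import Any, Dict, List


def lint_rollover_policy(body: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems: List[str] = []
    policy = body.get("policy", {})

    if "retry" in policy:
        problems.append("retry is at the policy root; move it inside the action")

    has_rollover = False
    for state in policy.get("states", []):
        name = state.get("name")
        rolls = [a for a in state.get("actions", []) if "rollover" in a]
        if not rolls:
            continue
        has_rollover = True
        if any("retry" not in action for action in rolls):
            problems.append(f"state {name!r}: rollover action has no retry block")
        # Transitions out of a rolling state should wait on time since the roll.
        for transition in state.get("transitions", []):
            if "min_rollover_age" not in transition.get("conditions", {}):
                problems.append(
                    f"state {name!r}: transition to "
                    f"{transition.get('state_name')!r} does not use min_rollover_age"
                )
    if not has_rollover:
        problems.append("no rollover action found in any state")

    templates = policy.get("ism_template") or []
    if isinstance(templates, dict):  # a single template object is also accepted here
        templates = [templates]
    if not templates:
        problems.append("no ism_template; new indices will not auto-attach")
    for template in templates:
        if "priority" not in template:
            problems.append("ism_template entry has no priority")
    return problems
```

Running it in CI before the PUT turns a silent misconfiguration into a failed pipeline step.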

4. Verification

Confirm the policy attached, the alias is rolling, and the generation counter is advancing. The explain endpoint is the source of truth for what state each managed index is actually in.

Shell

# Is the policy managing the write alias's backing indices?
GET _plugins/_ism/explain/logs-*

# Has the alias advanced past 000001?
GET _cat/indices/logs-*?v&s=index

# Which index is the current write target?
GET logs-write/_alias

A healthy result shows the current write index in state hot with no failed_index_attempts, and a _cat/indices listing where the highest-numbered index is the write target. If the generation never advances past logs-000001, the trigger is not firing — jump to the troubleshooting section below.
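The "highest-numbered index is the write target" check can be automated against the body returned by GET logs-write/_alias. The sketch below assumes that response is a dict keyed by backing index name with an `aliases` object per index; the function names are hypothetical, and it only considers indices that set `is_write_index` explicitly.

```python
import re
from typing import Any, Dict, Optional


def current_write_index(alias_response: Dict[str, Any], alias: str) -> Optional[str]:
    """Return the backing index flagged as the write index for `alias`, if any."""
    for index_name, body in alias_response.items():
        if body.get("aliases", {}).get(alias, {}).get("is_write_index") is True:
            return index_name
    return None


def generation(index_name: str) -> int:
    """Extract the trailing numeric generation, e.g. 2 from 'logs-000002'."""
    match = re.search(r"-(\d+)$", index_name)
    if match is None:
        raise ValueError(f"{index_name!r} has no numeric suffix")
    return int(match.group(1))


def write_target_is_newest(alias_response: Dict[str, Any], alias: str) -> bool:
    """True when the write index is also the highest generation behind the alias."""
    write_index = current_write_index(alias_response, alias)
    if write_index is None:
        return False
    return write_index == max(alias_response, key=generation)
```

A False result after the first expected roll is a cue to inspect the explain output for the index that still holds the write flag.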

Python automation for deploying and verifying rollover triggers

Manual PUT calls do not scale across environments and drift the instant someone edits a policy in Dashboards. Wrap the deploy in an idempotent, retry-aware client that asserts the policy, then polls explain to confirm the trigger is genuinely live rather than trusting the 200 on the write. The script below deploys the policy, waits for it to attach to the write alias, and reports the current generation — the same pattern the deeper Writing Python scripts for automated ISM rollover triggers walkthrough builds on, and which slots into the CI/CD structure covered under Python Orchestration Frameworks.

Python

import os
import time
import logging
from opensearchpy import OpenSearch, RequestsHttpConnection
from requests.auth import HTTPBasicAuth

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("ism.rollover")


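def deploy_rollover_policy(client: OpenSearch, policy_id: str, payload: dict) -> bool:
    """Create the policy, or update it in place if it already exists."""
    from opensearchpy.exceptions import NotFoundError  # local import keeps this function self-contained

    url = f"/_plugins/_ism/policies/{policy_id}"
    try:
        try:
            # ISM rejects a bare PUT to an existing policy ID with a version conflict,
            # so read the current sequence number and primary term first.
            current = client.transport.perform_request(method="GET", url=url)
            params = {"if_seq_no": current["_seq_no"], "if_primary_term": current["_primary_term"]}
            verb = "updated"
        except NotFoundError:
            params, verb = None, "created"
        client.transport.perform_request(method="PUT", url=url, params=params, body=payload)
        log.info("policy %s: %s", policy_id, verb)
        return True
    except Exception as exc:  # noqa: BLE001 - surface any transport error to the caller
        raise RuntimeError(f"policy deployment failed for {policy_id}: {exc}") from exc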


def verify_trigger_live(client: OpenSearch, alias: str, timeout: int = 300) -> bool:
    """Poll _ism/explain for `<alias>-*` until at least one matching index is managed."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        explain = client.transport.perform_request(
            method="GET", url=f"/_plugins/_ism/explain/{alias}-*"
        )
        managed = explain.get("total_managed_indices", 0)
        if managed > 0:
            log.info("trigger live: %d managed backing index/indices", managed)
            return True
        log.info("waiting for policy attachment ...")
        time.sleep(10)
    log.error("timeout: no managed indices for %s after %ss", alias, timeout)
    return False


client = OpenSearch(
    hosts=[{"host": os.getenv("OPENSEARCH_HOST", "localhost"), "port": 9200}],
    http_auth=HTTPBasicAuth(os.getenv("OS_USER"), os.getenv("OS_PASS")),
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
)

policy_payload = {
    "policy": {
        "description": "Automated log rollover",
        "default_state": "hot",
        "states": [
            {
                "name": "hot",
                "actions": [
                    {
                        "retry": {"count": 5, "backoff": "exponential", "delay": "2m"},
                        "rollover": {"min_size": "50gb", "min_index_age": "1d"},
                    }
                ],
                "transitions": [
                    {"state_name": "warm", "conditions": {"min_rollover_age": "12h"}}
                ],
            },
            {"name": "warm", "actions": [], "transitions": []},
        ],
        "ism_template": [{"index_patterns": ["logs-*"], "priority": 100}],
    }
}

if deploy_rollover_policy(client, "log_rollover_policy", policy_payload):
    if verify_trigger_live(client, alias="logs"):
        log.info("rollover trigger active and managing the write alias")

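The script is meant to run on every pipeline execution, so the deploy step has to tolerate a policy that already exists. ISM accepts a PUT to an existing policy ID only when the request carries that policy's current if_seq_no and if_primary_term, and rejects a bare PUT with a version conflict; an accepted update bumps the sequence number without other side effects. The verify loop then confirms the desired state converged rather than assuming it.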

Operational guardrails

Rollover thresholds do not live in isolation: they interact with shard sizing, disk watermarks, and the retry envelope. Size the primary-shard target so a rolled shard stays inside the recovery-safe ceiling. With a target shard size $S_\text{target}$, an ingest rate $R$, and $N_p$ primaries, the wall-clock time between size-driven rolls is approximately:

$$t_\text{roll} = \frac{S_\text{target} \times N_p}{R}$$

Use that to sanity-check that your job_interval is short enough to catch the roll before the shard overshoots — if $t_\text{roll}$ is only a few multiples of the sweep interval, tighten the interval or lower the threshold. The settings below are the ones that keep rollover deterministic under load.

Setting	Recommended	Why it matters for rollover
`plugins.index_state_management.job_interval`	5m (tighten to 2m under heavy ingest)	Upper bound on how far an index overshoots its threshold before rolling
`rollover.min_primary_shard_size`	25–30 GB	Keeps individual shards inside the recovery-safe window on restart
`rollover.min_size`	40–50 GB	Total-index guard; must fit hot-node disk headroom during bootstrap
`retry.count` / `backoff` / `delay`	5 / exponential / 2m	Rides out transient thread-pool rejection without stranding the action
`cluster.routing.allocation.disk.watermark.low`	82%	Reserves headroom so a new write index can allocate on a busy hot node
`cluster.routing.allocation.disk.watermark.high`	88%	Blocks relocation onto a hot node already near capacity mid-roll

The watermark numbers are deliberately below the single-tier defaults (85% / 90%): a rollover briefly needs room for both the outgoing index and the newly bootstrapped one on the same hot disk, and the tighter watermarks reserve that headroom. The mechanics of how ISM waits on downstream replication before the rolled index advances are covered under Async Execution Patterns.
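The roll-interval formula from this section is easy to evaluate for a candidate configuration. The sketch below is a plain arithmetic helper with hypothetical function names; it assumes $R$ is the total ingest rate into the index's primaries, held constant, and expresses everything in GB and minutes.

```python
def roll_interval_minutes(target_shard_gb: float, primaries: int,
                          ingest_gb_per_min: float) -> float:
    """t_roll = S_target * N_p / R, in minutes."""
    if ingest_gb_per_min <= 0:
        raise ValueError("ingest rate must be positive")
    if primaries < 1:
        raise ValueError("an index has at least one primary shard")
    return target_shard_gb * primaries / ingest_gb_per_min


def sweeps_per_roll(target_shard_gb: float, primaries: int,
                    ingest_gb_per_min: float, job_interval_min: float = 5.0) -> float:
    """How many ISM sweeps fit into one size-driven roll interval."""
    if job_interval_min <= 0:
        raise ValueError("job interval must be positive")
    return roll_interval_minutes(target_shard_gb, primaries, ingest_gb_per_min) / job_interval_min


if __name__ == "__main__":
    # Hypothetical workload: 25 GB target shards, 3 primaries, 0.5 GB/min ingest.
    print(roll_interval_minutes(25, 3, 0.5))
    print(sweeps_per_roll(25, 3, 0.5))
```

A low sweeps-per-roll figure means each sweep interval is a large fraction of the roll interval, which is the case where tightening job_interval or lowering the threshold matters most.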

Troubleshooting rollover failures

Rollover failures are almost always configuration or timing problems, not bugs. The five below account for the overwhelming majority of production incidents; each pairs a diagnosis command with its fix.

1. The alias never rolls (stuck on logs-000001). The write index was not bootstrapped with a numeric suffix, so ISM has nothing to increment.

Shell

GET logs-write/_alias                       # diagnose: is is_write_index set on a *-000001 index?

Recreate the alias against a suffixed index — PUT logs-000001 with "aliases": {"logs-write": {"is_write_index": true}} — then reindex or repoint writes.

2. rollover_alias setting missing on new indices. A legacy index template outranks yours, so the ISM rollover_alias never lands.

Shell

GET logs-000002/_settings/plugins.index_state_management.rollover_alias   # diagnose: null?

Raise your template priority above the conflicting one (GET _index_template/ to find it), or delete the legacy template.

3. Index grows far past min_size before rolling. The sweep interval is too coarse for the ingest rate, so the index overshoots between polls.

Shell

GET _plugins/_ism/explain/logs-write         # diagnose: check last-evaluated timestamp vs now

Lower plugins.index_state_management.job_interval (for example to 2m) or reduce the size threshold so the roll fires earlier.

4. Rollover action stuck in a failed state. A transient thread-pool rejection or allocation stall left the action failed after exhausting retries.

Shell

GET _plugins/_ism/explain/logs-write         # diagnose: inspect failed_index_attempts + info

Clear it with POST _plugins/_ism/retry/logs-write; if it recurs, widen the retry block or resolve the underlying capacity pressure. Bounded recovery for this class is detailed under Error Handling & Retries.

5. CCR follower serving a stale write alias. The leader rolled and re-pointed the alias, but the follower has not replicated the new backing index.

Shell

GET _plugins/_replication/follower_stats     # diagnose: is the follower checkpoint lagging?

Do not force the follower forward — resolve the replication lag and let it catch up, then confirm the alias resolves to the same generation on both clusters.
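The diagnosis for failure 2 can be scripted by inspecting the body of the settings request. The helper below is a sketch under an assumption about the response shape: a dict keyed by index name with a `settings` object that is either nested by dotted path or flat (as with flat_settings=true). The function name is hypothetical.

```python
from typing import Any, Dict, Optional

ROLLOVER_ALIAS_SETTING = "plugins.index_state_management.rollover_alias"


def rollover_alias_of(settings_response: Dict[str, Any], index_name: str) -> Optional[str]:
    """Return the rollover alias configured on `index_name`, or None if it never landed."""
    settings = settings_response.get(index_name, {}).get("settings", {})

    # Flat form: {"index.plugins.index_state_management.rollover_alias": "logs-write"}
    flat_value = settings.get(f"index.{ROLLOVER_ALIAS_SETTING}")
    if flat_value is not None:
        return flat_value

    # Nested form: walk index -> plugins -> index_state_management -> rollover_alias
    node: Any = settings.get("index", {})
    for part in ROLLOVER_ALIAS_SETTING.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node if isinstance(node, str) else None
```

A None result on a freshly rolled index points at the template that created it, which is where the priority conflict should be investigated.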

Frequently asked questions

Are multiple rollover conditions AND-ed or OR-ed together?

They are OR-ed. ISM rolls the index as soon as any listed condition is met — the first of min_size, min_primary_shard_size, min_index_age, or min_doc_count to cross its threshold triggers the roll. Combine a size guard with an age guard so an index rolls on whichever comes first.

Why does my index roll well past the min_size I set?

Rollover is evaluated on the background sweep (job_interval, default 5 minutes), not in real time. Between two sweeps a high-throughput index keeps ingesting and overshoots the threshold. Treat min_size as a floor with headroom and tighten the sweep interval if the overshoot is unacceptable.

What is the difference between min_index_age and min_rollover_age?

min_index_age is a rollover condition measured from the index’s creation date — it decides when to roll. min_rollover_age is a transition condition measured from the moment the index was rolled — it decides how long the rolled index waits before advancing to the next state, which gives background replication time to settle.
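The two clocks can be illustrated with a short timeline calculation. This is a sketch with hypothetical function names and example timestamps; it computes the earliest eligibility times only and ignores the sweep interval, which delays the actual action to the next sweep.

```python
from datetime import datetime, timedelta, timezone


def earliest_age_roll(created_at: datetime, min_index_age: timedelta) -> datetime:
    """min_index_age counts from index creation."""
    return created_at + min_index_age


def earliest_transition(rolled_at: datetime, min_rollover_age: timedelta) -> datetime:
    """min_rollover_age counts from the moment the index was rolled."""
    return rolled_at + min_rollover_age


if __name__ == "__main__":
    created = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)  # hypothetical creation time
    roll_eligible = earliest_age_roll(created, timedelta(days=1))
    # Suppose the sweep that performs the roll runs five minutes after eligibility.
    rolled = roll_eligible + timedelta(minutes=5)
    print("roll eligible:", roll_eligible.isoformat())
    print("warm eligible:", earliest_transition(rolled, timedelta(hours=12)).isoformat())
```

The second timestamp depends on when the roll actually happened, not on when the index was created, which is why the two conditions are not interchangeable.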

Do I still need an index template if I use ism_template on the policy?

Yes. The ism_template block auto-attaches the policy, but the rollover_alias setting that tells ISM which alias to roll lives in the index template. Without the index template (and its priority above any legacy template), rollover has no alias to increment.

Related

Phase Transition Logic — what happens to an index after it rolls out of the hot state.
Threshold Tuning Strategies — calibrating the size and age values these triggers fire on.
Async Execution Patterns — how the sweep dispatches rollover as a non-blocking job.
Error Handling & Retries — recovering a rollover action stuck in a failed state.
Writing Python scripts for automated ISM rollover triggers — the full deploy-and-verify script this page introduces.
