Configuring index size and age thresholds for rollover

This guide walks through setting min_size and min_index_age rollover conditions on an OpenSearch Index State Management (ISM) policy so a write index rolls over on the tightest of size or age, without premature rollovers, shard proliferation, or Cross-Cluster Replication (CCR) follower desynchronization.

Rollover threshold configuration is where ingestion velocity, shard topology, and scheduler latency collide. Set the numbers too high and a single primary shard balloons past the recommended ceiling, dragging query latency; set them too low and you flood OpenSearch’s cluster state with tiny indices. This procedure sits under Threshold Tuning Strategies and applies the broader ISM Policy Implementation & Python Automation execution model: it defines the size and age conditions, attaches them through a write alias, calibrates for the background job interval, deploys idempotently from Python, and verifies the rollover actually fired. The mechanics of when the engine chooses to roll over build directly on Rollover Trigger Configuration.

Prerequisites

Confirm every item before you attach a policy to a live write index. A rollover misconfiguration stays silent until the first threshold is crossed, and then the index stalls under load.

A write alias (e.g. logs-app) points to exactly one backing index whose name ends in a rollover-compatible numeric suffix (logs-app-000001), with is_write_index: true.
An Index Template v2 owns the creation-time shard count and tier baseline, so each new backing index inherits number_of_shards and index.routing.allocation.require.data, per Node Role Allocation.
You have sized the target shard against the Hot-Warm-Cold Tier Design ceiling — keep a primary shard at or under 50 GB so min_size divided by number_of_shards stays in range.
The automation service account holds fine-grained access to _plugins/_ism/*, _aliases, and _settings, scoped per Security & Access Boundaries.
You know the effective plugins.index_state_management.job_interval (default 5m), because it sets the overshoot window every threshold must budget for.

Step-by-step procedure

The five steps below stand up size-and-age rollover for a logs-app-* index set. Apply them in order: understand the OR semantics, author the policy, attach it through the write alias, calibrate the numbers against the scheduler, then force a rollover when ingestion outruns the poll cycle.

1. Understand OR evaluation before choosing numbers

The rollover action evaluates its conditions with logical OR: the index rolls over the moment any configured condition — min_size, min_index_age, min_doc_count, or min_primary_shard_size — is satisfied. min_size measures total primary-shard storage and excludes replicas; min_primary_shard_size measures the largest single primary. Because the tightest condition always wins, an aggressive min_index_age fires regardless of how small the index is, so size and age must be chosen together, not in isolation. Getting this wrong is the most common way indices stall or churn, and it feeds directly into Phase Transition Logic downstream.
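The OR semantics can be sketched as a small pure function. This is only an illustrative model, not the plugin's code: the `IndexStats` and `RolloverConditions` types, the `satisfied_conditions` helper, and the use of `>=` for the comparison are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

GB = 1024 ** 3


@dataclass
class IndexStats:
    """Hypothetical snapshot of the values the conditions compare against."""
    primary_store_bytes: int    # total primary-shard storage, replicas excluded
    largest_primary_bytes: int  # size of the largest single primary shard
    age_seconds: int            # time since the index was created
    doc_count: int


@dataclass
class RolloverConditions:
    """None means the condition is not configured on the policy."""
    min_size_bytes: Optional[int] = None
    min_primary_shard_size_bytes: Optional[int] = None
    min_index_age_seconds: Optional[int] = None
    min_doc_count: Optional[int] = None


def satisfied_conditions(stats: IndexStats, cond: RolloverConditions) -> List[str]:
    """Return the name of every configured condition that is currently met."""
    checks = [
        ("min_size", cond.min_size_bytes, stats.primary_store_bytes),
        ("min_primary_shard_size", cond.min_primary_shard_size_bytes,
         stats.largest_primary_bytes),
        ("min_index_age", cond.min_index_age_seconds, stats.age_seconds),
        ("min_doc_count", cond.min_doc_count, stats.doc_count),
    ]
    return [name for name, threshold, actual in checks
            if threshold is not None and actual >= threshold]


def should_roll_over(stats: IndexStats, cond: RolloverConditions) -> bool:
    """Logical OR: a single satisfied condition is enough."""
    return len(satisfied_conditions(stats, cond)) > 0


# A quiet index: only 2 GB of primaries, but 25 hours old.
quiet = IndexStats(primary_store_bytes=2 * GB, largest_primary_bytes=1 * GB,
                   age_seconds=25 * 3600, doc_count=10_000)
policy = RolloverConditions(min_size_bytes=45 * GB,
                            min_index_age_seconds=24 * 3600)
print(satisfied_conditions(quiet, policy))  # ['min_index_age']
```

The quiet index rolls over on age alone even though it is far below the size threshold, which is why the two numbers have to be chosen together.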

Use whole-number unit values with standard suffixes (gb, tb, h, d, m) for predictable evaluation. A useful sanity check: the per-shard size at rollover is the size condition divided by the primary shard count.

$$\text{shard}_\text{roll} = \frac{\texttt{min\_size}}{\texttt{number\_of\_shards}} \le 50\,\text{GB}$$

For a 3-shard index, a min_size of 45gb yields ~15 GB primaries — comfortably in range. Gotcha: if you raise number_of_shards in the template without revisiting min_size, each rollover produces more, smaller shards and inflates cluster-state overhead.
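The sanity check above is simple enough to automate before a template change ships. A minimal sketch, assuming sizes are expressed in GB and using hypothetical helper names (`shard_size_at_rollover_gb`, `within_shard_ceiling`):

```python
def shard_size_at_rollover_gb(min_size_gb: float, number_of_shards: int) -> float:
    """Approximate size of each primary shard when min_size triggers."""
    if number_of_shards < 1:
        raise ValueError("number_of_shards must be at least 1")
    return min_size_gb / number_of_shards


def within_shard_ceiling(min_size_gb: float, number_of_shards: int,
                         ceiling_gb: float = 50.0) -> bool:
    """True when the per-shard size at rollover stays at or under the ceiling."""
    return shard_size_at_rollover_gb(min_size_gb, number_of_shards) <= ceiling_gb


print(shard_size_at_rollover_gb(45, 3))  # 15.0
print(within_shard_ceiling(45, 3))       # True
print(shard_size_at_rollover_gb(45, 6))  # 7.5
```

Doubling the shard count from 3 to 6 without touching min_size halves the shard size at rollover to 7.5 GB, which is the drift the gotcha warns about.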

2. Author the policy with size and age conditions

Define both conditions on the rollover action inside the hot state. The ism_template block auto-attaches the policy to matching new indices, so you do not have to attach it by hand to every backing index the rollover creates:

HTTP

PUT _plugins/_ism/policies/log-rollover-policy
{
  "policy": {
    "description": "Deterministic size and age rollover for log indices",
    "default_state": "hot",
    "ism_template": [
      { "index_patterns": ["logs-app-*"], "priority": 100 }
    ],
    "states": [
      {
        "name": "hot",
        "actions": [
          {
            "rollover": {
              "min_size": "45gb",
              "min_index_age": "24h"
            }
          }
        ],
        "transitions": [
          {
            "state_name": "warm",
            "conditions": { "min_rollover_age": "0h" }
          }
        ]
      }
    ]
  }
}

Gotcha: the warm transition keys off min_rollover_age, not min_index_age. min_rollover_age is measured from the moment the index rolled over, whereas min_index_age is measured from original creation, so using min_index_age here could advance the index prematurely.
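When the policy lives in version control, it can help to generate the body from a few parameters instead of hand-editing JSON. The sketch below rebuilds the policy shown above as a Python dict; `build_rollover_policy` is a hypothetical helper name, and it assumes the same single hot state and warm transition.

```python
import json


def build_rollover_policy(index_pattern: str, min_size: str, min_index_age: str,
                          priority: int = 100) -> dict:
    """Return an ISM policy body with size and age rollover conditions."""
    return {
        "policy": {
            "description": "Deterministic size and age rollover for log indices",
            "default_state": "hot",
            "ism_template": [
                {"index_patterns": [index_pattern], "priority": priority}
            ],
            "states": [
                {
                    "name": "hot",
                    "actions": [
                        {"rollover": {"min_size": min_size,
                                      "min_index_age": min_index_age}}
                    ],
                    "transitions": [
                        {"state_name": "warm",
                         "conditions": {"min_rollover_age": "0h"}}
                    ],
                }
            ],
        }
    }


policy_body = build_rollover_policy("logs-app-*", "45gb", "24h")
print(json.dumps(policy_body, indent=2))
```

The resulting dict can be passed as the request body of the PUT call in place of the literal JSON.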

3. Attach the policy through a single write alias

Rollover requires exactly one write target. Attach the policy and confirm the alias resolves to a single index:

Shell

curl -s -X POST "https://<cluster-endpoint>:9200/_plugins/_ism/add/logs-app-000001" \
  -H "Content-Type: application/json" \
  -d '{"policy_id": "log-rollover-policy"}'

Then inspect the alias:

Shell

curl -s "https://<cluster-endpoint>:9200/_alias/logs-app?pretty"

Expected output — one backing index flagged as the write index:

JSON

{
  "logs-app-000001": {
    "aliases": {
      "logs-app": { "is_write_index": true }
    }
  }
}

Gotcha: if the alias resolves to multiple indices with no single is_write_index: true, ISM rejects the rollover with 400 Bad Request on ambiguous write targets. Set the flag explicitly with POST _aliases before the first threshold is reached.
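Setting the flag explicitly means sending one `add` action per backing index, with `is_write_index` true on exactly one of them. A minimal sketch of building that request body follows; `write_index_actions` is a hypothetical helper, and the second index name is illustrative.

```python
from typing import Dict, List


def write_index_actions(alias: str, backing_indices: List[str],
                        write_index: str) -> Dict[str, list]:
    """Build a POST _aliases body that marks exactly one index as the write index."""
    if write_index not in backing_indices:
        raise ValueError(f"{write_index} is not one of the backing indices")
    return {
        "actions": [
            {"add": {"index": name, "alias": alias,
                     "is_write_index": name == write_index}}
            for name in backing_indices
        ]
    }


body = write_index_actions(
    "logs-app", ["logs-app-000001", "logs-app-000002"], "logs-app-000002"
)
print(body)
```

Because all actions travel in a single `_aliases` request, the alias never passes through a state with zero or two write indices.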

4. Calibrate thresholds against scheduler latency

ISM evaluates thresholds asynchronously on the background job scheduler. With job_interval at 5m, an index can exceed a configured limit by nearly a full interval before the rollover fires — the overshoot window. Budget for it by setting the size condition below the hard per-node capacity, leaving headroom for the volume ingested during one poll cycle:

$$\texttt{min\_size} \le C_\text{hard} - \left( R_\text{ingest} \times t_\text{interval} \right)$$

where $C_\text{hard}$ is the hard storage ceiling, $R_\text{ingest}$ is peak ingest rate, and $t_\text{interval}$ is job_interval. In practice, set size and age thresholds 10–15% below hard limits to absorb ingestion spikes inside the evaluation window. Tightening job_interval shrinks the overshoot but increases scheduler load across all managed indices:

Shell

curl -s -X PUT "https://<cluster-endpoint>:9200/_cluster/settings" \
  -H "Content-Type: application/json" \
  -d '{"persistent": {"plugins.index_state_management.job_interval": 2}}'

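Gotcha: disk watermarks are thresholds on node disk usage, not multiples of min_size. Keep hot-node disk usage far enough below cluster.routing.allocation.disk.watermark.low that each node can still absorb at least one more rollover-sized shard; otherwise a rollover can mint a new index onto nodes already breaching the watermark and leave its shards UNASSIGNED, the failure that Fallback Routing Strategies exist to absorb.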
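The headroom inequality from this step can be worked through numerically. The sketch below assumes the ceiling and min_size are both in GB, the ingest rate is in GB per minute, and the interval is in minutes; `max_min_size_gb` and the example figures are hypothetical.

```python
def max_min_size_gb(hard_ceiling_gb: float, peak_ingest_gb_per_min: float,
                    job_interval_min: float) -> float:
    """Largest min_size that still leaves room for one full poll cycle of ingest."""
    overshoot_gb = peak_ingest_gb_per_min * job_interval_min
    limit = hard_ceiling_gb - overshoot_gb
    if limit <= 0:
        raise ValueError("ingest during one interval already exceeds the ceiling")
    return limit


# Illustrative numbers: 150 GB ceiling, 0.5 GB/min peak ingest, 5-minute interval.
print(max_min_size_gb(150, 0.5, 5))  # 147.5
# Shortening the interval to 2 minutes shrinks the overshoot budget.
print(max_min_size_gb(150, 0.5, 2))  # 149.0
```

At these rates the overshoot is small next to the 10–15% margin recommended above; the margin matters more when ingest is bursty and the peak rate is hard to pin down.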

5. Deploy and force rollover from Python

Production deployments version the policy in git and apply it idempotently. This example drives ISM through transport.perform_request rather than a dedicated client namespace. The helper creates or updates the policy and, when ingestion outruns the scheduler, forces an immediate rollover against the write alias; attaching the policy is left to the Step 3 call and the ism_template block:

Python

import logging
from opensearchpy import OpenSearch, exceptions

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def deploy_rollover_policy(hosts, policy_id, policy_body, write_alias, force=False):
    client = OpenSearch(hosts=hosts, use_ssl=True, verify_certs=True)
    policy_url = f"/_plugins/_ism/policies/{policy_id}"

    # Idempotent create/update. A plain PUT creates the policy; if it already
    # exists the cluster rejects the write with HTTP 409 (ConflictError), and
    # the update must carry the current if_seq_no + if_primary_term.
    try:
        client.transport.perform_request("PUT", policy_url, body=policy_body)
        logger.info("Policy %s created.", policy_id)
    except exceptions.ConflictError:
        existing = client.transport.perform_request("GET", policy_url)
        client.transport.perform_request(
            "PUT", policy_url,
            params={"if_seq_no": existing["_seq_no"],
                    "if_primary_term": existing["_primary_term"]},
            body=policy_body,
        )
        logger.info("Policy %s updated in place.", policy_id)

    # Force immediate rollover past the scheduler when ingestion spikes.
    if force:
        result = client.transport.perform_request(
            "POST", f"/{write_alias}/_rollover"
        )
        logger.info("Forced rollover: rolled_over=%s new_index=%s",
                    result.get("rolled_over"), result.get("new_index"))
    return True

Run it in CI/CD validation or as a scheduled job. The manual _rollover call targets the write alias, not a backing index, and bypasses the poll cycle entirely. Gotcha: a forced rollover sent without a request body is unconditional and rolls over even an empty index. Pass a body with conditions (the rollover API names them max_size, max_age, and max_docs, not ISM's min_* keys) if you want the call to no-op unless the thresholds are already met, so an automated retry does not create empty indices.
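A guarded variant of the forced rollover might look like the sketch below. `conditional_rollover` is a hypothetical helper; it assumes `client` is an opensearch-py `OpenSearch` instance (or anything exposing `transport.perform_request`) and refuses to send the request when no condition is supplied.

```python
from typing import Any, Dict, Optional


def conditional_rollover(client: Any, write_alias: str,
                         max_size: Optional[str] = None,
                         max_age: Optional[str] = None,
                         max_docs: Optional[int] = None) -> Dict[str, Any]:
    """POST /<alias>/_rollover with conditions, so retries are safe no-ops."""
    candidates = {"max_size": max_size, "max_age": max_age, "max_docs": max_docs}
    conditions = {key: value for key, value in candidates.items()
                  if value is not None}
    if not conditions:
        # Without conditions the API rolls over unconditionally.
        raise ValueError("refusing to send an unconditional rollover")
    return client.transport.perform_request(
        "POST", f"/{write_alias}/_rollover", body={"conditions": conditions}
    )
```

A call such as `conditional_rollover(client, "logs-app", max_size="45gb", max_age="24h")` mirrors the policy thresholds, so the request only creates a new index once one of them has been reached.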

Verification

After deploying or forcing a rollover, confirm the write alias advanced and the ISM state is healthy.

Confirm the rollover minted a new backing index:

Shell

curl -s "https://<cluster-endpoint>:9200/_cat/indices/logs-app-*?v&s=index&h=index,docs.count,store.size,pri.store.size"

A healthy result shows the previous index sealed and logs-app-000002 receiving writes. Cross-check that pri.store.size on the sealed index is at or just above your min_size — a large overshoot points to scheduler latency you should tighten in Step 4.

Confirm the managed-index state:

Shell

curl -s "https://<cluster-endpoint>:9200/_plugins/_ism/explain/logs-app-000001?pretty"

Inspect action.name (it should read rollover) and step.step_status. A completed status together with rolled_over: true confirms success; a failed status carries the blocking reason in the info message.

Confirm the alias points at the new write index:

Shell

curl -s "https://<cluster-endpoint>:9200/_alias/logs-app?pretty"

Exactly one backing index must carry is_write_index: true, and it must be the newest suffix.
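The alias check can be scripted against the JSON returned by the `_alias` call. A minimal sketch, assuming backing index names end in a numeric suffix as in the prerequisites; `write_index_is_newest` is a hypothetical helper name.

```python
import re
from typing import Any, Dict

_NUMERIC_SUFFIX = re.compile(r"-(\d+)$")


def _suffix(index_name: str) -> int:
    """Numeric rollover suffix of an index name, or -1 if it has none."""
    match = _NUMERIC_SUFFIX.search(index_name)
    return int(match.group(1)) if match else -1


def write_index_is_newest(alias_response: Dict[str, Any], alias: str) -> bool:
    """True when exactly one index is the write index and it has the highest suffix."""
    write_indices = [
        name for name, body in alias_response.items()
        if body.get("aliases", {}).get(alias, {}).get("is_write_index") is True
    ]
    if len(write_indices) != 1:
        return False
    newest = max(alias_response, key=_suffix)
    return write_indices[0] == newest


response = {
    "logs-app-000001": {"aliases": {"logs-app": {"is_write_index": False}}},
    "logs-app-000002": {"aliases": {"logs-app": {"is_write_index": True}}},
}
print(write_index_is_newest(response, "logs-app"))  # True
```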

Common failures

Symptom	Root cause	Fix command
Rollover rejected with `400 Bad Request`	Alias resolves to multiple indices, no single write index	`POST _aliases` with `"is_write_index": true` on the newest backing index
Index far exceeds `min_size` before rolling	`job_interval` overshoot under high ingest	`PUT _cluster/settings` to lower `plugins.index_state_management.job_interval`, then drop `min_size` by 10–15%
ISM stuck, `step.step_status: failed` on rollover	Target node above `watermark.high`; new index cannot allocate	`GET _cat/allocation?v&s=disk.percent:desc` then scale the tier, then `POST _plugins/_ism/retry/<index>`
Index rolls over almost immediately, repeatedly	`min_index_age` set far below `min_size` reach under OR semantics	Raise `min_index_age` or remove it so `min_size` governs
CCR follower stalls after leader rollover	Follower shard topology diverges from leader	`GET _plugins/_replication/<follower_index>/_status` then pause and resume replication

CCR followers do not execute ISM policies — rollover runs only on the leader, and the follower’s auto-follow pattern picks up the new backing index. If a follower stalls, verify the leader’s write alias points to exactly one index, then resync:

Shell

curl -s -X POST "https://<follower-endpoint>:9200/_plugins/_replication/logs-app-000001-follower/_pause" -H "Content-Type: application/json" -d '{}'
curl -s -X POST "https://<follower-endpoint>:9200/_plugins/_replication/logs-app-000001-follower/_resume" -H "Content-Type: application/json" -d '{}'

Frequently asked questions

Does min_size count replica shards?

No. min_size measures total primary-shard storage only. A 3-shard index with one replica occupies roughly double min_size on disk, so plan node capacity against the replicated footprint even though the rollover condition ignores replicas. Use min_primary_shard_size when you care about the largest single primary rather than the aggregate.
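The replicated footprint is a one-line calculation. A rough sketch that ignores translog and merge overhead; `disk_footprint_gb` is a hypothetical helper name.

```python
def disk_footprint_gb(primary_store_gb: float, number_of_replicas: int) -> float:
    """Approximate on-disk size of an index: primaries plus one copy per replica."""
    if number_of_replicas < 0:
        raise ValueError("number_of_replicas cannot be negative")
    return primary_store_gb * (1 + number_of_replicas)


# An index that rolls at min_size = 45 GB with one replica.
print(disk_footprint_gb(45, 1))  # 90
```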

What happens if I set both min_size and min_index_age?

They evaluate with OR: whichever is satisfied first triggers the rollover. Size-based rollover keeps shards bounded during traffic spikes; age-based rollover guarantees a predictable index-per-day cadence during quiet periods. Setting both caps each index by size and by age, which is the common production pattern.

Why did my index roll over 5 minutes late?

ISM evaluates conditions on the background scheduler, not continuously. With job_interval at 5m, the index can exceed the threshold by nearly a full interval before the next poll fires the rollover. Lower job_interval to shrink the window, or force an immediate rollover with POST /<write_alias>/_rollover when ingestion outruns the cycle.

Related guides

Rollover Trigger Configuration — the full set of rollover conditions and when the engine chooses to fire each.
Phase Transition Logic — how min_rollover_age and transition conditions advance an index after it rolls.
Error Handling & Retries — retry blocks and backoff for rollovers that fail against a full or blocked tier.

