Async Execution Patterns in OpenSearch ISM and Cross-Cluster Replication

OpenSearch never applies an Index State Management (ISM) transition the instant its condition becomes true. Instead a background job scheduler polls index metadata on a fixed interval, evaluates trigger conditions, and dispatches state transitions and Cross-Cluster Replication (CCR) shard transfers as non-blocking jobs. That decoupling is what keeps coordination threads free under heavy ingest, but it also makes every lifecycle action eventually consistent: a rollover, a tier move, or a follower catch-up completes some time after the condition is met, not at the moment of it. When automation treats these actions as synchronous — attaching a policy and immediately assuming the index has rolled, or transitioning to cold before CCR has drained — you get orphaned write aliases, split-brain followers, and indices parked mid-transition. This guide covers the scheduler mechanics, the exact policy and cluster settings that make async execution deterministic, and an opensearch-py orchestration layer that waits for completion instead of guessing. It builds directly on the ISM Policy Implementation & Python Automation execution model, and pairs closely with Rollover Trigger Configuration and Phase Transition Logic.

Async workload profile per tier

The scheduler does not impose the same load on every tier. Hot indices are polled and acted on constantly (rollover, replica changes), while cold and frozen indices sit idle for long stretches and then run one heavy, latency-tolerant job. Aligning hardware to that async profile prevents the management thread pool on a dense, storage-optimized node from being starved by a burst of concurrent relocations. The node-role mechanics behind this table are covered under Node Role Allocation, and the tier economics under Hot-Warm-Cold Tier Design.

Tier	Storage profile	vCPU : RAM ratio	Routing attribute	Async job profile
Hot	Local NVMe SSD	1 : 4 (compute-heavy)	`node.attr.data: hot`	High-frequency: rollover, replica_count, CCR follower writes
Warm	SATA/SAS SSD	1 : 6	`node.attr.data: warm`	Bursty: allocation + force_merge relocations
Cold	High-density HDD	1 : 8 (storage-heavy)	`node.attr.data: cold`	Rare, heavy: snapshot + shrink, long-running jobs
Frozen	Object storage / searchable snapshots	1 : 8 (minimal compute)	`node.attr.data: frozen`	Sporadic: searchable-snapshot mounts

The practical rule is that job concurrency must be bounded per node so a migration wave cannot exhaust the management pool. A node running above its concurrent-recovery ceiling queues incoming ISM jobs; a node whose management queue is saturated rejects them, and a rejected job is an ISM action that silently retries on the next cycle rather than executing now.

Job scheduler architecture and polling mechanics

OpenSearch delegates ISM and CCR execution to the opensearch-job-scheduler plugin. The scheduler operates on a configurable interval (plugins.index_state_management.job_interval, default 5 minutes), scanning registered policies against index metadata. When a condition is satisfied, it enqueues a non-blocking task in the management thread pool. Because evaluation and execution are decoupled, there is inherent latency between trigger satisfaction and the actual state transition — the same eventual-consistency property that Threshold Tuning Strategies must account for when sizing rollover boundaries.

The core execution loop follows a deterministic sequence:

1. The scheduler polls indices matching policy templates or index patterns.
2. Trigger conditions (size, age, document count, replication lag) are evaluated against current shard metadata.
3. If thresholds are met, the scheduler queues an async job for the target phase.
4. The job executes, updates index settings, and advances the state machine, or exhausts its retry budget and parks the index in a failed action.
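
The gap between a trigger condition becoming true and the next evaluation can be sketched with simple arithmetic. The helper below is a hypothetical illustration, not scheduler code: the function name and the fixed-tick model are assumptions, and it ignores jitter and queue wait.

```python
from datetime import timedelta


def wait_until_next_poll(condition_met_at: timedelta, interval: timedelta) -> timedelta:
    """Time a satisfied condition waits for the next evaluation tick.

    Assumes ticks at 0, interval, 2*interval, ... measured from when the
    job was scheduled. A real scheduler adds jitter and queue wait on top.
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    elapsed_in_cycle = condition_met_at % interval
    if elapsed_in_cycle == timedelta(0):
        return timedelta(0)  # condition became true exactly on a tick
    return interval - elapsed_in_cycle


interval = timedelta(minutes=5)
print(wait_until_next_poll(timedelta(minutes=12), interval))  # 0:03:00
print(wait_until_next_poll(timedelta(minutes=10, seconds=1), interval))  # 0:04:59
```

The second call shows the worst case: a condition that becomes true just after a tick waits almost a full interval before it is even evaluated.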

Operators must budget for polling latency when designing orchestration pipelines. The scheduler exposes thread-pool metrics and job-queue depth via _nodes/stats/thread_pool/management. Monitoring the queue and rejected counters is essential for detecting backpressure during high-volume rollover windows. For deeper architectural context, refer to the official OpenSearch Index State Management documentation.
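
As a rough sketch of that monitoring step, the function below scans a response shaped like `_nodes/stats/thread_pool/management` and lists nodes that are queuing or rejecting. The sample payload, the function name, and the `max_queue` threshold are illustrative assumptions; only the `queue` and `rejected` counters come from the stats API.

```python
from typing import Any, Dict, List

# Hypothetical sample shaped like GET _nodes/stats/thread_pool/management.
SAMPLE_STATS: Dict[str, Any] = {
    "nodes": {
        "a1": {"name": "hot-1", "thread_pool": {"management": {"queue": 0, "active": 2, "rejected": 0}}},
        "b2": {"name": "cold-1", "thread_pool": {"management": {"queue": 40, "active": 5, "rejected": 3}}},
    }
}


def nodes_under_backpressure(stats: Dict[str, Any], max_queue: int = 10) -> List[str]:
    """Return names of nodes whose management pool is queuing or rejecting."""
    flagged = []
    for node_id, node in stats.get("nodes", {}).items():
        pool = node.get("thread_pool", {}).get("management", {})
        # "rejected" is cumulative since node start, so any non-zero value
        # only says a rejection happened at some point; compare deltas over time.
        if pool.get("rejected", 0) > 0 or pool.get("queue", 0) > max_queue:
            flagged.append(node.get("name", node_id))
    return sorted(flagged)


print(nodes_under_backpressure(SAMPLE_STATS))  # ['cold-1']
```

With a live cluster the input could come from `client.nodes.stats(metric="thread_pool")` in opensearch-py; sampling it at two points in time and comparing `rejected` is more telling than a single reading.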

How an async action advances

An ISM action is itself a small state machine layered on top of the lifecycle state machine. Each action starts pending, becomes running when the scheduler dispatches it, and resolves to completed or, on error, retrying until its retry budget is exhausted and it becomes failed. Understanding these substates is what lets automation distinguish “still working” from “stuck” — a distinction detailed further in Error Handling & Retries.
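
The substates described above can be modelled as a tiny state machine. This is an illustrative model only: the enum, the `trace` helper, and the assumption that a retry budget of N allows one initial attempt plus N retries are ours, not the plugin's internals.

```python
from enum import Enum
from typing import List


class Substate(Enum):
    PENDING = "pending"
    RUNNING = "running"
    RETRYING = "retrying"
    COMPLETED = "completed"
    FAILED = "failed"


def trace(outcomes: List[bool], retry_count: int) -> List[Substate]:
    """Replay attempt outcomes (True = success) and list the substates visited."""
    visited = [Substate.PENDING]
    consumed = 0
    for succeeded in outcomes:
        visited.append(Substate.RUNNING)  # the scheduler dispatched an attempt
        if succeeded:
            visited.append(Substate.COMPLETED)
            return visited
        if consumed >= retry_count:
            visited.append(Substate.FAILED)  # budget exhausted: index is parked
            return visited
        consumed += 1
        visited.append(Substate.RETRYING)  # will be dispatched again later
    return visited


print([s.value for s in trace([False, True], retry_count=3)])
# ['pending', 'running', 'retrying', 'running', 'completed']
print(trace([False, False, False], retry_count=2)[-1].value)  # failed
```

The useful property for automation is that `retrying` is not terminal: only `completed` and `failed` end the trace.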

Step-by-step configuration

The four steps below make async execution deterministic for a logs-* index set that is also a CCR leader. Apply them in order: scheduler settings on the OpenSearch cluster, a template so new indices are managed from birth, a policy whose conditions leave room for async completion, then a verification pass against the explain API.

1. Cluster and scheduler configuration

Tune the poll interval and management-pool bounds so job dispatch is frequent enough to be responsive but not so aggressive that it saturates the pool. Set these dynamically via _cluster/settings.

JSON

PUT _cluster/settings
{
  "persistent": {
    "plugins.index_state_management.job_interval": 5,          // minutes between policy evaluations
    "plugins.index_state_management.jitter": 0.6,              // spread jobs so cluster-wide evals don't align
    "plugins.index_state_management.coordinator.sweep_period": "10m",
    "cluster.routing.allocation.node_concurrent_recoveries": 2 // cap parallel relocations per node
  }
}

A lower job_interval shortens the latency between a condition being met and the action firing, at the cost of more frequent metadata scans. jitter staggers evaluations so a fleet of indices does not enqueue their jobs in the same instant and overrun the management queue.
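
To see why jitter helps, the sketch below spreads a batch of evaluations across the start of an interval. The delay model (each job delayed by a random amount up to `jitter * interval`) is an assumption made for illustration, and `jittered_offsets` is a hypothetical helper rather than anything the plugin exposes.

```python
import random
from typing import List


def jittered_offsets(n_jobs: int, interval_s: float, jitter: float, seed: int = 0) -> List[float]:
    """Assumed model: each job is delayed by a random amount in [0, jitter * interval]."""
    if not 0.0 <= jitter <= 1.0:
        raise ValueError("jitter is expected to be between 0 and 1")
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [rng.uniform(0.0, jitter * interval_s) for _ in range(n_jobs)]


offsets = jittered_offsets(n_jobs=1000, interval_s=300, jitter=0.6)
spread = max(offsets) - min(offsets)
print(f"1000 jobs spread over about {spread:.0f}s instead of firing together")
```

Under this model, a jitter of 0.6 on a 5-minute interval spreads dispatches over up to 180 seconds, so the management queue sees a trickle rather than a spike.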

2. Index template

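Bind the policy to new indices through the policy's ism_template block (defined in step 3) so each write index is managed from creation; attaching policies to hot write indices by hand races the first rollover. The index template below supplies the one setting the async rollover action depends on, the rollover_alias.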

JSON

PUT _index_template/logs-template
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "index.number_of_shards": 3,
      "index.number_of_replicas": 1,
      "index.plugins.index_state_management.rollover_alias": "logs"  // async rollover targets this alias
    }
  }
}
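
A rollover action can only resolve its target if the alias already points at a write index, so the first index usually has to be created with that alias. The helper below is a minimal sketch of building that request; the function name and the six-digit suffix are assumptions chosen to match the logs-000001 naming used in this guide.

```python
from typing import Any, Dict


def bootstrap_index_request(alias: str, sequence: int = 1, width: int = 6) -> Dict[str, Any]:
    """Build kwargs for creating the first write index behind a rollover alias."""
    if sequence < 0:
        raise ValueError("sequence must be non-negative")
    return {
        "index": f"{alias}-{sequence:0{width}d}",  # e.g. logs-000001
        "body": {"aliases": {alias: {"is_write_index": True}}},
    }


request = bootstrap_index_request("logs")
print(request["index"])  # logs-000001
# With a live cluster and an opensearch-py client: client.indices.create(**request)
```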

3. Policy JSON

The policy below leaves explicit headroom for async completion: the rollover action and the first warm action carry explicit retry blocks (actions without one fall back to the plugin's default retry settings), and each min_index_age transition exceeds the worst-case CCR lag plus one poll interval so a transition should not fire while a follower is still draining.

JSON

PUT _plugins/_ism/policies/log_data_lifecycle
{
  "policy": {
    "description": "Async-safe ISM policy with CCR-aware transition guards",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [
          {
            "retry": { "count": 5, "backoff": "exponential", "delay": "2m" },
            "rollover": {
              "min_index_age": "1d",
              "min_primary_shard_size": "50gb",
              "min_doc_count": 50000000
            }
          }
        ],
        "transitions": [
          {
            "state_name": "warm",
            "conditions": { "min_index_age": "2d" }   // > max CCR lag + job_interval
          }
        ]
      },
      {
        "name": "warm",
        "actions": [
          { "retry": { "count": 3, "backoff": "exponential", "delay": "5m" },
            "replica_count": { "number_of_replicas": 1 } },
          { "force_merge": { "max_num_segments": 1 } }   // long-running; runs async, poll for completion
        ],
        "transitions": [
          { "state_name": "cold", "conditions": { "min_index_age": "7d" } }
        ]
      },
      {
        "name": "cold",
        "actions": [
          { "replica_count": { "number_of_replicas": 0 } },
          { "shrink": { "num_new_shards": 1 } }
        ],
        "transitions": []
      }
    ],
    "ism_template": [
      { "index_patterns": ["logs-*"], "priority": 100 }
    ]
  }
}

The safe lower bound on any transition condition is a direct function of the async timing you just configured:

$$\text{min\_index\_age} \ge L_{\text{ccr}}^{\max} + t_{\text{interval}} + t_{\text{action}}$$

where $L_{\text{ccr}}^{\max}$ is the maximum observed follower replication lag, $t_{\text{interval}}$ is job_interval, and $t_{\text{action}}$ is the typical action runtime (a force_merge can dominate this term). With a 15-minute CCR lag, a 5-minute interval, and roughly 5 minutes allowed for the action itself, the bound works out to 25m; a transition at or above that value avoids premature handoffs that would interrupt shard synchronization.
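
The bound is easy to compute in code. The sketch below is a direct transcription of the formula; the function name is hypothetical, and the 5-minute action runtime is an illustrative value rather than a measured one.

```python
from datetime import timedelta


def min_transition_age(
    max_ccr_lag: timedelta, job_interval: timedelta, action_runtime: timedelta
) -> timedelta:
    """Lower bound on min_index_age: CCR lag + one poll interval + action runtime."""
    return max_ccr_lag + job_interval + action_runtime


bound = min_transition_age(
    max_ccr_lag=timedelta(minutes=15),
    job_interval=timedelta(minutes=5),
    action_runtime=timedelta(minutes=5),  # illustrative assumption
)
print(f"{int(bound.total_seconds() // 60)}m")  # 25m
```

If force_merge on your data takes closer to an hour, the same call with `action_runtime=timedelta(hours=1)` shows how quickly that term comes to dominate the margin.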

4. Verification

Confirm the policy is attached and read the action substate, not just the state name, from the explain API. A managed index in state hot whose rollover action reports consumed_retries greater than 0 with failed still false is retrying, not idle.

Shell

# Is the policy managing the index, and what is the current action doing?
curl -s "https://<cluster>:9200/_plugins/_ism/explain/logs-000001?pretty"

# CCR replication status: query the follower cluster for the follower index; lag must stay below the transition margin
curl -s "https://<cluster>:9200/_plugins/_replication/logs-000001/_status?pretty"

# Management pool health — queue climbing or rejected > 0 means backpressure
curl -s "https://<cluster>:9200/_nodes/stats/thread_pool/management?filter_path=**.management"
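
The replication status response can be reduced to a single number for automation. The sketch below assumes the response carries `status` and a `syncing_details` object with `leader_checkpoint` and `follower_checkpoint`; the sample payload and the `checkpoint_gap` helper are hypothetical. Note that the gap is counted in operations, not time, so it has to be translated into a time margin using your own ingest rate.

```python
from typing import Any, Dict, Optional

# Hypothetical sample shaped like GET _plugins/_replication/<index>/_status.
SAMPLE_STATUS: Dict[str, Any] = {
    "status": "SYNCING",
    "leader_index": "logs-000001",
    "follower_index": "logs-000001",
    "syncing_details": {"leader_checkpoint": 1520, "follower_checkpoint": 1400},
}


def checkpoint_gap(status: Dict[str, Any]) -> Optional[int]:
    """Operations the follower still has to replay, or None if it is not syncing."""
    if status.get("status") != "SYNCING":
        return None
    details = status.get("syncing_details") or {}
    leader = details.get("leader_checkpoint")
    follower = details.get("follower_checkpoint")
    if leader is None or follower is None:
        return None
    return max(leader - follower, 0)


print(checkpoint_gap(SAMPLE_STATUS))  # 120
```

Returning None for anything other than an active sync keeps a paused or failed replication from being mistaken for "zero lag".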

Production automation for state tracking

External automation must poll for job completion rather than assuming synchronous execution. The opensearch-py orchestrator below queries _plugins/_ism/explain to track state and action progression, treats a retrying action as in-flight rather than failed, and surfaces a clear timeout so a downstream provisioning step never fires against a half-transitioned index. It complements the reusable clients in Python Orchestration Frameworks.

Python

import logging
import time
from dataclasses import dataclass
from typing import Optional

from opensearchpy import OpenSearch, ConnectionError, TransportError

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
log = logging.getLogger("ism.async")


@dataclass(frozen=True)
class ActionStatus:
    state: Optional[str]
    action: Optional[str]
    retry_count: Optional[int]
    failed: bool


class ISMStateTracker:
    """Track async ISM progression for a single managed index."""

    def __init__(self, client: OpenSearch) -> None:
        self._client = client

    def explain(self, index: str) -> ActionStatus:
        """Read the current state + action substate from the explain API."""
        resp = self._client.transport.perform_request(
            "GET", f"/_plugins/_ism/explain/{index}"
        )
        meta = resp.get(index, {}) or {}
        action = meta.get("action") or {}
        policy_retry = meta.get("retry_info") or {}
        return ActionStatus(
            state=(meta.get("state") or {}).get("name"),
            action=action.get("name"),
            # Per-action retries are reported under "action"; "retry_info" is policy-level.
            retry_count=action.get("consumed_retries"),
            failed=bool(action.get("failed") or policy_retry.get("failed")),
        )

    def wait_for_state(
        self,
        index: str,
        target_state: str,
        timeout_s: int = 1800,
        poll_interval_s: int = 30,
    ) -> bool:
        """Poll until the index reaches target_state, fails, or times out."""
        deadline = time.monotonic() + timeout_s
        while time.monotonic() < deadline:
            try:
                status = self.explain(index)
            except (ConnectionError, TransportError) as exc:
                # Transient cluster error: log and retry within the same budget.
                log.warning("explain failed for %s: %s", index, exc)
                time.sleep(poll_interval_s)
                continue

            if status.failed:
                log.error(
                    "index %s parked in failed action '%s' (state %s)",
                    index, status.action, status.state,
                )
                return False

            if status.state == target_state:
                log.info("index %s reached state '%s'", index, target_state)
                return True

            log.info(
                "index %s in state '%s' action '%s' retries=%s; waiting for '%s'",
                index, status.state, status.action,
                status.retry_count, target_state,
            )
            time.sleep(poll_interval_s)

        log.warning("timeout: %s did not reach '%s' in %ss", index, target_state, timeout_s)
        return False


if __name__ == "__main__":
    client = OpenSearch(
        hosts=[{"host": "opensearch-cluster.internal", "port": 9200}],
        http_compress=True,
        use_ssl=True,
        verify_certs=True,
        max_retries=5,
        retry_on_timeout=True,
    )
    tracker = ISMStateTracker(client)
    ok = tracker.wait_for_state("logs-000001", target_state="warm")
    raise SystemExit(0 if ok else 1)

Because the tracker reads the failed flag and consumed_retries, a CI/CD job can distinguish an action that is still retrying from one that has genuinely given up, and only trigger post-transition work — snapshot registration, alert routing, storage provisioning — once the target state is confirmed.

Operational guardrails

Async execution is bounded by four levers: how often the scheduler runs, how many jobs a node will run at once, how retries are shaped, and whether disk watermarks will even let a relocation land. Tune them together — a tight job_interval is wasted if the management pool is already saturated.

Setting	Recommended value	Effect on async execution
`plugins.index_state_management.job_interval`	`5m` (`1m` for aggressive rollover)	Latency between condition met and action dispatch
`plugins.index_state_management.jitter`	`0.6`	Spreads evaluations to avoid synchronized job bursts
`cluster.routing.allocation.node_concurrent_recoveries`	`2` (HDD) / `3` (NVMe)	Caps concurrent relocations before jobs queue
ISM action `retry.count` / `backoff`	`3–5` / `exponential`	Bounds retrying before an action is marked failed
`cluster.routing.allocation.disk.watermark.high`	`90%`	A tier above this rejects incoming relocation jobs
`thread_pool.management.queue_size`	`default` (watch `rejected`)	A saturated queue silently defers ISM jobs a cycle

Watermarks matter here because the allocation deciders gate every relocation an async action triggers; a cold tier over its high watermark turns an otherwise-valid transition into an endlessly retrying job. The graceful-degradation options when a target tier has no eligible node are covered in Fallback Routing Strategies, and the roles allowed to run these _plugins/_ism/* actions in Security & Access Boundaries.
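
A pre-flight check for that condition might look like the sketch below, which filters rows shaped like `_cat/allocation?format=json` for nodes at or above the high watermark. The sample rows and the function name are hypothetical; the cat API returns numeric fields as strings, which is why the value is converted before comparison.

```python
from typing import Any, Dict, List

# Hypothetical rows shaped like GET _cat/allocation?format=json.
SAMPLE_ROWS: List[Dict[str, Any]] = [
    {"node": "hot-1", "disk.percent": "46"},
    {"node": "cold-1", "disk.percent": "93"},
    {"node": "UNASSIGNED", "disk.percent": None},
]


def nodes_over_watermark(rows: List[Dict[str, Any]], high_pct: float = 90.0) -> List[str]:
    """Names of nodes whose used-disk percentage is at or above the high watermark."""
    over = []
    for row in rows:
        pct = row.get("disk.percent")
        if pct is None:
            continue  # rows without disk figures (such as unassigned shards) are skipped
        if float(pct) >= high_pct:
            over.append(row.get("node", "unknown"))
    return over


print(nodes_over_watermark(SAMPLE_ROWS))  # ['cold-1']
```

With opensearch-py the rows could be fetched via `client.cat.allocation(format="json")`; running the check before a migration wave tells you whether a transition is likely to stall on allocation.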

Troubleshooting

Transition never fires despite the condition being met. The scheduler is not evaluating the index, usually because the policy is detached or the coordinator sweep has stalled. Use explain to confirm the index is managed (reading explain does not itself trigger a sweep), and re-attach the policy if it is missing:

Shell

curl -s "https://<cluster>:9200/_plugins/_ism/explain/logs-000001?pretty"
# Fix: if "policy_id" is null, (re)attach:
curl -s -X POST "https://<cluster>:9200/_plugins/_ism/add/logs-000001" \
  -H 'Content-Type: application/json' -d '{"policy_id":"log_data_lifecycle"}'
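
For more than a handful of indices, the same check can be scripted. The sketch below walks an explain response and lists indices with no policy_id; the sample payload is hypothetical and trimmed to the fields the check needs.

```python
from typing import Any, Dict, List

# Hypothetical, trimmed sample shaped like GET _plugins/_ism/explain/logs-*.
SAMPLE_EXPLAIN: Dict[str, Any] = {
    "logs-000001": {"index": "logs-000001", "policy_id": "log_data_lifecycle"},
    "logs-000002": {"index.plugins.index_state_management.policy_id": None},
    "total_managed_indices": 1,
}


def unmanaged_indices(explain: Dict[str, Any]) -> List[str]:
    """Index names in an explain response that have no policy attached."""
    names = []
    for name, meta in explain.items():
        if not isinstance(meta, dict):
            continue  # skips counters such as total_managed_indices
        if not meta.get("policy_id"):
            names.append(name)
    return sorted(names)


print(unmanaged_indices(SAMPLE_EXPLAIN))  # ['logs-000002']
```

The returned names are the candidates for the `_plugins/_ism/add` call shown above.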

Action stuck in retrying and never completes. The action hits a persistent error (unreachable snapshot repo, watermark block) and consumes retries each cycle. Read the failure reason, fix the cause, then retry the managed index:

Shell

curl -s "https://<cluster>:9200/_plugins/_ism/explain/logs-000001?pretty" | grep -i info
curl -s -X POST "https://<cluster>:9200/_plugins/_ism/retry/logs-000001"
# Fix: clear the underlying cause (repo/watermark), then the retry above resumes the action
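
Before concluding that an action is stuck, it helps to know how long its retry budget can legitimately take. The sketch below estimates that window; it assumes exponential backoff doubles the base delay on each retry, which is an illustrative model and may not match the plugin's exact multiplier.

```python
from datetime import timedelta
from typing import List


def backoff_delays(count: int, delay: timedelta, backoff: str = "exponential") -> List[timedelta]:
    """Delay before each retry under an assumed backoff model."""
    if backoff == "exponential":
        return [delay * (2 ** i) for i in range(count)]  # assumed doubling
    if backoff == "linear":
        return [delay * (i + 1) for i in range(count)]
    if backoff == "constant":
        return [delay] * count
    raise ValueError(f"unknown backoff: {backoff}")


# The hot-state rollover in the policy above: count=5, delay=2m, exponential.
delays = backoff_delays(5, timedelta(minutes=2))
print(sum(delays, timedelta(0)))  # 1:02:00
```

Under this model the rollover action can spend about an hour retrying before it is marked failed, so an orchestration timeout shorter than that would give up while the action is still inside its budget.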

Management thread pool rejecting jobs. A migration wave saturated the queue, so ISM jobs are deferred a cycle and progress appears frozen cluster-wide. Check rejected, then lower concurrency:

Shell

curl -s "https://<cluster>:9200/_nodes/stats/thread_pool/management?filter_path=**.management"
# Fix: reduce node_concurrent_recoveries, raise jitter, or stagger policy attachment

Transition fired before CCR drained, leaving an orphaned follower. The min_index_age margin was smaller than the follower lag, so the leader moved tier while the follower was still catching up. Inspect lag, then widen the transition guard:

Shell

curl -s "https://<cluster>:9200/_plugins/_replication/logs-000001/_status?pretty"
# Fix: raise min_index_age above max observed lag + job_interval + typical action runtime (see the formula above)

Rollover alias missing after an index-template edit. A template change dropped rollover_alias, so the async rollover action fails to resolve its write target. Verify the setting, then restore it:

Shell

curl -s "https://<cluster>:9200/logs-000001/_settings?filter_path=**.rollover_alias"
# Fix: PUT the rollover_alias setting back, or roll the alias forward manually once

Deeper recovery playbooks — dead-letter tracking, circuit breakers that pause evaluation during yellow/red states, and automated rollback — are covered in Handling async ISM policy execution failures.

Frequently asked questions

Why does my ISM action take minutes to run after the condition is met?

Because ISM is poll-based, not event-driven. The scheduler only re-evaluates each index every job_interval (default 5 minutes), so worst-case dispatch latency is roughly one full interval plus any queue wait in the management pool. Lower job_interval for faster reaction, but keep an eye on the management queue counter so more frequent scans do not cause rejections.

How do I tell "still retrying" apart from "genuinely stuck"?

Read the action substate from _plugins/_ism/explain, not just the state name. A consumed_retries value on the action that is below the action's retry.count, with failed still false, means the action is inside its budget and will run again on a later cycle. Once failed: true appears, the budget is exhausted and the index is parked until you POST to _plugins/_ism/retry.

What is a safe transition margin when the index is a CCR leader?

Set min_index_age (or min_rollover_age) above your maximum observed follower lag plus one job_interval plus the typical action runtime, using the formula in the policy section. This gives the follower time to drain the current segment set before the leader rewrites its routing attribute, which is what prevents orphaned followers and split-brain reads.

Should retries be exponential or constant for async ISM actions?

Exponential. Transient causes — a briefly unreachable snapshot repository, a watermark blip during relocation — clear on their own, and exponential backoff avoids hammering OpenSearch each cycle while they do. Reserve short constant delays for actions you expect to succeed almost immediately, such as replica_count.

Related

Rollover Trigger Configuration — calibrate the conditions the scheduler evaluates each async cycle.
Phase Transition Logic — how states advance once an async action completes.
Threshold Tuning Strategies — size age and shard thresholds against poll latency.
Error Handling & Retries — retry shaping and failure classification for parked actions.
Python Orchestration Frameworks — reusable opensearch-py clients that wrap this polling pattern.
Handling async ISM policy execution failures — recovery playbooks for stuck and failed async jobs.
