Fallback Routing Strategies

Fallback routing strategies define the secondary shard-placement pathways OpenSearch follows when its primary allocation constraints cannot be satisfied during an Index State Management (ISM) phase transition. In production log and search platforms, strict tier affinity combined with disk watermark pressure regularly leaves indices stranded in an UNASSIGNED state: a hot→warm allocation action targets a full warm tier, the transition blocks, downstream rollovers back up, and Cross-Cluster Replication (CCR) checkpoints drift. Without a deterministic fallback path, a single saturated tier stalls the entire lifecycle. This guide sits within OpenSearch ISM Architecture & Fundamentals and shows how to encode overflow tiers, calibrate the thresholds that trigger them, and automate remediation so lifecycle progression degrades gracefully instead of halting.

Tier and Fallback Node Alignment

Fallback routing only works when every data tier declares both a primary attribute and an overflow target that maps to real hardware. The table below aligns each tier with its storage profile, compute ratio, routing attribute, and the fallback tier the allocator should try when the primary is full. This alignment is the physical foundation described in the Node Role Allocation model, and the ratios should match the capacity envelope you set in Hot-Warm-Cold Tier Design.

Tier	Storage profile	vCPU : RAM ratio	Primary routing attr	Fallback target	Primary workload
Hot	NVMe SSD, high IOPS	1 : 4 (compute-heavy)	`data: hot`	`warm`	Active ingest, real-time search, rollover
Warm	SATA SSD	1 : 8	`data: warm`	`cold`	Recent history, moderate query load
Cold	HDD / large SSD	1 : 16 (storage-heavy)	`data: cold`	`frozen`	Infrequent access, retention hold
Frozen	Object storage (S3) snapshots	1 : 16	`data: frozen`	none (terminal)	Searchable snapshots, archive

The fallback target is never a downgrade in durability: it is the next-cheapest tier that still satisfies query SLAs. A shard that overflows from warm to cold remains fully searchable; it pays a latency penalty until capacity frees up and the primary-tier filter is restored, at which point the allocator relocates it back. An include filter that lists both tiers is already satisfied on cold, so nothing moves the shard back until the filter is tightened again. Terminal tiers (frozen) have no fallback and must instead trigger an alert plus a delete or snapshot action, which ties into the retention rules covered in Index Lifecycle Basics.
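The fallback column of the tier table can be read as a linked chain. The sketch below is a minimal illustration of that reading, not part of any OpenSearch API: `FALLBACK_TARGET` and `fallback_chain` are hypothetical names, and the tier values are copied from the table.

```python
from typing import Dict, List, Optional

# Fallback targets copied from the tier table; None marks a terminal tier.
FALLBACK_TARGET: Dict[str, Optional[str]] = {
    "hot": "warm",
    "warm": "cold",
    "cold": "frozen",
    "frozen": None,
}


def fallback_chain(tier: str) -> List[str]:
    """Return the tier followed by every tier it can overflow into, in order."""
    if tier not in FALLBACK_TARGET:
        raise ValueError(f"unknown tier: {tier!r}")
    chain = [tier]
    # Follow the fallback pointers until a terminal tier is reached.
    while FALLBACK_TARGET[chain[-1]] is not None:
        chain.append(FALLBACK_TARGET[chain[-1]])
    return chain


def is_terminal(tier: str) -> bool:
    """A terminal tier has no fallback and needs an alert or delete action instead."""
    return FALLBACK_TARGET[tier] is None


print(fallback_chain("warm"))  # ['warm', 'cold', 'frozen']
```

In practice a policy would usually list only the first overflow tier in its include filter; the full chain is mainly useful for capacity planning and for spotting terminal tiers.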

Routing Failure Modes and the Fallback Decision Path

Fallback routing activates when the allocation engine cannot satisfy a primary placement rule. The allocator's filter decider checks every configured rule (require, include, and exclude), and a shard can only land on a node that passes all of them, so a strict require filter with no overflow will block indefinitely. The most common triggers are:

Disk watermark breaches (cluster.routing.allocation.disk.watermark.flood_stage) that push a tier read-only.
index.routing.allocation.require.* attribute mismatches after a node role change.
CCR leader–follower partitions or checkpoint drift that strand follower shards.
ISM action timeouts during shrink, force_merge, or rollover while a tier is saturated.

In the lifecycle state machine, each fallback branch hangs off a phase of the normal hot→warm→cold→delete progression. A transition never skips a phase; instead, the allocation action inside a phase is what falls back, which keeps the state graph deterministic.

The decisive design choice is moving from strict require placement to capacity-aware include placement the moment a primary tier is unavailable. require is an all-or-nothing predicate: a node must match every required attribute value. include matches any node that carries at least one of the listed values, so listing the primary and overflow tiers together lets the allocator spill onto overflow nodes. Encoding that switch, rather than leaving the default blocking behaviour, is what prevents cascading stalls during node maintenance or scaling events. The deep procedural walkthrough lives in Implementing fallback routing for ISM phase transitions.
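The decision to demote require to include can be sketched as a check over an allocation explanation. The example below assumes the response shape of `_cluster/allocation/explain` (a `can_allocate` field plus `node_allocation_decisions` entries that each carry a `deciders` list with `decider` and `decision` keys); the function names are hypothetical, and the rule itself is one plausible policy rather than built-in behaviour.

```python
from typing import Any, Dict, Set


def blocking_deciders(explain: Dict[str, Any]) -> Set[str]:
    """Collect the names of deciders that returned NO on at least one node."""
    names: Set[str] = set()
    for node in explain.get("node_allocation_decisions", []):
        for decider in node.get("deciders", []):
            if decider.get("decision") == "NO":
                names.add(decider.get("decider", "unknown"))
    return names


def should_demote_require(explain: Dict[str, Any]) -> bool:
    """Suggest the require -> include switch only when the shard cannot be
    allocated and the attribute filter rejected at least one node."""
    if explain.get("can_allocate") != "no":
        return False
    return "filter" in blocking_deciders(explain)


# Hand-written sample: the warm node is full, the cold node fails the require filter.
sample = {
    "can_allocate": "no",
    "node_allocation_decisions": [
        {"node_name": "warm-1", "deciders": [{"decider": "disk_threshold", "decision": "NO"}]},
        {"node_name": "cold-1", "deciders": [{"decider": "filter", "decision": "NO"}]},
    ],
}
print(should_demote_require(sample))  # True
```

A shard blocked only by other deciders (for example `same_shard` or `shards_limit`) would not benefit from widening the attribute filter, which is why the check looks specifically for a filter rejection.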

1. Node Configuration

Each data node must publish an immutable tier attribute in opensearch.yml. Fallback nodes advertise the same attribute value as the tier they absorb overflow for, or a dedicated overflow attribute the policy can target with include.

YAML

# opensearch.yml — warm tier node that also accepts warm overflow
node.roles: [ data, ingest ]
node.attr.data: warm          # primary tier identity
node.attr.overflow: warm_pool # targetable by include on fallback
cluster.routing.allocation.disk.threshold_enabled: true

Declare the matching attribute on the cold nodes that back warm overflow, so an include filter listing both warm and cold resolves to real capacity. Attribute values must match the index template and policy verbatim. A typo is not rejected at validation time; it surfaces later as an unassigned shard whose allocation explanation shows the filter decider rejecting every node.
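Because a mistyped attribute is only noticed at allocation time, it can help to check the published attributes up front. The sketch below assumes rows shaped like the output of `GET _cat/nodeattrs?format=json` (dicts with `node`, `attr`, and `value` keys); the function names and the expected tier list are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Set


def nodes_per_tier(rows: Iterable[Mapping[str, str]], attr: str = "data") -> Dict[str, Set[str]]:
    """Group node names by the value they publish for the given attribute."""
    tiers: Dict[str, Set[str]] = defaultdict(set)
    for row in rows:
        if row.get("attr") == attr:
            tiers[row["value"]].add(row["node"])
    return dict(tiers)


def missing_tiers(rows: Iterable[Mapping[str, str]], expected: Iterable[str]) -> List[str]:
    """Return expected tier names that no node currently advertises."""
    present = nodes_per_tier(rows)
    return [tier for tier in expected if tier not in present]


# Hand-written sample rows; a real run would fetch them from the cluster.
rows = [
    {"node": "n1", "attr": "data", "value": "hot"},
    {"node": "n2", "attr": "data", "value": "warm"},
    {"node": "n2", "attr": "overflow", "value": "warm_pool"},
]
print(missing_tiers(rows, ["hot", "warm", "cold"]))  # ['cold']
```

A non-empty result means some filter in the template or policy names a tier that no node can satisfy.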

2. Index Template

Bind new indices to a baseline tier and shard layout through an index template. The template sets the initial require on the hot tier; the policy is responsible for the fallback switch on later transitions.

HTTP

PUT _index_template/logs-fallback-template
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "index.number_of_shards": 3,
      "index.number_of_replicas": 1,
      "index.routing.allocation.require.data": "hot",
      "plugins.index_state_management.policy_id": "tiered_fallback_policy"
    }
  }
}

3. ISM Policy JSON

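The policy encodes the fallback contract. Each warm/cold allocation action uses require for the primary tier, and the remediation layer (see Production Automation for Validation and Remediation) demotes it to include on failure. Setting wait_for: false keeps the ISM job from waiting on the allocation, so a saturated tier does not hold the policy in that action; shards that remain unplaced are left to the automation. The ism_template block attaches the policy to new logs-* indices, because the policy_id index setting is deprecated in favour of ism_template.

HTTP

PUT _plugins/_ism/policies/tiered_fallback_policy
{
  "policy": {
    "description": "Hot-warm-cold lifecycle with explicit fallback routing",
    "ism_template": { "index_patterns": ["logs-*"], "priority": 100 },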
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [
          { "rollover": { "min_index_age": "2d", "min_primary_shard_size": "50gb" } }
        ],
        "transitions": [
          { "state_name": "warm", "conditions": { "min_index_age": "7d" } }
        ]
      },
      {
        "name": "warm",
        "actions": [
          {
            "retry": { "count": 5, "backoff": "exponential", "delay": "15m" },
            "allocation": {
              "require": { "data": "warm" },
              "wait_for": false
            }
          }
        ],
        "transitions": [
          { "state_name": "cold", "conditions": { "min_index_age": "30d" } }
        ]
      },
      {
        "name": "cold",
        "actions": [
          {
            "retry": { "count": 5, "backoff": "exponential", "delay": "30m" },
            "allocation": {
              "require": { "data": "cold" },
              "wait_for": false
            }
          }
        ],
        "transitions": [
          { "state_name": "delete", "conditions": { "min_index_age": "90d" } }
        ]
      },
      {
        "name": "delete",
        "actions": [ { "delete": {} } ]
      }
    ]
  }
}
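
The fallback contract in the policy can be checked mechanically before the policy is uploaded. The following is a minimal sketch, assuming the policy document has the shape shown above; `lint_allocation_actions` is a hypothetical helper, and the three checks mirror what this section asks for (a known tier, wait_for set to false, and a retry block).

```python
from typing import Any, Dict, Iterable, List


def lint_allocation_actions(policy_doc: Dict[str, Any], known_tiers: Iterable[str]) -> List[str]:
    """Return human-readable problems found in the policy's allocation actions."""
    tiers = set(known_tiers)
    problems: List[str] = []
    for state in policy_doc.get("policy", {}).get("states", []):
        name = state.get("name", "<unnamed>")
        for action in state.get("actions", []):
            allocation = action.get("allocation")
            if allocation is None:
                continue  # rollover, delete, and other actions are out of scope
            tier = allocation.get("require", {}).get("data")
            if tier not in tiers:
                problems.append(f"{name}: require.data={tier!r} is not a known tier")
            if allocation.get("wait_for", False):
                problems.append(f"{name}: wait_for should be false")
            if "retry" not in action:
                problems.append(f"{name}: allocation action has no retry block")
    return problems


policy = {
    "policy": {
        "states": [
            {"name": "warm", "actions": [
                {"retry": {"count": 5}, "allocation": {"require": {"data": "warm"}, "wait_for": False}}
            ]},
            {"name": "cold", "actions": [
                {"allocation": {"require": {"data": "cld"}, "wait_for": True}}
            ]},
        ]
    }
}
for problem in lint_allocation_actions(policy, ["hot", "warm", "cold"]):
    print(problem)
```

Running the check in CI catches the attribute typos that would otherwise only show up as unassigned shards after a transition.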

4. Verification

Confirm the policy attached and that the allocator can satisfy — or fall back on — each require filter before relying on it in production. The _plugins/_ism/explain endpoint reports the managed index’s current state; _cluster/allocation/explain reports why a specific shard is or is not placed.

Shell

# Confirm the policy is attached and which state each index is in
curl -s "https://<cluster>:9200/_plugins/_ism/explain/logs-*?pretty"

# Explain placement for a shard that should have moved to warm
curl -s -X POST "https://<cluster>:9200/_cluster/allocation/explain" \
  -H "Content-Type: application/json" \
  -d '{"index": "logs-2026.07.01", "shard": 0, "primary": true}'

A healthy fallback shows the index's state name as warm in the ISM explain output and, in the allocation explanation, the shard assigned to a warm node. Under pressure, it shows the shard on a listed overflow node instead of an unassigned shard whose filter decider returns NO for the require rule.
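The ISM explain output can be condensed into a per-index summary for dashboards or scheduled checks. This sketch assumes each managed-index entry exposes `state.name`, `retry_info.failed`, and `action.failed`; treat those field names as assumptions to verify against your OpenSearch version. `summarize_ism_explain` is a hypothetical helper.

```python
from typing import Any, Dict


def summarize_ism_explain(response: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Map each managed index to its current state name and a failed flag."""
    summary: Dict[str, Dict[str, Any]] = {}
    for index_name, entry in response.items():
        if not isinstance(entry, dict):
            continue  # skip scalar metadata such as a managed-index count
        state = (entry.get("state") or {}).get("name")
        failed = bool((entry.get("retry_info") or {}).get("failed")) or bool(
            (entry.get("action") or {}).get("failed")
        )
        summary[index_name] = {"state": state, "failed": failed}
    return summary


# Hand-written sample shaped like an explain response for two indices.
sample_response = {
    "logs-2026.07.01": {"state": {"name": "warm"}, "retry_info": {"failed": False}},
    "logs-2026.06.01": {"state": {"name": "cold"}, "action": {"name": "allocation", "failed": True}},
    "total_managed_indices": 2,
}
print(summarize_ism_explain(sample_response))
```

Indices flagged as failed are the ones to pass to `_cluster/allocation/explain` for a shard-level diagnosis.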

Production Automation for Validation and Remediation

Manual routing edits do not scale. The following opensearch-py manager watches unassigned-shard counts, pulls the allocation explanation, and demotes a stalled index's require filter to an include filter that spans the primary and overflow tiers. It uses structured logging and narrow exception handling so it is safe to run on a schedule. Before deploying it, scope its service account per Security & Access Boundaries so it holds only the read permissions it needs (cluster:monitor/health, cluster:monitor/allocation/explain, and indices:monitor/settings/get) plus indices:admin/settings/update on the target patterns.

Python

import logging
import os
from typing import Optional

from opensearchpy import OpenSearch, exceptions

logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")
logger = logging.getLogger(__name__)

REQUIRE_KEY = "index.routing.allocation.require.data"
INCLUDE_KEY = "index.routing.allocation.include.data"

# Ordered fallback map: primary tier -> tiers the allocator may include on overflow.
FALLBACK_TIERS = {
    "warm": ["warm", "cold"],
    "cold": ["cold", "frozen"],
}


class RoutingFallbackManager:
    def __init__(self, client: OpenSearch):
        self.client = client

    def explain_first_unassigned(self) -> dict:
        """Return the allocation explanation for the first unassigned shard, if any."""
        try:
            # With no request body the cluster explains the first unassigned shard it
            # finds. include_disk_info is a query parameter, not a body field.
            return self.client.cluster.allocation_explain(params={"include_disk_info": "true"})
        except exceptions.ConnectionError:
            logger.error("Failed to connect to OpenSearch cluster.")
            return {}
        except exceptions.TransportError as exc:
            # 400 with no unassigned shards is the healthy path, not an error.
            if getattr(exc, "status_code", None) == 400:
                return {}
            raise

    def current_primary_tier(self, index_name: str) -> Optional[str]:
        """Read the tier named by the index's require filter, or None if unset."""
        try:
            settings = self.client.indices.get_settings(
                index=index_name, name=REQUIRE_KEY, flat_settings=True
            )
        except exceptions.TransportError as exc:
            logger.error("Failed to read settings for %s: %s", index_name, exc)
            return None
        return settings.get(index_name, {}).get("settings", {}).get(REQUIRE_KEY)

    def apply_fallback_routing(self, index_name: str, primary_tier: str) -> bool:
        """Demote require -> include so the shard can spill onto overflow tiers."""
        include_targets = ",".join(FALLBACK_TIERS.get(primary_tier, [primary_tier]))
        payload = {
            REQUIRE_KEY: None,             # null clears the strict filter
            INCLUDE_KEY: include_targets,  # any listed tier may now host the shard
        }
        try:
            self.client.indices.put_settings(index=index_name, body=payload)
            logger.info("Fallback routing applied to %s -> include=%s", index_name, include_targets)
            return True
        except exceptions.TransportError as exc:
            logger.error("Failed to apply fallback routing for %s: %s", index_name, exc)
            return False

    def remediate_stalled_indices(self, max_unassigned: int = 5) -> None:
        """Trigger fallback routing when unassigned shards breach the threshold."""
        health = self.client.cluster.health()
        unassigned = health.get("unassigned_shards", 0)
        if unassigned <= max_unassigned:
            logger.info("Allocation healthy: %d unassigned shard(s).", unassigned)
            return
        index_name = self.explain_first_unassigned().get("index")
        if not index_name:
            return
        # Use the tier the index actually requires instead of assuming warm.
        primary_tier = self.current_primary_tier(index_name)
        if primary_tier not in FALLBACK_TIERS:
            logger.warning("No fallback defined for %s (require.data=%s); skipping.",
                           index_name, primary_tier)
            return
        if self.apply_fallback_routing(index_name, primary_tier):
            logger.warning("Fallback triggered for %s (%d unassigned).", index_name, unassigned)


if __name__ == "__main__":
    # Environment variable names are illustrative; adapt them to your secret store.
    client = OpenSearch(
        hosts=[{"host": os.environ.get("OPENSEARCH_HOST", "localhost"), "port": 9200}],
        http_auth=(os.environ["OPENSEARCH_USER"], os.environ["OPENSEARCH_PASSWORD"]),
        use_ssl=True,
        verify_certs=True,
    )
    RoutingFallbackManager(client).remediate_stalled_indices()

Schedule the manager as a cron job or Kubernetes CronJob and forward its log events to your metrics backend for alerting. Aligning this loop with Index Lifecycle Basics helps keep phase transitions from blocking on transient node loss. For connection pooling and client tuning, see the opensearch-py documentation.

Operational Guardrails

Fallback routing is only safe when the thresholds that trigger it leave enough headroom to actually place the overflowing shards. Define the node headroom ratio as

$$H = 1 - \frac{U_{disk}}{C_{disk}}$$

where $U_{disk}$ is used bytes and $C_{disk}$ is node capacity. Trigger fallback while $H > (1 - \text{watermark}_{high})$ so shards relocate before the flood stage forces the tier read-only. The settings below give a multi-tier deployment that buffer.
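
A short worked example makes the headroom rule concrete. The sketch below computes $H$ from used and total bytes and applies the condition exactly as stated, $H > (1 - \text{watermark}_{high})$, using the 88% high watermark from the recommended settings; the function names and the sample byte counts are illustrative.

```python
def headroom(used_bytes: int, capacity_bytes: int) -> float:
    """H = 1 - U_disk / C_disk, the fraction of the disk that is still free."""
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    return 1 - used_bytes / capacity_bytes


def has_fallback_headroom(h: float, watermark_high: float = 0.88) -> bool:
    """True while H is still above the free-space fraction left at the high watermark."""
    return h > (1 - watermark_high)


# A 2,000 GB node at 1,700 GB used has H = 0.15, above the 0.12 threshold.
print(has_fallback_headroom(headroom(1_700, 2_000)))  # True
# At 1,800 GB used H drops to 0.10, below the threshold.
print(has_fallback_headroom(headroom(1_800, 2_000)))  # False
```

The comparison is strict, so a node sitting at the high watermark itself is already treated as out of headroom.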

Setting	Recommended value	Purpose
`cluster.routing.allocation.disk.watermark.low`	`82%`	Stop new shards landing on a filling node
`cluster.routing.allocation.disk.watermark.high`	`88%`	Begin relocating shards off the node
`cluster.routing.allocation.disk.watermark.flood_stage`	`93%`	Read-only lock — fallback must fire before this
`cluster.routing.allocation.node_concurrent_recoveries`	`2` (HDD) / `4` (NVMe)	Cap recovery I/O during a spill
`cluster.routing.allocation.total_shards_per_node`	tier-specific	Prevent hot-spotting on the overflow tier
ISM `retry` block	`count: 5, backoff: exponential`	Absorb transient failures before alerting

Keep each overflow-tier node's shard count below total_shards_per_node, with enough slack for the shards a spill would add, so the tier has room to absorb overflow. Capacity ratios between tiers should follow the envelope in Data Tier Routing Patterns; an undersized cold tier turns warm overflow into a second stall.
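
Whether an overflow tier can absorb a spill under a total_shards_per_node cap is simple arithmetic. The sketch below is a rough budget check with a hypothetical `spill_fits` helper; it assumes the per-node shard counts are already known (for example from `_cat/allocation`) and deliberately ignores other deciders such as disk watermarks and the rule that a primary and its replica cannot share a node.

```python
from typing import Mapping


def spare_shard_slots(shards_per_node: Mapping[str, int], limit: int) -> int:
    """Sum the shard slots still free on each node under the per-node limit."""
    return sum(max(limit - count, 0) for count in shards_per_node.values())


def spill_fits(shards_per_node: Mapping[str, int], limit: int, spill_shards: int) -> bool:
    """True when the overflow tier has at least as many free slots as the spill needs."""
    return spare_shard_slots(shards_per_node, limit) >= spill_shards


# Two cold nodes with a limit of 20 shards each; one is already full.
cold_tier = {"cold-1": 18, "cold-2": 20}
print(spill_fits(cold_tier, limit=20, spill_shards=2))  # True
print(spill_fits(cold_tier, limit=20, spill_shards=3))  # False
```

A false result is the signal to add overflow capacity or raise the limit before enabling fallback for that tier.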

Troubleshooting

Failure mode	Diagnosis command	Fix command
Shards `UNASSIGNED` after `warm` transition	`POST _cluster/allocation/explain {"index":"<idx>","shard":0,"primary":true}`	`PUT <idx>/_settings {"index.routing.allocation.require.data":null,"index.routing.allocation.include.data":"warm,cold"}`
Transition stuck in `WAITING` state	`GET _plugins/_ism/explain/<idx>`	`POST _plugins/_ism/retry/<idx>` after freeing capacity or lifting the watermark
Tier read-only from flood stage	`GET _cat/allocation?v&h=node,disk.percent`	`PUT <idx>/_settings {"index.blocks.read_only_allow_delete":null}` then relocate shards
Fallback `include` still rejected	`POST _cluster/allocation/explain` (check `total_shards_per_node`)	`PUT _cluster/settings {"transient":{"cluster.routing.allocation.total_shards_per_node":<n+1>}}`
CCR follower shards stranded	`GET _plugins/_replication/<follower>/_status`	`PUT <follower>/_settings {"index.routing.allocation.include.data":"warm,cold"}` on the follower

For CCR specifically, follower indices inherit the leader’s routing rules, so a follower cluster with a differently-sized tier will retry-loop even when the leader is healthy. Override index.routing.allocation on the follower with an include filter during replication start, and tune plugins.replication.follower.index.recovery.chunk_size to trade sync throughput against fallback tolerance.

Related

Node Role Allocation: how the allocator maps require/include filters to node attributes and watermarks.
Hot-Warm-Cold Tier Design: sizing the tiers and overflow capacity that fallback routing depends on.
Data Tier Routing Patterns: the placement patterns that determine where overflow shards land.
Index Lifecycle Basics: the phase model whose transitions trigger allocation actions.
Implementing fallback routing for ISM phase transitions: the step-by-step procedure for wiring this up.

