Mapping data tiers to OpenSearch node roles

Deterministic tier routing helps prevent disk watermark breaches, avoids Index State Management (ISM) transitions that wait on allocations that can never happen, and keeps hot-tier I/O isolated from warm and cold workloads. Misaligned tier mappings let hot nodes absorb cold data, trigger avoidable waves of shard relocation, and can stall Cross-Cluster Replication (CCR) followers. This reference provides routing payloads, validation commands, and Python enforcement patterns for keeping tier-to-node alignment consistent at scale.

Node Role Declaration & State Verification

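OpenSearch has no built-in tier roles. The data_hot, data_warm, data_cold, and data_frozen roles (and the _tier_preference setting that goes with them) belong to Elasticsearch's data tiers and are not part of OpenSearch, so declaring them does not enable tier-aware allocation. In OpenSearch, every node that stores shards carries the generic data role, and its tier is expressed as a custom node attribute such as node.attr.temp. Index-level allocation filters (index.routing.allocation.require.temp and related settings) then restrict shards to nodes with a matching attribute. Without those filters, the allocator places shards on any data node based on shard balance and disk usage, and ISM lifecycle stages have no physical meaning.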

Declare the data role and a tier attribute in opensearch.yml on each data node, either before cluster initialization or during a rolling restart. A hot-tier node looks like this:

YAML
node.roles: ["data", "ingest"]
node.attr.temp: "hot"        # Tier label; warm and cold nodes use "warm" or "cold".
node.attr.rack_id: "az-1"

Restart nodes one at a time so the cluster keeps its cluster-manager quorum and shard copies stay available. Verify roles and tier attributes with the nodes info API:

Shell
curl -s -X GET "https://<cluster-endpoint>:9200/_nodes?filter_path=nodes.*.name,nodes.*.roles,nodes.*.attributes" | jq '.nodes[] | {name, roles, attributes}'

Each data node should report data in roles and the expected temp value under attributes. A data node without the attribute matches no tier filter, so it never receives shards from tier-pinned indices. Correct role and attribute boundaries are foundational to OpenSearch ISM Architecture & Fundamentals and must be validated before deploying lifecycle policies.
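To make that check repeatable, the following sketch groups a parsed GET _nodes response by tier. It assumes tiers are encoded in a node attribute named temp with the values hot, warm, and cold (hypothetical names; substitute the cluster's own node.attr key), and it reads only the name, roles, and attributes fields of each node. group_nodes_by_tier is a hypothetical helper, not part of any OpenSearch client.

```python
from collections import defaultdict
from typing import Any

TIER_ATTR = "temp"  # Hypothetical attribute name: must match node.attr.<name> in opensearch.yml.
EXPECTED_TIERS = {"hot", "warm", "cold"}


def group_nodes_by_tier(nodes_info: dict[str, Any]) -> tuple[dict[str, list[str]], list[str]]:
    """Group data nodes by tier attribute.

    nodes_info is the parsed JSON body of GET _nodes. Returns a mapping of
    tier -> node names, plus the data nodes whose attribute is missing or
    holds an unexpected value.
    """
    tiers = defaultdict(list)
    unlabeled: list[str] = []
    for node in nodes_info.get("nodes", {}).values():
        if "data" not in node.get("roles", []):
            continue  # Cluster-manager-only or ingest-only nodes hold no shards.
        tier = node.get("attributes", {}).get(TIER_ATTR)
        if tier in EXPECTED_TIERS:
            tiers[tier].append(node["name"])
        else:
            unlabeled.append(node.get("name", "<unnamed>"))
    return dict(tiers), unlabeled
```

Any tier in EXPECTED_TIERS that is absent from the returned mapping (EXPECTED_TIERS - set(tiers)) has no eligible nodes, so ISM transitions targeting it would stall.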

Index Template v2 Routing Payloads

New indices inherit routing constraints from composable (v2) index templates. An allocation filter in the template controls where shards are placed at creation and sets the baseline that later ISM transitions overwrite.

Deploy a versioned template that pins new indices to the hot tier:

JSON
PUT _index_template/observability-hot-warm-cold
{
  "index_patterns": ["logs-app-*", "metrics-infra-*"],
  "template": {
    "settings": {
      "number_of_shards": 3,
      "number_of_replicas": 1,
      "index.routing.allocation.require.temp": "hot",
      "index.refresh_interval": "15s"
    }
  },
  "priority": 500,
  "version": 2,
  "_meta": {
    "description": "Enforces hot-warm-cold routing for application telemetry"
  }
}

Operational constraints:

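  • index.routing.allocation.require.temp is a hard constraint: every shard copy must land on a node whose temp attribute matches. If no matching node can take the shard, it stays unassigned instead of falling back to another tier. To allow several tiers, use index.routing.allocation.include.temp with a comma-separated list; include has no left-to-right preference order.
  • Omitting a tier filter lets the allocator spread shards across every data node, so hot-tier NVMe storage fills with aging data and indexing load can land on warm or cold hardware.
  • priority decides which composable template applies when several match the same index name; the highest value wins, and templates with overlapping patterns cannot share a priority. Composable templates already take precedence over legacy (v1) templates regardless of priority, and component templates are merged into the template that references them through composed_of rather than competing with it.
  • Attach the lifecycle policy through the ism_template block of the ISM policy rather than the deprecated index.plugins.index_state_management.policy_id index setting.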
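The difference between require and include is easy to get wrong. The following is a simplified model, not OpenSearch's allocation code, of how the filters treat a node with single-valued attributes: require needs every listed value to match, include needs at least one match, and exclude rejects any match. It ignores wildcards and built-in attributes such as _name and _ip; node_eligible is a hypothetical helper.

```python
from typing import Mapping, Optional


def _values(spec: str) -> list[str]:
    """Split a comma-separated filter value, dropping blanks."""
    return [v.strip() for v in spec.split(",") if v.strip()]


def node_eligible(
    node_attrs: Mapping[str, str],
    require: Optional[Mapping[str, str]] = None,
    include: Optional[Mapping[str, str]] = None,
    exclude: Optional[Mapping[str, str]] = None,
) -> bool:
    """Simplified model of allocation filtering for one node (no wildcards)."""
    # require: the node must have the attribute, and every listed value must match it.
    for attr, spec in (require or {}).items():
        value = node_attrs.get(attr)
        if value is None or any(v != value for v in _values(spec)):
            return False
    # include: at least one listed value must match one of the node's attributes.
    if include and not any(node_attrs.get(attr) in _values(spec) for attr, spec in include.items()):
        return False
    # exclude: a match on any listed value disqualifies the node.
    for attr, spec in (exclude or {}).items():
        if node_attrs.get(attr) in _values(spec):
            return False
    return True
```

In this model, require set to "warm,cold" matches no node, because a single-valued attribute cannot equal both values; spreading an index across several tiers goes through include instead.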

ISM Policy Transition Mapping

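Index templates set the initial placement; ISM policies move indices between tiers over their lifecycle. Each state that changes tiers must declare its target through the allocation action. Without one, the index keeps the routing filter it was created with, so its shards stay on hot-tier nodes regardless of age or size thresholds. The hot state's rollover action also expects each index to have a rollover alias (set with plugins.index_state_management.rollover_alias) or to be a data stream backing index.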

JSON
PUT _plugins/_ism/policies/observability-lifecycle
{
  "policy": {
    "description": "Automated tier migration with strict allocation boundaries",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [
          { "rollover": { "min_size": "50gb", "min_index_age": "1d" } }
        ],
        "transitions": [{ "state_name": "warm", "conditions": { "min_index_age": "7d" } }]
      },
      {
        "name": "warm",
        "actions": [
          {
            "allocation": {
              "require": { "temp": "warm" },
              "wait_for": true
            }
          },
          { "shrink": { "num_new_shards": 1 } }
        ],
        "transitions": [{ "state_name": "cold", "conditions": { "min_index_age": "30d" } }]
      },
      {
        "name": "cold",
        "actions": [
          {
            "allocation": {
              "require": { "temp": "cold" },
              "wait_for": true
            }
          },
          { "force_merge": { "max_num_segments": 1 } }
        ]
      }
    ],
    "ism_template": [
      {
        "index_patterns": ["logs-app-*", "metrics-infra-*"],
        "priority": 100
      }
    ]
  }
}

Critical routing behaviors:

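  • wait_for: true keeps the index on the allocation action until its shards have relocated, so the shrink and force_merge actions that follow run on the target tier.
  • The allocation action only rewrites the index's allocation filters; the cluster allocator performs the moves. If no node carries the target temp value, existing shards cannot move and stay on their current nodes (they do not become unassigned), and with wait_for: true the lifecycle stalls until matching nodes join or an action-level timeout fails the step.
  • Shard relocation respects cluster.routing.allocation.disk.watermark.low: a node above the low watermark accepts no relocating shards. Make sure warm and cold nodes have that headroom before activating the policy.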
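A stalled allocation shows up in the ISM explain API (GET _plugins/_ism/explain/<index>). The sketch below filters a parsed explain response for indices whose current action is allocation. The field names it reads (state.name, action.name, action.failed, info.message) are assumed from the explain response format and should be checked against the cluster's version; find_stalled_allocations is a hypothetical helper.

```python
from typing import Any


def find_stalled_allocations(explain: dict[str, Any]) -> list[dict[str, Any]]:
    """Return managed indices whose current ISM action is 'allocation'.

    explain is the parsed JSON body of GET _plugins/_ism/explain/<index>.
    """
    stalled = []
    for index_name, meta in explain.items():
        # Top-level counters such as "total_managed_indices" are not per-index objects.
        if not isinstance(meta, dict):
            continue
        action = meta.get("action") or {}
        if action.get("name") != "allocation":
            continue
        stalled.append({
            "index": index_name,
            "state": (meta.get("state") or {}).get("name"),
            "failed": bool(action.get("failed", False)),
            "message": (meta.get("info") or {}).get("message", ""),
        })
    return stalled
```

Running this periodically and alerting on entries that persist across several runs separates normal relocation time from a tier that has no eligible nodes.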

Cross-Cluster Replication Tier Alignment

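By default, a CCR follower index is created from the leader index's settings, so allocation filters defined on the leader (for example, a hot-tier filter) can carry over to the follower. When leader and follower clusters use different hardware profiles or tier attributes, the follower's shards may then have no eligible nodes.

Override the inherited filter in the start request. Overriding the same setting key that the leader defines replaces its value on the follower. When the security plugin is enabled, the request also needs a use_roles block naming the leader and follower cluster roles: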

JSON
PUT _plugins/_replication/<follower-index>/_start
{
  "leader_alias": "prod-cluster",
  "leader_index": "logs-app-2024.01",
  "settings": {
    "index.routing.allocation.require.temp": "warm"
  }
}

The follower's target tier needs at least as much free capacity as the leader index occupies. A filter that no follower node satisfies leaves follower shards unassigned and stalls replication. Validate cross-cluster tier parity before enabling auto-follow patterns.
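A rough pre-flight check for that parity can compare the leader index's size with the free space on the follower's target tier. The sketch below assumes the caller has already fetched GET _cat/allocation?format=json&bytes=b from the follower cluster (disk values as byte strings), a follower node-name-to-tier mapping, and the leader index's store.size in bytes (which includes replicas). The function name and the 15% reserve, chosen to roughly mirror the default 85% low watermark, are hypothetical.

```python
from typing import Iterable, Mapping, Optional


def follower_tier_has_capacity(
    allocation_rows: Iterable[Mapping[str, Optional[str]]],
    node_tiers: Mapping[str, str],
    target_tiers: set[str],
    required_bytes: int,
    reserve_fraction: float = 0.15,
) -> bool:
    """Return True if follower nodes in target_tiers can absorb required_bytes.

    allocation_rows: parsed GET _cat/allocation?format=json&bytes=b (follower).
    node_tiers: follower node name -> tier attribute value.
    reserve_fraction: share of each disk kept free (hypothetical default).
    """
    usable = 0.0
    for row in allocation_rows:
        if node_tiers.get(row.get("node") or "") not in target_tiers:
            continue  # Other tiers, plus the UNASSIGNED pseudo-row.
        avail = float(row.get("disk.avail") or 0)
        total = float(row.get("disk.total") or 0)
        # Count only the space left after keeping the reserve free.
        usable += max(0.0, avail - total * reserve_fraction)
    return usable >= required_bytes
```

Summing free space across a tier is optimistic, since a single large shard still has to fit on one node; treat a passing result as necessary rather than sufficient.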

Python Enforcement & Drift Correction

Tier drift occurs when manual settings changes, node decommissions, or template overrides bypass ISM controls. The following script audits indices, detects missing or unexpected tier filters, and applies a corrective allocation setting through the OpenSearch REST API.

Python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util import Retry

CLUSTER_URL = "https://<cluster-endpoint>:9200"
AUTH = ("admin", "secure-password")  # Load from a secrets store in production.
TIER_SETTING = "index.routing.allocation.require.temp"  # Matches node.attr.temp on data nodes.
EXPECTED_TIERS = {"hot", "warm", "cold"}

def get_session():
    session = requests.Session()
    session.auth = AUTH
    session.verify = "/path/to/ca-bundle.crt"
    # Retry transient gateway errors with exponential backoff.
    retries = Retry(total=3, backoff_factor=0.5, status_forcelist=[502, 503, 504])
    session.mount("https://", HTTPAdapter(max_retries=retries))
    return session

def audit_tier_alignment(session):
    # _cat/indices cannot report index settings, so read them from the
    # get-settings API with flattened keys.
    resp = session.get(f"{CLUSTER_URL}/_all/_settings", params={"flat_settings": "true"}, timeout=30)
    resp.raise_for_status()
    misaligned = []
    for name, body in resp.json().items():
        if name.startswith("."):
            continue  # Leave system and plugin indices (e.g. .opendistro-ism-config) alone.
        tier = body.get("settings", {}).get(TIER_SETTING)
        if tier not in EXPECTED_TIERS:
            misaligned.append(name)
    return misaligned

def enforce_routing(session, index_name, target_tier="warm"):
    # Index settings take a flat body; "transient"/"persistent" belong to
    # the _cluster/settings API only.
    payload = {TIER_SETTING: target_tier}
    resp = session.put(f"{CLUSTER_URL}/{index_name}/_settings", json=payload, timeout=30)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    session = get_session()
    drift_indices = audit_tier_alignment(session)
    if drift_indices:
        # Caution: this assumes every drifted index belongs on warm. It does not
        # consult ISM, so a hot index missing its filter would be moved as well.
        print(f"Detected {len(drift_indices)} misaligned indices. Applying warm-tier routing...")
        for idx in drift_indices:
            try:
                enforce_routing(session, idx, "warm")
                print(f"  ✅ {idx} routed to warm")
            except requests.exceptions.HTTPError as e:
                print(f"  ❌ Failed to update {idx}: {e.response.text}")
    else:
        print("✅ All indices aligned with expected tier routing.")

Schedule this script via cron or Kubernetes CronJob to run every 15 minutes. Integrate with alerting pipelines to trigger PagerDuty notifications when drift exceeds a configurable threshold.

Debugging Routing Failures & Watermark Breaches

When shards fail to relocate or remain UNASSIGNED, isolate the allocation decision with the allocation explain API:

Shell
curl -s -X POST "https://<cluster-endpoint>:9200/_cluster/allocation/explain" -H 'Content-Type: application/json' -d '{
  "index": "logs-app-2024.01",
  "shard": 0,
  "primary": true
}'

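Inspect the deciders array under each entry in node_allocation_decisions. Common NO decisions:

  • filter: No node satisfies the index's routing.allocation filters (for example, require.temp). Verify node attributes with GET _cat/nodeattrs and bring the target tier's nodes back online.
  • disk_threshold: The node is above the low watermark (it accepts no new replicas or relocating shards) or the high watermark (shards are moved off it). At flood_stage, OpenSearch also sets an index.blocks.read_only_allow_delete block on indices with a shard on that node. Free disk space or add capacity to the target tier.
  • same_shard: The node already holds a copy of this shard, and a primary and its replica never share a node. If the target tier has fewer nodes than number_of_replicas + 1, add nodes to that tier, lower the replica count, or widen the filter with include.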
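When many nodes are involved, reading the deciders by hand gets tedious. This sketch groups the NO decisions from a parsed allocation explain response by decider name. It reads only node_allocation_decisions[].node_name and each decider's decider and decision fields, and summarize_no_decisions is a hypothetical helper.

```python
from typing import Any


def summarize_no_decisions(explain: dict[str, Any]) -> dict[str, list[str]]:
    """Map each blocking decider to the nodes where it returned NO.

    explain is the parsed JSON body of POST _cluster/allocation/explain.
    """
    summary: dict[str, list[str]] = {}
    for node in explain.get("node_allocation_decisions", []):
        for decider in node.get("deciders", []):
            if decider.get("decision") == "NO":
                name = decider.get("decider", "unknown")
                summary.setdefault(name, []).append(node.get("node_name", "?"))
    return summary
```

A result where filter lists every node points at missing tier attributes; one dominated by disk_threshold points at capacity.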

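If the disk_threshold decider is the blocker, temporarily raising the watermarks can let stuck shards allocate while capacity is added. The defaults are 85% (low), 90% (high), and 95% (flood_stage), so an override only helps if it exceeds them; the values below are examples:

JSON
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.routing.allocation.disk.watermark.high": "93%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "97%"
  }
}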

Revert the thresholds as soon as shards recover by setting the same keys to null. Persistent watermark breaches indicate a capacity-planning failure, not a routing misconfiguration.
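To decide when the override can be reverted, it helps to compare each node's disk.percent from GET _cat/allocation?format=json against the thresholds and to keep the revert body ready. This is a minimal sketch assuming percentage-based watermarks at their defaults and treating usage strictly above a threshold as crossing it (a simplification); both function names are hypothetical.

```python
import json

WATERMARK_KEYS = (
    "cluster.routing.allocation.disk.watermark.low",
    "cluster.routing.allocation.disk.watermark.high",
    "cluster.routing.allocation.disk.watermark.flood_stage",
)


def watermark_level(disk_percent: float, low: float = 85.0, high: float = 90.0,
                    flood_stage: float = 95.0) -> str:
    """Return the highest watermark a node's disk usage has crossed, or "ok"."""
    if disk_percent > flood_stage:
        return "flood_stage"
    if disk_percent > high:
        return "high"
    if disk_percent > low:
        return "low"
    return "ok"


def revert_watermarks_body() -> str:
    """JSON body for PUT _cluster/settings that clears the transient overrides."""
    # A null value removes the override, so the cluster falls back to its default.
    return json.dumps({"transient": {key: None for key in WATERMARK_KEYS}})
```

Once every node on the target tier reports "ok" against the default thresholds, sending the revert body restores normal protection.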

For complete routing policy definitions and cluster topology validation, see the Node Role Allocation guidelines. The official OpenSearch documentation covers hot-warm architecture and ISM policy syntax in more detail.