Security & Access Boundaries

Every Index State Management (ISM) action runs as someone — the security context of whoever attached the policy is the context that later deletes indices, rewrites routing filters, and drives replication checkpoints, all unattended in a background scheduler. When those boundaries are loose, an over-privileged service account turns a routine rollover into a cross-tenant data-loss event: a policy scoped with cluster:admin inherits the power to detach retention locks, a shared credential lets a follower cluster issue administrative calls against its leader, and an unaudited _plugins/_ism/* endpoint becomes the quietest privilege-escalation path in the OpenSearch cluster. This guide fixes those boundaries in place — the permission-to-action mapping, the scoped roles, the Cross-Cluster Replication (CCR) credential split, the Python that deploys them, and the audit trail that proves they held — building directly on the OpenSearch ISM Architecture & Fundamentals execution model that schedules every managed transition.

Permission-to-action alignment

Least-privilege for ISM is not a single role — it is a per-phase contract. Each lifecycle action a policy runs maps to a specific security-plugin permission, and that permission should live on a dedicated principal that holds only what its phase needs. The table below is the contract the rest of this page enforces: for each managed action, the exact permission, the principal that should carry it, and the tier context it operates in. Grant nothing above the row a principal actually executes.

Lifecycle action	Required permission	Executing principal	Tier context
Author / update policy	`cluster:admin/opendistro/ism/policy/write`	Policy architect (human, MFA)	Cluster-wide, deploy-time only
Attach / detach policy	`indices:admin/opendistro/ism/managedindex`	CI/CD deploy account	Scoped to `logs-*` index patterns
Rollover (hot)	`indices:admin/opendistro/ism/*`, `indices:admin/rollover`	ISM scheduler context	`data_hot`
Allocation reroute (warm/cold)	`indices:admin/settings/update`	ISM scheduler context	`data_warm`, `data_cold`
Force merge / shrink	`indices:admin/forcemerge`, `indices:admin/resize`	ISM scheduler context	`data_warm`
Delete (retention)	`indices:admin/delete`	ISM scheduler context (locked)	`data_cold`, `data_frozen`
Manual transition	`cluster:admin/opendistro/ism/change`	On-call operator, break-glass	Any, audited
CCR follow / checkpoint	`indices:admin/plugins/replication/*`	Dedicated CCR principal	Follower cluster only

The decisive column is executing principal. When a rollover permission and a delete permission share one account, a compromised ingest token can erase compliance data; when they are separate roles, the blast radius of that token is a single rollover. The tiers this table references — and the routing attributes that bind an index to a tier — come from Node Role Allocation, and the sizing of each tier is set in Hot-Warm-Cold Tier Design. This page governs who is allowed to move an index between them. The full role-definition walkthrough, with copy-paste roles.yml and roles_mapping.yml, lives in Securing ISM policies with role-based access control.
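One way to make the per-phase contract checkable is to encode the table as data and diff a role against it. The sketch below is illustrative only: `PHASE_PERMISSIONS` copies the permission strings from the table above, while the phase keys and the `excess_permissions` helper are hypothetical names for this example, not part of any OpenSearch API.

```python
# Permissions per lifecycle phase, copied from the table above.
# The phase keys ("rollover", "reroute", ...) are hypothetical labels for this sketch.
PHASE_PERMISSIONS = {
    "rollover": {"indices:admin/opendistro/ism/*", "indices:admin/rollover"},
    "reroute": {"indices:admin/settings/update"},
    "merge": {"indices:admin/forcemerge", "indices:admin/resize"},
    "delete": {"indices:admin/delete"},
}


def excess_permissions(granted, phases):
    """Return the granted permissions that none of the listed phases requires."""
    needed = set()
    for phase in phases:
        needed |= PHASE_PERMISSIONS[phase]
    return set(granted) - needed


# A role that should only ever roll over, but also holds delete.
ingest_role = {
    "indices:admin/opendistro/ism/*",
    "indices:admin/rollover",
    "indices:admin/delete",
}
print(sorted(excess_permissions(ingest_role, ["rollover"])))
```

Here the only excess permission reported is `indices:admin/delete`, which is exactly the grant that would let a compromised rollover token erase retained data.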

How the security context propagates

The subtle part of ISM security is that authorization is evaluated twice, at two different times, under two different identities. At attach time, the security plugin checks the calling principal against the index-level permission for managedindex. At execution time — minutes or weeks later, inside the background scheduler — each action re-authorizes against the captured context that was stored with the managed index. That stored context is what actually deletes your index, not the account you were logged in as when the transition fired.

Two rules follow directly. First, the context captured at attach time must already be least-privilege, because there is no interactive identity to fall back on when the scheduler runs at 03:00. Second, a broad context captured once is a standing liability: if a policy is attached by an admin, every future delete on that index runs with admin rights, silently overriding the index-level require filters that Data Tier Routing Patterns depends on. Attach policies with the scoped deploy account, never with a superuser.

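The captured-context lifecycle mirrors the index lifecycle itself: it is created at attach, replayed at each phase, and released only at delete. A phase whose action the captured context cannot authorize does not run; the managed index stops in a blocked state instead.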

A transition that lands in Blocked is the system working correctly — an under-scoped context refused an action rather than escalating. The recovery is to fix the role and retry the managed index, never to widen the policy’s captured context. When a reroute is blocked because a tier has no eligible node rather than because of a permission, that is a routing problem handled by Fallback Routing Strategies; the Index Lifecycle Basics transition conditions decide when each of these gates is evaluated.
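The two-stage evaluation described in this section can be modelled in a few lines. This is a toy simulation, not the security plugin's implementation: `CapturedContext`, `attach`, `execute`, and the `svc-ism-deploy` principal are hypothetical names, and wildcard matching is approximated with Python's `fnmatch`.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class CapturedContext:
    """Toy stand-in for the security context stored with a managed index."""
    principal: str
    grants: tuple  # (index_pattern, action_pattern) pairs


def is_allowed(ctx, index, action):
    """True if any grant covers both the index and the action."""
    return any(
        fnmatchcase(index, index_pattern) and fnmatchcase(action, action_pattern)
        for index_pattern, action_pattern in ctx.grants
    )


def attach(caller, index):
    """Check 1, at attach time: authorize the caller, then freeze its context."""
    if not is_allowed(caller, index, "indices:admin/opendistro/ism/managedindex"):
        raise PermissionError(f"{caller.principal} may not manage {index}")
    return caller  # this frozen context is what the scheduler replays later


def execute(stored, index, action):
    """Check 2, at execution time: re-authorize against the stored context only."""
    return "Executed" if is_allowed(stored, index, action) else "Blocked"


deploy = CapturedContext("svc-ism-deploy", (
    ("logs-prod-*", "indices:admin/opendistro/ism/*"),
    ("logs-prod-*", "indices:admin/rollover"),
))
stored = attach(deploy, "logs-prod-2026.07")
print(execute(stored, "logs-prod-2026.07", "indices:admin/rollover"))
print(execute(stored, "logs-prod-2026.07", "indices:admin/delete"))
```

The rollover runs and the delete is blocked, because `execute` never consults anything except the context frozen by `attach`. Swapping in a context with a `("*", "*")` grant makes every later action succeed, which is the standing liability described above.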

Step-by-step boundary configuration

The four steps below stand up enforceable boundaries for a logs-prod-* index set: a scoped scheduler role, a role mapping that binds it to a service account, an isolated CCR credential, and a verification pass that proves the boundary rejects what it should.

1. Scoped ISM roles

Define a role that carries exactly the lifecycle permissions the scheduler needs, restricted to the target index pattern, with nothing cluster-wide except the ISM read scope. Leave the policy-write permission off this role: security-plugin roles are additive, so there is no deny rule to add, and policy authoring belongs to a separate human-owned role.

HTTP

PUT _plugins/_security/api/roles/ism_scheduler_logs
{
  "cluster_permissions": [
    "cluster:monitor/opendistro/ism/policy/read"
  ],
  "index_permissions": [
    {
      "index_patterns": ["logs-prod-*"],
      "allowed_actions": [
        "indices:admin/opendistro/ism/*",
        "indices:admin/rollover",
        "indices:admin/settings/update",
        "indices:admin/forcemerge",
        "indices:admin/delete"
      ]
    }
  ]
}

The index_patterns scope is the boundary. A role that grants indices:admin/delete on * is functionally a superuser for data destruction; the same permission scoped to logs-prod-* cannot touch a single index outside that pattern, no matter what policy captures it.
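A role definition can be linted for this before it is ever PUT to the cluster. The following is a minimal sketch that assumes the role is available as a Python dict in the same shape as the request body above; `broad_delete_patterns` is a hypothetical helper, and it does not resolve security-plugin action groups, only literal and wildcard action strings.

```python
from fnmatch import fnmatchcase

DELETE_ACTION = "indices:admin/delete"


def broad_delete_patterns(role):
    """Return wildcard-only index patterns on which the role can delete indices."""
    findings = []
    for block in role.get("index_permissions", []):
        grants_delete = any(
            fnmatchcase(DELETE_ACTION, action)
            for action in block.get("allowed_actions", [])
        )
        if not grants_delete:
            continue
        for pattern in block.get("index_patterns", []):
            # A pattern made only of "*" characters matches every index.
            if pattern and pattern.strip("*") == "":
                findings.append(pattern)
    return findings


scoped_role = {
    "index_permissions": [{
        "index_patterns": ["logs-prod-*"],
        "allowed_actions": ["indices:admin/rollover", "indices:admin/delete"],
    }]
}
print(broad_delete_patterns(scoped_role))
```

The scoped role yields an empty list. A CI job could fail the build whenever the list is non-empty.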

2. Service-account role mapping

Bind the role to a dedicated backend identity — never a human user, never the ingest account. Use a backend role so the mapping survives credential rotation.

HTTP

PUT _plugins/_security/api/rolesmapping/ism_scheduler_logs
{
  "backend_roles": ["svc-ism-scheduler"],
  "hosts": [],
  "users": []
}

Keep the policy-author mapping separate and human-only, gated behind MFA at the identity provider. The deploy account that runs attach operations (the attach_policy call in the automation below) gets its own mapping with indices:admin/opendistro/ism/managedindex and nothing else.
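The "backend roles only" rule is easy to check mechanically. This sketch assumes the mapping is held as a dict shaped like the request body in this step; `mapping_problems` is a hypothetical helper name.

```python
def mapping_problems(mapping):
    """Return reasons a role mapping breaks the 'backend roles only' rule."""
    problems = []
    if mapping.get("users"):
        problems.append("binds named users: " + ", ".join(mapping["users"]))
    if not mapping.get("backend_roles"):
        problems.append("declares no backend role")
    return problems


scheduler_mapping = {"backend_roles": ["svc-ism-scheduler"], "hosts": [], "users": []}
print(mapping_problems(scheduler_mapping))
```

An empty list means the mapping binds the role through a backend role alone, so rotating the underlying credential does not require touching the mapping.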

3. CCR credential isolation and policy JSON

CCR is a distinct security perimeter: the follower authenticates to the leader on its own principal, scoped to indices:admin/plugins/replication/* and the replicated pattern only. Never reuse ingest, query, or ISM-scheduler credentials on a replication endpoint, and never let a follower hold administrative permissions on the leader. Enforce mutual TLS between clusters and pin the CCR port to a dedicated security group so a leaked follower token cannot open a general control channel.

The ISM policy itself declares no credentials — it inherits the captured scheduler context — but it should declare the retention hold that the delete gate checks, so destruction is never a race:

HTTP

PUT _plugins/_ism/policies/logs-prod-secure
{
  "policy": {
    "description": "Least-privilege lifecycle for logs-prod-*, delete gated on retention",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [
          { "rollover": { "min_primary_shard_size": "50gb", "min_index_age": "1d" } }
        ],
        "transitions": [{ "state_name": "warm", "conditions": { "min_index_age": "7d" } }]
      },
      {
        "name": "warm",
        "actions": [
          {
            "retry": { "count": 3, "backoff": "exponential", "delay": "15m" },
            "allocation": { "require": { "data": "warm" }, "wait_for": true }
          },
          { "force_merge": { "max_num_segments": 1 } }
        ],
        "transitions": [{ "state_name": "delete", "conditions": { "min_index_age": "90d" } }]
      },
      {
        "name": "delete",
        "actions": [{ "delete": {} }]
      }
    ]
  }
}

The 90-day gate on the delete transition is a security control, not just a lifecycle one: it bounds how quickly the captured scheduler context can act destructively, giving audit and drift detection a window to catch a mis-scoped policy before data is gone. Credentials for the deploy account and CCR principal are injected from a secrets manager at runtime, following OWASP Secrets Management guidelines — never baked into the policy or the image.
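Because the delete gate doubles as a security control, it is worth asserting in CI rather than trusting by eye. The sketch below assumes the policy is loaded as a dict in the shape shown above and that ages use a single integer with a `d`, `h`, `m`, or `s` suffix; `parse_age` and `delete_gate` are hypothetical helpers, and conditions other than `min_index_age` are ignored (a transition without it is treated as immediate).

```python
import re
from datetime import timedelta
from typing import Optional

_UNITS = {"d": "days", "h": "hours", "m": "minutes", "s": "seconds"}


def parse_age(value: str) -> timedelta:
    """Parse an age such as '90d' or '12h' into a timedelta."""
    match = re.fullmatch(r"(\d+)([dhms])", value)
    if match is None:
        raise ValueError(f"unsupported age format: {value!r}")
    return timedelta(**{_UNITS[match.group(2)]: int(match.group(1))})


def delete_gate(policy: dict) -> Optional[timedelta]:
    """Smallest min_index_age on any transition into a state that deletes."""
    states = policy["policy"]["states"]
    deleting = {
        state["name"]
        for state in states
        if any("delete" in action for action in state.get("actions", []))
    }
    ages = [
        # A transition with no min_index_age is treated as an immediate delete.
        parse_age(transition.get("conditions", {}).get("min_index_age", "0d"))
        for state in states
        for transition in state.get("transitions", [])
        if transition["state_name"] in deleting
    ]
    return min(ages) if ages else None


policy = {"policy": {"default_state": "warm", "states": [
    {"name": "warm", "actions": [],
     "transitions": [{"state_name": "delete", "conditions": {"min_index_age": "90d"}}]},
    {"name": "delete", "actions": [{"delete": {}}]},
]}}
assert delete_gate(policy) >= timedelta(days=90)
```

The final assertion is the kind of check a pipeline could run before deploying a policy change: it fails if anyone shortens the gate below the retention period.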

4. Verification

A boundary you have not tested to fail is not a boundary. Confirm both that the scheduler role can do its job and that it is refused everything outside its scope.

Shell

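# The scheduler role can read and manage its own pattern (expect 200 on the last line)
curl -s -u svc-ism-scheduler:*** -w "\n%{http_code}\n" \
  "https://<cluster>:9200/_plugins/_ism/explain/logs-prod-2026.07?pretty"

# The scheduler role is refused an out-of-scope index (expect 403 security_exception).
# Point this at a disposable index: an over-scoped role will really delete it.
curl -s -u svc-ism-scheduler:*** -w "\n%{http_code}\n" \
  -X DELETE "https://<cluster>:9200/payments-2026.07"

# The scheduler role cannot author policies (expect 403). The body is a minimal
# well-formed policy so a validation error cannot mask the permission check.
curl -s -u svc-ism-scheduler:*** -w "\n%{http_code}\n" \
  -X PUT "https://<cluster>:9200/_plugins/_ism/policies/should-fail" \
  -H "Content-Type: application/json" \
  -d '{"policy":{"description":"should fail","default_state":"x","states":[{"name":"x","actions":[],"transitions":[]}]}}'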

If the second or third call returns anything other than 403, the role is over-scoped — narrow index_patterns or remove the offending allowed_action before deploying.
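The same three probes can be expressed as data so a pipeline can report exactly which expectation failed. This is a sketch under assumptions: `PROBES` mirrors the curl calls above, `failed_probes` is a hypothetical helper, and the HTTP call is injected as a `send(method, path)` callable returning a status code, so the logic can be exercised without a cluster. As with the curl version, the delete probe should target a disposable index.

```python
from typing import Callable, List

# (description, method, path, expected HTTP status), mirroring the curl probes above.
PROBES = [
    ("explain in-scope index", "GET", "/_plugins/_ism/explain/logs-prod-2026.07", 200),
    ("delete out-of-scope index", "DELETE", "/payments-2026.07", 403),
    ("author a policy", "PUT", "/_plugins/_ism/policies/should-fail", 403),
]


def failed_probes(send: Callable[[str, str], int]) -> List[str]:
    """Run every probe through `send` and describe each unexpected status."""
    failures = []
    for description, method, path, expected in PROBES:
        actual = send(method, path)
        if actual != expected:
            failures.append(f"{description}: expected {expected}, got {actual}")
    return failures


def over_scoped_cluster(method: str, path: str) -> int:
    """Stand-in for a cluster whose scheduler role can delete any index."""
    return 200 if method in ("GET", "DELETE") else 403


print(failed_probes(over_scoped_cluster))
```

Against the stand-in, only the delete probe is reported, because it returned 200 where a 403 was required. In real use, `send` would wrap an authenticated HTTP session for the scheduler account.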

Production automation with opensearch-py

Manual attachment leaks inconsistent security contexts: one operator attaches as admin, the next as the deploy account, and the captured contexts diverge silently. The class below makes attachment deterministic. It authenticates only as the scoped deploy account, verifies TLS against a pinned CA, deploys and attaches policies idempotently with backed-off retries, and reads back the policy binding so CI can assert which policy manages an index before merge.

Python

import os
import logging
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

logging.basicConfig(level=logging.INFO, format="%(asctime)s | %(levelname)s | %(message)s")
logger = logging.getLogger(__name__)


class ISMBoundaryDeployer:
    """Deploy and attach ISM policies as a scoped, non-admin service account."""

    def __init__(self, base_url: str, ca_cert: str) -> None:
        self.base_url = base_url.rstrip("/")
        self.session = requests.Session()
        # Credentials come from the secrets manager, never the source tree.
        self.session.auth = (
            os.environ["OPENSEARCH_DEPLOY_USER"],
            os.environ["OPENSEARCH_DEPLOY_PASSWORD"],
        )
        self.session.verify = ca_cert  # pinned CA, not `True` and never `False`
        retry = Retry(total=3, backoff_factor=1.5,
                      status_forcelist=[429, 500, 502, 503, 504],
                      allowed_methods=["GET", "PUT", "POST"])
        self.session.mount("https://", HTTPAdapter(max_retries=retry))

    def deploy_policy(self, policy_id: str, policy: dict) -> bool:
        """Create a policy; a 409 means the ID already exists, not that it matches."""
        url = f"{self.base_url}/_plugins/_ism/policies/{policy_id}"
        try:
            resp = self.session.put(url, json=policy, timeout=10)
            if resp.status_code == 409:
                # An unconditional PUT on an existing policy is rejected as a version
                # conflict; updating it requires if_seq_no and if_primary_term.
                logger.warning("Policy '%s' already exists; content not verified", policy_id)
                return True
            resp.raise_for_status()
            logger.info("Deployed policy '%s'", policy_id)
            return True
        except requests.exceptions.RequestException as exc:
            logger.error("Policy deploy failed for '%s': %s", policy_id, exc)
            return False

    def attach_policy(self, policy_id: str, index_pattern: str) -> bool:
        """Attach as the scoped deploy account, capturing a least-privilege context."""
        url = f"{self.base_url}/_plugins/_ism/add/{index_pattern}"
        try:
            self.session.post(url, json={"policy_id": policy_id}, timeout=10).raise_for_status()
            logger.info("Attached '%s' to '%s'", policy_id, index_pattern)
            return True
        except requests.exceptions.RequestException as exc:
            logger.error("Attach failed: %s", exc)
            return False

    def assert_scope(self, index: str) -> str:
        """Read back which policy manages the index so CI can gate on the binding.

        Returns the policy ID reported by the explain API, not the stored
        security context itself.
        """
        url = f"{self.base_url}/_plugins/_ism/explain/{index}"
        try:
            resp = self.session.get(url, timeout=10)
            resp.raise_for_status()
            info = resp.json().get(index, {})
            policy_id = info.get("policy_id", "UNKNOWN")
            logger.info("Managed '%s' under policy '%s'", index, policy_id)
            return policy_id
        except requests.exceptions.RequestException as exc:
            logger.error("Scope assertion failed: %s", exc)
            return "ERROR"


if __name__ == "__main__":
    deployer = ISMBoundaryDeployer(
        base_url=os.getenv("OPENSEARCH_URL", "https://localhost:9200"),
        ca_cert=os.getenv("OPENSEARCH_CA_CERT", "/etc/opensearch/ca.pem"),
    )
    policy = {"policy": {"description": "scoped", "default_state": "hot", "states": [
        {"name": "hot", "actions": [{"rollover": {"min_index_age": "1d"}}], "transitions": []}
    ]}}
    if deployer.deploy_policy("logs-prod-secure", policy):
        deployer.attach_policy("logs-prod-secure", "logs-prod-*")
        deployer.assert_scope("logs-prod-2026.07")

Run this from CI on every policy change so attachment always happens under the same scoped identity, and on a short cron to catch drift. For the full permission model behind the roles it assumes, see the OpenSearch Security Plugin Documentation.

Operational guardrails

Boundaries erode between deploys. The settings below keep the audit trail intact and the transport locked, so a mis-scoped policy or a leaked credential surfaces as an alert rather than a post-mortem. Enable audit logging before the first policy is attached — you cannot reconstruct a context you never logged.

Setting	Recommended value	Effect
`plugins.security.audit.type`	`internal_opensearch`	Persists audit events to a searchable index
`plugins.security.audit.config.enable_transport`	`true`	Logs transport-layer ISM scheduler actions, not just REST
`plugins.security.audit.config.resolve_bulk_requests`	`true`	Expands bulk deletes so retention actions are individually logged
`plugins.security.ssl.transport.enforce_hostname_verification`	`true`	Blocks a rogue follower from impersonating a node
`plugins.security.restapi.roles_enabled`	`["all_access"]` (author role only)	Limits who can edit roles at runtime
`plugins.replication.follower.metadata_sync_interval`	`60s`	Bounds how stale a follower’s inherited settings can drift
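A scheduled job can compare a node's effective settings against the table. The sketch below covers only the four boolean and string rows, assumes the settings arrive as a flat dict of dotted keys (for example parsed from opensearch.yml), and uses hypothetical names `RECOMMENDED` and `settings_drift`.

```python
# Recommended values from the table above, normalised to lowercase strings.
RECOMMENDED = {
    "plugins.security.audit.type": "internal_opensearch",
    "plugins.security.audit.config.enable_transport": "true",
    "plugins.security.audit.config.resolve_bulk_requests": "true",
    "plugins.security.ssl.transport.enforce_hostname_verification": "true",
}


def settings_drift(flat_settings):
    """Map each drifted key to (recommended, actual); actual is None when unset."""
    drift = {}
    for key, want in RECOMMENDED.items():
        actual = flat_settings.get(key)
        # Compare as lowercase strings so YAML booleans and "true" are treated alike.
        if actual is None or str(actual).lower() != want:
            drift[key] = (want, actual)
    return drift


node_settings = {
    "plugins.security.audit.type": "internal_opensearch",
    "plugins.security.audit.config.enable_transport": False,
    "plugins.security.audit.config.resolve_bulk_requests": True,
}
for key, (want, actual) in sorted(settings_drift(node_settings).items()):
    print(f"{key}: want {want}, found {actual}")
```

For the sample input, the disabled transport audit and the unset hostname verification are reported, which are the two gaps that would hide scheduler-driven actions or weaken the transport boundary.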

Beyond settings, treat access boundaries as immutable configuration: reconcile deployed policies against a Git source of truth on a schedule, and require every index template to declare index.plugins.index_state_management.policy_id explicitly so no index is ever managed by an unexpected — and unaudited — captured context. The routing attributes those templates also declare are covered in Node Role Allocation; here the point is that the policy_id binding is itself a security control.

Troubleshooting access failures

Each failure below pairs a diagnosis command with the corrective action. The recurring anti-pattern to resist: never fix a 403 by widening the policy’s captured context — fix the role.

security_exception on a scheduled transition. The captured scheduler context lacks the permission for this phase’s action. Read which action was refused, then add that single permission to the scoped role — not to the policy.

Shell

curl -s "https://<cluster>:9200/_plugins/_ism/explain/logs-prod-2026.07?pretty" | grep -A3 '"info"'
# Fix: PUT the missing allowed_action onto ism_scheduler_logs, then _ism/retry the index

Policy attached with an over-broad context. An index is being managed under an admin context because it was attached interactively. Detach and re-attach as the scoped deploy account to recapture a least-privilege context.

Shell

curl -s "https://<cluster>:9200/_plugins/_ism/explain/logs-prod-2026.07?pretty"
# Fix: POST _plugins/_ism/remove/logs-prod-2026.07 then re-add via the CI deploy account

CCR follower rejected by the leader. The follower principal is missing the replication permission, or its mTLS certificate failed hostname verification. Check the leader’s audit index, then re-scope the CCR principal — do not grant it cluster admin.

Shell

curl -s "https://<leader>:9200/security-auditlog-*/_search?q=audit_category:MISSING_PRIVILEGES+AND+audit_request_privilege:*replication*&size=5"
# Fix: grant indices:admin/plugins/replication/* on the follower pattern only, verify the cert SAN

Manual transition silently ignored. An operator called _plugins/_ism/change_policy without holding cluster:admin/opendistro/ism/change, so the request was refused without an error the operator noticed. Confirm the permission, then use the audited break-glass role.

Shell

curl -s -u <operator>:*** -X POST "https://<cluster>:9200/_plugins/_ism/change_policy/logs-prod-2026.07" \
  -H "Content-Type: application/json" -d '{"policy_id":"logs-prod-secure","state":"warm"}'
# Fix: assume the break-glass role that carries ism/change, which logs the actor and reason

Audit log has gaps around scheduler actions. Transitions are firing but only REST calls appear in the audit index. Transport-layer audit is disabled, so scheduler-driven deletes are invisible. Enable transport audit and re-test.

Shell

curl -s "https://<cluster>:9200/_cluster/settings?include_defaults=true&filter_path=**.audit.config.enable_transport"
# Fix: PUT plugins.security.audit.config.enable_transport=true, then confirm scheduler deletes log

Frequently asked questions

Which identity actually runs an ISM delete — the attacher or the scheduler?

Neither, exactly. At attach time the security plugin captures a security context and stores it with the managed index; the background scheduler replays that captured context at each phase, including delete. The effective identity is therefore the attacher’s, frozen at attach time rather than looked up when the action fires. This is why you must attach with a scoped deploy account and never with a superuser: admin rights, once captured, drive every future destructive action on that index.

Can I restrict ISM to specific index patterns rather than the whole cluster?

Yes, and you should. Put the lifecycle permissions (indices:admin/opendistro/ism/*, rollover, delete, forcemerge, settings/update) inside an index_permissions block scoped to your pattern, e.g. logs-prod-*. A role scoped this way cannot touch an index outside the pattern regardless of what policy captures it, because the fine-grained access control (FGAC) layer evaluates the index-level permission before the action runs.

Why keep CCR credentials separate from ISM and ingest credentials?

Because they cross a trust boundary the others do not. A CCR principal authenticates from the follower cluster to the leader; if it shares a credential with ingest or the ISM scheduler, a compromise on the follower becomes a foothold on the leader with lifecycle and write rights. Scope the CCR principal to indices:admin/plugins/replication/* on the replicated pattern only, enforce mutual TLS, and store the credential in a secrets manager injected at runtime.

How do I prove to an auditor that a retention delete was authorized?

Enable internal audit logging with enable_transport: true and resolve_bulk_requests: true before any policy is attached, so scheduler-driven deletes are logged individually at the transport layer. Each delete then carries the captured context’s identity, the index, and the timestamp in the security-auditlog-* index. Ship those to your SIEM and gate alerts on any delete whose context is not the scoped scheduler role.
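The alert rule in the last sentence reduces to a simple filter over audit documents. This sketch assumes each event is a dict carrying `audit_request_privilege` and `audit_request_effective_user` fields; verify those field names against the documents in your own audit index before relying on them. `unexpected_deletes` and the sample events are hypothetical.

```python
def unexpected_deletes(events, allowed_users):
    """Return index-delete audit events whose effective user is not allowed."""
    return [
        event for event in events
        if event.get("audit_request_privilege") == "indices:admin/delete"
        and event.get("audit_request_effective_user") not in allowed_users
    ]


sample_events = [
    {"audit_request_privilege": "indices:admin/delete",
     "audit_request_effective_user": "svc-ism-scheduler"},
    {"audit_request_privilege": "indices:admin/delete",
     "audit_request_effective_user": "admin"},
    {"audit_request_privilege": "indices:admin/rollover",
     "audit_request_effective_user": "admin"},
]
for event in unexpected_deletes(sample_events, {"svc-ism-scheduler"}):
    print("ALERT: delete by", event["audit_request_effective_user"])
```

Only the delete performed as `admin` is flagged; the scheduler's own delete and the unrelated rollover pass through.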

Related

Securing ISM policies with role-based access control: the full roles.yml / roles_mapping.yml walkthrough behind this page’s permission model.
Node Role Allocation: the tier attributes and node roles the permission table maps each action onto.
Hot-Warm-Cold Tier Design: sizing the tiers whose transitions these boundaries authorize.
Data Tier Routing Patterns: how ISM stamps routing filters that a broad captured context can override.
Index Lifecycle Basics: the transition conditions that decide when each authorization gate is evaluated.
Fallback Routing Strategies: distinguishing a permission block from a no-eligible-node block during a transition.
