Ensure OpenSearch Domains Have Fault-Tolerant Data Nodes
Overview
This check verifies that your Amazon OpenSearch Service domains are configured for fault tolerance. A fault-tolerant domain requires:
- At least 3 data nodes, so shards and their replicas can sit on separate nodes and the cluster keeps a quorum if one node fails
- Zone Awareness enabled to distribute data across multiple Availability Zones
Without these settings, a single node or zone failure could cause data loss or service outages.
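The two criteria above can be expressed as a small check. The following is a minimal sketch, not the check's actual implementation: it assumes you already have the domain's ClusterConfig as a Python dict (for example, parsed from describe-domain output), and the function name fault_tolerance_findings is hypothetical.

```python
from __future__ import annotations

from typing import Any

# Threshold taken from the requirement stated above.
MIN_DATA_NODES = 3


def fault_tolerance_findings(cluster_config: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means both criteria are met."""
    findings: list[str] = []

    # InstanceCount is the number of data nodes in the domain.
    instance_count = cluster_config.get("InstanceCount") or 0
    if instance_count < MIN_DATA_NODES:
        findings.append(
            f"only {instance_count} data node(s); at least {MIN_DATA_NODES} required"
        )

    # ZoneAwarenessEnabled controls distribution across Availability Zones.
    if not cluster_config.get("ZoneAwarenessEnabled", False):
        findings.append("Zone Awareness is disabled")

    return findings


if __name__ == "__main__":
    single_az = {"InstanceCount": 2, "ZoneAwarenessEnabled": False}
    multi_az = {
        "InstanceCount": 3,
        "ZoneAwarenessEnabled": True,
        "ZoneAwarenessConfig": {"AvailabilityZoneCount": 3},
    }
    print(fault_tolerance_findings(single_az))
    print(fault_tolerance_findings(multi_az))
```

A domain passes only when the returned list is empty, which makes it easy to report every failed criterion at once rather than stopping at the first.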
Risk
If your OpenSearch domain lacks fault tolerance:
- A single node failure could make data shards unavailable
- An Availability Zone outage could take down your entire cluster
- Write operations may fail during node recovery
- Data inconsistency can occur during rebalancing
- Your search and analytics applications may experience downtime
Remediation Steps
Prerequisites
- Access to the AWS Console with permissions to modify OpenSearch domains, OR
- AWS CLI configured with appropriate credentials
- Your domain must support at least 3 nodes (check instance type limits)
AWS Console Method
- Sign in to the AWS Console and navigate to Amazon OpenSearch Service
- Select your domain from the list
- Click Edit domain
- Under Cluster configuration:
  - Set Number of data nodes to 3 or more
  - Enable Zone Awareness
  - Set Availability Zones to 3 (recommended) or 2
- Review your changes and click Submit
Note: Configuration changes can take 15-30 minutes or longer to complete, depending on the size of the domain. The domain status will show "Processing" during the update.
AWS CLI (optional)
Update your domain to enable fault tolerance:
aws opensearch update-domain-config \
  --domain-name <your-domain-name> \
  --cluster-config '{
    "InstanceCount": 3,
    "ZoneAwarenessEnabled": true,
    "ZoneAwarenessConfig": {
      "AvailabilityZoneCount": 3
    }
  }' \
  --region us-east-1
Replace <your-domain-name> with your actual domain name.
For 2 Availability Zones (if 3-AZ is not available in your region), use an even number of data nodes so they split evenly across the two zones; OpenSearch Service rejects odd node counts for two-AZ deployments:
aws opensearch update-domain-config \
  --domain-name <your-domain-name> \
  --cluster-config '{
    "InstanceCount": 4,
    "ZoneAwarenessEnabled": true,
    "ZoneAwarenessConfig": {
      "AvailabilityZoneCount": 2
    }
  }' \
  --region us-east-1
Check the current configuration:
aws opensearch describe-domain \
  --domain-name <your-domain-name> \
  --query 'DomainStatus.ClusterConfig' \
  --region us-east-1
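If you script these updates, the --cluster-config payload can be generated instead of hand-edited. The sketch below is one possible approach, using only the fields shown in the commands above; the function name build_cluster_config is hypothetical, and the validation reflects this guide's requirements rather than every rule the service enforces.

```python
import json


def build_cluster_config(instance_count: int, az_count: int) -> str:
    """Return a JSON string for the --cluster-config option."""
    # This guide requires at least 3 data nodes.
    if instance_count < 3:
        raise ValueError("use at least 3 data nodes")
    # This guide covers 2-AZ and 3-AZ deployments only.
    if az_count not in (2, 3):
        raise ValueError("AvailabilityZoneCount must be 2 or 3")

    payload = {
        "InstanceCount": instance_count,
        "ZoneAwarenessEnabled": True,
        "ZoneAwarenessConfig": {"AvailabilityZoneCount": az_count},
    }
    return json.dumps(payload, indent=2)


if __name__ == "__main__":
    # Print a payload for 3 data nodes across 3 Availability Zones.
    print(build_cluster_config(3, 3))
```

The printed JSON can be saved to a file or passed to the update-domain-config command as the --cluster-config value.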
CloudFormation (optional)
Use this template to create or update an OpenSearch domain with fault-tolerant configuration:
AWSTemplateFormatVersion: '2010-09-09'
Description: OpenSearch domain with fault-tolerant configuration
Parameters:
  DomainName:
    Type: String
    Description: Name of the OpenSearch domain
    Default: my-opensearch-domain
  InstanceType:
    Type: String
    Description: Instance type for data nodes
    Default: r6g.large.search
Resources:
  OpenSearchDomain:
    Type: AWS::OpenSearchService::Domain
    Properties:
      DomainName: !Ref DomainName
      EngineVersion: OpenSearch_2.11
      ClusterConfig:
        InstanceType: !Ref InstanceType
        InstanceCount: 3
        ZoneAwarenessEnabled: true
        ZoneAwarenessConfig:
          AvailabilityZoneCount: 3
        DedicatedMasterEnabled: true
        DedicatedMasterType: m6g.large.search
        DedicatedMasterCount: 3
      EBSOptions:
        EBSEnabled: true
        VolumeType: gp3
        VolumeSize: 100
      NodeToNodeEncryptionOptions:
        Enabled: true
      EncryptionAtRestOptions:
        Enabled: true
      DomainEndpointOptions:
        EnforceHTTPS: true
        TLSSecurityPolicy: Policy-Min-TLS-1-2-2019-07
Outputs:
  DomainArn:
    Description: ARN of the OpenSearch domain
    Value: !GetAtt OpenSearchDomain.Arn
  DomainEndpoint:
    Description: Endpoint of the OpenSearch domain
    Value: !GetAtt OpenSearchDomain.DomainEndpoint
Deploy the stack:
aws cloudformation deploy \
  --template-file opensearch-fault-tolerant.yaml \
  --stack-name opensearch-fault-tolerant \
  --parameter-overrides DomainName=<your-domain-name> \
  --region us-east-1
Terraform (optional)
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

variable "domain_name" {
  description = "Name of the OpenSearch domain"
  type        = string
  default     = "my-opensearch-domain"
}

variable "instance_type" {
  description = "Instance type for data nodes"
  type        = string
  default     = "r6g.large.search"
}

resource "aws_opensearch_domain" "main" {
  domain_name    = var.domain_name
  engine_version = "OpenSearch_2.11"

  cluster_config {
    instance_type          = var.instance_type
    instance_count         = 3
    zone_awareness_enabled = true

    zone_awareness_config {
      availability_zone_count = 3
    }

    dedicated_master_enabled = true
    dedicated_master_type    = "m6g.large.search"
    dedicated_master_count   = 3
  }

  ebs_options {
    ebs_enabled = true
    volume_type = "gp3"
    volume_size = 100
  }

  encrypt_at_rest {
    enabled = true
  }

  node_to_node_encryption {
    enabled = true
  }

  domain_endpoint_options {
    enforce_https       = true
    tls_security_policy = "Policy-Min-TLS-1-2-2019-07"
  }

  tags = {
    Environment = "production"
  }
}

output "domain_arn" {
  description = "ARN of the OpenSearch domain"
  value       = aws_opensearch_domain.main.arn
}

output "domain_endpoint" {
  description = "Endpoint of the OpenSearch domain"
  value       = aws_opensearch_domain.main.endpoint
}
Apply the configuration:
terraform init
terraform apply -var="domain_name=<your-domain-name>"
Verification
After making changes, verify your domain is fault-tolerant:
- In the AWS Console, navigate to Amazon OpenSearch Service
- Select your domain
- Under Cluster configuration, confirm:
  - Number of data nodes is 3 or more
  - Zone Awareness is enabled
  - Availability Zones shows 2 or 3
CLI verification
aws opensearch describe-domain \
  --domain-name <your-domain-name> \
  --query 'DomainStatus.ClusterConfig.{InstanceCount:InstanceCount,ZoneAwarenessEnabled:ZoneAwarenessEnabled,AvailabilityZoneCount:ZoneAwarenessConfig.AvailabilityZoneCount}' \
  --region us-east-1
Expected output for a fault-tolerant domain:
{
  "InstanceCount": 3,
  "ZoneAwarenessEnabled": true,
  "AvailabilityZoneCount": 3
}
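To automate this comparison, the JSON printed by the verification command can be parsed and checked. This is a rough sketch that assumes the three-key projection produced by the --query expression above; the function name verify_projection is hypothetical, and the Availability Zone count may come back as null when the response has no ZoneAwarenessConfig.

```python
import json


def verify_projection(raw_json: str) -> bool:
    """Check the flattened describe-domain output against the expected values."""
    data = json.loads(raw_json)

    # Treat a missing or null InstanceCount as zero nodes.
    enough_nodes = (data.get("InstanceCount") or 0) >= 3

    # Only an explicit JSON true counts as enabled.
    zone_aware = data.get("ZoneAwarenessEnabled") is True

    # A null value here fails the check, since it is neither 2 nor 3.
    az_count_ok = data.get("AvailabilityZoneCount") in (2, 3)

    return enough_nodes and zone_aware and az_count_ok


if __name__ == "__main__":
    sample = '{"InstanceCount": 3, "ZoneAwarenessEnabled": true, "AvailabilityZoneCount": 3}'
    print(verify_projection(sample))
```

In a pipeline, the command's standard output could be read into a string and passed to this function, with a non-passing result failing the job.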
Additional Resources
- Configuring a multi-AZ domain in Amazon OpenSearch Service
- Best practices for Amazon OpenSearch Service
- Sizing Amazon OpenSearch Service domains
Notes
- Node count multiples: Use node counts in multiples of 3 when using 3 Availability Zones (e.g., 3, 6, 9), and even node counts when using 2 (e.g., 4, 6), to ensure even distribution across zones.
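- Replica shards: Configure your indices with at least 1 replica shard. Zone Awareness places a primary shard and its replicas in different Availability Zones, so an index without replicas gains no redundancy from it.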
- Dedicated master nodes: For production workloads, enable dedicated master nodes (3 is recommended) to improve cluster stability.
- VPC considerations: If your domain is in a VPC, ensure you have subnets in at least as many Availability Zones as your ZoneAwarenessConfig specifies.
- Cost impact: Enabling fault tolerance increases costs due to additional nodes and cross-zone data transfer. Plan your capacity accordingly.
- Update time: Domain configuration changes can take 15-30 minutes to complete. During this time, the domain remains available but may have reduced performance.
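The node count note can be illustrated with a little arithmetic. The sketch below assumes nodes are spread as evenly as possible across zones, which is a simplification of how the service places them; both function names are hypothetical.

```python
from __future__ import annotations


def nodes_per_zone(instance_count: int, az_count: int) -> list[int]:
    """Spread nodes as evenly as possible across the given number of zones."""
    if az_count < 1:
        raise ValueError("az_count must be at least 1")
    base, extra = divmod(instance_count, az_count)
    # The first `extra` zones each receive one additional node.
    return [base + 1 if i < extra else base for i in range(az_count)]


def worst_case_remaining(instance_count: int, az_count: int) -> int:
    """Nodes left if the most heavily populated zone becomes unavailable."""
    return instance_count - max(nodes_per_zone(instance_count, az_count))


if __name__ == "__main__":
    for count, zones in [(3, 3), (4, 3), (6, 3), (4, 2)]:
        layout = nodes_per_zone(count, zones)
        remaining = worst_case_remaining(count, zones)
        print(f"{count} nodes / {zones} AZs -> {layout}, {remaining} left after one AZ fails")
```

Under this assumption, 4 nodes across 3 zones gives an uneven [2, 1, 1] layout, so losing the busiest zone removes half the cluster, while 6 nodes across 3 zones gives [2, 2, 2] and loses a third.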