Ensure ElastiCache Redis Cluster Has Multi-AZ Enabled
Overview
This check verifies that your Amazon ElastiCache for Redis replication groups have Multi-AZ with automatic failover enabled. Multi-AZ distributes your primary and replica nodes across separate Availability Zones, providing high availability and automatic recovery if an outage occurs.
Risk
Without Multi-AZ enabled, your Redis cluster is vulnerable to:
- Service outages: If the Availability Zone hosting your Redis nodes fails, your application loses access to cached data
- Cold-cache rebuilds: After a failure, rebuilding the cache from scratch puts heavy load on your backend databases
- Cascading failures: Database overload from cache rebuilds can cause timeouts and failures across your entire application
- Data loss: Recent writes may be lost during failure events before they can be replicated
Remediation Steps
Prerequisites
- Either access to the AWS Console with permissions to modify ElastiCache resources, or the AWS CLI installed and configured with appropriate credentials
- At least one replica in your Redis replication group (Multi-AZ requires replicas)
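The replica prerequisite can be checked before attempting the change. The following is a minimal Python sketch, assuming `group` is one entry of the `ReplicationGroups` list returned by `aws elasticache describe-replication-groups --output json`. The function names and the sample data are illustrative and not part of any AWS SDK.

```python
def replicas_per_shard(group: dict) -> dict:
    """Map each node group (shard) ID to its number of replica nodes."""
    counts = {}
    for node_group in group.get("NodeGroups", []):
        members = node_group.get("NodeGroupMembers", [])
        # Each node group has exactly one primary; the remaining members are replicas.
        counts[node_group.get("NodeGroupId", "unknown")] = max(len(members) - 1, 0)
    return counts


def meets_replica_prerequisite(group: dict) -> bool:
    """Return True only if every shard has at least one replica."""
    counts = replicas_per_shard(group)
    return bool(counts) and all(count >= 1 for count in counts.values())


# Hypothetical sample shaped like the CLI's JSON output.
sample_group = {
    "ReplicationGroupId": "example-redis-cluster",
    "NodeGroups": [
        {
            "NodeGroupId": "0001",
            "NodeGroupMembers": [
                {"CacheClusterId": "example-redis-cluster-001"},
                {"CacheClusterId": "example-redis-cluster-002"},
            ],
        }
    ],
}

print(meets_replica_prerequisite(sample_group))
```

If the check fails for any shard, add a replica to that shard first and then continue with the steps below.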
AWS Console Method
- Open the Amazon ElastiCache console
- In the left navigation, click Redis OSS caches
- Select the replication group you want to modify
- Click the Modify button
- In the Availability section, set Multi-AZ to Enabled
- Ensure Auto-failover is also set to Enabled (required for Multi-AZ)
- Under Schedule modifications, select Apply immediately (or schedule for your maintenance window)
- Click Modify
AWS CLI Method
Use the modify-replication-group command to enable Multi-AZ:
aws elasticache modify-replication-group \
  --replication-group-id <your-replication-group-id> \
  --multi-az-enabled \
  --automatic-failover-enabled \
  --apply-immediately \
  --region us-east-1
Replace <your-replication-group-id> with your actual replication group identifier, and us-east-1 with the Region that hosts the replication group.
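For teams that script this change instead of running the CLI by hand, the same request can be sketched in Python. This assumes the boto3 SDK is installed and credentials are configured. The helper names `build_modify_request` and `enable_multi_az` are illustrative; the keyword arguments mirror the CLI flags shown above.

```python
def build_modify_request(replication_group_id: str) -> dict:
    """Build the request parameters that correspond to the CLI flags above."""
    return {
        "ReplicationGroupId": replication_group_id,
        "MultiAZEnabled": True,
        "AutomaticFailoverEnabled": True,
        "ApplyImmediately": True,
    }


def enable_multi_az(replication_group_id: str, region: str = "us-east-1") -> dict:
    """Send the modification request (assumes boto3 and valid credentials)."""
    import boto3  # imported lazily so the request builder works without boto3

    client = boto3.client("elasticache", region_name=region)
    return client.modify_replication_group(**build_modify_request(replication_group_id))


# Inspect the request without calling AWS.
print(build_modify_request("example-redis-cluster"))
```

Keeping the request builder separate from the API call makes it possible to review or log the exact parameters before anything is changed.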
To find your replication group IDs:
aws elasticache describe-replication-groups \
  --query "ReplicationGroups[*].[ReplicationGroupId,MultiAZ,AutomaticFailover]" \
  --output table \
  --region us-east-1
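When an account has many replication groups, it can help to filter the listing down to the ones that still need remediation. The sketch below assumes `response` is the parsed JSON from `aws elasticache describe-replication-groups --output json` (without the `--query` filter). The function name and sample data are hypothetical.

```python
def find_noncompliant(response: dict) -> list:
    """Return IDs of replication groups lacking Multi-AZ or automatic failover."""
    flagged = []
    for group in response.get("ReplicationGroups", []):
        multi_az = group.get("MultiAZ")
        failover = group.get("AutomaticFailover")
        # Both settings must report "enabled" for the group to pass this check.
        if multi_az != "enabled" or failover != "enabled":
            flagged.append(group["ReplicationGroupId"])
    return flagged


# Hypothetical sample shaped like the CLI's JSON output.
sample_response = {
    "ReplicationGroups": [
        {"ReplicationGroupId": "cache-a", "MultiAZ": "enabled", "AutomaticFailover": "enabled"},
        {"ReplicationGroupId": "cache-b", "MultiAZ": "disabled", "AutomaticFailover": "enabled"},
        {"ReplicationGroupId": "cache-c", "MultiAZ": "disabled", "AutomaticFailover": "disabled"},
    ]
}

print(find_noncompliant(sample_response))
```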
CloudFormation
To create a new Redis replication group with Multi-AZ enabled:
AWSTemplateFormatVersion: '2010-09-09'
Description: ElastiCache Redis replication group with Multi-AZ enabled
Parameters:
  ReplicationGroupId:
    Type: String
    Description: Identifier for the replication group
    Default: example-redis-cluster
  SubnetGroupName:
    Type: String
    Description: Name of the cache subnet group
  SecurityGroupId:
    Type: String
    Description: Security group ID for the cluster
Resources:
  RedisReplicationGroup:
    Type: AWS::ElastiCache::ReplicationGroup
    Properties:
      ReplicationGroupId: !Ref ReplicationGroupId
      ReplicationGroupDescription: Redis replication group with Multi-AZ enabled
      Engine: redis
      EngineVersion: '7.0'
      CacheNodeType: cache.t3.micro
      NumCacheClusters: 2
      MultiAZEnabled: true
      AutomaticFailoverEnabled: true
      Port: 6379
      CacheSubnetGroupName: !Ref SubnetGroupName
      SecurityGroupIds:
        - !Ref SecurityGroupId
      Tags:
        - Key: Environment
          Value: production
Outputs:
  PrimaryEndpoint:
    Description: Primary endpoint address
    Value: !GetAtt RedisReplicationGroup.PrimaryEndPoint.Address
  ReaderEndpoint:
    Description: Reader endpoint address
    Value: !GetAtt RedisReplicationGroup.ReaderEndPoint.Address
Key properties for Multi-AZ:
- MultiAZEnabled: true - Enables Multi-AZ deployment
- AutomaticFailoverEnabled: true - Required for Multi-AZ; enables automatic promotion of a replica to primary
- NumCacheClusters: 2 - Must be at least 2 (one primary + one replica) for Multi-AZ
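These three properties can be checked together before a template is deployed. The following Python sketch assumes `properties` is the `Properties` mapping of an `AWS::ElastiCache::ReplicationGroup` resource, already parsed into a dict, with literal values instead of intrinsic functions such as `!Ref`. It only covers templates that size the group with `NumCacheClusters`, and the function name is hypothetical.

```python
def multi_az_issues(properties: dict) -> list:
    """Return a list of problems that would prevent a Multi-AZ deployment."""
    issues = []
    if properties.get("MultiAZEnabled") is not True:
        issues.append("MultiAZEnabled must be true")
    if properties.get("AutomaticFailoverEnabled") is not True:
        issues.append("AutomaticFailoverEnabled must be true")
    num_clusters = properties.get("NumCacheClusters")
    # One primary plus at least one replica is required.
    if not isinstance(num_clusters, int) or num_clusters < 2:
        issues.append("NumCacheClusters must be at least 2")
    return issues


# Values taken from the template above.
template_properties = {
    "MultiAZEnabled": True,
    "AutomaticFailoverEnabled": True,
    "NumCacheClusters": 2,
}

print(multi_az_issues(template_properties))
```

An empty list means the properties satisfy all three requirements.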
Terraform
resource "aws_elasticache_replication_group" "example" {
  replication_group_id = "example-redis-cluster"
  description          = "Redis replication group with Multi-AZ enabled"
  engine               = "redis"
  engine_version       = "7.0"
  node_type            = "cache.t3.micro"
  num_cache_clusters   = 2

  # Enable Multi-AZ with automatic failover
  multi_az_enabled           = true
  automatic_failover_enabled = true

  port = 6379

  # The subnet group and security group referenced here are assumed to be defined elsewhere in your configuration
  subnet_group_name  = aws_elasticache_subnet_group.example.name
  security_group_ids = [aws_security_group.redis.id]

  tags = {
    Environment = "production"
  }
}
Key arguments for Multi-AZ:
- multi_az_enabled = true - Enables Multi-AZ deployment
- automatic_failover_enabled = true - Required for Multi-AZ; enables automatic failover
- num_cache_clusters = 2 - Must be at least 2 for Multi-AZ (primary + replica)
Verification
After applying the changes, verify Multi-AZ is enabled:
- In the ElastiCache console, select your replication group
- Check that the Multi-AZ column shows Enabled
- Verify that Auto-failover also shows Enabled
CLI Verification
aws elasticache describe-replication-groups \
  --replication-group-id <your-replication-group-id> \
  --query "ReplicationGroups[0].[MultiAZ,AutomaticFailover]" \
  --output text \
  --region us-east-1
Expected output:
enabled enabled
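In an automated check, the text output can be interpreted programmatically. The sketch below is a minimal Python example, assuming `line` is the single whitespace-separated line printed by the command above. It treats an AutomaticFailover value of `enabling` as a signal to re-run the check later. The function name and the three result labels are hypothetical.

```python
def classify_verification_output(line: str) -> str:
    """Classify the 'MultiAZ AutomaticFailover' pair printed by the CLI."""
    fields = line.split()
    if len(fields) != 2:
        raise ValueError(f"expected two fields, got {len(fields)}: {line!r}")
    multi_az, failover = fields
    if multi_az == "enabled" and failover == "enabled":
        return "compliant"
    if failover == "enabling":
        # The modification is still in progress; check again once it completes.
        return "pending"
    return "non-compliant"


print(classify_verification_output("enabled\tenabled"))
```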
Additional Resources
- AWS Documentation: High Availability for ElastiCache
- AWS Documentation: Minimizing Downtime in ElastiCache
- AWS Documentation: modify-replication-group CLI Reference
Notes
- Replica requirement: Multi-AZ requires at least one replica node. If your cluster has no replicas, you must add one before enabling Multi-AZ.
- Automatic failover: Multi-AZ depends on automatic failover. You cannot enable Multi-AZ unless automatic failover is enabled as well, so set both options together.
- Brief interruption: Enabling Multi-AZ on an existing cluster may cause a brief interruption during the modification. Consider scheduling this during a maintenance window.
- Cost consideration: Multi-AZ requires at least one replica node in a different Availability Zone, so a cluster that currently has no replicas will cost more once the replica is added. For production workloads that need high availability, this cost is usually justified.
- Application configuration: Ensure your application uses the primary endpoint for writes and optionally the reader endpoint for read scaling. Because ElastiCache repoints the primary endpoint to the promoted replica during a failover, the application does not need configuration changes to reconnect.
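The endpoint guidance in the last note can be sketched as follows. This Python example assumes a cluster-mode-disabled replication group (a single node group) without in-transit encryption, and that `group` is one entry of the `ReplicationGroups` list from `aws elasticache describe-replication-groups --output json`. The function name and the hostnames in the sample are hypothetical.

```python
def endpoint_urls(group: dict) -> dict:
    """Derive write and read connection URLs from a replication group description."""
    node_group = group["NodeGroups"][0]
    primary = node_group["PrimaryEndpoint"]
    reader = node_group["ReaderEndpoint"]
    return {
        # Writes always go to the primary endpoint, which follows the current primary.
        "write": f"redis://{primary['Address']}:{primary['Port']}",
        # Reads may optionally be spread across replicas via the reader endpoint.
        "read": f"redis://{reader['Address']}:{reader['Port']}",
    }


# Hypothetical sample; real addresses come from your own describe output.
sample_endpoints_group = {
    "NodeGroups": [
        {
            "PrimaryEndpoint": {"Address": "primary.example.internal", "Port": 6379},
            "ReaderEndpoint": {"Address": "reader.example.internal", "Port": 6379},
        }
    ]
}

print(endpoint_urls(sample_endpoints_group))
```

Configuring the application with these endpoints, and not with individual node addresses, is what lets it reconnect to the new primary after a failover.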