Ensure ElastiCache Redis Clusters Have Automatic Failover Enabled
Overview
This check verifies that your Amazon ElastiCache Redis replication groups have automatic failover enabled. When automatic failover is enabled, ElastiCache automatically promotes a read replica to become the new primary node if the current primary fails or becomes unreachable.
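The check described above can be approximated in a few lines of Python. This is a minimal sketch, not the scanner's actual implementation: the function name `groups_without_failover` and the sample data are hypothetical, and the input is assumed to have the shape of a `describe-replication-groups` response, where `AutomaticFailover` is a status string such as `enabled` or `disabled`.

```python
from __future__ import annotations

from typing import Any


def groups_without_failover(response: dict[str, Any]) -> list[str]:
    """Return IDs of replication groups whose AutomaticFailover status is not 'enabled'.

    Transitional states such as 'enabling' are reported too, because failover
    is not yet in effect for those groups.
    """
    return [
        group["ReplicationGroupId"]
        for group in response.get("ReplicationGroups", [])
        if group.get("AutomaticFailover") != "enabled"
    ]


# Made-up sample shaped like a describe-replication-groups response.
sample = {
    "ReplicationGroups": [
        {"ReplicationGroupId": "cache-a", "AutomaticFailover": "enabled"},
        {"ReplicationGroupId": "cache-b", "AutomaticFailover": "disabled"},
        {"ReplicationGroupId": "cache-c", "AutomaticFailover": "enabling"},
    ]
}

print(groups_without_failover(sample))
```

With boto3 installed, a real response could come from `boto3.client("elasticache").describe_replication_groups()`; accounts with many replication groups may need pagination.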
Risk
Without automatic failover:
- A failure in the primary node requires manual intervention to restore service
- Your application may experience significant downtime while you manually promote a replica
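- Redis replication is asynchronous, so writes that have not reached a replica when the primary fails can be lost, and a slow manual recovery lengthens the period in which clients see stale or missing data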
- Client applications may experience timeouts and connection failures during extended outages
Severity: Medium
Remediation Steps
Prerequisites
- Either access to the AWS Console with permissions to modify ElastiCache replication groups, or the AWS CLI installed and configured with appropriate credentials
- Your Redis cluster must have at least one replica (automatic failover requires replicas)
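The replica prerequisite can be checked before attempting the change. The sketch below is an illustration under stated assumptions: the helper name `shards_missing_replica` and the sample data are hypothetical, and the input is assumed to be one entry from the `ReplicationGroups` list of a `describe-replication-groups` response. It counts members per node group rather than reading roles, so that the same logic applies to single-shard and multi-shard groups.

```python
from __future__ import annotations

from typing import Any


def shards_missing_replica(group: dict[str, Any]) -> list[str]:
    """Return the IDs of node groups (shards) that have fewer than two members.

    A shard with a single member has a primary but no replica to promote,
    so automatic failover cannot be enabled for it.
    """
    missing = []
    for node_group in group.get("NodeGroups", []):
        members = node_group.get("NodeGroupMembers", [])
        if len(members) < 2:
            missing.append(node_group.get("NodeGroupId", "unknown"))
    return missing


# Made-up sample: shard 0001 has a replica, shard 0002 does not.
sample_group = {
    "ReplicationGroupId": "my-redis-cluster",
    "NodeGroups": [
        {
            "NodeGroupId": "0001",
            "NodeGroupMembers": [
                {"CacheClusterId": "my-redis-cluster-001"},
                {"CacheClusterId": "my-redis-cluster-002"},
            ],
        },
        {
            "NodeGroupId": "0002",
            "NodeGroupMembers": [{"CacheClusterId": "my-redis-cluster-003"}],
        },
    ],
}

print(shards_missing_replica(sample_group))
```

An empty result suggests every shard has at least one replica available for promotion.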
AWS Console Method
- Sign in to the AWS Console and navigate to ElastiCache
- In the left navigation, select Redis OSS caches (or Redis clusters in older console versions)
- Select the replication group you want to modify
- Click the Modify button
- Find the Auto failover setting and change it to Enabled
- Check the box for Apply immediately if you want the change to take effect right away
- Click Save changes
Note: Enabling automatic failover is recommended alongside Multi-AZ deployment. If Multi-AZ is not already enabled, consider enabling it at the same time for best resilience.
AWS CLI (optional)
Enable Automatic Failover
Run the following command to enable automatic failover for an existing replication group:
aws elasticache modify-replication-group \
    --replication-group-id <your-replication-group-id> \
    --automatic-failover-enabled \
    --apply-immediately \
    --region us-east-1
Replace <your-replication-group-id> with the ID of your Redis replication group, and us-east-1 with the region where the replication group runs.
Enable Both Automatic Failover and Multi-AZ
For best availability, enable both settings together:
aws elasticache modify-replication-group \
    --replication-group-id <your-replication-group-id> \
    --automatic-failover-enabled \
    --multi-az-enabled \
    --apply-immediately \
    --region us-east-1
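If the change is scripted rather than run by hand, the same request can be expressed for boto3. This is a sketch under assumptions: `build_modify_request` is a hypothetical helper, and `my-redis-cluster` is a placeholder ID. The keys it produces (`ReplicationGroupId`, `AutomaticFailoverEnabled`, `MultiAZEnabled`, `ApplyImmediately`) are the boto3 parameter names corresponding to the CLI flags above.

```python
from __future__ import annotations

from typing import Any


def build_modify_request(replication_group_id: str, enable_multi_az: bool = False) -> dict[str, Any]:
    """Build keyword arguments for the ElastiCache modify_replication_group call."""
    request: dict[str, Any] = {
        "ReplicationGroupId": replication_group_id,
        "AutomaticFailoverEnabled": True,
        "ApplyImmediately": True,
    }
    if enable_multi_az:
        # Only sent when requested, mirroring the optional --multi-az-enabled flag.
        request["MultiAZEnabled"] = True
    return request


print(build_modify_request("my-redis-cluster", enable_multi_az=True))

# With boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   client = boto3.client("elasticache", region_name="us-east-1")
#   client.modify_replication_group(**build_modify_request("my-redis-cluster"))
```

Keeping the request construction separate from the API call makes it possible to review or log the exact change before it is applied.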
Check Current Status
To verify your current configuration:
aws elasticache describe-replication-groups \
    --replication-group-id <your-replication-group-id> \
    --query 'ReplicationGroups[0].{AutomaticFailover:AutomaticFailover,MultiAZ:MultiAZ}' \
    --region us-east-1
CloudFormation (optional)
Use the following CloudFormation template to create a Redis replication group with automatic failover enabled:
AWSTemplateFormatVersion: '2010-09-09'
Description: ElastiCache Redis Replication Group with Automatic Failover
Parameters:
  ReplicationGroupId:
    Type: String
    Description: Identifier for the replication group
    Default: my-redis-cluster
Resources:
  RedisReplicationGroup:
    Type: AWS::ElastiCache::ReplicationGroup
    Properties:
      ReplicationGroupDescription: Redis cluster with automatic failover enabled
      ReplicationGroupId: !Ref ReplicationGroupId
      AutomaticFailoverEnabled: true
      MultiAZEnabled: true
      CacheNodeType: cache.t3.micro
      Engine: redis
      NumCacheClusters: 2
      Port: 6379
Outputs:
  ReplicationGroupId:
    Description: The ID of the replication group
    Value: !Ref RedisReplicationGroup
Key properties:
- AutomaticFailoverEnabled: true - Enables automatic failover
- MultiAZEnabled: true - Distributes replicas across availability zones (recommended)
- NumCacheClusters: 2 - At least 2 clusters required (1 primary + 1 replica)
To update an existing stack, modify the template to set AutomaticFailoverEnabled: true and update the stack.
Terraform (optional)
Use the following Terraform configuration:
resource "aws_elasticache_replication_group" "redis" {
  replication_group_id       = "my-redis-cluster"
  description                = "Redis cluster with automatic failover enabled"
  engine                     = "redis"
  node_type                  = "cache.t3.micro"
  num_cache_clusters         = 2
  port                       = 6379
  automatic_failover_enabled = true
  multi_az_enabled           = true

  tags = {
    Environment = "production"
  }
}
Key arguments:
- automatic_failover_enabled = true - Enables automatic failover
- multi_az_enabled = true - Distributes replicas across availability zones (recommended)
- num_cache_clusters = 2 - At least 2 required for automatic failover (1 primary + 1 replica)
For existing resources, add or update the automatic_failover_enabled = true argument and run terraform apply.
Verification
After making changes, verify that automatic failover is enabled:
- In the AWS Console, navigate to ElastiCache > Redis OSS caches
- Select your replication group
- In the Details tab, confirm that Auto failover shows Enabled
CLI Verification
aws elasticache describe-replication-groups \
    --replication-group-id <your-replication-group-id> \
    --query 'ReplicationGroups[0].AutomaticFailover' \
    --region us-east-1
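The output should be "enabled". Immediately after a change it may briefly show "enabling" while the modification is applied; if so, run the command again after a short wait.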
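In automation, the verification can be repeated until the status settles. The following is a minimal polling sketch, assuming a caller-supplied `describe` function that returns a `describe-replication-groups` style response for a single replication group. The function name, attempt count, and delay are illustrative choices, not recommended values.

```python
from __future__ import annotations

import time
from typing import Any, Callable


def wait_for_failover_enabled(
    describe: Callable[[], dict[str, Any]],
    attempts: int = 20,
    delay_seconds: float = 15.0,
) -> bool:
    """Poll until AutomaticFailover reports 'enabled'; return False if attempts run out."""
    for attempt in range(attempts):
        groups = describe().get("ReplicationGroups", [])
        state = groups[0].get("AutomaticFailover") if groups else None
        if state == "enabled":
            return True
        # Sleep between polls, but not after the final attempt.
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False
```

With boto3, `describe` could be a small wrapper such as `lambda: client.describe_replication_groups(ReplicationGroupId="my-redis-cluster")`, where `client` is an ElastiCache client and the ID is a placeholder.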
Additional Resources
- Minimizing downtime in ElastiCache with Multi-AZ
- Testing automatic failover
- ElastiCache best practices
- AWS::ElastiCache::ReplicationGroup CloudFormation reference
Notes
- Replica requirement: Automatic failover requires at least one replica node. If your cluster only has a primary node, you must add a replica before enabling automatic failover.
- Multi-AZ recommended: For best resilience, enable Multi-AZ along with automatic failover. This ensures replicas are in different availability zones, protecting against zone-level failures.
- Brief interruption: During a failover event (automatic or manual), there may be a brief interruption (typically a few seconds) while DNS updates propagate. Design your application to handle brief connection interruptions with retry logic.
- Test regularly: AWS provides a test failover feature. Use it periodically to validate your application handles failover gracefully.
- Cluster mode considerations: For Redis clusters with cluster mode enabled, automatic failover works at the shard level. Each shard must have at least one replica.
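The retry logic mentioned in the notes above might look like the following sketch, which uses exponential backoff with random jitter. The helper name `call_with_retry`, the default delays, and the default exception types are assumptions for illustration. Many Redis client libraries raise their own exception classes, so pass those through `retry_on` rather than relying on the built-in defaults.

```python
from __future__ import annotations

import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Run operation, retrying on the given exceptions with capped exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error to the caller.
            # Double the delay ceiling each attempt, capped at max_delay,
            # and sleep a random fraction of it so clients do not retry in lockstep.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, ceiling))
    raise RuntimeError("unreachable")
```

A command issued during a failover would then be wrapped as `call_with_retry(lambda: cache.get("key"))`, where `cache` stands for whatever Redis client the application already uses.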