
Ensure ElastiCache Redis Clusters Have Automatic Failover Enabled

Overview

This check verifies that your Amazon ElastiCache Redis replication groups have automatic failover enabled. When automatic failover is enabled, ElastiCache automatically promotes a read replica to become the new primary node if the current primary fails or becomes unreachable.

Risk

Without automatic failover:

  • A failure in the primary node requires manual intervention to restore service
  • Your application may experience significant downtime while you manually promote a replica
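  • Redis replicates asynchronously, so writes not yet copied to a replica can be lost when the primary fails, and a slow manual recovery prolongs the period in which clients cannot write and may read stale data from replicas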
  • Client applications may experience timeouts and connection failures during extended outages

Severity: Medium

Remediation Steps

Prerequisites

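  • Access to the AWS Console with permissions to modify ElastiCache replication groups (for example, elasticache:ModifyReplicationGroup), or the AWS CLI installed and configured with credentials that have the same permissions
  • At least one replica in your replication group (in every shard, if cluster mode is enabled); automatic failover has nothing to promote without a replica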
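You can confirm the replica requirement before making changes by inspecting the output of aws elasticache describe-replication-groups. The Python sketch below is a minimal, hypothetical helper (shards_missing_replicas is not part of any AWS tool); it assumes each shard appears under NodeGroups with its primary and replicas listed under NodeGroupMembers, and the sample response is invented and trimmed for illustration.

```python
from typing import Any


def shards_missing_replicas(group: dict[str, Any]) -> list[str]:
    """Return the NodeGroupId of every shard that has no replica.

    `group` is one entry of the ReplicationGroups list returned by
    describe-replication-groups (CLI) or describe_replication_groups (boto3).
    Assumption: each shard's primary and replicas are in NodeGroupMembers.
    """
    missing = []
    for shard in group.get("NodeGroups", []):
        members = shard.get("NodeGroupMembers", [])
        # A single member means the shard has only its primary node.
        if len(members) < 2:
            missing.append(shard.get("NodeGroupId", "<unknown>"))
    return missing


if __name__ == "__main__":
    # Hypothetical, trimmed-down response: one shard, primary only.
    sample = {
        "ReplicationGroupId": "my-redis-cluster",
        "NodeGroups": [
            {
                "NodeGroupId": "0001",
                "NodeGroupMembers": [
                    {"CacheClusterId": "my-redis-cluster-001", "CurrentRole": "primary"}
                ],
            }
        ],
    }
    print(shards_missing_replicas(sample))  # ['0001']
```

An empty list means every shard has at least one replica and automatic failover can be enabled.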

AWS Console Method

  1. Sign in to the AWS Console and navigate to ElastiCache
  2. In the left navigation, select Redis OSS caches (or Redis clusters in older console versions)
  3. Select the replication group you want to modify
  4. Click the Modify button
  5. Find the Auto failover setting and change it to Enabled
  6. Check the box for Apply immediately if you want the change to take effect right away
  7. Click Save changes

Note: Enabling automatic failover is recommended alongside Multi-AZ deployment. If Multi-AZ is not already enabled, consider enabling it at the same time for best resilience.

AWS CLI (optional)

Enable Automatic Failover

Run the following command to enable automatic failover for an existing replication group:

aws elasticache modify-replication-group \
  --replication-group-id <your-replication-group-id> \
  --automatic-failover-enabled \
  --apply-immediately \
  --region us-east-1

Replace <your-replication-group-id> with the ID of your Redis replication group, and us-east-1 with the Region where the group runs.

Enable Both Automatic Failover and Multi-AZ

For best availability, enable both settings together:

aws elasticache modify-replication-group \
  --replication-group-id <your-replication-group-id> \
  --automatic-failover-enabled \
  --multi-az-enabled \
  --apply-immediately \
  --region us-east-1
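If you manage ElastiCache from Python rather than the shell, the same change can be sent through boto3's modify_replication_group call, whose ReplicationGroupId, AutomaticFailoverEnabled, MultiAZEnabled and ApplyImmediately parameters correspond to the CLI flags above. This is a minimal sketch; failover_request and enable_failover are hypothetical helper names, and boto3 is imported only inside enable_failover so the request builder can be inspected without credentials.

```python
from typing import Any


def failover_request(group_id: str, multi_az: bool = True,
                     apply_immediately: bool = True) -> dict[str, Any]:
    """Build keyword arguments for boto3's modify_replication_group,
    mirroring the CLI flags shown above."""
    request: dict[str, Any] = {
        "ReplicationGroupId": group_id,
        "AutomaticFailoverEnabled": True,        # --automatic-failover-enabled
        "ApplyImmediately": apply_immediately,   # --apply-immediately
    }
    if multi_az:
        request["MultiAZEnabled"] = True         # --multi-az-enabled
    return request


def enable_failover(group_id: str, region: str = "us-east-1",
                    multi_az: bool = True) -> dict[str, Any]:
    """Send the request. boto3 is imported here so that failover_request
    can be used without boto3 installed or AWS credentials configured."""
    import boto3

    client = boto3.client("elasticache", region_name=region)
    return client.modify_replication_group(**failover_request(group_id, multi_az))


if __name__ == "__main__":
    # Print the request instead of sending it.
    print(failover_request("my-redis-cluster"))
```

Pass multi_az=False to reproduce the first command, which enables automatic failover only.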

Check Current Status

To verify your current configuration:

aws elasticache describe-replication-groups \
  --replication-group-id <your-replication-group-id> \
  --query 'ReplicationGroups[0].{AutomaticFailover:AutomaticFailover,MultiAZ:MultiAZ}' \
  --region us-east-1
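
To use the status query in a script or CI check, you can parse what it prints. The sketch below assumes the CLI's default JSON output format, where the query returns an object such as {"AutomaticFailover": "enabled", "MultiAZ": "enabled"}; check_status is a hypothetical helper name.

```python
import json


def check_status(cli_output: str) -> tuple[bool, bool]:
    """Return (failover_ok, multi_az_ok) from the JSON printed by the query above.

    Assumes JSON output (the AWS CLI default). AutomaticFailover may also be
    "enabling", "disabled" or "disabling"; only "enabled" counts as passing.
    """
    status = json.loads(cli_output)
    failover_ok = status.get("AutomaticFailover") == "enabled"
    multi_az_ok = status.get("MultiAZ") == "enabled"
    return failover_ok, multi_az_ok


if __name__ == "__main__":
    print(check_status('{"AutomaticFailover": "enabled", "MultiAZ": "enabled"}'))  # (True, True)
    print(check_status('{"AutomaticFailover": "disabled", "MultiAZ": "disabled"}'))  # (False, False)
```
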

CloudFormation (optional)

Use the following CloudFormation template to create a Redis replication group with automatic failover enabled:

AWSTemplateFormatVersion: '2010-09-09'
Description: ElastiCache Redis Replication Group with Automatic Failover

Parameters:
  ReplicationGroupId:
    Type: String
    Description: Identifier for the replication group
    Default: my-redis-cluster

Resources:
  RedisReplicationGroup:
    Type: AWS::ElastiCache::ReplicationGroup
    Properties:
      ReplicationGroupDescription: Redis cluster with automatic failover enabled
      ReplicationGroupId: !Ref ReplicationGroupId
      AutomaticFailoverEnabled: true
      MultiAZEnabled: true
      CacheNodeType: cache.t3.micro
      Engine: redis
      NumCacheClusters: 2
      Port: 6379

Outputs:
  ReplicationGroupId:
    Description: The ID of the replication group
    Value: !Ref RedisReplicationGroup

Key properties:

  • AutomaticFailoverEnabled: true - Enables automatic failover
  • MultiAZEnabled: true - Distributes replicas across availability zones (recommended)
  • NumCacheClusters: 2 - At least 2 nodes required (1 primary + 1 replica)
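
The key properties above can be checked before deployment. This is a minimal sketch that works on the Properties block of an AWS::ElastiCache::ReplicationGroup resource after you have parsed the template into a Python dict (parsing YAML that contains !Ref tags needs a loader that understands CloudFormation tags, which is left out here). check_replication_group_props is a hypothetical name; NumCacheClusters is only checked when present, because cluster-mode-enabled groups describe their size with other properties.

```python
from typing import Any


def check_replication_group_props(props: dict[str, Any]) -> list[str]:
    """Return problems found in an AWS::ElastiCache::ReplicationGroup
    Properties mapping that has already been parsed into a dict."""
    problems = []
    if props.get("AutomaticFailoverEnabled") is not True:
        problems.append("AutomaticFailoverEnabled is not true")
    if props.get("MultiAZEnabled") is not True:
        problems.append("MultiAZEnabled is not true (recommended)")
    # NumCacheClusters counts the primary plus its replicas.
    if "NumCacheClusters" in props and int(props["NumCacheClusters"]) < 2:
        problems.append("NumCacheClusters must be at least 2 for automatic failover")
    return problems


if __name__ == "__main__":
    print(check_replication_group_props(
        {"AutomaticFailoverEnabled": True, "MultiAZEnabled": True, "NumCacheClusters": 2}
    ))  # []
```
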

To update an existing stack, modify the template to set AutomaticFailoverEnabled: true and update the stack.

Terraform (optional)

Use the following Terraform configuration:

resource "aws_elasticache_replication_group" "redis" {
  replication_group_id = "my-redis-cluster"
  description          = "Redis cluster with automatic failover enabled"

  engine             = "redis"
  node_type          = "cache.t3.micro"
  num_cache_clusters = 2
  port               = 6379

  automatic_failover_enabled = true
  multi_az_enabled           = true

  tags = {
    Environment = "production"
  }
}

Key arguments:

  • automatic_failover_enabled = true - Enables automatic failover
  • multi_az_enabled = true - Distributes replicas across availability zones (recommended)
  • num_cache_clusters = 2 - At least 2 required for automatic failover (1 primary + 1 replica)

For existing resources, add or update the automatic_failover_enabled = true argument and run terraform apply.
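
To catch a replication group that would be deployed without automatic failover, you can inspect the machine-readable plan that terraform show -json produces. This is a sketch under the assumption that the plan JSON lists each change under resource_changes with type, address and change.after fields; groups_without_failover and load_plan are hypothetical helper names.

```python
import json
from typing import Any


def groups_without_failover(plan: dict[str, Any]) -> list[str]:
    """List addresses of aws_elasticache_replication_group resources whose
    planned state leaves automatic_failover_enabled unset or false."""
    offenders = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "aws_elasticache_replication_group":
            continue
        after = rc.get("change", {}).get("after")
        if after is None:  # the resource is being destroyed
            continue
        if after.get("automatic_failover_enabled") is not True:
            offenders.append(rc["address"])
    return offenders


def load_plan(path: str) -> dict[str, Any]:
    """Read a plan exported with: terraform show -json tfplan > plan.json"""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

For example, run terraform plan -out=tfplan and terraform show -json tfplan > plan.json, then fail the pipeline if groups_without_failover(load_plan("plan.json")) returns any addresses.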

Verification

After making changes, verify that automatic failover is enabled:

  1. In the AWS Console, navigate to ElastiCache > Redis OSS caches
  2. Select your replication group
  3. In the Details tab, confirm that Auto failover shows Enabled

CLI Verification

aws elasticache describe-replication-groups \
  --replication-group-id <your-replication-group-id> \
  --query 'ReplicationGroups[0].AutomaticFailover' \
  --region us-east-1

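The command should print "enabled". A value of "enabling" means the change is still being applied; run the command again after a few minutes.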
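
Because the change is applied asynchronously, scripts may need to wait for "enabled" before continuing. The sketch below polls a caller-supplied status function until it reports "enabled" or a timeout expires. wait_for_failover and fetch_via_cli are hypothetical helpers; fetch_via_cli simply runs the verification command above with --output text, and the timeout and interval defaults are arbitrary assumptions.

```python
import subprocess
import time
from typing import Callable


def wait_for_failover(fetch_status: Callable[[], str], timeout_s: float = 900,
                      interval_s: float = 15,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll fetch_status() until it returns "enabled".

    Returns True once enabled, or False if the timeout expires first.
    Other values ("enabling", "disabled", ...) just trigger another poll.
    """
    waited = 0.0
    while True:
        if fetch_status() == "enabled":
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s


def fetch_via_cli(group_id: str, region: str = "us-east-1") -> str:
    """Example status source: the verification command with text output."""
    result = subprocess.run(
        ["aws", "elasticache", "describe-replication-groups",
         "--replication-group-id", group_id,
         "--query", "ReplicationGroups[0].AutomaticFailover",
         "--output", "text", "--region", region],
        capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Usage: wait_for_failover(lambda: fetch_via_cli("my-redis-cluster")). The sleep parameter is injectable so the loop can be tested without real delays.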


Notes

  • Replica requirement: Automatic failover requires at least one replica node. If your cluster only has a primary node, you must add a replica before enabling automatic failover.
  • Multi-AZ recommended: For best resilience, enable Multi-AZ along with automatic failover. This ensures replicas are in different availability zones, protecting against zone-level failures.
  • Brief interruption: During a failover event (automatic or manual), there may be a brief interruption (typically a few seconds) while DNS updates propagate. Design your application to handle brief connection interruptions with retry logic.
  • Test regularly: ElastiCache provides a Test Failover operation (aws elasticache test-failover in the AWS CLI). Use it periodically to confirm that your application handles failover gracefully.
  • Cluster mode considerations: For Redis clusters with cluster mode enabled, automatic failover works at the shard level. Each shard must have at least one replica.
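
The retry logic recommended above for brief interruptions can be as small as the sketch below: capped exponential backoff with full jitter, so that many clients do not reconnect in lockstep after a failover. with_retries is a hypothetical helper and the delay defaults are assumptions to tune for your workload. It catches Python's built-in ConnectionError and TimeoutError by default; redis-py raises its own classes, so with that client you would pass retry_on=(redis.exceptions.ConnectionError, redis.exceptions.TimeoutError). Only retry commands that are safe to repeat.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(op: Callable[[], T], attempts: int = 5, base_delay_s: float = 0.2,
                 max_delay_s: float = 5.0,
                 retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run op(), retrying listed transient errors with capped exponential
    backoff and full jitter; re-raise the last error when attempts run out."""
    attempt = 0
    while True:
        try:
            return op()
        except retry_on:
            attempt += 1
            if attempt >= attempts:
                raise  # out of attempts: surface the last error
            delay = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))  # full jitter spreads out reconnects
```

Usage: with_retries(lambda: client.get("session:42")), where client is your Redis client connected to the primary endpoint.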