Ensure ElastiCache Redis Clusters Have Automatic Failover Enabled

Overview

This check verifies that your Amazon ElastiCache Redis replication groups have automatic failover enabled. When automatic failover is enabled, ElastiCache automatically promotes a read replica to become the new primary node if the current primary fails or becomes unreachable.

Risk

Without automatic failover:

A failure in the primary node requires manual intervention to restore service
Your application may experience significant downtime while you manually promote a replica
Redis uses asynchronous replication, so delays in recovery increase the risk of data loss or stale data
Client applications may experience timeouts and connection failures during extended outages

Severity: Medium

Remediation Steps

Prerequisites

Access to the AWS Console with permissions to modify ElastiCache clusters, OR
AWS CLI installed and configured with appropriate credentials
Your Redis cluster must have at least one replica (automatic failover requires replicas)
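
The replica prerequisite can be checked from the CLI before attempting the change. The following is a minimal sketch, assuming a replication group with cluster mode disabled (a single shard). The function names count_member_nodes and has_replica are hypothetical helpers, not part of the AWS CLI, and the region defaults to us-east-1 as in the other examples.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the number of nodes (primary + replicas)
# in a replication group. Assumes cluster mode disabled, where
# MemberClusters lists every node in the single shard.
count_member_nodes() {
    local group_id="$1"
    local region="${2:-us-east-1}"
    aws elasticache describe-replication-groups \
        --replication-group-id "$group_id" \
        --query 'length(ReplicationGroups[0].MemberClusters)' \
        --output text \
        --region "$region"
}

# Hypothetical helper: succeeds (exit 0) when the group has at least
# one replica, i.e. two or more member nodes.
has_replica() {
    local nodes
    nodes="$(count_member_nodes "$@")" || return 2
    [ "$nodes" -ge 2 ]
}
```

If has_replica fails because only the primary exists, a replica can be added first, for example with aws elasticache increase-replica-count --replication-group-id <your-replication-group-id> --new-replica-count 1 --apply-immediately.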

AWS Console Method

1. Sign in to the AWS Console and navigate to ElastiCache
2. In the left navigation, select Redis OSS caches (or Redis clusters in older console versions)
3. Select the replication group you want to modify
4. Click the Modify button
5. Find the Auto failover setting and change it to Enabled
6. Check the box for Apply immediately if you want the change to take effect right away
7. Click Save changes

Note: Enabling automatic failover is recommended alongside Multi-AZ deployment. If Multi-AZ is not already enabled, consider enabling it at the same time for best resilience.

AWS CLI (optional)

Enable Automatic Failover

Run the following command to enable automatic failover for an existing replication group:

aws elasticache modify-replication-group \
    --replication-group-id <your-replication-group-id> \
    --automatic-failover-enabled \
    --apply-immediately \
    --region us-east-1

Replace <your-replication-group-id> with the ID of your Redis replication group.
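
The modification is applied asynchronously, so scripts often need to wait for it to finish before continuing. The following is a sketch under that assumption; enable_failover_and_wait is a hypothetical wrapper name, and it relies on the AWS CLI's replication-group-available waiter to block until the group is back in the available state.

```shell
#!/usr/bin/env bash
# Hypothetical helper: enables automatic failover, then blocks until the
# replication group returns to the "available" state.
enable_failover_and_wait() {
    local group_id="$1"
    local region="${2:-us-east-1}"
    # Request the change; discard the JSON description the CLI prints.
    aws elasticache modify-replication-group \
        --replication-group-id "$group_id" \
        --automatic-failover-enabled \
        --apply-immediately \
        --region "$region" > /dev/null || return 1
    # Poll until the modification has been applied.
    aws elasticache wait replication-group-available \
        --replication-group-id "$group_id" \
        --region "$region"
}
```

Example call: enable_failover_and_wait <your-replication-group-id> us-east-1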

Enable Both Automatic Failover and Multi-AZ

For best availability, enable both settings together:

aws elasticache modify-replication-group \
    --replication-group-id <your-replication-group-id> \
    --automatic-failover-enabled \
    --multi-az-enabled \
    --apply-immediately \
    --region us-east-1

Check Current Status

To verify your current configuration:

aws elasticache describe-replication-groups \
    --replication-group-id <your-replication-group-id> \
    --query 'ReplicationGroups[0].{AutomaticFailover:AutomaticFailover,MultiAZ:MultiAZ}' \
    --region us-east-1

CloudFormation (optional)

Use the following CloudFormation template to create a Redis replication group with automatic failover enabled:

AWSTemplateFormatVersion: '2010-09-09'
Description: ElastiCache Redis Replication Group with Automatic Failover

Parameters:
  ReplicationGroupId:
    Type: String
    Description: Identifier for the replication group
    Default: my-redis-cluster

Resources:
  RedisReplicationGroup:
    Type: AWS::ElastiCache::ReplicationGroup
    Properties:
      ReplicationGroupDescription: Redis cluster with automatic failover enabled
      ReplicationGroupId: !Ref ReplicationGroupId
      AutomaticFailoverEnabled: true
      MultiAZEnabled: true
      CacheNodeType: cache.t3.micro
      Engine: redis
      NumCacheClusters: 2
      Port: 6379

Outputs:
  ReplicationGroupId:
    Description: The ID of the replication group
    Value: !Ref RedisReplicationGroup

Key properties:

AutomaticFailoverEnabled: true - Enables automatic failover
MultiAZEnabled: true - Distributes replicas across availability zones (recommended)
NumCacheClusters: 2 - At least 2 nodes required for automatic failover (1 primary + 1 replica)

To update an existing stack, modify the template to set AutomaticFailoverEnabled: true and update the stack.

Terraform (optional)

Use the following Terraform configuration:

resource "aws_elasticache_replication_group" "redis" {
  replication_group_id = "my-redis-cluster"
  description          = "Redis cluster with automatic failover enabled"

  engine             = "redis"
  node_type          = "cache.t3.micro"
  num_cache_clusters = 2
  port               = 6379

  automatic_failover_enabled = true
  multi_az_enabled           = true

  tags = {
    Environment = "production"
  }
}

Key arguments:

automatic_failover_enabled = true - Enables automatic failover
multi_az_enabled = true - Distributes replicas across availability zones (recommended)
num_cache_clusters = 2 - At least 2 required for automatic failover (1 primary + 1 replica)

For existing resources, add or update the automatic_failover_enabled = true argument and run terraform apply.

Verification

After making changes, verify that automatic failover is enabled:

1. In the AWS Console, navigate to ElastiCache > Redis OSS caches
2. Select your replication group
3. In the Details tab, confirm that Auto failover shows Enabled

CLI Verification

aws elasticache describe-replication-groups \
    --replication-group-id <your-replication-group-id> \
    --query 'ReplicationGroups[0].AutomaticFailover' \
    --region us-east-1

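The command should print "enabled". A value of "enabling" means the change is still being applied; run the command again after a short wait.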
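
To run the same check across every replication group in a region rather than one at a time, the query can filter on the AutomaticFailover field. This is a minimal sketch; list_groups_without_failover is a hypothetical helper name, and it treats any status other than "enabled" (including "enabling") as not yet compliant.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints, one per line, the IDs of replication groups
# in a region whose AutomaticFailover status is not "enabled".
list_groups_without_failover() {
    local region="${1:-us-east-1}"
    aws elasticache describe-replication-groups \
        --query "ReplicationGroups[?AutomaticFailover!='enabled'].ReplicationGroupId" \
        --output text \
        --region "$region" | tr '\t' '\n' | sed '/^$/d'
}
```

An empty result means every replication group in the region reports automatic failover as enabled.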

Additional Resources

Notes

Replica requirement: Automatic failover requires at least one replica node. If your cluster only has a primary node, you must add a replica before enabling automatic failover.
Multi-AZ recommended: For best resilience, enable Multi-AZ along with automatic failover. This ensures replicas are in different availability zones, protecting against zone-level failures.
Brief interruption: During a failover event (automatic or manual), clients may see a brief interruption (typically a few seconds) while the replica is promoted and DNS updates propagate. Build retry logic into your application so that it reconnects after the interruption.
Test regularly: AWS provides a test failover feature. Use it periodically to validate that your application handles failover gracefully.
Cluster mode considerations: For Redis clusters with cluster mode enabled, automatic failover works at the shard level. Each shard must have at least one replica.
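
The retry advice in the notes above can be sketched as a small shell wrapper. This is an illustration only, assuming exponential backoff is acceptable for your workload; retry_with_backoff is a hypothetical helper, and most Redis client libraries offer their own retry settings that are preferable inside an application.

```shell
#!/usr/bin/env bash
# Hypothetical helper: runs a command up to max_attempts times, doubling
# the wait between attempts (exponential backoff).
retry_with_backoff() {
    local max_attempts="$1"
    local delay="$2"   # initial delay in seconds
    shift 2
    local attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        sleep "$delay"
        delay=$((delay * 2))
        attempt=$((attempt + 1))
    done
}
```

For example, retry_with_backoff 5 1 redis-cli -h <your-primary-endpoint> -p 6379 PING keeps probing the primary endpoint while a failover completes. A failover can be triggered for testing with aws elasticache test-failover --replication-group-id <your-replication-group-id> --node-group-id 0001, where 0001 is assumed to be the shard ID of a replication group with cluster mode disabled.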
