AWS Elastic Disaster Recovery Job Exists

Overview

This check verifies that AWS Elastic Disaster Recovery (DRS) is enabled in your region and that at least one recovery or drill job has been executed. DRS helps you quickly recover your on-premises and cloud-based applications by replicating them to AWS.

Having a recovery job on record shows that your disaster recovery setup has been exercised at least once. Configuring DRS is not enough: you need to verify that failover actually works before a real disaster strikes.
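As a rough illustration of the pass/fail logic described above, the following Python sketch evaluates a single region from two inputs. It is not Prowler's implementation: the function name, the returned messages, and the sample job entry are assumptions made for this example.

```python
def evaluate_drs_job_check(service_enabled, jobs):
    """Return (status, reason) for one region.

    service_enabled: True if DRS has been initialized in the region.
    jobs: the "items" list from a describe-jobs response (may be empty).
    """
    if not service_enabled:
        return "FAIL", "DRS is not enabled in this region"
    if not jobs:
        return "FAIL", "DRS is enabled but no recovery or drill job exists"
    return "PASS", f"DRS is enabled and has {len(jobs)} job(s) on record"


if __name__ == "__main__":
    # Made-up sample entry, shaped like one item of a describe-jobs response.
    sample_jobs = [{"jobID": "drsjob-sample-1", "status": "COMPLETED"}]
    print(evaluate_drs_job_check(True, sample_jobs))
    print(evaluate_drs_job_check(True, []))
    print(evaluate_drs_job_check(False, []))
```

The sketch only looks at whether a job exists, mirroring the description of the check; it does not judge whether the job succeeded.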

Risk

Without DRS enabled or tested recovery jobs, your organization faces significant business continuity risks:

  • Untested recovery: If you have never run a recovery job, you do not know if your disaster recovery plan will work when needed
  • Extended downtime: During an outage, untested recovery procedures may fail or take much longer than expected
  • Data loss: Without validated replication, critical data may not be recoverable
  • Compliance gaps: Many compliance frameworks require documented and tested disaster recovery procedures
  • Unvalidated RTO/RPO: Recovery Time Objective and Recovery Point Objective targets cannot be confirmed without actual test runs

Testing your disaster recovery setup with periodic drills is essential for maintaining business resilience.

Remediation Steps

Prerequisites

You need:

  • AWS Console access with permissions to configure DRS
  • At least one source server (on-premises or cloud) that you want to protect
  • A VPC with subnets configured for DRS staging and recovery

Required IAM permissions (for administrators)

Your IAM user or role needs these permissions:

  • drs:InitializeService
  • drs:CreateReplicationConfigurationTemplate
  • drs:DescribeSourceServers
  • drs:StartRecovery
  • drs:DescribeJobs
  • ec2:DescribeSubnets
  • ec2:DescribeSecurityGroups
  • ec2:CreateSecurityGroup
  • iam:CreateServiceLinkedRole (for initial setup)

For full functionality, consider an AWS managed policy such as AWSElasticDisasterRecoveryConsoleFullAccess_v2 (the successor to AWSElasticDisasterRecoveryConsoleFullAccess).
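If you prefer a custom policy over a managed one, the permissions listed above can be assembled into an IAM policy document. The Python sketch below prints such a document as JSON. The Sid value is a made-up label, and "Resource": "*" is a simplifying assumption; scope resources more tightly where your environment allows it.

```python
import json

# The actions listed in this section.
DRS_SETUP_ACTIONS = [
    "drs:InitializeService",
    "drs:CreateReplicationConfigurationTemplate",
    "drs:DescribeSourceServers",
    "drs:StartRecovery",
    "drs:DescribeJobs",
    "ec2:DescribeSubnets",
    "ec2:DescribeSecurityGroups",
    "ec2:CreateSecurityGroup",
    "iam:CreateServiceLinkedRole",
]


def build_policy(actions):
    """Build a single-statement IAM policy document allowing the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DrsSetupAndDrill",  # hypothetical label
                "Effect": "Allow",
                "Action": sorted(actions),
                "Resource": "*",  # assumption: narrow this in real use
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_policy(DRS_SETUP_ACTIONS), indent=2))
```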

Network requirements

DRS requires network connectivity between your source servers and AWS:

  • Outbound ports: Source servers need outbound access on TCP port 443 (HTTPS) and TCP port 1500 (replication)
  • Staging subnet: A subnet in your AWS VPC for the staging area (where replicated data is stored)
  • Recovery subnet: A subnet for launching recovery instances during drills or actual recovery
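Before installing the agent, it can help to confirm that the outbound ports listed above are reachable from the source server. The following Python sketch attempts a TCP connection to each target and reports the result. The target hosts shown in the usage comment are placeholders (assumptions): substitute the DRS endpoint for your region and the address of a replication server in your staging subnet.

```python
import socket


def check_outbound(targets, timeout=5.0, connect=socket.create_connection):
    """Try a TCP connection to each (host, port) pair.

    Returns a dict mapping each (host, port) to True (connected) or False.
    The connect parameter exists so the function can be exercised without
    real network access.
    """
    results = {}
    for host, port in targets:
        try:
            conn = connect((host, port), timeout)
        except OSError:
            # Covers refused connections, timeouts, and DNS failures.
            results[(host, port)] = False
        else:
            conn.close()
            results[(host, port)] = True
    return results


# Example usage (hosts are placeholders, not values from this guide):
#   targets = [
#       ("drs.us-east-1.amazonaws.com", 443),  # HTTPS to the service
#       ("203.0.113.10", 1500),                # replication traffic
#   ]
#   for target, ok in check_outbound(targets).items():
#       print(target, "reachable" if ok else "BLOCKED")
```

A successful TCP connection only shows that the port is open along the path; it does not validate TLS inspection, proxies, or bandwidth.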

AWS Console Method

Step 1: Initialize DRS in Your Region

  1. Go to the AWS Elastic Disaster Recovery console and select the region you want to use (this guide uses us-east-1)
  2. If this is your first time, you will see a welcome page
  3. Click Set default replication settings
  4. Configure the replication settings:
    • Select a Staging area subnet from your VPC
    • Choose an Instance type for replication servers (t3.small is usually sufficient)
    • Configure EBS encryption (recommended: enable with AWS managed key)
  5. Click Configure and initialize

Step 2: Add a Source Server

  1. In the DRS Console, click Source servers in the left sidebar
  2. Click Add server
  3. You will see installation instructions for the AWS Replication Agent
  4. Copy the installation command provided (it includes your region and credentials)
  5. Run the installation command on your source server:
    • For Linux: Run the curl/wget command as root
    • For Windows: Run the PowerShell command as Administrator
  6. Wait for the server to appear in the console; its status moves from Not ready to Healthy

Step 3: Wait for Initial Sync to Complete

  1. Monitor the source server in the DRS Console
  2. The Data replication status will progress through stages:
    • Initiating - Setting up replication
    • Initial sync - Copying data (this may take hours depending on data size)
    • Healthy - Replication is active and current
  3. Wait until the server shows Healthy before proceeding
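If you script this wait instead of watching the console, the stages above amount to a polling loop. The Python sketch below is a minimal version, assuming a caller-supplied fetch_state function (hypothetical) that returns the server's current replication state, for example by reading dataReplicationInfo.dataReplicationState from describe-source-servers, where CONTINUOUS corresponds to Healthy in the console.

```python
import time


def wait_for_state(fetch_state, ready_state="CONTINUOUS",
                   poll_seconds=60, max_polls=120, sleep=time.sleep):
    """Poll fetch_state() until it returns ready_state.

    Returns the distinct states observed, in order. Raises TimeoutError if
    ready_state is not reached within max_polls attempts.
    """
    seen = []
    for attempt in range(max_polls):
        state = fetch_state()
        if not seen or seen[-1] != state:
            seen.append(state)  # record each transition once
        if state == ready_state:
            return seen
        if attempt < max_polls - 1:
            sleep(poll_seconds)
    last = seen[-1] if seen else "unknown"
    raise TimeoutError(f"state is still {last} after {max_polls} polls")


if __name__ == "__main__":
    # Simulated progression; a real fetch_state would query DRS.
    simulated = iter(["INITIATING", "INITIAL_SYNC", "INITIAL_SYNC", "CONTINUOUS"])
    print(wait_for_state(lambda: next(simulated), sleep=lambda seconds: None))
```

Because initial sync can take hours, choose poll_seconds and max_polls to match the amount of data being replicated.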

Step 4: Run a Recovery Drill

  1. In the DRS Console, click Source servers
  2. Select the checkbox next to your source server
  3. Open the Initiate recovery dropdown and select Initiate drill
  4. Review the recovery settings and click Initiate drill
  5. The job will start and you can monitor progress in Recovery job history

Step 5: Verify the Job Completed

  1. Click Recovery job history in the left sidebar
  2. Find your drill job in the list
  3. Verify the Status shows Completed
  4. You now have a documented recovery job that satisfies this check

Step 6: Clean Up Drill Resources (Important)

After verifying the drill worked, clean up to avoid ongoing charges:

  1. Go to Recovery instances in the left sidebar
  2. Select the drill instance
  3. Click Actions > Disconnect from AWS
  4. Then terminate the EC2 instance in the EC2 Console

AWS CLI (optional)

Step 1: Initialize DRS

aws drs initialize-service --region us-east-1

This command returns no output on success.

Step 2: Verify Initialization

aws drs describe-replication-configuration-templates \
  --region us-east-1

If DRS is initialized, this returns your replication configuration templates.

Step 3: List Source Servers

aws drs describe-source-servers --region us-east-1

This shows all source servers configured for replication.

Step 4: Start a Recovery Drill

Replace <source-server-id> with your actual source server ID (format: s-0123456789abcdef0):

aws drs start-recovery \
  --source-servers sourceServerID=<source-server-id> \
  --is-drill \
  --region us-east-1

Step 5: Check Recovery Jobs

aws drs describe-jobs --region us-east-1

This lists all recovery jobs, including drills. Look for jobs with status COMPLETED.
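To pick out the completed jobs programmatically rather than by eye, the JSON returned by describe-jobs can be filtered on its status field. The Python sketch below does this for a JSON string; the embedded sample output and its job IDs are made up for illustration.

```python
import json


def completed_jobs(describe_jobs_output):
    """Return the jobs with status COMPLETED from describe-jobs JSON text."""
    response = json.loads(describe_jobs_output)
    return [job for job in response.get("items", [])
            if job.get("status") == "COMPLETED"]


# Made-up sample shaped like the output of `aws drs describe-jobs`.
SAMPLE_OUTPUT = """
{
  "items": [
    {"jobID": "drsjob-sample-1", "status": "COMPLETED", "type": "LAUNCH"},
    {"jobID": "drsjob-sample-2", "status": "STARTED", "type": "LAUNCH"}
  ]
}
"""

if __name__ == "__main__":
    for job in completed_jobs(SAMPLE_OUTPUT):
        print(job["jobID"], job["status"])
```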

Step 6: Filter Jobs by Date Range

aws drs describe-jobs \
  --filters fromDate=2024-01-01T00:00:00Z,toDate=2024-12-31T23:59:59Z \
  --region us-east-1
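Rather than hard-coding the dates, you can compute a rolling window. The Python sketch below builds a --filters value in the same fromDate/toDate format for the last N days; the function name is an assumption made for this example.

```python
from datetime import datetime, timedelta, timezone


def job_date_filter(days_back, now=None):
    """Build a describe-jobs --filters value covering the last days_back days.

    now must be timezone-aware if given; it defaults to the current UTC time.
    """
    end = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    start = end - timedelta(days=days_back)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return f"fromDate={start.strftime(fmt)},toDate={end.strftime(fmt)}"


if __name__ == "__main__":
    # Paste the printed value after --filters in the command above.
    print(job_date_filter(90))
```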

CloudFormation (optional)

CloudFormation can configure DRS replication settings, but the actual recovery jobs must be initiated manually or through automation. This template sets up the replication configuration:

AWSTemplateFormatVersion: '2010-09-09'
Description: AWS Elastic Disaster Recovery replication configuration template

Parameters:
  StagingSubnetId:
    Type: AWS::EC2::Subnet::Id
    Description: Subnet ID for the DRS staging area

  ReplicationServerInstanceType:
    Type: String
    Default: t3.small
    Description: Instance type for replication servers
    AllowedValues:
      - t3.small
      - t3.medium
      - t3.large

Resources:
  DRSReplicationConfigTemplate:
    Type: AWS::DRS::ReplicationConfigurationTemplate
    Properties:
      AssociateDefaultSecurityGroup: true
      BandwidthThrottling: 0
      CreatePublicIP: false
      DataPlaneRouting: PRIVATE_IP
      DefaultLargeStagingDiskType: GP3
      EbsEncryption: DEFAULT
      # PitPolicy is a required property. This single rule (a snapshot every
      # 10 minutes, kept for 60 minutes) is illustrative; adjust it to your
      # retention needs.
      PitPolicy:
        - RuleID: 1
          Enabled: true
          Interval: 10
          RetentionDuration: 60
          Units: MINUTE
      ReplicationServerInstanceType: !Ref ReplicationServerInstanceType
      ReplicationServersSecurityGroupsIDs: []
      StagingAreaSubnetId: !Ref StagingSubnetId
      StagingAreaTags:
        Application: DisasterRecovery
        ManagedBy: CloudFormation
      UseDedicatedReplicationServer: false

Outputs:
  ReplicationConfigTemplateId:
    Description: ID of the DRS replication configuration template
    Value: !Ref DRSReplicationConfigTemplate

Save the template as drs-replication-config.yaml and deploy with:

aws cloudformation deploy \
  --template-file drs-replication-config.yaml \
  --stack-name drs-replication-setup \
  --parameter-overrides \
    StagingSubnetId=subnet-0123456789abcdef0 \
  --region us-east-1

Note: After deploying this template, you still need to:

  1. Install the AWS Replication Agent on source servers
  2. Manually initiate a recovery drill to satisfy the drs_job_exist check

Terraform (optional)

# Variables
variable "staging_subnet_id" {
  description = "Subnet ID for the DRS staging area"
  type        = string
}

variable "replication_instance_type" {
  description = "Instance type for replication servers"
  type        = string
  default     = "t3.small"
}

# DRS Replication Configuration Template
resource "aws_drs_replication_configuration_template" "main" {
  associate_default_security_group        = true
  bandwidth_throttling                    = 0
  create_public_ip                        = false
  data_plane_routing                      = "PRIVATE_IP"
  default_large_staging_disk_type         = "GP3"
  ebs_encryption                          = "DEFAULT"
  replication_server_instance_type        = var.replication_instance_type
  replication_servers_security_groups_ids = []
  staging_area_subnet_id                  = var.staging_subnet_id
  use_dedicated_replication_server        = false

  # pit_policy is a required block. This single rule (a snapshot every
  # 10 minutes, kept for 60 minutes) is illustrative; adjust as needed.
  pit_policy {
    rule_id            = 1
    enabled            = true
    interval           = 10
    retention_duration = 60
    units              = "MINUTE"
  }

  staging_area_tags = {
    Application = "DisasterRecovery"
    ManagedBy   = "Terraform"
  }

  tags = {
    Name        = "DRS-Replication-Config"
    Environment = "production"
  }
}

# Output
output "replication_config_template_id" {
  description = "ID of the DRS replication configuration template"
  value       = aws_drs_replication_configuration_template.main.id
}

Deploy with:

terraform init
terraform plan -var="staging_subnet_id=subnet-0123456789abcdef0"
terraform apply -var="staging_subnet_id=subnet-0123456789abcdef0"

Note: After deploying this configuration, you still need to:

  1. Install the AWS Replication Agent on source servers
  2. Manually initiate a recovery drill to satisfy the drs_job_exist check

Verification

After running a recovery drill, verify the check will pass:

  1. Check the Recovery Job History:

    • Go to DRS Console > Recovery job history
    • Verify at least one job shows Completed status
  2. Confirm Source Server Health:

    • Go to DRS Console > Source servers
    • Verify your source servers show Healthy replication status
  3. Re-run the Prowler Check:

    • Run: prowler aws --checks drs_job_exist -f us-east-1
    • The check should now pass

CLI verification commands

Check if DRS is initialized:

aws drs describe-replication-configuration-templates \
  --region us-east-1 \
  --query 'items[0].replicationConfigurationTemplateID'

If this returns a template ID, DRS is initialized.

List all recovery jobs:

aws drs describe-jobs \
  --region us-east-1 \
  --query 'items[*].{JobID:jobID,Status:status,Type:type,Created:creationDateTime}'

Look for at least one job with Status: COMPLETED.

Check source server status:

aws drs describe-source-servers \
  --region us-east-1 \
  --query 'items[*].{ServerID:sourceServerID,Hostname:sourceProperties.identificationHints.hostname,DataReplicationState:dataReplicationInfo.dataReplicationState}'

Healthy servers show DataReplicationState: CONTINUOUS.
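With many source servers, it is easier to list only the ones that are not in the CONTINUOUS state. The Python sketch below walks the same fields the query above selects and returns the servers that need attention. The sample entries (IDs, hostnames, states) are made up for illustration.

```python
def servers_needing_attention(source_servers):
    """Return (server_id, hostname, state) for servers not in CONTINUOUS.

    source_servers: the "items" list from a describe-source-servers response.
    """
    report = []
    for server in source_servers:
        state = (server.get("dataReplicationInfo", {})
                       .get("dataReplicationState", "UNKNOWN"))
        if state != "CONTINUOUS":
            hostname = (server.get("sourceProperties", {})
                              .get("identificationHints", {})
                              .get("hostname", "unknown"))
            report.append((server.get("sourceServerID"), hostname, state))
    return report


if __name__ == "__main__":
    # Made-up sample entries shaped like describe-source-servers items.
    sample = [
        {"sourceServerID": "s-0123456789abcdef0",
         "sourceProperties": {"identificationHints": {"hostname": "app-01"}},
         "dataReplicationInfo": {"dataReplicationState": "CONTINUOUS"}},
        {"sourceServerID": "s-0fedcba9876543210",
         "sourceProperties": {"identificationHints": {"hostname": "db-01"}},
         "dataReplicationInfo": {"dataReplicationState": "INITIAL_SYNC"}},
    ]
    for row in servers_needing_attention(sample):
        print(row)
```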

Notes

  • Regular drill schedule: Do not just run one drill to pass this check. Establish a regular schedule (quarterly is common) to ensure your disaster recovery procedures remain valid as your infrastructure changes.

  • Drill vs. actual recovery: The --is-drill flag marks the job as a test. Use drills for regular validation; actual recovery is for real disaster scenarios.

  • Cost considerations: DRS charges for:

    • Replicated source servers (per server per hour)
    • Staging area EBS storage
    • Recovery instances when launched

    Clean up drill instances promptly to minimize costs.

  • Replication lag: Monitor replication lag on your source servers. High lag means your recovery point may be older than expected during an actual recovery.

  • Multi-region strategy: Consider replicating to a different region than your primary workloads for true disaster resilience. If us-east-1 has an outage, recovering to us-east-1 would not help.

  • Application consistency: For databases and applications requiring consistent state, coordinate with application-level backup strategies. DRS provides crash-consistent recovery, not application-consistent.

  • Failback planning: After a recovery event, you will eventually want to fail back to your primary site. Test failback procedures during drills as well.

  • Agent updates: Keep the AWS Replication Agent updated on your source servers. AWS releases updates that improve performance and fix issues.
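To back the regular drill schedule note above with an automated reminder, you can compare the date of the most recent completed drill against your chosen interval. The Python sketch below assumes that drill jobs carry initiatedBy set to START_DRILL and that creationDateTime is an ISO 8601 timestamp; verify both against your own describe-jobs output. The 90-day default reflects the quarterly cadence mentioned above, and the sample jobs are made up.

```python
from datetime import datetime, timedelta, timezone


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def drill_is_overdue(jobs, now, max_age_days=90):
    """Return True if no completed drill started within max_age_days of now.

    jobs: the "items" list from a describe-jobs response.
    now: a timezone-aware datetime.
    """
    drill_times = [
        parse_timestamp(job["creationDateTime"])
        for job in jobs
        if job.get("initiatedBy") == "START_DRILL"
        and job.get("status") == "COMPLETED"
    ]
    if not drill_times:
        return True  # never drilled counts as overdue
    return now - max(drill_times) > timedelta(days=max_age_days)


if __name__ == "__main__":
    # Made-up sample jobs for illustration.
    sample = [
        {"jobID": "drsjob-sample-1", "status": "COMPLETED",
         "initiatedBy": "START_DRILL",
         "creationDateTime": "2024-01-15T10:00:00+00:00"},
    ]
    print(drill_is_overdue(sample, datetime(2024, 3, 1, tzinfo=timezone.utc)))
    print(drill_is_overdue(sample, datetime(2024, 6, 1, tzinfo=timezone.utc)))
```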