AWS Elastic Disaster Recovery Job Exists

Overview

This check verifies that AWS Elastic Disaster Recovery (DRS) is enabled in your region and that at least one recovery or drill job has been executed. DRS helps you quickly recover your on-premises and cloud-based applications by replicating them to AWS.

Having a recovery job on record shows that your disaster recovery setup has been exercised. Configuring DRS is not enough on its own: you need to verify that failover actually works before a real disaster strikes.
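Below is a minimal sketch of the pass/fail rule described above. It assumes the region's DRS state has already been fetched, for example with the AWS CLI. `RegionDRSState` and `evaluate_region` are hypothetical illustrations, not Prowler's source code.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class RegionDRSState:
    """Hypothetical snapshot of DRS state for a single region."""
    region: str
    initialized: bool
    jobs: List[dict] = field(default_factory=list)  # items from describe-jobs


def evaluate_region(state: RegionDRSState) -> Tuple[str, str]:
    """Return (status, reason): PASS only if DRS is initialized and has a job."""
    if not state.initialized:
        return "FAIL", f"DRS is not enabled in {state.region}."
    if not state.jobs:
        return "FAIL", f"DRS is enabled in {state.region} but has no recovery or drill jobs."
    return "PASS", f"DRS is enabled in {state.region} with {len(state.jobs)} job(s)."


if __name__ == "__main__":
    print(evaluate_region(RegionDRSState("us-east-1", initialized=True)))
    print(evaluate_region(RegionDRSState("us-east-1", True, [{"status": "COMPLETED"}])))
```

This sketch counts any job. A stricter internal policy could count only jobs whose status is COMPLETED.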

Risk

If DRS is not enabled, or no recovery job has ever been run, your organization faces significant business continuity risks:

Untested recovery: If you have never run a recovery job, you do not know if your disaster recovery plan will work when needed
Extended downtime: During an outage, untested recovery procedures may fail or take much longer than expected
Data loss: Without validated replication, critical data may not be recoverable
Compliance gaps: Many compliance frameworks require documented and tested disaster recovery procedures
Unvalidated RTO/RPO: Recovery Time Objective and Recovery Point Objective targets cannot be validated without actual test runs

Testing your disaster recovery setup with periodic drills is essential for maintaining business resilience.

Remediation Steps

Prerequisites

You need:

AWS Console access with permissions to configure DRS
At least one source server (on-premises or cloud) that you want to protect
A VPC with subnets configured for DRS staging and recovery

Required IAM permissions (for administrators)

Your IAM user or role needs these permissions:

drs:InitializeService
drs:CreateReplicationConfigurationTemplate
drs:DescribeSourceServers
drs:StartRecovery
drs:DescribeJobs
ec2:DescribeSubnets
ec2:DescribeSecurityGroups
ec2:CreateSecurityGroup
iam:CreateServiceLinkedRole (for initial setup)

For full functionality, consider using the AWS managed policy AWSElasticDisasterRecoveryConsoleFullAccess.
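If you want to grant only the permissions listed above rather than the managed policy, you can generate a custom policy document. This is a minimal sketch. The `Sid` value and the broad `"Resource": "*"` are assumptions chosen for brevity. As noted above, the managed policy may be needed for full console functionality.

```python
import json

# Permissions listed above for initializing DRS and running a drill.
DRS_ADMIN_ACTIONS = [
    "drs:InitializeService",
    "drs:CreateReplicationConfigurationTemplate",
    "drs:DescribeSourceServers",
    "drs:StartRecovery",
    "drs:DescribeJobs",
    "ec2:DescribeSubnets",
    "ec2:DescribeSecurityGroups",
    "ec2:CreateSecurityGroup",
    "iam:CreateServiceLinkedRole",
]


def build_policy_document(actions):
    """Build an IAM policy document allowing the given actions.

    Resource "*" keeps the sketch short; scope it down for production use.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DrsSetupAndDrill",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": sorted(set(actions)),  # de-duplicated, stable order
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_policy_document(DRS_ADMIN_ACTIONS), indent=2))
```

You could save the printed JSON to a file and pass it to `aws iam create-policy --policy-document file://<file>`. The file and policy names are up to you.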

Network requirements

DRS requires network connectivity between your source servers and AWS:

Outbound ports: Source servers need outbound access on TCP port 443 (HTTPS) and TCP port 1500 (replication)
Staging subnet: A subnet in your AWS VPC for the staging area (where replicated data is stored)
Recovery subnet: A subnet for launching recovery instances during drills or actual recovery
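Before installing the agent, you can confirm that the source server can open outbound TCP connections on the two ports above. This is a minimal sketch that uses only the Python standard library. The target hosts are assumptions:

- Port 443 would be checked against your region's DRS endpoint (assumed to be `drs.<region>.amazonaws.com`).
- Port 1500 would be checked against a replication server IP in the staging subnet. That server only exists once replication has started.

```python
import socket
from typing import Dict, Iterable

REQUIRED_PORTS = (443, 1500)  # HTTPS to AWS, replication traffic to the staging area


def check_tcp_ports(host: str, ports: Iterable[int] = REQUIRED_PORTS,
                    timeout: float = 3.0) -> Dict[int, bool]:
    """Return {port: reachable} by attempting a TCP connection to host:port."""
    results: Dict[int, bool] = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:  # refused, timed out, unreachable, or DNS failure
            results[port] = False
    return results
```

For example, `check_tcp_ports("drs.us-east-1.amazonaws.com", ports=(443,))` would test HTTPS reachability from the source server, assuming that endpoint name.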

AWS Console Method

Step 1: Initialize DRS in Your Region

Go to the AWS Elastic Disaster Recovery Console in the region you will recover into (this guide uses us-east-1 as the example)
If DRS has not been initialized in this region, you will see a welcome page
Click Set default replication settings
Configure the replication settings:
- Select a Staging area subnet from your VPC
- Choose an Instance type for replication servers (t3.small is usually sufficient)
- Configure EBS encryption (recommended: enable with AWS managed key)
Click Configure and initialize

Step 2: Add a Source Server

In the DRS Console, click Source servers in the left sidebar
Click Add server
You will see installation instructions for the AWS Replication Agent
Copy the installation command provided (it includes your region and credentials)
Run the installation command on your source server:
- For Linux: Run the curl/wget command as root
- For Windows: Run the PowerShell command as Administrator
Wait for the server to appear in the console. Its status shows Not ready at first and changes to Healthy once replication is established

Step 3: Wait for Initial Sync to Complete

Monitor the source server in the DRS Console
The Data replication status will progress through stages:
- Initiating - Setting up replication
- Initial sync - Copying data (this may take hours depending on data size)
- Healthy - Replication is active and current
Wait until the server shows Healthy before proceeding
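The same progression is visible in the `dataReplicationInfo.dataReplicationState` field of `aws drs describe-source-servers` output, where a healthy server reports `CONTINUOUS`. Below is a minimal sketch that lists servers not yet ready for a drill. It assumes the CLI's JSON output has been loaded into a dict. The sample IDs and the `INITIAL_SYNC` value are illustrative.

```python
from typing import List

READY_STATE = "CONTINUOUS"  # API state shown as Healthy in the console


def replication_state(source_server: dict) -> str:
    """Return dataReplicationInfo.dataReplicationState, or "UNKNOWN" if absent."""
    info = source_server.get("dataReplicationInfo") or {}
    return info.get("dataReplicationState", "UNKNOWN")


def servers_not_ready(describe_source_servers_output: dict) -> List[str]:
    """IDs of source servers whose replication is not yet continuous."""
    return [
        server["sourceServerID"]
        for server in describe_source_servers_output.get("items", [])
        if replication_state(server) != READY_STATE
    ]


if __name__ == "__main__":
    sample = {
        "items": [
            {"sourceServerID": "s-0123456789abcdef0",
             "dataReplicationInfo": {"dataReplicationState": "CONTINUOUS"}},
            {"sourceServerID": "s-0123456789abcdef1",
             "dataReplicationInfo": {"dataReplicationState": "INITIAL_SYNC"}},
        ]
    }
    print(servers_not_ready(sample))  # ['s-0123456789abcdef1']
```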

Step 4: Run a Recovery Drill

In the DRS Console, click Source servers
Select the checkbox next to your source server
Click the Initiate recovery dropdown and select Initiate drill
Review the recovery settings and click Initiate drill
The job will start and you can monitor progress in Recovery job history

Step 5: Verify the Job Completed

Click Recovery job history in the left sidebar
Find your drill job in the list
Verify the Status shows Completed
You now have a documented recovery job that satisfies this check

Step 6: Clean Up Drill Resources (Important)

After verifying the drill worked, clean up to avoid ongoing charges:

Go to Recovery instances in the left sidebar
Select the drill instance
Click Actions > Terminate recovery instances
Confirm the termination; DRS then terminates the drill's EC2 instance
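This cleanup can also be scripted. Below is a minimal sketch that takes a boto3 DRS client, for example `boto3.client("drs", region_name="us-east-1")`, and finds recovery instances flagged as drills. It terminates them only when `dry_run` is false. The sketch makes the following assumptions:

- The output of `describe_recovery_instances` contains `isDrill` and `recoveryInstanceID` fields.
- It does not split very large ID lists into batches.
- The function names are hypothetical.

```python
from typing import Any, Dict, List


def drill_recovery_instance_ids(drs_client: Any) -> List[str]:
    """Collect IDs of recovery instances launched by drills, following nextToken."""
    ids: List[str] = []
    kwargs: Dict[str, str] = {}
    while True:
        page = drs_client.describe_recovery_instances(**kwargs)
        ids.extend(
            item["recoveryInstanceID"]
            for item in page.get("items", [])
            if item.get("isDrill")  # assumed flag separating drills from real recoveries
        )
        token = page.get("nextToken")
        if not token:
            return ids
        kwargs["nextToken"] = token


def terminate_drill_instances(drs_client: Any, dry_run: bool = True) -> List[str]:
    """Return drill recovery instance IDs; terminate them only if dry_run is False."""
    ids = drill_recovery_instance_ids(drs_client)
    if ids and not dry_run:
        drs_client.terminate_recovery_instances(recoveryInstanceIDs=ids)
    return ids
```

Defaulting to `dry_run=True` means a first run only reports what would be terminated.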

AWS CLI (optional)

Step 1: Initialize DRS

aws drs initialize-service --region us-east-1

This command returns no output on success.

Step 2: Verify Initialization

aws drs describe-replication-configuration-templates \
    --region us-east-1

If DRS is initialized, this returns your replication configuration templates.
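When scripting in Python, you can wrap the same probe in a helper. Below is a minimal sketch. It assumes a boto3 DRS client and that an uninitialized region is signalled by the `UninitializedAccountException` error code. The helper name is hypothetical.

```python
from typing import Any


def is_drs_initialized(drs_client: Any) -> bool:
    """Return True if DRS answers describe_replication_configuration_templates.

    An uninitialized region is assumed to fail with the error code
    "UninitializedAccountException"; any other error is re-raised.
    """
    try:
        drs_client.describe_replication_configuration_templates()
    except Exception as err:  # botocore's ClientError carries a .response dict
        response = getattr(err, "response", None) or {}
        if response.get("Error", {}).get("Code") == "UninitializedAccountException":
            return False
        raise
    return True
```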

Step 3: List Source Servers

aws drs describe-source-servers --region us-east-1

This shows all source servers configured for replication.
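When calling the DRS API from code, results come back a page at a time, with a `nextToken` when more remain. Below is a minimal sketch of a generic collector. It assumes the operation returns its results under `items`, as the CLI output above does. `collect_items` is a hypothetical helper.

```python
from typing import Any, Callable, Dict, List


def collect_items(list_call: Callable[..., Dict[str, Any]], **kwargs: Any) -> List[dict]:
    """Call a paginated DRS describe operation until nextToken runs out."""
    items: List[dict] = []
    while True:
        page = list_call(**kwargs)
        items.extend(page.get("items", []))
        token = page.get("nextToken")
        if not token:
            return items
        kwargs = {**kwargs, "nextToken": token}  # keep caller's args, add token
```

For example, `collect_items(boto3.client("drs", region_name="us-east-1").describe_source_servers)` would gather every source server across pages.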

Step 4: Start a Recovery Drill

Replace <source-server-id> with your actual source server ID (format: s-0123456789abcdef0):

aws drs start-recovery \
    --source-servers sourceServerID=<source-server-id> \
    --is-drill \
    --region us-east-1
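The equivalent boto3 call takes `sourceServers` and `isDrill` arguments that mirror the CLI flags. Below is a minimal sketch that builds those arguments and rejects malformed IDs first. The ID pattern is an assumption based on the example format above ("s-" followed by 17 characters), and `build_drill_request` is a hypothetical helper.

```python
import re
from typing import Iterable

# Assumed ID shape, based on the example above: "s-" followed by 17 alphanumerics.
SOURCE_SERVER_ID = re.compile(r"^s-[0-9a-zA-Z]{17}$")


def build_drill_request(source_server_ids: Iterable[str]) -> dict:
    """Build keyword arguments for a boto3 drs start_recovery drill."""
    ids = list(source_server_ids)
    if not ids:
        raise ValueError("at least one source server ID is required")
    bad = [sid for sid in ids if not SOURCE_SERVER_ID.match(sid)]
    if bad:
        raise ValueError(f"unexpected source server ID format: {bad}")
    return {
        "sourceServers": [{"sourceServerID": sid} for sid in ids],
        "isDrill": True,  # same effect as --is-drill
    }
```

Usage: `boto3.client("drs", region_name="us-east-1").start_recovery(**build_drill_request(["s-0123456789abcdef0"]))`.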

Step 5: Check Recovery Jobs

aws drs describe-jobs --region us-east-1

This lists all recovery jobs, including drills. Look for jobs with status COMPLETED.
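To tell drills apart from real recoveries when reading this output, you can summarize the jobs. Below is a minimal sketch over the parsed JSON, for example from `json.load` on a saved `describe-jobs` response. It assumes each job carries `status` and `initiatedBy` fields, and that drills are marked `START_DRILL`. The function name is hypothetical.

```python
from collections import Counter
from typing import Dict


def summarize_jobs(describe_jobs_output: dict) -> Dict[str, int]:
    """Count all jobs, completed jobs, and completed drills in describe-jobs output."""
    items = describe_jobs_output.get("items", [])
    by_status = Counter(job.get("status", "UNKNOWN") for job in items)
    completed_drills = sum(
        1 for job in items
        if job.get("status") == "COMPLETED"
        and job.get("initiatedBy") == "START_DRILL"  # assumed drill marker
    )
    return {
        "total": len(items),
        "completed": by_status["COMPLETED"],
        "completed_drills": completed_drills,
    }
```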

Step 6: Filter Jobs by Date Range

aws drs describe-jobs \
    --filters fromDate=2024-01-01T00:00:00Z,toDate=2024-12-31T23:59:59Z \
    --region us-east-1
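Rather than hard-coding dates, you can compute a rolling window. Below is a minimal sketch that produces a `--filters` value in the same shape as the example above. `last_n_days_filter` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

API_TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # same shape as the example above


def last_n_days_filter(days: int, now: Optional[datetime] = None) -> str:
    """Return a describe-jobs --filters value covering the last `days` days (UTC).

    A naive `now` is interpreted as local time by astimezone().
    """
    end = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    start = end - timedelta(days=days)
    return f"fromDate={start.strftime(API_TIME_FORMAT)},toDate={end.strftime(API_TIME_FORMAT)}"


if __name__ == "__main__":
    print(last_n_days_filter(90))
```

The printed string could be passed as the `--filters` argument, for example `--filters "$(python3 <script>.py)"`.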

CloudFormation (optional)

CloudFormation can configure DRS replication settings, but the actual recovery jobs must be initiated manually or through automation. This template sets up the replication configuration:

AWSTemplateFormatVersion: '2010-09-09'
Description: AWS Elastic Disaster Recovery replication configuration template

Parameters:
  StagingSubnetId:
    Type: AWS::EC2::Subnet::Id
    Description: Subnet ID for the DRS staging area

  ReplicationServerInstanceType:
    Type: String
    Default: t3.small
    Description: Instance type for replication servers
    AllowedValues:
      - t3.small
      - t3.medium
      - t3.large

Resources:
  DRSReplicationConfigTemplate:
    Type: AWS::DRS::ReplicationConfigurationTemplate
    Properties:
      AssociateDefaultSecurityGroup: true
      BandwidthThrottling: 0
      CreatePublicIP: false
      DataPlaneRouting: PRIVATE_IP
      DefaultLargeStagingDiskType: GP3
      EbsEncryption: DEFAULT
      ReplicationServerInstanceType: !Ref ReplicationServerInstanceType
      ReplicationServersSecurityGroupsIDs: []
      StagingAreaSubnetId: !Ref StagingSubnetId
      StagingAreaTags:
        Application: DisasterRecovery
        ManagedBy: CloudFormation
      UseDedicatedReplicationServer: false

Outputs:
  ReplicationConfigTemplateId:
    Description: ID of the DRS replication configuration template
    Value: !Ref DRSReplicationConfigTemplate

Deploy with:

aws cloudformation deploy \
    --template-file drs-replication-config.yaml \
    --stack-name drs-replication-setup \
    --parameter-overrides \
        StagingSubnetId=subnet-0123456789abcdef0 \
    --region us-east-1

Note: After deploying this template, you still need to:

Install the AWS Replication Agent on source servers
Manually initiate a recovery drill to satisfy the drs_job_exist check

Terraform (optional)

# Variables
variable "staging_subnet_id" {
  description = "Subnet ID for the DRS staging area"
  type        = string
}

variable "replication_instance_type" {
  description = "Instance type for replication servers"
  type        = string
  default     = "t3.small"
}

# DRS Replication Configuration Template
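resource "aws_drs_replication_configuration_template" "main" {
  associate_default_security_group        = true
  bandwidth_throttling                    = 0
  create_public_ip                        = false
  data_plane_routing                      = "PRIVATE_IP"
  default_large_staging_disk_type         = "GP3"
  ebs_encryption                          = "DEFAULT"
  replication_server_instance_type        = var.replication_instance_type
  replication_servers_security_groups_ids = []
  staging_area_subnet_id                  = var.staging_subnet_id
  use_dedicated_replication_server        = false

  # Point-in-time (PIT) snapshot rules. The DRS API expects a PIT policy when a
  # template is created; this schedule is an example, so adjust it to your RPO
  # and retention needs.
  pit_policy {
    rule_id            = 1
    enabled            = true
    interval           = 10
    retention_duration = 60
    units              = "MINUTE"
  }

  pit_policy {
    rule_id            = 2
    enabled            = true
    interval           = 1
    retention_duration = 24
    units              = "HOUR"
  }

  pit_policy {
    rule_id            = 3
    enabled            = true
    interval           = 1
    retention_duration = 7
    units              = "DAY"
  }

  staging_area_tags = {
    Application = "DisasterRecovery"
    ManagedBy   = "Terraform"
  }

  tags = {
    Name        = "DRS-Replication-Config"
    Environment = "production"
  }
}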

# Output
output "replication_config_template_id" {
  description = "ID of the DRS replication configuration template"
  value       = aws_drs_replication_configuration_template.main.id
}

Deploy with:

terraform init
terraform plan -var="staging_subnet_id=subnet-0123456789abcdef0"
terraform apply -var="staging_subnet_id=subnet-0123456789abcdef0"

Note: After deploying this configuration, you still need to:

Install the AWS Replication Agent on source servers
Manually initiate a recovery drill to satisfy the drs_job_exist check

Verification

After running a recovery drill, verify the check will pass:

Check the Recovery Job History:
- Go to DRS Console > Recovery job history
- Verify at least one job shows Completed status
Confirm Source Server Health:
- Go to DRS Console > Source servers
- Verify your source servers show Healthy replication status
Re-run the Prowler Check:
- Run: prowler aws --checks drs_job_exist --region us-east-1
- The check should now pass

CLI verification commands

Check if DRS is initialized:

aws drs describe-replication-configuration-templates \
    --region us-east-1 \
    --query 'items[0].replicationConfigurationTemplateID'

If this returns a template ID, DRS is initialized.

List all recovery jobs:

aws drs describe-jobs \
    --region us-east-1 \
    --query 'items[*].{JobID:jobID,Status:status,Type:type,Created:creationDateTime}'

Look for at least one job with Status: COMPLETED.

Check source server status:

aws drs describe-source-servers \
    --region us-east-1 \
    --query 'items[*].{ServerID:sourceServerID,Hostname:sourceProperties.identificationHints.hostname,DataReplicationState:dataReplicationInfo.dataReplicationState}'

Healthy servers show DataReplicationState: CONTINUOUS.

Additional Resources

Notes

Regular drill schedule: Do not just run one drill to pass this check. Establish a regular schedule (quarterly is common) to ensure your disaster recovery procedures remain valid as your infrastructure changes.
Drill vs. actual recovery: The --is-drill flag marks the job as a test. Use drills for regular validation; actual recovery is for real disaster scenarios.
Cost considerations: DRS charges for:
- Replicated source servers (per server per hour)
- Staging area EBS storage
- Recovery instances when launched
Clean up drill instances promptly to minimize costs.
Replication lag: Monitor replication lag on your source servers. High lag means your recovery point may be older than expected during an actual recovery.
Multi-region strategy: Consider replicating to a different region than the one hosting your primary workloads. For example, if your workloads run in us-east-1 and that region has an outage, a recovery target in us-east-1 would not help.
Application consistency: For databases and applications requiring consistent state, coordinate with application-level backup strategies. DRS provides crash-consistent recovery, not application-consistent.
Failback planning: After a recovery event, you will eventually want to fail back to your primary site. Test failback procedures during drills as well.
Agent updates: Keep the AWS Replication Agent updated on your source servers. AWS releases updates that improve performance and fix issues.
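To keep drills on the regular schedule recommended above, a scheduled job could flag when the newest completed drill is too old. Below is a minimal sketch over `describe-jobs` items. It makes the following assumptions:

- The `initiatedBy` value `START_DRILL` identifies drills.
- `endDateTime` is an ISO 8601 timestamp.
- The 90-day threshold stands in for "quarterly".

Both function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional

QUARTER = timedelta(days=90)  # approximates the quarterly cadence mentioned above


def latest_completed_drill(jobs: Iterable[dict]) -> Optional[datetime]:
    """Newest endDateTime among completed drill jobs, or None if there are none."""
    ends = [
        # Replace a trailing "Z" so older Python versions can parse the timestamp.
        datetime.fromisoformat(job["endDateTime"].replace("Z", "+00:00"))
        for job in jobs
        if job.get("status") == "COMPLETED"
        and job.get("initiatedBy") == "START_DRILL"
        and job.get("endDateTime")
    ]
    return max(ends, default=None)


def drill_overdue(jobs: Iterable[dict], now: datetime, max_age: timedelta = QUARTER) -> bool:
    """True if no completed drill exists or the newest one is older than max_age.

    `now` must be timezone-aware so it can be compared with parsed timestamps.
    """
    latest = latest_completed_drill(jobs)
    return latest is None or now - latest > max_age
```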
