EMR Cluster Master Nodes No Public IP

Overview

This check identifies Amazon EMR clusters where primary (master) or worker nodes have public IP addresses assigned. EMR clusters should run in private subnets to prevent direct internet exposure.

Public IP addresses on EMR nodes make administrative interfaces and management ports directly reachable from the internet, which is rarely necessary and increases your attack surface.

Risk

EMR clusters with public IP addresses face several serious threats:

  • Admin UI exposure: EMR web interfaces (YARN, Spark, Ganglia) become accessible from the internet, potentially leaking job information and cluster details
  • SSH brute force attacks: The SSH port on master nodes becomes a target for automated credential-guessing attacks
  • Service exploitation: Unpatched vulnerabilities in Hadoop, Spark, or other EMR components can be exploited directly
  • Data exfiltration: A compromised master node could alter jobs to exfiltrate data from S3 or HDFS
  • Workload disruption: Attackers could terminate jobs, corrupt data, or use cluster resources for cryptomining

Running EMR clusters in private subnets with no public IPs significantly reduces these risks.

Remediation Steps

Prerequisites

You need:

  • AWS Console access with permissions to manage EMR clusters and VPC resources
  • An existing VPC with private subnets (subnets without a route to an Internet Gateway)
  • A way to access private resources (Session Manager, bastion host, or VPN)
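
For example, with the SSM agent running on a node (see the Session Manager note at the end of this guide), you can open a shell without any public network path; the instance ID below is a placeholder:

# Open an interactive shell on a private instance via Session Manager
aws ssm start-session --target i-xxxxxxxxxxxxxxxxx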

Required IAM permissions (for administrators)

Your IAM user or role needs these permissions:

  • elasticmapreduce:ListClusters
  • elasticmapreduce:DescribeCluster
  • elasticmapreduce:RunJobFlow
  • elasticmapreduce:TerminateJobFlows
  • ec2:DescribeSubnets
  • ec2:DescribeVpcs
  • ec2:CreateSecurityGroup
  • ec2:AuthorizeSecurityGroupIngress
  • ec2:AuthorizeSecurityGroupEgress
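
To confirm your identity actually has these permissions before you begin, the IAM policy simulator can be driven from the CLI; the role ARN below is a placeholder for your own identity:

# Simulate a couple of the required actions against your own identity
aws iam simulate-principal-policy \
  --policy-source-arn arn:aws:iam::111122223333:role/YourAdminRole \
  --action-names elasticmapreduce:RunJobFlow ec2:CreateSecurityGroup \
  --query "EvaluationResults[*].[EvalActionName,EvalDecision]" \
  --output table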

Network requirements for private EMR clusters

Before launching EMR in private subnets, ensure your VPC has:

  1. NAT Gateway or NAT Instance - EMR nodes need outbound internet access for package downloads and AWS API calls (alternatively, use VPC endpoints)

  2. VPC Endpoints (recommended) - Create endpoints for the following (a CLI example follows this list):

    • com.amazonaws.us-east-1.s3 (Gateway endpoint)
    • com.amazonaws.us-east-1.elasticmapreduce (Interface endpoint)
    • com.amazonaws.us-east-1.logs (Interface endpoint, for CloudWatch)
    • com.amazonaws.us-east-1.monitoring (Interface endpoint, for CloudWatch metrics)
  3. Service Access Security Group - Required for EMR to communicate with clusters in private subnets. EMR creates this automatically if you enable "Use security group for service access."

  4. DNS support - VPC must have DNS resolution and DNS hostnames enabled.
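
A minimal sketch of creating the S3 gateway endpoint and the EMR interface endpoint with the CLI; the VPC, route table, subnet, and security group IDs are placeholders to substitute with your own:

# S3 gateway endpoint (attach to the private subnet's route table)
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-xxxxxxxxxxxxxxxxx \
  --vpc-endpoint-type Gateway \
  --service-name com.amazonaws.us-east-1.s3 \
  --route-table-ids rtb-xxxxxxxxxxxxxxxxx \
  --region us-east-1

# EMR interface endpoint in the private subnet
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-xxxxxxxxxxxxxxxxx \
  --vpc-endpoint-type Interface \
  --service-name com.amazonaws.us-east-1.elasticmapreduce \
  --subnet-ids subnet-xxxxxxxxxxxxxxxxx \
  --security-group-ids sg-xxxxxxxxxxxxxxxxx \
  --private-dns-enabled \
  --region us-east-1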

AWS Console Method

Important: Existing EMR clusters cannot have their network configuration changed. You must terminate the current cluster and launch a new one in a private subnet.

Step 1: Identify affected clusters

  1. Go to EMR Console in us-east-1
  2. Click Clusters in the left sidebar
  3. For each active cluster, click the cluster name
  4. Click the Instances tab
  5. Check if any instances show a Public IP address (a CLI equivalent follows this list)
  6. Note the cluster name and configuration for recreation
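
If you prefer the CLI for this check, list-instances reports the public and private IP of each node; replace the cluster ID placeholder with your own:

# List every instance in the cluster with its public and private IPs
aws emr list-instances \
  --cluster-id <cluster-id> \
  --query "Instances[*].[Ec2InstanceId,PublicIpAddress,PrivateIpAddress]" \
  --output table \
  --region us-east-1

Any non-empty value in the PublicIpAddress column indicates an exposed node.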

Step 2: Create a new cluster in a private subnet

  1. In the EMR Console, click Create cluster
  2. Choose Go to advanced options (link at top of page)
  3. Configure your software and steps as needed, then click Next
  4. On the Hardware Configuration step:
    • Under Network, select your VPC
    • Under EC2 Subnet, select a private subnet (one without a route to an Internet Gateway)
  5. Click Next to continue configuration
  6. On the General Cluster Settings step:
    • Under Security Options, ensure Use security group for service access is checked (required for private subnets)
  7. Complete the remaining configuration and click Create cluster

Step 3: Verify and migrate

  1. Wait for the new cluster to reach Waiting or Running state
  2. Verify the instances have no public IPs (check the Instances tab)
  3. Migrate any jobs or data from the old cluster
  4. Terminate the old cluster:
    • Select the old cluster
    • Click Terminate
    • Confirm termination

AWS CLI (optional)

List clusters and check for public IPs

List all active clusters:

aws emr list-clusters \
  --active \
  --query "Clusters[*].[Id,Name,Status.State]" \
  --output table \
  --region us-east-1

Get cluster details including network configuration:

aws emr describe-cluster \
  --cluster-id <cluster-id> \
  --query "Cluster.{Name:Name,Ec2InstanceAttributes:Ec2InstanceAttributes}" \
  --region us-east-1

Look for Ec2SubnetId in the output. If the subnet has a route to an Internet Gateway and auto-assign public IP is enabled, instances will have public IPs.
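
You can also confirm whether the subnet itself auto-assigns public IPs by checking its MapPublicIpOnLaunch attribute:

# A private subnet should report MapPublicIpOnLaunch: false
aws ec2 describe-subnets \
  --subnet-ids <subnet-id> \
  --query "Subnets[*].{SubnetId:SubnetId,MapPublicIpOnLaunch:MapPublicIpOnLaunch}" \
  --region us-east-1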

Create a cluster in a private subnet

aws emr create-cluster \
  --name "Private-EMR-Cluster" \
  --release-label emr-7.0.0 \
  --applications Name=Spark Name=Hadoop \
  --instance-type m5.xlarge \
  --instance-count 3 \
  --use-default-roles \
  --ec2-attributes SubnetId=subnet-xxxxxxxxxxxxxxxxx,ServiceAccessSecurityGroup=sg-xxxxxxxxxxxxxxxxx,EmrManagedMasterSecurityGroup=sg-xxxxxxxxxxxxxxxxx,EmrManagedSlaveSecurityGroup=sg-xxxxxxxxxxxxxxxxx \
  --region us-east-1

Replace the placeholders:

  • subnet-xxxxxxxxxxxxxxxxx - Your private subnet ID
  • Security group IDs - Your EMR security groups (or let EMR create defaults by omitting them)

Note: The ServiceAccessSecurityGroup is required for clusters in private subnets. If not specified, EMR will create one automatically.

Terminate the old cluster

aws emr terminate-clusters \
  --cluster-ids <old-cluster-id> \
  --region us-east-1
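
terminate-clusters returns immediately; to block until termination completes, use the corresponding waiter:

# Wait until the cluster reaches the TERMINATED state
aws emr wait cluster-terminated \
  --cluster-id <old-cluster-id> \
  --region us-east-1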

CloudFormation (optional)

AWSTemplateFormatVersion: '2010-09-09'
Description: EMR cluster in private subnet without public IPs

Parameters:
  VpcId:
    Type: AWS::EC2::VPC::Id
    Description: VPC for the EMR cluster

  PrivateSubnetId:
    Type: AWS::EC2::Subnet::Id
    Description: Private subnet for EMR cluster (no IGW route)

  ClusterName:
    Type: String
    Default: PrivateEMRCluster
    Description: Name of the EMR cluster

  ReleaseLabel:
    Type: String
    Default: emr-7.0.0
    Description: EMR release version

  InstanceType:
    Type: String
    Default: m5.xlarge
    Description: Instance type for master and core nodes

Resources:
  # Security group for EMR master node
  EMRMasterSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: EMR Master Security Group
      VpcId: !Ref VpcId
      SecurityGroupEgress:
        - IpProtocol: "-1"
          CidrIp: 0.0.0.0/0
          Description: Allow all outbound traffic
      Tags:
        - Key: Name
          Value: !Sub "${ClusterName}-master-sg"

  # Security group for EMR worker nodes
  EMRSlaveSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: EMR Slave Security Group
      VpcId: !Ref VpcId
      SecurityGroupEgress:
        - IpProtocol: "-1"
          CidrIp: 0.0.0.0/0
          Description: Allow all outbound traffic
      Tags:
        - Key: Name
          Value: !Sub "${ClusterName}-slave-sg"

  # Security group for EMR service access (required for private subnets)
  EMRServiceAccessSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: EMR Service Access Security Group
      VpcId: !Ref VpcId
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 9443
          ToPort: 9443
          SourceSecurityGroupId: !Ref EMRMasterSecurityGroup
          Description: Allow EMR service access from master
      Tags:
        - Key: Name
          Value: !Sub "${ClusterName}-service-access-sg"

  # Allow master to communicate with service access SG
  MasterToServiceAccessIngress:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      GroupId: !Ref EMRMasterSecurityGroup
      IpProtocol: tcp
      FromPort: 8443
      ToPort: 8443
      SourceSecurityGroupId: !Ref EMRServiceAccessSecurityGroup
      Description: Allow traffic from service access SG

  # Allow communication between master and slave
  MasterSlaveIngress:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      GroupId: !Ref EMRMasterSecurityGroup
      IpProtocol: "-1"
      SourceSecurityGroupId: !Ref EMRSlaveSecurityGroup

  SlaveMasterIngress:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      GroupId: !Ref EMRSlaveSecurityGroup
      IpProtocol: "-1"
      SourceSecurityGroupId: !Ref EMRMasterSecurityGroup

  # EMR service role
  EMRServiceRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: elasticmapreduce.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceRole

  # EC2 instance profile for EMR nodes
  EMREC2Role:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: ec2.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceforEC2Role

  EMREC2InstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      Roles:
        - !Ref EMREC2Role

  # EMR Cluster in private subnet
  EMRCluster:
    Type: AWS::EMR::Cluster
    Properties:
      Name: !Ref ClusterName
      ReleaseLabel: !Ref ReleaseLabel
      Applications:
        - Name: Spark
        - Name: Hadoop
      ServiceRole: !Ref EMRServiceRole
      JobFlowRole: !Ref EMREC2InstanceProfile
      Instances:
        Ec2SubnetId: !Ref PrivateSubnetId
        EmrManagedMasterSecurityGroup: !Ref EMRMasterSecurityGroup
        EmrManagedSlaveSecurityGroup: !Ref EMRSlaveSecurityGroup
        ServiceAccessSecurityGroup: !Ref EMRServiceAccessSecurityGroup
        MasterInstanceGroup:
          InstanceCount: 1
          InstanceType: !Ref InstanceType
          Market: ON_DEMAND
          Name: Master
        CoreInstanceGroup:
          InstanceCount: 2
          InstanceType: !Ref InstanceType
          Market: ON_DEMAND
          Name: Core
      VisibleToAllUsers: true
      Tags:
        - Key: Name
          Value: !Ref ClusterName

Outputs:
  ClusterId:
    Description: EMR Cluster ID
    Value: !Ref EMRCluster

  MasterSecurityGroupId:
    Description: Master node security group
    Value: !Ref EMRMasterSecurityGroup

  SlaveSecurityGroupId:
    Description: Worker node security group
    Value: !Ref EMRSlaveSecurityGroup

Deploy with:

aws cloudformation deploy \
  --template-file emr-private-cluster.yaml \
  --stack-name private-emr-cluster \
  --parameter-overrides \
    VpcId=vpc-xxxxxxxxxxxxxxxxx \
    PrivateSubnetId=subnet-xxxxxxxxxxxxxxxxx \
  --capabilities CAPABILITY_IAM \
  --region us-east-1
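
Once the stack completes, the cluster ID and security group IDs can be read back from the stack outputs:

# Show the outputs declared in the template
aws cloudformation describe-stacks \
  --stack-name private-emr-cluster \
  --query "Stacks[0].Outputs" \
  --region us-east-1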

Terraform (optional)

variable "vpc_id" {
  description = "VPC ID for the EMR cluster"
  type        = string
}

variable "private_subnet_id" {
  description = "Private subnet ID (no IGW route)"
  type        = string
}

variable "cluster_name" {
  description = "Name of the EMR cluster"
  type        = string
  default     = "PrivateEMRCluster"
}

variable "release_label" {
  description = "EMR release version"
  type        = string
  default     = "emr-7.0.0"
}

variable "instance_type" {
  description = "Instance type for master and core nodes"
  type        = string
  default     = "m5.xlarge"
}

# Security group for EMR master node. EMR adds its own rules to managed
# security groups, so revoke_rules_on_delete lets Terraform destroy them.
resource "aws_security_group" "emr_master" {
  name                   = "${var.cluster_name}-master-sg"
  description            = "EMR Master Security Group"
  vpc_id                 = var.vpc_id
  revoke_rules_on_delete = true

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
    description = "Allow all outbound traffic"
  }

  tags = {
    Name = "${var.cluster_name}-master-sg"
  }
}

# Security group for EMR worker nodes
resource "aws_security_group" "emr_slave" {
  name                   = "${var.cluster_name}-slave-sg"
  description            = "EMR Slave Security Group"
  vpc_id                 = var.vpc_id
  revoke_rules_on_delete = true

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
    description = "Allow all outbound traffic"
  }

  tags = {
    Name = "${var.cluster_name}-slave-sg"
  }
}

# Security group for EMR service access (required for private subnets)
resource "aws_security_group" "emr_service_access" {
  name                   = "${var.cluster_name}-service-access-sg"
  description            = "EMR Service Access Security Group"
  vpc_id                 = var.vpc_id
  revoke_rules_on_delete = true

  ingress {
    from_port       = 9443
    to_port         = 9443
    protocol        = "tcp"
    security_groups = [aws_security_group.emr_master.id]
    description     = "Allow EMR service access from master"
  }

  tags = {
    Name = "${var.cluster_name}-service-access-sg"
  }
}

# Allow master to communicate with service access SG
resource "aws_security_group_rule" "master_service_access" {
  type                     = "ingress"
  from_port                = 8443
  to_port                  = 8443
  protocol                 = "tcp"
  security_group_id        = aws_security_group.emr_master.id
  source_security_group_id = aws_security_group.emr_service_access.id
  description              = "Allow traffic from service access SG"
}

# Allow communication between master and slave
resource "aws_security_group_rule" "master_slave_ingress" {
  type                     = "ingress"
  from_port                = 0
  to_port                  = 0
  protocol                 = "-1"
  security_group_id        = aws_security_group.emr_master.id
  source_security_group_id = aws_security_group.emr_slave.id
}

resource "aws_security_group_rule" "slave_master_ingress" {
  type                     = "ingress"
  from_port                = 0
  to_port                  = 0
  protocol                 = "-1"
  security_group_id        = aws_security_group.emr_slave.id
  source_security_group_id = aws_security_group.emr_master.id
}

# EMR service role
resource "aws_iam_role" "emr_service_role" {
  name = "${var.cluster_name}-service-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Principal = {
          Service = "elasticmapreduce.amazonaws.com"
        }
        Action = "sts:AssumeRole"
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "emr_service_policy" {
  role       = aws_iam_role.emr_service_role.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceRole"
}

# EC2 instance profile for EMR nodes
resource "aws_iam_role" "emr_ec2_role" {
  name = "${var.cluster_name}-ec2-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Principal = {
          Service = "ec2.amazonaws.com"
        }
        Action = "sts:AssumeRole"
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "emr_ec2_policy" {
  role       = aws_iam_role.emr_ec2_role.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceforEC2Role"
}

resource "aws_iam_instance_profile" "emr_ec2_profile" {
  name = "${var.cluster_name}-ec2-profile"
  role = aws_iam_role.emr_ec2_role.name
}

# EMR Cluster in private subnet
resource "aws_emr_cluster" "cluster" {
  name          = var.cluster_name
  release_label = var.release_label
  applications  = ["Spark", "Hadoop"]

  service_role = aws_iam_role.emr_service_role.arn

  ec2_attributes {
    subnet_id                         = var.private_subnet_id
    emr_managed_master_security_group = aws_security_group.emr_master.id
    emr_managed_slave_security_group  = aws_security_group.emr_slave.id
    service_access_security_group     = aws_security_group.emr_service_access.id
    instance_profile                  = aws_iam_instance_profile.emr_ec2_profile.arn
  }

  master_instance_group {
    instance_type  = var.instance_type
    instance_count = 1
    name           = "Master"
  }

  core_instance_group {
    instance_type  = var.instance_type
    instance_count = 2
    name           = "Core"
  }

  visible_to_all_users = true

  tags = {
    Name = var.cluster_name
  }
}

output "cluster_id" {
  description = "EMR Cluster ID"
  value       = aws_emr_cluster.cluster.id
}

output "master_security_group_id" {
  description = "Master node security group"
  value       = aws_security_group.emr_master.id
}

output "slave_security_group_id" {
  description = "Worker node security group"
  value       = aws_security_group.emr_slave.id
}

Deploy with:

terraform init
terraform plan -var="vpc_id=vpc-xxxxxxxxxxxxxxxxx" -var="private_subnet_id=subnet-xxxxxxxxxxxxxxxxx"
terraform apply -var="vpc_id=vpc-xxxxxxxxxxxxxxxxx" -var="private_subnet_id=subnet-xxxxxxxxxxxxxxxxx"
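
After apply finishes, the outputs declared above can be read back at any time, for example:

# Print the new cluster's ID from Terraform state
terraform output cluster_id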

Verification

After launching the new cluster in a private subnet:

  1. Confirm no public IPs:

    • Go to EMR Console > Clusters
    • Click your cluster name
    • Click the Instances tab
    • Verify no instances show a Public IP address
  2. Verify cluster functionality:

    • Wait for cluster status to show Waiting or Running
    • Submit a test job to verify the cluster works correctly (an example follows)
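
A minimal smoke test, assuming the Spark examples jar that EMR's Spark installation ships under /usr/lib/spark/examples (the path is an assumption; adjust it for your release):

# Run SparkPi as a cluster step; check the step status in the console
aws emr add-steps \
  --cluster-id <cluster-id> \
  --steps 'Type=Spark,Name=SparkPi,ActionOnFailure=CONTINUE,Args=[--class,org.apache.spark.examples.SparkPi,/usr/lib/spark/examples/jars/spark-examples.jar,10]' \
  --region us-east-1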

CLI verification commands

Check cluster network configuration:

aws emr describe-cluster \
  --cluster-id <cluster-id> \
  --query "Cluster.Ec2InstanceAttributes.{SubnetId:Ec2SubnetId,AvailabilityZone:Ec2AvailabilityZone}" \
  --region us-east-1

Verify the subnet is private (no route to Internet Gateway):

aws ec2 describe-route-tables \
  --filters "Name=association.subnet-id,Values=<subnet-id>" \
  --query "RouteTables[*].Routes[?GatewayId!=null && starts_with(GatewayId,'igw-')]" \
  --region us-east-1

If this returns an empty result ([]), the subnet has no route to an Internet Gateway, which means it is a private subnet.

List all active clusters, then re-run the Prowler check to confirm remediation:

aws emr list-clusters \
  --active \
  --query "Clusters[*].[Id,Name]" \
  --output table \
  --region us-east-1

Notes

  • No in-place remediation: You cannot change the network configuration of a running EMR cluster. You must terminate and recreate the cluster in a private subnet.

  • Job migration: Before terminating the old cluster, ensure all running jobs complete or migrate job definitions to the new cluster.

  • Data persistence: EMR cluster storage (HDFS) is ephemeral. Ensure critical data is stored in S3 before terminating the old cluster.

  • Enable Block Public Access: As an account-level control, enable EMR Block Public Access, which prevents clusters from launching when any associated security group allows unrestricted inbound traffic (0.0.0.0/0) on a port that is not an explicit exception. Go to EMR Console > Block public access and enable it (a CLI equivalent follows these notes).

  • VPC endpoints reduce costs: Using VPC endpoints for S3 and other AWS services eliminates NAT Gateway data transfer charges for AWS traffic.

  • Session Manager for access: Use AWS Systems Manager Session Manager to access EMR master nodes without needing SSH over public networks. Install the SSM agent via a bootstrap action.

  • Security groups still matter: Even in private subnets, follow least-privilege principles for security groups. Do not allow 0.0.0.0/0 ingress rules.
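
A sketch of enabling Block Public Access from the CLI; the port 22 exception mirrors the console default and can be removed or changed to suit your policy:

# Enable EMR Block Public Access for the account in this region
aws emr put-block-public-access-configuration \
  --block-public-access-configuration '{"BlockPublicSecurityGroupRules":true,"PermittedPublicSecurityGroupRuleRanges":[{"MinRange":22,"MaxRange":22}]}' \
  --region us-east-1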