SageMaker Training Jobs VPC Settings Configured
Overview
This check verifies that Amazon SageMaker training jobs are configured to run within a Virtual Private Cloud (VPC). When VPC settings are enabled, SageMaker places elastic network interfaces in your VPC subnets, so traffic between the training containers and your data can stay within your private network and is subject to your security groups and routing.
Without VPC configuration, training containers run outside your VPC with internet access by default, which can expose sensitive data and bypass your network security controls.
Risk
If SageMaker training jobs are not configured with VPC settings:
- Data exfiltration: Training data or model artifacts could be sent to unauthorized external destinations
- Malware exposure: Training containers could download malicious code from the internet
- Compliance violations: Sensitive data may traverse public networks, violating regulatory requirements
- Reduced visibility: Network traffic cannot be monitored or controlled through VPC flow logs and security groups
- Weakened security posture: You lose the ability to apply fine-grained network controls to ML workloads
Remediation Steps
Prerequisites
- AWS account access with permissions to create SageMaker training jobs
- An existing VPC with at least one private subnet
- A security group configured for SageMaker workloads
- (Optional) VPC endpoints for S3, ECR, and SageMaker API if your subnets have no route to the internet or you use network isolation
VPC Endpoint Setup (recommended for production)
For training jobs that run in subnets without internet access, or with network isolation enabled, create VPC endpoints for the required services:
- S3 Gateway Endpoint: For accessing training data and storing outputs
- ECR Endpoints: For pulling container images (both ecr.api and ecr.dkr)
- SageMaker API Endpoint: For SageMaker service communication
- CloudWatch Logs Endpoint: For logging (optional but recommended)
You can create these endpoints in the VPC console under Endpoints.
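As a rough aid when creating the endpoints listed above, the following Python sketch derives the endpoint service names for a region. It assumes the usual `com.amazonaws.<region>.<service>` naming pattern; the function and constant names are hypothetical and not part of any AWS SDK.

```python
# Sketch: derive VPC endpoint service names for the services in this guide.
# Assumes the standard "com.amazonaws.<region>.<service>" naming pattern.
# ENDPOINT_SERVICES and endpoint_service_names are hypothetical names.

# (service suffix, endpoint type) pairs for the endpoints named above
ENDPOINT_SERVICES = [
    ("s3", "Gateway"),
    ("ecr.api", "Interface"),
    ("ecr.dkr", "Interface"),
    ("sagemaker.api", "Interface"),
    ("logs", "Interface"),  # CloudWatch Logs, optional but recommended
]


def endpoint_service_names(region):
    """Return (service_name, endpoint_type) pairs for the given region."""
    return [
        (f"com.amazonaws.{region}.{suffix}", kind)
        for suffix, kind in ENDPOINT_SERVICES
    ]


if __name__ == "__main__":
    for name, kind in endpoint_service_names("us-east-1"):
        print(f"{kind:9} {name}")
```

The resulting names can be compared against the service list shown in the VPC console when you create each endpoint.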
AWS Console Method
Note: SageMaker training jobs are one-time operations. You configure VPC settings when creating a new training job. Existing training jobs cannot be modified.
Creating a New Training Job with VPC Configuration
- Open the Amazon SageMaker console
- In the left navigation, choose Training > Training jobs
- Choose Create training job
- Fill in the basic job details:
  - Enter a Training job name
  - Select an IAM role with appropriate permissions
  - Configure the algorithm and data settings as needed
- Scroll to the Network section
- For VPC, select your VPC from the dropdown
- For Subnet(s), select one or more private subnets
- For Security group(s), select security groups that allow required traffic
- (Optional) Enable Network isolation if you want to completely block internet access
- Complete the remaining configuration and choose Create training job
Checking Existing Training Jobs
- Open the Amazon SageMaker console
- In the left navigation, choose Training > Training jobs
- Select a training job to view its details
- In the Network section, verify that VPC, Subnets, and Security groups are configured
- If a job shows no VPC configuration, note it for future remediation (you cannot modify existing jobs)
AWS CLI (optional)
List Training Jobs
# List all training job names. The list output does not include VPC settings,
# so check each job with describe-training-job.
aws sagemaker list-training-jobs \
  --region us-east-1 \
  --query 'TrainingJobSummaries[*].TrainingJobName' \
  --output text
Check VPC Configuration for a Specific Job
# Replace <training-job-name> with your actual job name
aws sagemaker describe-training-job \
  --training-job-name <training-job-name> \
  --region us-east-1 \
  --query '{JobName: TrainingJobName, VpcConfig: VpcConfig}'
If VpcConfig is null or empty, the job was not configured with VPC settings.
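The same rule can be expressed as a small Python predicate over a describe-training-job response that has already been parsed into a dictionary. This is a minimal sketch: the function name `has_vpc_config` and the sample responses are hypothetical, and the check also treats an empty subnet or security group list as "not configured", which is an assumption rather than documented behavior.

```python
def has_vpc_config(description):
    """Return True when a describe-training-job response carries VPC settings."""
    # VpcConfig is absent (or null in CLI output) for jobs created without a VPC.
    vpc = description.get("VpcConfig") or {}
    # Assumption: empty lists are treated the same as a missing VpcConfig.
    return bool(vpc.get("Subnets")) and bool(vpc.get("SecurityGroupIds"))


if __name__ == "__main__":
    # Hypothetical, trimmed responses used only for illustration.
    configured = {
        "TrainingJobName": "job-a",
        "VpcConfig": {
            "Subnets": ["subnet-xxxxxxxxxxxxxxxxx"],
            "SecurityGroupIds": ["sg-xxxxxxxxxxxxxxxxx"],
        },
    }
    unconfigured = {"TrainingJobName": "job-b"}
    for job in (configured, unconfigured):
        print(job["TrainingJobName"], has_vpc_config(job))
```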
Create a Training Job with VPC Configuration
aws sagemaker create-training-job \
  --training-job-name my-vpc-training-job \
  --role-arn arn:aws:iam::<account-id>:role/<sagemaker-execution-role> \
  --algorithm-specification '{
    "TrainingImage": "<account-id>.dkr.ecr.us-east-1.amazonaws.com/my-training-image:latest",
    "TrainingInputMode": "File"
  }' \
  --input-data-config '[{
    "ChannelName": "train",
    "DataSource": {
      "S3DataSource": {
        "S3DataType": "S3Prefix",
        "S3Uri": "s3://<bucket-name>/training-data/",
        "S3DataDistributionType": "FullyReplicated"
      }
    }
  }]' \
  --output-data-config '{
    "S3OutputPath": "s3://<bucket-name>/output/"
  }' \
  --resource-config '{
    "InstanceType": "ml.m5.large",
    "InstanceCount": 1,
    "VolumeSizeInGB": 30
  }' \
  --vpc-config '{
    "Subnets": ["subnet-xxxxxxxxxxxxxxxxx"],
    "SecurityGroupIds": ["sg-xxxxxxxxxxxxxxxxx"]
  }' \
  --stopping-condition '{
    "MaxRuntimeInSeconds": 86400
  }' \
  --enable-inter-container-traffic-encryption \
  --region us-east-1
Replace placeholders:
- <account-id>: Your AWS account ID
- <sagemaker-execution-role>: IAM role for SageMaker
- <bucket-name>: Your S3 bucket name
- subnet-xxxxxxxxxxxxxxxxx: Your VPC subnet ID
- sg-xxxxxxxxxxxxxxxxx: Your security group ID
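If you create jobs from Python rather than the CLI, the request could be assembled along the following lines. This is a sketch under stated assumptions: `build_training_job_request` is a hypothetical helper, the instance settings mirror the CLI example above, and the actual call (shown only in a comment) assumes boto3 is installed and credentials are configured.

```python
def build_training_job_request(job_name, role_arn, image_uri, input_s3_uri,
                               output_s3_uri, subnets, security_group_ids,
                               network_isolation=False):
    """Build keyword arguments for SageMaker CreateTrainingJob with VpcConfig."""
    # Refuse to build a request that would create a job without VPC settings.
    if not subnets or not security_group_ids:
        raise ValueError("subnets and security_group_ids must both be non-empty")
    return {
        "TrainingJobName": job_name,
        "RoleArn": role_arn,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": input_s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": "ml.m5.large",
            "InstanceCount": 1,
            "VolumeSizeInGB": 30,
        },
        "VpcConfig": {
            "Subnets": list(subnets),
            "SecurityGroupIds": list(security_group_ids),
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 86400},
        "EnableInterContainerTrafficEncryption": True,
        "EnableNetworkIsolation": network_isolation,
    }


# Usage (not executed here; assumes boto3 and valid credentials):
#   import boto3
#   request = build_training_job_request(...)
#   boto3.client("sagemaker", region_name="us-east-1").create_training_job(**request)
```

Failing fast on empty subnet or security group lists keeps a misconfigured caller from silently creating a job without VPC settings.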
CloudFormation - IAM Policy Enforcement (optional)
Since SageMaker training jobs are operational resources (not infrastructure), CloudFormation cannot directly create them. However, you can enforce VPC configuration through IAM policies.
IAM Policy to Enforce VPC Settings
This policy denies the creation of training jobs that lack VPC configuration:
AWSTemplateFormatVersion: '2010-09-09'
Description: IAM Policy to enforce VPC configuration for SageMaker Training Jobs
Parameters:
  AllowedSubnets:
    Type: CommaDelimitedList
    Description: List of allowed subnet IDs that training jobs must use
  AllowedSecurityGroups:
    Type: CommaDelimitedList
    Description: List of allowed security group IDs that training jobs must use
Resources:
  SageMakerVPCEnforcementPolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      ManagedPolicyName: SageMaker-VPC-Enforcement-Policy
      Description: Enforces VPC configuration for SageMaker training jobs
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Sid: DenyTrainingJobsWithoutVPC
            Effect: Deny
            Action:
              - sagemaker:CreateTrainingJob
            Resource: '*'
            Condition:
              'Null':
                'sagemaker:VpcSubnets': 'true'
          - Sid: AllowTrainingJobsWithApprovedVPC
            Effect: Allow
            Action:
              - sagemaker:CreateTrainingJob
            Resource: '*'
            Condition:
              ForAllValues:StringEquals:
                'sagemaker:VpcSubnets': !Ref AllowedSubnets
                'sagemaker:VpcSecurityGroupIds': !Ref AllowedSecurityGroups
Outputs:
  PolicyArn:
    Description: ARN of the VPC enforcement policy
    Value: !Ref SageMakerVPCEnforcementPolicy
Deploy the Policy
aws cloudformation deploy \
  --template-file sagemaker-vpc-enforcement.yaml \
  --stack-name sagemaker-vpc-enforcement \
  --parameter-overrides \
    AllowedSubnets="subnet-xxxxxxxxx,subnet-yyyyyyyyy" \
    AllowedSecurityGroups="sg-xxxxxxxxx" \
  --capabilities CAPABILITY_NAMED_IAM \
  --region us-east-1
Attach this policy to IAM roles or users who create SageMaker training jobs.
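To review the JSON that this policy amounts to once the parameters are substituted, a short Python sketch can build the same two statements. The helper name `build_vpc_enforcement_policy` is hypothetical, and the output is meant for inspection or for comparing against the deployed policy, not as a replacement for the template.

```python
import json


def build_vpc_enforcement_policy(allowed_subnets, allowed_security_groups):
    """Return the policy document with the same two statements as the template."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Explicit deny when the request carries no subnets at all.
                "Sid": "DenyTrainingJobsWithoutVPC",
                "Effect": "Deny",
                "Action": ["sagemaker:CreateTrainingJob"],
                "Resource": "*",
                "Condition": {"Null": {"sagemaker:VpcSubnets": "true"}},
            },
            {
                # Allow only when every requested subnet and security group
                # is on the approved lists.
                "Sid": "AllowTrainingJobsWithApprovedVPC",
                "Effect": "Allow",
                "Action": ["sagemaker:CreateTrainingJob"],
                "Resource": "*",
                "Condition": {
                    "ForAllValues:StringEquals": {
                        "sagemaker:VpcSubnets": list(allowed_subnets),
                        "sagemaker:VpcSecurityGroupIds": list(allowed_security_groups),
                    }
                },
            },
        ],
    }


if __name__ == "__main__":
    policy = build_vpc_enforcement_policy(
        ["subnet-xxxxxxxxx", "subnet-yyyyyyyyy"], ["sg-xxxxxxxxx"]
    )
    print(json.dumps(policy, indent=2))
```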
Terraform - IAM Policy Enforcement (optional)
IAM Policy to Enforce VPC Settings
variable "allowed_subnets" {
  description = "List of allowed subnet IDs for training jobs"
  type        = list(string)
}

variable "allowed_security_groups" {
  description = "List of allowed security group IDs for training jobs"
  type        = list(string)
}

data "aws_iam_policy_document" "sagemaker_vpc_enforcement" {
  statement {
    sid    = "DenyTrainingJobsWithoutVPC"
    effect = "Deny"
    actions = [
      "sagemaker:CreateTrainingJob"
    ]
    resources = ["*"]
    condition {
      test     = "Null"
      variable = "sagemaker:VpcSubnets"
      values   = ["true"]
    }
  }

  statement {
    sid    = "AllowTrainingJobsWithApprovedVPC"
    effect = "Allow"
    actions = [
      "sagemaker:CreateTrainingJob"
    ]
    resources = ["*"]
    condition {
      test     = "ForAllValues:StringEquals"
      variable = "sagemaker:VpcSubnets"
      values   = var.allowed_subnets
    }
    condition {
      test     = "ForAllValues:StringEquals"
      variable = "sagemaker:VpcSecurityGroupIds"
      values   = var.allowed_security_groups
    }
  }
}

resource "aws_iam_policy" "sagemaker_vpc_enforcement" {
  name        = "SageMaker-VPC-Enforcement-Policy"
  description = "Enforces VPC configuration for SageMaker training jobs"
  policy      = data.aws_iam_policy_document.sagemaker_vpc_enforcement.json
}

output "policy_arn" {
  description = "ARN of the VPC enforcement policy"
  value       = aws_iam_policy.sagemaker_vpc_enforcement.arn
}
Example tfvars
allowed_subnets = [
  "subnet-xxxxxxxxxxxxxxxxx",
  "subnet-yyyyyyyyyyyyyyyyy"
]
allowed_security_groups = [
  "sg-xxxxxxxxxxxxxxxxx"
]
Apply
terraform init
terraform plan
terraform apply
Attach the created policy to IAM roles or users who create SageMaker training jobs.
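To reason about which requests the two statements would let through, the following Python sketch models their combined effect. It is a deliberately simplified model, not IAM's real evaluation engine: it ignores every other policy attached to the principal, and the function name `evaluate_create_training_job` is hypothetical.

```python
def evaluate_create_training_job(subnets, security_groups,
                                 allowed_subnets, allowed_security_groups):
    """Simplified model of the Deny (Null) and Allow (ForAllValues) statements."""
    # Deny statement: the Null condition matches when no subnets are supplied.
    if not subnets:
        return "Deny"
    # Allow statement: ForAllValues:StringEquals requires every requested
    # value to appear in the approved list (a subset test).
    subnets_ok = set(subnets) <= set(allowed_subnets)
    groups_ok = set(security_groups or []) <= set(allowed_security_groups)
    if subnets_ok and groups_ok:
        return "Allow"
    # No statement matched; IAM falls back to an implicit deny.
    return "ImplicitDeny"


if __name__ == "__main__":
    allowed_subnets = ["subnet-xxxxxxxxxxxxxxxxx", "subnet-yyyyyyyyyyyyyyyyy"]
    allowed_groups = ["sg-xxxxxxxxxxxxxxxxx"]
    cases = [
        (None, None),                                               # no VPC settings
        (["subnet-xxxxxxxxxxxxxxxxx"], ["sg-xxxxxxxxxxxxxxxxx"]),   # approved values
        (["subnet-zzzzzzzzzzzzzzzzz"], ["sg-xxxxxxxxxxxxxxxxx"]),   # unapproved subnet
    ]
    for subnets, groups in cases:
        print(subnets, evaluate_create_training_job(
            subnets, groups, allowed_subnets, allowed_groups))
```

Note that an implicit deny from this policy can still be overridden by an Allow in another attached policy, which is why the explicit Deny statement carries the enforcement.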
Verification
After creating a training job with VPC settings, verify the configuration:
- Open the SageMaker console
- Navigate to Training > Training jobs
- Select your training job
- In the Network section, confirm:
  - VPC shows your selected VPC
  - Subnets lists your private subnets
  - Security groups shows your configured security groups
CLI Verification
aws sagemaker describe-training-job \
  --training-job-name <training-job-name> \
  --region us-east-1 \
  --query 'VpcConfig'
Expected output (with VPC configured):
{
  "SecurityGroupIds": ["sg-xxxxxxxxxxxxxxxxx"],
  "Subnets": ["subnet-xxxxxxxxxxxxxxxxx"]
}
If the output is null, VPC settings are not configured.
Check Multiple Jobs
# List recent training jobs and their VPC status
for job in $(aws sagemaker list-training-jobs --region us-east-1 --max-results 10 --query 'TrainingJobSummaries[*].TrainingJobName' --output text); do
  vpc=$(aws sagemaker describe-training-job --training-job-name "$job" --region us-east-1 --query 'VpcConfig' --output text)
  echo "Job: $job - VPC Config: $vpc"
done
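When many jobs need reviewing, it may be easier to collect the describe-training-job responses (for example, saved as JSON) and sort the job names into two groups. The sketch below assumes the responses are already loaded as dictionaries; `split_by_vpc_status` and the sample data are hypothetical.

```python
def split_by_vpc_status(descriptions):
    """Split job names into (with_vpc, without_vpc) lists."""
    with_vpc, without_vpc = [], []
    for description in descriptions:
        # A missing or null VpcConfig means the job ran without VPC settings.
        vpc = description.get("VpcConfig") or {}
        target = with_vpc if vpc.get("Subnets") else without_vpc
        target.append(description["TrainingJobName"])
    return with_vpc, without_vpc


if __name__ == "__main__":
    # Hypothetical, trimmed describe-training-job responses.
    jobs = [
        {"TrainingJobName": "job-a",
         "VpcConfig": {"Subnets": ["subnet-xxxxxxxxxxxxxxxxx"],
                       "SecurityGroupIds": ["sg-xxxxxxxxxxxxxxxxx"]}},
        {"TrainingJobName": "job-b"},
    ]
    compliant, non_compliant = split_by_vpc_status(jobs)
    print("With VPC:", compliant)
    print("Without VPC:", non_compliant)
```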
Additional Resources
- Amazon SageMaker and VPC
- Give SageMaker Training Jobs Access to Resources in Your VPC
- SageMaker Network Isolation
- SageMaker Condition Keys for IAM Policies
- VPC Endpoints for SageMaker
Notes
- Existing jobs cannot be modified: VPC settings must be configured at job creation time. If an existing training job lacks VPC configuration, you must create a new job with the correct settings.
- Private subnets recommended: Use private subnets (without direct internet access) for training jobs. If internet access is needed, route through a NAT gateway.
- Security group configuration: Ensure security groups allow:
  - Outbound HTTPS (443) to S3, ECR, and SageMaker endpoints
  - If using distributed training, allow inter-container communication on all ports within the security group
- Network isolation: For maximum security, enable EnableNetworkIsolation. This prevents the training container from making outbound network calls, including calls to AWS services. SageMaker still transfers input data and output artifacts on the job's behalf, so in a subnet without internet access the required AWS services must be reachable through VPC endpoints.
- Cost considerations: VPC endpoints incur additional charges. Factor this into your architecture planning.
- IAM enforcement: To prevent non-compliant training jobs, deploy the IAM policy from the CloudFormation or Terraform sections above. This ensures all future training jobs require VPC configuration.
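The security group guidance in the notes above could be written down as rule definitions in the shape that the EC2 API's IpPermissions parameter uses. This is a sketch only: `training_security_group_rules` is a hypothetical helper, and the CIDR for outbound HTTPS is an assumption you would replace with your own range. Traffic to an S3 gateway endpoint is typically allowed by referencing the endpoint's prefix list rather than a CIDR.

```python
def training_security_group_rules(security_group_id, https_cidr):
    """Return egress and ingress rules in the EC2 IpPermissions shape."""
    # Self-referencing rule: all protocols and ports between members of the
    # same security group, for distributed training traffic.
    self_reference = {
        "IpProtocol": "-1",
        "UserIdGroupPairs": [{"GroupId": security_group_id}],
    }
    # Outbound HTTPS to the range where the service endpoints live
    # (assumption: the caller supplies an appropriate CIDR).
    https_out = {
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        "IpRanges": [{"CidrIp": https_cidr}],
    }
    return {
        "egress": [https_out, self_reference],
        "ingress": [self_reference],
    }


# Usage (not executed here; assumes boto3 and valid credentials):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-east-1")
#   rules = training_security_group_rules("sg-xxxxxxxxxxxxxxxxx", "10.0.0.0/16")
#   ec2.authorize_security_group_egress(
#       GroupId="sg-xxxxxxxxxxxxxxxxx", IpPermissions=rules["egress"])
#   ec2.authorize_security_group_ingress(
#       GroupId="sg-xxxxxxxxxxxxxxxxx", IpPermissions=rules["ingress"])
```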