SageMaker Training Jobs Inter-Container Traffic Encryption
Overview
This check verifies that Amazon SageMaker training jobs have inter-container traffic encryption enabled. In a distributed training job that runs on multiple ML compute instances, the containers exchange data such as gradients and model parameters over the network. Enabling inter-container traffic encryption protects this data in transit.
Risk
Without inter-container traffic encryption, sensitive information exchanged between training containers could be exposed:
- Data exposure: Training data, model weights, and intermediate results travel unencrypted between instances
- Credential theft: Authentication tokens or session data could be intercepted
- Model tampering: An attacker could potentially modify gradients or training parameters
- Compliance violations: Many regulations require encryption of data in transit
Remediation Steps
Prerequisites
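- Access to the AWS Console or AWS CLI with permissions to create SageMaker training jobs (sagemaker:CreateTrainingJob) and to pass the execution role to SageMaker (iam:PassRole)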
- An IAM role that SageMaker can assume for training
Important: Existing training jobs cannot be modified. You must create new training jobs with encryption enabled. If you have running jobs without encryption, you will need to stop them and start new ones with the correct settings.
AWS Console Method
- Sign in to the AWS Management Console
- Navigate to Amazon SageMaker (search for "SageMaker" in the top search bar)
- In the left navigation pane, click Training > Training jobs
- Click Create training job
- Fill in the required configuration:
  - Enter a Training job name
  - Select your IAM role
  - Configure your algorithm, input/output data, and compute resources
- In the Network section, find the checkbox labeled Enable inter-container traffic encryption
- Check this box to enable encryption
- Complete the remaining configuration and click Create training job
AWS CLI (optional)
When creating training jobs via the AWS CLI, include the --enable-inter-container-traffic-encryption flag.
List existing training jobs to audit:
aws sagemaker list-training-jobs \
--region us-east-1 \
--query 'TrainingJobSummaries[*].TrainingJobName' \
--output table
Check if a specific training job has encryption enabled:
aws sagemaker describe-training-job \
--training-job-name <your-training-job-name> \
--region us-east-1 \
--query 'EnableInterContainerTrafficEncryption'
This returns true if encryption is enabled, or false if not.
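The same check can be scripted. The following is a minimal Python sketch that assumes a boto3 SageMaker client (for example, boto3.client("sagemaker", region_name="us-east-1")) is passed in; the helper names are hypothetical. It fails closed: a response without the EnableInterContainerTrafficEncryption field is treated as non-compliant.

```python
from typing import Any, Mapping


def is_inter_container_encrypted(describe_response: Mapping[str, Any]) -> bool:
    """Return True only if a DescribeTrainingJob response shows encryption enabled.

    A missing EnableInterContainerTrafficEncryption field counts as
    non-compliant, so the check fails closed.
    """
    return describe_response.get("EnableInterContainerTrafficEncryption") is True


def check_training_job(sagemaker_client: Any, job_name: str) -> bool:
    """Describe one training job and report whether it is compliant.

    sagemaker_client is assumed to be a boto3 SageMaker client.
    """
    response = sagemaker_client.describe_training_job(TrainingJobName=job_name)
    return is_inter_container_encrypted(response)
```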
Create a new training job with encryption enabled:
aws sagemaker create-training-job \
--training-job-name my-secure-training-job \
--role-arn arn:aws:iam::<account-id>:role/<sagemaker-role> \
--algorithm-specification \
TrainingImage=<training-image-uri>,TrainingInputMode=File \
--input-data-config '[{
"ChannelName": "training",
"DataSource": {
"S3DataSource": {
"S3DataType": "S3Prefix",
"S3Uri": "s3://<your-bucket>/training-data/",
"S3DataDistributionType": "FullyReplicated"
}
}
}]' \
--output-data-config S3OutputPath=s3://<your-bucket>/output/ \
--resource-config \
InstanceCount=2,InstanceType=ml.m5.large,VolumeSizeInGB=30 \
--stopping-condition MaxRuntimeInSeconds=86400 \
--enable-inter-container-traffic-encryption \
--region us-east-1
Replace the placeholder values:
- <account-id>: Your AWS account ID
- <sagemaker-role>: Name of your SageMaker execution role
- <training-image-uri>: Docker image URI for your training algorithm
- <your-bucket>: Your S3 bucket name
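If you create jobs from Python rather than the CLI, the same request can be expressed as keyword arguments for the boto3 create_training_job call. The sketch below is an illustration under assumptions: build_training_job_request is a hypothetical helper, and the placeholders above map to its arguments. The important field is EnableInterContainerTrafficEncryption=True, the API parameter behind the CLI flag.

```python
from typing import Any, Dict


def build_training_job_request(
    job_name: str,
    role_arn: str,
    image_uri: str,
    bucket: str,
    instance_count: int = 2,
    instance_type: str = "ml.m5.large",
) -> Dict[str, Any]:
    """Build CreateTrainingJob parameters with inter-container encryption on.

    Hypothetical helper; the structure mirrors the CLI example's arguments.
    """
    return {
        "TrainingJobName": job_name,
        "RoleArn": role_arn,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "InputDataConfig": [
            {
                "ChannelName": "training",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": f"s3://{bucket}/training-data/",
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
            }
        ],
        "OutputDataConfig": {"S3OutputPath": f"s3://{bucket}/output/"},
        "ResourceConfig": {
            "InstanceCount": instance_count,
            "InstanceType": instance_type,
            "VolumeSizeInGB": 30,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 86400},
        # API-level equivalent of --enable-inter-container-traffic-encryption
        "EnableInterContainerTrafficEncryption": True,
    }


# Usage (requires boto3 and AWS credentials):
#   client = boto3.client("sagemaker", region_name="us-east-1")
#   client.create_training_job(**build_training_job_request(...))
```

Building every request through one function makes it harder for an individual script to forget the encryption flag.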
SageMaker Python SDK (recommended for programmatic use)
SageMaker training jobs are ephemeral (they run once and complete), so they are typically not managed through CloudFormation or Terraform. Instead, use the SageMaker Python SDK for programmatic job creation.
Install the SageMaker SDK:
pip install sagemaker
Create a training job with encryption enabled:
import sagemaker
from sagemaker.estimator import Estimator
# Initialize session
session = sagemaker.Session()
role = "arn:aws:iam::<account-id>:role/<sagemaker-role>"
# Create estimator with inter-container encryption
estimator = Estimator(
image_uri="<training-image-uri>",
role=role,
instance_count=2,
instance_type="ml.m5.large",
output_path="s3://<your-bucket>/output/",
sagemaker_session=session,
# Enable inter-container traffic encryption
encrypt_inter_container_traffic=True
)
# Start training
estimator.fit({
"training": "s3://<your-bucket>/training-data/"
})
Key parameter: encrypt_inter_container_traffic=True
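This parameter sets EnableInterContainerTrafficEncryption on the underlying CreateTrainingJob request, so SageMaker encrypts communication between the ML compute instances in the training cluster.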
Enforcing Encryption via IAM Policy
To prevent team members from creating training jobs without encryption, attach an IAM policy that denies unencrypted job creation:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "DenyUnencryptedTrainingJobs",
"Effect": "Deny",
"Action": "sagemaker:CreateTrainingJob",
"Resource": "*",
"Condition": {
"Bool": {
"sagemaker:InterContainerTrafficEncryption": "false"
}
}
}
]
}
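Attach this policy to the IAM users, groups, or roles that create SageMaker training jobs. Any CreateTrainingJob request that sets EnableInterContainerTrafficEncryption to false is then denied. A Bool condition does not match when the condition key is missing from the request, so confirm in a test account that a request omitting the parameter is also denied. If it is not, add a second Deny statement that uses the Null condition operator ("Null": {"sagemaker:InterContainerTrafficEncryption": "true"}).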
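One way to sanity-check the policy logic before attaching it is the IAM policy simulator. The sketch below assumes a boto3 IAM client is passed in, and the helper names are hypothetical. It calls simulate_custom_policy with the condition key set to "false" and then "true"; you would expect explicitDeny for "false" and implicitDeny for "true" (the policy has no Allow statement, so nothing is granted). The simulator only evaluates the context values you supply, so it cannot tell you whether SageMaker populates the key when a request omits the parameter.

```python
import json
from typing import Any, Dict

DENY_UNENCRYPTED_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyUnencryptedTrainingJobs",
            "Effect": "Deny",
            "Action": "sagemaker:CreateTrainingJob",
            "Resource": "*",
            "Condition": {
                "Bool": {"sagemaker:InterContainerTrafficEncryption": "false"}
            },
        }
    ],
}


def simulate_encryption_flag(iam_client: Any, flag_value: str) -> str:
    """Return the simulator's decision for CreateTrainingJob when the
    sagemaker:InterContainerTrafficEncryption key has flag_value."""
    response = iam_client.simulate_custom_policy(
        PolicyInputList=[json.dumps(DENY_UNENCRYPTED_POLICY)],
        ActionNames=["sagemaker:CreateTrainingJob"],
        ContextEntries=[
            {
                "ContextKeyName": "sagemaker:InterContainerTrafficEncryption",
                "ContextKeyValues": [flag_value],
                "ContextKeyType": "boolean",
            }
        ],
    )
    return response["EvaluationResults"][0]["EvalDecision"]


def simulate_both_values(iam_client: Any) -> Dict[str, str]:
    """Map each flag value ("false", "true") to the simulated decision."""
    return {value: simulate_encryption_flag(iam_client, value) for value in ("false", "true")}
```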
Service Control Policy (SCP) for Organizations
For organization-wide enforcement, apply an SCP to require inter-container encryption:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "RequireInterContainerEncryption",
"Effect": "Deny",
"Action": "sagemaker:CreateTrainingJob",
"Resource": "*",
"Condition": {
"Bool": {
"sagemaker:InterContainerTrafficEncryption": "false"
}
}
}
]
}
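Attached to the organization root or an organizational unit (OU), this SCP denies unencrypted CreateTrainingJob requests in every member account beneath it. SCPs do not apply to users or roles in the management account.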
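If you manage SCPs from code, a sketch using the AWS Organizations API through boto3 might look like the following. It assumes an Organizations client in the management account (or a delegated administrator account, if you have configured one) with SCPs enabled for the organization; create_and_attach_scp and the default policy name are hypothetical. The target can be a root ID, an OU ID, or an account ID.

```python
import json
from typing import Any

SCP_DOCUMENT = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "RequireInterContainerEncryption",
            "Effect": "Deny",
            "Action": "sagemaker:CreateTrainingJob",
            "Resource": "*",
            "Condition": {
                "Bool": {"sagemaker:InterContainerTrafficEncryption": "false"}
            },
        }
    ],
}


def create_and_attach_scp(
    org_client: Any,
    target_id: str,
    name: str = "RequireSageMakerInterContainerEncryption",
) -> str:
    """Create the SCP, attach it to target_id, and return the new policy ID."""
    created = org_client.create_policy(
        Content=json.dumps(SCP_DOCUMENT),
        Description="Deny SageMaker training jobs without inter-container traffic encryption",
        Name=name,
        Type="SERVICE_CONTROL_POLICY",
    )
    policy_id = created["Policy"]["PolicySummary"]["Id"]
    org_client.attach_policy(PolicyId=policy_id, TargetId=target_id)
    return policy_id
```

Attaching at an OU first, rather than the root, lets you observe the effect on a subset of accounts before rolling the policy out organization-wide.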
Verification
After creating a training job with encryption enabled:
- Go to Amazon SageMaker > Training > Training jobs in the AWS Console
- Click on your training job name
- In the Job settings section, verify that Inter-container traffic encryption shows Enabled
CLI Verification
aws sagemaker describe-training-job \
--training-job-name <your-training-job-name> \
--region us-east-1 \
--query '{
JobName: TrainingJobName,
Status: TrainingJobStatus,
InterContainerEncryption: EnableInterContainerTrafficEncryption
}'
Expected output for a compliant job:
{
"JobName": "my-secure-training-job",
"Status": "Completed",
"InterContainerEncryption": true
}
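To wait for a job to finish and then verify it from Python, a sketch like the following could be used. It assumes a boto3 SageMaker client is passed in and relies on boto3's training_job_completed_or_stopped waiter, which should raise a botocore WaiterError if the job fails; wait_and_verify is a hypothetical helper name. The returned dictionary mirrors the CLI query above, with an added Compliant flag.

```python
from typing import Any, Dict


def wait_and_verify(sagemaker_client: Any, job_name: str) -> Dict[str, Any]:
    """Block until the training job completes or stops, then summarize it."""
    waiter = sagemaker_client.get_waiter("training_job_completed_or_stopped")
    waiter.wait(TrainingJobName=job_name)

    job = sagemaker_client.describe_training_job(TrainingJobName=job_name)
    summary = {
        "JobName": job["TrainingJobName"],
        "Status": job["TrainingJobStatus"],
        # Treat a missing field as not encrypted
        "InterContainerEncryption": job.get("EnableInterContainerTrafficEncryption", False),
    }
    summary["Compliant"] = summary["InterContainerEncryption"] is True
    return summary
```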
Audit the 50 most recently created training jobs:
for job in $(aws sagemaker list-training-jobs \
--region us-east-1 \
--sort-by CreationTime \
--sort-order Descending \
--max-items 50 \
--query 'TrainingJobSummaries[*].TrainingJobName' \
--output text); do
encryption=$(aws sagemaker describe-training-job \
--training-job-name "$job" \
--region us-east-1 \
--query 'EnableInterContainerTrafficEncryption' \
--output text)
echo "$job: InterContainerEncryption=$encryption"
done
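The audit loop can also be written in Python with boto3's list_training_jobs paginator. This sketch assumes a boto3 SageMaker client is passed in, and the function names are hypothetical. It sorts by creation time (newest first), stops after max_items jobs, and returns the names of jobs whose encryption flag is not true, including jobs where the field is missing.

```python
from typing import Any, Iterator, List


def recent_training_job_names(sagemaker_client: Any, max_items: int = 50) -> Iterator[str]:
    """Yield names of the most recently created training jobs."""
    paginator = sagemaker_client.get_paginator("list_training_jobs")
    pages = paginator.paginate(
        SortBy="CreationTime",
        SortOrder="Descending",
        PaginationConfig={"MaxItems": max_items},
    )
    for page in pages:
        for summary in page["TrainingJobSummaries"]:
            yield summary["TrainingJobName"]


def find_unencrypted_jobs(sagemaker_client: Any, max_items: int = 50) -> List[str]:
    """Return names of recent jobs without inter-container encryption."""
    noncompliant = []
    for name in recent_training_job_names(sagemaker_client, max_items):
        job = sagemaker_client.describe_training_job(TrainingJobName=name)
        if job.get("EnableInterContainerTrafficEncryption") is not True:
            noncompliant.append(name)
    return noncompliant
```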
Additional Resources
- AWS Documentation: Protect Communications Between ML Compute Instances in a Distributed Training Job
- SageMaker Security Best Practices
- SageMaker Python SDK - Estimator Class
- IAM Condition Keys for SageMaker
Notes
- Existing jobs cannot be modified: Inter-container encryption must be enabled at job creation time. To remediate existing non-compliant jobs, create new jobs with the setting enabled.
- Performance impact: Encryption can make distributed training jobs take longer, and therefore cost more. The effect depends on how much data the instances exchange and is most noticeable for distributed deep learning workloads.
- Single-instance jobs: If your training job uses only one instance (InstanceCount=1), inter-container encryption has no effect because there is no inter-container traffic. It is still a best practice to enable it for consistency.
- VPC considerations: Inter-container encryption works regardless of whether your training job runs in a VPC. However, combining it with VPC isolation provides defense in depth.
- Distributed training frameworks: This setting applies to all distributed training jobs, including those using built-in algorithms, custom containers, and frameworks like PyTorch Distributed, Horovod, or TensorFlow distributed strategies.