SageMaker Training Jobs Volume and Output Encryption Enabled

Overview

This check verifies that Amazon SageMaker training jobs use customer-managed KMS keys (CMKs) to encrypt:

  1. Storage volumes - The EBS volumes attached to ML compute instances that store training data, checkpoints, and logs
  2. Output data - The model artifacts stored in S3 after training completes

Enabling KMS encryption gives you control over your encryption keys, including the ability to manage key policies, enable automatic rotation, and revoke access when needed.

Risk

Without customer-managed KMS encryption:

  • Limited control - You cannot manage encryption key policies or rotation schedules
  • No key revocation - You lose the ability to immediately revoke access to training data by disabling the key
  • Compliance gaps - Many regulatory frameworks require customer-managed encryption keys for sensitive workloads
  • Data exposure - Training artifacts (which may contain sensitive data patterns) could be accessed through snapshots or privileged access without additional protection

Remediation Steps

Prerequisites

  • AWS account access with permissions to create SageMaker training jobs
  • A KMS key that SageMaker can use (your SageMaker execution role must have kms:Encrypt, kms:Decrypt, kms:GenerateDataKey, and kms:DescribeKey permissions on the key)

How to create a KMS key for SageMaker

If you do not have a KMS key, create one in the AWS Console:

  1. Go to AWS KMS > Customer managed keys
  2. Click Create key
  3. Choose Symmetric key type
  4. Give it an alias like sagemaker-training-key
  5. Add your SageMaker execution role to the key policy with these permissions:
    • kms:Encrypt
    • kms:Decrypt
    • kms:GenerateDataKey
    • kms:DescribeKey

Or use the AWS CLI:

# Create the KMS key
aws kms create-key \
  --description "KMS key for SageMaker training job encryption" \
  --region us-east-1

# Create an alias (optional but recommended)
aws kms create-alias \
  --alias-name alias/sagemaker-training-key \
  --target-key-id <your-key-id> \
  --region us-east-1
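
The permissions listed in step 5 can also be written as a key policy statement. The sketch below only builds the statement as a Python dictionary; the statement ID and the function name are illustrative, and the role ARN is the placeholder used elsewhere in this guide.

```python
import json

# Actions from the key policy list above.
SAGEMAKER_KEY_ACTIONS = [
    "kms:Encrypt",
    "kms:Decrypt",
    "kms:GenerateDataKey",
    "kms:DescribeKey",
]


def build_key_policy_statement(role_arn: str) -> dict:
    """Return one key policy statement granting the role the listed actions."""
    return {
        "Sid": "AllowSageMakerExecutionRole",  # hypothetical statement ID
        "Effect": "Allow",
        "Principal": {"AWS": role_arn},
        "Action": list(SAGEMAKER_KEY_ACTIONS),
        "Resource": "*",  # inside a key policy, "*" means the key the policy is attached to
    }


statement = build_key_policy_statement(
    "arn:aws:iam::123456789012:role/SageMakerExecutionRole"
)
print(json.dumps(statement, indent=2))
```

To apply it, append the statement to the key's existing policy document rather than replacing the whole policy, so that the statements granting key administrators access are preserved.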

AWS Console Method

SageMaker training jobs cannot be modified after creation. To fix an unencrypted training job, you must create a new training job with encryption enabled.

  1. Open the Amazon SageMaker console
  2. In the left navigation, choose Training > Training jobs
  3. Find the training job you want to recreate with encryption
  4. Select the job and click Clone to create a new job with the same settings
  5. In the Resource configuration section:
    • Find Volume encryption key
    • Select your customer-managed KMS key from the dropdown
  6. In the Output data configuration section:
    • Find Encryption key
    • Select the same (or a different) customer-managed KMS key
  7. Complete any other required fields
  8. Click Create training job
  9. Once the new job completes successfully, delete the old unencrypted job if no longer needed

AWS CLI method

Use the create-training-job command with the VolumeKmsKeyId parameter in the resource configuration and KmsKeyId in the output data configuration.

List existing training jobs to identify unencrypted ones:

aws sagemaker list-training-jobs \
  --region us-east-1 \
  --output table

Describe a training job to check its encryption settings:

aws sagemaker describe-training-job \
  --training-job-name <your-training-job-name> \
  --region us-east-1 \
  --query '{
    TrainingJobName: TrainingJobName,
    VolumeKmsKeyId: ResourceConfig.VolumeKmsKeyId,
    OutputKmsKeyId: OutputDataConfig.KmsKeyId
  }'

If VolumeKmsKeyId or OutputKmsKeyId is null, the job is not encrypted with a customer-managed key.
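
When there are many jobs, the same check can be scripted. The following is a minimal sketch, assuming a Boto3 SageMaker client is passed in as `client`; the helper names `missing_cmk_fields` and `audit_training_jobs` are illustrative and not part of any AWS SDK.

```python
from typing import Dict, List


def missing_cmk_fields(job: dict) -> List[str]:
    """Return which KMS key fields are absent from a describe_training_job response."""
    missing = []
    if not job.get("ResourceConfig", {}).get("VolumeKmsKeyId"):
        missing.append("VolumeKmsKeyId")
    if not job.get("OutputDataConfig", {}).get("KmsKeyId"):
        missing.append("OutputKmsKeyId")
    return missing


def audit_training_jobs(client) -> Dict[str, List[str]]:
    """Map each non-compliant training job name to its missing key fields."""
    findings = {}
    paginator = client.get_paginator("list_training_jobs")
    for page in paginator.paginate():
        for summary in page["TrainingJobSummaries"]:
            name = summary["TrainingJobName"]
            job = client.describe_training_job(TrainingJobName=name)
            missing = missing_cmk_fields(job)
            if missing:
                findings[name] = missing
    return findings


# Offline example using a hand-written response fragment (not real output).
sample_job = {
    "TrainingJobName": "example-job",
    "ResourceConfig": {"InstanceType": "ml.m5.xlarge"},
    "OutputDataConfig": {"S3OutputPath": "s3://my-bucket/model-output/"},
}
print(missing_cmk_fields(sample_job))
```

With credentials configured, `audit_training_jobs(boto3.client("sagemaker", region_name="us-east-1"))` would run the check across the jobs in that Region. It makes one describe call per job, so expect it to be slow on accounts with a long job history.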

Create a new training job with encryption enabled:

aws sagemaker create-training-job \
  --training-job-name my-encrypted-training-job \
  --role-arn arn:aws:iam::123456789012:role/SageMakerExecutionRole \
  --algorithm-specification '{
    "TrainingImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-training-image:latest",
    "TrainingInputMode": "File"
  }' \
  --input-data-config '[{
    "ChannelName": "train",
    "DataSource": {
      "S3DataSource": {
        "S3DataType": "S3Prefix",
        "S3Uri": "s3://my-bucket/training-data/",
        "S3DataDistributionType": "FullyReplicated"
      }
    }
  }]' \
  --output-data-config '{
    "S3OutputPath": "s3://my-bucket/model-output/",
    "KmsKeyId": "arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012"
  }' \
  --resource-config '{
    "InstanceType": "ml.m5.xlarge",
    "InstanceCount": 1,
    "VolumeSizeInGB": 50,
    "VolumeKmsKeyId": "arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012"
  }' \
  --stopping-condition '{"MaxRuntimeInSeconds": 86400}' \
  --region us-east-1

Key parameters for encryption:

Parameter        Location               Purpose
VolumeKmsKeyId   --resource-config      Encrypts the EBS volume attached to training instances
KmsKeyId         --output-data-config   Encrypts model artifacts stored in S3

Python SDK (Boto3) method

For programmatic training job creation, use the SageMaker Python SDK or Boto3. This is the most common approach for production ML pipelines.

Using Boto3:

import boto3

sagemaker = boto3.client('sagemaker', region_name='us-east-1')

kms_key_arn = 'arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012'

response = sagemaker.create_training_job(
    TrainingJobName='my-encrypted-training-job',
    RoleArn='arn:aws:iam::123456789012:role/SageMakerExecutionRole',
    AlgorithmSpecification={
        'TrainingImage': '123456789012.dkr.ecr.us-east-1.amazonaws.com/my-image:latest',
        'TrainingInputMode': 'File'
    },
    InputDataConfig=[{
        'ChannelName': 'train',
        'DataSource': {
            'S3DataSource': {
                'S3DataType': 'S3Prefix',
                'S3Uri': 's3://my-bucket/training-data/',
                'S3DataDistributionType': 'FullyReplicated'
            }
        }
    }],
    OutputDataConfig={
        'S3OutputPath': 's3://my-bucket/model-output/',
        'KmsKeyId': kms_key_arn  # Encrypts output artifacts
    },
    ResourceConfig={
        'InstanceType': 'ml.m5.xlarge',
        'InstanceCount': 1,
        'VolumeSizeInGB': 50,
        'VolumeKmsKeyId': kms_key_arn  # Encrypts training volume
    },
    StoppingCondition={
        'MaxRuntimeInSeconds': 86400
    }
)

print(f"Training job ARN: {response['TrainingJobArn']}")
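
Because an existing job cannot be modified, one way to recreate it is to start from its `describe_training_job` response and build a new request with the key fields added. The sketch below shows only the dictionary handling; `build_encrypted_request` and `COPIED_FIELDS` are illustrative names, and the copied field list is an assumption that may need extending (for example with `HyperParameters` or `VpcConfig`) or pruning for your jobs.

```python
import copy

KMS_KEY_ARN = "arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012"

# Fields carried over unchanged from the original job (assumed list, adjust as needed).
COPIED_FIELDS = (
    "RoleArn",
    "AlgorithmSpecification",
    "InputDataConfig",
    "OutputDataConfig",
    "ResourceConfig",
    "StoppingCondition",
)


def build_encrypted_request(described: dict, new_name: str, kms_key_arn: str) -> dict:
    """Build create_training_job keyword arguments from a described job, adding KMS keys."""
    request = {
        field: copy.deepcopy(described[field])
        for field in COPIED_FIELDS
        if field in described
    }
    request["TrainingJobName"] = new_name
    request["OutputDataConfig"]["KmsKeyId"] = kms_key_arn        # output artifacts
    request["ResourceConfig"]["VolumeKmsKeyId"] = kms_key_arn    # training volume
    return request


# Hand-written fragment standing in for a describe_training_job response.
described = {
    "TrainingJobName": "my-unencrypted-training-job",
    "TrainingJobStatus": "Completed",
    "RoleArn": "arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    "OutputDataConfig": {"S3OutputPath": "s3://my-bucket/model-output/"},
    "ResourceConfig": {"InstanceType": "ml.m5.xlarge", "InstanceCount": 1, "VolumeSizeInGB": 50},
    "StoppingCondition": {"MaxRuntimeInSeconds": 86400},
}
request = build_encrypted_request(described, "my-encrypted-training-job", KMS_KEY_ARN)
print(sorted(request))
```

The result is intended to be passed as `sagemaker.create_training_job(**request)`. Review the copied structures first, since a described job can contain values that should not be resubmitted as-is.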

Using SageMaker Python SDK (higher-level API):

from sagemaker.estimator import Estimator

kms_key_arn = 'arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012'

estimator = Estimator(
    image_uri='123456789012.dkr.ecr.us-east-1.amazonaws.com/my-image:latest',
    role='arn:aws:iam::123456789012:role/SageMakerExecutionRole',
    instance_count=1,
    instance_type='ml.m5.xlarge',
    volume_size=50,
    volume_kms_key=kms_key_arn,  # Encrypts training volume
    output_kms_key=kms_key_arn,  # Encrypts output artifacts
    output_path='s3://my-bucket/model-output/'
)

estimator.fit({'train': 's3://my-bucket/training-data/'})

Infrastructure as Code note

Important: SageMaker training jobs are ephemeral resources that run to completion and then terminate. They are not typically managed as persistent infrastructure in CloudFormation or Terraform.

Instead, training jobs are usually:

  • Created programmatically via Python SDK or AWS CLI
  • Orchestrated through SageMaker Pipelines
  • Triggered by CI/CD pipelines or AWS Step Functions

If you use SageMaker Pipelines, you can define encryption settings in your pipeline definition:

from sagemaker.workflow.steps import TrainingStep
from sagemaker.inputs import TrainingInput

# In your pipeline definition
training_step = TrainingStep(
    name="MyTrainingStep",
    estimator=estimator,  # Estimator with volume_kms_key and output_kms_key set
    inputs={
        "train": TrainingInput(s3_data="s3://my-bucket/training-data/")
    }
)

Verification

After creating the new training job with encryption:

  1. Go to the SageMaker console
  2. Navigate to Training > Training jobs
  3. Select your new training job
  4. In the job details, verify:
    • Volume encryption key shows your KMS key ARN (under Resource configuration)
    • Encryption key shows your KMS key ARN (under Output data configuration)

CLI verification

aws sagemaker describe-training-job \
  --training-job-name my-encrypted-training-job \
  --region us-east-1 \
  --query '{
    TrainingJobName: TrainingJobName,
    Status: TrainingJobStatus,
    VolumeKmsKeyId: ResourceConfig.VolumeKmsKeyId,
    OutputKmsKeyId: OutputDataConfig.KmsKeyId
  }'

Expected output (both KMS key fields should be populated):

{
  "TrainingJobName": "my-encrypted-training-job",
  "Status": "Completed",
  "VolumeKmsKeyId": "arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012",
  "OutputKmsKeyId": "arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012"
}

Notes

  • Existing jobs cannot be modified - You must create new training jobs with encryption; existing jobs cannot be updated
  • Nitro instances with local storage - Some Nitro-based instance types (like ml.p4d.24xlarge) have local NVMe storage that is encrypted using a hardware module. You cannot specify a VolumeKmsKeyId for these instances
  • Key permissions - Your SageMaker execution role must have permissions to use the KMS key (kms:Encrypt, kms:Decrypt, kms:GenerateDataKey, kms:DescribeKey)
  • Cross-account keys - If using a KMS key from another AWS account, ensure the key policy grants access to your SageMaker execution role
  • Cost - Using KMS keys incurs additional costs for API requests. See AWS KMS pricing
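
The Nitro local-storage note above affects how the resource configuration should be built. Here is a minimal sketch, assuming you maintain your own set of instance types that use hardware-encrypted local NVMe storage. Only ml.p4d.24xlarge comes from this guide; the set and the function name `build_resource_config` are placeholders to adapt.

```python
# Instance types treated as having hardware-encrypted local NVMe storage.
# Placeholder set: only ml.p4d.24xlarge is named in the notes above.
LOCAL_NVME_INSTANCE_TYPES = {"ml.p4d.24xlarge"}


def build_resource_config(instance_type: str, instance_count: int,
                          volume_size_gb: int, kms_key_arn: str) -> dict:
    """Build a ResourceConfig, omitting VolumeKmsKeyId where it cannot be specified."""
    config = {
        "InstanceType": instance_type,
        "InstanceCount": instance_count,
        "VolumeSizeInGB": volume_size_gb,
    }
    if instance_type not in LOCAL_NVME_INSTANCE_TYPES:
        config["VolumeKmsKeyId"] = kms_key_arn
    return config


key = "arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012"
print(build_resource_config("ml.m5.xlarge", 1, 50, key))
print(build_resource_config("ml.p4d.24xlarge", 1, 50, key))
```

The output data key (`KmsKeyId` in the output data configuration) is unaffected by this and should still be set for every job.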