SageMaker Endpoint Production Variants Should Have at Least Two Initial Instances
Overview
This check verifies that your Amazon SageMaker endpoint configurations have at least two instances per production variant. Running multiple instances provides redundancy and keeps your machine learning inference services available even if one instance fails.
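As a rough illustration, the check can be expressed as a small function over a DescribeEndpointConfig response. This is a sketch, not the scanner's actual implementation: the function name and the `minimum` parameter are assumptions, and variants that carry no InitialInstanceCount (for example serverless variants) are simply skipped.

```python
from typing import Any, Dict, List


def underprovisioned_variants(endpoint_config: Dict[str, Any], minimum: int = 2) -> List[str]:
    """Return names of instance-backed variants with fewer than `minimum` initial instances."""
    failing = []
    for variant in endpoint_config.get("ProductionVariants", []):
        count = variant.get("InitialInstanceCount")
        # Variants without an instance count (e.g. serverless) are skipped in this sketch.
        if count is not None and count < minimum:
            failing.append(variant["VariantName"])
    return failing


# Sample shaped like a DescribeEndpointConfig response; all values are made up.
sample = {
    "EndpointConfigName": "example-config",
    "ProductionVariants": [
        {
            "VariantName": "AllTraffic",
            "ModelName": "example-model",
            "InitialInstanceCount": 1,
            "InstanceType": "ml.m5.large",
        }
    ],
}

print(underprovisioned_variants(sample))  # ['AllTraffic']
```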
Risk
Running a SageMaker endpoint with only one instance creates a single point of failure. If that instance or its Availability Zone experiences problems, your inference service becomes unavailable. This can cause:
- Service outages affecting applications that depend on ML predictions
- SLA breaches if uptime commitments are not met
- Cascading failures in downstream systems waiting for inference results
Remediation Steps
Prerequisites
- AWS Console access with permissions to manage SageMaker endpoints
- The names of the endpoint configurations that need updating
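If you are not sure which configurations are affected, one option is to enumerate them programmatically. The sketch below assumes a boto3 SageMaker client is passed in; the function name is hypothetical. Note that it also needs the sagemaker:ListEndpointConfigs permission in addition to the permissions this guide lists.

```python
def configs_needing_update(sagemaker_client, minimum=2):
    """Yield (config_name, variant_name, count) for variants below `minimum`.

    `sagemaker_client` is assumed to be a boto3 SageMaker client.
    """
    paginator = sagemaker_client.get_paginator("list_endpoint_configs")
    for page in paginator.paginate():
        for summary in page["EndpointConfigs"]:
            name = summary["EndpointConfigName"]
            config = sagemaker_client.describe_endpoint_config(EndpointConfigName=name)
            for variant in config["ProductionVariants"]:
                count = variant.get("InitialInstanceCount")
                # Skip variants without an instance count (e.g. serverless).
                if count is not None and count < minimum:
                    yield name, variant["VariantName"], count
```

To run it against an account, pass in `boto3.client("sagemaker", region_name="us-east-1")` and iterate over the results.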
Required IAM permissions
You need the following permissions:
- sagemaker:DescribeEndpointConfig
- sagemaker:CreateEndpointConfig
- sagemaker:DeleteEndpointConfig
- sagemaker:UpdateEndpoint
- sagemaker:DescribeEndpoint
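For reference, these permissions could be granted with an IAM policy along the following lines. This is a minimal sketch: the statement ID is made up, and the wildcard resource is an assumption for brevity that you would normally narrow to specific endpoint and endpoint configuration ARNs.

```python
import json

# Actions taken from the permission list in this guide.
ACTIONS = [
    "sagemaker:DescribeEndpointConfig",
    "sagemaker:CreateEndpointConfig",
    "sagemaker:DeleteEndpointConfig",
    "sagemaker:UpdateEndpoint",
    "sagemaker:DescribeEndpoint",
]

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ManageEndpointConfigs",  # hypothetical statement ID
            "Effect": "Allow",
            "Action": ACTIONS,
            "Resource": "*",  # assumption: scope this down in real use
        }
    ],
}

print(json.dumps(policy, indent=2))
```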
AWS Console Method
Important: SageMaker endpoint configurations cannot be modified after creation. You must create a new configuration and update your endpoint to use it.
- Open the Amazon SageMaker console
- In the left navigation, expand Inference and click Endpoint configurations
- Note the settings of your existing configuration (model name, instance type, variant weight)
- Click Create endpoint configuration
- Enter a name for the new configuration
- Under Production variants, click Add variant or edit the existing variant:
  - Set Initial instance count to 2 or higher
  - Configure other settings to match your original configuration
- Click Create endpoint configuration
- Go to Inference > Endpoints
- Select your endpoint and click Update endpoint
- Choose your new endpoint configuration
- Click Update endpoint
- Once the endpoint status shows InService, delete the old configuration if no longer needed
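Because the new configuration has to repeat the settings of the old one, it can help to derive the new variant list from the existing one rather than retyping it. The following is a sketch under the assumption that `variants` is the ProductionVariants list from a DescribeEndpointConfig response; the function name is hypothetical.

```python
import copy


def with_minimum_instances(variants, minimum=2):
    """Return a copy of `variants` with InitialInstanceCount raised to at least `minimum`."""
    updated = copy.deepcopy(variants)  # leave the caller's data untouched
    for variant in updated:
        count = variant.get("InitialInstanceCount")
        # Only instance-backed variants have a count to raise.
        if count is not None and count < minimum:
            variant["InitialInstanceCount"] = minimum
    return updated


# Made-up example of an existing single-instance variant.
existing = [
    {
        "VariantName": "AllTraffic",
        "ModelName": "example-model",
        "InitialInstanceCount": 1,
        "InstanceType": "ml.m5.large",
        "InitialVariantWeight": 1.0,
    }
]

new_variants = with_minimum_instances(existing)
print(new_variants[0]["InitialInstanceCount"])  # 2
print(existing[0]["InitialInstanceCount"])      # 1
```

All other fields (model name, instance type, variant weight) are carried over unchanged, which mirrors the "match your original configuration" step.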
AWS CLI (optional)
Step 1: View existing endpoint configuration
aws sagemaker describe-endpoint-config \
--endpoint-config-name <your-endpoint-config-name> \
--region us-east-1
Step 2: Create a new endpoint configuration with 2+ instances
aws sagemaker create-endpoint-config \
--endpoint-config-name <new-endpoint-config-name> \
--production-variants '[
  {
    "VariantName": "AllTraffic",
    "ModelName": "<your-model-name>",
    "InitialInstanceCount": 2,
    "InstanceType": "ml.m5.large",
    "InitialVariantWeight": 1.0
  }
]' \
--region us-east-1
Step 3: Update your endpoint to use the new configuration
aws sagemaker update-endpoint \
--endpoint-name <your-endpoint-name> \
--endpoint-config-name <new-endpoint-config-name> \
--region us-east-1
Step 4: Wait for the endpoint to finish updating
aws sagemaker wait endpoint-in-service \
--endpoint-name <your-endpoint-name> \
--region us-east-1
Step 5: Delete the old configuration (optional)
aws sagemaker delete-endpoint-config \
--endpoint-config-name <old-endpoint-config-name> \
--region us-east-1
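The same sequence (describe, create, update, wait) can be scripted with boto3. The sketch below assumes a boto3 SageMaker client is passed in and uses a hypothetical function name. It copies only the production variants; other settings on the old configuration, such as encryption keys, data capture, or tags, are not carried over and would need to be added by hand.

```python
def raise_instance_count(sagemaker_client, endpoint_name, old_config_name,
                         new_config_name, minimum=2):
    """Create a new endpoint config with >= `minimum` instances and switch the endpoint to it."""
    # Step 1: read the existing configuration.
    old = sagemaker_client.describe_endpoint_config(EndpointConfigName=old_config_name)

    # Step 2: copy the variants, raising the instance count where needed.
    variants = []
    for variant in old["ProductionVariants"]:
        variant = dict(variant)
        if variant.get("InitialInstanceCount", minimum) < minimum:
            variant["InitialInstanceCount"] = minimum
        variants.append(variant)
    sagemaker_client.create_endpoint_config(
        EndpointConfigName=new_config_name,
        ProductionVariants=variants,
    )

    # Step 3: point the endpoint at the new configuration.
    sagemaker_client.update_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=new_config_name,
    )

    # Step 4: block until the endpoint is back in service.
    sagemaker_client.get_waiter("endpoint_in_service").wait(EndpointName=endpoint_name)
    return variants
```

Deleting the old configuration is left out on purpose so that a rollback remains possible until you have verified the update.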
CloudFormation (optional)
AWSTemplateFormatVersion: '2010-09-09'
Description: SageMaker Endpoint Configuration with High Availability
Parameters:
  EndpointConfigName:
    Type: String
    Description: Name for the endpoint configuration
  ModelName:
    Type: String
    Description: Name of the SageMaker model to deploy
  InstanceType:
    Type: String
    Default: ml.m5.large
    Description: ML compute instance type
  InitialInstanceCount:
    Type: Number
    Default: 2
    MinValue: 2
    Description: Number of instances (minimum 2 for high availability)
Resources:
  SageMakerEndpointConfig:
    Type: AWS::SageMaker::EndpointConfig
    Properties:
      EndpointConfigName: !Ref EndpointConfigName
      ProductionVariants:
        - VariantName: AllTraffic
          ModelName: !Ref ModelName
          InitialInstanceCount: !Ref InitialInstanceCount
          InstanceType: !Ref InstanceType
          InitialVariantWeight: 1.0
Outputs:
  EndpointConfigArn:
    Description: ARN of the endpoint configuration
    Value: !Ref SageMakerEndpointConfig
Deploy with:
aws cloudformation deploy \
--template-file template.yaml \
--stack-name sagemaker-ha-endpoint-config \
--parameter-overrides \
EndpointConfigName=my-ha-endpoint-config \
ModelName=my-model \
--region us-east-1
Terraform (optional)
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

variable "endpoint_config_name" {
  description = "Name for the endpoint configuration"
  type        = string
}

variable "model_name" {
  description = "Name of the SageMaker model to deploy"
  type        = string
}

variable "instance_type" {
  description = "ML compute instance type"
  type        = string
  default     = "ml.m5.large"
}

variable "initial_instance_count" {
  description = "Number of instances (minimum 2 for high availability)"
  type        = number
  default     = 2

  validation {
    condition     = var.initial_instance_count >= 2
    error_message = "Initial instance count must be at least 2 for high availability."
  }
}

resource "aws_sagemaker_endpoint_configuration" "ha_endpoint_config" {
  name = var.endpoint_config_name

  production_variants {
    variant_name           = "AllTraffic"
    model_name             = var.model_name
    initial_instance_count = var.initial_instance_count
    instance_type          = var.instance_type
    initial_variant_weight = 1.0
  }
}

output "endpoint_config_arn" {
  description = "ARN of the endpoint configuration"
  value       = aws_sagemaker_endpoint_configuration.ha_endpoint_config.arn
}
Verification
After updating your endpoint:
- In the SageMaker console, go to Inference > Endpoint configurations
- Click on your new configuration
- Under Production variants, confirm the Initial instance count is 2 or higher
CLI verification
aws sagemaker describe-endpoint-config \
--endpoint-config-name <your-endpoint-config-name> \
--region us-east-1 \
--query 'ProductionVariants[*].{Name:VariantName,InstanceCount:InitialInstanceCount}'
Expected output shows instance count of 2 or more:
[
  {
    "Name": "AllTraffic",
    "InstanceCount": 2
  }
]
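The command above inspects the configuration. To confirm that the running endpoint actually has the instances, you can also look at a DescribeEndpoint response, which reports EndpointStatus and a CurrentInstanceCount per variant. The helper below is a sketch with a hypothetical name, and the sample values are made up.

```python
def endpoint_meets_minimum(endpoint, minimum=2):
    """Return True if the endpoint is InService and every variant runs >= `minimum` instances."""
    if endpoint.get("EndpointStatus") != "InService":
        return False
    variants = endpoint.get("ProductionVariants", [])
    # An endpoint with no reported variants is treated as failing.
    return bool(variants) and all(
        variant.get("CurrentInstanceCount", 0) >= minimum for variant in variants
    )


# Sample shaped like a DescribeEndpoint response; values are illustrative only.
sample_endpoint = {
    "EndpointName": "example-endpoint",
    "EndpointStatus": "InService",
    "ProductionVariants": [
        {"VariantName": "AllTraffic", "CurrentInstanceCount": 2, "DesiredInstanceCount": 2}
    ],
}

print(endpoint_meets_minimum(sample_endpoint))  # True
```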
Notes
- Endpoint configurations are immutable: You cannot edit an existing configuration. Create a new one and update your endpoint to use it.
- Cost implications: Running two or more instances at least doubles compute cost compared with a single instance of the same type. Balance availability needs against budget constraints.
- Update process: When you update an endpoint, SageMaker deploys new instances before removing old ones, so there is no downtime during the transition.
- Consider auto-scaling: For variable workloads, consider configuring auto-scaling policies that maintain a minimum of 2 instances but can scale up during peak demand.
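- Multi-AZ distribution: When a variant has multiple instances, SageMaker attempts to distribute them across Availability Zones, improving fault tolerance. If your model runs in a VPC, configure subnets in at least two Availability Zones so that this distribution is possible.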
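For the auto-scaling note above, the minimum of 2 is set when the variant is registered as a scalable target with Application Auto Scaling. The sketch below only builds the request parameters. The function name and the default maximum of 4 are assumptions chosen for illustration.

```python
def scalable_target_request(endpoint_name, variant_name, min_capacity=2, max_capacity=4):
    """Build RegisterScalableTarget parameters that keep at least two instances running."""
    if min_capacity < 2:
        raise ValueError("min_capacity must be at least 2 for high availability")
    if max_capacity < min_capacity:
        raise ValueError("max_capacity must be >= min_capacity")
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,  # assumption: pick a ceiling that fits your budget
    }


request = scalable_target_request("example-endpoint", "AllTraffic")
print(request["ResourceId"])  # endpoint/example-endpoint/variant/AllTraffic
```

The resulting dictionary can be passed to `boto3.client("application-autoscaling").register_scalable_target(**request)`. A scaling policy still has to be attached separately.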