Medium severity | SageMaker | Machine Learning

SageMaker Endpoint Production Variants Should Have at Least Two Initial Instances

Overview

This check verifies that your Amazon SageMaker endpoint configurations have at least two instances per production variant. Running multiple instances provides redundancy and keeps your machine learning inference services available even if one instance fails.
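The rule can be illustrated with a minimal sketch that evaluates a dictionary shaped like the output of `describe-endpoint-config`. The function name `variants_below_minimum` and the sample data are hypothetical, and skipping variants that have no `InitialInstanceCount` (for example serverless variants) is an assumption, not confirmed behavior of the check itself.

```python
from typing import Any

MIN_INSTANCES = 2  # threshold named by this check


def variants_below_minimum(endpoint_config: dict[str, Any],
                           minimum: int = MIN_INSTANCES) -> list[str]:
    """Return the names of production variants with fewer than `minimum` initial instances."""
    failing = []
    for variant in endpoint_config.get("ProductionVariants", []):
        # Assumption: variants without InitialInstanceCount are skipped, not failed.
        count = variant.get("InitialInstanceCount")
        if count is not None and count < minimum:
            failing.append(variant["VariantName"])
    return failing


# Hypothetical sample shaped like describe-endpoint-config output.
sample_config = {
    "EndpointConfigName": "example-config",
    "ProductionVariants": [
        {"VariantName": "AllTraffic", "InitialInstanceCount": 1},
    ],
}
print(variants_below_minimum(sample_config))  # → ['AllTraffic']
```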

Risk

Running a SageMaker endpoint with only one instance creates a single point of failure. If that instance or its Availability Zone experiences problems, your inference service becomes unavailable. This can cause:

Service outages affecting applications that depend on ML predictions
SLA breaches if uptime commitments are not met
Cascading failures in downstream systems waiting for inference results

Remediation Steps

Prerequisites

AWS Console access with permissions to manage SageMaker endpoints
A list of the endpoint configurations that need updating

Required IAM permissions

You need the following permissions:

sagemaker:DescribeEndpointConfig
sagemaker:CreateEndpointConfig
sagemaker:DeleteEndpointConfig
sagemaker:UpdateEndpoint
sagemaker:DescribeEndpoint

AWS Console Method

Important: SageMaker endpoint configurations cannot be modified after creation. You must create a new configuration and update your endpoint to use it.

Open the Amazon SageMaker console
In the left navigation, expand Inference and click Endpoint configurations
Note the settings of your existing configuration (model name, instance type, variant weight)
Click Create endpoint configuration
Enter a name for the new configuration
Under Production variants, click Add variant or edit the existing variant:
- Set Initial instance count to 2 or higher
- Configure other settings to match your original configuration
Click Create endpoint configuration
Go to Inference > Endpoints
Select your endpoint and click Update endpoint
Choose your new endpoint configuration
Click Update endpoint
Once the endpoint status shows InService, delete the old configuration if no longer needed

AWS CLI (optional)

Step 1: View existing endpoint configuration

aws sagemaker describe-endpoint-config \
    --endpoint-config-name <your-endpoint-config-name> \
    --region us-east-1

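Step 2: Create a new endpoint configuration with 2+ instances (copy the variant name, model name, instance type, and variant weight from the Step 1 output; the values below are examples)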

aws sagemaker create-endpoint-config \
    --endpoint-config-name <new-endpoint-config-name> \
    --production-variants '[
        {
            "VariantName": "AllTraffic",
            "ModelName": "<your-model-name>",
            "InitialInstanceCount": 2,
            "InstanceType": "ml.m5.large",
            "InitialVariantWeight": 1.0
        }
    ]' \
    --region us-east-1

Step 3: Update your endpoint to use the new configuration

aws sagemaker update-endpoint \
    --endpoint-name <your-endpoint-name> \
    --endpoint-config-name <new-endpoint-config-name> \
    --region us-east-1

Step 4: Wait for the endpoint to finish updating

aws sagemaker wait endpoint-in-service \
    --endpoint-name <your-endpoint-name> \
    --region us-east-1

Step 5: Delete the old configuration (optional)

aws sagemaker delete-endpoint-config \
    --endpoint-config-name <old-endpoint-config-name> \
    --region us-east-1
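Steps 1 to 4 can also be scripted. The following is a minimal sketch, assuming a boto3 SageMaker client is passed in as `client`; the helper names `raise_instance_counts` and `migrate_endpoint` are hypothetical. It copies only `ProductionVariants` from the old configuration, so any other settings on the original configuration (encryption keys, data capture, tags, and so on) would need to be carried over separately.

```python
import copy
from typing import Any


def raise_instance_counts(variants: list[dict[str, Any]],
                          minimum: int = 2) -> list[dict[str, Any]]:
    """Copy the variants, lifting InitialInstanceCount to at least `minimum`."""
    updated = copy.deepcopy(variants)
    for variant in updated:
        # Variants without InitialInstanceCount are left unchanged.
        if "InitialInstanceCount" in variant:
            variant["InitialInstanceCount"] = max(variant["InitialInstanceCount"], minimum)
    return updated


def migrate_endpoint(client: Any, endpoint_name: str,
                     new_config_name: str, minimum: int = 2) -> str:
    """Create a new config from the endpoint's current one and switch the endpoint to it.

    Returns the name of the old configuration so the caller can delete it later.
    """
    endpoint = client.describe_endpoint(EndpointName=endpoint_name)
    old_config_name = endpoint["EndpointConfigName"]
    old_config = client.describe_endpoint_config(EndpointConfigName=old_config_name)

    client.create_endpoint_config(
        EndpointConfigName=new_config_name,
        ProductionVariants=raise_instance_counts(old_config["ProductionVariants"], minimum),
    )
    client.update_endpoint(EndpointName=endpoint_name, EndpointConfigName=new_config_name)

    # Block until the endpoint reports InService again.
    client.get_waiter("endpoint_in_service").wait(EndpointName=endpoint_name)
    return old_config_name


# Usage (requires boto3 and AWS credentials, so it is not executed here):
#   import boto3
#   client = boto3.client("sagemaker", region_name="us-east-1")
#   old = migrate_endpoint(client, "<your-endpoint-name>", "<new-endpoint-config-name>")
```

Passing the client in as an argument keeps the function easy to exercise with a stub before running it against a real account.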

CloudFormation (optional)

AWSTemplateFormatVersion: '2010-09-09'
Description: SageMaker Endpoint Configuration with High Availability

Parameters:
  EndpointConfigName:
    Type: String
    Description: Name for the endpoint configuration
  ModelName:
    Type: String
    Description: Name of the SageMaker model to deploy
  InstanceType:
    Type: String
    Default: ml.m5.large
    Description: ML compute instance type
  InitialInstanceCount:
    Type: Number
    Default: 2
    MinValue: 2
    Description: Number of instances (minimum 2 for high availability)

Resources:
  SageMakerEndpointConfig:
    Type: AWS::SageMaker::EndpointConfig
    Properties:
      EndpointConfigName: !Ref EndpointConfigName
      ProductionVariants:
        - VariantName: AllTraffic
          ModelName: !Ref ModelName
          InitialInstanceCount: !Ref InitialInstanceCount
          InstanceType: !Ref InstanceType
          InitialVariantWeight: 1.0

Outputs:
  EndpointConfigArn:
    Description: ARN of the endpoint configuration
    Value: !Ref SageMakerEndpointConfig

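Deploy with the command below. The template creates only the endpoint configuration, so the endpoint must still be updated to use it (see Step 3 of the AWS CLI section):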

aws cloudformation deploy \
    --template-file template.yaml \
    --stack-name sagemaker-ha-endpoint-config \
    --parameter-overrides \
        EndpointConfigName=my-ha-endpoint-config \
        ModelName=my-model \
    --region us-east-1

Terraform (optional)

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

variable "endpoint_config_name" {
  description = "Name for the endpoint configuration"
  type        = string
}

variable "model_name" {
  description = "Name of the SageMaker model to deploy"
  type        = string
}

variable "instance_type" {
  description = "ML compute instance type"
  type        = string
  default     = "ml.m5.large"
}

variable "initial_instance_count" {
  description = "Number of instances (minimum 2 for high availability)"
  type        = number
  default     = 2

  validation {
    condition     = var.initial_instance_count >= 2
    error_message = "Initial instance count must be at least 2 for high availability."
  }
}

resource "aws_sagemaker_endpoint_configuration" "ha_endpoint_config" {
  name = var.endpoint_config_name

  production_variants {
    variant_name           = "AllTraffic"
    model_name             = var.model_name
    initial_instance_count = var.initial_instance_count
    instance_type          = var.instance_type
    initial_variant_weight = 1.0
  }
}

output "endpoint_config_arn" {
  description = "ARN of the endpoint configuration"
  value       = aws_sagemaker_endpoint_configuration.ha_endpoint_config.arn
}
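Before applying, the same rule could be checked against a Terraform plan exported as JSON (for example with `terraform plan -out=tfplan` followed by `terraform show -json tfplan > plan.json`). The sketch below is an illustration under assumptions: the function name `low_count_configs` is hypothetical, and it relies on the plan JSON listing each resource under `resource_changes` with the planned values in `change.after`.

```python
from typing import Any


def low_count_configs(plan: dict[str, Any], minimum: int = 2) -> list[str]:
    """Return addresses of endpoint configurations planned with too few instances."""
    offenders = []
    for resource in plan.get("resource_changes", []):
        if resource.get("type") != "aws_sagemaker_endpoint_configuration":
            continue
        # `after` is null for deletions, so fall back to an empty dict.
        after = (resource.get("change") or {}).get("after") or {}
        for variant in after.get("production_variants") or []:
            count = variant.get("initial_instance_count")
            # Counts unknown at plan time (absent or null) are skipped.
            if count is not None and count < minimum:
                offenders.append(resource["address"])
                break
    return offenders


# Usage, assuming plan.json was produced as described above:
#   import json
#   with open("plan.json") as fh:
#       print(low_count_configs(json.load(fh)))
```

With the `validation` block shown above in place, a count below 2 is already rejected at plan time; a check like this is mainly useful for configurations that do not use that variable.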

Verification

After updating your endpoint:

In the SageMaker console, go to Inference > Endpoint configurations
Click on your new configuration
Under Production variants, confirm the Initial instance count is 2 or higher

CLI verification

aws sagemaker describe-endpoint-config \
    --endpoint-config-name <your-endpoint-config-name> \
    --region us-east-1 \
    --query 'ProductionVariants[*].{Name:VariantName,InstanceCount:InitialInstanceCount}'

The output should show an instance count of 2 or more for each variant:

[
    {
        "Name": "AllTraffic",
        "InstanceCount": 2
    }
]
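The configuration shows what was requested; the endpoint itself reports what is actually running. As a minimal sketch, the function below inspects a dictionary shaped like the output of `aws sagemaker describe-endpoint`, which includes `EndpointStatus` and a `CurrentInstanceCount` for each variant. The function name `endpoint_meets_minimum` and the sample data are hypothetical.

```python
from typing import Any


def endpoint_meets_minimum(endpoint: dict[str, Any], minimum: int = 2) -> bool:
    """True if the endpoint is InService and every variant runs at least `minimum` instances."""
    if endpoint.get("EndpointStatus") != "InService":
        return False
    variants = endpoint.get("ProductionVariants", [])
    # An endpoint with no listed variants is treated as not verified.
    return bool(variants) and all(
        variant.get("CurrentInstanceCount", 0) >= minimum for variant in variants
    )


# Hypothetical sample shaped like describe-endpoint output.
sample_endpoint = {
    "EndpointName": "example-endpoint",
    "EndpointStatus": "InService",
    "ProductionVariants": [
        {"VariantName": "AllTraffic", "CurrentInstanceCount": 2, "DesiredInstanceCount": 2},
    ],
}
print(endpoint_meets_minimum(sample_endpoint))  # → True
```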

Additional Resources

Notes

Endpoint configurations are immutable: You cannot edit an existing configuration. Create a new one and update your endpoint to use it.
Cost implications: Moving from one instance to two or more at least doubles the compute cost of that variant. Balance availability needs against budget constraints.
Update process: When you update an endpoint, SageMaker deploys new instances before removing old ones, so there is no downtime during the transition.
Consider auto-scaling: For variable workloads, consider configuring auto-scaling policies that maintain a minimum of 2 instances but can scale up during peak demand.
Multi-AZ distribution: When a variant has multiple instances, SageMaker attempts to distribute them across Availability Zones, improving fault tolerance.
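For the auto-scaling note, the minimum of 2 is expressed when the variant is registered as a scalable target with Application Auto Scaling. The sketch below only builds the request parameters; the function name `scalable_target_params` and the default maximum of 4 are illustrative assumptions, and a scaling policy would still need to be attached separately.

```python
from typing import Any


def scalable_target_params(endpoint_name: str, variant_name: str,
                           min_capacity: int = 2, max_capacity: int = 4) -> dict[str, Any]:
    """Build parameters for registering a SageMaker variant as a scalable target."""
    if min_capacity < 2:
        raise ValueError("min_capacity must be at least 2 to satisfy this check")
    if max_capacity < min_capacity:
        raise ValueError("max_capacity must not be lower than min_capacity")
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,  # illustrative default, adjust to your workload
    }


# Usage (requires boto3 and AWS credentials, so it is not executed here):
#   import boto3
#   autoscaling = boto3.client("application-autoscaling", region_name="us-east-1")
#   autoscaling.register_scalable_target(
#       **scalable_target_params("<your-endpoint-name>", "AllTraffic"))
```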
