How to Reduce Redshift Cluster Size for Low CPU Utilization

Overview

When your Amazon Redshift cluster consistently shows CPU utilization below 20%, it's a clear indicator that you're over-provisioned. This tutorial demonstrates how to downsize a Redshift cluster from 4 nodes to 2 nodes using elastic resize, reducing costs by approximately 50% while maintaining adequate performance for your workload.

Cost Impact: Reducing from 4 dc2.large nodes to 2 nodes saves approximately $365/month (50% reduction from ~$730/month to ~$365/month).

Time Required: The elastic resize process typically takes 10-15 minutes.

Prerequisites

  • An Amazon Redshift cluster with sustained low CPU utilization (<20% over 7 days)
  • AWS Console access with permissions to modify Redshift clusters
  • Understanding that the cluster will be read-only briefly during the resize operation

Step 1: Navigate to Redshift Console and Select Cluster

Navigate to the Redshift console and locate the cluster showing low CPU utilization. The cluster should have an "Available" status.

Click on the cluster name to view its details.

Step 2: Review CPU Utilization Metrics

In the cluster details page, click on the Monitoring or Metrics tab to review CPU utilization.

Verify that average CPU has remained below 20% for the past 7 days. This confirms the cluster is significantly over-provisioned and is a good candidate for downsizing.

Key Metrics to Check:

  • CPU Utilization: Should be consistently <20%
  • Query Patterns: Confirm the low CPU reflects your normal workload rather than a temporary lull (for example, a seasonal slowdown)
  • Storage Usage: Verify you have adequate headroom after reducing nodes
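
If you prefer to verify the CPU numbers outside the console, here is a minimal boto3 sketch that pulls the 7-day average from CloudWatch; the cluster identifier is a placeholder you would replace with your own:

# Sketch: pull average CPU utilization for the last 7 days from CloudWatch
import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client('cloudwatch')

now = datetime.now(timezone.utc)
resp = cloudwatch.get_metric_statistics(
    Namespace='AWS/Redshift',
    MetricName='CPUUtilization',
    Dimensions=[{'Name': 'ClusterIdentifier', 'Value': 'your-cluster-id'}],
    StartTime=now - timedelta(days=7),
    EndTime=now,
    Period=3600,             # hourly data points
    Statistics=['Average'],
)

points = [p['Average'] for p in resp['Datapoints']]
print(f"7-day average CPU: {sum(points) / len(points):.1f}%")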

Step 3: Review Current Configuration

Click on the Properties or Configuration tab to view the current cluster setup.

Note the current configuration:

  • Node Type: dc2.large (or your specific node type)
  • Number of Nodes: 4
  • Estimated Monthly Cost: ~$730/month for 4 dc2.large nodes

After resizing to 2 nodes, the cost will be approximately $365/month.
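
These figures follow from the dc2.large on-demand rate, roughly $0.25 per node-hour in us-east-1 (check current pricing for your region and purchase option):

# Rough monthly cost math, assuming ~$0.25 per dc2.large node-hour (us-east-1)
HOURLY_RATE = 0.25
HOURS_PER_MONTH = 730

print(4 * HOURLY_RATE * HOURS_PER_MONTH)  # ~$730/month before the resize
print(2 * HOURLY_RATE * HOURS_PER_MONTH)  # ~$365/month after the resize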

Step 4: Initiate Elastic Resize

Click the Actions dropdown button at the top right of the cluster details page.

Select Resize from the Actions menu.

When prompted, choose Elastic resize for faster completion (typically 10-15 minutes vs several hours for classic resize).

Elastic vs Classic Resize:

  • Elastic Resize: Faster (10-15 min), temporarily read-only, limited node type changes
  • Classic Resize: Slower (hours), supports all configuration changes, creates new cluster

Step 5: Configure New Node Count

In the resize configuration screen:

  1. Change the Number of nodes from 4 to 2
  2. Keep the Node type as dc2.large (or your current type)
  3. Review the estimated new monthly cost shown in the console (~50% reduction)

Important Considerations:

  • Ensure your data will fit on 2 nodes (check storage capacity)
  • With <20% CPU usage, performance impact should be negligible
  • Database will be read-only for a few minutes during resize
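
If you would rather script this step, the console actions above correspond roughly to a single boto3 call; the cluster identifier is a placeholder:

# Sketch: the same elastic resize via boto3 instead of the console
import boto3

redshift = boto3.client('redshift')

redshift.resize_cluster(
    ClusterIdentifier='your-cluster-id',
    NumberOfNodes=2,
    Classic=False,   # False requests an elastic resize
)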

Step 6: Review Resize Impact

Before confirming, review the resize summary panel:

  • Estimated Duration: 10-15 minutes for elastic resize
  • Cluster Availability: Temporarily read-only during operation
  • Expected Cost Reduction: ~$365/month savings
  • Performance Impact: Minimal for workloads with <20% CPU utilization

Double-check that you're comfortable with the brief read-only period for your applications.

Step 7: Confirm and Start Resize

Click Resize cluster to begin the operation.

The cluster status will change to "Resizing". You can navigate away and return later to check progress.

Step 8: Monitor Resize Progress

The cluster details page will show the resize status. During elastic resize:

  1. Cluster enters "Resizing" state
  2. Cluster becomes read-only (queries can read but not write)
  3. Nodes are added/removed
  4. Cluster returns to "Available" state

You can monitor the progress in the console or set up CloudWatch alarms to notify you when complete.
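
For scripted monitoring, a minimal sketch using boto3 (same placeholder cluster identifier as above):

# Sketch: check elastic resize progress from a script
import boto3

redshift = boto3.client('redshift')

status = redshift.describe_resize(ClusterIdentifier='your-cluster-id')
print(status['Status'], status.get('TargetNumberOfNodes'))

# Or block until the cluster is back to "available"
redshift.get_waiter('cluster_available').wait(ClusterIdentifier='your-cluster-id')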

Step 9: Verify Completion

Once the resize completes:

  1. Verify cluster status returns to Available
  2. Confirm the configuration shows 2 nodes
  3. Check that your applications can connect and run queries normally
  4. Monitor CPU utilization over the next few days to ensure it remains acceptable

The cluster is now operational at half the previous cost.
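
The first two checks can also be confirmed programmatically; a minimal sketch, again using a placeholder cluster identifier:

# Sketch: confirm post-resize status and node count
import boto3

redshift = boto3.client('redshift')

cluster = redshift.describe_clusters(ClusterIdentifier='your-cluster-id')['Clusters'][0]
print(cluster['ClusterStatus'])    # expect "available"
print(cluster['NumberOfNodes'])    # expect 2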

Alternative Approaches

If this elastic resize doesn't fully optimize your costs, consider these alternatives:

1. Pause/Resume Scheduling

For clusters used only during business hours, implement automated pause/resume:

# Lambda function example for pause/resume scheduling
import boto3

redshift = boto3.client('redshift')

def lambda_handler(event, context):
    # The triggering schedule passes {"action": "pause"} or {"action": "resume"}
    cluster_id = 'your-cluster-id'

    if event['action'] == 'pause':
        redshift.pause_cluster(ClusterIdentifier=cluster_id)
    else:
        redshift.resume_cluster(ClusterIdentifier=cluster_id)

Savings: 50-70% reduction with no performance impact during active hours
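
One way to drive the Lambda above is an EventBridge schedule. The sketch below assumes the function is already deployed (the ARN shown is hypothetical) and that EventBridge has been granted permission to invoke it:

# Sketch: EventBridge rules that invoke the pause/resume Lambda on a schedule
import boto3
import json

events = boto3.client('events')
LAMBDA_ARN = 'arn:aws:lambda:us-east-1:123456789012:function:pause-resume-redshift'  # hypothetical ARN

def schedule(name, cron, action):
    events.put_rule(Name=name, ScheduleExpression=cron, State='ENABLED')
    events.put_targets(
        Rule=name,
        Targets=[{'Id': '1', 'Arn': LAMBDA_ARN, 'Input': json.dumps({'action': action})}],
    )
    # Note: the Lambda also needs a resource-based policy allowing events.amazonaws.com to invoke it

schedule('pause-redshift-nightly', 'cron(0 1 ? * MON-FRI *)', 'pause')     # 01:00 UTC weekdays
schedule('resume-redshift-morning', 'cron(0 12 ? * MON-FRI *)', 'resume')  # 12:00 UTC weekdays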

2. Migrate to Redshift Serverless

For workloads with unpredictable patterns, Redshift Serverless offers:

  • Pay only for actual compute used (RPU-hours)
  • Auto-scaling based on demand
  • No cluster management overhead

Best For: Development/test environments, sporadic analytics, variable workloads
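
For orientation, a minimal sketch of provisioning Serverless with boto3; the namespace and workgroup names and the 8-RPU base capacity are illustrative assumptions:

# Sketch: create a Redshift Serverless namespace and workgroup
import boto3

serverless = boto3.client('redshift-serverless')

serverless.create_namespace(namespaceName='analytics-ns')
serverless.create_workgroup(
    workgroupName='analytics-wg',
    namespaceName='analytics-ns',
    baseCapacity=8,  # RPUs; compute is billed only while queries run
)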

3. Classic Resize for Node Type Changes

If you want to change node types simultaneously:

  • Use classic resize instead of elastic
  • Takes longer (hours vs minutes)
  • Supports any configuration change
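
A classic resize can be requested with the same API as before, just with Classic=True; the target node type below is only an example:

# Sketch: classic resize that also changes the node type (illustrative values)
import boto3

redshift = boto3.client('redshift')

redshift.resize_cluster(
    ClusterIdentifier='your-cluster-id',
    ClusterType='multi-node',
    NodeType='ra3.xlplus',   # example target node type
    NumberOfNodes=2,
    Classic=True,            # classic resize supports node type changes
)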

4. Further Node Reduction

If CPU remains low after downsizing to 2 nodes:

  • Consider reducing to a single node (this generally requires a classic resize, since elastic resize doesn't support single-node clusters)
  • Note that dc2.large is already the smallest dc2 node type, so there is no smaller node size to step down to
  • For very light workloads, migrate to Redshift Serverless

Cost Summary

Before Optimization:

  • 4 × dc2.large nodes
  • ~$730/month
  • CPU Utilization: <20%

After Optimization:

  • 2 × dc2.large nodes
  • ~$365/month
  • Expected CPU Utilization: <40% (still comfortable headroom)

Savings: $365/month (~$4,380/year)

For clusters with <20% CPU utilization, this resize typically has negligible performance impact while delivering significant cost savings.

Monitoring Post-Resize

After completing the resize, monitor these metrics for 1-2 weeks:

  1. CPU Utilization: Should remain below 60% for comfortable operation
  2. Query Performance: Watch for any degradation in query times
  3. Disk Space: Ensure storage capacity is adequate
  4. Concurrent Queries: Verify your workload handles concurrency well
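
To avoid watching the console, you can set a CloudWatch alarm on sustained high CPU; the alarm name, threshold, and SNS topic below are illustrative assumptions:

# Sketch: CloudWatch alarm on sustained high CPU after the resize
import boto3

cloudwatch = boto3.client('cloudwatch')

cloudwatch.put_metric_alarm(
    AlarmName='redshift-cpu-high-post-resize',
    Namespace='AWS/Redshift',
    MetricName='CPUUtilization',
    Dimensions=[{'Name': 'ClusterIdentifier', 'Value': 'your-cluster-id'}],
    Statistic='Average',
    Period=300,                 # 5-minute periods
    EvaluationPeriods=6,        # alarm after 30 minutes above threshold
    Threshold=60.0,
    ComparisonOperator='GreaterThanThreshold',
    AlarmActions=['arn:aws:sns:us-east-1:123456789012:ops-alerts'],  # hypothetical SNS topic
)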

If metrics show stress, you can always resize back up using the same elastic resize process.

Troubleshooting

Resize Fails or Takes Too Long

  • Classic Resize Fallback: If elastic resize fails, try classic resize
  • Check Cluster Health: Ensure cluster is in "Available" state before resizing
  • Snapshot First: For critical clusters, take a manual snapshot before resizing
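
For the snapshot-first suggestion, a minimal sketch (the snapshot identifier is illustrative):

# Sketch: take a manual snapshot before resizing
import boto3

redshift = boto3.client('redshift')

redshift.create_cluster_snapshot(
    SnapshotIdentifier='pre-resize-2-nodes',
    ClusterIdentifier='your-cluster-id',
)

# Wait until the snapshot is available before starting the resize
redshift.get_waiter('snapshot_available').wait(SnapshotIdentifier='pre-resize-2-nodes')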

Performance Degradation After Resize

  • Insufficient Capacity: May need to resize back up or optimize queries
  • Workload Changed: Verify CPU patterns haven't changed since initial analysis
  • Disk I/O Bottleneck: Check if storage is now constrained

Applications Can't Connect During Resize

  • Expected Behavior: Brief read-only period is normal for elastic resize
  • Plan Maintenance Window: Schedule resize during low-traffic periods
  • Configure Retries: Ensure applications have connection retry logic

Best Practices

  1. Monitor First: Collect at least 7 days of CPU metrics before resizing
  2. Start Conservative: Reduce nodes gradually (4→3→2) if uncertain
  3. Test in Non-Prod: Try the process in dev/staging first
  4. Snapshot Before: Always take a manual snapshot before major changes
  5. Schedule Wisely: Perform resize during maintenance windows or low-traffic periods
  6. Document Changes: Keep records of configuration changes and rationale

Conclusion

Downsizing an over-provisioned Redshift cluster is a straightforward way to reduce AWS costs without impacting performance. For clusters with sustained CPU utilization below 20%, reducing node count by 50% typically delivers proportional cost savings with minimal risk.

Monitor your cluster's performance post-resize and adjust as needed. The elastic resize feature makes it easy to scale both up and down based on actual workload requirements.