How to Reduce Redshift Cluster Size for Low CPU Utilization
Overview
When your Amazon Redshift cluster consistently shows CPU utilization below 20%, that's a strong sign the cluster is over-provisioned. This tutorial demonstrates how to downsize a Redshift cluster from 4 nodes to 2 nodes using elastic resize, reducing costs by approximately 50% while maintaining adequate performance for your workload.
Cost Impact: Reducing from 4 dc2.large nodes to 2 nodes saves approximately $365/month (50% reduction from ~$730/month to ~$365/month).
Time Required: The elastic resize process typically takes 10-15 minutes.
Prerequisites
- An Amazon Redshift cluster with sustained low CPU utilization (<20% over 7 days)
- AWS Console access with permissions to modify Redshift clusters
- Understanding that the cluster will be read-only briefly during the resize operation
Step 1: Navigate to Redshift Console and Select Cluster
Navigate to the Redshift console and locate the cluster showing low CPU utilization. The cluster should have an "Available" status.
Click on the cluster name to view its details.
Step 2: Review CPU Utilization Metrics
In the cluster details page, click on the Monitoring or Metrics tab to review CPU utilization.
Verify that average CPU has remained below 20% for the past 7 days. This confirms the cluster is significantly over-provisioned and is a good candidate for downsizing.
Key Metrics to Check:
- CPU Utilization: Should be consistently <20%
- Query Patterns: Confirm the low CPU reflects steady-state demand, not a temporary lull or seasonal dip
- Storage Usage: Verify you have adequate headroom after reducing nodes
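If you prefer to verify the metric programmatically rather than eyeball the console chart, a boto3 sketch like the following can pull the 7-day average from CloudWatch. The cluster identifier passed to `check_cpu` is a placeholder for your own cluster name.

```python
from datetime import datetime, timedelta, timezone

def cpu_metric_query(cluster_id, days=7):
    """Build GetMetricStatistics parameters for Redshift CPUUtilization."""
    now = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Redshift",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "ClusterIdentifier", "Value": cluster_id}],
        "StartTime": now - timedelta(days=days),
        "EndTime": now,
        "Period": 3600,            # one datapoint per hour
        "Statistics": ["Average"],
    }

def check_cpu(cluster_id):
    """Fetch datapoints and report whether the cluster looks over-provisioned."""
    import boto3  # deferred so the param builder runs without the SDK installed
    cw = boto3.client("cloudwatch")
    resp = cw.get_metric_statistics(**cpu_metric_query(cluster_id))
    points = [p["Average"] for p in resp["Datapoints"]]
    avg = sum(points) / len(points) if points else 0.0
    print(f"7-day average CPU: {avg:.1f}%")
    return avg < 20  # sustained <20% suggests over-provisioning
```

Call `check_cpu("my-redshift-cluster")` with credentials configured; a `True` return means the cluster clears the <20% bar used in this tutorial.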
Step 3: Review Current Configuration
Click on the Properties or Configuration tab to view the current cluster setup.
Note the current configuration:
- Node Type: dc2.large (or your specific node type)
- Number of Nodes: 4
- Estimated Monthly Cost: ~$730/month for 4 dc2.large nodes
After resizing to 2 nodes, the cost will be approximately $365/month.
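The cost figures above are simple arithmetic you can reproduce. The sketch below assumes the us-east-1 on-demand rate of $0.25/hour for dc2.large at the time of writing; check current pricing for your region before relying on the numbers.

```python
HOURS_PER_MONTH = 730          # average hours in a month
DC2_LARGE_HOURLY = 0.25        # assumed us-east-1 on-demand rate; verify for your region

def monthly_cost(node_count, hourly_rate=DC2_LARGE_HOURLY):
    """On-demand monthly cost for a cluster of identical nodes."""
    return node_count * hourly_rate * HOURS_PER_MONTH

before = monthly_cost(4)   # ~$730/month
after = monthly_cost(2)    # ~$365/month
print(f"Saving ${before - after:.0f}/month")
```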
Step 4: Initiate Elastic Resize
Click the Actions dropdown button at the top right of the cluster details page.
Select Resize from the Actions menu.
When prompted, choose Elastic resize for faster completion (typically 10-15 minutes vs several hours for classic resize).
Elastic vs Classic Resize:
- Elastic Resize: Faster (10-15 min), temporarily read-only, limited node type changes
- Classic Resize: Slower (hours), supports all configuration changes, creates new cluster
Step 5: Configure New Node Count
In the resize configuration screen:
- Change the Number of nodes from 4 to 2
- Keep the Node type as dc2.large (or your current type)
- Review the estimated new monthly cost shown in the console (~50% reduction)
Important Considerations:
- Ensure your data will fit on 2 nodes (check storage capacity)
- With <20% CPU usage, performance impact should be negligible
- Database will be read-only for a few minutes during resize
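The same resize can be triggered from the API instead of the console. A minimal boto3 sketch, assuming a placeholder cluster identifier:

```python
def elastic_resize_params(cluster_id, target_nodes):
    """Parameters for redshift.resize_cluster requesting an elastic resize."""
    return {
        "ClusterIdentifier": cluster_id,
        "NumberOfNodes": target_nodes,
        "Classic": False,   # False requests an elastic resize
    }

def start_resize(cluster_id, target_nodes):
    import boto3  # deferred so the param builder runs without the SDK installed
    redshift = boto3.client("redshift")
    redshift.resize_cluster(**elastic_resize_params(cluster_id, target_nodes))
```

For example, `start_resize("my-redshift-cluster", 2)` performs the 4-to-2 reduction described in this step.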
Step 6: Review Resize Impact
Before confirming, review the resize summary panel:
- Estimated Duration: 10-15 minutes for elastic resize
- Cluster Availability: Temporarily read-only during operation
- Expected Cost Reduction: ~$365/month savings
- Performance Impact: Minimal for workloads with <20% CPU utilization
Double-check that you're comfortable with the brief read-only period for your applications.
Step 7: Confirm and Start Resize
Click Resize cluster to begin the operation.
The cluster status will change to "Resizing". You can navigate away and return later to check progress.
Step 8: Monitor Resize Progress
The cluster details page will show the resize status. During elastic resize:
- Cluster enters "Resizing" state
- Cluster becomes read-only (queries can read but not write)
- Nodes are added/removed
- Cluster returns to "Available" state
You can monitor the progress in the console or set up CloudWatch alarms to notify you when complete.
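If you'd rather poll from a script than refresh the console, a small loop over `describe_clusters` works. The cluster identifier is a placeholder; note the API reports status in lowercase ("resizing", "available").

```python
import time

def resize_finished(description):
    """True once the cluster has returned to the 'available' state."""
    return description["ClusterStatus"] == "available"

def wait_for_resize(cluster_id, poll_seconds=60):
    import boto3  # deferred so the helper above runs without the SDK installed
    redshift = boto3.client("redshift")
    while True:
        desc = redshift.describe_clusters(
            ClusterIdentifier=cluster_id)["Clusters"][0]
        print("Cluster status:", desc["ClusterStatus"])
        if resize_finished(desc):
            return
        time.sleep(poll_seconds)
```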
Step 9: Verify Completion
Once the resize completes:
- Verify cluster status returns to Available
- Confirm the configuration shows 2 nodes
- Check that your applications can connect and run queries normally
- Monitor CPU utilization over the next few days to ensure it remains acceptable
The cluster is now operational at half the previous cost.
Alternative Approaches
If this elastic resize doesn't fully optimize your costs, consider these alternatives:
1. Pause/Resume Scheduling
For clusters used only during business hours, implement automated pause/resume:
# Lambda function example for pause/resume scheduling
import boto3

redshift = boto3.client('redshift')

def lambda_handler(event, context):
    # Placeholder identifier; in a real deployment, pass it via the
    # event payload or an environment variable
    cluster_id = 'your-cluster-id'
    if event.get('action') == 'pause':
        redshift.pause_cluster(ClusterIdentifier=cluster_id)
    else:
        redshift.resume_cluster(ClusterIdentifier=cluster_id)
Savings: 50-70% reduction with no performance impact during active hours
2. Migrate to Redshift Serverless
For workloads with unpredictable patterns, Redshift Serverless offers:
- Pay only for actual compute used (RPU-hours)
- Auto-scaling based on demand
- No cluster management overhead
Best For: Development/test environments, sporadic analytics, variable workloads
3. Classic Resize for Node Type Changes
If you want to change node types simultaneously:
- Use classic resize instead of elastic
- Takes longer (hours vs minutes)
- Supports any configuration change
4. Further Node Reduction
If CPU remains low after downsizing to 2 nodes:
- Consider reducing to 1 node
- Note that dc2.large is already the smallest DC2 node size, so there is no smaller node type to step down to
- For very light workloads, migrate to Redshift Serverless
Cost Summary
Before Optimization:
- 4 × dc2.large nodes
- ~$730/month
- CPU Utilization: <20%
After Optimization:
- 2 × dc2.large nodes
- ~$365/month
- Expected CPU Utilization: <40% (still comfortable headroom)
Savings: $365/month (~$4,380/year)
For clusters with <20% CPU utilization, this resize typically has negligible performance impact while delivering significant cost savings.
Monitoring Post-Resize
After completing the resize, monitor these metrics for 1-2 weeks:
- CPU Utilization: Should remain below 60% for comfortable operation
- Query Performance: Watch for any degradation in query times
- Disk Space: Ensure storage capacity is adequate
- Concurrent Queries: Verify your workload handles concurrency well
If metrics show stress, you can always resize back up using the same elastic resize process.
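To avoid watching dashboards manually, you can set a CloudWatch alarm on the 60% threshold mentioned above. A sketch, assuming a placeholder cluster identifier (attach an SNS action via `AlarmActions` if you want notifications):

```python
def cpu_alarm_params(cluster_id, threshold=60.0):
    """put_metric_alarm parameters: alert when average CPU exceeds the threshold."""
    return {
        "AlarmName": f"{cluster_id}-post-resize-high-cpu",
        "Namespace": "AWS/Redshift",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "ClusterIdentifier", "Value": cluster_id}],
        "Statistic": "Average",
        "Period": 3600,
        "EvaluationPeriods": 6,   # six consecutive hours above threshold
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
    }

def create_alarm(cluster_id):
    import boto3  # deferred so the param builder runs without the SDK installed
    boto3.client("cloudwatch").put_metric_alarm(**cpu_alarm_params(cluster_id))
```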
Troubleshooting
Resize Fails or Takes Too Long
- Classic Resize Fallback: If elastic resize fails, try classic resize
- Check Cluster Health: Ensure cluster is in "Available" state before resizing
- Snapshot First: For critical clusters, take a manual snapshot before resizing
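Taking that manual snapshot can also be scripted so it becomes a standard pre-resize step. A boto3 sketch with a timestamped, placeholder naming scheme:

```python
from datetime import datetime, timezone

def snapshot_params(cluster_id):
    """Parameters for a uniquely named manual pre-resize snapshot."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    return {
        "SnapshotIdentifier": f"{cluster_id}-pre-resize-{stamp}",
        "ClusterIdentifier": cluster_id,
    }

def take_snapshot(cluster_id):
    import boto3  # deferred so the param builder runs without the SDK installed
    boto3.client("redshift").create_cluster_snapshot(**snapshot_params(cluster_id))
```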
Performance Degradation After Resize
- Insufficient Capacity: May need to resize back up or optimize queries
- Workload Changed: Verify CPU patterns haven't changed since initial analysis
- Disk I/O Bottleneck: Check if storage is now constrained
Applications Can't Connect During Resize
- Expected Behavior: Brief read-only period is normal for elastic resize
- Plan Maintenance Window: Schedule resize during low-traffic periods
- Configure Retries: Ensure applications have connection retry logic
Best Practices
- Monitor First: Collect at least 7 days of CPU metrics before resizing
- Start Conservative: Reduce nodes gradually if uncertain; note that elastic resize on DC2 node types only supports halving or doubling the node count, so an intermediate step such as 4→3 requires classic resize
- Test in Non-Prod: Try the process in dev/staging first
- Snapshot Before: Always take a manual snapshot before major changes
- Schedule Wisely: Perform resize during maintenance windows or low-traffic periods
- Document Changes: Keep records of configuration changes and rationale
Conclusion
Downsizing an over-provisioned Redshift cluster is a straightforward way to reduce AWS costs without impacting performance. For clusters with sustained CPU utilization below 20%, reducing node count by 50% typically delivers proportional cost savings with minimal risk.
Monitor your cluster's performance post-resize and adjust as needed. The elastic resize feature makes it easy to scale both up and down based on actual workload requirements.