AWS ECS autoscaling sounds simple on paper - set a target, let AWS handle the rest. In practice, most teams end up with services that scale too slowly, overshoot on the way up, and refuse to scale down. Here’s what we’ve learned running ECS autoscaling in production across dozens of services.
The short version: target tracking works for request-driven services, step scaling gives you control for batch workloads, and queue-based scaling is the only sane option if your services consume from SQS, Kafka, or similar queues.
ECS supports three autoscaling policy types. Each one fits a different workload shape.
With target tracking, you pick a metric (CPU, memory, ALB request count) and a target value. AWS figures out how many tasks you need. Target tracking is the default recommendation from AWS, and it works well for HTTP services behind a load balancer.
# CloudFormation example
ScalingPolicy:
  Type: AWS::ApplicationAutoScaling::ScalingPolicy
  Properties:
    PolicyType: TargetTrackingScaling
    TargetTrackingScalingPolicyConfiguration:
      TargetValue: 60.0
      PredefinedMetricSpecification:
        PredefinedMetricType: ECSServiceAverageCPUUtilization
      ScaleInCooldown: 300
      ScaleOutCooldown: 60
The 60% CPU target is a good starting point. We’ve seen teams set it to 80% thinking they’ll save money - then wonder why their p99 latency spikes during scale-out because there’s no headroom left.
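To make the headroom argument concrete, here is a minimal sketch of the proportional arithmetic behind target tracking. It is an approximation for illustration, not AWS's actual algorithm, and the function name `target_tracking_estimate` is hypothetical.

```python
import math


def target_tracking_estimate(current_tasks: int, current_metric: float, target: float) -> int:
    """Approximate task count that brings the average metric back to the target.

    Assumes load spreads evenly across tasks, so the average metric scales
    inversely with the number of tasks.
    """
    if current_tasks <= 0 or target <= 0:
        raise ValueError("current_tasks and target must be positive")
    return max(1, math.ceil(current_tasks * current_metric / target))


# 10 tasks running at 90% average CPU during a spike
print(target_tracking_estimate(10, 90.0, 60.0))  # 15 tasks with a 60% target
print(target_tracking_estimate(10, 90.0, 80.0))  # 12 tasks with an 80% target
```

With an 80% target the service both starts closer to saturation and asks for fewer extra tasks, which is where the latency spike during scale-out comes from.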
When to use it: API services, web backends, anything where CPU or request count correlates directly with load.
When it fails: Batch processors, queue consumers, services where CPU doesn’t reflect actual workload. A service consuming SQS messages might sit at 10% CPU while a queue of 50,000 messages builds up.
With step scaling, you define specific thresholds and how many tasks to add or remove at each level. More work to configure, but you get precise control.
StepScalingPolicy:
  Type: AWS::ApplicationAutoScaling::ScalingPolicy
  Properties:
    PolicyType: StepScaling
    StepScalingPolicyConfiguration:
      AdjustmentType: ChangeInCapacity
      StepAdjustments:
        - MetricIntervalLowerBound: 0
          MetricIntervalUpperBound: 1000
          ScalingAdjustment: 2
        - MetricIntervalLowerBound: 1000
          MetricIntervalUpperBound: 5000
          ScalingAdjustment: 5
        - MetricIntervalLowerBound: 5000
          ScalingAdjustment: 10
      Cooldown: 60
The config above says: if the metric crosses the alarm threshold by 0-1000, add 2 tasks. By 1000-5000, add 5. Over 5000, add 10. This lets you react proportionally to different levels of load.
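The lookup that the step configuration describes can be sketched in a few lines of Python. This is only an illustration of the interval logic: the alarm threshold of 2000 used in the examples is an assumed value, and `step_adjustment` is a hypothetical helper, not an AWS API.

```python
# (lower_bound, upper_bound, tasks_to_add); bounds are offsets from the alarm
# threshold, mirroring the StepAdjustments in the CloudFormation config.
STEP_ADJUSTMENTS = [
    (0, 1000, 2),
    (1000, 5000, 5),
    (5000, None, 10),  # no upper bound: applies to any larger breach
]


def step_adjustment(metric_value: float, alarm_threshold: float) -> int:
    """Return how many tasks to add for a given metric value."""
    breach = metric_value - alarm_threshold
    for lower, upper, adjustment in STEP_ADJUSTMENTS:
        # Lower bound is inclusive, upper bound is exclusive.
        if breach >= lower and (upper is None or breach < upper):
            return adjustment
    return 0  # metric is below the alarm threshold, nothing to add


# Assumed alarm threshold of 2000 (for example, queue depth)
print(step_adjustment(2500, 2000))  # 2
print(step_adjustment(4000, 2000))  # 5
print(step_adjustment(9000, 2000))  # 10
```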
When to use it: When you know your scaling curve and want fine-grained control. Works well for workloads with predictable patterns where you’ve measured the relationship between metric and capacity.
When it fails: When you don’t know the right thresholds yet. Getting step scaling wrong means either over-provisioning or being too slow to react.
Neither target tracking nor step scaling works well for queue consumers. The problem is fundamental: CloudWatch CPU metrics tell you how busy your current tasks are, not how much work is waiting.
A service with 5 tasks at 20% CPU and 100,000 messages in the queue needs to scale up aggressively. But target tracking sees 20% CPU and thinks everything is fine.
The fix is scaling based on queue depth directly:
import math

# Calculate desired tasks based on queue depth
min_tasks, max_tasks = 2, 50  # example bounds; derive yours from real capacity limits
messages_in_queue = 45000
messages_per_task = 1000  # each task processes ~1000 msg/min
desired_tasks = math.ceil(messages_in_queue / messages_per_task)
desired_tasks = max(min_tasks, min(desired_tasks, max_tasks))  # clamp to [min, max]
This is where a Lambda-based scaling approach pays off. A small function that reads queue depth every minute and updates your ECS service’s desired count directly bypasses CloudWatch alarm-based autoscaling entirely, so queue consumers start scaling about a minute after a backlog appears instead of waiting on alarm evaluation periods. stepscale AI goes one step further: it learns the right messages-per-task ratio and min/max bounds from your historical workload, so you do not have to guess those values yourself.
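A Lambda-style scaler along these lines might look like the sketch below. It assumes an SQS queue and takes the SQS and ECS clients as parameters so the logic can be exercised without AWS access. The function names, default ratio, and bounds are illustrative assumptions; the two client calls used are boto3's `get_queue_attributes` and `update_service`.

```python
import math


def desired_task_count(messages_in_queue, messages_per_task, min_tasks, max_tasks):
    """Queue depth divided by per-task throughput, clamped to the allowed range."""
    wanted = math.ceil(messages_in_queue / messages_per_task)
    return max(min_tasks, min(wanted, max_tasks))


def scale_queue_consumer(sqs, ecs, queue_url, cluster, service,
                         messages_per_task=1000, min_tasks=2, max_tasks=50):
    """Read the SQS backlog and set the ECS service's desired count directly."""
    attrs = sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["ApproximateNumberOfMessages"],
    )
    # SQS returns attribute values as strings.
    backlog = int(attrs["Attributes"]["ApproximateNumberOfMessages"])
    desired = desired_task_count(backlog, messages_per_task, min_tasks, max_tasks)
    ecs.update_service(cluster=cluster, service=service, desiredCount=desired)
    return desired
```

In a real Lambda handler you would pass `boto3.client("sqs")` and `boto3.client("ecs")` and trigger the function on a one-minute schedule. Injecting the clients is a design choice that keeps the scaling arithmetic testable with simple fakes.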
When to use it: Any service consuming from SQS, Kafka, RabbitMQ, Redis queues, or Kinesis streams. Also useful for services processing S3 event notifications.
Cooldowns prevent your service from scaling up and down repeatedly (thrashing). AWS defaults are often too conservative.
Here’s what we’ve found works:
| Scenario | Scale-out cooldown | Scale-in cooldown |
|---|---|---|
| API service (target tracking) | 60s | 300s |
| Queue consumer (step scaling) | 30s | 120s |
| Batch processor | 60s | 600s |
The pattern: scale out fast, scale in slow. When load hits, you want tasks up quickly. When load drops, wait longer to confirm it’s actually gone before removing capacity.
A common mistake: setting scale-in cooldown to 60 seconds. Traffic drops briefly during a lull, tasks get removed, then traffic comes back and you’re scrambling to scale up again. Use at least 300 seconds for scale-in on any request-driven production service.
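The "scale out fast, scale in slow" rule can be pictured as an asymmetric gate on scaling decisions. The sketch below is a simplified model for reasoning about the numbers, not AWS's exact cooldown semantics, and `allowed_task_count` is a hypothetical helper.

```python
SCALE_OUT_COOLDOWN = 60   # seconds, API service values from the table
SCALE_IN_COOLDOWN = 300


def allowed_task_count(current: int, desired: int, seconds_since_last_action: float) -> int:
    """Apply a desired count only if the relevant cooldown has elapsed."""
    if desired > current and seconds_since_last_action >= SCALE_OUT_COOLDOWN:
        return desired
    if desired < current and seconds_since_last_action >= SCALE_IN_COOLDOWN:
        return desired
    return current  # still cooling down, or no change requested


# 90 seconds after the last scaling action:
print(allowed_task_count(10, 14, 90))   # 14: scale-out is already allowed
print(allowed_task_count(10, 6, 90))    # 10: scale-in must keep waiting
print(allowed_task_count(10, 6, 400))   # 6: scale-in allowed after 300s
```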
Setting minCapacity and maxCapacity wrong causes the most common autoscaling failures.
Min tasks too low: Setting min to 1 means a cold start on every traffic spike. If your service takes 45 seconds to start and register with the load balancer, you’ll have a full minute of degraded service on every scale-out from 1.
For production services, calculate your minimum based on:
If your quietest hour needs 3 tasks at full load, set min to 2.
Max tasks too low: We’ve seen teams set max to 20 “to control costs” and then eat a full outage when a marketing campaign drives 10x normal traffic. Your max should be your absolute ceiling based on what your VPC, database connections, and downstream services can handle - not a cost control measure. Use billing alerts for cost control instead.
Max tasks too high: Less common, but setting max to 1000 when your RDS instance can only handle 200 connections means autoscaling could bring down your database. Know your downstream limits.
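One way to derive a ceiling from a downstream limit is to divide the database's connection budget by the pool size of each task. The sketch below assumes every task opens a fixed number of connections and that some connections are reserved for migrations and admin access. The function name and all of the numbers are illustrative.

```python
def max_tasks_for_database(db_max_connections: int,
                           connections_per_task: int,
                           reserved_connections: int = 0) -> int:
    """Largest task count that keeps total connections within the database limit."""
    if connections_per_task <= 0:
        raise ValueError("connections_per_task must be positive")
    usable = db_max_connections - reserved_connections
    return max(0, usable // connections_per_task)


# RDS instance with 200 connections, 20 reserved, each task pooling 10 connections
print(max_tasks_for_database(200, 10, reserved_connections=20))  # 18
```

Repeat the same calculation for every shared dependency and use the smallest result as maxCapacity.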
ECS autoscaling has a built-in delay that most teams underestimate:
Total: 45 seconds to 4+ minutes from decision to serving traffic.
This means your scaling needs to be predictive, not reactive. By the time your new tasks are serving traffic, the spike might already be over.
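A rough way to see why the lag matters is to project load forward to the moment new tasks actually start serving. The sketch below assumes compound per-minute growth during the lag and a known per-task throughput. Both are simplifying assumptions, and every number is illustrative.

```python
import math


def tasks_needed_at_arrival(current_rps: float, growth_per_minute: float,
                            lag_seconds: float, rps_per_task: float) -> int:
    """Tasks required once new capacity is live, given growth during the lag."""
    projected_rps = current_rps * (1 + growth_per_minute) ** (lag_seconds / 60)
    return math.ceil(projected_rps / rps_per_task)


# 1000 req/s now, growing 20% per minute, each task handles ~100 req/s
print(tasks_needed_at_arrival(1000, 0.20, 0, 100))    # 10 if capacity were instant
print(tasks_needed_at_arrival(1000, 0.20, 240, 100))  # 21 after a 4-minute lag
```

Sizing for the load at decision time leaves you about half as many tasks as you need by the time they arrive in this example.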
Fixes:
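# Application Auto Scaling scheduled actions are declared on the scalable target
ScalableTarget:
  Type: AWS::ApplicationAutoScaling::ScalableTarget
  Properties:
    # Other required properties (ServiceNamespace, ResourceId, ScalableDimension,
    # MinCapacity, MaxCapacity) omitted for brevity.
    ScheduledActions:
      - ScheduledActionName: morning-warmup
        Schedule: "cron(50 8 ? * MON-FRI *)"  # 08:50 UTC on weekdays unless Timezone is set
        ScalableTargetAction:
          MinCapacity: 10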
A single metric is rarely enough. Your API service might need to scale on both CPU and request count:
ECS lets you attach multiple scaling policies to one service. The policies operate independently - whichever one calls for the most tasks wins. This is the correct behavior: if either metric says you need more capacity, you should get more capacity.
Don’t try to build a single composite metric by averaging CPU and request count. It dilutes both signals.
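The difference between "largest policy wins" and an averaged composite metric shows up quickly with numbers. The sketch below reuses an approximate proportional formula for each policy. The readings, targets, and function names are illustrative assumptions rather than AWS internals.

```python
import math


def desired_from_policies(current_tasks: int, readings: list) -> int:
    """Each policy proposes a count independently; the largest proposal wins."""
    return max(math.ceil(current_tasks * value / target) for value, target in readings)


def desired_from_averaged_signal(current_tasks: int, readings: list) -> int:
    """Anti-pattern: average the normalized metrics into one composite signal."""
    average_ratio = sum(value / target for value, target in readings) / len(readings)
    return math.ceil(current_tasks * average_ratio)


# 10 tasks: CPU at 30% (target 60%), 1800 requests per task (target 1000)
readings = [(30.0, 60.0), (1800.0, 1000.0)]
print(desired_from_policies(10, readings))         # 18: request count drives scale-out
print(desired_from_averaged_signal(10, readings))  # 12: idle CPU dilutes the signal
```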
You can’t improve what you don’t measure. Track these:
CloudWatch dashboards work for this, but stepscale AI takes it further by analyzing your scaling patterns over time and automatically tuning your thresholds, cooldowns, and min/max values based on actual workload data. Instead of manually adjusting these numbers, the AI learns your traffic patterns and optimizes the configuration for you.
1. Using CPU scaling for queue consumers. Already covered this, but it’s the #1 mistake. CPU tells you how busy tasks are, not how much work is waiting.
2. Not testing autoscaling before production. Run a load test that simulates your actual traffic pattern. Steady ramp-up, sudden spike, gradual decline. Watch how your scaling responds.
3. Ignoring downstream limits. Your ECS service might scale to 100 tasks, but if your RDS instance only handles 50 connections and each task opens 2, you’ve just created a database outage. Always check: database connections, API rate limits, NAT gateway bandwidth, and any shared resources.
4. Setting identical scale-out and scale-in thresholds. If you scale out at 60% CPU and scale in at 59% CPU, you’ll thrash endlessly. Create a gap: scale out at 70%, scale in at 40%.
5. Forgetting about Fargate Spot termination. If you use Fargate Spot for cost savings, your tasks can be interrupted with only a two-minute warning. Your min capacity should use regular Fargate, with Spot only for additional capacity. Mix capacity providers:
CapacityProviderStrategy:
  - CapacityProvider: FARGATE
    Base: 3
    Weight: 1
  - CapacityProvider: FARGATE_SPOT
    Weight: 3
This keeps the first 3 tasks on regular Fargate (stable base). Beyond that base, scale-out places three Spot tasks for every additional regular Fargate task.
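To see what that strategy means for a given desired count, the split can be approximated as below. This is a sketch of the base-plus-weights idea only. How ECS rounds uneven splits is not modeled here, and sending leftover tasks to regular Fargate is an assumption made for the illustration.

```python
def split_tasks(desired: int, base: int = 3,
                on_demand_weight: int = 1, spot_weight: int = 3) -> dict:
    """Approximate how tasks divide between FARGATE and FARGATE_SPOT."""
    on_demand = min(desired, base)  # the base is filled first on regular Fargate
    remaining = desired - on_demand
    total_weight = on_demand_weight + spot_weight
    spot = remaining * spot_weight // total_weight  # Spot share, rounded down
    on_demand += remaining - spot                   # leftovers go to regular Fargate
    return {"FARGATE": on_demand, "FARGATE_SPOT": spot}


print(split_tasks(2))   # {'FARGATE': 2, 'FARGATE_SPOT': 0}
print(split_tasks(11))  # {'FARGATE': 5, 'FARGATE_SPOT': 6}
```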