14 Jul, 2023
You want more resources? Here, pay us more money and you can have some! This makes the lives of many technology teams way easier, as handling the ups and downs of customer demand can all be managed automatically. Working as both a cloud engineer and a developer, one thing I’ve had to do numerous times is look at autoscaling policies for applications. The best cases are when these applications are running as containers (luckily most of them are these days). Today I wanted to look at some of the default scaling for containers in ECS (Elastic Container Service), and compare this with Kubernetes.
Why am I writing about this? Chances are you won’t have a choice in the platform your containers are running on (and even if you do, autoscaling will likely not be a deciding factor). But after working on both platforms, I realised that the way autoscaling is approached and managed on each is a little different, even though the concepts are the same. To help build your understanding, and clarify any sticking points when getting started on a different platform, I’ve tried to compare how each platform implements scaling and look at any key differences.
For both of these scenarios, we are going to ignore how the underlying infrastructure that the containers run on scales, and focus on the application scaling. We are also going to be focussing on the default provided metrics for scaling (CPU + Memory). And before we get started, a couple of naming things:
The ‘application’ deployment is a service in ECS, and a deployment in Kubernetes
The ‘containers’ being run for your application are called tasks in ECS and pods in Kubernetes
ECS uses AWS’s Application Auto Scaling to scale the desired count of tasks in an ECS service. Tasks running in the service send data on CPU/Memory to AWS CloudWatch, which is what is used for autoscaling.
Here is some sample CloudFormation code for deploying a scaling policy for an ECS service:
RoleARN: !GetAtt AutoScalingRole.Arn
ScalingTargetId: !Ref ScalableTarget
Looking at the code example above, the first thing we do is define our ECS service as a scalable target (because AWS’s Application Auto Scaling is used for many different AWS services, this is how CloudFormation handles specifying that we will be scaling ECS). We can then define a scaling policy for the scalable target we have just defined.
PolicyType - this defines the type of scaling that will be used. There are 3 scaling options for ECS:
Target tracking scaling - Increase/decrease the number of tasks that your service is running based on a target value. This is what we’ll use for our examples. It’s good if you have fairly basic scaling requirements, and just want to get started.
Step scaling - Increase/decrease tasks based on a set of scaling adjustments that vary with the size of the alarm breach. This is useful if you want more control over your scaling than target tracking gives you, as you can specify both the thresholds and how many tasks to add or remove at each one.
Scheduled scaling - scaling based on day/time. This can be used to scale ahead of any regular spikes your application has (e.g. if Friday afternoons are always busy you might want to pre-emptively scale during this period), and is also useful for scaling down pre-prod applications after hours to save money. Note that in CloudFormation this is configured as scheduled actions on the scalable target rather than through PolicyType.
PredefinedMetricSpecification - this is where you set which metric is being used. There are a couple of defaults: ECSServiceAverageCPUUtilization and ECSServiceAverageMemoryUtilization
This calculates, as a %, how much CPU/memory you are using against what you defined in your task definition
If you want to scale based on memory as well, you would need another, separate policy, as each target tracking scaling policy can only scale on one metric. ECS is fairly considerate if you have multiple scaling metrics, i.e. it will scale out if any of the target tracking policies are ready to scale out. It will only scale in, though, if all policies are ready to scale in.
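The scale-out/scale-in rule for multiple target tracking policies can be sketched in a few lines of Python. This is only an illustration of the rule described above, not AWS code: the `Decision` enum and the `combine_policy_decisions` function are hypothetical names made up for this example.

```python
from enum import Enum


class Decision(Enum):
    """What a single target tracking policy currently wants to do (hypothetical type)."""
    SCALE_OUT = "scale_out"
    SCALE_IN = "scale_in"
    HOLD = "hold"


def combine_policy_decisions(decisions):
    """Combine per-policy decisions: scale out if ANY policy wants to,
    scale in only if ALL policies want to, otherwise do nothing."""
    if not decisions:
        return Decision.HOLD
    if any(d is Decision.SCALE_OUT for d in decisions):
        return Decision.SCALE_OUT
    if all(d is Decision.SCALE_IN for d in decisions):
        return Decision.SCALE_IN
    return Decision.HOLD


# CPU is above its target but memory is well below: the service still scales out.
cpu_policy = Decision.SCALE_OUT
memory_policy = Decision.SCALE_IN
print(combine_policy_decisions([cpu_policy, memory_policy]))
```

The asymmetry is deliberate: adding tasks too eagerly costs a little money, while removing tasks too eagerly can starve the application of whichever resource is still under pressure.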
Now we’ve defined all the resources we need, let’s clarify the final autoscaling process that occurs.
The steps in the diagram are as follows:
Each task sends the amount of CPU and memory it is using to ECS, and ECS calculates the utilization metrics and sends these to CloudWatch
If the metrics are above/below the percentage specified in your autoscaling policy (e.g. 75), then a CloudWatch alarm is triggered
When the CloudWatch alarm is triggered, Application Auto Scaling updates the desired count of tasks on your ECS service to match the change in demand
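As a rough sketch of the first two steps, here is how the service-level utilization figure and the comparison against the target could be expressed in Python. The function names and the sample numbers are assumptions for illustration; the real alarm evaluation in CloudWatch also looks at several data points over time, which is left out here.

```python
def service_average_utilization(used_per_task, reserved_per_task):
    """Utilization (%) for the whole service: total usage across tasks divided by
    the total amount reserved for those tasks in the task definition."""
    if not used_per_task:
        raise ValueError("need at least one running task")
    total_used = sum(used_per_task)
    total_reserved = reserved_per_task * len(used_per_task)
    return 100.0 * total_used / total_reserved


def breaches_target(utilization_pct, target_pct):
    """True when utilization is above the target value set in the scaling policy."""
    return utilization_pct > target_pct


# Three tasks, each reserving 256 CPU units in the task definition (made-up numbers).
cpu_units_used = [200, 230, 250]
utilization = service_average_utilization(cpu_units_used, reserved_per_task=256)

print(round(utilization, 1))
print(breaches_target(utilization, target_pct=75))
```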
In Kubernetes, resources are managed slightly differently to ECS. Pod resource data (i.e. CPU and memory) is collected by Kubernetes itself, through the kubelet running on each node. This information is typically sent to a metrics server, where resource usage data is available for each pod.
To set up autoscaling, the next step is to add a Horizontal Pod Autoscaler (HPA) to your deployment. This is what the resource definition looks like:
- type: Resource
- type: Resource
The default/easiest to get up and running is using either the Utilization or AverageValue target types for CPU/Memory:
Utilization: This is a % utilisation of the pod resources. You can specify pod resource usage 2 ways in Kubernetes
Requests: This is the normal expected usage, and what is used in the % usage calculation for both CPU and Memory metrics (this amount of resources is basically reserved for the pod)
Limits: This is the maximum the pod can scale up to use, if there are enough resources on the cluster. More info can be found on requests/limits here.
Average value: This is the raw resource usage (e.g. CPU in millicores or memory in bytes) averaged across all pods, rather than a percentage of the requested amount
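To make the difference between the two target types concrete, here is a small Python sketch. The function names and pod numbers are made up for illustration, and it assumes every pod has the same CPU request.

```python
def utilization_pct(usage_per_pod, request_per_pod):
    """Utilization target type: total usage as a % of the total *requested* amount."""
    if not usage_per_pod:
        raise ValueError("need at least one pod")
    return 100.0 * sum(usage_per_pod) / (request_per_pod * len(usage_per_pod))


def average_value(usage_per_pod):
    """AverageValue target type: raw usage averaged across pods, no requests involved."""
    if not usage_per_pod:
        raise ValueError("need at least one pod")
    return sum(usage_per_pod) / len(usage_per_pod)


# Three pods, each requesting 250 millicores of CPU (hypothetical numbers).
cpu_millicores_used = [150, 250, 200]

print(utilization_pct(cpu_millicores_used, request_per_pod=250))  # 80.0 (percent)
print(average_value(cpu_millicores_used))                         # 200.0 (millicores)
```

Note that limits play no part in either calculation; only requests affect the Utilization figure.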
The HPA looks at the metrics from the pods and checks whether we need to scale our pods up or down. The process the HPA undergoes is:
The steps from the diagram in more detail are:
The Kubernetes metrics server continuously fetches pod CPU/memory usage data and makes it available
HPA checks the metrics server for resource usage and, based on the pod data, calculates the number of pods that are required
HPA then scales up/down your desired number of replicas to match what it has analysed from pod data
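The calculation in the second step can be sketched in Python. It follows the formula given in the Kubernetes HPA documentation (desired replicas = ceil(current replicas × current metric / target metric)), with a tolerance around the target that defaults to 0.1. The function name is hypothetical, and the sketch leaves out details such as stabilisation windows and pods that are not yet ready.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas, max_replicas, tolerance=0.1):
    """Simplified version of the HPA replica calculation."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: leave the replica count alone.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Never go outside the minReplicas/maxReplicas bounds set on the HPA.
    return max(min_replicas, min(max_replicas, desired))


# 4 pods averaging 90% CPU utilisation against a 60% target.
print(desired_replicas(4, current_metric=90, target_metric=60,
                       min_replicas=2, max_replicas=10))  # 6
```

The same function scales down as well: with the pods averaging 30% against a 60% target, the ratio is 0.5 and the result is 2 replicas.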
The key thing I wanted to call out is that while the end product can be the same (scaling containers based on their CPU/memory utilisation), the process for setting this up and monitoring it across platforms is quite different.
Because AWS uses the generic Application Auto Scaling service, you typically have to do a bit more setup as code (i.e. creating the scalable target)
Like a lot of AWS services, if you do this via clickops in the console a lot of this complexity is automatically handled for you
Having access to visualisations of your scaling data (i.e. your CPU/Memory usage, and your scaling target value) makes it easier to understand how your scaling is working
There are some nice defaults included in the ECS AWS console, e.g. CPU and Memory utilisation, that allow you to really easily understand scaling activities.
In Kubernetes the data is there but isn’t as easily exposed, as by default it is only available through the metrics server (most organisations will have this data easily displayable through a dashboarding tool, but this requires a little more setup)
For ECS, the calculations to decide how many containers to scale up/down to meet your target (and how quickly) aren’t clearly defined/exposed, whereas the Kubernetes documentation does describe the algorithm the HPA uses. In practice this doesn’t really matter though, as all we care about is making sure our target metrics are being maintained on our applications.
We only scratched the surface here of autoscaling in both Kubernetes and ECS. If you want to further improve your application autoscaling there is a lot more information and documentation online about using custom metrics, or different scaling policies, based on your application’s needs.
Hopefully this has improved your understanding of how scaling works under the covers, and where to look the next time you want to make your applications more resilient to customer demand!