Scaling Serverless Architecture using AWS and Lambda for Asynchronous Processing

With AWS Lambda you can develop serverless functions whose containers are maintained by AWS. As well as managing the resources allocated to the environment, AWS also scales the Lambda function to meet demand. For example, if you have an API Gateway whose endpoints trigger Lambda functions, AWS can spawn multiple containers running your function side by side to meet, let’s say, a sudden burst of hits against your endpoint.


Even though AWS manages the scaling of the Lambda function, you may still hit other choke points that AWS won’t manage for you. Keeping things simple, let’s say each hit against the API Gateway triggers some kind of asynchronous task that queries a small RDS deployment of MariaDB. Sudden bursts against the DB could cause issues for other functions that use the same DB. At this point you need to control the flow of queries so that they are triggered at a rate you know your current DB can handle.


To achieve this you will need to introduce a queue, which in AWS is provided by the Simple Queue Service, or SQS for short. There are two types of queue, standard and FIFO, but I prefer First In First Out (FIFO) queues because they guard against duplicates of the items you queue. The downside is that there is a rate limit on the number of items that can be queued per second. The general idea is that you have a function which queues work items as quickly as the queue allows. You then have one or more consumers reading from the queue, each triggered by a CloudWatch Event. The final part is a function that acts as the worker. This is the function the consumer offloads to, and it performs the actual asynchronous task.


Taking the API Gateway example above, you will need the following setup.

Create Your Queue

The first port of call is to create the queue you intend to use. I tend to prefer FIFO queues, but you could use a standard queue if you need high throughput and don’t care about the ordering of the items or about duplicate items being delivered to the consumer. I usually find myself in a situation where I may not care about the order but I certainly care about duplicates. I don’t want them, and I don’t want to build a mechanism to detect them, because that’s what the FIFO queue is for! When setting up the queue you will want to pay attention to the settings. The two important ones are the visibility timeout and the dead letter queue.


The visibility timeout is important if you have multiple consumers reading from the same queue. When a consumer dequeues items, they are put into an in-flight state until either the queue item is deleted or the visibility timeout expires. While an item is in flight, no other consumer can dequeue it, which prevents you from processing the same item twice. If the timeout expires because the queue item hasn’t been deleted, the item returns to the queue. This is likely to happen if the worker function hit an error, and it can leave the queue clogged with the same old items that fail every time you try to process them. This is where you could consider a dead letter queue, to which the queue items that cause errors can be moved and handled in a different manner.
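As a rough illustration of those two settings, here is a minimal Python sketch. It assumes a boto3-style SQS client is passed in as `sqs_client`; the function names, the 60 second timeout and the receive count of 5 are made-up values for illustration, not recommendations. It also assumes content-based deduplication is switched on so that SQS derives the deduplication ID from the message body.

```python
import json


def fifo_queue_attributes(dead_letter_queue_arn,
                          visibility_timeout_seconds=60,
                          max_receive_count=5):
    """Build the Attributes map for an SQS create_queue call (illustrative values)."""
    return {
        "FifoQueue": "true",
        # Assumption: let SQS derive the deduplication ID from the message body.
        "ContentBasedDeduplication": "true",
        # How long a dequeued item stays in flight before returning to the queue.
        "VisibilityTimeout": str(visibility_timeout_seconds),
        # After max_receive_count failed receives, SQS moves the item to the
        # dead letter queue (which must itself be a FIFO queue).
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": dead_letter_queue_arn,
            "maxReceiveCount": str(max_receive_count),
        }),
    }


def create_work_queue(sqs_client, name, dead_letter_queue_arn):
    """Create the FIFO work queue and return its URL."""
    if not name.endswith(".fifo"):
        raise ValueError("FIFO queue names must end with .fifo")
    response = sqs_client.create_queue(
        QueueName=name,
        Attributes=fifo_queue_attributes(dead_letter_queue_arn),
    )
    return response["QueueUrl"]
```

The visibility timeout should comfortably exceed the time a worker needs for one item, otherwise items can reappear on the queue while they are still being processed.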

Handle the hits against the API Gateway

When your API Gateway is hit, you need it to trigger a Lambda function that connects to the FIFO SQS queue and queues items. Depending on how your work items are defined, you should consider queuing items in batches and in parallel. The reason is that your function is a Lambda running in a container with a capped execution time, 300 seconds being the limit at the time of writing. You don’t want your queuing mechanism to run up to that limit because it is posting individual items to the queue one after another and waiting for each response. This is unnecessary and will cost you more under Lambda’s pay-as-you-go pricing. The only thing you need to make sure of when processing in parallel is that you do not exceed the queue limit, which as of writing is 300 transactions per second.
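Batch queuing in parallel could look something like the sketch below. It assumes a boto3-style SQS client is passed in as `sqs_client`, that the queue has content-based deduplication enabled (so no explicit deduplication ID is sent), and that work items are JSON-serialisable. The helper names, the `group_id` value and the `max_parallel` cap are hypothetical. Capping the number of threads only limits concurrency; it is not a strict rate limiter.

```python
import json
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 10  # SQS accepts at most 10 entries per send_message_batch call


def chunk(items, size=BATCH_SIZE):
    """Split the work items into lists of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def to_entries(batch, group_id):
    """Turn one batch of work items into send_message_batch entries."""
    return [
        {
            "Id": str(index),                # only needs to be unique within the batch
            "MessageBody": json.dumps(item),
            "MessageGroupId": group_id,      # required for FIFO queues
        }
        for index, item in enumerate(batch)
    ]


def queue_items(sqs_client, queue_url, items, group_id="work", max_parallel=4):
    """Queue all items in parallel batches and return how many entries failed."""
    def send(batch):
        response = sqs_client.send_message_batch(
            QueueUrl=queue_url,
            Entries=to_entries(batch, group_id),
        )
        return len(response.get("Failed", []))

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return sum(pool.map(send, chunk(items)))
```

Because each batch request carries up to 10 items, 25 work items cost three requests rather than 25, which keeps the function well inside its execution time.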

Create the Consumer

The consumer is another Lambda function whose job is simply to pull items from the queue and offload them to the worker function. The consumer should be triggered from a CloudWatch Event Rule. If you like, you can have the consumer triggered as often as every minute. The rate at which you trigger the consumer comes down to how long you are prepared to wait for its next invocation.
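A schedule like this can be set up in the console, but as a sketch it could also be done in code. The example below assumes a boto3-style CloudWatch Events client is passed in as `events_client`; the function names and the target ID are hypothetical. The consumer Lambda would also need a permission allowing CloudWatch Events to invoke it, which is not shown here.

```python
def rate_expression(minutes):
    """Build a schedule expression; the unit is singular only for a value of 1."""
    if minutes < 1:
        raise ValueError("the shortest supported rate is one minute")
    unit = "minute" if minutes == 1 else "minutes"
    return f"rate({minutes} {unit})"


def schedule_consumer(events_client, rule_name, consumer_arn, minutes=1):
    """Create (or update) a rule that triggers the consumer on a fixed schedule."""
    events_client.put_rule(
        Name=rule_name,
        ScheduleExpression=rate_expression(minutes),
        State="ENABLED",
    )
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "consumer", "Arn": consumer_arn}],  # "consumer" is an arbitrary ID
    )
```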


When dequeuing you can take up to 10 items from the queue in one request. Ideally you want to pull the maximum number of items off the queue and process them in parallel by invoking a worker function for each one. When the worker functions are complete, you need to check whether there is time to execute another batch. If not, you let the consumer invoke its completion callback. The key thing with the consumer is to make sure it finishes its processing before AWS terminates the container. If you don’t, you could end up with in-flight items that have only been half processed.
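The consumer loop described above might be sketched as follows. It assumes boto3-style SQS and Lambda clients are passed in, and that `context` is the Lambda context object, whose `get_remaining_time_in_millis()` method reports how long the container has left. The function names, the payload keys and the 30 second safety margin are assumptions made for illustration.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def consume(sqs_client, lambda_client, context, queue_url, worker_name,
            min_remaining_ms=30_000):
    """Pull batches off the queue until time runs short; return items handed off."""

    def invoke_worker(message):
        # Hypothetical payload shape: the worker needs the body to do the work
        # and the receipt handle to delete the item afterwards.
        payload = {"body": message["Body"], "receiptHandle": message["ReceiptHandle"]}
        lambda_client.invoke(
            FunctionName=worker_name,
            InvocationType="RequestResponse",  # wait for the worker to finish
            Payload=json.dumps(payload).encode("utf-8"),
        )

    processed = 0
    # Only start another batch if there is enough time left to finish it.
    while context.get_remaining_time_in_millis() > min_remaining_ms:
        response = sqs_client.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=10,  # the most SQS returns per request
        )
        messages = response.get("Messages", [])
        if not messages:
            break  # queue drained, nothing left to do
        with ThreadPoolExecutor(max_workers=len(messages)) as pool:
            list(pool.map(invoke_worker, messages))  # blocks until the batch is done
        processed += len(messages)
    return processed
```

The safety margin should be at least as long as the slowest worker run you expect, so that a batch started near the end of the consumer's life can still complete.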

Create the Worker

The worker is the Lambda function in which the grunt work is performed. Typically you would allocate a decent amount of memory to the worker to cater for the logic it needs to run. Aside from executing the asynchronous task, the worker also needs to handle deleting the queue item and, if required, moving the queue item to a dead letter queue. The reason is that only the worker knows whether it has successfully performed the asynchronous task, so it is in the best position to finally remove the item.
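A minimal sketch of that worker logic is shown below. It assumes a boto3-style SQS client is passed in, that the consumer sends a payload with `body` and `receiptHandle` keys (a made-up shape), and that the dead letter queue is a FIFO queue with content-based deduplication enabled. The `task` callable stands in for whatever the real asynchronous work is.

```python
def handle_work_item(sqs_client, queue_url, dead_letter_queue_url, message, task,
                     group_id="work"):
    """Run the task, then remove the item from the work queue.

    Returns True if the task succeeded, False if the item was moved to the
    dead letter queue instead.
    """
    try:
        task(message["body"])
        succeeded = True
    except Exception:
        # The task failed: park the item on the dead letter queue so it does
        # not return to the work queue and fail again.
        sqs_client.send_message(
            QueueUrl=dead_letter_queue_url,
            MessageBody=message["body"],
            MessageGroupId=group_id,  # required because the queue is FIFO
        )
        succeeded = False

    # Only the worker knows the outcome, so it is the one that deletes the item.
    sqs_client.delete_message(
        QueueUrl=queue_url,
        ReceiptHandle=message["receiptHandle"],
    )
    return succeeded
```

An alternative is to leave a failed item undeleted and rely on the queue's redrive policy, which moves it to the dead letter queue once it has been received more than the configured number of times.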

The Final Word

With this setup you are able to control the rate at which you process bursts of asynchronous operations, by representing each operation as a queue item that is consumed at a rate you know your supporting architecture, such as a DB, can handle. You also have the ability to scale your architecture to meet growing demand. If you monitor the queues, the number of queue items waiting to be processed will give you an indication that demand is increasing. As soon as you are happy that your architecture, such as the DB, can handle the load, another consumer can be thrown into the mix to help process queue items more quickly.
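One simple way to keep an eye on the backlog is to ask SQS for its approximate message counts. The sketch below assumes a boto3-style SQS client is passed in; the function name is made up for illustration, and the counts SQS reports are approximate by design.

```python
def queue_depth(sqs_client, queue_url):
    """Return (waiting, in_flight) approximate item counts for the queue."""
    response = sqs_client.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=[
            "ApproximateNumberOfMessages",            # waiting to be consumed
            "ApproximateNumberOfMessagesNotVisible",  # currently in flight
        ],
    )
    attributes = response["Attributes"]  # SQS returns the counts as strings
    return (
        int(attributes["ApproximateNumberOfMessages"]),
        int(attributes["ApproximateNumberOfMessagesNotVisible"]),
    )
```

A waiting count that keeps growing between consumer runs suggests the consumers are not keeping up with demand.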


Help on setting up the queue can be found here.

Help on getting started with the API Gateway and Lambda can be found here.
