How Razorpay scaled their Notification Service
Learn how Razorpay upgraded their Notification service to handle more traffic.
Here’s the original article from Razorpay.
Old Architecture
Steps
The API sends notification requests to an SQS queue.
Workers read these requests from the queue and send the notifications.
The workers also write the execution results directly to the database and the data lake.
A scheduler runs periodically and reads the requests that need to be retried from the database.
It then pushes them back onto the same SQS queue for reprocessing. (A minimal sketch of this worker loop follows the list.)
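For illustration, here's a rough sketch of what such a worker loop could look like, assuming a Python worker using boto3 and the requests library. The queue URL, request fields, and the send_notification/store_result helpers are hypothetical placeholders, not Razorpay's actual code.

```python
import json

import boto3
import requests

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/notifications"  # placeholder


def send_notification(request: dict) -> dict:
    """Deliver the notification (e.g. a webhook POST) and return the outcome."""
    resp = requests.post(request["callback_url"], json=request["payload"], timeout=30)
    return {"request_id": request["id"], "status_code": resp.status_code}


def store_result(result: dict) -> None:
    """Old design: the worker writes directly to the database and the data lake."""
    # db.insert(result); datalake.write(result)  -- the direct writes that later became a bottleneck
    pass


def worker_loop() -> None:
    while True:
        # Long-poll the single shared queue for up to 10 messages at a time.
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20
        )
        for msg in resp.get("Messages", []):
            request = json.loads(msg["Body"])
            result = send_notification(request)  # synchronous: the worker waits here
            store_result(result)                 # direct DB / data-lake write
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```

Note how every message, whatever its type or customer, flows through the same queue and the same synchronous loop. That is exactly what the challenges below grow out of.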
Challenges
Prioritising requests. With a single queue for all request types, there was no way to prioritise one type over another.
Bottleneck at the database. As traffic grows, inserts into the database grow with it, but the database has a limited number of connections and doesn't scale well. Rising insert volume also drives up RAM usage, which degrades read performance.
Handling unexpected customer server response times. If a customer's server takes, say, 5 minutes to respond, the worker executing that request is stuck waiting and can't pick up anything else until it finishes, which increases processing time for every other request. (The sketch below illustrates this blocking effect.)
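A tiny, self-contained illustration of that blocking effect, with timings scaled down to seconds and two hypothetical worker slots:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def deliver(latency_s: float) -> float:
    """Stand-in for a webhook call; just sleeps for the customer's response time."""
    time.sleep(latency_s)
    return latency_s


# Two worker slots, ten fast requests (0.1 s each) and one slow customer
# (5 s here, standing in for the 5-minute case described above).
latencies = [5.0] + [0.1] * 10

start = time.time()
with ThreadPoolExecutor(max_workers=2) as pool:
    list(pool.map(deliver, latencies))
print(f"total wall time: {time.time() - start:.1f}s")
# ~5 s: one slot is pinned to the slow request the whole time, so overall
# completion is dominated by that single customer even though the other ten
# requests need only about a second of combined work.
```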
New Architecture
Steps
The API sends notification requests to the Rate Limiter.
The Rate Limiter prioritises requests based on request type and customer, and routes them to the P0 and P1 queues. It also applies rate limits per request type and diverts rate-limited requests to a separate Rate Limit Queue, so that the other request types don't slow down. (A sketch of this routing logic follows the list.)
Workers continuously read from all the queues and send the notifications.
Instead of inserting directly into the database and data lake, the workers push the execution results to Kinesis.
The execution results are then read from Kinesis and inserted into the database and data lake.
A scheduler runs periodically and reads the requests that need to be retried from the database.
Instead of pushing them back to the main SQS queue, it pushes them to a dedicated Retry SQS queue for reprocessing.
Separate workers execute these retry requests and push the results to Kinesis to be stored further down the line.
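Here's a minimal sketch of how such a routing layer might work, assuming a simple in-memory token bucket per request type. The queue URLs, request fields, and limits are illustrative assumptions, not Razorpay's implementation.

```python
import json
import time

import boto3

sqs = boto3.client("sqs")

# Placeholder queue URLs for the P0, P1 and Rate Limit queues.
QUEUES = {
    "P0": "https://sqs.us-east-1.amazonaws.com/123456789012/notifications-p0",
    "P1": "https://sqs.us-east-1.amazonaws.com/123456789012/notifications-p1",
    "RATE_LIMITED": "https://sqs.us-east-1.amazonaws.com/123456789012/notifications-rate-limited",
}

# Hypothetical per-request-type limits (requests per second).
RATE_LIMITS = {"webhook": 100, "sms": 50, "email": 50}
_buckets: dict[str, tuple[float, float]] = {}  # request type -> (tokens, last refill time)


def _allow(request_type: str) -> bool:
    """Token bucket: refill continuously, spend one token per request."""
    limit = RATE_LIMITS.get(request_type, 50)
    tokens, last = _buckets.get(request_type, (float(limit), time.time()))
    now = time.time()
    tokens = min(limit, tokens + (now - last) * limit)
    if tokens >= 1:
        _buckets[request_type] = (tokens - 1, now)
        return True
    _buckets[request_type] = (tokens, now)
    return False


def route(request: dict) -> str:
    """Pick a queue: rate-limited traffic is diverted, the rest is prioritised."""
    if not _allow(request["type"]):
        queue = "RATE_LIMITED"  # diverted so other request types aren't slowed down
    elif request.get("priority") == "high":
        queue = "P0"
    else:
        queue = "P1"
    sqs.send_message(QueueUrl=QUEUES[queue], MessageBody=json.dumps(request))
    return queue
```

In a real deployment the token counters would live in a shared store such as Redis so that multiple rate-limiter instances agree on the limits; the in-memory dict here is just to keep the sketch self-contained.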
Solutions
Adding multiple queues, prioritising requests, and applying rate limits solved the first challenge of prioritising requests.
Since workers no longer insert directly into the database, a fixed number of consumer workers/connections handle the inserts, keeping both the connection count and the inserts/sec constant and removing the database bottleneck. (A sketch of this async write path follows the list.)
To handle unexpected customer server response times and retries, requests are no longer pushed back onto the main queue; they are handled separately with different logic, such as deprioritising that customer's events.
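To make the second point concrete, here's a sketch of what the async write path could look like: workers publish results to a Kinesis stream, and a small, fixed pool of consumers drains the stream in batches over a constant number of database connections. The stream name, table, and columns are assumptions made up for illustration.

```python
import json

import boto3

kinesis = boto3.client("kinesis")
STREAM_NAME = "notification-results"  # placeholder stream name


def publish_result(result: dict) -> None:
    """Worker side: push the execution result to Kinesis instead of the database."""
    kinesis.put_record(
        StreamName=STREAM_NAME,
        Data=json.dumps(result).encode("utf-8"),
        PartitionKey=result["request_id"],  # keeps one request's records on one shard
    )


def drain_once(shard_iterator: str, db_conn) -> str:
    """Consumer side: read a batch from the stream and insert it over one connection."""
    resp = kinesis.get_records(ShardIterator=shard_iterator, Limit=500)
    rows = [json.loads(r["Data"]) for r in resp["Records"]]
    if rows:
        with db_conn.cursor() as cur:
            cur.executemany(
                "INSERT INTO notification_results (request_id, status_code) "
                "VALUES (%(request_id)s, %(status_code)s)",
                rows,
            )
        db_conn.commit()
    return resp["NextShardIterator"]  # resume from here on the next poll
```

Because only the consumers talk to the database, the number of open connections stays the same no matter how much notification traffic the workers handle; Kinesis absorbs the spikes.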
In Conclusion…
Using async writes, along with deprioritising and rate limiting requests, Razorpay solved these challenges and scaled their Notification System.
Let me know if you have a different solution in mind.
That’s it! Please share your views and comment below for any clarifications, or ask them directly on their article 😄
If you found value in reading this, please consider sharing it with your friends and also on social media 🙏
Also, to be notified about my upcoming articles, subscribe to my newsletter (I’ll not spam you 😂)