System back-pressure

What is back-pressure?

Before diving into the details of this post, let's clarify what "back-pressure" means in this context. Back-pressure occurs when the rate of incoming data surpasses the system's ability to process it. In computing, back-pressure is often the result of a bottleneck: a slower or limited resource, such as a database, can't keep up with the rate of incoming requests and causes disruptions in other parts of the system, e.g. the HTTP layer.
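
As a rough illustration, here is a minimal, hypothetical sketch of that imbalance: a producer emitting items faster than a consumer can process them, with a small bounded buffer in between. The buffer size and the rates are made up for the example.

```python
import queue
import threading
import time

# A small bounded buffer between a fast producer and a slow consumer.
buffer = queue.Queue(maxsize=5)

def consumer():
    while True:
        item = buffer.get()
        time.sleep(0.1)  # pretend each item takes 100 ms to process
        buffer.task_done()

threading.Thread(target=consumer, daemon=True).start()

rejected = 0
for i in range(100):          # producer emits an item every 10 ms
    try:
        buffer.put_nowait(i)  # fails once the buffer is full: back-pressure
    except queue.Full:
        rejected += 1
    time.sleep(0.01)

print(f"rejected {rejected} items because the consumer couldn't keep up")
```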

Back-pressure effects

When back-pressure occurs in your database due to a high volume of incoming write requests, queries and transactions start to queue up. The database struggles to handle these tasks at the rate they arrive. This backlog impacts upstream components, specifically the HTTP layer where requests are first received, setting off a chain reaction: the database slowdown leads to HTTP requests timing out or returning errors. Customer dissatisfaction and operational disruptions follow. The effects of this breakdown can extend to other parts of the system, compromising your application's reliability and performance.
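
A toy sketch of that chain reaction, assuming a hypothetical slow_db_write call and an arbitrary 500 ms request budget: once the database falls behind, the HTTP layer gives up waiting and the caller sees an error instead of a successful write.

```python
import asyncio

async def slow_db_write(payload):
    # stand-in for a database that has fallen behind and responds slowly
    await asyncio.sleep(2)
    return "ok"

async def handle_request(payload, timeout=0.5):
    # the HTTP layer won't wait forever: a slow database turns into
    # request timeouts and errors for the caller
    try:
        return await asyncio.wait_for(slow_db_write(payload), timeout)
    except asyncio.TimeoutError:
        return "503 Service Unavailable"

print(asyncio.run(handle_request({"user": 42})))  # -> 503 Service Unavailable
```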

Not just a database issue

While databases often come to mind as the usual suspects for causing back-pressure, the reality is that any component in a system, be it a message bus, a caching layer, or a REST API, can induce back-pressure. When this happens, it triggers a chain reaction that causes upstream components to break down.

Solutions

Following up on the database example, a common solution to database back-pressure is to provision more capacity for your database. However, this is not only a reactive strategy but also a financially costly one. A more effective and cost-efficient approach is to introduce a pressure release mechanism, such as a buffer or contention service, between the HTTP layer and the database. This mechanism queues incoming requests and feeds them to the database at a manageable rate, decoupling the immediate fate of incoming HTTP requests from the current state of the database and effectively breaking the chain reaction of failures. This strategy makes your system not only more reliable but also more financially sustainable: you optimize resource usage and stave off unnecessary scaling costs, which can be especially important in pay-as-you-go cloud setups.
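
A minimal sketch of the idea, not a production implementation: the handler acknowledges the request as soon as it lands in a bounded buffer, and a background worker drains the buffer at a rate the database can sustain. The names (write_buffer, http_handler, db_worker, write_to_database) and the 50 writes-per-second figure are illustrative assumptions.

```python
import queue
import threading
import time

write_buffer = queue.Queue(maxsize=1000)  # the pressure release valve

def http_handler(payload):
    # acknowledge immediately; the request's fate no longer depends on
    # the database keeping up in real time
    try:
        write_buffer.put_nowait(payload)
        return "202 Accepted"
    except queue.Full:
        return "429 Too Many Requests"  # shed load explicitly instead of failing downstream

def write_to_database(payload):
    pass  # placeholder for the real write

def db_worker(max_writes_per_second=50):
    # drain the buffer at a rate the database can sustain
    while True:
        payload = write_buffer.get()
        write_to_database(payload)
        time.sleep(1 / max_writes_per_second)

threading.Thread(target=db_worker, daemon=True).start()

print(http_handler({"user": 42}))  # -> "202 Accepted"
```

The key design choice is that the buffer, not the database, absorbs the burst, and overload is surfaced explicitly (a 429) rather than as cascading timeouts.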

Pull systems are undervalued

In a world that often emphasizes real-time data and immediate responses, pull systems can be perceived as less modern or efficient than their push-based counterparts. However, this perception undervalues the stability and control that pull systems offer. In a pull system, the consumer requests data at its own pace, "pulling" it from the producer when it is ready. This mechanism introduces a level of reliability often missing in push architectures and is especially valuable for mitigating back-pressure. Because the rate of incoming work is bounded by the system's processing ability, pending requests cannot pile up unchecked. Additionally, the controlled data flow allows for better resource management, reducing waste and costs, which is particularly important in cloud settings where resource utilization directly impacts financial outlays.
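
A sketch of the pull pattern, with placeholder functions: fetch_batch stands in for whatever "give me up to N pending messages" call your queue or log exposes, and process stands in for the real work. The point is that the consumer only asks for more when it has capacity.

```python
import time

def fetch_batch(max_items):
    # hypothetical call to a queue or log that returns at most
    # max_items pending messages (empty list if there is nothing to do)
    return []

def process(message):
    pass  # the real work

def pull_consumer(batch_size=10):
    # intake is bounded by the consumer's own processing speed,
    # not by how fast producers happen to be pushing
    while True:
        batch = fetch_batch(batch_size)
        if not batch:
            time.sleep(1)  # idle politely instead of burning resources
            continue
        for message in batch:
            process(message)
```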

Conclusion

In short, back-pressure disrupts systems when data floods in too quickly, often jamming up databases. It can affect any part of a system, not just databases. The fix is simple and often underestimated: buffers or pull systems that control data flow, prevent system crashes, and save on costs. Something to keep in mind when facing high-performance, high-cost scenarios.