Adaptive Rate Limiting


The “Why” and “What” in Plain English

At its core, our rate limiting system is a smart safety valve for our application. Its job is to prevent any single part of our service from becoming overloaded and slowing down or crashing the entire system.

Think of it like an automated traffic controller for a city.

  • The Goal: Prevent traffic jams to keep things moving smoothly for everyone.

  • The Method: The controller watches the speed of traffic on a specific road.

    • If cars are moving fast, it lets more cars onto that road (a green light).

    • If traffic starts to slow down, it reduces the number of cars it lets on (a yellow light) to ease congestion.

    • If there’s a total gridlock, it temporarily closes the on-ramp (a red light) to give the jam a chance to clear.

Our system does the exact same thing with user requests. It constantly monitors the performance (speed) of a feature and adjusts the allowed traffic (request rate) accordingly. This ensures that one struggling component doesn’t bring everything else down with it.

The Technical Deep Dive

This section breaks down the mechanics of how the system makes its decisions. The logic is based on two key metrics and a set of configurable rules.

Rate Limiting Rules

Our system uses two distinct rules for rate limiting, based on the type of content being requested. This allows us to apply different responses for users browsing the site versus background requests for assets.

  1. HTML Content Rule: This applies to standard web page requests. If rate-limited, it serves a user-friendly HTML error page.
  2. Non-HTML Content Rule: This applies to API calls (/api/) and assets such as JavaScript (.js), CSS (.css), images (.png), and JSON data (.json). If rate-limited, it returns an empty response with the appropriate error code.

The core logic for calculating the allowed rate is the same for both rules.
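The split between the two rules could be sketched as a simple path check. This is a hypothetical helper (the function name and suffix list are illustrative; the real matching rules may be more elaborate):

```python
# Illustrative sketch: deciding which rate-limiting rule applies to a path.
NON_HTML_SUFFIXES = (".js", ".css", ".png", ".json")

def uses_html_rule(path):
    """True if the path falls under the HTML content rule,
    False if the non-HTML rule applies (APIs and static assets)."""
    if path.startswith("/api/"):
        return False
    return not path.endswith(NON_HTML_SUFFIXES)
```

For example, /dashboard would get the HTML rule, while /api/dashboard and /static/app.js would get the non-HTML rule.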

Core Metrics

For every URL or API endpoint, the system tracks two things over a 60-second interval:

  1. request.count.by.url (Current Traffic): This is a simple counter. It measures the number of requests a URL has received in the last 60 seconds.

  2. request.resp.avg.by.url (Current Speed): This measures the average time, in milliseconds, that the URL took to respond to requests over the last 60 seconds.

The fundamental decision for every new request is: Is the “Current Traffic” less than the “Allowed Traffic”?

If yes, the request goes through. If no, it is denied. The magic is in how the “Allowed Traffic” (or allowedRate) is calculated.
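The two metrics above can be sketched as a sliding 60-second window of per-URL samples. This is an illustrative model only; how the production system actually stores and aggregates these metrics may differ:

```python
import time
from collections import deque

class UrlMetrics:
    """Sliding 60-second window of samples for one URL (a sketch;
    the real metric store may be implemented differently)."""

    def __init__(self, window_seconds=60.0):
        self.window = window_seconds
        self.samples = deque()  # (timestamp, response_time_ms) pairs

    def record(self, response_time_ms, now=None):
        now = time.monotonic() if now is None else now
        self.samples.append((now, response_time_ms))
        self._evict(now)

    def _evict(self, now):
        # Drop samples that have aged out of the window.
        while self.samples and now - self.samples[0][0] > self.window:
            self.samples.popleft()

    def count(self, now=None):
        """request.count.by.url: requests seen in the last 60 seconds."""
        self._evict(time.monotonic() if now is None else now)
        return len(self.samples)

    def avg_response_ms(self, now=None):
        """request.resp.avg.by.url: mean response time over the window."""
        self._evict(time.monotonic() if now is None else now)
        if not self.samples:
            return 0.0
        return sum(ms for _, ms in self.samples) / len(self.samples)
```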

Configuration Parameters

The system’s behavior is defined by six key parameters, which set the boundaries for its decisions.

  • minValue (300 ms): The response time threshold for a “perfectly healthy” service. Any response time below this is considered excellent.

  • maxValue (18,000 ms): The response time threshold for a “critically overloaded” service. Any response time above this indicates a major problem.

  • maxRate (240 req/min): The maximum request rate allowed when the service is healthy (i.e., when the average response time is at or below minValue).

  • minRate (4 req/min): The minimum request rate allowed when the service is overloaded (i.e., when the average response time is at or above maxValue).

  • countMetricName (request.count.by.url): The name of the metric used to measure current traffic.

  • valueMetricName (request.resp.avg.by.url): The name of the metric used to measure performance (speed).

The Calculation Logic (calculateAllowedRate)

This is the brain of the system. It determines the allowedRate based on the average response time.

Scenario 1: Healthy State (The Green Light)

  • Condition: The average response time is less than or equal to minValue (300 ms).

  • Action: The system returns the maxRate.

  • Result: The allowed rate is 240 requests per minute. The service is fast, so we allow maximum traffic.

Scenario 2: Overloaded State (The Red Light)

  • Condition: The average response time is greater than or equal to maxValue (18,000 ms).

  • Action: The system returns the minRate.

  • Result: The allowed rate is 4 requests per minute. The service is struggling badly, so we drastically reduce traffic to allow it to recover.

Scenario 3: Degrading State (The Sliding Scale)

  • Condition: The average response time is between minValue and maxValue (301 ms - 17,999 ms).

  • Action: The system calculates the allowed rate using linear interpolation. This means the allowed rate decreases smoothly as the response time increases.

  • Result: The allowed rate will be a value somewhere between 4 and 240. For example, a moderately slow response time will result in a moderate reduction in the allowed traffic.
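The three scenarios above can be sketched as a single function. This is a hypothetical Python rendering of the logic described in this document; the actual calculateAllowedRate implementation may look different:

```python
MIN_VALUE = 300.0     # ms, "perfectly healthy" threshold
MAX_VALUE = 18_000.0  # ms, "critically overloaded" threshold
MAX_RATE = 240.0      # req/min allowed when healthy
MIN_RATE = 4.0        # req/min allowed when overloaded

def calculate_allowed_rate(avg_response_ms):
    """Map the average response time to an allowed request rate."""
    if avg_response_ms <= MIN_VALUE:
        return MAX_RATE  # Scenario 1: green light
    if avg_response_ms >= MAX_VALUE:
        return MIN_RATE  # Scenario 2: red light
    # Scenario 3: linear interpolation between the two extremes,
    # so the allowed rate falls smoothly as response time rises.
    fraction = (avg_response_ms - MIN_VALUE) / (MAX_VALUE - MIN_VALUE)
    return MAX_RATE - fraction * (MAX_RATE - MIN_RATE)
```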

A Practical Example

Let’s walk through a real-world example to see the system in action.

  • Situation: The average response time for the /api/dashboard endpoint is currently 5,000 ms (5 seconds).

  • Step 1: Check the state.

    • 5,000 ms is between our minValue (300) and maxValue (18,000). The system will use the sliding scale calculation.
  • Step 2: Calculate the allowed rate.

    • The system calculates a point on the line between our two extremes: (300 ms, 240 req/min) and (18,000 ms, 4 req/min).

    • Based on the formula, an average response time of 5,000 ms results in an allowedRate of approximately 177 requests per minute.

  • Step 3: Make the decision.

    • The system checks the request.count.by.url for the /api/dashboard endpoint for the last 60 seconds.

    • If the current count is 150, the request is ALLOWED because 150 < 177.

    • If the current count is 180, the request is DENIED because 180 is not less than 177.
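The walkthrough above can be reproduced numerically. This sketch uses the linear-interpolation formula implied by the document's parameters; the production code may compute it differently:

```python
# Worked example for /api/dashboard at an average response time of 5,000 ms.
min_value, max_value = 300.0, 18_000.0  # ms
max_rate, min_rate = 240.0, 4.0         # req/min

avg_ms = 5_000.0
fraction = (avg_ms - min_value) / (max_value - min_value)
allowed_rate = max_rate - fraction * (max_rate - min_rate)
print(round(allowed_rate))  # approximately 177 req/min

for current_count in (150, 180):
    decision = "ALLOWED" if current_count < allowed_rate else "DENIED"
    print(current_count, decision)
```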

The Response (When a Request is Denied)

When the system denies a request, it doesn’t just drop it silently. It sends a specific response to tell the client (e.g., a web browser or an automated tool) what has happened.

HTTP Status Code: 429 Too Many Requests

Every denied request receives an HTTP 429 Too Many Requests status code. This is the standard way to signal that a rate limit has been enforced. It tells browsers, search engines, and other automated systems to back off and try again later.

Response Content: It Depends on the Request

The body of the response varies based on what was requested:

  • For HTML Pages: If a user browsing the site gets rate-limited, we display a friendly, full-page HTML error message titled “Whoa, Slow Down There!”. This ensures a clear and non-technical experience for the end-user.

  • For Non-HTML Assets (APIs, JS, CSS, etc.): If a background request for an asset is denied, the system returns an empty response. Sending a full HTML page in this context would be wasteful and could potentially break the rendering of the website.
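A minimal sketch of how the two response types might be assembled. The error-page title comes from this document; the helper name, suffix list, and HTML markup are illustrative:

```python
# Illustrative sketch: building the 429 response for a denied request.
HTML_429_BODY = "<html><body><h1>Whoa, Slow Down There!</h1></body></html>"
NON_HTML_SUFFIXES = (".js", ".css", ".png", ".json")

def rate_limited_response(path):
    """Return (status_code, body) for a request denied by the rate limiter."""
    is_html = not path.startswith("/api/") and not path.endswith(NON_HTML_SUFFIXES)
    body = HTML_429_BODY if is_html else ""  # empty body for APIs and assets
    return 429, body
```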

No-Cache Headers

Along with the 429 status, we send a set of headers to prevent the error response from being cached by browsers or proxy servers. This is critical to ensure that once the rate limit is lifted, the client can fetch the actual content without getting a stale error page.
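A conventional set of such headers looks like the following. This is illustrative; the exact header values our system sends may differ:

```python
# Headers that tell browsers and proxies not to cache the 429 response.
NO_CACHE_HEADERS = {
    "Cache-Control": "no-store, no-cache, must-revalidate",
    "Pragma": "no-cache",  # legacy directive for HTTP/1.0 caches
    "Expires": "0",        # an already-expired date for older clients
}
```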

If you have any questions, please reach out to our support team.