The “Why” and “What” in Plain English
At its core, our rate limiting system is a smart safety valve for our application. Its job is to prevent any single part of our service from becoming overloaded and slowing down or crashing the entire system.
Think of it like an automated traffic controller for a city.
- The Goal: Prevent traffic jams to keep things moving smoothly for everyone.
- The Method: The controller watches the speed of traffic on a specific road.
  - If cars are moving fast, it lets more cars onto that road (a green light).
  - If traffic starts to slow down, it reduces the number of cars it lets on (a yellow light) to ease congestion.
  - If there's a total gridlock, it temporarily closes the on-ramp (a red light) to give the jam a chance to clear.
Our system does the exact same thing with user requests. It constantly monitors the performance (speed) of a feature and adjusts the allowed traffic (request rate) accordingly. This ensures that one struggling component doesn’t bring everything else down with it.
The Technical Deep Dive
This section breaks down the mechanics of how the system makes its decisions. The logic is based on two key metrics and a set of configurable rules.
Rate Limiting Rules
Our system uses two distinct rules for rate limiting, based on the type of content being requested. This allows us to apply different responses for users browsing the site versus background requests for assets.
- HTML Content Rule: This applies to standard web page requests. If rate-limited, it serves a user-friendly HTML error page.
- Non-HTML Content Rule: This applies to assets like API calls (/api/), JavaScript (.js), CSS (.css), images (.png), and JSON data (.json). If rate-limited, it returns an empty response with the appropriate error code.
The core logic for calculating the allowed rate is the same for both rules.
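As a rough sketch, the rule selection might look like the following. The path prefix and extension lists come from the examples above; the function name and exact matching logic are assumptions for illustration, not the production implementation:

```python
# Assumed classifier: which rate-limiting rule governs a request path.
# Extensions and the /api/ prefix are taken from this document's examples.
NON_HTML_EXTENSIONS = (".js", ".css", ".png", ".json")

def applicable_rule(path: str) -> str:
    """Return 'non-html' for asset/API requests, 'html' for page requests."""
    if path.startswith("/api/") or path.endswith(NON_HTML_EXTENSIONS):
        return "non-html"
    return "html"
```

Both rules then feed into the same allowed-rate calculation; only the denial response differs.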
Core Metrics
For every URL or API endpoint, the system tracks two things over a 60-second interval:
- request.count.by.url (Current Traffic): A simple counter. It measures the number of requests a URL has received in the last 60 seconds.
- request.resp.avg.by.url (Current Speed): The average time, in milliseconds, that the URL took to respond to requests over the last 60 seconds.
The fundamental decision for every new request is: Is the "Current Traffic" less than the "Allowed Traffic"?
If yes, the request goes through. If no, it is denied. The magic is in how the “Allowed Traffic” (or allowedRate) is calculated.
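The admission check itself is a single comparison. A minimal Python sketch (names are illustrative, not from the codebase):

```python
def is_allowed(current_count: int, allowed_rate: float) -> bool:
    """Admit a request only while current traffic is below the allowed rate."""
    return current_count < allowed_rate
```

A request is denied as soon as the 60-second count reaches the allowed rate; everything interesting happens in how allowed_rate is derived.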
Configuration Parameters
The system’s behavior is defined by six key parameters, which set the boundaries for its decisions.
| Parameter | Our Value | What it Means |
|---|---|---|
| minValue | 300 ms | The response time threshold for a "perfectly healthy" service. Any response time below this is considered excellent. |
| maxValue | 18,000 ms | The response time threshold for a "critically overloaded" service. Any response time above this indicates a major problem. |
| maxRate | 240 req/min | The maximum request rate allowed when the service is healthy (i.e., when response time is at or below minValue). |
| minRate | 4 req/min | The minimum request rate allowed when the service is overloaded (i.e., when response time is at or above maxValue). |
| countMetricName | request.count.by.url | The name of the metric used to measure current traffic. |
| valueMetricName | request.resp.avg.by.url | The name of the metric used to measure performance (speed). |
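For reference, the same six parameters can be expressed as a configuration object. The key names mirror the table; the dict shape itself is illustrative, not the actual config format:

```python
# The six parameters from the table above, as a plain Python dict.
RATE_LIMIT_CONFIG = {
    "minValue": 300,        # ms: at or below this, the service is healthy
    "maxValue": 18_000,     # ms: at or above this, the service is overloaded
    "maxRate": 240,         # req/min allowed when healthy
    "minRate": 4,           # req/min allowed when overloaded
    "countMetricName": "request.count.by.url",
    "valueMetricName": "request.resp.avg.by.url",
}
```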
The Calculation Logic (calculateAllowedRate)
This is the brain of the system. It determines the allowedRate based on the average response time.
Scenario 1: Healthy State (The Green Light)
- Condition: The average response time is less than or equal to minValue (300 ms).
- Action: The system returns the maxRate.
- Result: The allowed rate is 240 requests per minute. The service is fast, so we allow maximum traffic.
Scenario 2: Overloaded State (The Red Light)
- Condition: The average response time is greater than or equal to maxValue (18,000 ms).
- Action: The system returns the minRate.
- Result: The allowed rate is 4 requests per minute. The service is struggling badly, so we drastically reduce traffic to allow it to recover.
Scenario 3: Degrading State (The Sliding Scale)
- Condition: The average response time is between minValue and maxValue (301 ms to 17,999 ms).
- Action: The system calculates the allowed rate using linear interpolation. This means the allowed rate decreases smoothly as the response time increases.
- Result: The allowed rate will be a value somewhere between 4 and 240. For example, a moderately slow response time will result in a moderate reduction in the allowed traffic.
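The three scenarios can be sketched as a single function. This is a hedged reconstruction from the description above, using the documented values as defaults; the real calculateAllowedRate may differ in name, signature, and rounding:

```python
def calculate_allowed_rate(avg_response_ms: float,
                           min_value: float = 300.0,
                           max_value: float = 18_000.0,
                           max_rate: float = 240.0,
                           min_rate: float = 4.0) -> float:
    """Map average response time to an allowed request rate (req/min)."""
    if avg_response_ms <= min_value:
        return max_rate          # Scenario 1: healthy, full traffic
    if avg_response_ms >= max_value:
        return min_rate          # Scenario 2: overloaded, minimum traffic
    # Scenario 3: linear interpolation between the two endpoints
    fraction = (avg_response_ms - min_value) / (max_value - min_value)
    return max_rate - fraction * (max_rate - min_rate)
```

Because the mapping is linear, each additional millisecond of latency shaves the same fixed amount off the allowed rate until the floor of 4 req/min is reached.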
A Practical Example
Let’s walk through a real-world example to see the system in action.
- Situation: The average response time for the /api/dashboard endpoint is currently 5,000 ms (5 seconds).
- Step 1: Check the state.
  - 5,000 ms is between our minValue (300) and maxValue (18,000). The system will use the sliding scale calculation.
- Step 2: Calculate the allowed rate.
  - The system calculates a point on the line between our two extremes: (300 ms, 240 req/min) and (18,000 ms, 4 req/min).
  - Based on the formula, an average response time of 5,000 ms results in an allowedRate of approximately 177 requests per minute.
- Step 3: Make the decision.
  - The system checks the request.count.by.url for the /api/dashboard endpoint for the last 60 seconds.
  - If the current count is 150, the request is ALLOWED because 150 < 177.
  - If the current count is 180, the request is DENIED because 180 is not less than 177.
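The steps above can be checked with a few lines of Python. The interpolation is the straight line implied by the two endpoints in Step 2; rounding for display is an assumption:

```python
# Worked example: /api/dashboard averaging 5,000 ms over the last 60 seconds.
min_value, max_value = 300.0, 18_000.0   # ms
max_rate, min_rate = 240.0, 4.0          # req/min

avg_response_ms = 5_000.0
fraction = (avg_response_ms - min_value) / (max_value - min_value)
allowed_rate = max_rate - fraction * (max_rate - min_rate)
print(round(allowed_rate))               # approximately 177 req/min

for current_count in (150, 180):
    verdict = "ALLOWED" if current_count < allowed_rate else "DENIED"
    print(current_count, verdict)
```

Running this reproduces the decisions in Step 3: a count of 150 is admitted and a count of 180 is rejected.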
The Response (When a Request is Denied)
When the system denies a request, it doesn't just drop it silently. It sends a specific response to inform the client (whether a web browser or an automated tool) of what has happened.
HTTP Status Code: 429 Too Many Requests
Every denied request receives an HTTP 429 Too Many Requests status code. This is the standard way to signal that a rate limit has been enforced. It tells browsers, search engines, and other automated systems to back off and try again later.
Response Content: It Depends on the Request
The body of the response varies based on what was requested:
- For HTML Pages: If a user browsing the site gets rate-limited, we display a friendly, full-page HTML error message titled "Whoa, Slow Down There!". This ensures a clear and non-technical experience for the end-user.
- For Non-HTML Assets (APIs, JS, CSS, etc.): If a background request for an asset is denied, the system returns an empty response. Sending a full HTML page in this context would be wasteful and could potentially break the rendering of the website.
No-Cache Headers
Along with the 429 status, we send a set of headers to prevent the error response from being cached by browsers or proxy servers. This is critical to ensure that once the rate limit is lifted, the client can fetch the actual content without getting a stale error page.
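Concretely, the denied response pairs the 429 status with standard no-cache headers. The 429 code is stated above; the specific header values below are the conventional way to suppress caching and are an assumption, not the exact production set:

```python
# Assumed shape of the denial response: 429 plus conventional no-cache headers.
DENIED_STATUS = 429  # Too Many Requests

def denied_response_headers() -> dict:
    return {
        "Cache-Control": "no-store, no-cache, must-revalidate",
        "Pragma": "no-cache",  # for legacy HTTP/1.0 clients and proxies
        "Expires": "0",
    }
```

With these headers, neither browsers nor intermediate proxies will serve the error from cache, so the first request after the limit lifts fetches fresh content.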
If you have any questions, please reach out to our support team.