Approximating Queue Length with Little's Law

1. What we have: request rate and response time

I'm just a simple country software engineer trying to do the best I can to make sense of distributed systems. In the constantly evolving effort to build in observability to anticipate and answer any question I or anyone on my team might need to answer about the dynamics of a system running in production, without blowing cost, performance, or mental capacity budgets, I get a ton of mileage out of standard application performance monitoring metrics. Requests by status code, errors, and response time distribution are supported out of the box by web frameworks, HTTP libraries, and database libraries alike, and form the core of many service dashboards. Simple resource or operation name tags support adequate disaggregation for many cases.

2. What we need: queue length

But an important system property the frameworks don't seem to monitor is queue length¹. Sometimes I need this to answer questions like:

Is traffic saturating the configured max concurrency of the web server? (How many requests are in flight in the service?)
Is the service overwhelming the database with concurrent queries? (How many database queries are in flight?)

Could I implement this metric myself? Maybe. The frameworks/libraries probably provide suitable hooks if I really wanted to dig into it. But I really don't want to, and thankfully, I don't have to.

3. Enter: Math

Luckily, John Little gifted us a simple formula that lets us approximate queue length from response time and request rate.

Little's Law (wikipedia):

Little's result, theorem, lemma, law, or formula is a theorem by John Little which states that the long-term average number L of customers in a stationary system is equal to the long-term average effective arrival rate λ multiplied by the average time W that a customer spends in the system. Expressed algebraically the law is:

L = λW

The beauty of the law is it works for a system as a whole or any subsystem. So we can calculate the concurrency of a top level API or isolate a particular downstream service or database. We can calculate any one variable for any subsystem as long as we monitor the other two variables at that level.

4. Example: Server concurrency

In the server concurrency case, say we have metrics for request rate and response time, and we're looking at this jump a couple hours ago:

Figure 2: Average reponse time (simulated data)

How many requests are in flight after the increase in traffic and latency?

Apply Little's Law:

mean requests in flight = mean request rate * mean response time

(Yes, mean. This is one of the rare times in monitoring where you just use the averages and don't worry about percentiles. Take it up with Mr. Little².)

Voila, requests in flight:

5. It works

That's it. I used this in a Datadog dashboard this week. I checked a client-side application of Little's Law against a server-side component that did measure queue length, and the two matched within 1%.

6. Setup

Aside: I wrote the following code to generate example plots with the help of ChatGPT. The code took under an hour to write and tune. Not bad considering I started with minimal Python experience, no experience with numpy or matplotlib, and have only used ChatGPT for coding once before. Starting from scratch would've taken me 2-3 hours, and I probably wouldn't have bothered for throwaway data for a blog post. Bot-generated code required few and terse prompting. It worked as expected, including implementing the formula (prompt: "Add a series for requests in flight using Little's Law"), except it missed normalizing time units in the formula.

pip3 install matplotlib numpy

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

# Generate time values over 4 hours at 1 minute intervals
time_values = np.arange(0, 4 * 60)

# Generate request rate time-series with a jump from ~100 rps to 250 at 2 hours
mean_rps = np.concatenate([
    np.full(int(60 * 2), 100),
    np.full(int(60 * 2), 250)
])
rps = np.random.normal(mean_rps, 10, len(time_values))

# Generate average response time (milliseconds) time-series with a jump at 2 hours
mean_response_time = np.concatenate([
    np.full(int(60 * 2), 50),
    np.full(int(60 * 2), 170)
])
response_time = np.random.normal(mean_response_time, 10, len(time_values))

# Calculate requests in flight using Little's Law
requests_in_flight = rps * (response_time / 1000)

def plot_non_negative_metric_over_minutes(time_values, y_values, title, ylabel):
    plt.figure(figsize=(9.0, 6.0)).patch.set_facecolor('#fffff8')
    plt.gca().set_facecolor('#fffff8')
    plt.plot(time_values, y_values, color="#111")
    plt.gca().xaxis.set_major_locator(ticker.MultipleLocator(15))
    plt.ylim(ymin=0)
    plt.title(title)
    plt.xlabel("Time (minutes)")
    plt.ylabel(ylabel)
    plt.grid(True)

plot_non_negative_metric_over_minutes(time_values, rps, "Request Rate", "Requests per second")
plt.savefig('rps.svg', format='svg')

plot_non_negative_metric_over_minutes(time_values, response_time, "Response Time - Average", "Response Time (ms)")
plt.savefig('response_time.svg', format='svg')

plot_non_negative_metric_over_minutes(time_values, requests_in_flight, "Requests in Flight", "Requests in Flight")
plt.savefig('requests_in_flight.svg', format='svg')

plt.close('all')

Footnotes

Or, at least, I haven't found the metric in the frameworks I use.

Actually, there is also a distributional form of Little's Law. Knock yourself out.