Generating Histograms in Postgres
Typically when I'm doing ad hoc analysis with data in Postgres, I need a histogram of records binned by some timestamp or age. This page distills my typical quick uses. Start at Sample Setup at the bottom if you want to set up sample data to play with.
Note: the examples do not fill gaps in sparse data. See Crunchy Data: Easy PostgreSQL Time Bins for a complete solution generating histograms ready for plotting.
1. Timestamp Bins
Postgres 14+: use date_bin
.
select date(date_bin('1 week'::interval, ts, current_date - '12 months'::interval)) as interval, count(*) as count from blooms where ts > current_timestamp - '12 months'::interval group by interval order by interval;
interval count 2023-04-07 1 2023-04-14 2
Postgres 13-: use width_bucket
, generate_series
, and a cross join. (Ok,
hardcoding the start time three times is starting to bug me. How do I factor
this better, SQL experts?)
with bins as ( select array_agg(a) as bins from generate_series(CURRENT_DATE - '12 months'::interval, CURRENT_DATE, '1 week'::interval) a ), counts as ( select width_bucket(ts, bins.bins) as bin, count(*) as count from blooms cross join bins where ts > current_date - '12 months'::interval group by bin ) select date(current_date - '12 months'::interval + make_interval(weeks => bin)) as interval, count from counts order by bin;
2. Age Bins
To work with deltas between timestamps at a lower level in seconds or hours, extract
epoch seconds and bin with width_bucket
.
select width_bucket(extract(epoch from ts - '2023-04-01'), 0, 3600 * 24 * 30, 24 * 30) as age_hr, count(*) as count from blooms where ts > '2023-04-01' group by age_hr order by age_hr;
age_hr count 217 1 433 1 457 1
3. Sample Setup
To test the queries in this post, any Postgres server and client will do. I'm using Docker and psql via postgresql-client-16 (Linux installation).
Start and connect to a throwaway Postgres server in Docker:
docker run --name postgres -e POSTGRES_PASSWORD=local --detach --publish 5432:5432 postgres
psql -h localhost postgres postgres
Set up a table with a timestamp column:
create table blooms ( plant text, area text, ts timestamptz ); insert into blooms values ('dogwood', 'deck', '2023-04-10'), ('tulip', 'front', '2023-04-19'), ('oak', 'side', '2023-04-20'), ('hemlock', 'side', '2022-04-15');