September 30, 2023

Generating Histograms in Postgres

Typically when I'm doing ad hoc analysis with data in Postgres, I need a histogram of records binned by some timestamp or age. This page distills my typical quick uses. Start at Sample Setup at the bottom if you want to set up sample data to play with.

Note: the examples do not fill gaps in sparse data. See Crunchy Data: Easy PostgreSQL Time Bins for a complete solution generating histograms ready for plotting.

1. Timestamp Bins

Postgres 14+: use date_bin.

select date(date_bin('1 week'::interval, ts, current_date - '12 months'::interval)) as interval,
       count(*) as count
from blooms
where ts > current_timestamp - '12 months'::interval
group by interval
order by interval;
interval	count
2023-04-07	1
2023-04-14	2

Postgres 13-: use width_bucket, generate_series, and a cross join. (Ok, hardcoding the start time three times is starting to bug me. How do I factor this better, SQL experts?)

with bins as (
    select array_agg(a) as bins
    from generate_series(CURRENT_DATE - '12 months'::interval, CURRENT_DATE, '1 week'::interval) a
),
counts as (
    select
        width_bucket(ts, bins.bins) as bin,
        count(*) as count
    from blooms
    cross join bins
    where ts > current_date - '12 months'::interval
    group by bin
)
select date(current_date - '12 months'::interval + make_interval(weeks => bin)) as interval, count
from counts
order by bin;

2. Age Bins

To work with deltas between timestamps at a lower level in seconds or hours, extract epoch seconds and bin with width_bucket.

select
    width_bucket(extract(epoch from ts - '2023-04-01'), 0, 3600 * 24 * 30, 24 * 30) as age_hr,
    count(*) as count
from blooms
where ts > '2023-04-01'
group by age_hr
order by age_hr;
age_hr	count
217	1
433	1
457	1

3. Sample Setup

To test the queries in this post, any Postgres server and client will do. I'm using Docker and psql via postgresql-client-16 (Linux installation).

Start and connect to a throwaway Postgres server in Docker:

docker run --name postgres -e POSTGRES_PASSWORD=local --detach --publish 5432:5432 postgres
psql -h localhost postgres postgres

Set up a table with a timestamp column:

create table blooms (
    plant text,
    area text,
    ts timestamptz
);

insert into blooms values
    ('dogwood', 'deck', '2023-04-10'),
    ('tulip', 'front', '2023-04-19'),
    ('oak', 'side', '2023-04-20'),
    ('hemlock', 'side', '2022-04-15');

RSS

Creative Commons License This work is licensed under a Creative Commons Attribution 4.0 International License.