PostgreSQL 10.4
I have a table that has already an index on date column. Current query plan is doing a Bitmap Heap Scan using the existing date index. I would like to add a new index for this query, no parameters are injected to the query, I started with a partial index to the status column, but I don't know how to handle group and sort by part.
select date, hour, sum(installs) as installs, sum(clicks) as clicks
from ho_aggregated_stats
where date > (current_date - interval '2 day')
and (status='approved' or status is null)
group by date, hour
order by date, hour;
explain https://explain.depesz.com/s/rnCW
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize GroupAggregate (cost=992433.95..992442.13 rows=1488 width=24) (actual time=3903.296..3903.337 rows=43 loops=1)
Group Key: date, hour
Buffers: shared hit=85314 read=11496
I/O Timings: read=2896.216
-> Sort (cost=992433.95..992434.69 rows=1488 width=24) (actual time=3903.290..3903.298 rows=86 loops=1)
Sort Key: date, hour
Sort Method: quicksort Memory: 31kB
Buffers: shared hit=85314 read=11496
I/O Timings: read=2896.216
-> Gather (cost=992265.00..992418.27 rows=1488 width=24) (actual time=3903.167..3903.233 rows=86 loops=1)
Workers Planned: 1
Workers Launched: 1
Buffers: shared hit=85314 read=11496
I/O Timings: read=2896.216
-> Partial HashAggregate (cost=991265.00..991269.47 rows=1488 width=24) (actual time=3899.779..3899.808 rows=43 loops=2)
Group Key: date, hour
Buffers: shared hit=149987 read=16557
I/O Timings: read=4694.060
-> Parallel Bitmap Heap Scan on ho_aggregated_stats (cost=21995.80..990158.35 rows=553327 width=16) (actual time=1232.325..3623.710 rows=592709 loops=2)
Recheck Cond: (date > (CURRENT_DATE - '2 days'::interval))
Filter: (((status)::text = 'approved'::text) OR (status IS NULL))
Rows Removed by Filter: 3946
Heap Blocks: exact=91807
Buffers: shared hit=149987 read=16557
I/O Timings: read=4694.060
-> Bitmap Index Scan on index_ho_aggregated_stats_on_date (cost=0.00..21948.76 rows=1160433 width=0) (actual time=1194.685..1194.685 rows=1339010 loops=1)
Index Cond: (date > (CURRENT_DATE - '2 days'::interval))
Buffers: shared read=5003
I/O Timings: read=1082.452
Planning time: 0.611 ms
Execution time: 3948.178 ms
table schema
CREATE TABLE public.stats (
id bigserial NOT NULL,
"date" date NOT NULL,
"hour" int4 NOT NULL,
status varchar NULL,
installs int4 NULL DEFAULT 0,
clicks int4 NULL DEFAULT 0,
CONSTRAINT stats_pkey PRIMARY KEY (id)
)
CREATE INDEX index_stats_on_date ON public.stats USING btree (date);
Estimate row count: ~40M
Update: I checked the distribution on the status column and 75% is null, 20% approved, 5% rejected, thinking the index on status is not necessary.