I have a scenario where a search on a range with the same value on both sides is using a plan with an index that is not the composite index I expected. After some work, I was able to generate a sample data that shows the different planning. The first query uses the idx_hourts_btree index, but the second is using another composite index. In the real scenario, the plan using the "anti-natural" gist composite generates very slow queries.
Setup:
create table sampledata as
select (row_number() over ())::int
, extract(hour from generate_series)::int as hour
, extract(minute from generate_series)::int as minute
, extract(second from generate_series)::int as second
, generate_series as ts
from generate_series (timestamptz '2004-03-08 18:29:00'
, timestamptz '2004-03-08 18:31:00'
, interval '1 millisecond');
create index idx_HourTs_btree on sampledata using btree(hour,ts);
create index idx_idts_gist on sampledata using gist(row_number,minute,second,ts);
analyze sampledata;
# \d sampledata
Table "public.sampledata"
Column | Type | Collation | Nullable | Default
------------+--------------------------+-----------+----------+--- ------
row_number | integer | | |
hour | integer | | |
minute | integer | | |
second | integer | | |
ts | timestamp with time zone | | |
Indexes:
"idx_hourts_btree" btree (hour, ts)
"idx_idts_gist" gist (row_number, minute, second, ts)
Queries:
explain analyze
select * from sampledata where hour = 18 and ts between '2004-03-08 18:30:00.991' and '2004-03-08 18:30:00.993';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_hourts_btree on sampledata (cost=0.57..8.59 rows=1 width=16) (actual time=3.077..3.079 rows=3 loops=1)
Index Cond: ((hour = 18) AND (ts >= '2004-03-08 18:30:00.991-03'::timestamp with time zone) AND (ts <= '2004-03-08 18:30:00.993-03'::timestamp with time zone))
Planning time: 0.114 ms
Execution time: 3.101 ms
(4 rows)
Time: 4.090 ms
explain analyze
select * from sampledata where hour = 18 and ts between '2004-03-08 18:30:00.993' and '2004-03-08 18:30:00.993';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_idts_gist on sampledata (cost=0.55..8.57 rows=1 width=16) (actual time=5.985..5.988 rows=1 loops=1)
Index Cond: ((ts >= '2004-03-08 18:30:00.993-03'::timestamp with time zone) AND (ts <= '2004-03-08 18:30:00.993-03'::timestamp with time zone))
Filter: (hour = 18)
Planning time: 0.153 ms
Execution time: 6.030 ms
(5 rows)
Time: 7.318 ms
PG server: 9.5.9.
Why the different planning?
Here the analyze for the real table, obfuscated. "quebec" is the gist composite index (another column, mike), "uniform" is the btree index (hotel, mike) Here with same value on between (bad exec time): https://explain.depesz.com/s/O5Gf
Here with a small difference on between (good exec time): https://explain.depesz.com/s/67yL
It's possible to simulate the problem with the table sample data I've provided on the question.
Update 2018-03-14
@Erwin's answer gave me more ideas to improve the sample data to something like the real case. In the previous sample, I was trying to simulate the real data with redundant columns that do not exist in the real world. I need the gist composite index for another use case, so before making a bigger design change I'd like to know if I'm not forgetting something simpler. The new sample should demonstrate the actual difference in execution and planning time. I've also tried increasing the statistics target for columns, with no success.
Setup script:
show default_statistics_target;
show random_page_cost;
drop table if exists sampledata2;
create table sampledata2 as (with a as (select generate_series(1,50) as id) select id, md5(random()::text) rand, generate_series (timestamptz '2004-03-07', timestamptz '2004-03-17', interval '1 minute') ts from a);
select * from sampledata2 limit 3;
create index idx_idTs_btree on sampledata2 using btree(id, ts);
create index idx_randTs_gist on sampledata2 using gist(rand, ts);
analyze sampledata2;
explain analyze select * from sampledata2 where id=42 and ts between '2004-03-07 00:22:00-03' and '2004-03-07 00:22:00-03';
explain analyze select * from sampledata2 where id=42 and ts between '2004-03-07 00:22:00-03' and '2004-03-07 00:23:00-03';
alter table sampledata2 alter column id set statistics 10000;
alter table sampledata2 alter column rand set statistics 10000;
alter table sampledata2 alter column ts set statistics 10000;
analyze sampledata2;
explain analyze select * from sampledata2 where id=42 and ts between '2004-03-07 00:22:00-03' and '2004-03-07 00:22:00-03';
explain analyze select * from sampledata2 where id=42 and ts between '2004-03-07 00:22:00-03' and '2004-03-07 00:23:00-03';
Output:
default_statistics_target
---------------------------
50
(1 row)
Time: 0.640 ms
random_page_cost
------------------
4
(1 row)
Time: 0.257 ms
DROP TABLE
Time: 35.850 ms
SELECT 720050
Time: 1438.842 ms (00:01.439)
id | rand | ts
----+----------------------------------+------------------------
1 | 8e1d3920ef44f94e71291b2371178ece | 2004-03-07 00:00:00-03
1 | 664fcfc94e09ea0ff050b934e6cb486f | 2004-03-07 00:01:00-03
1 | ac52031c8d98df67e2aacaf7d10b3af7 | 2004-03-07 00:02:00-03
(3 rows)
Time: 0.651 ms
CREATE INDEX
Time: 356.923 ms
CREATE INDEX
Time: 35661.019 ms (00:35.661)
ANALYZE
Time: 86.580 ms
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_randts_gist on sampledata2 (cost=0.41..8.43 rows=1 width=45) (actual time=10.851..18.430 rows=1 loops=1)
Index Cond: ((ts >= '2004-03-07 00:22:00-03'::timestamp with time zone) AND (ts <= '2004-03-07 00:22:00-03'::timestamp with time zone))
Filter: (id = 42)
Rows Removed by Filter: 49
Planning time: 0.224 ms
Execution time: 18.479 ms
(6 rows)
Time: 19.422 ms
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_idts_btree on sampledata2 (cost=0.42..8.45 rows=1 width=45) (actual time=0.040..0.041 rows=2 loops=1)
Index Cond: ((id = 42) AND (ts >= '2004-03-07 00:22:00-03'::timestamp with time zone) AND (ts <= '2004-03-07 00:23:00-03'::timestamp with time zone))
Planning time: 0.144 ms
Execution time: 0.067 ms
(4 rows)
Time: 0.803 ms
ALTER TABLE
Time: 1.220 ms
ALTER TABLE
Time: 0.924 ms
ALTER TABLE
Time: 0.894 ms
ANALYZE
Time: 2675.784 ms (00:02.676)
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_randts_gist on sampledata2 (cost=0.41..8.43 rows=1 width=45) (actual time=6.472..11.493 rows=1 loops=1)
Index Cond: ((ts >= '2004-03-07 00:22:00-03'::timestamp with time zone) AND (ts <= '2004-03-07 00:22:00-03'::timestamp with time zone))
Filter: (id = 42)
Rows Removed by Filter: 49
Planning time: 0.757 ms
Execution time: 11.524 ms
(6 rows)
Time: 12.948 ms
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_idts_btree on sampledata2 (cost=0.42..8.45 rows=1 width=45) (actual time=0.021..0.022 rows=2 loops=1)
Index Cond: ((id = 42) AND (ts >= '2004-03-07 00:22:00-03'::timestamp with time zone) AND (ts <= '2004-03-07 00:23:00-03'::timestamp with time zone))
Planning time: 0.505 ms
Execution time: 0.045 ms
(4 rows)
Time: 1.170 ms
Update 2018-03-15
A bug was reported, for this moment there will be an improvement related to this issue in v11. https://www.postgresql.org/message-id/31902.1521064417%40sss.pgh.pa.us