
I have a large Postgres table containing geographic positions (latitude and longitude). Both fields are indexed, and both are defined as NUMERIC(9,6).
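
For reference, a minimal sketch of the kind of table I'm working with (the table and index names here are just placeholders):

CREATE TABLE positions (
    id        bigserial PRIMARY KEY,
    latitude  numeric(9,6) NOT NULL,   -- indexed
    longitude numeric(9,6) NOT NULL    -- indexed
);

CREATE INDEX positions_lat_idx ON positions (latitude);
CREATE INDEX positions_lon_idx ON positions (longitude);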

If I run a query looking for an exact position match, something like this:

WHERE latitude  = 1.234567890123456789
AND   longitude = 9.876543210987654321

Then I get a very fast response, but very few results, because the database is searching for a very precise match.

For my purposes, I'm looking for positions that match to within a few meters, so a match to 4 or 5 decimal places should be fine. This gives me the results I'm looking for:

WHERE ABS(latitude  - 1.234567890123456789) < 0.0001
AND   ABS(longitude - 9.876543210987654321) < 0.0001

But NOT the performance: it can take 5 minutes to run, compared to a fraction of a second for the exact search.

Next I tried rounding the precision down:

WHERE ROUND( latitude, 4) = ROUND( 1.234567890123456789, 4)
AND   ROUND( longitude,4) = ROUND( 9.876543210987654321, 4)

Again, same problem: I got the results I wanted, but it took far too long.

So, my question is: how can I search for a close match between two numbers without losing performance?

UPDATE - SOLVED:
As a couple of commenters have observed, using BETWEEN seems to work fine.
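
This is roughly the shape of the query that now works for me (following mustaccio's suggestion in the comments; positions is a stand-in for my real table name):

SELECT *
FROM   positions
WHERE  latitude  BETWEEN 1.234567890123456789 - 0.0001 AND 1.234567890123456789 + 0.0001
AND    longitude BETWEEN 9.876543210987654321 - 0.0001 AND 9.876543210987654321 + 0.0001;

Because the range predicates are applied to the bare columns (no function or arithmetic wrapped around them), the existing indexes can be used.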

  • Your WHERE clause has to look up every single row to see if it fits; there is no shortcut. Besides, please read https://dba.meta.stackexchange.com/questions/3034/asking-query-performance-questions/3035#3035. For such geographical problems, https://postgis.net/ exists. – nbk Dec 29 '20 at 13:20
  • Can you store a copy of the less precise `latitude` and `longitude` in additional fields on the same table, and create the index on those new fields? By pre-staging the data this way, it'll already be materialized in a form that you can likely query more efficiently via the index. Otherwise, as nbk stated, functions and arithmetic in the `WHERE` clause are what's causing your inefficiencies, and there aren't many other ways around it besides pre-staging the data in the format you need (i.e. less precision in this case; see the sketch after these comments). – J.D. Dec 29 '20 at 13:26
  • Try `WHERE latitude BETWEEN 1.234567890123456789 - 0.0001 AND 1.234567890123456789 + 0.0001 AND longitude BETWEEN 9.876543210987654321 - 0.0001 AND 9.876543210987654321 + 0.0001` – mustaccio Dec 29 '20 at 13:39
  • Do not use `WHERE ABS(coordinate - @position) < @accuracy`; use `WHERE coordinate BETWEEN @position - @accuracy AND @position + @accuracy`. *both fields are indexed* Do you mean one composite index or two separate indices? The former is preferred... – Akina Dec 29 '20 at 13:39
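
A sketch of the pre-staging idea from J.D.'s comment, assuming PostgreSQL 12+ (for stored generated columns) and the placeholder table name positions:

-- Maintain rounded copies of the coordinates automatically
ALTER TABLE positions
    ADD COLUMN lat_r numeric(9,4) GENERATED ALWAYS AS (round(latitude,  4)) STORED,
    ADD COLUMN lon_r numeric(9,4) GENERATED ALWAYS AS (round(longitude, 4)) STORED;

CREATE INDEX positions_lat_r_lon_r_idx ON positions (lat_r, lon_r);

-- An exact match on the rounded columns stays index-friendly
SELECT *
FROM   positions
WHERE  lat_r = round(1.234567890123456789, 4)
AND    lon_r = round(9.876543210987654321, 4);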

1 Answer


The smart and fast solution for this class of problems is an index-backed "nearest neighbor" search.

For the record: if you want precise results with spatial data, use PostGIS and operate with geometry or geography types, together with ST_DWithin().
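
A rough sketch of that route, assuming the postgis extension is installed and the table/columns from your setup (the 10-meter radius is just an example value):

SELECT *
FROM   tbl
WHERE  ST_DWithin(
          ST_SetSRID(ST_MakePoint(longitude, latitude), 4326)::geography
        , ST_SetSRID(ST_MakePoint(9.876543210987654321, 1.234567890123456789), 4326)::geography
        , 10);  -- within 10 meters

-- To make that index-backed, add a GiST index on the same geography expression:
CREATE INDEX tbl_geog_idx ON tbl USING GIST ((ST_SetSRID(ST_MakePoint(longitude, latitude), 4326)::geography));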

Sticking to your setup (2-D points, no PostGIS), and ignoring the additional approximation error of handling spatial data in a 2-D plane (which seems negligible for the case at hand), I suggest a space-partitioning GiST (SP-GiST) index, on an expression in your case:

CREATE INDEX tbl_location_spgist_idx ON tbl USING SPGIST (point(longitude, latitude));

Why SP-GiST? Space-partitioning trees (like quadtrees) are a natural fit for point data.

To get a maximum of 10 "nearest neighbors" in next to no time:

SELECT *
FROM   tbl
ORDER  BY point (longitude, latitude)
      <-> point '(9.876543210987654321,1.234567890123456789)'
LIMIT  10;

You can then filter the ones close enough. To get a maximum of 10 closest within a maximum distance:

SELECT *, ((longitude - 9.876543210987654321) ^ 2
        + (latitude   - 1.234567890123456789) ^ 2) AS squared_dist
FROM   tbl
WHERE  ((longitude - 9.876543210987654321) ^ 2
      + (latitude  - 1.234567890123456789) ^ 2) < 0.000000001  -- or whatever
ORDER  BY point (longitude, latitude)
      <-> point '(9.876543210987654321,1.234567890123456789)'
LIMIT  10;

db<>fiddle here

I use a squared distance to get nearest neighbors based on the Pythagorean theorem; skipping the square root changes neither the ordering nor the comparison against a (squared) threshold. The beauty of it: the calculation is only performed on the nearest neighbors, so it stays very fast in big tables, and even when the calculation gets more expensive.
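
For a sense of scale: the squared cutoff of 0.000000001 corresponds to a radius of sqrt(0.000000001) ≈ 0.0000316 degrees, i.e. roughly 3.5 m of latitude (1° of latitude ≈ 111.32 km):

SELECT sqrt(0.000000001)          AS radius_degrees   -- ≈ 0.0000316
     , sqrt(0.000000001) * 111320 AS radius_meters;   -- ≈ 3.5 m (for latitude; a degree of longitude shrinks with cos(latitude))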

Erwin Brandstetter