
I have a large Postgres table containing geographic positions (latitude and longitude). Both fields are indexed, and both are defined as NUMERIC(9,6).
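
For reference, a minimal sketch of the kind of table I'm working with (the table and index names here are just placeholders):

CREATE TABLE positions (
    id        bigserial PRIMARY KEY,
    latitude  numeric(9,6) NOT NULL,   -- indexed
    longitude numeric(9,6) NOT NULL    -- indexed
);

CREATE INDEX positions_lat_idx ON positions (latitude);
CREATE INDEX positions_lon_idx ON positions (longitude);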

If I run a query looking for an exact position match, something like this:

WHERE latitude  = 1.234567890123456789
AND   longitude = 9.876543210987654321

Then I get a very fast response, but very few results, because the database is searching for a very precise match.

For my purposes, I'm looking for positions that match to within a few meters, so a match to 4 or 5 decimal places should be fine. This gives me the results I'm looking for:

WHERE ABS(latitude  - 1.234567890123456789) < 0.0001
AND   ABS(longitude - 9.876543210987654321) < 0.0001

But NOT the performance: it can take 5 minutes to run, compared to a fraction of a second for the exact search.

Next I tried rounding the precision down:

WHERE ROUND( latitude, 4) = ROUND( 1.234567890123456789, 4)
AND   ROUND( longitude,4) = ROUND( 9.876543210987654321, 4)

Again, same problem: I got the results I wanted, but it took far too long.

So, my question is: how can I search for a close match between two numbers without losing performance?

UPDATE - SOLVED:
As a couple of commenters have observed, using BETWEEN seems to work fine.
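
This is roughly the shape of the query that now works for me (following mustaccio's suggestion in the comments; positions is a stand-in for my real table name):

SELECT *
FROM   positions
WHERE  latitude  BETWEEN 1.234567890123456789 - 0.0001 AND 1.234567890123456789 + 0.0001
AND    longitude BETWEEN 9.876543210987654321 - 0.0001 AND 9.876543210987654321 + 0.0001;

Because the range predicates are applied to the bare columns (no function or arithmetic wrapped around them), the existing indexes can be used.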

  • Your WHERE clause has to look up every single row to see if it fits; there is no shortcut. Besides, please read https://dba.meta.stackexchange.com/questions/3034/asking-query-performance-questions/3035#3035. For such geographical problems, https://postgis.net/ exists. – nbk Dec 29 '20 at 13:20
  • Can you store a copy of the less precise `latitude` and `longitude` in additional fields on the same table, and create the index on those new fields? By pre-staging the data this way, it'll already be materialized in a form that you can likely query more efficiently via the index. Otherwise, as nbk stated, functions and arithmetic in the `WHERE` clause are what's causing your inefficiencies, and there aren't many other ways around it besides pre-staging the data in the format you need (i.e. less precision in this case; see the sketch after these comments). – J.D. Dec 29 '20 at 13:26
  • Try `WHERE latitude BETWEEN 1.234567890123456789 - 0.0001 AND 1.234567890123456789 + 0.0001 AND longitude BETWEEN 9.876543210987654321 - 0.0001 AND 9.876543210987654321 + 0.0001` – mustaccio Dec 29 '20 at 13:39
  • Do not use `WHERE ABS(coordinate - @position) < @accuracy`; use `WHERE coordinate BETWEEN @position - @accuracy AND @position + @accuracy`. *both fields are indexed* Do you mean one composite index or two separate indices? The former is preferred... – Akina Dec 29 '20 at 13:39
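
A sketch of the pre-staging idea from J.D.'s comment, assuming PostgreSQL 12+ (for stored generated columns) and the placeholder table name positions:

-- Maintain rounded copies of the coordinates automatically
ALTER TABLE positions
    ADD COLUMN lat_r numeric(9,4) GENERATED ALWAYS AS (round(latitude,  4)) STORED,
    ADD COLUMN lon_r numeric(9,4) GENERATED ALWAYS AS (round(longitude, 4)) STORED;

CREATE INDEX positions_lat_r_lon_r_idx ON positions (lat_r, lon_r);

-- An exact match on the rounded columns stays index-friendly
SELECT *
FROM   positions
WHERE  lat_r = round(1.234567890123456789, 4)
AND    lon_r = round(9.876543210987654321, 4);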

1 Answer


The smart and fast solution for this class of problems is an index-backed "nearest neighbor" search.

For the record: if you want precise results with spatial data, use PostGIS and operate with geometry or geography types, together with ST_DWithin().
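
A rough sketch of that route, assuming the postgis extension is installed and the table/columns from your setup (the 10-meter radius is just an example value):

SELECT *
FROM   tbl
WHERE  ST_DWithin(
          ST_SetSRID(ST_MakePoint(longitude, latitude), 4326)::geography
        , ST_SetSRID(ST_MakePoint(9.876543210987654321, 1.234567890123456789), 4326)::geography
        , 10);  -- within 10 meters

-- To make that index-backed, add a GiST index on the same geography expression:
CREATE INDEX tbl_geog_idx ON tbl USING GIST ((ST_SetSRID(ST_MakePoint(longitude, latitude), 4326)::geography));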

Sticking to your setup (2-D points, no PostGIS), and ignoring the additional approximation error of handling spatial data in a 2-D plane (which seems negligible for the case at hand), I suggest a space-partitioning GiST (SP-GiST) index, on an expression in your case:

CREATE INDEX tbl_location_spgist_idx ON tbl USING SPGIST (point(longitude, latitude));

Why SP-GiST? Space-partitioning trees (like quadtrees) are a natural fit for point data.

To get a maximum of 10 "nearest neighbors" in next to no time:

SELECT *
FROM   tbl
ORDER  BY point (longitude, latitude)
      <-> point '(9.876543210987654321,1.234567890123456789)'
LIMIT  10;

You can then filter the ones close enough. To get a maximum of 10 closest within a maximum distance:

SELECT *, ((longitude - 9.876543210987654321) ^ 2
        + (latitude   - 1.234567890123456789) ^ 2) AS squared_dist
FROM   tbl
WHERE  ((longitude - 9.876543210987654321) ^ 2
      + (latitude  - 1.234567890123456789) ^ 2) < 0.000000001  -- or whatever
ORDER  BY point (longitude, latitude)
      <-> point '(9.876543210987654321,1.234567890123456789)'
LIMIT  10;

db<>fiddle here

I use a squared distance to get nearest neighbors based on the Pythagorean theorem; skipping the square root changes neither the ordering nor the comparison against a (squared) threshold. The beauty of it: the calculation is only performed on the nearest neighbors, so it stays very fast in big tables, and even when the calculation gets more expensive.
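
For a sense of scale: the squared cutoff of 0.000000001 corresponds to a radius of sqrt(0.000000001) ≈ 0.0000316 degrees, i.e. roughly 3.5 m of latitude (1° of latitude ≈ 111.32 km):

SELECT sqrt(0.000000001)          AS radius_degrees   -- ≈ 0.0000316
     , sqrt(0.000000001) * 111320 AS radius_meters;   -- ≈ 3.5 m (for latitude; a degree of longitude shrinks with cos(latitude))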

Erwin Brandstetter