SELECT operation with four joins results in bad performance

Question

I'm having problems with a slow query. The query purpose is to get doers ids for X job who didn't exceed specified limits on realizations and also are in range of possible job places.

Resources

Query:

  SELECT DISTINCT doers.id 
  FROM doers
  JOIN doer_locations dl 
      ON dl.doer_id = doers.id
  JOIN job_places jp 
      ON (jp.lat - 0.3147625620715557) < dl.lat 
     AND (jp.lat + 0.3147625620715557) > dl.lat 
     AND (jp.lng - 0.5001626620527362) < dl.lng 
     AND (jp.lng + 0.5001626620527362) > dl.lng
  LEFT JOIN job_realizations jr 
      ON jr.job_place_id = jp.id 
     AND jr.status IN (1, 2, 3, 4)
  LEFT JOIN job_realizations jrpd 
      ON jrpd.job_place_id = jp.id 
     AND jrpd.doer_id = doers.id 
     AND jrpd.status IN (1, 2, 3, 4)
  WHERE (jp.job_id = 1 AND doers.id IS NOT NULL)
  GROUP BY doers.id, jp.id
  HAVING COUNT(DISTINCT jr.id) < jp.realizations_per_place 
     AND COUNT(DISTINCT jrpd.id) < jp.realizations_per_place_per_doer

Depesz explain
Raw explain analyze
Simplified Schema

Consideration

I'm not sure if I read the explain correctly but it seems it loses on performance especially when it calculates stuff on the run also HAVING COUNT(DISTINCT) seems pretty expensive.

Additional information

The type of both the lat and long columns is float.

I compared plain JOIN with RIGHT one and the time it takes to execute both is the same. — mist, Oct 01 '18 at 12:36
You dont have indexes on the foreign keys. Please create btree indexes on the foreign keys used on your joins, and try again. Please post results, and please, just put the results of explain as plain text i think i can help you more if you do. — Luciano Andress Martini, Oct 01 '18 at 17:53
You could use gist indexes and if these are actual geo location points, you could have more accurate results with `ll_to_earth()` function (and gist indexes). See this answer: https://dba.stackexchange.com/questions/158349/how-can-i-speed-up-my-query-on-geo-location-processes/158422 — ypercubeᵀᴹ, Oct 02 '18 at 09:28
Joining "job_realizations" twice where one ON condition is logical prefix of the other ON condition seems very strange to me. It seems like this is just going to generate a lot of extra rows which will then have to be removed by your various distinct clauses. Can you explain more about what this is doing? — jjanes, Oct 02 '18 at 15:10
When you dropped the left joins, does the execution plan change? Your current execution plan is pretty hard to reason about intuitively, and a different plan, even though not faster, might help us gain some insight. — jjanes, Oct 02 '18 at 15:13
@jjanes i need to check 2 things, firstly I have to check if particular job_place didn't reach realizations limit (.realizations_per_place) and secondly I check if particular doer didn't exceed special limit (.realizations_per_place_per_doer). Basically it can look like this: job_place has limit 4 and limit per doer 2. If X doer has more than 2 realizations in such place it should ignore him, also when the place has more than 4 realizations (in total) it should ignore this place. — mist, Oct 03 '18 at 08:42

score 0 · Answer 1 · answered Oct 01 '18 at 19:34

    --I think this is mandatory for your query performance:
    --Because you do joins using this columns from parent to child 

    create index on doer_locations(doer_id);
    create index on job_realizations(job_place_id);
    create index on job_realizations(doer_id);

    --Maybe very big and slowdown other operations... 
    --create index on job_realizations(lat);
    --create index on doer_locations(lat); 
    --create index on job_realizations(lng);
    --create index on doer_locations(lng); 

    --Maybe not mandatory: 
    create index on job_realization(realizations_per_place);
    create index on job_realization(realizations_per_place_per_doer); 

SELECT DISTINCT doers.id FROM doers
    JOIN doer_locations dl ON dl.doer_id = doers.id
    JOIN job_places jp ON (jp.lat - 0.3147625620715557) < dl.lat AND (jp.lat + 0.3147625620715557) > dl.lat AND (jp.lng - 0.5001626620527362) < dl.lng AND (jp.lng + 0.5001626620527362) > dl.lng
    LEFT JOIN job_realizations jr ON jr.job_place_id = jp.id
    LEFT JOIN job_realizations jrpd ON jrpd.job_place_id = jp.id AND jrpd.doer_id = doers.id
    WHERE (jp.job_id = 1 AND doers.id IS NOT NULL)
    GROUP BY doers.id, jp.id
    HAVING COUNT(DISTINCT jr.id) < jp.realizations_per_place AND COUNT(DISTINCT jrpd.id) < jp.realizations_per_place_per_doer

Please try and if it solve your problem select it as the right answer.

As you could see in the explain I posted, I have tried with btree indexes on most of those attributes. I haven't tried with indexes on realizations_per_place / realizations_per_place_per_doer though. — mist, Oct 02 '18 at 07:40

SELECT operation with four joins results in bad performance

1 Answers1