1

I'm having problems with a slow query. The query purpose is to get doers ids for X job who didn't exceed specified limits on realizations and also are in range of possible job places.

Resources

  • Query:

      SELECT DISTINCT doers.id 
      FROM doers
      JOIN doer_locations dl 
          ON dl.doer_id = doers.id
      JOIN job_places jp 
          ON (jp.lat - 0.3147625620715557) < dl.lat 
         AND (jp.lat + 0.3147625620715557) > dl.lat 
         AND (jp.lng - 0.5001626620527362) < dl.lng 
         AND (jp.lng + 0.5001626620527362) > dl.lng
      LEFT JOIN job_realizations jr 
          ON jr.job_place_id = jp.id 
         AND jr.status IN (1, 2, 3, 4)
      LEFT JOIN job_realizations jrpd 
          ON jrpd.job_place_id = jp.id 
         AND jrpd.doer_id = doers.id 
         AND jrpd.status IN (1, 2, 3, 4)
      WHERE (jp.job_id = 1 AND doers.id IS NOT NULL)
      GROUP BY doers.id, jp.id
      HAVING COUNT(DISTINCT jr.id) < jp.realizations_per_place 
         AND COUNT(DISTINCT jrpd.id) < jp.realizations_per_place_per_doer
    
  • Depesz explain

  • Raw explain analyze

  • Simplified Schema

Consideration

I'm not sure if I read the explain correctly but it seems it loses on performance especially when it calculates stuff on the run also HAVING COUNT(DISTINCT) seems pretty expensive.

Additional information

The type of both the lat and long columns is float.

Lennart
  • 20,804
  • 2
  • 26
  • 61
mist
  • 29
  • 3
  • I compared plain JOIN with RIGHT one and the time it takes to execute both is the same. – mist Oct 01 '18 at 12:36
  • You dont have indexes on the foreign keys. Please create btree indexes on the foreign keys used on your joins, and try again. Please post results, and please, just put the results of explain as plain text i think i can help you more if you do. – Luciano Andress Martini Oct 01 '18 at 17:53
  • 1
    What are the types of `lat` and `long`? – ypercubeᵀᴹ Oct 02 '18 at 08:38
  • @ypercubeᵀᴹ these are floats. – mist Oct 02 '18 at 09:20
  • 3
    You could use gist indexes and if these are actual geo location points, you could have more accurate results with `ll_to_earth()` function (and gist indexes). See this answer: https://dba.stackexchange.com/questions/158349/how-can-i-speed-up-my-query-on-geo-location-processes/158422 – ypercubeᵀᴹ Oct 02 '18 at 09:28
  • Joining "job_realizations" twice where one ON condition is logical prefix of the other ON condition seems very strange to me. It seems like this is just going to generate a lot of extra rows which will then have to be removed by your various distinct clauses. Can you explain more about what this is doing? – jjanes Oct 02 '18 at 15:10
  • When you dropped the left joins, does the execution plan change? Your current execution plan is pretty hard to reason about intuitively, and a different plan, even though not faster, might help us gain some insight. – jjanes Oct 02 '18 at 15:13
  • @jjanes i need to check 2 things, firstly I have to check if particular job_place didn't reach realizations limit (.realizations_per_place) and secondly I check if particular doer didn't exceed special limit (.realizations_per_place_per_doer). Basically it can look like this: job_place has limit 4 and limit per doer 2. If X doer has more than 2 realizations in such place it should ignore him, also when the place has more than 4 realizations (in total) it should ignore this place. – mist Oct 03 '18 at 08:42

1 Answers1

0
    --I think this is mandatory for your query performance:
    --Because you do joins using this columns from parent to child 

    create index on doer_locations(doer_id);
    create index on job_realizations(job_place_id);
    create index on job_realizations(doer_id);

    --Maybe very big and slowdown other operations... 
    --create index on job_realizations(lat);
    --create index on doer_locations(lat); 
    --create index on job_realizations(lng);
    --create index on doer_locations(lng); 

    --Maybe not mandatory: 
    create index on job_realization(realizations_per_place);
    create index on job_realization(realizations_per_place_per_doer); 

SELECT DISTINCT doers.id FROM doers
    JOIN doer_locations dl ON dl.doer_id = doers.id
    JOIN job_places jp ON (jp.lat - 0.3147625620715557) < dl.lat AND (jp.lat + 0.3147625620715557) > dl.lat AND (jp.lng - 0.5001626620527362) < dl.lng AND (jp.lng + 0.5001626620527362) > dl.lng
    LEFT JOIN job_realizations jr ON jr.job_place_id = jp.id
    LEFT JOIN job_realizations jrpd ON jrpd.job_place_id = jp.id AND jrpd.doer_id = doers.id
    WHERE (jp.job_id = 1 AND doers.id IS NOT NULL)
    GROUP BY doers.id, jp.id
    HAVING COUNT(DISTINCT jr.id) < jp.realizations_per_place AND COUNT(DISTINCT jrpd.id) < jp.realizations_per_place_per_doer

Please try and if it solve your problem select it as the right answer.

Luciano Andress Martini
  • 1,317
  • 1
  • 11
  • 25
  • As you could see in the explain I posted, I have tried with btree indexes on most of those attributes. I haven't tried with indexes on realizations_per_place / realizations_per_place_per_doer though. – mist Oct 02 '18 at 07:40