Disclaimer: I am relatively new to PostgreSQL.
I'm wondering how to optimize a query that does 2 INNER JOINs. My scenario is fairly simple:
Select Posts with a photo (Posts.photo IS NOT NULL) and a Hashtag that has the name 'dead' (Hashtags.name = 'dead').
The associations are as follows:
Posts <- PostHashtags -> Hashtags
Posts.id = PostHashtags.postId (FK)
Hashtags.id = PostHashtags.hashtagId (FK)
Here is the query:
SELECT
"Posts".*,
"hashtags"."id" AS "hashtags.id",
"hashtags"."count" AS "hashtags.count",
"hashtags"."name" AS "hashtags.name",
"hashtags"."createdAt" AS "hashtags.createdAt",
"hashtags"."updatedAt" AS "hashtags.updatedAt",
"hashtags"."objectId" AS "hashtags.objectId",
"hashtags"."_etl" AS "hashtags._etl",
"hashtags.PostHashtag"."id" AS "hashtags.PostHashtag.id",
"hashtags.PostHashtag"."createdAt" AS "hashtags.PostHashtag.createdAt",
"hashtags.PostHashtag"."updatedAt" AS "hashtags.PostHashtag.updatedAt",
"hashtags.PostHashtag"."postId" AS "hashtags.PostHashtag.postId",
"hashtags.PostHashtag"."hashtagId" AS "hashtags.PostHashtag.hashtagId",
"hashtags.PostHashtag"."objectId" AS "hashtags.PostHashtag.objectId",
"hashtags.PostHashtag"."_etl" AS "hashtags.PostHashtag._etl"
FROM (
SELECT
"Posts"."id",
"Posts"."note",
"Posts"."photo",
"Posts"."createdAt",
"user"."id" AS "user.id",
"user"."name" AS "user.name"
FROM "Posts" AS "Posts"
INNER JOIN "Users" AS "user" ON "Posts"."userId" = "user"."id"
WHERE "Posts"."photo" IS NOT NULL
AND (
SELECT "PostHashtags"."id" FROM "PostHashtags" AS "PostHashtags"
INNER JOIN "Hashtags" AS "Hashtag" ON "PostHashtags"."hashtagId" = "Hashtag"."id"
WHERE "Posts"."id" = "PostHashtags"."postId"
LIMIT 1
) IS NOT NULL
ORDER BY "Posts"."createdAt" DESC LIMIT 10
) AS "Posts"
INNER JOIN (
"PostHashtags" AS "hashtags.PostHashtag"
INNER JOIN "Hashtags" AS "hashtags" ON "hashtags"."id" = "hashtags.PostHashtag"."hashtagId"
)
ON "Posts"."id" = "hashtags.PostHashtag"."postId"
AND "hashtags"."name" = 'dead'
ORDER BY "Posts"."createdAt" DESC;
EXPLAIN results:
Nested Loop (cost=886222912.89..886223769.55 rows=1 width=277)
Join Filter: ("hashtags.PostHashtag"."postId" = "Posts".id)
-> Limit (cost=886220835.39..886220835.42 rows=10 width=189)
-> Sort (cost=886220835.39..886220988.88 rows=61394 width=189)
Sort Key: "Posts"."createdAt"
-> Nested Loop (cost=0.42..886219508.69 rows=61394 width=189)
-> Seq Scan on "Posts" (cost=0.00..885867917.51 rows=78196 width=177)
Filter: ((photo IS NOT NULL) AND ((SubPlan 1) IS NOT NULL))
SubPlan 1
-> Limit (cost=0.42..815.70 rows=1 width=4)
-> Nested Loop (cost=0.42..815.70 rows=1 width=4)
-> Seq Scan on "PostHashtags" (cost=0.00..811.25 rows=1 width=8)
Filter: ("Posts".id = "postId")
-> Index Only Scan using "Hashtags_pkey" on "Hashtags" "Hashtag" (cost=0.42..4.44 rows=1 width=4)
Index Cond: (id = "PostHashtags"."hashtagId")
-> Index Scan using "Users_pkey" on "Users" "user" (cost=0.42..4.49 rows=1 width=16)
Index Cond: (id = "Posts"."userId")
-> Materialize (cost=2077.50..2933.89 rows=1 width=88)
-> Hash Join (cost=2077.50..2933.89 rows=1 width=88)
Hash Cond: ("hashtags.PostHashtag"."hashtagId" = hashtags.id)
-> Seq Scan on "PostHashtags" "hashtags.PostHashtag" (cost=0.00..721.00 rows=36100 width=40)
-> Hash (cost=2077.49..2077.49 rows=1 width=48)
-> Seq Scan on "Hashtags" hashtags (cost=0.00..2077.49 rows=1 width=48)
Filter: ((name)::text = 'dead'::text)
This query has been simplified slightly. It also performs OUTER JOINS on other data related to Posts, which is why the SELECT must be performed on Posts instead of, say, PostHashtags.
Any help in translating the EXPLAIN to a useful index would be greatly appreciated.
My ideas:
- Build an index on
Posts.photo, but should it be a partial indexWHERE "photo" IS NOT NULL? - Build a
UNIQUEindex onHashtags.name.
I'm not sure if those are necessarily the bottlenecks, though.