Questions tagged [optimization]

In the context of a database, optimisation refers to the process of the query optimiser selecting an efficient physical execution plan.

SQL is a declarative language, so the actual physical operations selected to fulfil a query are not under the direct control of the person writing the query. The database management system will have the option of a number of semantically equivalent strategies to execute the query. Most database management systems have a query optimiser that generates the set (or a subset) of possible execution plans and selects the best one.

A database query plan consists of a tree structure of operations that process data to produce the final result. For any given query, there may be multiple query plans that are semantically equivalent (i.e. will return the same final result) but vary in their structure and performance. A query plan can also be transformed algorithmically into other semantically equivalent query plans.

For example, a predicate (filter) could be applied at various points in a query. It may be quicker to apply it early, reducing the amount of data to be processed by the rest of a complex query. The query optimiser can apply transformations to push the query predicate down toward the leaves of the plan so it is evaluated earlier, which avoids subsequent operations having to process unnecessary data.

Other optimisations possible include generating intermediate results and post-processing them (sometimes known as spool operations), altering the order that tables are joined, and selecting the algorithm used for the join. For example, a hash join is efficient for matching a larger table against a smaller one but has a startup overhead in generating the hash table. A nested loos join is efficient for a query that processes a small amount of data as it has little startup overhead. A merge join processes data sequentially, so it is good for processing multiple large data sets if they can be sorted on the key used for the join.

Most DBMS platforms use a technique called 'cost based query optimisation' that works by joining a large number of query plans and using statistics about data volumes and distributions of key values to estimate a cost metric for the candidate plans. The cheapest plan is then selected and executed. Cost based optimisation is heuristic, and for various reasons a cost based optimiser can produce suboptimal plans. This can necessitate tuning work on the database if the suboptimal plans cause performance issues.

Other optimisation strategies include 'rule based', where a set of transformation rules are applied to query plans where specific patterns are found in the query plan. PostgreSQL has an unusual optimiser based on a genetic algorithm that evolves optimal plans by mutating query plans over time and retaining successful mutations.

1879 questions
136
votes
4 answers

Optimizing queries on a range of timestamps (two columns)

I use PostgreSQL 9.1 on Ubuntu 12.04. I need to select records inside a range of time: my table time_limits has two timestamp fields and one integer property. There are additional columns in my actual table that are not involved with this…
Stephane Rolland
  • 7,003
  • 8
  • 27
  • 39
120
votes
7 answers

How to determine if an Index is required or necessary

I've been running an auto-index tool on our MS SQL database (I modified a script originating from Microsoft that looks at the index statistics tables - Automated Auto Indexing). From the stats, I now have a list of recommendations for indexes that…
misterjaytee
  • 1,303
  • 3
  • 11
  • 8
75
votes
4 answers

Are views harmful for performance in PostgreSQL?

The following is an excerpt from a book about db design (Beginning Database Design ISBN: 0-7645-7490-6): The danger with using views is filtering a query against a view, expecting to read a very small portion of a very large table. Any filtering…
ARX
  • 1,057
  • 2
  • 9
  • 11
65
votes
3 answers

When to use views in MySQL?

When creating tables from multiple joins for use in analysis, when is it preferred to use views versus creating a new table? One reason that I would prefer to use views is that the database schema has been developed by our administrator from within…
David LeBauer
  • 3,002
  • 8
  • 28
  • 33
57
votes
2 answers

Optimizing a Postgres query with a large IN

This query gets a list of posts created by people you follow. You can follow an unlimited number of people, but most people follow < 1000 others. With this style of query, the obvious optimization would be to cache the "Post" ids, but unfortunately…
40
votes
3 answers

Unexpected scans during delete operation using WHERE IN

I've got a query like the following: DELETE FROM tblFEStatsBrowsers WHERE BrowserID NOT IN ( SELECT DISTINCT BrowserID FROM tblFEStatsPaperHits WITH (NOLOCK) WHERE BrowserID IS NOT NULL ) tblFEStatsBrowsers has got 553 rows. tblFEStatsPaperHits…
Mark S. Rasmussen
  • 1,465
  • 1
  • 14
  • 20
40
votes
2 answers

How to partition existing table in postgres?

I would like to partition a table with 1M+ rows by date range. How is this commonly done without requiring much downtime or risking losing data? Here are the strategies I am considering, but open to suggestions: The existing table is the master and…
Evan Appleby
  • 1,023
  • 1
  • 9
  • 14
38
votes
5 answers

Logical operators OR AND in condition and order of conditions in WHERE

Let's examine these two statements: IF (CONDITION 1) OR (CONDITION 2) ... IF (CONDITION 3) AND (CONDITION 4) ... If CONDITION 1 is TRUE, will CONDITION 2 be checked? If CONDITION 3 is FALSE, will CONDITION 4 be checked? What about conditions on…
garik
  • 6,492
  • 10
  • 41
  • 56
38
votes
1 answer

USING construct in JOIN clause can introduce optimization barriers in certain cases?

It was brought to my attention that the USING construct (instead of ON) in the FROM clause of SELECT queries might introduce optimization barriers in certain cases. I mean this key word: SELECT * FROM a JOIN b USING (a_id) Just in more complex…
Erwin Brandstetter
  • 146,376
  • 19
  • 347
  • 493
36
votes
2 answers

How (and why) does TOP impact an execution plan?

For a moderately complex query I am trying to optimize, I noticed that removing the TOP n clause changes the execution plan. I would have guessed that when a query includes TOP n the database engine would run the query ignoring the the TOP clause,…
30
votes
3 answers

How to speed up select distinct?

I have a simple select distinct on some time series data: SELECT DISTINCT user_id FROM events WHERE project_id = 6 AND time > '2015-01-11 8:00:00' AND time < '2015-02-10 8:00:00'; And it takes 112 seconds. Here's the query…
28
votes
5 answers

Why does my SELECT DISTINCT TOP N query scan the entire table?

I've run into a few SELECT DISTINCT TOP N queries which appear to be poorly optimized by the SQL Server query optimizer. Let's start by considering a trivial example: a million row table with two alternating values. I'll use the GetNums function to…
Joe Obbish
  • 29,978
  • 4
  • 56
  • 131
28
votes
4 answers

Why does the estimated cost of (the same) 1000 seeks on a unique index differ in these plans?

In the queries below both execution plans are estimated to perform 1,000 seeks on a unique index. The seeks are driven by an ordered scan on the same source table so seemingly should end up seeking the same values in the same order. Both nested…
Martin Smith
  • 77,689
  • 15
  • 224
  • 316
26
votes
1 answer

EXPLAIN ANALYZE shows no details for queries inside a plpgsql function

I am using a PL/pgSQL function in PostgreSQL 9.3 with several complex queries inside: create function f1() returns integer as $$ declare event tablename%ROWTYPE; .... .... begin FOR event IN SELECT * FROM tablename WHERE condition LOOP …
skumar
  • 271
  • 1
  • 3
  • 7
26
votes
3 answers

SQL Server splits A <> B into A < B OR A > B, yielding strange results if B is non-deterministic

We have encountered an interesting issue with SQL Server. Consider the following repro example: CREATE TABLE #test (s_guid uniqueidentifier PRIMARY KEY); INSERT INTO #test (s_guid) VALUES ('7E28EFF8-A80A-45E4-BFE0-C13989D69618'); SELECT s_guid FROM…
Heinzi
  • 3,075
  • 2
  • 24
  • 38
1
2 3
99 100