
I have a simple SQL query: SELECT * FROM table; - a simple sequential scan - which takes 10s.

When I add CREATE TABLE AS (the CTAS syntax), i.e. my query becomes CREATE TABLE db_test AS SELECT * FROM table;, it now takes 18s.

I could not see any notable difference between the plans.

What steps can I take to minimize this 8s overhead? I have already tried the UNLOGGED keyword.
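
For reference, this is the UNLOGGED variant I tried (a sketch; source_table stands in for my real table name):

```sql
-- UNLOGGED skips WAL for the new table's data, at the cost of the
-- table being truncated after a crash (source_table is a placeholder).
CREATE UNLOGGED TABLE db_test AS
SELECT * FROM source_table;
```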

Zeruno
  • How big is your table? Is the `select` returning all the rows, or is it going to a paginator and you cancel the query? – Colin 't Hart Mar 18 '20 at 12:01
  • It has 6m rows. I am not using paging; my query is literally as simple as mentioned. – Zeruno Mar 18 '20 at 12:05
  • Well, writing all those rows takes some time. There is no way you can avoid that if you really need to copy all the rows. – a_horse_with_no_name Mar 18 '20 at 12:06
  • @a_horse_with_no_name yeah... I just thought there would be some parameter or technique to help improve the process or maybe even delay writing to disk, before me having to resort to better hardware. I guess not? – Zeruno Mar 18 '20 at 12:08

1 Answer


I could not see any notable difference between the plans.

That's because there isn't any.
Table Scanning is Table Scanning. As Lewis Carroll's King of Hearts said:

Begin at the beginning and go on till you come to the end: then stop.

The difference is in what's being done with the data:

  • reading the data

... takes 10 seconds.

  • Creating a new table based on the structure of the first table,
  • reading the data, and
  • writing the data into the new table

... takes 18 seconds.

Spotted the difference?
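
You can see where the time goes by timing both statements; EXPLAIN (ANALYZE) accepts CREATE TABLE AS as well as a plain SELECT (a sketch; big_table is a placeholder name, and note that ANALYZE really executes the statement, so the table does get created):

```sql
-- Time the read on its own.
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM big_table;

-- Time the read plus the write into the new table.
EXPLAIN (ANALYZE) CREATE TABLE db_test AS SELECT * FROM big_table;
```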

Reading stuff in a database is [usually] quick. It's often just floating around in the Buffer Cache and can be quickly sent on its way.

Writing stuff in a database is slower. It [usually] requires writing to the Transaction Log (the WAL, in PostgreSQL) first, then writing to the Buffer Cache and flushing out to disk (at some point). It all takes time.
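
If you can tolerate reduced durability, PostgreSQL has a few knobs on the write side (a sketch, not a definitive tuning guide; the table names are placeholders):

```sql
-- Don't wait for the WAL flush at commit time. A crash can lose
-- the last few transactions, but never corrupts the database.
SET synchronous_commit = off;

-- UNLOGGED skips WAL for the table's data entirely (as already
-- tried in the question); the table is truncated after a crash.
CREATE UNLOGGED TABLE db_test AS
SELECT * FROM big_table;
```

With the server-level setting wal_level = minimal (restart required, and incompatible with WAL archiving and replication), CREATE TABLE AS can skip WAL even for an ordinary table, because the new table is created in the same transaction that fills it.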

Consider:

  • Why are you creating a copy of the table at all? If the second table contains everything in the first one, then why not just use the first one?

  • Do you really want all the rows? If you're reading everything, then there's not much you can do to speed things up. No amount of indexing can speed up a sequential Table Scan.

  • Do you really want all the columns? If you're only interested in three little columns but are retrieving a hundred columns, full of massive BLOB data, then things will get a bit slow (see the sketch after this list).
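
For example, if only a few columns matter, copy just those instead of using SELECT * (a sketch with hypothetical column names):

```sql
-- Hypothetical column names: copy only what you actually need.
CREATE TABLE db_test AS
SELECT id, name, created_at
FROM big_table;
```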

Phill W.
  • Of course, introducing more work and writing to disk in particular explains the difference! I have considered your suggestions, but I am interested in making the query go faster in its "raw" fashion, too, so to speak. For example, I would be interested in creating memory tables to avoid disk usage/rely less on writing to disk (my use case supports it), tweaking some parameter, compression... what kind of options are available in this domain and within PostgreSQL, if at all? Getting new hardware is a last resort. – Zeruno Mar 18 '20 at 15:43
  • Just to clarify, I have already considered "logical" optimizations; now I am considering something more practical. – Zeruno Mar 18 '20 at 15:50
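
A sketch of the "memory table" direction raised in the comments: temporary tables are never WAL-logged and sit in session-local temp_buffers memory until they outgrow it (the 256MB value is illustrative):

```sql
-- temp_buffers must be set before the session's first use of a
-- temporary table; 256MB is an illustrative value.
SET temp_buffers = '256MB';

-- Temporary tables skip WAL and are private to this session.
CREATE TEMP TABLE db_test_tmp AS
SELECT * FROM big_table;
```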