Questions tagged [data-warehouse]

A database system optimised for reporting, particularly in aggregate. Often, but not always implemented using a star schema.

A Data Warehouse is a specialised database system that is optimised for reporting or at least easy extraction of data. One important point is that the data in a data warehouse is usually loaded from external systems, and the database design has significant differences to those used for transactional systems. Data warehouse systems often have several characteristic features:

  • Use of star schemas with central fact tables joining to dimension tables. These facilitate fast aggregate reporting and simple query plans. Sometimes other designs such as snowflake schemas or relatively normalised operational data store databases are employed.

  • Storage of historical data - often data warehouse systems are used for analytical queries that examine historical data, or trends in aggregate figures over time.

  • ETL processes to load the data from external sources. Data warehouse systems often function to aggregate data from multiple sources.

  • Conformed data - data from multiple sources is often transformed into a common format, allowing data from multiple sources to be queried at the same time.

409 questions
28
votes
5 answers

What are some ways to implement a many-to-many relationship in a data warehouse?

The dominant topologies of Data Warehouse modelling (Star, Snowflake) are designed with one-to-many relationships in mind. Query readability, performance, and structure degrades severely when faced with a many-to-many relationship in these modelling…
Brian Ballsun-Stanton
  • 4,683
  • 2
  • 27
  • 36
23
votes
3 answers

What are the arguments in favor of using ELT process over ETL?

I realized that my company uses an ELT (extract-load-transform) process instead of using an ETL (extract-transform-load) process. What are the differences in the two approaches and in which situations would one be "better" than the other? It would…
What'sUP
  • 783
  • 2
  • 6
  • 16
21
votes
4 answers

Compare two similar Postgres databases for differences

I occasionally download publicly available data sets in the form of Postgres dBs. These datasets are updated/modified/expanded over time by the repository host. Is there a Postgres command or tool (ideally FOSS) that can show the differences…
CuriousGorge
  • 221
  • 1
  • 2
  • 4
20
votes
3 answers

Clustered columnstore indexes and foreign keys

I am performance tuning a data warehouse using indexes. I am fairly new to SQL Server 2014.Microsoft describes the following: "We view the clustered columnstore index as the standard for storing large data warehousing fact tables, and expect it…
19
votes
2 answers

Difference between star schema and data cube?

I am involved in a new project, where I have to create data cube from the existing relational database system. I understood that, the existing system is not properly designed, I am not sure where to start. My question are: What is difference…
Rathish Kumar B
  • 2,006
  • 5
  • 19
  • 34
17
votes
2 answers

Open Source Business Intelligence/DWH solutions

I wonder that this question hasn't already been asked. Google only has very few results for me that don't show a high quality tool What are some Open Source (also free is ok) solutions for Data Warehouses and more specifically Business Intelligence…
17
votes
1 answer

Query strategies using SQL Server 2016 system-versioned temporal tables for Slowly-Changing Dimensions

When using a system-versioned temporal table (new in SQL Server 2016), what are the query authoring and performance implications when this feature is used to handle Slowly Changing Dimensions in a large relational data warehouse? For example, assume…
14
votes
2 answers

Handling time zones in data mart/warehouse

We are starting to design the building blocks of a data mart/warehouse and we need to be able to support all time zones (our clients are from all over the world). From reading discussions online (and in books), a common solution seems to be to have…
13
votes
2 answers

Alternative to EAV for dynamic fields in a star schema data warehouse

I need to support dynamic fields and values in a big datawarehouse for storing API requests log, my user case is that I need to store all API requests query string and able to perform query against them in the future (so it is not just storage, so I…
Howard
  • 249
  • 1
  • 5
  • 10
13
votes
2 answers

ETL: extracting from 200 tables - SSIS data flow or custom T-SQL?

Based on my analysis, a complete dimensional model for our data warehouse will require extraction from over 200 source tables. Some of these tables will be extracted as part of an incremental load and others will be a full load. To note, we have…
8kb
  • 2,539
  • 2
  • 27
  • 35
12
votes
1 answer

Should I disable "auto update statistics" in a data warehousing scenario?

I have 200 GB data warehouse in SQL Server. I have been experiencing really slow execution times for some queries; for example 12 hours for a simple delete query with an inner join. After doing some research with the execution plans, I've updated…
saso
  • 223
  • 2
  • 6
11
votes
1 answer

Limit the degree of parallelism (DOP) available to any query

On Oracle Exadata (11gR2), we have a relatively beefy database. cpu_count is 24 parallel_server_instances is 2 parallel_threads_per_cpu is 2 We noted, through observation in Oracle Enterprise Manager (OEM), that performance was terrible due to…
grenade
  • 211
  • 1
  • 2
  • 6
11
votes
3 answers

Is SQL Server data compression categorically good for read-only databases?

Some literature on SQL Server data compression I read state that the write cost increases to about four times what would normally be required. It also seems to imply that this is the primary downside to data compression, strongly implying that for…
孔夫子
  • 4,258
  • 3
  • 26
  • 49
11
votes
1 answer

How much RAM should I get for a cloud-hosted PostgreSQL data warehouse?

I'm looking at migrating a current PostgreSQL data warehouse to a cloud host with SSD storage and RAM as one of the main sizing variables. The bulkiest data we're dealing with at the moment will live on monthly partitioned tables. Each month is…
raphael
  • 469
  • 1
  • 4
  • 17
11
votes
2 answers

PostgreSQL for high volume transactions and for Data warehousing

Am quite new to PostgreSQL, I have never done a large deployment using it before. But, I have good experience in enterprise solutions and I want to try and apply some of what I learned using PostgreSQL. I have site which is sized to handle large…
Mo J. Mughrabi
  • 211
  • 1
  • 2
  • 4
1
2 3
27 28