Data Warehouse Architecture
This posting explains the approach to designing a warehouse
and what
components to actually arrive at an optimal design. The logical aspect
of the design is discussed here (physical architecture will be
discussed in subsequent posting).
Now, there are tables in OLTP that satisfy the need for faster processing. And there are tables to be put in Data Warehouse that will hold the same data being passed on from Online systems (either raw or aggregated).
Do Tables in Data Warehouse follow Relational concepts?
Yes Data Warehousing designs also follow relation concepts and ensure that primary and foreign key relations exist between the tables. The only difference here is that the design should be focused on query and analysis rather than transaction processing. This would mean that in some situation you may actually be de-normalizing the data for faster and better query analysis.
The Dimension and Fact tables will exist with proper Primary-Foreign key links, but there is every possibility of duplicate data existing in the Dimension to faciliate better reporting.
My recommendation is as much as possible, stick to relation concept and design a Star schema architecture. Further explaination follows.
Star Schema concepts
The most natural way to model a data warehouse is using a star schema methodology. The idea is to make use of one (or minimal) join to establish relationship between the data to be searched and the Criteria that is provided.As the name sounds, you are creating a Star shaped architecture. At the center of the star is the actual transactional data, i.e. the fact table. This data can only be accessed using various other tables that give meaning to the data, i.e. the Lookup or Dimensional tables.

So a Star schema will have one Fact and multiple Dimension tables. You could land up with a fact constellation schema, i.e., a star schema with multiple fact and dimension tables. You could also create a Snowflake schema with a higher level of normalization. I strongly recommend designs to stick to a Star schema in their basic form.
Dimension Tables (also called Lookup tables)
Dimension tables are the search criteria tables that categorize and give meaning to the data stored in Fact. Dimensions can be referred to as the "header" tables that allow the data to be filtered and grouped. When mapping an Online system, most of the master data will be grouped to form dimensions. The business intelligence tools will provide search conditions that are validated against the dimensions and based on which data will be retrieved from Fact table.Dimensions define the domain in which, each attribute when rolled up will provide the value for the next higher attribute.
For example, lets take a look at a Time or Period dimension (this dimension is considered mandatory for all time based transaction star schemas). The following are the columns of a Period dimension I have in one of the systems.
Data Warehouse Period Dimension Table
PERIOD_ID (PK)
PERIOD_NAME
PERIOD_DATE
PERIOD_DAY
PERIOD_WEEK
PERIOD_MONTH
PERIOD_MONTH_NAME
PERIOD_QUARTER
PERIOD_SEMISTER
PERIOD_YEAR
PERIOD_FINYER
The lowest level available for query in this dimension is a Day. This design does not support hourly reporting, as it was not required. Now the Daily totals can be rolled-up or aggregated to get the Week totals or the Month totals. Month totals can be rolled-up to get the Quarterly and so on. If the Financial year is from Apr to Mar, then the appropriate months can be aggregated to get the Financial period totals.
------------
| FACT TABLE | <- DATE |<-- WEEK
------------ |
|<-- MONTH -- QUARTER -- SEMISTER -- YEAR
|
|<-- FINANCIAL YEAR
If you imagine this in 3-dimension model, you basically land up presenting the Fact table data in multi-dimensional cubes. For the time being, remember that Dimensions should have the capability to rollup data to the next applicable entity within the dimension.
FACT Tables
Fact table holds the computational or numerical data in either aggregate or raw form (depends on data granular level, discussed later). Fact table will have columns that will represent some form of transaction values from online systems. This table will primarily have numeric columns. Data in fact table is always queried using one or more Dimensions. Fact can be referred to as the "Lines" table that will hold all possible summed up or raw data that the Dimensions can relate to.Refer below diagram. CUST_TRX_FACT is the Fact table that holds information such as the quantity Sold and returned, and the line value. The other columns in this table are the unique identifiers that are referencing dimension table primary key.
The PERIOD_DIM dimension is used to query the Fact data based on time factor. CUSTOMER_DIM dimension allows query of Fact based on customer data. ITEMS_DIM dimension can allow view of Fact data based on item attributes. All dimensions refer the same fact table for fetching data.
The PERIOD_ID, CUST_ID and ITEM_ID are surrogate primary key columns that uniquely identify a record in the dimensions, even if duplicate data exists. More on surrogate key is discussed next.

Here is a table that identifies the attributes of Fact and Dimension tables.
| Property | Dimension | Fact |
| Purpose | Hold hierarchies that categories the fact table data. | Holds the computational or number data in aggregate or raw form. |
| Meaning | Each dimension table should be meaningful and represents a query criteria | Fact tables are not meaningful without proper dimensions in place. |
| Table outlook | The tables are often short and fat. | The tables are often long and skinny. |
| Normalization | Dimension tables are often de-normalized. | Fact tables are generally in third normal form. |
| Role | Dimensions act like "headers" that are referred to read the Fact data. Dimensions qualify Fact. | Fact act like "Lines" that hold computational values and metrics referred to by the dimensions. |
| Primay Key | Will have a surrogate key as primary key (discussed later). | In general, combination of all foreign keys in the fact table will form a primary key. There may be exception to this depending on the raw data. |
| Growth | Dimension tables are relatively shorter and grow slowly. | Fact tables will keep on growing. |
Primary Key: Business Keys and Surrogate keys
Data Warehousing has popularized use of Surrogate key and in affect, I have seen this feature being used in OLTP systems also. The idea is to have an internal number column as Primary Key that uniquely identifies the records.The word 'Surrogate' means to appoint a deputy or to act for another. Online systems will have a business entity as the primary key that is either system generated or user defined, and visible to the user for normal transaction usage. This business key is carried to Data Warehouse dimensions as standard columns and an internal running sequence is used as a Primary Key.
OLTP Customer Master Table
CUST_NO (PK)
CUST_NAME
CUST_ADDRESS
Data Warehouse Customer Dimension Table
CUST_ID (PK)
CUST_NO
CUST_NAME
CUST_ADDRESS
As I mentioned, Surrogate key is also getting incorporated into OLTP designs. That said, OLTP Surrogate keys should not be considered as Primary key for Data Warehouse, they still are different entities.
An example of surrogate keys being used in OLTP is Oracle e-business suite system. This sofware uses internally generated sequence as a surrogate key that act as internally maintained primary key. In addition, Flex-fields are used to define the business primary key and are visible to users.
Surrogate keys should be maintained automatically by the system and no manual intervention should be involved. In Oracle, a sequence object can be created to provide running keys for every new dimension records. Every dimension should have its own dedicated sequence. During the ETL process, a sequence no. can be utilized for every insert into a dimension.
Previous Note link: Basics of Data Warehousing
Next recommended Note: Important factors in designing a Data Warehouse
This page was created on 09-Jun-09. Last updated on 09-Jun-09.
please forward all queries to amar@amar-padhi.com