*********************************************************************************************
Slowly Changing Dimensions
Slowly changing dimensions, referred as SCD, can be modeled
basically in 3 different ways based on whether we want to store full histories,
partial histories or no history. These different types are called Type 2, Type
3 and Type 1 respectively.
The "Slowly Changing
Dimension" problem is a common one particular to data warehousing. In a
nutshell, this applies to cases where the attribute for a record varies over
time. For example: Consider table Customer. Prasad is a name of customer. He first lived in Mumbai. So, the original
entry in the customer lookup table has the following record:
Customer key
|
Name
|
State
|
1
|
Prasad
|
Mumbai
|
Later on, he moved to
Pune, on January, 2014. Now how will you modify its customer table to reflect
this change? This is the "Slowly Changing Dimension" problem.
Generally there are
three ways to solve this type of problem, and they are categorized as follows:
SCD Type 1: The new
record replaces the original record. No HISTORY is maintained.
SCD Type 2: A new
record is added into the table, therefore both old and new record exist.
SCD Type 3: It stores partial history; In Type 2, whenever
the values of the attributes change, we insert new rows to the table. In case
of type 3, we add new column to the table to store the history.
We will see all SCD
types one by one as below:
SCD Type - 1
In SCD type - 1, the new
information simply overwrites the original information. In other words, no
history is kept. For example the
following table
Customer key
|
Name
|
State
|
1
|
Prasad
|
Mumbai
|
After Prasad moved from Mumbai to Pune, the new information
replaces the new record, and we have the following table:
Customer key
|
Name
|
State
|
1
|
Prasad
|
Pune
|
Advantages:
This is the easiest way to handle
the Slowly Changing Dimension problem, since there is no need to keep track of
the old information.
Disadvantages:
All history is lost. By using
this method, it is not possible to trace back in history.
Usage:
About 40% to 50% of the time.
Where to use Type 1:
If it is not necessary for the data warehouse to
keep track of historical changes.
SCD Type - 2
In SCD type - 2, a new record is added to the table to represent
the new information. Therefore, both the original and the new record will be
present. The new record gets its own primary key. Latest record can be tracked
by various ways.
In above example given in above type will become
Customer key
|
Name
|
State
|
1
|
Prasad
|
Mumbai
|
After Prasad moved from Mumbai to Pune, we add the new information as a
new row into the table
Customer key
|
Name
|
State
|
1
|
Prasad
|
Mumbai
|
3
|
Prasad
|
Pune
|
Latest record can be tracked by
various ways.
1) Effective and End date concept:
Customer key
|
Name
|
State
|
Eff_dt
|
End_Dt
|
1
|
Prasad
|
Mumbai
|
02-Dec
|
NULL
|
2
|
Nitin
|
Nagpur
|
02-Dec
|
NULL
|
Customer key
|
Name
|
State
|
Eff_dt
|
End_Dt
|
1
|
Prasad
|
Mumbai
|
02-Dec
|
07-Jan
|
3
|
Prasad
|
Pune
|
07-Jan
|
NULL
|
2
|
Nitin
|
Nagpur
|
02-Dec
|
NULL
|
2) Versioning:
We can save the history by using
the version method also as below
Customer
key
|
Name
|
State
|
Version
|
1
|
Prasad
|
Mumbai
|
1
|
2
|
Nitin
|
Nagpur
|
1
|
Customer key
|
Name
|
State
|
Version
|
1
|
Prasad
|
Mumbai
|
1
|
3
|
Prasad
|
Pune
|
2
|
2
|
Nitin
|
Nagpur
|
1
|
Advantages:
This allows us to accurately keep all historical information.
Disadvantages:
This will cause the size of the
table to grow fast. In cases where the number of rows for the table is very
high to start with, storage and performance can become a concern.
This necessarily complicates the ETL process.
Usage:
About 50%-60% of the time.
When to use Type 2:
If it is necessary for the data
warehouse to track historical changes.
SCD Type - 3
In SCD type - 3, there will be two columns to indicate the
particular attribute of interest, one indicating the original value, and one
indicating the current value. There will also be a column that indicates when
the current value becomes active. Type 3 design is used to store partial
history.
In Type 2 design. whenever the values of the attributes change, we
insert new rows to the table. In case of type 3, however, we add new column to
the table to store the history.
For example suppose Prasad lived in Mumbai then shifted to Pune
then table will be maintained a below:
Customer key
|
Name
|
State
|
1
|
Prasad
|
Mumbai
|
2
|
Nitin
|
Nagpur
|
Customer key
|
Name
|
State
|
State_Old
|
1
|
Prasad
|
Pune
|
Mumbai
|
2
|
Nitin
|
Nagpur
|
Prasad again shifted to Kolhapur from Pune the table will be as:
Customer key
|
Name
|
State
|
State_Old
|
State_Old_1
|
1
|
Prasad
|
Kolhapur
|
Pune
|
Mumbai
|
2
|
Nitin
|
Nagpur
|
Advantages:
This does not increase the size of the table, since new
information is updated.
This allows us to keep some part of history.
Disadvantages:
Type 3 will not be able to keep all history where an attribute is
changed more than once. For example, if Prasad later moves to Nagpur then the Mumbai information will be lost.
Usage:
Type 3 is rarely used in actual practice.
Where to use Type 3:
If it is necessary for the data warehouse to track historical
changes, and when such changes will only occur for a finite number of time.