SCDs
Slowly Changing Dimensions ..
Last updated
Slowly Changing Dimensions ..
Last updated
Slowly Changing Dimensions (SCD) - dimensions that change slowly over time, rather than changing on regular schedule, time-base. In a Data Warehouse, there is a need to track changes in dimension attributes in order to report historical data. In other words, implementing one of the SCD types should enable users assigning proper dimension's attribute value for given date. Example of such dimensions could be: customer, geography, employee.
There are many approaches how to deal with SCD. The most popular are:
Type 0 - The passive method
Type 1 - Overwriting the old value
Type 2 - Creating a new additional record
Type 3 - Adding a new column
Type 4 - Using historical table
Type 6 - Combine approaches of types 1,2,3 (1+2+3=6)
A type 0 slowly changing dimension is a dimension that never changes its attributes over time.
For example, the date of birth of a person is a type 0 attribute, because it does not change after it is recorded. A type 0 dimension can be used to store the original values of some attributes that are not relevant for historical analysis.
Slowly changing dimensions (SCDs) are an integral part of data warehousing and business intelligence platforms, capturing how an entity's data changes over time. They facilitate tracking of historical data, providing insights into trends, patterns, and long-term changes in data. Implementing SCDs enables businesses to maintain historical accuracy, understand temporal shifts, and make informed decisions based on a comprehensive view of data evolution.
The workshop covers how to implement Type 1 & 2.
Slowly Changing Dimensions (SCD) in Pentaho Data Integration handles historical changes to dimension data in your data warehouse. PDI includes a dedicated "Dimension lookup/update" step that supports multiple SCD types:
SCD Types Supported
Type 1 (Overwrite): Simply replaces old values with new ones, maintaining no history.
Type 2 (Add new row): Preserves history by creating new records when changes occur. This typically involves:
Adding date/time fields to track when records were active
Including a "current" flag to identify the latest version
Maintaining version numbers for each record
Type 3 (Add new field): Tracks limited history by keeping both the current and previous values in separate columns.