SCDs

Slowly Changing Dimensions ..

Introduction

Slowly Changing Dimensions (SCD) - dimensions that change slowly over time, rather than changing on regular schedule, time-base. In a Data Warehouse, there is a need to track changes in dimension attributes in order to report historical data. In other words, implementing one of the SCD types should enable users assigning proper dimension's attribute value for given date. Example of such dimensions could be: customer, geography, employee.

There are many approaches how to deal with SCD. The most popular are:

  • Type 0 - The passive method

  • Type 1 - Overwriting the old value

  • Type 2 - Creating a new additional record

  • Type 3 - Adding a new column

  • Type 4 - Using historical table

  • Type 6 - Combine approaches of types 1,2,3 (1+2+3=6)

Type 0 - value does not change over time

A type 0 slowly changing dimension is a dimension that never changes its attributes over time.

For example, the date of birth of a person is a type 0 attribute, because it does not change after it is recorded. A type 0 dimension can be used to store the original values of some attributes that are not relevant for historical analysis.

Workshops

Slowly changing dimensions (SCDs) are an integral part of data warehousing and business intelligence platforms, capturing how an entity's data changes over time. They facilitate tracking of historical data, providing insights into trends, patterns, and long-term changes in data. Implementing SCDs enables businesses to maintain historical accuracy, understand temporal shifts, and make informed decisions based on a comprehensive view of data evolution.

The workshop covers how to implement Type 1 & 2.

Slowly Changing Dimensions

Slowly Changing Dimensions (SCD) in Pentaho Data Integration handles historical changes to dimension data in your data warehouse. PDI includes a dedicated "Dimension lookup/update" step that supports multiple SCD types:

SCD Types Supported

Type 1 (Overwrite): Simply replaces old values with new ones, maintaining no history.

Type 2 (Add new row): Preserves history by creating new records when changes occur. This typically involves:

  • Adding date/time fields to track when records were active

  • Including a "current" flag to identify the latest version

  • Maintaining version numbers for each record

Type 3 (Add new field): Tracks limited history by keeping both the current and previous values in separate columns.

SCDs

Last updated

Was this helpful?