In every book you read about Business Intelligence (BI), you will find mention of a concept called slowly changing dimensions. The concept being that every attribute in every dimension falls into one of three possible change types. How you handle these attributes as they change determines how robust your solution is. While I maintain that these concepts existed long before BI became popular, let me first refresh your memory about what defines each of the change types.
The first change type is referred to as Type 0. In many ways that is very appropriate because it represents attributes that never change. You might also think of these attributes as having fixed values. In the one book I was reading, they referred to gender as an attribute that never changed. While this book is not that old, it could be argued that two gender values no longer apply and furthermore that when a record references one of those genders, it is not necessarily fixed forever. Perhaps a better example of an attribute that never changes is an employee’s birthdate in the Employee table.
The second change type refers to attributes that can change and when they do change, we really do not care about the old value. A good example of these Type 1 attributes might be the phone number for a customer in your order entry system. When a customer gets a new phone number, you want to update the customer record with that number by simply replacing the old number with the new one. You probably have no reason to keep track of what the old number was. You never do reports based on sales by phone number, and obviously, you would never need to call the old number any more. This case shows that a simple replacement of one attribute value with another has no need to track history, only to keep the current data.
A Type 2 change type differs from a Type 1 change type in that you do want to track the changing values of the attribute. The best example I can think of for a Type 2 change attribute is a product price. In any sales system, I can guarantee that the price of products will change over time. Some organizations may change prices more frequently than others, but whether they change daily or once every other year or so, the fact remains that you must be able to reference the price of each product based on when the product was sold so that historical reports accurate represent sales. Typically, the DBA or programmer accomplishes this tracking of different sales prices by adding a new record to the product table each time the sales price changes. In the record are two additional fields not generally found in Type 0 or Type 1 attributes. These fields specify the start date and end date for the record. When a product is first entered into the product table, the start date is generally set to the date the record is added and the end date is left blank or NULL or sometimes some future date like 12/31/2999 to avoid having to deal with NULL and blank values. Then when the product price changes, the date of the product price change is added to the end date field of the original record and new record is added for the product with the current date as the start date. Because a product can have multiple records in the table, the product ID is generally not used as the primary key. Rather an identity column (auto-increment) is added to the table to provide uniqueness while the current price of a product is the record with the blank, NULL, or future date as its end date.
Sounds straight forward, right? I’ve been creating programs since,… well, let’s just say a long time and this concept existed even back in my early database work with FoxBase. We had to have ways to track changes to certain key values in some of the tables we used. We may not have had the formal definition of a ‘slowly changing dimension’ as it appears in today’s business intelligence systems like SSAS (SQL Server Analysis Services), but remember having to use the concept of a start date and an end date to identify which record to use for reports based on the date of the ‘fact’ table such as a sales table.
There is actually another type not detailed here that is similar to the Type 2 type and may be called Type 3. It also requires that databases track changes by adding records each time a change occurs and identifying a start and end date to specify the valid time range for each record. The difference is only that it truncates history after a specific number of changes. In other words, the table only retains the last x number of changes to the dimension values. Frankly, I am not a big fan of this technique because I really feel uneasy when data is lost like that. However, if a business is only required to report on data that is seven years in the past or less, perhaps rather than counting the number of changes that the system retains, it may make sense to drop records that are older than seven years in order to conserve space.
Do you have experience with ‘slowly changing field values’ before the concept was popularized in Business Intelligence books? If so, add your comments to this entry.
C’ya next time.