Dealing with Blanks in Your Data Model


Last time, I discussed some reasons why you might want to denormalize your data model when using PowerPivot (https://sharepointmike.wordpress.com/2015/03/21/your-logical-data-model-is-not-normal/). As you may remember, one reason is to make the model easier for the end user to understand. A second reason is that if you denormalize the data outside of PowerPivot, you can substantially decrease the model size by eliminating many if not most of the lookup tables and just adding the lookup information you need in the entity table such as the Product table. I will talk more about denormalizing the data outside of PowerPivot, or at least during the process of importing the data next time. This time, I want to look at another issue that sometimes occurs in your data that could make your data harder to interpret by your end-users.

I’m talking about blank or null fields in the data. Often times a table will have one or more columns which are optional. Take for example a product table like the one we are using from Contoso. Not every product may be sold by color. Some products may not belong to a style or a class. Some products may not have a measured size or a weight. In some databases, the corresponding lookup tables have a special entry for no color, no style, or no class so that every product can point to one and only one value in the lookup. However, you may have a lookup table that only contains specific values. If a product does not exactly match one of those values, the reference from the product table to the lookup table may be left blank. When the user sees a blank value, they may wonder if the user just forgot to select a value or if no value from the lookup table applied. In other words, they do not really know the reason why the field is blank.

Suppose we start with the data model using Contoso data in the following figure. Note to anyone trying to reproduce this example, I went into the Product table ahead of time and removed the ClassID and ClassName from all products in which the color was ‘Blue’ just to create some records with blank values.

After bringing this data into an Excel PowerPivot data model, I create a simple pivot table to show the sales amount and sales counts by class. Notice that the three product classes that are defined include:

  • Economy
  • Regular
  • Deluxe

But you can see from the report that 87,597 of the over 2 million sales records had no class defined. From a user’s viewpoint, they do not know if someone forgot to enter the class for some of the products or if the definition of class does not apply to some products.

As you build your data model in Excel, you can fix this problem by replacing the blank values with something like the string: “[No Class]”. The difficulty is, however, after you load your data into the PowerPivot data model, you cannot edit individual cells. You can, on the other hand, create a new calculated column.

Since I want the new calculated column to have the name ClassName and because I cannot have two columns in the same table with the same name, I begin by renaming the existing ClassName column to ClassNameOriginal. The actual new name I use does not matter as long as it is different. Also note that changing the column names in the Excel data model has absolutely no effect on the name of the column in the original data source.

After renaming the original ClassName column, I add a new column to the end of the table called ClassName. For the calculation to replace the blanks from the original column, I need to know if those blanks are Null values or if they contain an empty string. If the original column used empty strings when a class value was not defined, I can use the following DAX expression to create my new class name value.

= IF(([ClassNameOriginal]=””),”[No Class]”, [ClassNameOriginal])

This expression uses a simple IF() function to see if the value in the column ClassNameOriginal contains the empty string. If it does, the function outputs the string: [No Class]. Otherwise, the function outputs the current row value of the column ClassNameOriginal. This is shown in the following figure.

On the other hand, if the original ClassName column uses a NULL value when there is no corresponding class, the expression needed here must be defined with the ISBLANK() function to test for NULL values as in:

= IF(ISBLANK([ClassNameOriginal]),”[No Class]”, [ClassNameOriginal])

Whichever method is appropriate for your table, you now have a column that you can use for the PivotTable. Before proceeding however, be sure to go back and hide the original class name column: ClassNameOriginal so as not to confuse users with two similar columns that they could use as dimensions in their tables.

Since I still have my original pivot table open, I can just switch over to the Pivot table and it should automatically update to show that the blank class now has the name [No Class]. (Which goes to show you that having no class is better than just being a blank. J )

That’s it for this week. By the time you read this I will be at Code Camp Orlando giving my presentation: Calculated Columns, Measures, and KPIs, Oh My!

C’ya next time when I will show you how to use the query designer while loading data to the PowerPivot model to fix some of the issues from this week and last week.

Advertisements

2 comments on “Dealing with Blanks in Your Data Model

  1. Pingback: Populating Your Data Model with A Query | SharePointMike's Blog

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s