Last week we looked at how to build and use a Date dimension table based on dates from our selected fact table to be used within a SSAS cube. Until relatively recently, building a cube in SSAS was the only way to analyze datasets with over a million records. With the introduction of Microsoft Office 2010, a new add-in to Excel called PowerPivot has become a game changer. It allows the average person, not just DBAs, to create large pivot tables based on tables having millions of records. In addition, Power Pivot lets the user combine data from different data sources as long as a common field can be used to link the tables.
But one of the key differences between SSAS and PowerPivot that the average Office Poweruser can now build their own pivot tables using tools already on their desktop/laptop computer with the addition of the free PowerPivot add-in from Microsoft. However, just like with SSAS, we must at least have a star or a snowflake schema for the tables. The data I started from in the last example as well as here has a many-to-many relation between Patient Visits and Vaccinations that is resolved through the table Visits_Vaccinations as shown in the figure below
I denormalized this relationship creating a single table to replace the many-to-many relationship by using the following statement in the SQL Server database before connecting to any data through PowerPivot.
The schema now consists of only the following four tables:
Using PowerPivot for Excel as described in my previous blogs, I can now load the data into the PowerPivot window. Next, I would define any necessary relationships between these four tables as shown in the following figure taken from the Manage Relationships dialog (Relationships group of the Design ribbon).
Before I can proceed to creating the pivot table, I need to define the other date dimensions. However, unlike what I did last week, I will not create a separate date table. (Note, I could do that, but there is a simpler method. I can simply create the new date data in the table Patient_Vaccinations using DAX calculated fields.
In the Pivot Table window, open the Patient_Vaccinations table using the tabs at the bottom of the window. Scroll to the right to the last column in the table. The next column has the header: Add Column. Double click this heading to edit the text changing it to Year. When I press the Enter key, the selected cell automatically moves to the first row under that column heading. I can now enter a formula for the column.
Formulas in the PivotTable window may look like normal Excel formulas when you first look at them, but they really use the DAX language. DAX is a special set of functions specifically created for working with PivotTable data. DAX consists of 135 functions, but 71 of those functions are very similar to functions you may already know from Excel. In fact 69 have exactly the same names. I will be using just a small number of these functions today but I will work with the FORMAT function which is similar to the TEXT function from Excel.
While DAX can be used to create calculated columns and measures, I will only use the functions here to create calculated columns. A calculated column definition always begins with an equal sign ‘=’. So with the first row under the new column selected, type the following but do not press Enter:
Note that a tooltip appears helping you with the parameters that the function requires. In this case, it is telling you that there is one parameter and it must be a date. I could simply enter the cell reference for the date I want to use, but a simpler method is to click on the date in the same row and let Excel enter the cell reference. In my case, the formula now looks like the following:
The equation is almost finished. I only need to add the closing parenthesis to end the function and then press Enter.
This is where the magic happens. Not only does Excel calculate the value for that first cell beneath the header, but it applies the formula to all of the cells in that column without you having to copy and paste the formula down through all the rows. PowerPivot does this because there can only be a single formula in a column. Therefore, it automatically does the work for you of applying the formula to all rows in the table.
In a similar way, you might want to add the following columns (and formulas) to build the various date components needed for your pivot table.
|Quarter_ Name||=”Qtr ” & int((Month(Patient_Vaccinations[VisitDate])-1)/3)+1|
As you can see, some functions take you directly to the value you want while other functions may need some formatting help. Typically, you will use the FORMAT() function to convert a number to a name such as day of week or month name, etc. but you can also use it for other formatting options such as displaying the full 4-digit year (‘yyyy’) as opposed to just a 2-digit year (‘yy’).
The calculation for quarter was a bit more challenging because there is no function to return the quarter number. However, it is quite easy to calculate. Simply take the month number and subtract 1. Then take that result and divide by 3 (the number of months in a quarter). Then take the integer result and add ‘1’.
Of course you could build other calculated columns, but these will suit our need. It is time to build the pivot table from the data. From the Home ribbon, find and click on the PivotTable icon in the Reports group. In the dialog that appears, choose whether to add the Pivot table to the current sheet or a new sheet. It must however be in the current workbook as the Pivot Table Window and all of the data.
The initial view of the pivot table on the sheet (shown below) is very similar to the view of the cube in SSAS, but it is more like the pivot table feature in standard Excel which has been available for years.
In the top portion of the Field List panel, you will see a list of all your tables. To access the fields within the table, click the box with the plus sign to the left of the table name. You can now select using the checkboxes to the left of each field in the field list fields that you want to appear in the pivot table. By default, Excel assumes that numeric values should be used as measures and string should be row labels. You may not always agree. Therefore, clicking on a field name and dragging it to the appropriate box in the lower portion of the Field List panel may be a more accurate way to select your fields.
Furthermore, Excel will assume that numeric measures placed in the Values box (measures) should be summed while string measures placed here should just be counted. You can change this (count numeric fields, but not sum string fields) by using the dropdown menu to the right of the field name and selecting Edit Measure. This pops up a dialog which lets you choose from a list of five predefined aggregations (Sum, Count, Min, Max, Average). You cannot define a custom calculation here. However, in a future blog, I show you how to use DAX to create your own custom measures.
In the figure below, you can see the result of a simple pivot table analyzing the number of vaccinations per year by each of the racial group. You may also want to compare this result to the cube created last week to see that it generates exactly the same results.
Could I have used the date dimension table created last week in the same source database? Yes, but I wanted to show that in PowerPivot tables, that is not necessary.
In closing, today I’m at Orlando Code Camp giving the presentation: Do you have the time? This presentation roughly follows the steps you read here for last week and this week. In future weeks, I will examine other ways to use DAX in your PowerPivot projects.
See you next time.