In past weeks, I looked at using DAX to create calculated columns and to link tables that required more than one column as the relational link. Last week I looked at the available built-in aggregate measure options that go beyond the basic COUNT, SUM, MIN, MAX, and AVERAGE functions that pivot table users are familiar with. But even these options do not solve every problem. This week I return to DAX to show how to create new aggregate measures. Using the Texas vaccination database I used last week, I want to ask the question, how many unique patient visits have there been each year in each county.
The following figure shows the starting pivot table which shows the number of vaccinations in each year for each county. Over the two year period, almost three and half million vaccinations were provided. The problem is that often a patient received multiple vaccinations in a single visit. So how can we get a distinct count of visits? We will assume that each unique visit represents a unique individual since the data provided does not include personal information to identify the patients.
You have two ways in which to begin the definition of a new measure. One way is to click the New Measure button in the Measures group of the PowerPivot ribbon. The other method is to right click on the name of one of the tables in the PowerPivot Field List which is visible as long as the focus is within the PowerPivot table and then click the option: Add New Measure. Since I want to create a new measure in the Patient_Vaccinations table to count distinct visits, I would right click on the table name Patient_Vaccinations. While either method displays the following dialog, the advantage of right clicking on the desired table name is that the first field in the dialog box automatically assumes the name of the table you clicked on. Otherwise, you must select the name of the table in which you want to build the new measure definition from the dropdown menu.
You must then define a name for the measure. The first name is a measure name that is unique across all pivot tables in the current workbook. The second name is a custom name for just the current pivot table. Quite frankly, I generally use the same name for both. In this case, I’ll call the measure: DistinctVisits.
Next we need to define the DAX formula for the measure. We learned earlier that all DAX formulas begin with an equal sign. Then as we type additional characters, the tool-tip help displays the possible commands that begin with that letter(s) along with a helpful hint about what the function returns. In this case, as we type ‘di’ for DISTINCT, we learn that this function returns a one column table in which the distinct values appear in a single column. The fact that this function returns not a column or list of values, but a table is something to remember. The reason this is important will become evident shortly. As we press the Tab key to complete the selection of this function, we see the function expects as an input parameter a column. At this point the expression builder box contains =DISTINCT(. If I type the first letter of any of the tables, the tooltip help will display a reference to all the fields in the table. To count distinct visits, I will use the VisitID field in the Patient_Vaccinatinos table as shown in the following figure.
If we were to close off the equation at this point and attempt to use it in our pivot table, we would get the error message:
ERROR – CALCULATION ABORTED: MdxScript(Sandbox) (8,66) A table of multiple values was supplied where a single value was expected.
A measure must return a single value, not a table or even a column. Therefore, we have to wrap this calculation inside of another function that can count the rows in a table. Looking through the available DAX functions, the obvious choice is the function =COUNTROWS() which fortunately accepts a table as input. Similar functions like =COUNT() and =COUNTA() expect to have a column passed to them, not a table. Remember a column is essentially a field in one of our source tables from the PivotTable Window. (By the way, the different between =COUNT() and =COUNTA() is that….)
Thus the final expression for our measure is shown in the following figure:
When we press OK, Excel not only adds this new measure to the field list for the selected table, it also updates the PowerPivot table to include the measure as the value (or one of the values) displayed. In the following figure, we can see that there were 655,212 distinct visits over the two years to receive the three and half million vaccinations. Obviously, many of the patients received multiple vaccinations within the same visit.
One last thing to note this week is that even though we created our own calculated measure, we can combine it with the build-in aggregations we looked at last week such as % of total or % of row or % of column. These calculations will now be based on the number of distinct visits rather than the number of vaccinations as they were last week.
While this was but the simplest of DAX measures we could create, the basic concept is the same for all DAX measures. They are defined not as column, but as expressions added to one of the measure tables so that PowerPivot can use the expression to calculate each cell of the pivot table, not each row of one of the source tables. In the above example, we had only 3 columns and 7 rows or 21 cells in which the calculation needed to be performed. A DAX column expression on the other hand is performed on each row of the table which for the Patient_Vaccination table is almost three and half million rows. This is an example of how DAX can provide superior performance to a pivot table by minimizing the amount of calculations performed.