Last time I walked you through an example of how calculated columns and calculated measures in a pivot table can sometimes result in the same value and sometime in different values. This time before I start, I need to show you an image that was accidentally omitted close to the end of that blog. I talked about adding the calculated column profit margin and the calculated measure profit margin to show that the results were not only different, but that the calculated column was not correct. That missing image is reproduced here:
In general terms, this example demonstrates the fact that you do not sum percentages of individual items to get an overall percentage. You cannot even average individual percentages to get the overall percentage. Rather you must calculate the percentage of a group of items by first summing the individual values the make up the calculation and then perform the calculation of the percentage. That is why the measure calculation subtracts the sum of total cost from the sum of total sales (which is the same as saying the sum of total profit) and divides that result by the sum of total sales to get an overage percent.
However, not all calculated columns are necessarily incorrect. Suppose I needed to apply a discount to sales based on the sales channel. The following table shows the discount amount as a percentage that I need to apply by channel to sales in that channel.
The following figure shows these discount amounts applied to our model using a calculated column with the name DiscountedSales based on the following equation:
=[TotalSales] * (1-[Discount])
In this figure, I can easily verify that the final discounted sales amount reduces the total sales amount by the indicated discount percent for each of the channels. However, remembering that calculated columns must perform this calculation for every row in the model, I may decide to check the possibility of using a calculated measure which only performs the calculation for the cells needed in the pivot table. (Yes, I know this is a small demo table, but imagine a table with hundreds of thousands of rows.) The first question might be, “How to I define a measure for profit margin?” You may begin with an equation like the following in which you just sum each of the terms:
= SUM([TotalSales]) * (1 – SUM([Discount])
This equation may seem reasonable at first. It does result in a Total Discounted Sales value of $6,557,300 on Total Sales of $14,255,000. However, after a little thought, I realized that this discounted sales total would represent over 50% discount of the total sales. Since none of the sales channels had a discount greater than 15%, such a result does not seem realistic.
The problem is that you cannot simply replace your column references from a row calculation with SUM(<column>) references when defining a measure. So what should the calculation be? In this case, it is rather easy to determine the correct calculation. Remember that I began with the following equation to calculated the discounted amount for each channel:
[TotalSales] * (1-[Discount])
If I wanted to sum the discounted amounts across all channels, I would use the equation:
∑([TotalSales] * (1-[Discount]))
Notice the parentheses that specify the calculation of the individual discounted amount for each item before summing the result. Using the distributed property from your math days, I can convert this equation to:
∑([TotalSales] – [TotalSales]*[Discount])
We also saw last week that
∑(A + B) = ∑(A) + ∑(B)
But it is also true that
∑(A - B) = ∑(A) - ∑(B)
Therefore, we should be able to write:
∑ [TotalSales] – ∑ ([TotalSales]*[Discount])
Translating this to DAX, you might try to write the equation as:
MyDiscountedSales:=SUM([TotalSales]) – SUM([TotalSales]*[Discount])
The first term in this equation is valid, but not the second term. SUM expects to see a column passed to it as a parameter, not an expression. Fortunately, there is another DAX function that we can use, SUMX(). SUMX() supports two parameters, a table reference and an expression. Therefore, we can replace the above equation with the following:
MyDiscSales:=SUM([TotalSales]) – SUMX(Sales,[TotalSales]*[Discount])
(where Sales is the name of table in my model)
This new measure calculates an overage discounted sales amount of $13,275,000 on sales of $14,255,000 which seems a whole lot more reasonable. Adding these measures as columns to my pivot table results in the following:
The calculated column: Sum of DiscountedSales appears to be correct. I can manually calculate the values for the individual channels and the sum for the channel and even the Grand Total can be easily verified. On the other hand, the first calculated measure (MyDiscountedSales) while calculating the individual row correctly, does not correctly calculated the values by channel or the Grand Total. The column MyDiscSales based on the measure of the same name displays values that match the calculated DiscountedSales column. The obvious column that must be incorrect is the MyDiscountedSales column because the channel sums are not correct.
In this example, the calculated column provides the correct answer, and after refactoring the equation used for the calculated measure, so does the calculated measure.
So can I do anything with the equation that calculates the profit margin column? Unfortunately, I have not yet been able to find anything that transforms the equation to correctly calculate the values as a calculated column. I believe this is because the values for the channel totals and the Grand Total cannot be represented by any simple function derived solely from the detailed row values. You cannot add, count, or average the detailed values to get these roll-up values. Rather they must be calculated individually for each cell.
So you have seen that calculations that only require addition or subtraction may be encoded as either calculated columns or calculated measures. Even equations that include multiplication may be encoded as either calculated columns or calculated measures if you can transform the equation to something that looks like only addition or subtraction although you may have to use a mix of SUM and SUMX functions.
My best recommendation is to verify the calculated results as best you can, using common sense estimates of what the data should be as well as actual manual calculations to determine if the results are correct. Don’t assume that just because an equation returns a value rather than an error, that it is correct.
C’ya next time.