Sorry about missing last week but it was a holiday here in the states where parents encourage their kids to go up to strangers and ask for candy. Also people of all ages, not just kids, get dressed up in costumes that range from the cute Elsa princesses to the DBA zombies and the slutty managers. Yeah, some of them were really scary especially since it wasn’t always a costume.
Anyway, last time I talked about creating custom columns and added the column TotalProfit to the FactSales table of my model. (If you missed that discussion, go back to my blog from two weeks ago.) Column expressions are easy to create because they largely resemble row context expressions that you might add to an Excel spreadsheet. In fact, most of the syntax and functions are exactly the same as in Excel.
But we can also add custom measures. The difference between a column and a measure is that most columns can be used as dimensions in a pivot or matrix table especially the columns that contain alphanumeric data. Typically the columns that have numeric data appear as aggregated values, summed, averaged, minimized or maximized depending on the goal and the dimensions used. In fact, the TotalProfit column, while calculated for each row in the FactSales table, is typically displayed as a summed value such as in the following image taken from where we left off last time.
In this matrix table, the TotalProfit is first calculated for each row in the FactSales table that will be included in the final matrix table, but the individual calculated values are then summed by one of five channels that exist for sales. Thus with my sample data, the calculation for TotalProfit occurs a little over 3 million times, but these calculations are made only when the column is created, not for each visualization. Then those values are summed by channel for the above visualization. As this happens on my Surface computer in less than a second, that is still pretty impressive. However, there is another way we can calculate total profit for this table.
We can create a measure to calculate total profit. A measure is calculated each time it is used in a table. In a table like the one above, a total profit measure would only be calculated four times, once for each channel, but each calculation would include three sums over potentially hundreds of thousands of rows and these calculations occur when the visualization is created and in each visualization the data might be needed. To create a measure, find and click on the New Measure button in the calculations group of the Modeling ribbon.
In the expression box, I can see the start of the expression. Note that unlike PowerPivot, the measure name is followed by just the equal sign rather than a colon and equal sign. The rest of the expression is similar to what I used before except that I have to aggregate (in this case sum) each of the values in the expression (you cannot sum an expression). In order to distinguish this expression from the prior one, I include an underscore between ‘total’ and ‘profit’.
After defining the measure, it appears in the field list on the right. While it is still selected, I go to the formatting section of the ribbon and adjust the data format to $ English (United States) which gives me comma separators and two decimal places by default. Note that I can and should do this for any other table field that I will use in a visualization.
Now I can add this measure to my matric table from above replacing the TotalProfit calculated column with the Total_Profit measure. In order to calculate the total profit for channel sales, Power BI has to sum the SalesAmount column for all channel sales and subtract from it the sum of channel total costs and the sum of channel discount amounts.
Your first thought might be to ask how is this better than calculating the total profit for each row in the FactSales table and then simply summing the included rows into a single value. Well like a lot of things in the real world, it depends. Typically a calculated column increases data load times each time you load the model because only the expression is saved, not the individual values (or at least so I’ve been led to believe), but for many visualizations that use most if not all of the data, the impact on calculating the values for the visualization are no worse and possibly even less than using a calculated measure. On the other hand, for a calculated measure that only appears in a few visualizations or when the visualization has been filtered to include a smaller subset of the total data, a calculated measure can improve both data load times and dashboard display times.
But perhaps more importantly, not all expressions can be written as either a calculated column or a measure interchangeably as I have done here. For example, if I wanted to calculate the percent that each channel sales represents from the total sales, I would have to use a measure because there is no way to aggregate percentages over multiple rows. I want you to think about that and I may return with a detailed example in a future week. In the meantime, I want to explore one additional feature of this model.
Those of you who have followed me through my travels in using PowerPivot remember the concept of using slicers to allow the user to analyze different segments of the data by clicking on dimension values. For example, using the above table, I might want to see the profit numbers for different product categories. In PowerPivot, I could create a slicer and if I had more than one visualization on a page, I could associate each chart or table with that slicer.
In PowerBi, I can also define slicers for the reports on the page. To do so, I click in a blank area and select the field from the dimension table that I want to use as the slicer/filter. In this example, I will use the ProductName field from the dimProductCategory table. This dimension only has eight values. The table is relatively short and appears initially as shown below.
To convert this table list into a slicer, I need to select the Slicer visualization as shown in the following figure while the above table has focus. This tells PowerBI to use the table as a slicer rather than just displaying the values of the field. (Yes, I could add a filter to the visualization directly, but a slicer can automatically apply to multiple visualizations on the page.)
After being defined as a slicer, each value of ProductCategoryName has a selection box to its immediate left and two additional entries have been added to the top of the list to select all values or to select only those records that have a blank for the product category.
If I click on any one of the categories, the data in the other visualizations on the same page automatically filters out all records from other categories as shown below.
I can also select multiple categories by clicking on several of the checkboxes to include in the matrix sales from all of the selected categories. Note that in a case like this, the measure I defined actually will perform fewer calculations because the sum function only acts on the filtered records of the slicer, not on all records in the channel.
I can return to displaying all categories either by clicking the Select All option, by clicking on each of the individual category names, or by using the Erase icon to the right of the table name.
C’ya next time.