Can You Put That Into Context For Me?

Over the past several years, I’ve conducted many sessions on how to use DAX in PowerPivot to create columns and measures. Throughout this time, the one thing that has confused many people with whom I have talked with is the concept of context for an expression. Many do not understand why some equations appear to work across for defining new columns while other equations only appear to work when calculating measures within a pivot table. This week, and perhaps for another week or two, I am going to take a shot at explaining the concept of context. Until you get a good understanding of context, many of the DAX expressions for columns and measures may appear to be magic that somehow get the correct answer but you may just not understand why.

Let me start by defining the two major context types that DAX expressions have to deal with:

  1. Row Context – This is probably the easiest context to understand. When you define a formula to calculate the value for a new column, the formula is applied to each row individually. In fact, the value calculated can depend only on the value of other columns in the current row. This is much like the way you calculate a new column value in Excel. However, unlike Excel, it is more restrictive. Using Row Context, it is not possible to directly access the values in a prior or successive row because those terms do not really have meaning within a table in the data model of a pivot table.
  2. Filtered Context – This is probably easiest to understand when you think of the entire set of rows in a table. If you sum a numeric column or simply count any column in the table, you are applying a filtered context. Of course, in its simplest form, the filter is really the same as all of the rows in the table, or no filter at all. But let’s assume for a moment that you have a sales table that includes sales from each of your stores in each of the states of United States. Furthermore, you only want to know the sales for stores in the state of Florida. In database terms, you apply a filter using a WHERE clause to include only those rows from the sales table in which the sales came from a store in Florida. Of course you can have more than one filter. You might filter not only on the state, but also on a specific product that was sold during a specific month of a specific year. That would require four filters to be applied to the table to reduce the total number of rows of sales to only those you wanted to sum. Filtered context is often used in expressions that involve summing, counting, averaging, finding a minimum or finding a maximum. These are also referred to as aggregate functions.

So let’s see how this works in DAX. Using my Contoso sales table, suppose I want to calculate the profit for each sale. Since I want to perform the calculation per sale and because my sales table has one row for each sale, I can use a calculated column. The formula for my calculated column begins by taking the value of the column [SalesAmount] which is the money received from the sale. This is not the profit. To calculate profit, I need to subtract costs from [SalesAmount]. To do this, I can subtract the value from the column [TotalCost]. This gives me a pretty good idea of what my potential profit might be. However, in the Contoso data, I also have to consider any returned item. These are items returned because they are damaged or do not work. The policy is to refund the customer their money. However, the returned product it not worth anything and is considered trash. Therefore, I need to also subtract from [SalesAmount] the [ReturnAmount]. If I define a new column in my sales table, I can use the formula shown in the following figure to calculate the profit for each of the sales.

As you know, the PowerPivot data model applies the column formula to each record in the table. I can then use this column in pivot tables to display profit for sales by any of the appropriate dimensions such as time, location, product, etc.

Fortunately, in PowerPivot for Excel, displaying profit in the pivot table is as simple as moving the column [Profit] to the Values area in the field list. PowerPivot lets us use any numeric field in the values area without any additional work. However, if I attempt to create the same model in Analysis Services, I must create a measure as shown below.

The SUM() function is an aggregate function and therefore defines a measure, never a column. Note however, I must still first create the column [Profit] as previously defined. I cannot simply create a measure that sums the result of the expression as shown in the following figure.

This is because the calculation: [SalesAmount] – [TotalCost] – [ReturnAmount] is a row context expression since it must be calculated on every row of the table. However the SUM() function is a filter context expression (even if the filter is all the rows). I cannot combine the two in this expression without getting the Semantic Error shown above. The SUM() function can only aggregate the values of a single column, not aggregate the values of a calculation.

What I can do however, is use one of the other DAX functions, SUMX(). This function has two parameters, the first parameter is the name of the table I want to perform a calculation on. In this case, the table name is FactSales. The second parameter is an expression that can include one or more columns from the designated table. In this case, I can use the expression I previously used to calculate the [Profit] column as shown in the following figure.

As you can see in this figure, the resulting total profit across all the records in the FactSales table is exactly the same whether I use the expression SUM([Profit]) which sums the calculated column or if I use the expressions SUMX(FactSales, [SalesAmount] – [TotalCost] – [ReturnAmount]). This second expression essentially calculates the row context expression for profit on each row in the table FactSales. However, as a measure rather a calculated column, the expression is filtered by any dimensions used in the pivot table.

In the following figure, the only dimension is sales channel. Therefore, with four sales channels, there are four rows. Note that both calculations of total profit result in the same values for each channel.

What is the difference between the two? Since all of the sales records were used to build this table since all four sales channels were included, the same number of calculations were effectively performed in both cases, calculating the profit for each sales row and then summing the profit for each channel. The only real difference is that the first one required the prior creation of a [Profit] column which takes some of the precious data model memory while the second method using the SUMX() function does not require that column.

But what if we applied other dimensions any perhaps by using a slicer displayed the profit from only some of the sales. In the following example, the row dimension displays countries. However, only countries in Asia appear based on a slicer that limits which continent I want to see profits. I also included a slicer on year to limit the years in my analysis to only two years. Therefore, in the following figure, the profit amounts represent only a subset of the total profits for the Contoso organization.

How does this affect the calculation? To calculate Total_Profit, I still need the column [Profit] to calculate value for every sales row in the FactSales table, even rows that I will not use. On the other hand, to calculate Total_Profit_2, the filter on continent and year is applied first limited the total number of rows that are needed to be calculated from the FactSales table. The calculation of profit is then only performed on this filtered subset of rows and then aggregated appropriately based on the dimensions shown.

Thus, you might say for now that column calculations are generally row context and measures are filter context expressions. However some functions when used in measures let you perform a row context expression after applying a filter context like the SUMX() function and other aggregate functions that end in ‘X’ like COUNTX(), AVERAGEX(), MAXX(), and MINX().

Next time, I will dive a little further into understanding context within DAX.

C’ya next time.

Advertisements

It’s a Fiscal Thing

In past blog articles, I’ve talked about the need for a date table, not just for BI/PowerPivot analysis, but for any application. You might be hard pressed to come up with many applications you have written or have used in your organizations that do not include some aspect of time as one of the dimensions by which you collect data and/or report on it. Therefore, building a Date table once that you can use for many different applications can save you a great deal of time when trying to determine how to group your data for different reporting period or to display different date labels along one of your dimensions.

One of the easiest ways to build a date table that you can use anywhere is to begin with Excel. It is a simple matter to create a column called DateKey and enter the first two dates of a large general purpose date range as shown in the following figure.

Then to create any size table, simply select the first two cells and drag the bottom right corner down to any number of rows you want. Given Excel’s limit of a million rows, this should cover you for most applications.

Once you have the first column, you can add columns for other ways of displaying date information. The following table shows a few of the more common expressions I have used to create additional columns.

<insert table here>

If you use a standard calendar, January 1 through December 31, the functions provided work well to calculate the names of months, days of the week, quarters, years, etc. However, if your organization uses a fiscal calendar, some of these calculations are not as easy. For example, suppose your organization uses a fiscal year that begins October 1 and goes through September 30th. The calculation for month number is not as simple as using the MONTH() function. Nor is the calculation of year simple either because the fiscal year for 2015 may in that case go from October 1, 2014 through September 30, 2015. So I thought you might like to see a few functions that could help you calculate fiscal columns in your data table.

Let’s start with the calculation of the fiscal year. Using the example date range above, how can I calculated the fiscal year from any given date. If I were to add 3 months to any date and then calculate the year using the YEAR() function, I would get the fiscal year. Why three months? Because there are three months in the prior calendar year. If my fiscal year started July 1, 2014 and went to June 30, 2015, I would add 6 months to the date before calculating the fiscal year using the YEAR() function. The following figure shows the column expression I need to create a CompanyFiscalYear column.

Calculating the year was not that difficult. However, calculating the month number of the fiscal year is a bit more complex. Why do I need a fiscal month number? Remember how I used the month number column as the sort by column for the month names so that the months appear in the correct order. I again have a similar problem when I am dealing with fiscal calendars. In the above example fiscal year beginning October 1, 2014, the order of the month names must be:

  • October
  • November
  • December
  • January
  • February
  • March
  • April
  • May
  • June
  • July
  • August
  • September

I can calculate the fiscal month number using the expression shown in the following figure. Note that it is a bit more complex than the expression to calculate the fiscal year. Where I’m using the number ‘9’, I’m really using ’12 – 3′ in which ’12’ is the number of months in the year and ‘3’ is the number of fiscal months in the first calendar year which in this case is October, November, and December. Therefore, if my fiscal calendar started in July, I would use ’12-6′ or ‘6’. A similar argument would explain why I need a ‘3’ at the end of argument for months in the second calendar year to increase the fiscal month number of these months appropriately. Again I would use a ‘6’ for a fiscal year starting in July because there are 6 months in the first calendar year that are part of the current fiscal year.

The final column I’m going to show today is the calculation of the fiscal quarter. In my October-September scenario, the first quarter of the fiscal year includes the months October, November, and December. Since I already have calculated fiscal month numbers, I can simplify my calculation of the quarter by simply dividing the fiscal month number by ‘3’ and rounding the result up to the next whole number if necessary using the ROUNDUP() function. Thus the following figure shows an easy calculation of the fiscal quarter.

That’s all for this time. I hope this helps you create useful date tables with not just traditional calendar dates and calculated fields, but also fiscal calendar fields that you can use in your pivot tables and reports.

C’ya next time.