Statistics: PowerPivot Style, Part 3: The ‘X’ functions

This week I continue my look at some of the new statistical functions in PowerPivot 2012.  Last time I looked at the difference between regular statistical functions like AVERAGE, COUNT, MIN and MAX and the ‘A’ versions of the same functions.  I saw that the basic difference between these two sets was the way they treated non-numeric and empty values.  However, both versions of these functions assume that I want to display the resulting value in the same table as the values they aggregate.  Therefore, if I want to create an AVERAGE function against student scores, I need to create a calculated measure in the same table that holds the student scores using an expression like:

`AvgScore := AVERAGE([AssignmentScore])`

But what if I want to calculate the average of the assignment scores in a different table than the Assignments3 table.  The answer to that requires the use of the ‘X’ functions.  I suppose the ‘X’ represents the fact that rather than supplying a single column from the current table as the argument to the function, these functions require me to define a mathematical eXpression from another table.  Well, it does not actually have to be another table, and it does not have to be any more of a mathematical expression other than a single column.  For example, the following expression defines in the same assignment score values as defined by AvgScore defined in the previous expression.

`Avg := AVERAGEX(Assignments3, [AssignmentScore])`

The point however is that I can take that same expression and drop it into the calculation area of any of the other tables such as my Grades table and the calculation works exactly the same since the function is directed to a specific table and a specific calculated expression (or a single column in this case).

To show a more practical example of this technique, suppose I want to list students in order not alphabetically by their name, but by their total scores.  My pivot table displays the student names along with the sum of their assignment scores.  Calculating the sum of the students scores would be simple by itself since I could simply display their scores as the (Sum of AssignmentScore) measure in the pivot table and drag the student name to the row label area.  (Actually, I also added a calculated column to the Students table named StudentName to concatenate the student’s last name and first name using the expression:  =[StudentLName] & “, “ & [StudentFName]).  The following figure shows a portion of the resulting pivot table.

Note, the students appear alphabetically by their last name, not by the sum of their scores, which is what I want.  How can we sort the rows by the sum of the values?  The secret lies in the use of the Sort by Column feature we mentioned two weeks ago.  However, the column that I want to sort by must exist in one of the model tables, not as a generated measure ‘column’ in the resulting pivot table.  In other words, I cannot tell the pivot table to sort the rows by the Sum of Student Scores column.

Since the column I want to sort, StudentName, is in the Students table, I need to create a calculated column in Students that displays the sum of the student’s scores.  I can do that with the column DAX expression:

`SUMX(RELATEDTABLE(Assignments3), Assignments3[AssignmentScore])`

In this expression, instead of just using the name of the table as the first parameter, I used the expression RELATEDTABLE(Assignments3).  Why?  Because when I use this expression in the Students table, I want to sum the assignments for the related student, not all of the students.  Furthermore, there could be multiple records in Assignment3 that have scores for the current student.

In fact, RELATEDTABLE is exactly what I want because it returns a temporary table that only contains the rows for the current student.  Why do I need a temporary table?  Because I only need the table to exist long enough to perform a calculation, summing all the AssignmentScore values from each record in this temporary table.

On the other hand, when I define a calculated measure in a measure table that references fields in a dimension table, I can usually use just the RELATED function to reference a column from the single related dimension row since there is only one dimension record corresponding to each record in the fact table.

Of course, the second parameter in the SUMX function, as you may guess, defines the expression I want to sum.  In this case, that expression is simply a reference to a single column: [AssignmentScore].

Now I can select the StudentName column and click the Sort by Column option in the Sort and Filter group of the Home ribbon.  Returning to the pivot table that I previously built, I see that the order of the rows automatically have been updated to display the students by the Sum of StudentScore field which I’ve included to show that the order is correct.   Interesting that the default order is from the largest value to the smallest value when using the Sort by Column.

How do I change the sort to show the lowest score students at the top and the highest student scores at the button.  I could right click in any cell in the Sum of StudentScore and select the Sort Smallest to Largest option.  Perhaps you are thinking that you could have achieved the same result without using a sort by column in the Student table and just using the sort property of the measure column in the pivot table to show the students in ascending order or descending order.

And you would be right.  In fact, you could also use the dropdown menu for the StudentName field and define a sort using the following dialog that references the Sum of StudentScores field as either an ascending or descending (shown) sort.

However, these methods only works when you actually display the Sum of StudentScore field.  What would you do if you did not display that field in the pivot table.  Aaahhhh!  That is why the Sort by Column option in the Student table using a calculated column to total each student’s scores can be important.

Next time, I’ll show you another option to rank the students by their scores by using the RANKX function with creating the intermediate Sum of StudentScore column.  Furthermore, the RANKX function provides some additional options that could be important.

C’ya next time.