Statistics: PowerPivot Style, Part 2: The ‘A’ functions


This week I will continue with a look at some of the new statistical functions in PowerPivot 2012.  Perhaps you have noticed that several of the statistical functions have one version with the regular function name, another function with an ‘A’ appended to the end of the function and a third version with  an ‘X’.  This week I will look at the ‘A’ functions and next week I’ll look at the ‘X’ functions.

The following functions are both a ‘regular’ version and an ‘A’ version:

AVERAGE            –>           AVERAGEA

COUNT                 –>           COUNTA

MAX                      –>           MAXA

MIN                       –>           MINA

Let’s begin by looking at the AVERAGE function.  First, DAX functions typically act on columns or tables.  Furthermore, we all know that the AVERAGE function returns the arithmetic mean of a series of numbers.  Combining these two pieces of information, it would be a correct assumption that the AVERAGE function calculates the mean of a series of column values based on the aggregate level of the row.

Take the example from last week as shown again in the following figure.

Suppose we wanted to calculate the class average for assignment Homework 1 for the Bears.  We would add the number: 8 + 9 + 6 + 7 + 10 + 7 + 10 + 8 + 6 + 6 arriving at the total of 77.  Since there are 10 students in this class, the class average on this homework assignment is 77./10. or 7.7.  In fact, this is the value that you can see in the third figure from last week’s blog topic.

What may not be obvious is how PowerPivot treats non-numeric data.  In fact, the function returns blanks if the column contains any text values because the aggregation does not know how to handle text.

Furthermore, any cell that is either empty or contains a logical value will be ignored even though logical values are represented by either a ‘0’ or ‘1’.  If Wilson did not turn in Homework 1 and the teacher left the cell empty, the sum of the other student’s grades would be 71, but AVERAGE divides this value of 71 by 9 instead of 10 because only 9 grades were entered resulting in a value of 7.89.

However, if the teacher enters a value of zero for Wilson’s missing homework assignment, that cell is counted not only in the sum (which of course still adds nothing to the sum) as well as to the count of values.  Then our total of would again be 71 instead of 77, but we would divide the sum by 10 to get an average of 7.1.

Thus, whether the teacher enters a zero for a missing assignment or just leaves the value blank can have a major impact on the average class grade.

So what happens if I use the AVERAGEA function instead of the AVERAGE function?  First, rest assured that if all of the students had numeric grades entered for the assignment, the AVERAGEA function would return the same result as the AVERAGE function.  What happens when Wilson does not turn in her assignment?  If the teacher enters a ‘0’, the average using AVERAGEA is still 7.1.  If the teacher skips over Wilson leaving the assignment grade blank, AVERAGEA treats the blank as a zero and still calculates the AVERAGEA value of 7.1

However, some other side effects of using AVERAGEA is that logical functions are also included in the calculation of the mean.  A logical true evaluates to a value of ‘1’ and a logical false evaluates to a value of ‘0’.  Furthermore, any non-numeric text also evaluates as ‘0’.  In each of these cases, the number of values in the series is incremented by 1 just like a regular numeric value.  The net result is that for any column that could contain non-numeric text or logical values, AVERAGEA may result in a lower average than if it ignored these values.  On the other hand, AVERAGEA treats a blank field as a value of ‘0’ which in many cases may be exactly what you want.

If you understand this fundamental difference between AVERAGE and AVERAGEA, you might be able to deduce the difference between the other functions listed at the top of this article.  This difference can be especially tricky when using the MINA and MAXA functions when logical values exist in the column.  In the case of using MINA with a series of positive numbers, a logical value of FALSE could result in a zero return value and a logical value of TRUE could result in a return value of ‘1’ assuming that all the other values in the series are greater than ‘1’.

This is a case where knowing that application that generates the data file can be very important.  If you know that the application that teachers use to enter grades will allow only numeric values in the grades field and if the program requires the teacher to enter something, even if it is a zero for a missing assignment, you may want to stick with the functions AVERAGE, MIN, MAX, and COUNT.  On the other hand, if you know the program allows the teacher either to enter a numeric grade or to leave the entry blank, you may want to use the AVERAGEA, MINA, MAXA, or COUNTA functions since a blank value acts exactly like a value of zero.

C’ya next time when I look at the ‘X’ functions.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s