Calculating Percentages in Power BI

Last time I showed how you can unpivot a multivalued column from a survey downloaded from SharePoint to Excel. After some transformations, I created the following report that shows the number of career responses in each of the 18 categories. While interesting in and of itself, it would be more interesting to see how the career choices break down by different dimensions such as the grade level, the gender of the student, or perhaps even the individual school.

On my report page, I am going to reserve an area along the left side of the page to list several of the dimensions I’m interested in and to display the total number of students that took the survey along with the number of students in the filtered subset when I filter by one of the dimensions. Basically that means building something like the following:

This image actually consists of four separate visualizations. The top visualization for Total Students uses the Card visualization and initially displays the count of the ID column in the survey table, which has one record for each survey. I duplicate this visualization and drag it below the first one. Then for the top visualization, I open the Format option panel in the visualization section and make the following changes:

  • I turn off the Category Label
  • I turn on the Title and then open the Title section to add the Title Text: Total Students
  • I change the font size and color to make this information stand out.

I then repeat the process for the second visualization but with the Title Text: Sampled Students because this count will represent the number of students included in the visualizations taking into account any filters applied to the page.

I then add two single-column tables, one for gender and one for grade level. Because I want to use the values in these tables as page filters, I change the visualization of these two tables to the Slicer.

Along with the table I created in my last blog, I can work with the slicer values to explore how the career choices change based on gender and grade level by clicking on the check boxes. When I do this, I see that the chart automatically adjusts the columns based on the changes I make to the slicers. Also the number of sampled students in the second card visualization displays the count of students included after applying the slicer. Unfortunately, the total student count also changes. This I do not want. I want the total number of students to always represent all the students in the entire survey.

I can achieve this goal with a little DAX and a custom measure back in the data page for the survey table. The custom measure needs to count the IDs for all the records in the table, ignoring any filters applied to the report page. I can do this by passing the survey table name to the ALL() function, which returns every row of the table regardless of the page filters. Then I use the COUNTAX() function, which takes the output of the ALL() function as its data source and counts the number of IDs. While this may sound complicated, it is as simple as the following equation:

Notice that I name the measure Students. I must provide a unique measure name for each measure I create. However, I can then use that measure in any visualization such as the card visualization for the total students in the survey.
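To make the idea concrete outside of Power BI, here is a minimal Python sketch of the same logic. It is not the DAX measure itself, only an illustration: the sample rows, field names, and function names are all made up for this example. One count respects the slicer selections, while the other always looks at the whole table.

```python
# Hypothetical survey rows; field names and values are illustrative only.
surveys = [
    {"ID": 1, "Gender": "F", "Grade": 6},
    {"ID": 2, "Gender": "M", "Grade": 6},
    {"ID": 3, "Gender": "F", "Grade": 7},
    {"ID": 4, "Gender": "M", "Grade": 8},
]


def apply_slicers(rows, **selected):
    """Keep only the rows that match every slicer selection."""
    return [r for r in rows if all(r[k] == v for k, v in selected.items())]


def count_ids(rows):
    """Count the rows that have a non-blank ID."""
    return sum(1 for r in rows if r["ID"] is not None)


def students(all_rows):
    """Count IDs over the full table, ignoring any slicer selection."""
    return count_ids(all_rows)


# The sampled count follows the slicers; the total does not.
filtered = apply_slicers(surveys, Gender="F")
sampled_students = count_ids(filtered)
total_students = students(surveys)
print(sampled_students, total_students)
```

The key design point is that the total is always computed from the unfiltered table, no matter which slicer values are selected.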

Back on my report page, I select the top card visualization (for Total Students) and change the field used from ID to my new measure, Students.

Now if I select any of the values in the slicer visualizations, my sampled students card displays the number of students included in the filter while the total students card displays the total number of student surveys taken, as shown below.

I then add on the right some additional column visualizations to display other data fields such as which subject the student finds most interesting in school or charts that display career choices by gender, by grade level or by other criteria from the survey. Each of these charts begins with a simple table visualization in which I add the columns I want to use. I then convert the visualization to a column chart.

In the image below, you see the final result of the first page of my report. Notice that I also added a vertical line shape to separate the two card visualizations and the slicers from the other column charts.

Since each student was allowed to select one or more careers from the list of possible careers, the total number of career choices is significantly larger than the number of students. In the above figure, the count of career interests, if I were to add the values in each of the columns, would total the number of career selections which is over 17,000, not the number of students. Therefore, I might decide to display the same charts as a percent of all the students rather than a count of all career selections.

Again I need another custom measure to calculate the percent based on total students. Fortunately, I already have a measure that calculates the total student count ignoring any slicer selection or page filter. Therefore, I can generate a percentage using a formula similar to the following:

With this formula, I count the number of surveys filtered by the visualizations and slicers on the page divided by the total number of students who took the survey. The maximum value of this percentage would be 100% if all the students who took the survey selected the same career, such as computers, as one of their choices. Similarly, the minimum value would be 0% if no student selected a specific career as one of their choices. Because students could select more than one career of interest, the sum of the percentages of each of the columns will not add up to 100%, but some value greater than 100%. (Lesson learned: Next time ask for their preferred career choice, then their second career choice and finally their third career choice.)
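The arithmetic can be sketched in a few lines of Python. This is only an illustration under assumed data: the student IDs, career names, and the helper function are hypothetical and do not come from the survey. It shows why the percentages for a multi-select question can add up to more than 100%.

```python
# Hypothetical unpivoted rows: one (student ID, career) pair per selection.
career_rows = [
    (1, "Computers"), (1, "Medicine"),
    (2, "Computers"),
    (3, "Medicine"), (3, "Teaching"),
    (4, "Computers"),
]
total_students = 4  # total surveys, independent of any filter


def percent_of_students(rows, career, total):
    """Percent of all students who selected the given career."""
    selected_by = {student_id for student_id, c in rows if c == career}
    return 100.0 * len(selected_by) / total


careers = sorted({c for _, c in career_rows})
percents = {c: percent_of_students(career_rows, c, total_students) for c in careers}

for career, pct in percents.items():
    print(f"{career}: {pct:.0f}%")
print("Sum of percentages:", sum(percents.values()))
```

With four students making six selections in total, each career's percentage stays between 0% and 100%, but the column percentages together exceed 100%.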

Next I take the first page of my report and duplicate it by right clicking on the page tab and selecting the option to duplicate the page. (This is a lot faster than recreating the same visualizations on a new page, isn’t it?) On the duplicated page, I modify each of the column charts to display the Percent measure just created rather than the count of ID.

Why does this work? Well, because each student can only select any specific career one time even though they may select two, three, or more careers, I can simply divide the number of filtered students in each column by the total number of students in the survey. Each chart already divides the students by career choice, one career for each column. Then additional filters from the slicers may limit the gender or grade level or both. Therefore, I can count the students that match all those filter criteria and divide by the total student count to get the percentage of students who have an interest in that career. In fact, this measure also correctly calculates the percent of students interested in each school subject (of which they can only select one subject each), which I can verify by summing these percentages from the chart in the upper left and getting a total of 100%.

The following figure shows the percent page of my report.

I can then proceed to add other visualizations if I want on additional pages. But I’ll suspend this example at this point for now.

C’ya next time for more exciting ways to use Power BI.

The Need To UnPivot Data

Today I am going to take a look at some interesting issues I encountered when pulling data from a survey into Power BI to perform some analysis. First some background. The survey existed on our SharePoint site and all submissions were stored within SharePoint. However, because the survey is stored as a list, it is always possible to download its details to an Excel file. When I open the resulting Excel file, I see the following data.

Most of the columns are straightforward text columns. But, looking closely at the rightmost column, I see that the data structure is a bit more complex than usual. This is the result of having a survey question in which the respondent is allowed to select one or more items from the list of possible careers that they have an interest in. While some may respond with only the single career they are focusing on, others may not be so sure yet and so have selected multiple careers in which they may have an interest.

You can see that the format of the data consists of responses along with the corresponding ID values of the selected responses separated by semi-colons.

Obviously, I cannot perform analysis on such a column with multiple values. I need to get this data into a single vertical column with one row for each career choice selected by the respondent. The first step to achieve this result is to split the individual career values into separate columns. I can do this in Excel by selecting the column with the multiple values and then selecting the button: Text to Columns in the Data ribbon. This function allows me to split the text in each cell of the column into multiple columns each time a specific delimiter is found. In this case, I can split the original text at each semi-colon to create a new column.

After splitting the data into columns, there are some columns that I no longer need and can delete. These are the columns that contain the ID values of the selected career values. I have no need for the ID values. I can also apply a series of Replace statements to clean up the rest of the career values to remove extra characters that are not part of the career name itself. After a few transformations, I’m left with the following set of spreadsheet columns which identify the respondent with their ID and then the careers in which they have an interest.
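For readers who prefer to script this cleanup, the split, delete, and replace steps can be sketched in Python. This is a rough illustration, not what I did in Excel. The exact cell format is an assumption: the sample string below, with career names and numeric IDs separated by semi-colons and a stray `#` character, is hypothetical, as is the function name.

```python
def split_careers(cell):
    """Split a multivalued cell on semi-colons and keep only career names.

    Assumes (hypothetically) that the numeric pieces are the ID values
    and that '#' is an extra character to be removed.
    """
    pieces = [p.strip().lstrip("#").strip() for p in cell.split(";")]
    # Drop empty pieces and the purely numeric ID values.
    return [p for p in pieces if p and not p.isdigit()]


# Hypothetical cell contents, for illustration only.
sample_cell = "Computers;#3;#Medicine;#7"
print(split_careers(sample_cell))
```

If the real export uses different extra characters, only the cleanup inside the list comprehension would need to change.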

This is still not exactly what I want because I really need to normalize this structure to get multiple rows for each responder with one career choice per row. However, as I will show in a moment, I can perform that transformation much more easily in Power BI than in Excel. So let me proceed to open Power BI and select Get Data.

When I open Power BI and choose Get Data, I select the Excel data type and click the Connect button. In the next screen (not shown) I will either enter the path to the Excel file or I can browse to the file using the Browse button. Once I select the Excel file I want to work with, Power BI opens the file and displays all the tables found in that file. In the figure below, you can see that there are several tables. Some of these tables consist of the original raw data and the transformation steps I used to create the datasets that I want to work with. Each table corresponds to a different tab in the Excel workbook otherwise known as an Excel Worksheet or simply Sheet.

After selecting the tables I want to work with, I click the Load button to import the data into my Power BI model. Some transformations might be needed to ‘convert’ some of the columns to user-friendly data names. I could have also done this in Excel by changing the column header text. However, the transformation I want to focus on is the career choice table which I call PreCareers. A portion of this table appears below. This table consists of the ID column used to identify the responder along with 18 columns representing their potential career choices. While most respondents entered three or fewer career choices, some entered significantly more. By default, the data is sorted with all respondents who had only a single career choice displayed first. Those are followed by the respondents who selected two career choices and so on.

In order to create a table that has the ID of the responder and a separate row for each career choice they selected, I need to unpivot the 18 career choice columns. To do that I need to select the Edit Queries button in the Power BI Home ribbon and then select the PreCareers table. Next I have to select all 18 career choice columns. Then opening the Transform ribbon, I look for and select the Unpivot Columns command as shown in the following figure.

After selecting this option, Power BI performs the transformation to create a new table, shown below, which now has a column named Attribute that holds the previous column headers and a column named Value that contains the career choices. Of course I can rename these columns and probably will, but let me save my work as a safety precaution before continuing.
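The reshaping that Unpivot Columns performs can be sketched in plain Python. This is only a conceptual illustration, not the code Power BI generates: the sample rows, the Career1 to Career3 column names, and the function name are hypothetical. As I understand the Power BI behavior, empty cells do not produce output rows, so the sketch skips them as well.

```python
# Hypothetical wide rows: one row per respondent, one column per choice.
wide_rows = [
    {"ID": 1, "Career1": "Computers", "Career2": "Medicine", "Career3": None},
    {"ID": 2, "Career1": "Teaching", "Career2": None, "Career3": None},
]


def unpivot(rows, id_column, value_columns):
    """Turn each non-empty value column into its own (ID, Attribute, Value) row."""
    long_rows = []
    for row in rows:
        for column in value_columns:
            value = row.get(column)
            if value is None or value == "":
                continue  # empty cells do not produce a row
            long_rows.append(
                {id_column: row[id_column], "Attribute": column, "Value": value}
            )
    return long_rows


long_rows = unpivot(wide_rows, "ID", ["Career1", "Career2", "Career3"])
for r in long_rows:
    print(r)
```

Notice that respondent 1 now appears on two rows, which matters for any relationship that expects the ID to be unique.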

To save my changes, I select the Close and Apply button from the Home ribbon. Remember that I can save my transformations multiple times and return to Edit Queries to insert additional transformations at a later time.

However when I attempt to close and apply my changes, I get the following error message.

Reading this error message I see that there is a problem with the Column ID in my transformed PreCareers table. If I open the tables in diagram view mode as shown below, the problem is evident.

You can see that the PreCareers table is linked to the K8_Survey table using a 1-to-1 relationship that, if I look at its details, attempts to connect the ID column from one table to the ID column of the other table. The problem is that now that I have unpivoted the selected careers in the PreCareers table, there are multiple records with the same ID value, not just one record.
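The condition that breaks the 1-to-1 relationship is easy to check for in a few lines of Python. The sketch below is a hypothetical illustration: the sample rows and the function name are made up. It simply reports which ID values occur more than once after the unpivot.

```python
from collections import Counter

# Hypothetical unpivoted rows, where one respondent has two career rows.
pre_careers = [
    {"ID": 1, "Career": "Computers"},
    {"ID": 1, "Career": "Medicine"},
    {"ID": 2, "Career": "Teaching"},
]


def duplicate_keys(rows, key):
    """Return the key values that appear on more than one row."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


duplicates = duplicate_keys(pre_careers, "ID")
# A 1-to-1 relationship needs unique IDs on both sides.
print("IDs that repeat:", duplicates)
print("Can be the 'one' side of a relationship:", not duplicates)
```

Any ID that repeats means the table can only sit on the "many" side of a relationship with the survey table.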

To solve this problem, I must remove the current relationship by right-clicking on the relationship and selecting Delete as shown in the following figure.

You may have also noticed a yellow bar across the top of the page, shown below, that says that there are pending changes to your queries that have not been applied. With the relationship deleted, I can try to apply the changes again by clicking on the Apply Changes button.

As the changes apply, I see that the field names in the PreCareers table are updated, as are the field names in the GradeLevel table.

I can now use drag and drop to create new relationships between the tables. For example, I can click and drag the ID field from the PreCareers table to the K8_Survey table as shown below. I can also connect the School field in GradeLevel with School in K8_Survey, relating these two tables.

With the relationships on my restructured tables back in place, I can go to the Report page and begin to create a report table that displays the possible career choices along with a count of the number of times that career choice was selected.

With a simple change of the Visualization from a table to a column chart, I can visually show which careers were selected the most by the respondents. Note that because each respondent can select any number of careers and can select them in any order, there is no way to reliably rank the career preferences to say one career or another is truly the top career choice. The chart can only say which career choice was selected most often among all the careers in which the respondents had an interest.

Another important point to remember here is that the sum of the number of times each career choice was selected is NOT equal to the number of respondents because any respondent could select any number of career choices. Therefore if you want to know the percent of respondents who selected computers as a potential career, you need to know the total number of respondents, not the total number of responses.

I’ll delve more into the analysis of this data next time, so save this file.

BTW, this weekend I am at the Orlando Code Camp held at Seminole State College in Sanford, FL. If you happen to attend, please stop by to say hello. I’ll be conducting two BI related sessions, one for Power Pivot and one for Power BI.

C’ya.