Calculating Percentages in PowerBI

Last time I showed how you can pivot a multivalued column from a survey downloaded from SharePoint to Excel. After some transformations, I created the following report that shows the number of career responses in each of the 18 categories. While interesting in of itself, it would be more interesting to see how the career choices break down by different dimensions such as the school or grade level, the gender of the student, or perhaps even by individual school.

In my report page, I am going to reserve an area along the left side of the page to list several of the dimensions I’m interest in and to display the total number of students that took the survey along with the number of students in the filtered subset when I filter by one of the dimensions. Basically that means building something like the following:

This image actually consist of four separate visualizations. The top visualization for Total Students uses the Card visualization and initially displays the count of the ID column in the survey table that has one record for each survey. I duplicate this visualization and drag it below the first one. Then for the top visualization, I open the Format option panel in the visualization section and make the following changes:

  • I turn off the Category Label
  • I turn on the Title and then open the Title section to add the Title Text: Total Students
  • I change the font size and color to make this information stand out.

I then repeat the process for the second visualization but with the Title Text: Sampled Students because this count will represent the number of students included in the visualizations taking into account any filters applied to the page.

I then add two single column tables, one for gender and one for grade level. Because I want to use the values in these table as page filters, I change the visualization of these two tables to the Slicer.

Along with the table I created in my last blog, I can work with the slicer values to explore how the career choices change based on gender and grade level by clicking on the check boxes. When I do this, I see that the chart automatically adjusts the columns based on the changes I make to the slicers. Also the number of sampled students in the second card visualization displays the count of students included after applying the slicer. Unfortunately, the total student count also changes. This I do not want. I want the total number of students to always represent all the students in the entire survey.

I can achieve this goal with a little DAX and a custom measure back in the data page for the survey table. The custom measure needs to count the IDs for all the records in the table ignoring any filters applied to the report page. I can do this by passing the survey table name to the ALL() function. This function ignores all other page filters. Then I use the COUNTAX() function which defines the data source as the output from the ALL() function and then performs a count the number of IDs. While this may sound complicated, it is as simple as the following equation:

Notice that I name the measure Students. I must provide a unique measure name for each measure I create. However, I can then use that measure in any visualization such as the card visualization for the total students in the survey.

Back on my report page, I select the top card visualization (for Total Students) and change the field used from ID to my new measure, Students.

Now if I select any of the values in the slicer visualizations, my sampled students card displays the number students included in the filter while the total students card displays the total number of student surveys taken as shown below.

I then add on the right some additional column visualizations to display other data fields such as which subject the student finds most interesting in school or charts that display career choices by gender, by grade level or by other criteria from the survey. Each of these charts begins with a simple table visualization in which I add the columns I want to use. I then convert the visualization to a column chart.

In the image below, you see the final result of the first page of my report. Notice that I also added a vertical line shape to separate the two card visualizations and the slicers from the other column charts.

Since each student was allowed to select one or more careers from the list of possible careers, the total number of career choices is significantly larger than the number of students. In the above figure, the count of career interests, if I were to add the values in each of the columns, would total the number of career selections which is over 17,000, not the number of students. Therefore, I might decide to display the same charts as a percent of all the students rather than a count of all career selections.

Again I need another custom measure to calculate the percent based on total students. Fortunately, I already have a measure that calculates the total student count ignoring any slicer selection or page filter. Therefore, I can generate a percentage using a formula similar to the following:

With this formula, I count the number of surveys filtered by the visualizations and slicers on the page divided by the total number of students who took the survey. The maximum value of this percentage would be 100% if all the students who took the survey selected the same career, such as computers, as one of their choices. Similarly, the minimum value would be 0% if no student selected a specific career as one of their choices. Because students could select more than one career of interest, the sum of the percentages of each of the columns will not add up to 100%, but some value greater than 100%. (Lesson learned: Next time ask for their preferred career choice, then their second career choice and finally their third career choice.)

Next I take the first page of my report and duplicate it by right clicking on the page tab and selecting the option to duplicate the page. (This is a lot faster than recreating the same visualizations on a new page, isn’t it?) On the duplicated page, I modify each of the column charts to display the Percent measure just created rather than the count of ID.

Why does this work? Well, because each student can only select any specific career one time even though they may select two, three, or more careers, I can simply count the number of filtered students in each column by the total number of students in the survey. Each chart already divides the students by career choice, one career for each column. Then additional filters from the slicers may limit the gender or grade level or both. Therefore, I can count students that match all those filter criteria and divide by the total student count to get a percentage of students who have an interest in that career. In fact, this measure also correctly calculates the percent of students interested in each school subject (of which they can only select one subject each) which I can verify by summing these percentages from the chart in the upper left and getting a total of 100%.

The following figure shows the percent page of my report.

I can then proceed to add other visualizations if I want on additional pages. But I’ll suspend this example at this point for now.

C’ya next time for more exciting ways to use Power BI.

The Need To UnPivot Data

Today I am going to take a look at some interesting issues I encountered when pulling data from a survey into Power BI to perform some analysis. First some background. The survey existed on our SharePoint site and all submissions were stored within SharePoint. However, as a list, it is always possible to download the details from the survey to an Excel file. When I open the resulting Excel file, I see the following data.

Most of the columns are straight forward text columns. But, looking closely at the rightmost column I see that the data structure is a bit more complex than usual. This is the result of having a survey question in which the respondent is allowed to select one or more items from the list of possible careers that they have an interest in. While some may only respond with a single career that they are focusing on, others may not be so sure yet and so they have selected multiple careers in which they may have an interest.

You can see that the format of the data consists of responses along with the corresponding ID values of the selected responses separated by semi-colons.

Obviously, I cannot perform analysis on such a column with multiple values. I need to get this data into a single vertical column with one row for each career choice selected by the respondent. The first step to achieve this result is to split the individual career values into separate columns. I can do this in Excel by selecting the column with the multiple values and then selecting the button: Text to Columns in the Data ribbon. This function allows me to split the text in each cell of the column into multiple columns each time a specific delimiter is found. In this case, I can split the original text at each semi-colon to create a new column.

After splitting the data into columns, there are some column that I no longer need and can delete. These are the columns that contain the ID values of the selected career values. I have no need for the ID values. I can also apply a series of Replace statements to clean up the rest of the career values to remove extra characters that are not part of the career name itself. After a few transformations, I’m left with the following set of spreadsheet columns which identify the respondent with their ID and then the careers in which they have an interest.

This is still not exactly what I want because I really need to normalize this structure to get multiple rows for each responder with one career choice per row. However, as I will show in a moment, I can perform that transformation in Power BI much easier than it can be done in Excel. So let me proceed to open Power BI and select Get Data.

When I open Power BI and choose Get Data, I select the Excel data type and click the Connect button. In the next screen (not shown) I will either enter the path to the Excel file or I can browse to the file using the Browse button. Once I select the Excel file I want to work with, Power BI opens the file and displays all the tables found in that file. In the figure below, you can see that there are several tables. Some of these tables consist of the original raw data and the transformation steps I used to create the datasets that I want to work with. Each table corresponds to a different tab in the Excel workbook otherwise known as an Excel Worksheet or simply Sheet.

After selecting the tables I want to work with, I click the Load button to import the data into my Power BI model. Some transformations might be needed to ‘convert’ some of the columns to user-friendly data names. I could have also done this in Excel by changing the column header text. However, the transformation I want to focus on is the career choice table which I call PreCareers. A portion of this table appears below. This table consists of the ID column used to identify the responder along with 18 columns representing their potential career choices. While most respondents only entered three or less career choices, some entered significantly more. By default, the data is sorted with all respondents who had only a single career choice displayed first. Those are followed by the respondents who selected two career choices and so on.

In order to create a table that has the ID of the responder and a separate row for each career choice they selected, I need to unpivot the 18 career choice columns. To do that I need to select the Edit Queries button in the Power BI Home ribbon and then select the PreCareers table. Next I have to select all 18 career choice columns. Then opening the Transform ribbon, I look for and select the Unpivot Columns command as shown in the following figure.

After selecting this option, Power BI performs the transformation to create a new table shown below which now has a column that has the previous column headers in a column named Attribute and a column named Values that contains the career choices. Of course I can rename these columns and probably will, but let save our work as a safety precaution before continuing.

To save my changes, I would select the Close and Apply button from the Home ribbon. Remember that I can save my transformations multiple times and return back to Edit Queries to insert additional transformations at a later time.

However when I attempt to close and apply my changes, I get the following error message.

Reading this error message I see that there is a problem with the Column ID in my transformed PreCareers table. If I open the tables in diagram view mode as shown below, the problem is evident.

You can see that the PreCareers table is linked to the K8_Survey table using a 1-to-1 relationship which if I were to look at the relationship is attempting to connect the ID column from one table to the ID column of the other table. The problem is that now that I unpivoted the selected careers in the PreCareers table, there are now multiple records with the same ID value, not just one record.

To solve this problem, I must remove the current relationship as by right clicking on the relationship and selecting Delete as shown in the following figure.

You may have also noticed a yellow bar across top of the page, shown below, that says that there are pending changes to your queries that have not been applied. With the relationship deleted, I can try to apply the changes again by click on the Apply Changes button.

As the changes apply, I see that the field names in the PreCareers table are updated as are the field names in GradeLevel table.

I can now use drag and drop to create new relationships between the tables. For example, I can click and drag the ID field from PreCareers table to the K8_Survey table as shown below. I can also connect the School field in GradeLevel with School in K8_Survey relating these two tables.

With the relationships on my restructured tables back in place, I can go to the Report page and begin to create a report table that displays the possible career choices along with a count of the number of times that career choice was selected.

With a simple change of the Visualization from a table to a column chart, I can visually show which careers were selected the most by the respondents. Note that because each respondent can select any number of careers and they can be selected in any particular order, there is no way to reliably rank the career preferences to say one career or another is truly the top career choice. The chart can only say which career choice was selected the most often of all careers the respondents had an interest.

Another important point to remember here is that the sum of the number of times each career choice was selected is NOT equal to the number of respondents because any respondent could select any number of career choices. Therefore if you want to know the percent of respondents who selected computers as a potential career, you need to know the total number of respondents, not the total number of responses.

I’ll delve more into the analysis of this data next time, so save this file.

BTW, this weekend I am at the Orlando Code Camp held at Seminole State College in Sanford, FL. If you happen to attend, please stop by to say hello. I’ll be conducting two BI related sessions, one for Power Pivot and one for Power BI.

C’ya.

An Aster Plot for the Holidays

Yesterday was Christmas and for those who celebrate Christmas, I hope you had a very happy day. However, I just wanted to ask this. Why do we wait for Christmas to be nice to one another, to wish our fellow man peace on earth and goodwill to all men? And do we really mean it or do we just go back to our old ways the day after. Do we need a holiday to be nice to each other? Why cannot the Christmas spirit spread throughout the year? And why does it have to be tied to a religious holiday. Do we need a religious holiday to tell us when to be nice to others?

On a similar note, New Year’s resolutions are coming up in another week. Could we all possibly resolve to act a little more like Christmas every day of the year? Or is that just as hard as resolving to lose weight?

Anyway, since many of you are on vacation anyway this week, I’m going to do just a short blog on the Aster Plot visualization for Power BI. This visualization, like many of the others I’ve talked about can be found on Microsoft’s Power BI Visualization Gallery page. When you click on the Aster Plot tile, the following diagram displays telling you a little about the visualization which as they state in the description is related to a donut chart. However, unlike a donut chart which defines the size of the donut segment on the value associated with that segment much like a pie chart, an aster plot uses one dimension to define the segments and each segment is equal in the amount of the arc it uses. The value associated with the segment defines the radius of the segment instead. Let’s see how this might work with some of the sales data by product category from Contoso.

I’m not going to go through all the dialogs that appear when you download a new visualization. I’ve done that before. So if you are joining in for the first time, check back through some of my previous blogs for details on that.

After the visualization is downloaded, I can load it into my Visualizations toolset/panel by clicking the ellipsis (3 dots) at the bottom of the visualizations panel.

Power BI will display a warning to caution you about importing custom visualization, so you should always do this on a test machine first to make sure they do not cause any issues. Just because it is on a Microsoft page does not mean that it is safe to use and does not violate any privacy concerns.

Next, I loaded my Contoso data and displayed the Sales Amount by Product Category as shown in the table below. (Again, if you do not know how to do this is Power BI, I’ve covered this several times in the past several months.) I should mention that I specifically sorted the sales data in descending order for this visualization.

Then with my table selected, I can click on my new visualization, the Aster Plot which displays the chart shown below.

Notice that if I hover over any of the segments, a box appears that tells me the product category name along with the corresponding sales amount for that category. One of the things that I don’t particularly care for is that the name of the dimension appears in the center of the chart and apparently behind the arcs so some that some of the name is covered by the chart. I wish I could move the dimension name either to the top of the bottom of the chart at the very least. Of course I could customize the name to make it smaller or to add spaces between the words back on the data page, but for this demonstration, I didn’t feel the need.

You can have other charts on the same page like the one shown here that displays the Sales Amounts by calendar year for Contoso.

Like other visualizations, I can click on any of the Aster plot segments to select that category and the column chart to the right will automatically update to highlight the annual sale for the selected category while still showing in a lighter shade of the column color to total annual sales.

Similarly, I can select a year and the Aster plot will be redrawn to display the relative sizes of each of the categories as shown below.

Well, that is all I’m going to cover for this week because it is time to get back to end-of-year celebrations, visiting relatives, after-Christmas sales, and all those things.

C’ya next year.

Power BI – Measures and Slicers

Sorry about missing last week but it was a holiday here in the states where parents encourage their kids to go up to strangers and ask for candy. Also people of all ages, not just kids, get dressed up in costumes that range from the cute Elsa princesses to the DBA zombies and the slutty managers. Yeah, some of them were really scary especially since it wasn’t always a costume.

Anyway, last time I talked about creating custom columns and added the column TotalProfit to the FactSales table of my model. (If you missed that discussion, go back to my blog from two weeks ago.) Column expressions are easy to create because they largely resemble row context expressions that you might add to an Excel spreadsheet. In fact, most of the syntax and functions are exactly the same as in Excel.

But we can also add custom measures. The difference between a column and a measure is that most columns can be used as dimensions in a pivot or matrix table especially the columns that contain alphanumeric data. Typically the columns that have numeric data appear as aggregated values, summed, averaged, minimized or maximized depending on the goal and the dimensions used. In fact, the TotalProfit column, while calculated for each row in the FactSales table, is typically displayed as a summed value such as in the following image taken from where we left off last time.

In this matrix table, the TotalProfit is first calculated for each row in the FactSales table that will be included in the final matrix table, but the individual calculated values are then summed by one of five channels that exist for sales. Thus with my sample data, the calculation for TotalProfit occurs a little over 3 million times, but these calculations are made only when the column is created, not for each visualization. Then those values are summed by channel for the above visualization. As this happens on my Surface computer in less than a second, that is still pretty impressive. However, there is another way we can calculate total profit for this table.

We can create a measure to calculate total profit. A measure is calculated each time it is used in a table. In a table like the one above, a total profit measure would only be calculated four times, once for each channel, but each calculation would include three sums over potentially hundreds of thousands of rows and these calculations occur when the visualization is created and in each visualization the data might be needed. To create a measure, find and click on the New Measure button in the calculations group of the Modeling ribbon.

In the expression box, I can see the start of the expression. Note that unlike PowerPivot, the measure name is followed by just the equal sign rather than a colon and equal sign. The rest of the expression is similar to what I used before except that I have to aggregate (in this case sum) each of the values in the expression (you cannot sum an expression). In order to distinguish this expression from the prior one, I include an underscore between ‘total’ and ‘profit’.

After defining the measure, it appears in the field list on the right. While it is still selected, I go to the formatting section of the ribbon and adjust the data format to $ English (United States) which gives me comma separators and two decimal places by default. Note that I can and should do this for any other table field that I will use in a visualization.

Now I can add this measure to my matric table from above replacing the TotalProfit calculated column with the Total_Profit measure. In order to calculate the total profit for channel sales, Power BI has to sum the SalesAmount column for all channel sales and subtract from it the sum of channel total costs and the sum of channel discount amounts.

Your first thought might be to ask how is this better than calculating the total profit for each row in the FactSales table and then simply summing the included rows into a single value. Well like a lot of things in the real world, it depends. Typically a calculated column increases data load times each time you load the model because only the expression is saved, not the individual values (or at least so I’ve been led to believe), but for many visualizations that use most if not all of the data, the impact on calculating the values for the visualization are no worse and possibly even less than using a calculated measure. On the other hand, for a calculated measure that only appears in a few visualizations or when the visualization has been filtered to include a smaller subset of the total data, a calculated measure can improve both data load times and dashboard display times.

But perhaps more importantly, not all expressions can be written as either a calculated column or a measure interchangeably as I have done here. For example, if I wanted to calculate the percent that each channel sales represents from the total sales, I would have to use a measure because there is no way to aggregate percentages over multiple rows. I want you to think about that and I may return with a detailed example in a future week. In the meantime, I want to explore one additional feature of this model.

Those of you who have followed me through my travels in using PowerPivot remember the concept of using slicers to allow the user to analyze different segments of the data by clicking on dimension values. For example, using the above table, I might want to see the profit numbers for different product categories. In PowerPivot, I could create a slicer and if I had more than one visualization on a page, I could associate each chart or table with that slicer.

In PowerBi, I can also define slicers for the reports on the page. To do so, I click in a blank area and select the field from the dimension table that I want to use as the slicer/filter. In this example, I will use the ProductName field from the dimProductCategory table. This dimension only has eight values. The table is relatively short and appears initially as shown below.

To convert this table list into a slicer, I need to select the Slicer visualization as shown in the following figure while the above table has focus. This tells PowerBI to use the table as a slicer rather than just displaying the values of the field. (Yes, I could add a filter to the visualization directly, but a slicer can automatically apply to multiple visualizations on the page.)

After being defined as a slicer, each value of ProductCategoryName has a selection box to its immediate left and two additional entries have been added to the top of the list to select all values or to select only those records that have a blank for the product category.

If I click on any one of the categories, the data in the other visualizations on the same page automatically filters out all records from other categories as shown below.

I can also select multiple categories by clicking on several of the checkboxes to include in the matrix sales from all of the selected categories. Note that in a case like this, the measure I defined actually will perform fewer calculations because the sum function only acts on the filtered records of the slicer, not on all records in the channel.

I can return to displaying all categories either by clicking the Select All option, by clicking on each of the individual category names, or by using the Erase icon to the right of the table name.

C’ya next time.

Deriving Columns from What You Know in Power BI

Before I get started with the topic of the day, I want to remind you that Power BI is still being updated by Microsoft. In fact, there was an update just last week on October 20th that appears to have added some missing features that I mentioned before. So if you haven’t updated Power BI recently, be sure to do that before reading the rest of this blog.

A few weeks ago when I was looking at defining relations between tables in the Relationship dialog I was disappointed in the fact that when I display my tables in what I would have previously called a Dialog view in PowerPivot, I could not use drag and drop to define my relations. Well, my disappointment is over. The latest update, among other things, adds this capability.

Since last Wednesday, October 21, was ‘Back to the Future’ day, I want to go back to last week’s blog just after I added the DimProductCategory table. This time however, rather than using the Management Relationships dialog, I am just going to the Relationships view and drag the field ProductCategoryLabel to ProductCategoryKey as shown in the following image.

Now I must say that I am use to just identifying the fields from my two tables that I want to connect to form the relationship and expect PowerPivot to figure out which is the one and which is the many table. Unfortunately, Power Bi still is not quite this smart. Because when I attempt to define this relationship from the one site to the many side, the relationship definition fails as shown in the following figure.

Maybe the next update will automatically reverse the direction of the relationship for me. However, for now I can click the OK button in the message box shown above and then redefine the relationship correctly from the DimProductSubcategory to the DimProductCategory table.

Now the relation is created and I am ready to continue. By the way, when I click on the relationship, Power BI highlights the two tables as well as the relationship line. It also encloses the connecting fields in a box to make it easy to identify the relationship fields.

If instead of left clicking on the relationship, suppose I right click on the relationship. Now I get an expected option menu which in this case consists of only a single option, Delete. Clicking this option will of course delete the relationship.


On the other hand, double clicking on the relationship opens the Edit Relationship dialog shown below. I can use this dialog to view the relationship or a sampling of the data, or make changes to the relationship.

That’s great. But like an infomercial, there’s more! In the Relationship view (Diagram view) I can also now right click on a field and delete a field, hide a field from the Report view or rename the field.

As with PowerPivot, if I delete a field, it is gone for good. If I realize later that I need the field, I would need to delete the table and reload it to get the missing field. Of course there are other consequences to doing this like needing to redefine the relationships and possibly rebuilding reports (visualizations) that used that table. Because at this time, I cannot visually define the fields to include or exclude during my original data load from my data sources, it is important that I know my data and delete any fields that I know that I will not need before I begin creating new columns, measures, or reports. In PowerPivot, we called these columns Useless columns.

I can also hide some columns from the Report view. For example, I can often hide columns used to define relationships between tables because end users typically do not include these columns in reports. Hidden columns are referred to as Technical columns because they are required in the data model and cannot be deleted without destroying relationships or perhaps calculations of other fields. I never want to show more columns to a user than they know what to do with.

Finally, renaming columns can be very beneficial. Often column names in databases have cryptic or abbreviated names. End users may not be comfortable with these shorten names. Use the Rename feature to make names user-friendly and descriptive.

Not only can I delete, hide, or rename columns in a table, but by right clicking on the table header, I can perform these same actions on an entire table. For example, I may not like dimension tables that begin with the letters ‘Dim’ like many DBAs prefer but end users may have no idea why the table is Dim. Similar to columns, I don’t delete tables from my model unless I am positive that I do not need them. Hiding tables only makes sense if I want to use only one or two fields from a table. In this case, I may need a column from another table for a calculation, but I would never display those columns directly in reports.

Returning to the data view of my tables, I can also right click on any of the column headers and delete, hide or rename the column. There are also several other options ranging from sorting, to creating new columns or new measures.

I can also right click on the column names in the fields list along the right side of the Data view. The dropdown list of options shown below is similar to options in the context menu above. So I have several different ways to manage columns

Let’s try something new, a new column in fact. In my FactSales table, I can find several columns like SalesAmount, TotalCost, and several others, but there is not Total Profit column. I can easily calculate that value from other values in the table. To begin, I need to create a new column in my data model. That means clicking the New Column button in the Modeling ribbon.

New columns are created at the right end of the table. By default, the column name is cleverly called Column. Of course I can change that by entering a new name. Then after an equal sign which indicates that an expression will be used to define the column, I can begin entering the column definition using DAX expressions. Yes, DAX is still alive and well. If you need a review of using DAX, I’ve covered multiple DAX topics over the last couple of years of this blog.

As in PowerPivot, I can select a column from the current table by just typing the left square bracket. This action opens a dropdown of all column names in the current table listed alphabetically. I can scroll down through the list and select a column by double clicking on its name. I can also type a few characters of the column name to narrow down my list as shown below.

My full expression to calculate TotalProfit is shown below

When I click the Enter key, the entered expression is used to calculate the values for all the table rows.

Before moving off the column, I might want to define custom formatting for the values. The formatting definition here is carried forward to all reports generated with the data. In the above example, I might want to only display the dollar amounts to two decimal places. (Actually, because of the size of aggregated data, I might later decide to format the values with no decimal places.)

Suppose that I want to display some of this data using a standard table with Channel names for the rows and a few select columns from the table. Notice that the values displayed here obey the formatting definition set on the Data page.

That’s it for this week. Next time I cover creating New Measures and why you might want to do so.

C’ya.

Creating Relationships Manually in Power BI

Last time we loaded a BI model with data from several different data sources, but the data sources were carefully selected so that the columns that would relate one table to another table would already have the same name and data type before attempting to upload the data to Power BI Desktop. Because of this assumption, I was able to let Power BI auto detect the relations between the tables. Unfortunately, in the real world, that does not always happen. So this time I will show you how to create relationships manually and how you may need to massage the data a little first before defining those relationships.

I am going to use the Contoso data again and will begin with the following tables already loaded into the model from my SQL Server instance: FactSales, DimChannel, DimDate, DimProduct, and DimProductSubCategory. To complete my model, I need the Product Category data which I am going to load from an Access table in this case.

Since we covered how to load data from SQL Server last time, I am going to pick this discussion up after I have loaded the SQL data and am about to load the Product Category data. The image below shows that I have already click the Get Data button and have selected Access database from the list of possible data sources and have defined where my Access database could be found. As you can see in this figure, Power BI only finds a single table in my Access database named DimProductCategory. When I select this table, I see a preview of the data.

In the table preview (on the right), I see that there are only two columns named ProductCategoryLabel and ProductCategoryName. You might be able to guess from the way the data in the column is formatted that the ProductCategoryLabel data has been defined as text, not numbers because it is left justified.

After clicking the Load button to import this data into my data model, I switch to the relationships view and see the table diagram shown below.

I see that there is no relationship defined with the DimProductCategory table because there are no connecting lines leading into or from this table. Therefore, I first want to select the Manage Relationships button. I can see the four relationships that tie together the other five tables, but I am missing a relationship. I may first try to click the auto detect button which I talked about last time to see if Power BI can find the missing relationship

Unfortunately, Power BI very quickly responds back that it cannot. The problem is that while the DimProductSubcategory table has a field named ProductSubCategoryKey, the new table, DimProductCategory, does not have a field with that name.

So I might try to create the relationship manually by clicking the New button on the Manage Relationships screen (after closing the Autodetect message screen of course).

On this screen I have to specify the names of the tables and the linking columns to define the relationship. I can begin with either table on the top. After I select the table, I see a grid of the available fields along with a couple of rows of data. To select the field that I want to use in the relationship, I merely need to click the column header to select the field.

I then specify the name of the linking table along with selecting the column that I want to link to.

There are some advanced options which I am going to skip over for now and simply click the OK button. Power BI then attempts to create the relationship. But wait a minute, you might say, the column names are different and the data types of these two fields are different. That is true. I am not surprised that I can link two tables on fields with a different name, but different data types? In fact, I expected Power BI to reject this relationship because of the different data types, but it did not. It created the relationship as shown in the following image

To find out if this relationship is really valid, I next go to the report desktop and select the ProductCatgoryName field from the DimProductCategory table and the ProductSubcategoryName field from the DimProductSubcategory table. You can see from the figure below that the data makes sense. The general category Audio should include subcategories like Headphones, Radios, and Speakers, but not Camcorders, Cameras, or Cell Phones. Somehow, Power BI was able to transform automatically one of the data types to the other (I am thinking it converted the string field to an integer) and then formed the relationship.

Yes, I could prove that this relationship is correct another way by creating a simple table that lists ProductCategoryLabel from the DimProductCategory table and ProductCategoryKey from the DimProductSubcategory table. You can clearly see that the relationship links these two tables correctly.

Surprised by this, I wanted to try something else that may not be as easy for Power BI to automatically convert. I took another table, the Stores table, and modified the Excel version of the table to change StoresKey to prefix the store number with the letter ‘S’. I then loaded that table into my model as shown below.

Next I went to the Manage Relationships dialog as before and attempted to add a relationship between the FactSales table and the Stores table as shown in the following figure.

Again Power BI did not complain about creating the relationship. However, when I went to the reports page this time and attempted to display a report of sales amount (from FactSales) against the StoreID from the Stores table, I got the result shown below which indicates a total sales for each year, but no StoreID value at all. (I used the Matrix visualization here with StoreID for the rows and YearLabel for the columns and SalesAmount for the value.) So clearly this relationship did not work.

Therefore, I need to edit one table or the other to ‘fix’ this relationship. I choose to edit the Stores table. Remember from last time, to edit a table, I click the Edit Queries button to open a new window where I can edit the tables.

In this case, the solution is to remove the ‘S’ from the front of each of the StoreID values. I can do this with the Split Column transformation and specifically to split the column based on the number of characters from the left rather than splitting the column based on any specific character or character string like I did in an earlier blog example.

This action results in two columns, the first, StoreID.1, contains only the first character of the StoreID field, the ‘S’. The second field, StoreID.2, contains the numeric portion of the store ID. As you can see in the formatting of this column in the following figure, Power BI also treats StoreID.2 as a numeric value.

I then removed the StoreID.1 column as something I will not need. I also renamed StoreID.2 to just StoreID.

I then clicked Close and Apply.

Now I can create a new relationship between FactSales and Stores using StoreKey and StoreID respectively as shown below. Note that I did not change the field names to match. If I had, I could probably use the auto-detect feature to find and create the relationship for me.

This time when I attempt to create the same report on sales by store and year, I get reasonable results as shown in the table below.

Well, I hope you found that interesting. Next time I plan to probe a little deeper on creating calculated columns within a query.

Till next time, c’ya!

Power BI can Load Data from Multiple Data Sources

This week I’ll show you how to use Power BI to pull together data from multiple data sources that you can then use to create visualizations.

The first step is to open your copy of Power BI Desktop. Then click on the Get Data button. I’m going to begin with some data from SQL Server so I do not have to click the More… button to see all the available data sources, SQL Server is right on the main dropdown.

To work with a SQL Server database, I must first enter the name of the SQL Server that I want to connect to. Note that while the dialog shown below prompts for the name of the database, this information is optional. You might think, ‘How can the database name be optional?’ In a moment, I will show you that Power Bi first needs to connect to the server, but then it can retrieve the database names that you have access to and display them for you to select. This protects you from the possible misspelling of the database name in this dialog. Since I am running everything on a single machine, my SQL Server name is LocalHost.

Of course before you can connect to the server, you must provide credentials. I am running everything on a Surface Pro 3 so I will use my Windows credentials. However, if you need to connect to another server on your network, you may need separate credentials.

The following dialog may or may not appear depending on the database you are connecting to and your version of SQL Server. However, in most cases, clicking the OK button allows you to connect to your data as long as you have rights to the SQL Server box.

When Power BI connects to the server, it first begins with a list of the possible databases along the left side of the screen. By opening a database (clicking on the arrow to the left of the database symbol, I can see the tables in the database. I can select multiple tables from this database. I can even select multiple tables from multiple databases. With each table I select, I will see a sample of the data in the right panel. Note that unlike PowerPivot, there is no way to limit the columns or rows from the table at this point. Maybe in the next major release.

You’ll notice that this week I return to my favorite sample dataset, Contoso.

If I select multiple tables to load, they all appear in the following dialog that shows me the progress of loading my data. In the image below, I can see that I’ve selected five tables from the database. As the data load progresses, Power BI shows me the number of records loaded in the line beneath the table name.

After the data loads, I notice that the table names appears in the Fields column on the far right of the desktop. But you say that these are table names? If I click the arrow to the left of each table name, the table expands to show the names of the fields essentially grouped by table.

Next I can click on one of the icons on the left side of the desktop. The first icon is the report surface. I typically work there after I have loaded all my data and made any changes to the columns and/or rows. So some might think that this icon should be on the bottom. The second icon is the data icon. I can use this icon to edit the column definitions, add new columns, or remove columns that I no long need. I showed some of these techniques in prior weeks of this blog.

However, I usually start with the bottom icon, Relationships. Relationships are the heart of any multi-table analysis. If I do not define the relationships between the tables first, I really cannot do any analysis. I might not be able to create some column either. So let’s start there.

The diagram that Power BI creates shows the tables that have been loaded into the model and if relationships were defined in the source data, it tries to create those relationships for me automatically. Notice that the diagram clearly identifies the one side of the relations with a ‘1’ and the many side of the relationship with an asterisk.

Unlike the diagram view in Power BI, there is no ability to edit, delete, or create new relationships from this dialog. Nor can I delete columns, define hierarchies, or rename columns here. I can hide individual columns or entire tables from the report view of the model and I can maximize the view of the individual table diagram, but that’s it. Again, maybe in the next major release of Power BI these additional features will appear so that the functionality matches that of PowerPivot diagram views.

Next I’m going to load some additional data into my model beginning with some data from an Access database and then data from Excel. I’m not going to take the space here to show screen images of these processes because between the above description and prior week blog discussions, I believe that you can figure out how to load data from different data sources. The only problem you may encounter is the need for new or updated connectivity tools to access your other data sources.

Once I have loaded all my data, my first action is to go to the Relationships view. You can see in the following figure that Power BI automatically shows my additional tables that I’ve loaded, but they show no connection to the rest of the tables in the data model. I need to fix this before going forward.

Find the Manage Relationships button in the Relationships group in the ribbon and click on it.

A dialog appears that displays the current relationships between the tables. This view is similar to the manage relationships dialog in Power Pivot. I can add, edit and even delete relationships here. However, the interesting button is the one labelled Autodetect

When I click Autodetect, Power BI attempts to find and create relationships between the tables that are not currently connected to the data model. It does this by analyzing the field names in the new tables with names in the tables of the data model to find other fields with the same name and the same data type definition. Of course, one table must serve as the parent (one-side) and the other table must serve as the child (many-side). In most cases, you will start with a fact table and create branching child tables so they usually serve as the one side in each of the new relationships. If Power IB is successful, it will attempt to create that relationship and will first tell you in a dialog box such as the one shown below how many relationships it detected.

After closing the information dialog reporting the number of relationships found, the relationships are created and the results for this model appear in the updated Manage Relationships dialog shown here.

Note at this point, I could edit any of the relationships that were not correctly created, I could also add relationships between tables that perhaps had different field names that could not be automatically detected. It is a good practice to check the Relationships diagram to insure that every table is connected to the model through at least one relationship.

I am now ready to go to the Reports desktop to create some simple charts.

Creating a report is as simple. First click in a blank area of the Reports desktop. Then click in the checkbox to the left of the value field you want to count, sum, or average such as Sales Amount. Then click on the field or fields by which you want to report Sales Amount (or any other value field). For example, I might want to see sales by year or by Product category or by Channel. You can create as many visualization as you want on the desktop although you should keep in mind that they need to be large enough to be readable. Let’s start with the three shown below.

Not a bad start. Here is a fun feature that Power BI supports that Power Chart does not. Just click on the Year 2009 column in the chart in the upper left. Immediately, all charts on the page change to shows the sales related to the year 2009 in a darker color and sales from other years in a light color.

Note that I did not have to do anything special to get this to work. I did not have to create a filter, a slicer, or anything. Similarly, I could click on columns in other charts to slice and dice the data different ways.

Filtering the data on the page is actually done entirely differently. To filter the data that appears on the page, you have to define a page filter by dragging the field or fields that you want to filter by into the area beneath Page Level Filters found in the column to the immediate left of the column with all the table fields. Filter fields typically are string values and typically, there are not a large number of unique value. To filter the page, just click in the checkbox to the left of each value that you want to include on the page. You can filter on one or many values at a single time.

After selecting a page level filter, the visualizations on the page change automatically to represent only the filtered data rather the entire dataset. The following desktop image represents only the data from the year 2009.

Well, that’s all for this week, but if you have been following along, save your data from this week because I’ll pick up this sample data from this point next time.

C’ya.

Formatting Power BI Visualizations

Being able to mash together data from multiple sources to create meaning full analysis in Power BI or even PowerPivot can be lots of fun, at least to my data geek friends, but presenting visualizations of that data to management without the ability to customize the look would fall flat on its face at your next executive presentation. Therefore, I am taking a side trip this week to show you a little about formatting your visualization, enough to get you started exploring how to make individual visualizations look good.

I also want to introduce you to a set of simple data table that you can use for free while learning how to use Power BI. A team from the Paris Technical college put together some data and perform some basic clean-up of the data. The team includes: Petra Isenberg, Pierre Dragicevic and Yvonne Jansen and the data can be found at: http://www.perso.telecom-paristech.fr/~eagan/class/as2013/inf229/labs/datasets.

For today’s blog, I am going to use their CARS dataset. While it has over 400 cars with 8 characteristics, it does not have the new Tesla X model L, but it is a good dataset to show some basic visualization formatting technics.

Thus I start by importing the CARS.CSV file by using the Get Data option.

After loading the dataset (shown below), you can see a preview of the data in which the first two data rows are not cars. In fact, the first row looks like it could serve as column headers or in terms of our data set, column names. While the second row looks like it defines the preferred data type for that column. To begin loading the data, click the Load button in the bottom right of the screen.

After loading the data, the first thing I need to do is to make some quick changes to the dataset by clicking the Edit Queries button.

In the Transform group of the Home ribbon of the Query Editor, I can click on the option: Use First Row As Headers. This is similar to the PowerPivot load technique in which you could define whether your data included headers in the first row.

Next I want to get rid of the row with the data type (which is now the first row). But before going further, I take note of the data type of each column. I will need that information in just a moment. In the Reduce Rows option group, there is an option to Remove Rows.

Click the bottom half of the Remove Rows button. This action opens a dropdown letting you specify how you want to remove rows. I am going with Remove Top Rows because the data type row is now the first row in the dataset. This option now opens a dialog that lets me specify how many rows I want to remove from the top of the dataset (Do you see why I had to deal with the header row first?). In this case, I only need to remove 1 row, but you may encounter other datasets that require you to remove more than a single row.

Next I noticed the car column appears to be a concatenation of the car make and model.

Looking through the data, the car model appears to always be a single word. Therefore, there is an easy way to split this column by click on the Split Column button and then select By Delimiter.

The obvious delimiter here is the left-most space character.

After splitting this column and renaming the columns as Make and Model, my data table looks like the following image.

Next, I noticed that all the numeric data was loaded as text. This is a direct result of reading data from the CSV file. Remember the second data row which told us the preferred data type? We can use that information to change the data type (in the Formatting group of the Modeling ribbon) of each column as shown in the following image.

I also want to change the year column to a four digit year. Because this data does not include cars from the 21st century, I can get away with simply concatenating all the values in the YEAR column with a prefix of ’19’.

After completing all my data preparation, I can click on Close and Apply so I can begin creating reports (visualizations) with this data.

Let’s begin with a simple table of average MPG for each year. On a new report page, I can drag the columns MPG and YEAR from my list of columns for the table. Note that by default, numeric values are summed. To change MPG to calculate the average MPG, open the dropdown for MPG in the Values section of the Visualizations column on the right and click on Average. If YEAR is being treated as a number, you can select ‘Do Not Summarize‘ from its dropdown. My table now looks like the following image.

I would rather see the year first, then the MPG so I need to change the order of the columns. You can do this by clicking on and dragging the field name in the Values section of the Visualizations (where I changed the aggregation types) and dragging the field up or down as appropriate to place the columns in the order I want.

While this table is interesting and shows a general increase in the average MPG of cars in the dataset, it is not as clear to see how the average rises and falls from one year to the next. Therefore, I might want to change to a column report.

While this is a nice basic chart, it needs labels to explain to someone viewing it for the first time what they are seeing. Let’s begin by clicking on the Format icon just below the Visualization selection grid.

Now instead of seeing the Values and Filters groups under the Visualizations, we see various chart components. Each of these components have a dropdown arrow on their left which will open that section to expose additional properties for that chart element. They also include a simple sliding switch to turn that element on or off within the chart. Note that currently the X-Axis and Y-Axis are turned on, but Data Labels, Title, and Background are not.

If we open the properties for the X-Axis, we see formatting options for this chart element. Although not needed here, I can change the axis type, the axis scale (linear or logarithmic), a start and end value for the axis, and a Title, Style, and Color. For the X-Axis, I only want to turn the Title property on by moving its slider to the On position.

Next I open the Y-Axis properties. I change my axis scale to begin at 10 and default to automatic for the end value. Often when data values are very similar, plotting their values with a y-axis that starts at zero de-emphasizes the variations from one value to the next. Setting the starting point of the y-axis to a value closer to the smallest y-axis value will emphasize those differences. Think of a stock that trades in the lower $200s range with a daily fluctuation of a single dollar or less. Those fluctuations are masked if the chart has a y-axis data range of 0 to $250. On the other hand, changing the y-axis to begin span only the range 200 to 250 emphasizes the daily fluctuations.

As you can see below, my chart is starting to look a little better.


Next I open the data colors group. I can change the default color which is an aqua shade to any of the other colors by selecting either from the Theme Colors provided in the Default Color dropdown by going to the Custom Color option and defining my own unique color for the chart.


However, this changes the color of all the columns to that one custom color. What if I want to display different columns using different colors? If I slide the Show All slider to On, the property dialog now shows a separate color selection for each column in my chart as shown below.


Did you notice that the years are not displayed in chronological order? Did you wonder why? The years are actually displayed in order by the size of the Average MPG so that you could, if you want, define a progressive shift in color shade as the value of Average MPG increases. That being said, I decided to customize the colors by year as shown in the following image.


Finally I went back to turn on the Y-Axis title to arrive at the visualization shown below.


To add detail to the chart, I also turned on the Data Labels group which causes the data value to display at the top of each bar.


Now my chart shows both the actual data values as well as the relative change from year to year.


Finally, I turn the title element on. Here, unlike the x-axis or y-axis, I can specify the text to display as the title. For the x-axis and y-axis labels, the text is taken directly from the column names. I can also specify the font color, a background color and the alignment of the title across the top of the chart.


My chart is starting to look pretty good but it is still missing something.


I open the last chart element group, Background, and turn on the background giving it a custom color.


Now I have a final version of the chart that I might want to publish and add to a dashboard (which I will talk about in a future blog.)


Okay, while I did not show everything that formatting can do, I did want to give you a start. Please note that if you use other visualizations, they have their own formatting options as well which work the same or similar to these. Have fun playing with visualizations until next time.

C’ya.

Analyzing SharePoint Survey Results with Power BI

The last two times, I introduced some of the basics of using Power BI with web site data from a third party. This time, I am going to show you how you can retrieve data from SharePoint and analyze it with Power BI.

To get started, I am going to open a new Power BI desktop instance and select More from the Get Data button to define a new data source, a data source within my SharePoint farm.

When the Get Data dialog appears, I can scroll down through the available data sources until I encounter SharePoint List.

When I want to pull data from a SharePoint list, I need to enter the URL of the SharePoint site, not the URL for the list itself as shown in the following figure.

I can navigate to my SharePoint site and capture the URL from the address bar in the top of the browser.

Note however that I do not need the entire URL, I only need the portion of the URL that defines the site. Therefore, I can remove the /Pages/default.aspx which references the home page of the site.

When I click OK, Power BI will go out the SharePoint site and look for the lists defined in the site. But before I get to that, I want to make a point that Power BI did not ask me for my credentials when I went to this collaboration site. The reason is that particular farm uses Kerberos authentication so my Windows authentication was passed through to SharePoint which allows me to see the contents of the site. If I did not have rights to the site, Power BI would not authenticate me or provide a list of the lists. A little later, I show you what you may see if you don’t use Kerberos to pass through your Windows credentials.

The next thing I noticed when I saw the Power BI Navigator showed me the names of the available lists is that some of the items were not lists, at least not strictly speaking. When I looked at the names, I noticed that some of them were from the Discussion Board area and some were from the Surveys area. Interestingly, the Navigator does not treat libraries as lists. Therefore, if I use a form tool like InfoPath which writes the metadata back to the form library, I do not think at this time that I can access that data from Power BI.

On the other hand, Power BI could provide an interesting way to access and analyze survey data. I just happen to have a survey result list available to take a look at, so I begin by entering the URL of the survey site.

This instance of SharePoint is not currently setup to work with Kerberos. Therefore, I get the Access SharePoint dialog shown below by clicking Windows authentication in the left side menu. In this case, I need to switch from using my current credentials to using alternate credentials and then supplying the appropriate username and password. I still want the full address to the survey site as before. Then I can click Connect.

Now Power BI can successfully interrogate my site to look for lists and display them in the Navigator as shown below. For this demo, I only need the single list Senior Exit Survey. (Note that all blanks are removed from the list names.) In a future blog, I will demo how to work with multiple lists/tables from a single source, so be patient.

After selecting a list, the Navigator displays a preview of the data in the right side panel. Note that the column names, like the list names, have had all blanks removed. Actually Navigator removes other non-alpha characters as well. In fact, column names cannot even begin with a number. If you have column names that begin with a number, Power BI will automatically precede the names with “c_”.

Another thing I noticed is that the preview only shows the first several lines of data and then displays the text: ‘The data in the preview has been truncated due to size limits.’ Don’t worry about this. The data is all there.

After I click the Load button, Power BI starts to load the data into its data model. Depending on the number of column and rows, this process can take from a few seconds to a minute or more. However, eventually, the load completes and I can see the table by clicking on the Data icon along the left side of the screen and then clicking on the table I want to view (if I had more than one table).

Note that I could click on the Edit Queries button at this point to clean up some of the data getting rid of columns that I will not need, but I’ll skip that step for the sake of keeping this blog a little shorter.

Next, I click the Report icon in the left margin of the desktop. Here I can see the columns in my table. If there were multiple tables, I could collapse and expand the column list for any table by clicking the arrow to the left of the table name.

Let’s say that I want to plot the number of each response to the question: ‘I clearly understood the requirements for graduation.’ I can click in the check box to the left of that column to identify the question and then click the check box to the left of the ID value which I can use to count the responses.

Note that by default, Power BI may try to sum the ID value. Remember from last week I should you how to change the aggregation of a value from SUM to COUNT.

I’m also going to initially default the visualization of this data to table. Therefore, my two right columns for Visualizations and Fields should look like the following figure.

My table should look like the following figure. If the columns are not in the correct order, you can simply click and drag the column names in the Values section of Visualizations to get the order you want.

Instead of seeing my data as a table however, I might want to display the data using the Funnel visualization and sort by the count of ID as shown in the following figure. Changing the sort can be accomplished by clicking the ellipsis in the upper right corner and selecting Sort By to change the sort field.


In a similar fashion, I can click in a blank area of the desktop and create a column graph visualization for the responses to the question: “This high school has prepared me for my career choices.” The chart shows the degree to which the respondents agreed or disagreed with that statement. Note that in this case neither sorting by the count of the IDs or by the text of the answers provides a satisfactory ordering of the data. I’ll return to this at a later date.

Clicking in another blank area, I selected the count of ID along with the date the record was created or when the survey was taken. As you can see in the following figure, the data appears to be a little odd in that there is no date with more than 4 surveys. I did not believe this because with over 9500 records spread out over less than 150 days, that low count per day just does not make mathematical sense. Remember last time when I said that Power BI can create good charts and bad charts. The user must know their data to determine when the data is not being displayed correctly. In this case, the reason for the unusual result is that the create date is not just a date, but a date and time. Therefore, most surveys, even when taken by over 9500 students, do not occur within the same minute although statistically, some will.

So how can I fix this? I need to return to the data table and make a few changes. I really don’t want to lose the original data, so I create a duplicate of the Create column that I can modify.

In my new column, I can change the column name to: Day Created and then by right clicking on the header, select Change Type and then Date to change the field to a simple date with no time associated with it.

When I go back and change my chart to display the count of IDs by Day Created rather than the Created column which included time, I get a more realistic chart as shown below.

If I were to zoom out, you can see that I have all four of these visualization on the desktop at one time. In fact, other than the amount of screen space, there is no limit to the number of visualizations you put on a single screen.

Now let’s try something interesting. Suppose I select one of the days in the last chart that I create, specifically, Friday, January 23, 2015. I can see that the popup information box tells me that were 697 surveys taken on that day.

But more interestingly, you will notice that the rest of the visualization on the page automatically changed to emphasize the counts representing that one day with a darker shading and for the table, the numbers in the first column represent the count for each answer for only that one day.

That’s quite a bit for this week. Next time I take a further look into working with Power BI and SharePoint Survey data.

C’ya then.

The Best Year for Television Was 2010 says Power BI and imDB.

So last week I started by showing you how to get Power BI Desktop downloaded and installed on your computer. Hope you were able to do that because I’m going to focus on it for the next couple of months.

In addition, I took you through the steps of how to reference data from the movie and television web site: www.imdb.com. I specifically suggested that you download the top 250 television shows by going to a specific page URL that I included in the blog. (Go back to last week’s blog to see it if you are just joining us now. Really! I’ll wait right here.) I specifically left you in the Navigator dialog which showed a preview of the first 20+ records in the table. Now I must say that there is really nothing special about this page or the data. Any data displayed within a table structure on a web page can be referenced from any site. While I am not going to show it this week, this includes referencing data from a SharePoint list displayed as a web part on a page within SharePoint. (You were wondering if I would tie this all back to SharePoint. Admit it.)

After clicking the load button on the Navigator, Power BI begins to load the data into its data model. While this step acts much like the data load step in PowerPivot when getting data from an external source, there is at least one major difference. Power BI does not currently have a preview capability like PowerPivot to select columns and filter rows. While I will show how to do some editing of the loaded data in a moment, the thing that bothers me about this is that it inflates the size of the data model during load. When I talk about pulling data from a SharePoint list in a future week, remember that I can customize the list view to show only the data that I need to load into Power BI.

So here is the view of the data table loaded from the top 250 television shows.

Notice that the table has some columns that I do not need and it has a column which concatenates three different pieces of information: Rank, Show Title, and Year. I need to edit the table and I can do that by clicking on the Edit Queries button in the Home ribbon.

Now the first thing I want to do is to get rid of columns that I don’t want. In this case, the first column appears empty. The Your Rating column and the column with a heading value of ‘2‘ are not needed either. My first thought might be to select each column and then select Remove Columns from the Home ribbon as shown the prior image here. But I can also select the columns I want to keep and then select the Remove Other Columns option shown in the following figure. Note, I can press and hold the CTRL key while I click on each column header I want to select or I can click on the first column header I want and then while pressing the SHIFT key, click the last column I want and all columns between these two columns will also be selected. Which method I use to select the columns I want or don’t want is up to the individual situation as one method may be easier than the other.

I now have two columns, the Rank & Title column and the IMDb Rating column. Next, select the Rank & Title column and click Split Column in the Transform group of the Home ribbon. I want to split the column into multiple columns. In some cases, I may have a data structure which lends itself to be split at a specific number of columns (characters). But that is not the case here. I want to split the data at the first space character in the field which may be the third, fourth or fifth character from the left. Therefore. I select the By Delimited option.

In the dialog that appears, I can choose from one of 6 common delimiters. If the data includes these delimiters within their values, I may need to create a data set that uses a different, custom delimiter. The interesting feature of the custom delimiter is that it is not limited to a single character. However, in this case, I can use the Space character delimiter. Using the Custom delimiter (period-space) might also be a good choice in this case so that Rank can be displayed as an integer without a decimal point.

After selecting the delimiter, I can choose whether to apply this rule at the left-most occurrence of the delimiter, the right-most occurrence or at every occurrence. In this case, I only want to apply it to the first occurrence from the left.

You should also note that when Power BI Query Editor splits the column it creates one column from all the characters to the left of the delimiter and a second column with all the characters to the right of the delimiter. The delimiter itself is thrown away.

After applying the command to split the column, I may want to rename the new column. I can do this by right clicking the column header and selecting Rename from the drop-down menu.

In a similar fashion, I can split the year of the TV show into its own column. (I’ll leave you to figure out the steps you need to do this based on my description above.) After renaming the column with the year and renaming the Rank & Title column, my table now looks like the following.

Hint: Use the Split Column function to remove the right parenthesis and then deleted the resulting created column.

I am now ready to start using my data to perform some analysis. To do this, I click on the Close and Apply button on the far left side of the Home ribbon.

This returns me to the working desktop as shown in the following figure. Note the three icons along the left side. The top icon opens the visualization desktop where I can display different charts and tables of my data. The second icon shows me my data tables. If I have more than one table in my data model, I can switch between the tables using the list of tables on the right side of the screen. I will not need to do that here since I only have a single table. Finally the last icon lets me see and work with the relationships between multiple tables. Again, for this week, I do not have to worry about relationships between tables because I only have a single table. I will cover working with multiple tables at a future date.

Switching to the visualization page, I can see the fields in my table along the right side of the screen. To begin a visualization, I can simply click on a table field and drag it into any blank area on the desktop. For example, supple I drag over the fields ID and TV Show and then change the visualization to a vertical column chart as shown below.

While Power BI did create a chart, the chart does not make much sense. I can see the names of the TV Shows across the bottom of the chart, but plotting the value of the show ID field as the vertical value of each column does not make any sense. Maybe if I had the total number of people who watched the show, it would be a better chart.

However, my point in making this chart is to emphasize the need to know and understand your data before you just go off and create visualizations. While Power BI can make analyzing your data easier, it can also make creating meaningless charts easier as well.

So what can I do? Suppose I plot year across the horizontal axis instead of the TV Show name. Now Power BI plots the vertical column as the sum of the show ID values. Again, not very useful. However, if I open the dropdown for the ID field in the second column on the right side of the screen under Value, I can change the default action from Sum to Count.

Now I have something useful, the number of top TV shows by year as shown below.

You can see that 2010 had the most top TV shows. I wonder what they are? Next, I drag the TV Show field by itself to an empty area of the desktop and choose the Table visualization. (Just hover over the visualization icons and you will see their names.) But I don’t want to see all the TV shows, just those in the best year for TV, 2010. So I add the field Year to the Visual Level Filters in the second column on the right. Clicking on the down arrow to open the properties, I unselect (All) and select 2010.

Now my list only displays TV shows in 2010.

Finally, I want to see the rating of each show so I can click on the checkbox for Ratings or I can click on Ratings and drag it over to my table. To see the TV Shows from the highest rating to the lowest, I can click on the Ratings header. Repeated clicking on the header changes the sort from ascending to descending. Once I sort the shows by descending order, my list of top TV shows from 2010 looks like the following:

My point in doing this exercise was not really to get a list of the top TV shows for the year with the most top TV shows (although that is what I did), my point was to show you how easy it is to grab data from a web page that you did not create and perform analysis on the data found there. That is HUGE.

In future blogs, I will cover additional features of Power BI.

Hope you found this blog interesting and want to learn more about Power BI. Be sure to come back regularly.

C’ya next time.