The Best Year for Television Was 2010 says Power BI and imDB.


So last week I started by showing you how to get Power BI Desktop downloaded and installed on your computer. Hope you were able to do that because I’m going to focus on it for the next couple of months.

In addition, I took you through the steps of how to reference data from the movie and television web site: www.imdb.com. I specifically suggested that you download the top 250 television shows by going to a specific page URL that I included in the blog. (Go back to last week’s blog to see it if you are just joining us now. Really! I’ll wait right here.) I specifically left you in the Navigator dialog which showed a preview of the first 20+ records in the table. Now I must say that there is really nothing special about this page or the data. Any data displayed within a table structure on a web page can be referenced from any site. While I am not going to show it this week, this includes referencing data from a SharePoint list displayed as a web part on a page within SharePoint. (You were wondering if I would tie this all back to SharePoint. Admit it.)

After clicking the load button on the Navigator, Power BI begins to load the data into its data model. While this step acts much like the data load step in PowerPivot when getting data from an external source, there is at least one major difference. Power BI does not currently have a preview capability like PowerPivot to select columns and filter rows. While I will show how to do some editing of the loaded data in a moment, the thing that bothers me about this is that it inflates the size of the data model during load. When I talk about pulling data from a SharePoint list in a future week, remember that I can customize the list view to show only the data that I need to load into Power BI.

So here is the view of the data table loaded from the top 250 television shows.

Notice that the table has some columns that I do not need and it has a column which concatenates three different pieces of information: Rank, Show Title, and Year. I need to edit the table and I can do that by clicking on the Edit Queries button in the Home ribbon.

Now the first thing I want to do is to get rid of columns that I don’t want. In this case, the first column appears empty. The Your Rating column and the column with a heading value of ‘2‘ are not needed either. My first thought might be to select each column and then select Remove Columns from the Home ribbon as shown the prior image here. But I can also select the columns I want to keep and then select the Remove Other Columns option shown in the following figure. Note, I can press and hold the CTRL key while I click on each column header I want to select or I can click on the first column header I want and then while pressing the SHIFT key, click the last column I want and all columns between these two columns will also be selected. Which method I use to select the columns I want or don’t want is up to the individual situation as one method may be easier than the other.

I now have two columns, the Rank & Title column and the IMDb Rating column. Next, select the Rank & Title column and click Split Column in the Transform group of the Home ribbon. I want to split the column into multiple columns. In some cases, I may have a data structure which lends itself to be split at a specific number of columns (characters). But that is not the case here. I want to split the data at the first space character in the field which may be the third, fourth or fifth character from the left. Therefore. I select the By Delimited option.

In the dialog that appears, I can choose from one of 6 common delimiters. If the data includes these delimiters within their values, I may need to create a data set that uses a different, custom delimiter. The interesting feature of the custom delimiter is that it is not limited to a single character. However, in this case, I can use the Space character delimiter. Using the Custom delimiter (period-space) might also be a good choice in this case so that Rank can be displayed as an integer without a decimal point.

After selecting the delimiter, I can choose whether to apply this rule at the left-most occurrence of the delimiter, the right-most occurrence or at every occurrence. In this case, I only want to apply it to the first occurrence from the left.

You should also note that when Power BI Query Editor splits the column it creates one column from all the characters to the left of the delimiter and a second column with all the characters to the right of the delimiter. The delimiter itself is thrown away.

After applying the command to split the column, I may want to rename the new column. I can do this by right clicking the column header and selecting Rename from the drop-down menu.

In a similar fashion, I can split the year of the TV show into its own column. (I’ll leave you to figure out the steps you need to do this based on my description above.) After renaming the column with the year and renaming the Rank & Title column, my table now looks like the following.

Hint: Use the Split Column function to remove the right parenthesis and then deleted the resulting created column.

I am now ready to start using my data to perform some analysis. To do this, I click on the Close and Apply button on the far left side of the Home ribbon.

This returns me to the working desktop as shown in the following figure. Note the three icons along the left side. The top icon opens the visualization desktop where I can display different charts and tables of my data. The second icon shows me my data tables. If I have more than one table in my data model, I can switch between the tables using the list of tables on the right side of the screen. I will not need to do that here since I only have a single table. Finally the last icon lets me see and work with the relationships between multiple tables. Again, for this week, I do not have to worry about relationships between tables because I only have a single table. I will cover working with multiple tables at a future date.

Switching to the visualization page, I can see the fields in my table along the right side of the screen. To begin a visualization, I can simply click on a table field and drag it into any blank area on the desktop. For example, supple I drag over the fields ID and TV Show and then change the visualization to a vertical column chart as shown below.

While Power BI did create a chart, the chart does not make much sense. I can see the names of the TV Shows across the bottom of the chart, but plotting the value of the show ID field as the vertical value of each column does not make any sense. Maybe if I had the total number of people who watched the show, it would be a better chart.

However, my point in making this chart is to emphasize the need to know and understand your data before you just go off and create visualizations. While Power BI can make analyzing your data easier, it can also make creating meaningless charts easier as well.

So what can I do? Suppose I plot year across the horizontal axis instead of the TV Show name. Now Power BI plots the vertical column as the sum of the show ID values. Again, not very useful. However, if I open the dropdown for the ID field in the second column on the right side of the screen under Value, I can change the default action from Sum to Count.

Now I have something useful, the number of top TV shows by year as shown below.

You can see that 2010 had the most top TV shows. I wonder what they are? Next, I drag the TV Show field by itself to an empty area of the desktop and choose the Table visualization. (Just hover over the visualization icons and you will see their names.) But I don’t want to see all the TV shows, just those in the best year for TV, 2010. So I add the field Year to the Visual Level Filters in the second column on the right. Clicking on the down arrow to open the properties, I unselect (All) and select 2010.

Now my list only displays TV shows in 2010.

Finally, I want to see the rating of each show so I can click on the checkbox for Ratings or I can click on Ratings and drag it over to my table. To see the TV Shows from the highest rating to the lowest, I can click on the Ratings header. Repeated clicking on the header changes the sort from ascending to descending. Once I sort the shows by descending order, my list of top TV shows from 2010 looks like the following:

My point in doing this exercise was not really to get a list of the top TV shows for the year with the most top TV shows (although that is what I did), my point was to show you how easy it is to grab data from a web page that you did not create and perform analysis on the data found there. That is HUGE.

In future blogs, I will cover additional features of Power BI.

Hope you found this blog interesting and want to learn more about Power BI. Be sure to come back regularly.

C’ya next time.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s