Update to Power BI Desktop

On September 23rd 2015, Microsoft has released an upgrade to the Power BI Desktop that includes 44 new features. If you are following along with my blog entries on Power BI, you might want to download this release before reading my next posting. You can get the new version (it’s free so what are you waiting for) from: http://blogs.msdn.com/b/powerbi/archive/2015/09/23/44-new-features-in-the-power-bi-desktop-september-update.aspx. The new version number is: 2.27.4163.351.

This article includes details on all of the new features along with helpful images that help explain how to use them.


Analyzing SharePoint Survey Results with Power BI

The last two times, I introduced some of the basics of using Power BI with web site data from a third party. This time, I am going to show you how you can retrieve data from SharePoint and analyze it with Power BI.

To get started, I am going to open a new Power BI desktop instance and select More from the Get Data button to define a new data source, a data source within my SharePoint farm.

When the Get Data dialog appears, I can scroll down through the available data sources until I encounter SharePoint List.

When I want to pull data from a SharePoint list, I need to enter the URL of the SharePoint site, not the URL for the list itself as shown in the following figure.

I can navigate to my SharePoint site and capture the URL from the address bar in the top of the browser.

Note however that I do not need the entire URL, I only need the portion of the URL that defines the site. Therefore, I can remove the /Pages/default.aspx which references the home page of the site.

When I click OK, Power BI will go out the SharePoint site and look for the lists defined in the site. But before I get to that, I want to make a point that Power BI did not ask me for my credentials when I went to this collaboration site. The reason is that particular farm uses Kerberos authentication so my Windows authentication was passed through to SharePoint which allows me to see the contents of the site. If I did not have rights to the site, Power BI would not authenticate me or provide a list of the lists. A little later, I show you what you may see if you don’t use Kerberos to pass through your Windows credentials.

The next thing I noticed when I saw the Power BI Navigator showed me the names of the available lists is that some of the items were not lists, at least not strictly speaking. When I looked at the names, I noticed that some of them were from the Discussion Board area and some were from the Surveys area. Interestingly, the Navigator does not treat libraries as lists. Therefore, if I use a form tool like InfoPath which writes the metadata back to the form library, I do not think at this time that I can access that data from Power BI.

On the other hand, Power BI could provide an interesting way to access and analyze survey data. I just happen to have a survey result list available to take a look at, so I begin by entering the URL of the survey site.

This instance of SharePoint is not currently setup to work with Kerberos. Therefore, I get the Access SharePoint dialog shown below by clicking Windows authentication in the left side menu. In this case, I need to switch from using my current credentials to using alternate credentials and then supplying the appropriate username and password. I still want the full address to the survey site as before. Then I can click Connect.

Now Power BI can successfully interrogate my site to look for lists and display them in the Navigator as shown below. For this demo, I only need the single list Senior Exit Survey. (Note that all blanks are removed from the list names.) In a future blog, I will demo how to work with multiple lists/tables from a single source, so be patient.

After selecting a list, the Navigator displays a preview of the data in the right side panel. Note that the column names, like the list names, have had all blanks removed. Actually Navigator removes other non-alpha characters as well. In fact, column names cannot even begin with a number. If you have column names that begin with a number, Power BI will automatically precede the names with “c_”.

Another thing I noticed is that the preview only shows the first several lines of data and then displays the text: ‘The data in the preview has been truncated due to size limits.’ Don’t worry about this. The data is all there.

After I click the Load button, Power BI starts to load the data into its data model. Depending on the number of column and rows, this process can take from a few seconds to a minute or more. However, eventually, the load completes and I can see the table by clicking on the Data icon along the left side of the screen and then clicking on the table I want to view (if I had more than one table).

Note that I could click on the Edit Queries button at this point to clean up some of the data getting rid of columns that I will not need, but I’ll skip that step for the sake of keeping this blog a little shorter.

Next, I click the Report icon in the left margin of the desktop. Here I can see the columns in my table. If there were multiple tables, I could collapse and expand the column list for any table by clicking the arrow to the left of the table name.

Let’s say that I want to plot the number of each response to the question: ‘I clearly understood the requirements for graduation.’ I can click in the check box to the left of that column to identify the question and then click the check box to the left of the ID value which I can use to count the responses.

Note that by default, Power BI may try to sum the ID value. Remember from last week I should you how to change the aggregation of a value from SUM to COUNT.

I’m also going to initially default the visualization of this data to table. Therefore, my two right columns for Visualizations and Fields should look like the following figure.

My table should look like the following figure. If the columns are not in the correct order, you can simply click and drag the column names in the Values section of Visualizations to get the order you want.

Instead of seeing my data as a table however, I might want to display the data using the Funnel visualization and sort by the count of ID as shown in the following figure. Changing the sort can be accomplished by clicking the ellipsis in the upper right corner and selecting Sort By to change the sort field.

In a similar fashion, I can click in a blank area of the desktop and create a column graph visualization for the responses to the question: “This high school has prepared me for my career choices.” The chart shows the degree to which the respondents agreed or disagreed with that statement. Note that in this case neither sorting by the count of the IDs or by the text of the answers provides a satisfactory ordering of the data. I’ll return to this at a later date.

Clicking in another blank area, I selected the count of ID along with the date the record was created or when the survey was taken. As you can see in the following figure, the data appears to be a little odd in that there is no date with more than 4 surveys. I did not believe this because with over 9500 records spread out over less than 150 days, that low count per day just does not make mathematical sense. Remember last time when I said that Power BI can create good charts and bad charts. The user must know their data to determine when the data is not being displayed correctly. In this case, the reason for the unusual result is that the create date is not just a date, but a date and time. Therefore, most surveys, even when taken by over 9500 students, do not occur within the same minute although statistically, some will.

So how can I fix this? I need to return to the data table and make a few changes. I really don’t want to lose the original data, so I create a duplicate of the Create column that I can modify.

In my new column, I can change the column name to: Day Created and then by right clicking on the header, select Change Type and then Date to change the field to a simple date with no time associated with it.

When I go back and change my chart to display the count of IDs by Day Created rather than the Created column which included time, I get a more realistic chart as shown below.

If I were to zoom out, you can see that I have all four of these visualization on the desktop at one time. In fact, other than the amount of screen space, there is no limit to the number of visualizations you put on a single screen.

Now let’s try something interesting. Suppose I select one of the days in the last chart that I create, specifically, Friday, January 23, 2015. I can see that the popup information box tells me that were 697 surveys taken on that day.

But more interestingly, you will notice that the rest of the visualization on the page automatically changed to emphasize the counts representing that one day with a darker shading and for the table, the numbers in the first column represent the count for each answer for only that one day.

That’s quite a bit for this week. Next time I take a further look into working with Power BI and SharePoint Survey data.

C’ya then.

The Best Year for Television Was 2010 says Power BI and imDB.

So last week I started by showing you how to get Power BI Desktop downloaded and installed on your computer. Hope you were able to do that because I’m going to focus on it for the next couple of months.

In addition, I took you through the steps of how to reference data from the movie and television web site: www.imdb.com. I specifically suggested that you download the top 250 television shows by going to a specific page URL that I included in the blog. (Go back to last week’s blog to see it if you are just joining us now. Really! I’ll wait right here.) I specifically left you in the Navigator dialog which showed a preview of the first 20+ records in the table. Now I must say that there is really nothing special about this page or the data. Any data displayed within a table structure on a web page can be referenced from any site. While I am not going to show it this week, this includes referencing data from a SharePoint list displayed as a web part on a page within SharePoint. (You were wondering if I would tie this all back to SharePoint. Admit it.)

After clicking the load button on the Navigator, Power BI begins to load the data into its data model. While this step acts much like the data load step in PowerPivot when getting data from an external source, there is at least one major difference. Power BI does not currently have a preview capability like PowerPivot to select columns and filter rows. While I will show how to do some editing of the loaded data in a moment, the thing that bothers me about this is that it inflates the size of the data model during load. When I talk about pulling data from a SharePoint list in a future week, remember that I can customize the list view to show only the data that I need to load into Power BI.

So here is the view of the data table loaded from the top 250 television shows.

Notice that the table has some columns that I do not need and it has a column which concatenates three different pieces of information: Rank, Show Title, and Year. I need to edit the table and I can do that by clicking on the Edit Queries button in the Home ribbon.

Now the first thing I want to do is to get rid of columns that I don’t want. In this case, the first column appears empty. The Your Rating column and the column with a heading value of ‘2‘ are not needed either. My first thought might be to select each column and then select Remove Columns from the Home ribbon as shown the prior image here. But I can also select the columns I want to keep and then select the Remove Other Columns option shown in the following figure. Note, I can press and hold the CTRL key while I click on each column header I want to select or I can click on the first column header I want and then while pressing the SHIFT key, click the last column I want and all columns between these two columns will also be selected. Which method I use to select the columns I want or don’t want is up to the individual situation as one method may be easier than the other.

I now have two columns, the Rank & Title column and the IMDb Rating column. Next, select the Rank & Title column and click Split Column in the Transform group of the Home ribbon. I want to split the column into multiple columns. In some cases, I may have a data structure which lends itself to be split at a specific number of columns (characters). But that is not the case here. I want to split the data at the first space character in the field which may be the third, fourth or fifth character from the left. Therefore. I select the By Delimited option.

In the dialog that appears, I can choose from one of 6 common delimiters. If the data includes these delimiters within their values, I may need to create a data set that uses a different, custom delimiter. The interesting feature of the custom delimiter is that it is not limited to a single character. However, in this case, I can use the Space character delimiter. Using the Custom delimiter (period-space) might also be a good choice in this case so that Rank can be displayed as an integer without a decimal point.

After selecting the delimiter, I can choose whether to apply this rule at the left-most occurrence of the delimiter, the right-most occurrence or at every occurrence. In this case, I only want to apply it to the first occurrence from the left.

You should also note that when Power BI Query Editor splits the column it creates one column from all the characters to the left of the delimiter and a second column with all the characters to the right of the delimiter. The delimiter itself is thrown away.

After applying the command to split the column, I may want to rename the new column. I can do this by right clicking the column header and selecting Rename from the drop-down menu.

In a similar fashion, I can split the year of the TV show into its own column. (I’ll leave you to figure out the steps you need to do this based on my description above.) After renaming the column with the year and renaming the Rank & Title column, my table now looks like the following.

Hint: Use the Split Column function to remove the right parenthesis and then deleted the resulting created column.

I am now ready to start using my data to perform some analysis. To do this, I click on the Close and Apply button on the far left side of the Home ribbon.

This returns me to the working desktop as shown in the following figure. Note the three icons along the left side. The top icon opens the visualization desktop where I can display different charts and tables of my data. The second icon shows me my data tables. If I have more than one table in my data model, I can switch between the tables using the list of tables on the right side of the screen. I will not need to do that here since I only have a single table. Finally the last icon lets me see and work with the relationships between multiple tables. Again, for this week, I do not have to worry about relationships between tables because I only have a single table. I will cover working with multiple tables at a future date.

Switching to the visualization page, I can see the fields in my table along the right side of the screen. To begin a visualization, I can simply click on a table field and drag it into any blank area on the desktop. For example, supple I drag over the fields ID and TV Show and then change the visualization to a vertical column chart as shown below.

While Power BI did create a chart, the chart does not make much sense. I can see the names of the TV Shows across the bottom of the chart, but plotting the value of the show ID field as the vertical value of each column does not make any sense. Maybe if I had the total number of people who watched the show, it would be a better chart.

However, my point in making this chart is to emphasize the need to know and understand your data before you just go off and create visualizations. While Power BI can make analyzing your data easier, it can also make creating meaningless charts easier as well.

So what can I do? Suppose I plot year across the horizontal axis instead of the TV Show name. Now Power BI plots the vertical column as the sum of the show ID values. Again, not very useful. However, if I open the dropdown for the ID field in the second column on the right side of the screen under Value, I can change the default action from Sum to Count.

Now I have something useful, the number of top TV shows by year as shown below.

You can see that 2010 had the most top TV shows. I wonder what they are? Next, I drag the TV Show field by itself to an empty area of the desktop and choose the Table visualization. (Just hover over the visualization icons and you will see their names.) But I don’t want to see all the TV shows, just those in the best year for TV, 2010. So I add the field Year to the Visual Level Filters in the second column on the right. Clicking on the down arrow to open the properties, I unselect (All) and select 2010.

Now my list only displays TV shows in 2010.

Finally, I want to see the rating of each show so I can click on the checkbox for Ratings or I can click on Ratings and drag it over to my table. To see the TV Shows from the highest rating to the lowest, I can click on the Ratings header. Repeated clicking on the header changes the sort from ascending to descending. Once I sort the shows by descending order, my list of top TV shows from 2010 looks like the following:

My point in doing this exercise was not really to get a list of the top TV shows for the year with the most top TV shows (although that is what I did), my point was to show you how easy it is to grab data from a web page that you did not create and perform analysis on the data found there. That is HUGE.

In future blogs, I will cover additional features of Power BI.

Hope you found this blog interesting and want to learn more about Power BI. Be sure to come back regularly.

C’ya next time.

Power BI Desktop – Your First Hour

For those of you who have been reading my blog for a while, you know my interest in PowerPivot, Power Chart, and Power View go back several years. (If you don’t, you have a lot of old blog entries to catch up on.) Anyway, I’ve been spending some time recently looking for what the next big thing is for data analysis. Several others have indicated that Power BI might fill that requirement but I spent some time looking around first before agreeing and picking up Power BI to see what it could do. I’m still getting acquainted with it, but I decided that I like it enough to take you along on my journey. So first I need to help you catch up a little.

The first thing you will need to do is to get a copy of Power BI Desktop installed on your local computer. Note that I said local computer, not your server. In fact, I’ve been running it quite successfully on a Surface Pro 3 with Windows 10 and SQL Server 2014. If you do not have SQL Server on your desktop but have it on a server that you can access, that will work just fine. For today though, you will not even have to worry about SQL at all.

To download and install a copy of Power BI Desktop, go to https://powerbi.microsoft.com/desktop.

You can also go directly to the download page at: http://www.microsoft.com/en-us/download/details.aspx?id=45331. You will see the system requirements on this page and you may notice that Windows 10 and SQL Server 2014 are not listed. That was all the more reason to try it and I can tell you that it appears to work just fine.

However you get to the download page, start the download and then install the application on your desktop.

Download and install the desktop. After the desktop is installed, it should start automatically. If it does not, go to your Start menu an entry that says Power BI Desktop. On Windows 10, this appears in a separate section along the left side under Recently Added. It may also create an icon on your desktop and even possible get added to your taskbar across the bottom of the screen in Windows 8 through 10. In some way, start it and you should see the following dialog at appears on top of the desktop application which fills in the background:

Along the left side are options to get data. We will see and explore some of the many ways you can get data from different sources over the coming weeks. You can also return to recent sources that you used. Next is a menu item that says, ‘Open Other Reports’. Desktops with their various charts, tables, and other visualization are referred to as Reports and are stored with a .pbix extension. If you had previous reports that you worked on and saved, they would also appear in this section allowing you to click on them to directly open them.

There are also several videos and tutorials listed in this dialog to help you get started. I strongly recommend that you watch these videos and perhaps read some of the tutorials. In fact, there are many more videos and tutorials for Power BI Desktop. Another good page to help you get started is the Microsoft Power BI Support page at: https://support.powerbi.com/knowledgebase/articles/471664.

Microsoft provides several sample databases and in future weeks, I will probably use the ContosoDW databases that I previously used for many of the PowerPivot blogs. But for today, let’s just try something to amaze your co-workers.

Almost everyone knows the site www.imDB.com. Using the new Edge browser, I went to the site and discovered that there are some new pages available. Most people use it to look for information about movies and the actors in them. The site also has information on television shows. In fact, it has two new pages of interest. One of the new additions is a page that displays the top 250 movies and another page shows the top 250 TV shows. Just because everyone does movies, I went to the top 250 TV shows page just to find out which year had the most highly rated shows. The following image shows a little of that page.

But what I’m really interested in is the URL. I can select the page URL from the Address box at the top of my browser. It should look something like the following:

Next I go back to my Power BI Desktop and click the Get Data icon in the External Data group of the home ribbon. Like many buttons, this one has a top portion and a lower portion. The bottom portion will open a dropdown of common data sources as shown below:

However, I can also click on the top portion of the button to display the following dialog box. This box shows categories of external data sources on the left and the names of specific data source types on the right. Since I want to get data from a web page, I need to specify a Web source. I could check each category just to see what was available and eventually find Web in the Other grouping as shown in the following figure or I could have selected Web from the previous figure. Either way, takes me to the same place. (But it is interesting to see all the possible data sources, isn’t it?)

After selecting Web, I am prompted to enter the URL of the web page that holds my data. This is where I past the URL that I captured previously when I displayed the page of top 250 TV shows.

When I click OK, Power BI analyzes the page and displays objects that might be something I’m interested in.

Of course the names like Table 0 and Tab1 have little meaning. However by selecting any of the objects, Power BI Desktop displays a preview of the contents of the object on the right side. This is how I ‘discovered’ that Table 0 contains the data representing the top 250 TV shows as shown below.

When I click load, Power BI Desktop loads the data into its local data model. In many cases, that model may contain columns that we do not need. It may also contain columns of concatenated data such as in this case in which the show ranking, show name, and show year appear concatenated together. We will need to fix this. Ultimately, I want to get to a table that looks something like this:

However, that will be next time. Have a great Labor Day weekend and get Power BI Desktop installed on your machines because next time we are going to start manipulating the data and creating visualizations.