The last two times, I introduced some of the basics of using Power BI with web site data from a third party. This time, I am going to show you how you can retrieve data from SharePoint and analyze it with Power BI.
To get started, I am going to open a new Power BI desktop instance and select More from the Get Data button to define a new data source, a data source within my SharePoint farm.
When the Get Data dialog appears, I can scroll down through the available data sources until I encounter SharePoint List.
When I want to pull data from a SharePoint list, I need to enter the URL of the SharePoint site, not the URL for the list itself as shown in the following figure.
I can navigate to my SharePoint site and capture the URL from the address bar in the top of the browser.
Note however that I do not need the entire URL, I only need the portion of the URL that defines the site. Therefore, I can remove the /Pages/default.aspx which references the home page of the site.
When I click OK, Power BI will go out the SharePoint site and look for the lists defined in the site. But before I get to that, I want to make a point that Power BI did not ask me for my credentials when I went to this collaboration site. The reason is that particular farm uses Kerberos authentication so my Windows authentication was passed through to SharePoint which allows me to see the contents of the site. If I did not have rights to the site, Power BI would not authenticate me or provide a list of the lists. A little later, I show you what you may see if you don’t use Kerberos to pass through your Windows credentials.
The next thing I noticed when I saw the Power BI Navigator showed me the names of the available lists is that some of the items were not lists, at least not strictly speaking. When I looked at the names, I noticed that some of them were from the Discussion Board area and some were from the Surveys area. Interestingly, the Navigator does not treat libraries as lists. Therefore, if I use a form tool like InfoPath which writes the metadata back to the form library, I do not think at this time that I can access that data from Power BI.
On the other hand, Power BI could provide an interesting way to access and analyze survey data. I just happen to have a survey result list available to take a look at, so I begin by entering the URL of the survey site.
This instance of SharePoint is not currently setup to work with Kerberos. Therefore, I get the Access SharePoint dialog shown below by clicking Windows authentication in the left side menu. In this case, I need to switch from using my current credentials to using alternate credentials and then supplying the appropriate username and password. I still want the full address to the survey site as before. Then I can click Connect.
Now Power BI can successfully interrogate my site to look for lists and display them in the Navigator as shown below. For this demo, I only need the single list Senior Exit Survey. (Note that all blanks are removed from the list names.) In a future blog, I will demo how to work with multiple lists/tables from a single source, so be patient.
After selecting a list, the Navigator displays a preview of the data in the right side panel. Note that the column names, like the list names, have had all blanks removed. Actually Navigator removes other non-alpha characters as well. In fact, column names cannot even begin with a number. If you have column names that begin with a number, Power BI will automatically precede the names with “c_”.
Another thing I noticed is that the preview only shows the first several lines of data and then displays the text: ‘The data in the preview has been truncated due to size limits.’ Don’t worry about this. The data is all there.
After I click the Load button, Power BI starts to load the data into its data model. Depending on the number of column and rows, this process can take from a few seconds to a minute or more. However, eventually, the load completes and I can see the table by clicking on the Data icon along the left side of the screen and then clicking on the table I want to view (if I had more than one table).
Note that I could click on the Edit Queries button at this point to clean up some of the data getting rid of columns that I will not need, but I’ll skip that step for the sake of keeping this blog a little shorter.
Next, I click the Report icon in the left margin of the desktop. Here I can see the columns in my table. If there were multiple tables, I could collapse and expand the column list for any table by clicking the arrow to the left of the table name.
Let’s say that I want to plot the number of each response to the question: ‘I clearly understood the requirements for graduation.’ I can click in the check box to the left of that column to identify the question and then click the check box to the left of the ID value which I can use to count the responses.
Note that by default, Power BI may try to sum the ID value. Remember from last week I should you how to change the aggregation of a value from SUM to COUNT.
I’m also going to initially default the visualization of this data to table. Therefore, my two right columns for Visualizations and Fields should look like the following figure.
My table should look like the following figure. If the columns are not in the correct order, you can simply click and drag the column names in the Values section of Visualizations to get the order you want.
Instead of seeing my data as a table however, I might want to display the data using the Funnel visualization and sort by the count of ID as shown in the following figure. Changing the sort can be accomplished by clicking the ellipsis in the upper right corner and selecting Sort By to change the sort field.
In a similar fashion, I can click in a blank area of the desktop and create a column graph visualization for the responses to the question: “This high school has prepared me for my career choices.” The chart shows the degree to which the respondents agreed or disagreed with that statement. Note that in this case neither sorting by the count of the IDs or by the text of the answers provides a satisfactory ordering of the data. I’ll return to this at a later date.
Clicking in another blank area, I selected the count of ID along with the date the record was created or when the survey was taken. As you can see in the following figure, the data appears to be a little odd in that there is no date with more than 4 surveys. I did not believe this because with over 9500 records spread out over less than 150 days, that low count per day just does not make mathematical sense. Remember last time when I said that Power BI can create good charts and bad charts. The user must know their data to determine when the data is not being displayed correctly. In this case, the reason for the unusual result is that the create date is not just a date, but a date and time. Therefore, most surveys, even when taken by over 9500 students, do not occur within the same minute although statistically, some will.
So how can I fix this? I need to return to the data table and make a few changes. I really don’t want to lose the original data, so I create a duplicate of the Create column that I can modify.
In my new column, I can change the column name to: Day Created and then by right clicking on the header, select Change Type and then Date to change the field to a simple date with no time associated with it.
When I go back and change my chart to display the count of IDs by Day Created rather than the Created column which included time, I get a more realistic chart as shown below.
If I were to zoom out, you can see that I have all four of these visualization on the desktop at one time. In fact, other than the amount of screen space, there is no limit to the number of visualizations you put on a single screen.
Now let’s try something interesting. Suppose I select one of the days in the last chart that I create, specifically, Friday, January 23, 2015. I can see that the popup information box tells me that were 697 surveys taken on that day.
But more interestingly, you will notice that the rest of the visualization on the page automatically changed to emphasize the counts representing that one day with a darker shading and for the table, the numbers in the first column represent the count for each answer for only that one day.
That’s quite a bit for this week. Next time I take a further look into working with Power BI and SharePoint Survey data.