Know Your Audience

I found this post, written two years ago, in an old folder; I never published it. But with a few changes, it is just as relevant today.

In my prior posts, I’ve focused on the technical issues of how to gather data into a Power BI Data Model, how to transform that data using calculated dimensions and measures, and how to display the data using the reporting and charting objects built into Power BI Desktop, plus a few provided by third-party tools. However, I need to discuss another important aspect of data analysis: determining the best way to present your analysis to different audience types.

The easiest way to classify audience types is by management level. At the top are the C-level executives. You might know them better as the CEO, CIO, CFO, COO, and probably several other TLAs (three-letter acronyms) that represent the people who set the overall goals or direction of what your organization will be doing in the future. People at this level rely on well-designed charts to guide their decisions. This is not because they cannot interpret tables and matrices. Rather, it is because they need to quickly grasp what the data is telling them and how it affects their decisions. This is best accomplished with charts, often in the form of dashboards.

When a data analyst is asked to create a dashboard for their organization’s leaders, they may feel that they must somehow fit every piece of information into a single dashboard. But that would be the wrong approach. The correct approach is to first determine what decisions leadership is trying to make and then determine what specific information they need to make those decisions. Each dashboard should then be created to address the information needed to make a single decision. That decision might be to expand the organization in one part of the country while shutting down operations in another, less profitable region. The decision might be to add new product/service lines or to decrease the number of similar offerings within a category that are competing against each other rather than against the competition. Or perhaps leadership needs to evaluate different sources of raw materials, determine the best location for a new facility, or learn what type of advertising works best for different market segments. In any case, each dashboard must focus on a single issue and, if at all possible, completely address that issue within the confines of that single dashboard.

When it comes to Power BI, the dashboard capabilities exist to create compelling presentations that can be included in PowerPoint slides or displayed directly. The worst thing you can give managers at this level is a boring presentation that includes table after table of numbers whose relevance to the decision is hard or even impossible to visualize. Each item on a dashboard may have the ability to drill down to a deeper level of detail, but rarely if ever will these managers need to drill down to the original data source level. Drill-down at this level should always be limited to summarized data.

Beneath this top level of management is typically a middle layer of management and staff who develop the strategy for achieving the goals and implementing the decisions of top management. This level also benefits from similar dashboards with data that applies only to their area of concern. Interactivity to drill down and filter data matters more in these dashboards than in those for top management, but they still should not rely heavily on table and matrix reports except as a backup to verify the analysis or to explain outlier data. I would suggest that if management at this level regularly insists on getting full table and matrix reports of the data, they may not trust the summarized data prepared for them, or perhaps they just don’t know what the right questions are quite yet and still need to explore different data relationships. Some might refer to this activity as ‘playing with the data’ to determine which strategies make the most sense to achieve the overall organization goals. Keep in mind that once these strategies are finalized, the information defining which dimensions and measures are most important will be used to create the dashboards that guide top management’s decisions. The analysts at this level may be called strategic data analysts because their focus is on determining what data is needed to support and define the organization’s strategy.

Finally, there are what I call the functional data analysts. While they may work in part from a set of requests for data needed by the strategic data analysts, they also spend considerable time gathering as much data as possible and building different data models to test different assumptions. Often these assumptions are based on their general knowledge of the business/service, but sometimes it is just a ‘game’ of playing one set of dimensions against another to see if there is any relationship that best predicts the observed measures. A tool like Power BI makes it relatively easy to test different assumptions to see if a pattern emerges that might predict the measures. Sometimes the best relationships in the data are discovered quite by accident, causing the data analyst to exclaim, “That’s funny!” Then they will proceed to adjust the dimensions to fine-tune the factors that best model the observed measures. This is what often occurs in the study of how customer demographics affect the purchasing of different products/services in different parts of the country. For the functional data analyst, charts are still important in discovering those special relationships, but the tables and matrices that back up those charts take on increasing importance to verify that the observed effect was not an accident.

Ultimately, the source data needed at each of these levels may actually be the same. However, including all the detailed data in the data models provided to top-level managers of the organization may result in analysis that is slow to update. Top management does not need as much flexibility in filtering or drilling down through the data using different dimensions. The strategic managers and data analysts have already determined which dimensions they want top management to focus on and can provide dashboards based on summary data for those dimensions rather than detailed data.

Similarly, strategic managers and data analysts do not need all of the data collected by their functional peers. In fact, the functional managers and data analysts should have already weeded out dimensions that do not have a direct impact on the observed measures. They have reduced the data model by removing non-essential columns, such as a store’s phone number, and perhaps even summarized the data slightly, for example by including only daily sales rather than the details of each individual sale.

Think about prices on the stock market, for example. Looking at the second-by-second price fluctuations of a stock can distract from the decision of whether to buy or sell that stock. In fact, looking at the price trend first by hour, then by day, then by month, and finally by year can lead to different decisions at each point along the way. If you are a day trader, you probably need to watch the fluctuations of a stock at a finer interval. But if you are a long-term investor, the decision of whether to buy or sell a stock depends on its longer-term trend. Of course, whether you are buying or selling, once you have made that decision, you would probably be best served by watching daily fluctuations or even minute-by-minute changes to time the exact moment to initiate your trade order.

Getting back to a business scenario, we see that there are several trends in the style of data analysis that you may need to consider. Data analysis performed for higher management levels requires that:

  • the data model must eliminate all irrelevant dimensions.
  • the data model must be increasingly summarized.
  • dashboards and individual charts should have less interactivity, focusing only on the relevant dimensions that affect the decision.
  • the presentation layer must become increasingly graphical, with data presented at a high but focused level to drive home the point at a glance.
  • the need for tables and matrices decreases; they can be as distracting as the noise of second-by-second stock price fluctuations.
  • the presentation should focus on the one or two points it is trying to make without introducing side issues and should fit on a single screen/slide with no horizontal or vertical scrolling.
  • at the highest level, the presentation must stand on its own, without the data analyst there to explain it.

It should be obvious by now that all of these goals cannot be met with a single Data Model, much less a single set of reports and dashboards. You must customize the Data Model and the presentation of the data for each audience level. Of course, one way to do this is to base each higher-level Data Model on the Data Model at the level below it. This method provides continuity. It also provides a single upgrade path as more recent data flows from the lowest, most detailed levels up to the highest, most summarized levels.

Now one last point. This cannot be easily accomplished using just Power BI Desktop. At the lowest analysis level, Power BI Desktop may be the best way to build the initial data models and ‘play’ with the possible relationships between different dimensions and measures. Depending on the amount of detailed data, this may even require the data to be stored in Analysis Services, either locally or within Azure. Keep in mind that calculated columns and measures will update faster in Analysis Services than if stored in the local model. But even then, some data analysts prefer to use a data sample in Power BI Desktop to get a ‘feel’ for the data and to explore different visualizations. Only then will they build the final model in Analysis Services with Power BI or Power BI Premium.

While Azure with Power BI Premium might be where your organization ultimately needs to go, don’t be afraid to start small by using Power BI Desktop. Gain acceptance and recognition for what data analysis can provide at each audience level. Build confidence and complexity over time. But get started and do it. It is better to succeed through a series of smaller steps than to fail attempting to immediately implement your grand vision or even worse, to never start at all.

What Use Are OneNote Tags?

To those of you who have wondered why there have been no new posts to this blog for many weeks, rest assured that I will be trying to get back on track again. It has been an eventful two years with many job changes that I will not go into here. However, from a technical point of view, I have also used that time to learn more about some of the features of O365 as well as Power BI, particularly related to ADA compliance issues and just overall better design methods for reports and dashboards. I’ll get to all of that eventually. However, first I wanted to tell you about a hidden gem in OneNote. Okay, maybe not hidden so much as underutilized, at least by most people. That hidden gem is tags within OneNote.

Although I have been using OneNote for several years, I have to admit that initially I did not think of tags as more than just decoration inside my OneNote pages. I’ve used the To Do square tag when making lists of things I needed to do. The cool thing about it was that when I added it to an item in a list, the square would appear unchecked, and by simply clicking on the square, I could toggle between unchecked, checked, and back again. I also would star important items in a page of notes to indicate the most important points so that I could find them easily enough 6 months (or even 6 weeks) later. The same thought applied to the Question Mark tag, which I would use to mark thoughts that needed more research or follow-up with others. I might have used a few other tags over the years, but I really did not think much about them other than as a visual marker of where certain types of information appeared on a page.

I hadn’t even noticed that you could define your own custom tags, perhaps because I rarely scrolled all of the way to the bottom of the list of available tags. However, last year I discovered this feature. It was the first step on my path of using tags more productively. Let me first show you how easy it is to create a custom tag.

You begin by opening the full tag list, clicking on the box at the bottom right of the tag gallery, commonly referred to as the More button. (The Tags group can be found in the Home ribbon of OneNote if you are having trouble finding it.)

Scrolling to the bottom of the list reveals the option Custom Tags… Clicking this option displays the Customize Tags dialog box shown below.

To create a new tag, click on the New Tag button as shown above to display the New Tag dialog.

The first thing you must supply is the Display name, or the Tag name if you prefer. This name should be unique among the existing list of available tags. This is important because the symbols, font colors, and highlight colors do not have to be unique among all the defined tags.

After specifying a name, symbol, font color, and highlight color, click the OK button at the bottom of the dialog. This action adds your new tag to the top of the list. It also gives it the keyboard shortcut Ctrl+1. You can reorder the list so that the tags you use most appear near the top. You could also order the tags alphabetically, but that is not as practical. Why? Because the further down the list you have to look, the harder a tag is to find. Secondly, the first 9 tags are given shortcut keys (Ctrl+1 to Ctrl+9) beginning at the top of the list. Note, however, that the shortcut keys are not really a function of the tag, but of the tag’s position in the list. You can click on a tag in the Customize Tags dialog and then click on the up or down arrow buttons to the right of the list to change the tag order. Notice that as you move a tag up or down, the shortcut key for that tag (as well as the one it jumped over) changes to keep the order of the shortcuts fixed from the top of the list.

You can see in the image below that the new tag I created, named Link, began with the shortcut Ctrl+1 when it was first added but changed to Ctrl+2 when I moved it down to the second position in the list.

When you close the Customize Tags dialog by clicking the OK button, the tag order and the shortcut keys are set as shown in the Customize Tags dialog.

Now to use the tags, you can select the text on the notebook page and select the tag you want from the Tags list, or if the tag is one of the first nine from the top, you can use its shortcut key combination, such as Ctrl+2 for the Link tag. You don’t even have to select the text. If you select a tag, it will appear at the beginning of the current line for a single-line note or at the beginning of the paragraph for a multi-line note. Even if the information you want to tag is in the middle of a line, the tag appears at the beginning of the line, not just to the left of the selected text. As an example, look at the image below in which I selected the site URL and then pressed Ctrl+2 to add the Link tag.

Any line or paragraph can be tagged with any one or more of the available tags. If you apply more than a single tag, they all appear to the left of the text. After tagging items on your pages, you should easily be able to spot the important information on any page of your OneNote notebook.

For a long time, that is all I thought you could do with tags. But that does not showcase the real power of tags. To be fair, I guess the real power of tags does not become important until you have dozens of pages and subpages within multiple tabs within one or more notebooks, much like the notebooks in your office. Let’s say you have 50 to 100 pages of notes within a notebook, perhaps representing different projects, different meetings, different notes, etc. How would you begin to find all of your To Do items, or all of the links, or all of the phone numbers you have stored? One way, of course, would be to perform a search on a unique word, portion of a word, or even a phrase. In fact, Search allows you to find a unique string of characters across multiple pages in multiple tabs across multiple notebooks that you have open.

That is pretty powerful, but you must know the unique string you want to search for and there is no way to limit the search to perhaps just the pages within the current tab or to perform the search in such a way as to find all of the links or phone numbers at once. Furthermore, what if you do not remember a unique string or what if the string might appear in multiple places in different contexts? Or what if you wanted to find all the links in the current notebook? In these cases, Search may not be the right tool.

To the right of the tag dropdown list on the Home ribbon is an option called Find Tags. When you click this option, a panel appears along the right side of the screen titled Tags Summary, which lists all of your tags grouped by tag name by default (see why the tag name is important?). An example of this panel is shown in the next image. To jump to the page for any of these tags, just click on the label associated with each tag in the Tags Summary panel. However, if you have used a large number of tags to tag a large number of items on multiple pages and tabs within your notebook or notebooks, this list can be rather large (don’t worry, OneNote will automatically create a scrolling list), and you may want to click on the option at the bottom of this panel to create a summary page, which adds another page to the notebook listing all the tags used within the notebook. Unfortunately, you cannot click on the labels associated with the tags here to go to the page where the tag was applied like you can when you click within the Tags Summary panel. But all is not lost. If you hover over any of the tags in this summary list, you will see a small OneNote icon to the left of the entry. Simply slide your cursor over and click on this icon to go to the referenced page.

Now if your notebook is as active as mine, the Summary Page of tags will soon become obsolete as you add and tag more content. You might have noticed the other button at the bottom of the Tags Summary panel that says Refresh Results. You could click on this button. However, beware that if you created a Summary Page, the tags found there will be duplicated, resulting in many repeated tag references. Not good! If you want to refresh results, you need to first delete the existing Summary Page, click Refresh Results, and then create a new Summary Page.

Another way you can limit the size of the Tags Summary details is to use the Search option found just above the Refresh Results button. This option lets you select the scope of the summary from a page group up through all the notebooks you have open (The default is the current notebook). You can also specify which tags to display based on their age from today’s tags to yesterday’s tags, this week’s tags, last week’s tags, or older tags.

Before I end this blog post, let me give you one last hint on using tags. Suppose you have a notebook with hundreds of individual subpages, pages, and tabs. How can you find the page you want if you know the name of the page but not under which tab it can be found? Why not use a tag to tag the page names themselves? Then when you create a Summary Page, the section for that special tag you use only for page names becomes a table of contents, with the pages listed alphabetically. Just find the page you want and click on the page name to go to that page.

And if you like generating a Table of Contents of your pages within OneNote, think about creating a Table of References, a Table of Definitions, a Contact List, etc. There is no limit to the imaginative ways you can use tags to organize your OneNote notebooks.

Calculating Percentages in PowerBI

Last time I showed how you can pivot a multivalued column from a survey downloaded from SharePoint to Excel. After some transformations, I created the following report that shows the number of career responses in each of the 18 categories. While interesting in and of itself, it would be more interesting to see how the career choices break down by different dimensions such as school or grade level, the gender of the student, or perhaps even the individual school.

In my report page, I am going to reserve an area along the left side of the page to list several of the dimensions I’m interested in and to display the total number of students that took the survey along with the number of students in the filtered subset when I filter by one of the dimensions. Basically, that means building something like the following:

This image actually consists of four separate visualizations. The top visualization for Total Students uses the Card visualization and initially displays the count of the ID column in the survey table, which has one record for each survey. I duplicate this visualization and drag it below the first one. Then for the top visualization, I open the Format panel in the Visualizations section and make the following changes:

  • I turn off the Category Label
  • I turn on the Title and then open the Title section to add the Title Text: Total Students
  • I change the font size and color to make this information stand out.

I then repeat the process for the second visualization but with the Title Text: Sampled Students, because this count will represent the number of students included in the visualizations, taking into account any filters applied to the page.

I then add two single-column tables, one for gender and one for grade level. Because I want to use the values in these tables as page filters, I change the visualization of these two tables to the Slicer.

Along with the table I created in my last blog, I can work with the slicer values to explore how the career choices change based on gender and grade level by clicking on the check boxes. When I do this, I see that the chart automatically adjusts the columns based on the changes I make to the slicers. Also, the number of sampled students in the second card visualization displays the count of students included after applying the slicer. Unfortunately, the total student count also changes. This I do not want. I want the total number of students to always represent all the students in the entire survey.

I can achieve this goal with a little DAX and a custom measure back on the data page for the survey table. The custom measure needs to count the IDs for all the records in the table, ignoring any filters applied to the report page. I can do this by passing the survey table name to the ALL() function. This function ignores all other page filters. Then I use the COUNTAX() function, which takes the output of the ALL() function as its data source and counts the number of IDs. While this may sound complicated, it is as simple as the following equation:
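As a sketch of that measure (assuming the survey table is simply named Survey; the actual table name in your model may differ):

    // Count IDs over the unfiltered table; ALL() removes any slicer or page filters
    Students = COUNTAX ( ALL ( Survey ), Survey[ID] )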

Notice that I name the measure Students. I must provide a unique measure name for each measure I create. However, I can then use that measure in any visualization such as the card visualization for the total students in the survey.

Back on my report page, I select the top card visualization (for Total Students) and change the field used from ID to my new measure, Students.

Now if I select any of the values in the slicer visualizations, my sampled students card displays the number of students included in the filter while the total students card displays the total number of student surveys taken, as shown below.

I then add some additional column visualizations on the right to display other data fields, such as which subject the student finds most interesting in school, or charts that display career choices by gender, by grade level, or by other criteria from the survey. Each of these charts begins with a simple table visualization to which I add the columns I want to use. I then convert the visualization to a column chart.

In the image below, you see the final result of the first page of my report. Notice that I also added a vertical line shape to separate the two card visualizations and the slicers from the other column charts.

Since each student was allowed to select one or more careers from the list of possible careers, the total number of career choices is significantly larger than the number of students. In the above figure, if I were to add the values in each of the columns, they would total the number of career selections, which is over 17,000, not the number of students. Therefore, I might decide to display the same charts as a percent of all the students rather than a count of all career selections.

Again, I need another custom measure to calculate the percent based on total students. Fortunately, I already have a measure that calculates the total student count ignoring any slicer selection or page filter. Therefore, I can generate a percentage using a formula similar to the following:
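As a sketch, again assuming the table is named Survey and reusing the [Students] measure defined earlier (my use of DIVIDE() rather than the division operator is just a guard against an empty filter context):

    // Filtered survey count divided by the unfiltered total; format the result as a percentage
    Percent = DIVIDE ( COUNTAX ( Survey, Survey[ID] ), [Students] )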

With this formula, I count the number of surveys filtered by the visualizations and slicers on the page and divide by the total number of students who took the survey. The maximum value of this percentage would be 100% if all the students who took the survey selected the same career, such as computers, as one of their choices. Similarly, the minimum value would be 0% if no student selected a specific career as one of their choices. Because students could select more than one career of interest, the sum of the percentages of the columns will not add up to 100%, but to some value greater than 100%. (Lesson learned: Next time ask for their preferred career choice, then their second career choice, and finally their third career choice.)

Next I take the first page of my report and duplicate it by right clicking on the page tab and selecting the option to duplicate the page. (This is a lot faster than recreating the same visualizations on a new page, isn’t it?) On the duplicated page, I modify each of the column charts to display the Percent measure just created rather than the count of ID.

Why does this work? Well, because each student can only select any specific career one time, even though they may select two, three, or more careers, I can simply divide the number of filtered students in each column by the total number of students in the survey. Each chart already divides the students by career choice, one career for each column. Then additional filters from the slicers may limit the gender or grade level or both. Therefore, I can count the students that match all those filter criteria and divide by the total student count to get the percentage of students who have an interest in that career. In fact, this measure also correctly calculates the percent of students interested in each school subject (of which they can only select one subject each), which I can verify by summing these percentages from the chart in the upper left and getting a total of 100%.

The following figure shows the percent page of my report.

I can then proceed to add other visualizations if I want on additional pages. But I’ll suspend this example at this point for now.

C’ya next time for more exciting ways to use Power BI.

The Need To UnPivot Data

Today I am going to take a look at some interesting issues I encountered when pulling data from a survey into Power BI to perform some analysis. First, some background. The survey existed on our SharePoint site and all submissions were stored within SharePoint. However, since the survey is stored as a list, it is always possible to download its details to an Excel file. When I open the resulting Excel file, I see the following data.

Most of the columns are straightforward text columns. But, looking closely at the rightmost column, I see that the data structure is a bit more complex than usual. This is the result of having a survey question in which the respondent is allowed to select one or more items from the list of possible careers in which they have an interest. While some may respond with only a single career that they are focusing on, others may not be so sure yet and so have selected multiple careers in which they may have an interest.

You can see that the format of the data consists of responses along with the corresponding ID values of the selected responses separated by semi-colons.

Obviously, I cannot perform analysis on such a column with multiple values. I need to get this data into a single vertical column with one row for each career choice selected by the respondent. The first step to achieve this result is to split the individual career values into separate columns. I can do this in Excel by selecting the column with the multiple values and then selecting the button: Text to Columns in the Data ribbon. This function allows me to split the text in each cell of the column into multiple columns each time a specific delimiter is found. In this case, I can split the original text at each semi-colon to create a new column.

After splitting the data into columns, there are some columns that I no longer need and can delete. These are the columns that contain the ID values of the selected career values. I have no need for the ID values. I can also apply a series of Replace statements to clean up the rest of the career values and remove extra characters that are not part of the career name itself. After a few transformations, I’m left with the following set of spreadsheet columns, which identify the respondent by their ID and then the careers in which they have an interest.

This is still not exactly what I want because I really need to normalize this structure to get multiple rows for each responder with one career choice per row. However, as I will show in a moment, I can perform that transformation much more easily in Power BI than in Excel. So let me proceed to open Power BI and select Get Data.

When I open Power BI and choose Get Data, I select the Excel data type and click the Connect button. In the next screen (not shown) I will either enter the path to the Excel file or I can browse to the file using the Browse button. Once I select the Excel file I want to work with, Power BI opens the file and displays all the tables found in that file. In the figure below, you can see that there are several tables. Some of these tables consist of the original raw data and the transformation steps I used to create the datasets that I want to work with. Each table corresponds to a different tab in the Excel workbook, otherwise known as an Excel Worksheet or simply Sheet.

After selecting the tables I want to work with, I click the Load button to import the data into my Power BI model. Some transformations might be needed to ‘convert’ some of the column names to user-friendly names. I could have also done this in Excel by changing the column header text. However, the transformation I want to focus on is in the career choice table, which I call PreCareers. A portion of this table appears below. This table consists of the ID column used to identify the responder along with 18 columns representing their potential career choices. While most respondents entered only three or fewer career choices, some entered significantly more. By default, the data is sorted with all respondents who had only a single career choice displayed first. Those are followed by the respondents who selected two career choices, and so on.

In order to create a table that has the ID of the responder and a separate row for each career choice they selected, I need to unpivot the 18 career choice columns. To do that I need to select the Edit Queries button in the Power BI Home ribbon and then select the PreCareers table. Next I have to select all 18 career choice columns. Then opening the Transform ribbon, I look for and select the Unpivot Columns command as shown in the following figure.
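For reference, the step this command generates in the query’s M code looks roughly like the following (the previous step name is an assumption from my query; Attribute and Value are the default column names the editor supplies):

    // Keep ID fixed and rotate the 18 career columns into attribute/value pairs
    #"Unpivoted Columns" = Table.UnpivotOtherColumns(
        #"Changed Type",   // whatever the preceding applied step happens to be named
        {"ID"},            // the column(s) to leave in place
        "Attribute",       // receives the old column headers
        "Value"            // receives the career choices
    )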

After selecting this option, Power BI performs the transformation to create the new table shown below, which now has the previous column headers in a column named Attribute and the career choices in a column named Value. Of course, I can rename these columns and probably will, but let me first save this work as a safety precaution before continuing.

To save my changes, I would select the Close and Apply button from the Home ribbon. Remember that I can save my transformations multiple times and return back to Edit Queries to insert additional transformations at a later time.

However, when I attempt to close and apply my changes, I get the following error message.

Reading this error message, I see that there is a problem with the ID column in my transformed PreCareers table. If I open the tables in diagram view mode as shown below, the problem is evident.

You can see that the PreCareers table is linked to the K8_Survey table using a one-to-one relationship that attempts to connect the ID column of one table to the ID column of the other. The problem is that now that I have unpivoted the selected careers in the PreCareers table, there are multiple records with the same ID value, not just one record.

To solve this problem, I must remove the current relationship by right clicking on the relationship and selecting Delete as shown in the following figure.

You may have also noticed a yellow bar across the top of the page, shown below, that says there are pending changes to your queries that have not been applied. With the relationship deleted, I can try to apply the changes again by clicking on the Apply Changes button.

As the changes apply, I see that the field names in the PreCareers table are updated, as are the field names in the GradeLevel table.

I can now use drag and drop to create new relationships between the tables. For example, I can click and drag the ID field from the PreCareers table to the K8_Survey table as shown below. I can also connect the School field in GradeLevel with School in K8_Survey, relating these two tables.

With the relationships on my restructured tables back in place, I can go to the Report page and begin to create a report table that displays the possible career choices along with a count of the number of times that career choice was selected.

With a simple change of the visualization from a table to a column chart, I can visually show which careers were selected the most by the respondents. Note that because each respondent can select any number of careers, and they can be selected in any particular order, there is no way to reliably rank the career preferences to say one career or another is truly the top choice. The chart can only say which career choice was selected most often among all the careers in which respondents had an interest.

Another important point to remember here is that the sum of the number of times each career choice was selected is NOT equal to the number of respondents, because any respondent could select any number of career choices. Therefore, if you want to know the percent of respondents who selected computers as a potential career, you need to know the total number of respondents, not the total number of responses.

I’ll delve more into the analysis of this data next time, so save this file.

BTW, this weekend I am at the Orlando Code Camp held at Seminole State College in Sanford, FL. If you happen to attend, please stop by to say hello. I’ll be conducting two BI related sessions, one for Power Pivot and one for Power BI.

C’ya.

Appending Data Sources Manually in PowerBI

Last week I looked at how I can load into PowerBI an entire library of source data files and let PowerBI determine how to append the resulting files together. This method works great if all the data source files are of the same type and have the same schema/structure. But what if some of the files come from an Access data source, some from CSV files, some from Excel, etc.? Furthermore, what if the column order is different, or perhaps the column headers are different? These are cases in which you may need to import each data source individually. Then you edit the query for each source to align the columns, column names, data types, etc. before combining the data sets together. Let’s see how that looks.

For simplicity’s sake, I am going to start from the same folder as last week that holds 10 CSV files, each representing a different sales region.

Although I am not going to show images from each data load, I would proceed to load each CSV file into my PowerBI data model one file at a time rather than loading the folder like last time. This method creates a separate table within PowerBI for each data set as shown in the following figure.

I can view the data in each table by selecting the table name in the Fields panel as shown above. The first thing I would want to do is to ‘fix’ any inconsistencies in the data schema in these tables by selecting the table and then clicking on the Edit Queries button in the External Data group of the Data Tools Modeling ribbon. I basically want to build a common table structure across all of the data sources including a consistent column name for each column that appears in multiple tables.
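For example, a given table’s query might get a cleanup step along these lines (the step and column names here are purely illustrative, not from the actual files):

    // Rename a region-specific header to the shared name, then fix the column's type
    #"Aligned Schema" = Table.TransformColumnTypes(
        Table.RenameColumns(Source, {{"Amount", "SalesAmount"}}),
        {{"SalesAmount", Currency.Type}}
    )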

You should note that if I have a data source with a unique column that does not appear in the other tables, I can keep it. Later, when I append the individual tables into a single table, the unique column will be brought into the final data model, but the field will be blank for all rows coming from the tables that do not have that column.

In my first example here, all of my data schemas from each data source are the same but they would not have to be. So after simply loading the data from each data source, I am ready to start combining the data into a single table for analysis. To do this, I click the Edit Queries button in the External Data group of the Home ribbon and then select one table to start the process.

While in Edit Query mode, there is a button called Append Queries in the Combine group on the far right of the Home tab. I can use this button to begin the process of combining the tables.

The Append Queries button opens the dialog shown below that lets me select which table to append. I can only append one table at a time so to append together all 10 tables, I need to do this step 9 times.
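Behind the scenes, each Append Queries step adds a Table.Combine call to the query’s M code, something like the following sketch (step and table names assumed):

    // Each append wraps the previous result together with one more regional table
    #"Appended Query"  = Table.Combine({Source, Sales_Region2}),
    #"Appended Query1" = Table.Combine({#"Appended Query", Sales_Region3})
    // ...and so on until all 10 regional tables are combined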

As shown in the figure below, each appended query gets the generic name Appended Query followed by a number (after the first one, which has no number).

This default name is not descriptive enough to help me identify which appended query refers to which data set. If I want to remove one of the data sets from the final table, I would have to click the settings button on the far right of the applied step (the gear icon) to reopen the Append dialog to see which table is being appended. Then if I want to remove that dataset, I could click the ‘X’ to the left of the applied step to delete that one step.

However, a better option is to right click on the default applied step name and select Rename from the popup menu that appears.

This option allows me to select the current applied step name and replace it with a more meaningful name.

The following figure shows a much clearer picture of which data set is being referenced in each step. It also shows that I have finished appending my 9 additional data sets.

Note that any steps applied to the individual data sets are still applied first prior to the data being appended to the final dataset.

Note: One thing that I did not do here, but probably should have, is to make a duplicate copy of the starting table so that I could preserve the original with only its own custom transforms. Then, using the duplicated table, I could append the rest of the data sets.

When all the data is appended into a single dataset, I can close and apply this transformation so that any data refreshes can repeat these steps.

I can now go to my Report tab and start to build the visualizations that I want. In the final figure for this session, shown below, I create a table of Sales Amount by Sales Territory and include the Sales Territory Group, Region, and Country. I then click on the Sales Amount column to sort the table by this column in ascending order. You can see very quickly that most of the sales occur in Australia and the fewest sales are in the Central region of the United States.

Beneath the table, I create another table with the Sales Territory Group and the Sales Amount. PowerBI automatically sums the sales for each Territory Group, leaving only three rows. After creating the table, I change the visualization to the Donut chart, which displays only three segments, to create the appearance shown below.

As with all PowerBI visualizations, the data in the table is linked to the data in the donut chart. If I click on the North America segment of the donut, the table on top refreshes to display only the 6 rows representing sales in North America. Similarly, if I click on the Pacific segment of the donut chart, the table above immediately updates and displays only the one line for Australia.

That’s it for this week. Come back next week for more PowerBI fun.

C’ya.

Loading and Combining Multiple CSV Files in Power BI

Suppose my job is to collect sales information for my company and I currently receive text files of that data from each of our major sales offices around the world. Before I can do any analysis using Power BI, I will need to both load the data from each sales office and then combine the data into a single file. For today, let me assume that the format of the data from each sales office is exactly the same and the order of the fields is also exactly the same. As each sales office sends a copy of their data to me, I store their CSV file in a common folder called CSV_Sales as shown below, eventually getting data from all ten locations.

I am now ready to open Power BI and load my data to begin my analysis. I start by selecting the Get Data option after opening Power BI. This displays the following dialog which lets me specify the type of data I want to load. In the past, I showed several different ways to access individual files from different sources including CSV files. Indeed, I could again load each of the CSV files separately and then ‘somehow’ combine them into a single table for analysis purposes.

However, I notice an option that I had not selected before, Folder.

When I click the folder option, Power BI prompts me for the path to the folder. If I am not sure of the path, I can click the Browse button and navigate to the folder, and Power BI will figure out the path for me. Either way, I click on the OK button to continue.

Power BI then shows me the contents of the folder. For each file in the folder it provides metadata about the file such as its name, extension, date last accessed, date last modified and more. There is also a column at the far left of the grid named Content which in all cases has the word ‘Binary’ in it. This mysterious field in each row actually represents the data in the file. In fact, it is in most cases the only field that I care about.

If I click the Load button, highlighted in the previous figure, Power BI loads the folder information into a table as shown in this figure. This is not what I want.

So instead, I click the Edit button on the previous screen, which loads the data directly into the Query Editor. (Yes, I could just click the Edit Queries button in the Home ribbon to get to the same place, but why go through two steps when one will do?)

However, as I said previously, I don’t need all these other columns that provide information about the data files. I only care about the data inside the files. Therefore, I select the first column, the Content column, and from the menu open the submenu under Remove Columns and click the option Remove Other Columns. This is a faster way of getting rid of the columns I do not want than selecting each column and clicking the Remove Columns option.

Once I have only the Content column, I can focus on the button on the right side of the column header. Notice that it is a little different from the buttons on the right side of the other columns. Instead of just a single arrowhead pointing down, this button has two arrows pointing down to a line. This button means that I want to download the actual data from within the binary files into a separate table. Therefore, I want to click on it.
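Clicking that button produces steps roughly equivalent to this hand-written M sketch (the folder path is assumed, and newer Power BI builds generate a more elaborate helper function from a sample file):

    let
        // List the files in the folder; each row's Content field holds the raw binary
        Source   = Folder.Files("C:\CSV_Sales"),
        Keep     = Table.SelectColumns(Source, {"Content"}),
        // Parse each binary as CSV, promote the first row to headers, then stack the tables
        Parsed   = Table.AddColumn(Keep, "Data",
                       each Table.PromoteHeaders(Csv.Document([Content]))),
        Combined = Table.Combine(Parsed[Data])
    in
        Combined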

As you can see in the following figure, I now have all of the columns from the sales data. Because Power BI only presents a preview of the data in this mode, it is not clear whether this table contains just the data from the first file or the data from all of the files in the selected folder.

Because I am still in Edit Query mode, I need to close and apply my transformations to the folder by clicking the Close & Apply button in the Home ribbon.

Now when I return to the Data page of the Power BI Desktop, I can see my data and in the Fields dialog on the right side, I can open the CSV_Sales table definition to see all the fields in the table. But I still don’t truly know if I have the data from all the CSV files or not.

Next I open the Report page and create a simple table that displays several of the columns from the CSV_Sales table. I select the fields: SalesTerritoryGroup, SalesTerritoryCountry, SalesTerritoryRegion, and SalesAmount. I can quickly see now that the table does indeed include all my regional data from all 10 sales regions.

Just for fun, I can build a second table that includes only the fields SalesTerritoryRegion and SalesAmount and then convert the table into a TreeMap chart by clicking the TreeMap visualization. This visualization shows the contribution by sales territory region to the total sales by creating proportional rectangles within a larger rectangle that represents total sales. If I hover over any of the boxes, a popup displays the name of the sales region along with the sales amount for that region.

But wait a minute, I previously said that there were 10 regions, but I only see 7 colored rectangles representing seven of the regions. What happened to the other three? Well, they are actually there, but they are so small compared to the total sales that they are nothing more than slivers at this scale. In fact, even as I expanded the size of the chart, it was difficult to see these last three sales regions, which even added together represent less than 0.1% of the total sales. However, they do exist along the right edge of the chart, as you can see as I zoom into the bottom right corner.

Well, I hope you found that interesting. I’ll look at some more Power BI goodies next time.

C’ya.

Power BI Knows Dates

Back in PowerPivot days, if I wanted to be able to drill down through a report by dates (Year, Quarter, Month, Day), I had to build the hierarchy and typically I had to build custom columns to hold these date parts. I would even have to do some fancy ‘Sort by Column’ changes to get the months of the year to display in the correct order. Power BI takes a lot of that work away and makes it easy to drill down through your data.
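For instance, back then I might have added calculated columns like these to the model and then set the month name’s Sort by Column to the month number (the table and column names are just for illustration):

    // Date-part columns the model needed before Power BI's automatic date hierarchy
    Month Name = FORMAT ( FactInternetSales[OrderDate], "MMMM" )
    Month Number = MONTH ( FactInternetSales[OrderDate] )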

Let me start by loading some data from the Adventure Works Data Warehouse database. In the figure below, you can see that I’ve loaded several of the tables from this database, but for today I’m going to focus on just two tables, FactInternetSales and DimDate. The first thing I notice is that these two tables are connected with three relationships represented by three lines, one solid and two dotted. This is because DimDate is a role-playing dimension, since there are multiple dates in the FactInternetSales table that could be associated with the date table to create reports.

By clicking on any of the relationships, I can see which fields are associated with that relationship since the field names will be enclosed in a box. In the figure below, you can see that the active relationship (the solid line) connects the DateKey field in DimDate with the DueDateKey in the FactInternetSales table.

However, I want to use one of the other relationships. To change which relationship is active, I can open the Data Tools Modeling toolbar and select the Manage Relationships button.

You can see that 6 active relationships have been set between the tables, and two others are currently inactive.

Suppose I want to use the OrderDateKey for my report. It would be nice if I could just click on the relationship that I want and, like radio buttons (or option buttons), the previous relationship would be marked inactive and the new relationship would become active. Unfortunately, this just results in an error screen as shown here.

You have to first mark the relationship that you do not want as inactive by clicking on the yellow checkbox to the left of the relationship. Then you can select the relationship you want as shown in the next figure.

Now when I return to my Relationships page, the active connection links the DateKey field in DimDate with the OrderDateKey field in the FactInternetSales table.

I did this just to show that I can still work with the tables like I did in PowerPivot, but I really did not need to do any of this. If I open the Reports page and, in the Fields panel on the right, select first Sales Amount and then Order Date, which is defined with the Date/Time data type, I will get a table that looks something like the following:

If I then switch to a stacked column chart, I will see a chart like the following (after a few formatting changes) that displays the Sales Amount by Year.

If I click on any of the columns, a dialog box appears with some details about that column. However, that is not what I want. I want to be able to drill down on any given year when I click on it.

In a relatively recent enhancement to PowerBI Desktop, several buttons have been added to the header. If I click on the icon third from the right as shown in the next figure, I can toggle the Drill Down feature. Drill down is turned on when the icon is mostly black instead of white.

Now when I click on the third column from the left (the 3rd quarter) the chart automatically updates to display sales by month and it displays the months July, August, and September in the correct order as shown next.

Clicking on the August column of this chart drills down to the next date level which is days in the month as shown here.

At any time, I can move back up the hierarchy by clicking the icon on the far left side of the header which is the Drill Up icon.

In fact, if I click on this icon twice, I can return to the Sales by Quarter chart. Now suppose I want a chart that displays all the months in the year. The button second from the left lets me drill down on all the columns of the current chart at once to the next level. So if I am displaying quarters, it shows me the sales by month.

As you can see in the following chart, all the months in the year now appear. But perhaps more surprising is that the months appear in the correct chronological order, something we would have had to do with a Sort by Column option in a standard PowerPivot chart.

I hope you found that ability to drill down through a date hierarchy automatically without having to build that hierarchy first as interesting as I did. I’m at a SQL Saturday today in Tampa presenting an introduction to Power BI to the attendees. Maybe I’ll see you there.

C’ya next time.

Spring Cleaning Time for SharePoint Sites

We have been deeply involved in a cleanup of our SharePoint portal for the last several weeks, and I thought I would share some of our observations because I know most of you with Internet and intranet sites that are more than a few years old probably share the same issues we do. Our Internet and intranet portals have been around for 9 and 8 years respectively, and over time a lot of ‘junk’ has accumulated. The example I like to use is that it is similar to the way your email slowly fills up with ‘junk’ over time. For example, if you received 20 emails a day but only have time to address 19 of them, you may go home feeling pretty good about the amount of work you got done. However, that 1 email a day you did not get to, over the course of a typical work year, can amount to over 250 emails. Of course, if you get more than 20 emails and the number that you do not get to is greater than 1, that total can grow much more rapidly.

So how do you get a handle on the problem? You could simply delete anything older than 2 weeks, but that might delete important email messages that you really need to see. On the other hand, the email that you received yesterday and did not get to may be a total waste of time. Clearly the age of the email is not the only deciding factor. Maybe you choose to delete all emails from people outside of your department. Unfortunately, some of those emails from other departments may be more important than the email that circulated around asking people where they wanted to go to lunch on Friday. You could delete anything that comes from outside of the company. That would certainly help keep you focused on your work, but you would also miss notifications of appropriate training or white papers relevant to your job.

Thus you can see that cleaning out your inbox can be more complex than any simple rule or set of rules (although they may help). It is also important to perform that cleanup more frequently than once a year. Once a month may not be too often, and even once a week, perhaps on Friday afternoon as you are winding down for the week, might be a good choice.

In a similar way, if you have both an Internet and intranet portal (hey, even if you only have a collaboration site), periodic cleanup is still something you need to consider doing. If nothing else, cleaning out old obsolete content will make search run faster and return more relevant results. So here are a few tips that may help you perform your own portal cleanup.

10 Steps for site owners to consider when performing their next portal cleanup:

  1. Remove obsolete or unnecessary sites – Sites where all content pages/documents have not been updated for 2 or more years are candidates.
  2. Examine all pages for duplicate or obsolete content and update or remove – This could result in removing the page itself if all content on the page is obsolete and removed.
  3. Remove obsolete/duplicate documents/files – Multiple instances of files all get indexed and bloat the search results with many invalid entries that do not point to the most recent data. Delete obsolete/duplicate files.  Burn copies onto a DVD if you want to keep them.  Adding them to your intranet site or collaboration site is not a valid solution.
  4. Remove content that appears on other sites for your organization that you do not own – Copying/duplicating content that appears on other sites within your organization bloats search results and diminishes the relevance rating of the correct document if multiple occurrences exist.  Any content not ‘owned’ by the department should be removed and replaced with a link to the content on the ‘true’ owner’s site.
  5. Remove content found on sites outside your organization – Not only is this a potential copyright issue, but updates made to the content on the ‘true’ source will not be reflected in the copied content resulting in misinformation.  Just link to external content.
  6. Clean out your calendar/announcements – If your site has a calendar or an announcements list, clean out old events that are no longer relevant.  This will improve the performance of the calendar and/or announcement list.
  7. Consolidate sites – Sometimes subsites were created when all that was needed was another page on the site that owns the subsite (the parent site).  Unless the subsite requires a different set of permissions (owners, content managers, approvers, etc.) you may be able to simplify your site structure by moving content/pages/documents up a level.  This will also improve navigation and reduce the number of clicks to find the content you need.
  8. Remove content that really does not need to be public – For any content item (subsite/page/document) ask yourself if anyone in the public really needs to see this content on a public web site or whether it just clutters the public facing sites with content that no one really looks at.  Perhaps all you need is a ‘Contact Us’ link for anyone in the public to request additional information if necessary.  Some current public content probably should only be internal intranet content.  If so, move it there if it does not already exist and delete the public version.
  9. Do not duplicate content between the Internet and intranet – If the content needs to be seen by both the public and organization’s employees, place the content on the Internet and only add links to that content from the intranet. Don’t place the content on both and definitely don’t place the content only on the intranet.
  10. If content is not owned, remove it – If you have current Internet content that is not officially owned, consider removing it. Content that is not owned probably is not updated. If the content is necessary, an owner for the content must be identified.
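
To make step 1 more concrete, here is a minimal PowerShell sketch for an on-premises SharePoint farm that lists webs whose content has not changed in two or more years. This is my own illustration, not an official tool; run it on a SharePoint server with farm-admin rights (or from the SharePoint Management Shell, where the snap-in is already loaded).

    # Load the SharePoint snap-in if it is not already loaded
    Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

    $cutoff = (Get-Date).AddYears(-2)

    Get-SPSite -Limit All | ForEach-Object {
        foreach ($web in $_.AllWebs) {
            # SPWeb.LastItemModifiedDate tracks the most recent content change
            if ($web.LastItemModifiedDate -lt $cutoff) {
                [PSCustomObject]@{
                    Url          = $web.Url
                    LastModified = $web.LastItemModifiedDate
                }
            }
            $web.Dispose()   # SPWeb objects must be disposed explicitly
        }
        $_.Dispose()
    }

Of course, the output is only a candidate list; a human still has to decide what actually gets archived or deleted.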

Well, that’s it for this time. C’ya.

The Sky is Not Falling

This week an article was published on CIO.com by Jonathan Hassell entitled ‘7 Things CIOs Should Know About SharePoint Server 2016’. This article found its way around our company, circulated by several vendors hoping to convert our SharePoint portal over to their systems. They point to various statements about the new SharePoint as evidence that it is time to switch away from SharePoint now. I call this ‘The Sky is Falling’ syndrome. Vendors of third party products are using it specifically to freak out SharePoint users and get them to switch to their company’s product. However, the truth is rather different for existing SharePoint users. Here is my personal response to some of these statements. Don’t fall for ‘The Sky is Falling’ sales tactics.

  1. SharePoint Server 2016 might be the last version of SharePoint designed to run on premises.

    Jeff Teper took over SharePoint at Microsoft just last year and has been rebuilding the SharePoint team from the disaster it was in 2014 and early 2015. At the European SharePoint Conference and the Microsoft MVP conference in November of 2015, a very different picture of the future of SharePoint emerged. He has not said that Microsoft is abandoning SharePoint. However, the definition of ‘last version’ is muddy. I believe it is more like what Microsoft has said about Windows 10 being the last version of the OS for PCs. What does that really mean? It means that all future updates will be delivered through regular updates/patches/etc., as is already happening with Windows 10. In fact, the Release Candidate for SharePoint 2016 was released, I believe, as a patch to test this capability in SharePoint. It updated our Beta 2 installation without shutting SharePoint down or reinstalling everything. Last release? That depends on what you call a release. It may be the beginning of the end of release numbers as we once knew them; a very different release world is coming.

  2. There will no longer be a free version of SharePoint for smaller shops or department use.

    True, but that is because the free release was collaboration only, and that functionality has moved to SharePoint Online as part of select Office 365 subscriptions, where it is called Team sites. You can also apparently buy SharePoint Online as a standalone offering for $5.00 per user per month. Note that, like WSS 3.0, these sites are meant for internal collaboration, not for the creation of public facing web sites.

  3. Excel Services has vanished, and to get Excel functionality, you must use Office Online Server

    Again, this is a good choice for delivering the latest updates as soon as possible. Excel is an Office product and will be baked into the Office 365 subscriptions. Microsoft is also pushing a new product for data analysis called Power BI, which appears to be part of select Office 365 subscriptions, although you can get a desktop version for free. Gartner ranked Microsoft’s Power BI as most innovative, in the upper right quadrant of its recent Magic Quadrant for Business Intelligence and Analytics Platforms.

  4. Managing SharePoint from the command line is pretty much a PowerShell-only affair these days

    This was announced years ago as a goal for all Microsoft servers. There is better control over administration of Exchange and even standard Windows servers using PowerShell. Our team has been honing our PowerShell skills for at least the last 4-5 years because we knew this was coming. Also, the old STSADM command-line tool was awkward at best. For some time, you have been able to install Windows Server as a Server Core only version that skips the GUI to provide better performance and a smaller footprint, and which uses PowerShell for administering the server. This direction for servers is not new for Microsoft. They gave substantial warning that administrators would need to learn PowerShell. A few representative commands appear below.
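
    To give a flavor of what PowerShell-based SharePoint administration looks like, here is a minimal sketch of a few read-only commands from the on-premises SharePoint snap-in. It is just an illustration of the style, not a tutorial; run it on a SharePoint server with appropriate rights.

        # Load the SharePoint snap-in (already loaded in the Management Shell)
        Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

        (Get-SPFarm).BuildVersion          # farm build/patch level
        Get-SPWebApplication               # all web applications in the farm
        Get-SPContentDatabase | Select-Object Name, WebApplication, DiskSizeRequired
        Get-SPSolution                     # deployed farm solutions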

  5. The migration process to SharePoint 2016 will be a bit involved, depending upon from where you are starting.

    While it is true that, out of the box, migration from anything other than SharePoint 2013 is not directly possible, companies like ShareGate, Metalogix, and AvePoint, to name just a few, offer tools that support migration between SharePoint 2010, SharePoint 2013, SharePoint 2016, and SharePoint Online, so this is largely a non-issue.

  6. The Microsoft workflow and forms solution InfoPath is now moribund

    Microsoft has promised support for InfoPath until at least 2026. However, I do not expect Microsoft to completely eliminate support until they or a third party comes up with a viable alternative and a path to get there. Their attempt at a different forms solution in 2014-2015, Forms on SharePoint Lists (FoSL), failed and was pulled from SharePoint 2016 until a new solution is found. Unfortunately, they announced the end of InfoPath prematurely, without that replacement solidly in place. In the meantime, there are third party tools like K2 for SharePoint that provide a forms and workflow alternative if you feel you need to start converting to something else now.

  7. There is a big focus on hybrid connectivity in SharePoint Server 2016.

    Very true. The vast majority of companies that responded to one survey expressed an interest in a hybrid approach in the future. The bottom line today, according to a recent survey from Rencore, is that 49% of respondents use SharePoint Server on premises only, 24% use SharePoint Online only, and 25% use a hybrid configuration. Because 64% of the survey respondents also use Office 365 currently, I suspect that the hybrid percentage will increase in the near future. One factor that will keep on premises going for some time is the need to perform customizations, which are easier when you have local control over the server. Over the much longer term, will hybrid and online installations continue to grow at the expense of on premises only installations? Probably.

Finally, I want to address the removal of public facing SharePoint sites from the online version of SharePoint. While I have not heard anything definitive about the reasons for this decision, I suspect they are either technical (branding), licensing related, or both. At some point, Microsoft will either resolve these issues and add public facing SharePoint sites back, or they will create an entirely new product that allows for the creation and support of public facing sites hosted from their online Office offerings.

In the meantime, I will strongly resist the cry that ‘The Sky Is Falling!’ and focus on things that I can control.

C’ya next time.

Using Power BI On-line

During the last several months, I’ve taken you through an initial exploration of Power BI Desktop, a program that you can download directly from Microsoft. However, there is another version of Power BI, one that resides in the cloud. Many of its features are similar to Power BI Desktop, but there are also significant differences. Let’s begin the new year by taking a look.

First you have to sign up to use Power BI. Click on the following link to go to the self-service signup page:

https://powerbi.microsoft.com/en-us/documentation/powerbi-service-self-service-signup-for-power-bi/

You should see a link on the page that says ‘self-service signup process’, which takes you to:

https://powerbi.microsoft.com/en-us/

The screen (shown below) tells you that you can sign up and get started with Power BI for free. That is almost true, but there are some notable exceptions. For example, you cannot sign up with an email address provided by consumer email services or telecommunication providers; this means you cannot sign up with a Microsoft Hotmail account. Also, .gov and .mil addresses are not allowed at this time. That surprised me a bit, because government and military intelligence is something that we are in great need of. However, I guess if you want to call it BI, then strictly speaking it is business intelligence only. :)

Another reason you may not be able to sign up is that your IT department has disabled self-service signup for Power BI, even though your organization may have the Office 365 SKU that supports Power BI. Again, Microsoft provides instructions for your administrator on how to fix that, should they decide to let you use Power BI. Again, I can only assume you must work for an organization that has no interest in business intelligence if they tell you no.
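
For what it is worth, here is a hedged sketch of what that admin-side fix looks like, assuming the Azure Active Directory (MSOnline) PowerShell module. I believe AllowAdHocSubscriptions is the setting that controls self-service signup, but your administrator should confirm against Microsoft's current documentation.

    # Connect as a tenant administrator (prompts for credentials)
    Connect-MsolService

    # Check the current self-service signup setting (assumed property name)
    (Get-MsolCompanyInformation).AllowAdHocSubscriptions

    # Allow users to sign themselves up for offerings like Power BI
    Set-MsolCompanySettings -AllowAdHocSubscriptions $true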

Of course, Microsoft does provide a way for you to personally register for a new Office 365 trial subscription and use that ID to sign up to use Power BI.

You will get an email from PowerBI/Microsoft asking you to confirm your email address.

After you have successfully navigated all the steps to get a Power BI account, you should see an initial Power BI screen like the following:

If it is not already there, you can download the financial sample file from: http://go.microsoft.com/fwlink/?linkID=619356. I just don’t remember how I got it originally. To install the data after downloading it, I used the Get Data button at the bottom of the left navigation panel.
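
If you prefer to script the download, a short PowerShell sketch works as well. The local file name here is my own choice, and I am assuming the link still serves the same CSV file.

    # Download the financial sample file to the Downloads folder
    $url = "http://go.microsoft.com/fwlink/?linkID=619356"
    Invoke-WebRequest -Uri $url -OutFile "$env:USERPROFILE\Downloads\FinancialSample.csv"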

I can then click the Get button within the Files box to load the CSV file I just downloaded.

This will provide me with a flat file dataset to work with as a sample. The next figure shows an entry in both the Datasets group and the Dashboards group, with one entry in each named Financial Sample.

By opening the Financial Sample dataset, I can choose which fields I want to work with from the Fields list.

I can simply click the check box to the left of the field name or I can click on a field name and drag it over to the working area. For this demo, I will use Sales and Month Name.

While I do have the ability to customize some features, such as the formatting of the title text, and even to highlight one of the columns as shown below, I quickly realize that, starting from a raw data file like this in Power BI, I cannot define a custom sort order for the Month dimension to force the months to appear in chronological order. (In Power BI Desktop, you could work around this by adding a month number column and using Sort By Column, but more on that approach another time.)

Rather than being in chronological order, the month columns appear ordered from the largest to the smallest sales value. While that is a valid ordering, it is typically not what I would want to display.

I’m going to leave that problem as it is for now and attempt to click on the Dashboard entry for Financial Sample. I am told that I have a report with unsaved changes. When did I work on a report, since I never clicked inside the Reports group? Well, it appears that when I define visualizations in the dataset area, I am actually creating a report and must save it if I want to keep it.

When I click Save, Power BI prompts for a report name. For now, I am just going to call it: TestReport.

With the report saved, I now can see the Financial Sample default dashboard.

The first thing I notice at the top of the page is a text box asking me to enter a question about the data. This is something new that we have not seen in Power BI Desktop. The interesting thing is that I can enter a question (or even a statement, since the syntax of a question matters less than the words used in the sentence) that references one or more of the data fields in my dataset. In this case, I can ask a question like: What are the total sales? (Interestingly, I can just type: Total sales and get the same result.) As I type enough of the question for Power BI to guess what I want, it starts displaying results.

I can continue to drill down to get sales by product by just adding the words ‘by product’ to my question/statement from the above example. Note that Power BI automatically switches from a simple text answer for a single value to a bar chart, with the bars representing my product dimension.

I can just as easily request to see total sales by month. Note in this image, the months are displayed in the correct chronological order even though the report I created earlier did not show them in that order by default.

I can even display sales by country as shown below.

By simply changing the visualization, however, I can create a map of sales by country which might be a better way to present geographic data to management.

So I’m going to stop here in this week’s tour of Power BI. I’ve discussed some of the issues you may need to deal with to get access to Power BI and have shown how to load and use a simple flat data file in the form of a CSV file. In future weeks, I will explore more of the features of Power BI and especially how you can use Power BI Desktop or even Excel to create the datasets you need to display in Power BI.

One last point that I will cover in more detail at a later time, but which you might want to be aware of for now. One of the advantages of using Power BI to display your data analysis is that you can save your dashboards and make them available to others within your organization as a simple download. Yes, with Power BI Desktop you can save your model and copy the file from your machine to others. However, if you make a change, you must redistribute the changed file each time. Also, refreshing the data is a manual process at this time when using Power BI Desktop. We will see in future blogs how these issues are addressed by Power BI to make your life simpler.
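
As a teaser of that programmability, the Power BI service also exposes a REST API. Here is a heavily hedged sketch that lists the dashboards you can see. The endpoint comes from Microsoft's REST API documentation, and the access token is a placeholder, since acquiring an Azure AD OAuth token is beyond the scope of this post.

    # Placeholder: obtain an Azure AD OAuth access token for Power BI first
    $accessToken = "<your-azure-ad-access-token>"
    $headers = @{ Authorization = "Bearer $accessToken" }

    # List dashboards visible to the signed-in user
    $result = Invoke-RestMethod -Uri "https://api.powerbi.com/v1.0/myorg/dashboards" -Headers $headers -Method Get
    $result.value | Select-Object id, displayName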

C’ya next time, when I’ll take our Contoso data and show how easy it is to use it from a OneDrive file.