Being able to mash together data from multiple sources to create meaning full analysis in Power BI or even PowerPivot can be lots of fun, at least to my data geek friends, but presenting visualizations of that data to management without the ability to customize the look would fall flat on its face at your next executive presentation. Therefore, I am taking a side trip this week to show you a little about formatting your visualization, enough to get you started exploring how to make individual visualizations look good.
I also want to introduce you to a set of simple data table that you can use for free while learning how to use Power BI. A team from the Paris Technical college put together some data and perform some basic clean-up of the data. The team includes: Petra Isenberg, Pierre Dragicevic and Yvonne Jansen and the data can be found at: http://www.perso.telecom-paristech.fr/~eagan/class/as2013/inf229/labs/datasets.
For today’s blog, I am going to use their CARS dataset. While it has over 400 cars with 8 characteristics, it does not have the new Tesla X model L, but it is a good dataset to show some basic visualization formatting technics.
Thus I start by importing the CARS.CSV file by using the Get Data option.
After loading the dataset (shown below), you can see a preview of the data in which the first two data rows are not cars. In fact, the first row looks like it could serve as column headers or in terms of our data set, column names. While the second row looks like it defines the preferred data type for that column. To begin loading the data, click the Load button in the bottom right of the screen.
After loading the data, the first thing I need to do is to make some quick changes to the dataset by clicking the Edit Queries button.
In the Transform group of the Home ribbon of the Query Editor, I can click on the option: Use First Row As Headers. This is similar to the PowerPivot load technique in which you could define whether your data included headers in the first row.
Next I want to get rid of the row with the data type (which is now the first row). But before going further, I take note of the data type of each column. I will need that information in just a moment. In the Reduce Rows option group, there is an option to Remove Rows.
Click the bottom half of the Remove Rows button. This action opens a dropdown letting you specify how you want to remove rows. I am going with Remove Top Rows because the data type row is now the first row in the dataset. This option now opens a dialog that lets me specify how many rows I want to remove from the top of the dataset (Do you see why I had to deal with the header row first?). In this case, I only need to remove 1 row, but you may encounter other datasets that require you to remove more than a single row.
Next I noticed the car column appears to be a concatenation of the car make and model.
Looking through the data, the car model appears to always be a single word. Therefore, there is an easy way to split this column by click on the Split Column button and then select By Delimiter.
The obvious delimiter here is the left-most space character.
After splitting this column and renaming the columns as Make and Model, my data table looks like the following image.
Next, I noticed that all the numeric data was loaded as text. This is a direct result of reading data from the CSV file. Remember the second data row which told us the preferred data type? We can use that information to change the data type (in the Formatting group of the Modeling ribbon) of each column as shown in the following image.
I also want to change the year column to a four digit year. Because this data does not include cars from the 21st century, I can get away with simply concatenating all the values in the YEAR column with a prefix of ’19’.
After completing all my data preparation, I can click on Close and Apply so I can begin creating reports (visualizations) with this data.
Let’s begin with a simple table of average MPG for each year. On a new report page, I can drag the columns MPG and YEAR from my list of columns for the table. Note that by default, numeric values are summed. To change MPG to calculate the average MPG, open the dropdown for MPG in the Values section of the Visualizations column on the right and click on Average. If YEAR is being treated as a number, you can select ‘Do Not Summarize‘ from its dropdown. My table now looks like the following image.
I would rather see the year first, then the MPG so I need to change the order of the columns. You can do this by clicking on and dragging the field name in the Values section of the Visualizations (where I changed the aggregation types) and dragging the field up or down as appropriate to place the columns in the order I want.
While this table is interesting and shows a general increase in the average MPG of cars in the dataset, it is not as clear to see how the average rises and falls from one year to the next. Therefore, I might want to change to a column report.
While this is a nice basic chart, it needs labels to explain to someone viewing it for the first time what they are seeing. Let’s begin by clicking on the Format icon just below the Visualization selection grid.
Now instead of seeing the Values and Filters groups under the Visualizations, we see various chart components. Each of these components have a dropdown arrow on their left which will open that section to expose additional properties for that chart element. They also include a simple sliding switch to turn that element on or off within the chart. Note that currently the X-Axis and Y-Axis are turned on, but Data Labels, Title, and Background are not.
If we open the properties for the X-Axis, we see formatting options for this chart element. Although not needed here, I can change the axis type, the axis scale (linear or logarithmic), a start and end value for the axis, and a Title, Style, and Color. For the X-Axis, I only want to turn the Title property on by moving its slider to the On position.
Next I open the Y-Axis properties. I change my axis scale to begin at 10 and default to automatic for the end value. Often when data values are very similar, plotting their values with a y-axis that starts at zero de-emphasizes the variations from one value to the next. Setting the starting point of the y-axis to a value closer to the smallest y-axis value will emphasize those differences. Think of a stock that trades in the lower $200s range with a daily fluctuation of a single dollar or less. Those fluctuations are masked if the chart has a y-axis data range of 0 to $250. On the other hand, changing the y-axis to begin span only the range 200 to 250 emphasizes the daily fluctuations.
As you can see below, my chart is starting to look a little better.
Next I open the data colors group. I can change the default color which is an aqua shade to any of the other colors by selecting either from the Theme Colors provided in the Default Color dropdown by going to the Custom Color option and defining my own unique color for the chart.
However, this changes the color of all the columns to that one custom color. What if I want to display different columns using different colors? If I slide the Show All slider to On, the property dialog now shows a separate color selection for each column in my chart as shown below.
Did you notice that the years are not displayed in chronological order? Did you wonder why? The years are actually displayed in order by the size of the Average MPG so that you could, if you want, define a progressive shift in color shade as the value of Average MPG increases. That being said, I decided to customize the colors by year as shown in the following image.
Finally I went back to turn on the Y-Axis title to arrive at the visualization shown below.
To add detail to the chart, I also turned on the Data Labels group which causes the data value to display at the top of each bar.
Now my chart shows both the actual data values as well as the relative change from year to year.
Finally, I turn the title element on. Here, unlike the x-axis or y-axis, I can specify the text to display as the title. For the x-axis and y-axis labels, the text is taken directly from the column names. I can also specify the font color, a background color and the alignment of the title across the top of the chart.
My chart is starting to look pretty good but it is still missing something.
I open the last chart element group, Background, and turn on the background giving it a custom color.
Now I have a final version of the chart that I might want to publish and add to a dashboard (which I will talk about in a future blog.)
Okay, while I did not show everything that formatting can do, I did want to give you a start. Please note that if you use other visualizations, they have their own formatting options as well which work the same or similar to these. Have fun playing with visualizations until next time.