Last week I looked at how I can load an entire library of source data files into PowerBI and let PowerBI determine how to append the resulting files together. This method works great if all the data source files are of the same type and have the same schema/structure. But what if some of the files come from an Access data source, some from CSV files, some from Excel, etc.? Furthermore, what if the column order or the column headers differ between files? These are cases in which you may need to import each data source individually and then edit the query for each source to align the column order, column names, data types, etc. before combining the data sets together. Let’s see how that looks.
For simplicity's sake, I am going to start from the same folder as last week that holds 10 CSV files, each representing a different sales region.
Although I am not going to show images from each data load, I would proceed to load each CSV file into my PowerBI data model one file at a time rather than loading the folder like last time. This method creates a separate table within PowerBI for each data set as shown in the following figure.
I can view the data in each table by selecting the table name in the Fields panel as shown above. The first thing I would want to do is to ‘fix’ any inconsistencies in the data schema in these tables by selecting the table and then clicking on the Edit Queries button in the External Data group of the Home ribbon. I basically want to build a common table structure across all of the data sources, including a consistent column name for each column that appears in multiple tables.
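To make the idea of a common table structure a little more concrete, here is a small Python sketch of what that clean-up amounts to. This is not what PowerBI runs internally; the column names (Territory, Sales Amt) and the align_schema helper are made up purely for illustration.

```python
def align_schema(rows, renames, types):
    """Rename columns to the shared names and convert values to shared types."""
    aligned = []
    for row in rows:
        new_row = {}
        for name, value in row.items():
            target = renames.get(name, name)   # keep the name if no rename is listed
            convert = types.get(target, str)   # default: leave the value as text
            new_row[target] = convert(value)
        aligned.append(new_row)
    return aligned


# Hypothetical source whose headers differ from the other tables.
raw = [{"Territory": "Canada", "Sales Amt": "80.50"}]

aligned = align_schema(
    raw,
    renames={"Territory": "Region", "Sales Amt": "SalesAmount"},
    types={"SalesAmount": float},
)
```

After this step the hypothetical table uses the same Region and SalesAmount names as the others, with the sales value stored as a number rather than text.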
You should note that if I have a data source with a unique column that does not appear in the other tables, I can keep it. Later when I append the individual tables into a single table, the unique column will be brought into the final data model, but the field will be blank for all the other tables that do not have that column.
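If it helps to picture that behavior, the following Python sketch mimics it with plain lists of dictionaries. It is only an illustration under my own assumptions: the append_tables helper, the Currency column, and the sample values are all invented, and None stands in for the blank cells.

```python
def append_tables(*tables):
    """Stack rows from several tables, keeping the union of their columns."""
    columns = []
    for table in tables:
        for row in table:
            for name in row:
                if name not in columns:
                    columns.append(name)   # first-seen order
    # Columns a table does not have are filled with None (a blank).
    rows = [
        {name: row.get(name) for name in columns}
        for table in tables
        for row in table
    ]
    return columns, rows


# Hypothetical tables: only the second one has a Currency column.
northwest = [{"Region": "Northwest", "SalesAmount": 100.0}]
canada = [{"Region": "Canada", "SalesAmount": 80.0, "Currency": "CAD"}]

columns, rows = append_tables(northwest, canada)
```

The combined result carries the Currency column, but the Northwest row has no value in it.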
In my first example here, all of my data schemas from each data source are the same but they would not have to be. So after simply loading the data from each data source, I am ready to start combining the data into a single table for analysis. To do this, I click the Edit Queries button in the External Data group of the Home ribbon and then select one table to start the process.
While in Edit Query mode, the Combine group on the far right of the Home tab contains a button called Append Queries. I can use this button to begin the process of combining the tables.
The Append Queries button opens the dialog shown below that lets me select which table to append. I can only append one table at a time so to append together all 10 tables, I need to do this step 9 times.
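The arithmetic is simple, but a quick Python sketch shows why 10 tables need 9 append steps. The region names and sales values here are placeholders, not my real data, and the append function is just list concatenation standing in for the dialog.

```python
def append(left, right):
    """Return a new table with the rows of right added after the rows of left."""
    return left + right


# Ten hypothetical one-row tables, one per sales region.
regions = {
    f"Region{i}": [{"Region": f"Region{i}", "SalesAmount": float(i)}]
    for i in range(1, 11)
}

names = list(regions)
combined = list(regions[names[0]])   # start from a copy of the first table
steps = 0
for name in names[1:]:
    combined = append(combined, regions[name])   # one append per remaining table
    steps += 1
```

Starting from a copy of the first table also leaves that original table untouched, so it still holds only its own rows after all the appends.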
As shown in the figure below, each appended query gets the generic name of: Appended Query followed by a number (after the first one which has no number).
This default name is not descriptive enough to help me identify which appended query refers to which data set. If I want to remove one of the data sets from the final table, I would have to click the settings button on the far right of the applied step (the gear icon) to reopen the Append dialog to see which table is being appended. Then if I want to remove that dataset, I could click the ‘X’ to the left of the applied step to delete that one step.
However, a better option is to right-click on the default applied step name and select Rename from the popup menu that appears.
This option allows me to select the current applied step name and replace it with a more meaningful name.
The following figure shows a much clearer picture of which data set is being referenced in each step. It also shows that I have finished appending my 9 additional data sets.
Note that any steps applied to the individual data sets are still applied first prior to the data being appended to the final dataset.
Note: One thing that I did not do here, but probably should have, is to start by duplicating the table that I used as the base for the appends. That would preserve the original table with only its own custom transforms, and I could then append the rest of the data sets to the duplicate.
When all the data is appended into a single dataset, I can close and apply this transformation so that any data refreshes can repeat these steps.
I can now go to my Report tab and start to build the visualizations that I want. In the final figure for this session shown below, I create a table of Sales Amount by Sales Territory and include the Sales Territory Group, Region, and Country. I then click on the Sales Amount column to sort the table by this column in ascending order. You can see very quickly that most of the sales occur in Australia and the lowest sales are in the Central Region of the United States.
Beneath the table, I create another table with the Sales Territory Group and the Sales Amount. PowerBI automatically sums the sales for each Territory Group, so only three groups remain. After creating the table, I change the visualization to the Donut chart, which shows one segment per group, to create the appearance shown below.
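The automatic summing is just a group-and-total operation. Here is a minimal Python sketch of it; the rows, the amounts, and the Europe group name are invented for the example and are not the figures from my report.

```python
from collections import defaultdict


def sum_by(rows, key, value):
    """Total the value column for each distinct entry in the key column."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


# Hypothetical rows from the appended table.
sales = [
    {"Group": "North America", "Country": "United States", "SalesAmount": 120.0},
    {"Group": "North America", "Country": "Canada", "SalesAmount": 80.0},
    {"Group": "Pacific", "Country": "Australia", "SalesAmount": 200.0},
    {"Group": "Europe", "Country": "France", "SalesAmount": 50.0},
]

totals = sum_by(sales, "Group", "SalesAmount")
```

Each key in totals corresponds to one segment of the donut chart.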
As with all PowerBI visualizations, the data in the table is linked to the data in the donut chart. If I click on the North America segment of the donut, the table on top refreshes to display only the 6 rows representing sales in North America. Similarly, if I click on the Pacific segment of the donut chart, the table above immediately updates and displays only the one line for Australia.
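Conceptually, clicking a segment acts like a filter on the detail table. The Python sketch below illustrates that idea only; it says nothing about how PowerBI implements the linking, and the rows and the filter_by_group helper are made up.

```python
def filter_by_group(rows, selected_group):
    """Keep only the rows whose Group matches the clicked donut segment."""
    return [row for row in rows if row["Group"] == selected_group]


# Hypothetical detail rows behind the top table.
detail = [
    {"Group": "North America", "Country": "United States", "Region": "Central"},
    {"Group": "North America", "Country": "Canada", "Region": "Canada"},
    {"Group": "Pacific", "Country": "Australia", "Region": "Australia"},
]

pacific_rows = filter_by_group(detail, "Pacific")
north_america_rows = filter_by_group(detail, "North America")
```

Selecting Pacific leaves the single Australia row in this sample, while North America keeps every row tagged with that group.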
That’s it for this week. Come back next week for more PowerBI fun.
C’ya.