Last time we loaded a BI model with data from several different data sources, but the data sources were carefully selected so that the columns that would relate one table to another table would already have the same name and data type before attempting to upload the data to Power BI Desktop. Because of this assumption, I was able to let Power BI auto detect the relations between the tables. Unfortunately, in the real world, that does not always happen. So this time I will show you how to create relationships manually and how you may need to massage the data a little first before defining those relationships.
I am going to use the Contoso data again and will begin with the following tables already loaded into the model from my SQL Server instance: FactSales, DimChannel, DimDate, DimProduct, and DimProductSubCategory. To complete my model, I need the Product Category data which I am going to load from an Access table in this case.
Since we covered how to load data from SQL Server last time, I am going to pick this discussion up after I have loaded the SQL data and am about to load the Product Category data. The image below shows that I have already click the Get Data button and have selected Access database from the list of possible data sources and have defined where my Access database could be found. As you can see in this figure, Power BI only finds a single table in my Access database named DimProductCategory. When I select this table, I see a preview of the data.
In the table preview (on the right), I see that there are only two columns named ProductCategoryLabel and ProductCategoryName. You might be able to guess from the way the data in the column is formatted that the ProductCategoryLabel data has been defined as text, not numbers because it is left justified.
After clicking the Load button to import this data into my data model, I switch to the relationships view and see the table diagram shown below.
I see that there is no relationship defined with the DimProductCategory table because there are no connecting lines leading into or from this table. Therefore, I first want to select the Manage Relationships button. I can see the four relationships that tie together the other five tables, but I am missing a relationship. I may first try to click the auto detect button which I talked about last time to see if Power BI can find the missing relationship
Unfortunately, Power BI very quickly responds back that it cannot. The problem is that while the DimProductSubcategory table has a field named ProductSubCategoryKey, the new table, DimProductCategory, does not have a field with that name.
So I might try to create the relationship manually by clicking the New button on the Manage Relationships screen (after closing the Autodetect message screen of course).
On this screen I have to specify the names of the tables and the linking columns to define the relationship. I can begin with either table on the top. After I select the table, I see a grid of the available fields along with a couple of rows of data. To select the field that I want to use in the relationship, I merely need to click the column header to select the field.
I then specify the name of the linking table along with selecting the column that I want to link to.
There are some advanced options which I am going to skip over for now and simply click the OK button. Power BI then attempts to create the relationship. But wait a minute, you might say, the column names are different and the data types of these two fields are different. That is true. I am not surprised that I can link two tables on fields with a different name, but different data types? In fact, I expected Power BI to reject this relationship because of the different data types, but it did not. It created the relationship as shown in the following image
To find out if this relationship is really valid, I next go to the report desktop and select the ProductCatgoryName field from the DimProductCategory table and the ProductSubcategoryName field from the DimProductSubcategory table. You can see from the figure below that the data makes sense. The general category Audio should include subcategories like Headphones, Radios, and Speakers, but not Camcorders, Cameras, or Cell Phones. Somehow, Power BI was able to transform automatically one of the data types to the other (I am thinking it converted the string field to an integer) and then formed the relationship.
Yes, I could prove that this relationship is correct another way by creating a simple table that lists ProductCategoryLabel from the DimProductCategory table and ProductCategoryKey from the DimProductSubcategory table. You can clearly see that the relationship links these two tables correctly.
Surprised by this, I wanted to try something else that may not be as easy for Power BI to automatically convert. I took another table, the Stores table, and modified the Excel version of the table to change StoresKey to prefix the store number with the letter ‘S’. I then loaded that table into my model as shown below.
Next I went to the Manage Relationships dialog as before and attempted to add a relationship between the FactSales table and the Stores table as shown in the following figure.
Again Power BI did not complain about creating the relationship. However, when I went to the reports page this time and attempted to display a report of sales amount (from FactSales) against the StoreID from the Stores table, I got the result shown below which indicates a total sales for each year, but no StoreID value at all. (I used the Matrix visualization here with StoreID for the rows and YearLabel for the columns and SalesAmount for the value.) So clearly this relationship did not work.
Therefore, I need to edit one table or the other to ‘fix’ this relationship. I choose to edit the Stores table. Remember from last time, to edit a table, I click the Edit Queries button to open a new window where I can edit the tables.
In this case, the solution is to remove the ‘S’ from the front of each of the StoreID values. I can do this with the Split Column transformation and specifically to split the column based on the number of characters from the left rather than splitting the column based on any specific character or character string like I did in an earlier blog example.
This action results in two columns, the first, StoreID.1, contains only the first character of the StoreID field, the ‘S’. The second field, StoreID.2, contains the numeric portion of the store ID. As you can see in the formatting of this column in the following figure, Power BI also treats StoreID.2 as a numeric value.
I then removed the StoreID.1 column as something I will not need. I also renamed StoreID.2 to just StoreID.
I then clicked Close and Apply.
Now I can create a new relationship between FactSales and Stores using StoreKey and StoreID respectively as shown below. Note that I did not change the field names to match. If I had, I could probably use the auto-detect feature to find and create the relationship for me.
This time when I attempt to create the same report on sales by store and year, I get reasonable results as shown in the table below.
Well, I hope you found that interesting. Next time I plan to probe a little deeper on creating calculated columns within a query.
Till next time, c’ya!