Can You FILTER() That Down For Me

The last several weeks I have been looking at how PowerPivot in Excel works with Row Context and Filtered Context. I showed that most column expressions use a row context while measures use a filtered context although I could add and remove filters using certain expressions that allowed me to define a filter as a Boolean expression in one of the parameters. Last time we even looked at how to remove the filters by using the ALL() function. This time I will explore the FILTER() function which allows me to define a permanent filter condition to a measure no matter what dimensions or slicers the user chooses for the pivot table. In fact, in the case I am going to show you today, I need to do this because I need one measure to use all filters defined by the dimensions in the pivot table, and I need another measure to obey those filters plus one more.

Again I will use my Contoso data model that I’ve been using for all the examples in this set. I want to look at the number of orders that have returns and compare that to the total number of orders. I initially will want to show this information by sales channel and year/month. However, once I have my pivot table defined, I could of course change the dimensions I want to explore.

Let’s begin with a basic Sales pivot table as shown below.

I built this table using my basic data model with no additional calculated columns or measures except the calculated column in the date dimension that I use to order the name of the months correctly. I can use any of the columns in the FactSales table as my value field as long as I change the aggregate function from SUM to COUNT. By default, Pivot tables assume that numeric fields are summed and non-numeric fields are counted. But as long as I change the aggregate function for numeric fields to COUNT, I will get my desired results. I also modified the formatting to get rid of any decimal places and to add a thousands separator. Other than that, I did nothing special to build this table.

However, now I am going to return to the FactSales table and add a simple measure to count the total number of sales. The expression I will use is shown in the following figure.

I use the COUNT() function which has a single parameter, the name of the column I want to count. Again I could choose any column, but I chose the column [ReturnQuantity]. I will come back to format this measure in a moment, but you can see that the count is a little over two and a quarter million sales records. In fact, I know that this is correct by simply looking at the number of rows in my FactSales table.

Next, I want to count the number of sales records that have returns. This I can do by comparing either the [ReturnQuantity] column or the [ReturnAmount] column to 0. Only sales records which have values greater than 0 for these two columns represent orders which had returns. How can I do this?

One way I could do this is to use the SUMX() functionwith a second measure named [ReturnCount2]. This function has two parameters. The first parameter must be a table and second parameter is an expression of what I want to count. So I might think that I could do something like the following expression:

ReturnCount2:=SUMX(FactSales,IF(FactSales[ReturnAmount]>0, 1, 0))

The theory is that I want to compare the column [ReturnAmount] to 0 and if it is greater than zero to add one to my ReturnCount2 value. I cannot simply sum the [ReturnAmount] because this column represents the dollar value of the return. Nor can I use [ReturnQuantity] because the buyer may have returned more than one of the item from the order and summing the quantity would over count the total number of orders with returns.

I could also use the COUNTX() function. However, if I simply replace SUMX() with COUNTX(), I will get the total number of orders in the FactSales table because COUNTX() will could all non-blank rows. But I can trick the IF() into returning a blank for orders without returns by using the following expression:

ReturnCount2:=COUNTX(FactSales,IF(FactSales[ReturnAmount]>0, 1, BLANK()))

But both of these solutions used the entire FactSales table. There is one other way I want to show you today. I can use the FILTER() function to apply a filter to the FactSales table to return a subtable that only has rows with returns by using the following expression to return a table

FILTER(FactSales, FactSales[ReturnAmount]>0.0)

I can now replace the first parameter in COUNTX() with this FILTER() result which is a filtered table. I can then use any column in FactSales as the column I want to count. Well, almost any column. Actually, I cannot reuse the [ReturnAmount] column which is used in the FILTER() expression because this confuses DAX, but as I said before, I can count on any column in the table. Therefore, my [ReturnCount2] measure expression is shown below.

In this image you can see that I already formatted my measures as numbers without decimals but with thousands separators. Why do I format the numbers here? Simply because it saves time from having to format the numbers in each pivot table in which I use the measures. If I display these two measures in my pivot table side by side, I can see the total number of order by channel in each month along with the number of orders that had returns.

Suppose I wanted to show this information to management and rather than look at the raw counts which could take a bit of time to interpret, I decide to calculate the percent of orders that have returns. I can create a third measure as shown in the following figure that uses the results of the first two measures. I can then format this measure as a percentage prior to using it in my pivot table.

Returning to my pivot table, I remove the counts which I no longer need to display and replace them with the [Percent_Returns] so that management can quickly see that Catalog sales result in the most returns and Store sales in the least returns. Returns do not vary greatly by month, something that I will leave up to you to explore with a Pivot Chart.

Well, I hoped you learned some new ways to apply different filters in your measures from this discussion. C’ya next time.

I Want It ALL()

This week I am going to reverse direction on applying filters to my pivot table and show you how and why you might want to remove all filters instead of adding filters to an expression.

Again working with the Contoso dataset, I am going to start by looking at sales by product category. In addition, I want to be able to slice my data by channel or combinations of different channels. To do this, I want to use the visually friendly slicer tool as shown in the following figure. Note that in this case, I already have selected only the Online channel which results in a total of $2.6776 billion in sales. Keep in mind that the total of all sales across all channels is a little more than $12.4 billion.

Now for each product category, I want to see the total sales in that channel compared to the total sales across all product categories and all channels. If I refer back to my earlier blog on the different ways pivot tables can represent data by using built-in features, I might try looking there first to see if there is a fast way to accomplish my task. By right clicking on any of the rows in the Sales Total column and selecting the option: Show Values As, I can pick from a dropdown menu of different built-in calculations. Many of these options calculate percentages of row, column or grand totals of either the entire pivot table or a group level. These options also provide difference and running total calculations.

For example, if I were to select % of Grand Total, I would get some interesting percentages. However, these values would be based on the total sales of the slicer filter, in other words, the total sales for online sales as shown in the following figure.

So let’s play a little with a different pivot table that shows total sales by each of the channels. In the figure below, I’ve included the channel as my row filter and a have two columns which both show the total sales amount. Note that each row of the total sales amount is filtered by the channel. This is an example of filtered context when calculating a measure.

Now I’ve labeled the first of the two column: Total Sales Amount and will therefore let the pivot table display the sum of the measure filtered by the channel.

However, I’ve labeled the second column: % Dales by Channel. I can right click on any of the values in this column to select one of the other built-in calculations. In this case, calculating a percent of the grand total will show me the percent of sales that come from each of the four channels as shown below. The value in the Grand Total row displays 100% because all sales are represented by one of the four channels. Note here that it is clear that online sales account for only 21.57% of the total sales. I can use this information to validate what I’m about to do in the next step.

I am going to create a new measure named: Percent_of_Total_Sales. To generate a value for this measure, I want to sum the column FactSales[SalesAmount] for the filtered context of each place this measure appears in my pivot table. However, to get a percentage of total sales, I need to calculate the sum of FactSales[SalesAmount] for all sales, not just sales for a channel or product category, or any other filter criteria. In effect, I want to calculate the total sales as if there was no filter context in the pivot table at all. I can do this by creating a ‘new’ table for the SUMX() function (remember SUMX() has two parameters, the first of which must reference a table of values). There is another function I must use to eliminate the filter context for this ‘new’ table. This function is aptly named: ALL(). When I use ALL(FactSales) (and yes, the parameter for the ALL function must be the name of the table and it returns a table with all filters removed), I can get a ‘copy’ of the FactSales table without applying the filter context of the pivot table. In other words, all of the records in the original FactSales table will be included in ALL(FactSales). If I use this ‘new’ table as the data source for my SUMX() function and then simply sum the Sales Amount column using the SUM() function as shown below, I can return the total sales of the unfiltered FactSales table which then can be used as my denominator in my calculation. The numerator is a SUM() function of the Sales Amount also, but is calculated on the filtered context which in my case is filtered for online sales and product category.

Initially the measure returns a value of 1 because in the data model there is no filter context so the sum of the ‘filtered’ sales amount total divided by the ‘unfiltered’ sales amount total will be equal to 1. Rather than go directly to the pivot table, let’s first format this value as a percent by right clicking on the measure definition cell and selecting Format.

This option displays a dialog that lets me select the data category which is: Number. I then select the format of the number as Percentage with 2 decimal places. I then click OK to accept the format for the measure.

My measure calculation in the data model now displays a value of 100.00%. However, if I place this measure in my pivot table that displays sales by product category and uses a slicer to include on online sales, I can see my sales percentages as shown below. These values are now correctly dividing the product category sales for online sales by the total sales of my company. I can feel confident that the calculation is correct because the percent in the Grand Total line which represents sales from all product categories made through online sales is 21.57% which is the same percent I calculated in the pivot table earlier that only looked at sales by channel.

Using the channel slicer, I can select store sales instead of online sales. My previous pivot table told me that this should be 55.93% of the total sales. As you can see in the following figure, the Grand Total of my pivot table that displays sales by product category has percentages that also (accounting for rounding) add up to 55.93% since this pivot table uses the channel slicer.

In case any reader is wondering, the second pivot table that displays sales by channel, while on the same spreadsheet as the first table that displays sales by category does not use the slicer. If you have a slicer and multiple pivot tables or pivot charts, you must associate the slicer with each table and/or chart to which you want it to apply. It is not automatic nor implemented by spreadsheet page.

I hope you are starting to get a better feel for how row context and filter context work together with DAX expressions and functions to calculate values. Being able to correctly apply and remove filter contexts is essential in calculating values in many pivot table tables.

C’ya next time.

Getting the Right Context – Part 2

Last time, I introduced the concept of context within DAX expressions using by PowerPivot to calculate columns and measures. We saw that the default context for column calculations was row context while the default context for measures was filtered context. However, I ended the discussion by showing that within a measure calculation, I could use column context with certain aggregate functions like SUMX which can be used to apply an expression across all the rows of a table. However, before that expression is evaluated, that table is automatically filtered since I used it in a measure which begins by applying the filtered context of the pivot table to the rows used by the expression.

So this time, let’s dive a little deeper. Let me begin by going to the FactSales table of my Contoso data model and calculate a measure for total sales. This is not difficult and can be achieved by using the SUM() function as shown in the following figure.

Now let’s assume that I want a measure that shows only the sales made through the store channel. At first, I might try to use the SUMX() function along with the FILTER() function. The FILTER() function also has two parameters. The first parameter is the name of the table that I want to apply a filter on. The second parameter is a filter expression. I may at first assume that because I created a relationship between the FactSales table and the DimChannel table, that I can simply reference the column [ChannelName] and compare it to the string “Store” to filter the FactSales table. However, as you can see in the following figure, this expression would result in an error.

The reason for the error is that while the FILTER() function references the table FactSales, there is no context to link records in FactSales to DimChannel. I know you might ask, “Doesn’t the relation between these two tables define that context?” The answer is that the relationship between tables while defining the ‘mechanism’ of how to connect the two tables, it does not activate a context between the rows in FactSales with a row in DimChannel. When pointing from the many side of a relationship to the one side of the relationship, we must use the RELATED() function to activate the context within the expression. I show this in the following figure.

You can see that now I have a total sales for the just the stores as a measure. If I use this measure in a pivot table that displays sales by month and by product category, I will have the additional filter of sales by store in each of the pivot table cells.

What if, however, I wanted to create a calculated column in the table DimChannel that displayed the total sales for that channel? Again you might start with the SUMX() function because you want to calculate an expression from another table. In this case, the first parameter of the SUMX() function would be the FactSales table and the column that we want to sum would be the [SalesAmount] column. However, if we were to create this column, we might be surprised by the result, shown in the following figure.

All of the values in the column are exactly the same. Furthermore, if I refer to the image earlier in this blog for the total sales across all channels, I would see that the value displayed here in each cell of the column is actually the total sales. Again the problem is context. There is no context to refer back to FactSales from DimChannel. Therefore, when SUMX() evaluates the [SalesAmount] in table FactSales, it pulls values from all the sales records, not just the sales from the channel represented by the current row in DimChannel.

In this case, because I am going from the one side of the DimChannel relationship to the many side of FactSales, I need to return a table that contains just the rows from FactSales that represent sales from the channel in the context of the current row. I can do this by using the RELATEDTABLE() function which uses a single parameter, the name of the table on the many side of the relationship. I must also have the relationship explicitly defined between DimChannel and FactSales. I have already done this. So Power Pivot can use the relationship to create a subset of rows from FactSales for the current channel. I can then use this resulting table in the SUMX() function to sum the [SalesAmount] column as shown in the following figure.

As you can see in the figure, the first row which represents the store channel displays the same sales total as we calculated from the measure in FactSales earlier.

So again, you can see that there are many different ways to define the context of an expression. When dealing with multiple tables, it is important to understand whether you can perform a row context calculation by using the RELATED() function to extend the row context to the related table on the lookup side of the relationship or whether you need to use the RELATEDTABLE() function to filter the rows used in an aggregate function like SUMX() to calculate values for a column which uses the row context to define the link to the many side table.

Next time, I’ll look at some functions that let you turn off a filtered context and show where you might use it.

C’ya next time.

Can You Put That Into Context For Me?

Over the past several years, I’ve conducted many sessions on how to use DAX in PowerPivot to create columns and measures. Throughout this time, the one thing that has confused many people with whom I have talked with is the concept of context for an expression. Many do not understand why some equations appear to work across for defining new columns while other equations only appear to work when calculating measures within a pivot table. This week, and perhaps for another week or two, I am going to take a shot at explaining the concept of context. Until you get a good understanding of context, many of the DAX expressions for columns and measures may appear to be magic that somehow get the correct answer but you may just not understand why.

Let me start by defining the two major context types that DAX expressions have to deal with:

  1. Row Context – This is probably the easiest context to understand. When you define a formula to calculate the value for a new column, the formula is applied to each row individually. In fact, the value calculated can depend only on the value of other columns in the current row. This is much like the way you calculate a new column value in Excel. However, unlike Excel, it is more restrictive. Using Row Context, it is not possible to directly access the values in a prior or successive row because those terms do not really have meaning within a table in the data model of a pivot table.
  2. Filtered Context – This is probably easiest to understand when you think of the entire set of rows in a table. If you sum a numeric column or simply count any column in the table, you are applying a filtered context. Of course, in its simplest form, the filter is really the same as all of the rows in the table, or no filter at all. But let’s assume for a moment that you have a sales table that includes sales from each of your stores in each of the states of United States. Furthermore, you only want to know the sales for stores in the state of Florida. In database terms, you apply a filter using a WHERE clause to include only those rows from the sales table in which the sales came from a store in Florida. Of course you can have more than one filter. You might filter not only on the state, but also on a specific product that was sold during a specific month of a specific year. That would require four filters to be applied to the table to reduce the total number of rows of sales to only those you wanted to sum. Filtered context is often used in expressions that involve summing, counting, averaging, finding a minimum or finding a maximum. These are also referred to as aggregate functions.

So let’s see how this works in DAX. Using my Contoso sales table, suppose I want to calculate the profit for each sale. Since I want to perform the calculation per sale and because my sales table has one row for each sale, I can use a calculated column. The formula for my calculated column begins by taking the value of the column [SalesAmount] which is the money received from the sale. This is not the profit. To calculate profit, I need to subtract costs from [SalesAmount]. To do this, I can subtract the value from the column [TotalCost]. This gives me a pretty good idea of what my potential profit might be. However, in the Contoso data, I also have to consider any returned item. These are items returned because they are damaged or do not work. The policy is to refund the customer their money. However, the returned product it not worth anything and is considered trash. Therefore, I need to also subtract from [SalesAmount] the [ReturnAmount]. If I define a new column in my sales table, I can use the formula shown in the following figure to calculate the profit for each of the sales.

As you know, the PowerPivot data model applies the column formula to each record in the table. I can then use this column in pivot tables to display profit for sales by any of the appropriate dimensions such as time, location, product, etc.

Fortunately, in PowerPivot for Excel, displaying profit in the pivot table is as simple as moving the column [Profit] to the Values area in the field list. PowerPivot lets us use any numeric field in the values area without any additional work. However, if I attempt to create the same model in Analysis Services, I must create a measure as shown below.

The SUM() function is an aggregate function and therefore defines a measure, never a column. Note however, I must still first create the column [Profit] as previously defined. I cannot simply create a measure that sums the result of the expression as shown in the following figure.

This is because the calculation: [SalesAmount] – [TotalCost] – [ReturnAmount] is a row context expression since it must be calculated on every row of the table. However the SUM() function is a filter context expression (even if the filter is all the rows). I cannot combine the two in this expression without getting the Semantic Error shown above. The SUM() function can only aggregate the values of a single column, not aggregate the values of a calculation.

What I can do however, is use one of the other DAX functions, SUMX(). This function has two parameters, the first parameter is the name of the table I want to perform a calculation on. In this case, the table name is FactSales. The second parameter is an expression that can include one or more columns from the designated table. In this case, I can use the expression I previously used to calculate the [Profit] column as shown in the following figure.

As you can see in this figure, the resulting total profit across all the records in the FactSales table is exactly the same whether I use the expression SUM([Profit]) which sums the calculated column or if I use the expressions SUMX(FactSales, [SalesAmount] – [TotalCost] – [ReturnAmount]). This second expression essentially calculates the row context expression for profit on each row in the table FactSales. However, as a measure rather a calculated column, the expression is filtered by any dimensions used in the pivot table.

In the following figure, the only dimension is sales channel. Therefore, with four sales channels, there are four rows. Note that both calculations of total profit result in the same values for each channel.

What is the difference between the two? Since all of the sales records were used to build this table since all four sales channels were included, the same number of calculations were effectively performed in both cases, calculating the profit for each sales row and then summing the profit for each channel. The only real difference is that the first one required the prior creation of a [Profit] column which takes some of the precious data model memory while the second method using the SUMX() function does not require that column.

But what if we applied other dimensions any perhaps by using a slicer displayed the profit from only some of the sales. In the following example, the row dimension displays countries. However, only countries in Asia appear based on a slicer that limits which continent I want to see profits. I also included a slicer on year to limit the years in my analysis to only two years. Therefore, in the following figure, the profit amounts represent only a subset of the total profits for the Contoso organization.

How does this affect the calculation? To calculate Total_Profit, I still need the column [Profit] to calculate value for every sales row in the FactSales table, even rows that I will not use. On the other hand, to calculate Total_Profit_2, the filter on continent and year is applied first limited the total number of rows that are needed to be calculated from the FactSales table. The calculation of profit is then only performed on this filtered subset of rows and then aggregated appropriately based on the dimensions shown.

Thus, you might say for now that column calculations are generally row context and measures are filter context expressions. However some functions when used in measures let you perform a row context expression after applying a filter context like the SUMX() function and other aggregate functions that end in ‘X’ like COUNTX(), AVERAGEX(), MAXX(), and MINX().

Next time, I will dive a little further into understanding context within DAX.

C’ya next time.

It’s a Fiscal Thing

In past blog articles, I’ve talked about the need for a date table, not just for BI/PowerPivot analysis, but for any application. You might be hard pressed to come up with many applications you have written or have used in your organizations that do not include some aspect of time as one of the dimensions by which you collect data and/or report on it. Therefore, building a Date table once that you can use for many different applications can save you a great deal of time when trying to determine how to group your data for different reporting period or to display different date labels along one of your dimensions.

One of the easiest ways to build a date table that you can use anywhere is to begin with Excel. It is a simple matter to create a column called DateKey and enter the first two dates of a large general purpose date range as shown in the following figure.

Then to create any size table, simply select the first two cells and drag the bottom right corner down to any number of rows you want. Given Excel’s limit of a million rows, this should cover you for most applications.

Once you have the first column, you can add columns for other ways of displaying date information. The following table shows a few of the more common expressions I have used to create additional columns.

<insert table here>

If you use a standard calendar, January 1 through December 31, the functions provided work well to calculate the names of months, days of the week, quarters, years, etc. However, if your organization uses a fiscal calendar, some of these calculations are not as easy. For example, suppose your organization uses a fiscal year that begins October 1 and goes through September 30th. The calculation for month number is not as simple as using the MONTH() function. Nor is the calculation of year simple either because the fiscal year for 2015 may in that case go from October 1, 2014 through September 30, 2015. So I thought you might like to see a few functions that could help you calculate fiscal columns in your data table.

Let’s start with the calculation of the fiscal year. Using the example date range above, how can I calculated the fiscal year from any given date. If I were to add 3 months to any date and then calculate the year using the YEAR() function, I would get the fiscal year. Why three months? Because there are three months in the prior calendar year. If my fiscal year started July 1, 2014 and went to June 30, 2015, I would add 6 months to the date before calculating the fiscal year using the YEAR() function. The following figure shows the column expression I need to create a CompanyFiscalYear column.

Calculating the year was not that difficult. However, calculating the month number of the fiscal year is a bit more complex. Why do I need a fiscal month number? Remember how I used the month number column as the sort by column for the month names so that the months appear in the correct order. I again have a similar problem when I am dealing with fiscal calendars. In the above example fiscal year beginning October 1, 2014, the order of the month names must be:

  • October
  • November
  • December
  • January
  • February
  • March
  • April
  • May
  • June
  • July
  • August
  • September

I can calculate the fiscal month number using the expression shown in the following figure. Note that it is a bit more complex than the expression to calculate the fiscal year. Where I’m using the number ‘9’, I’m really using ’12 – 3′ in which ’12’ is the number of months in the year and ‘3’ is the number of fiscal months in the first calendar year which in this case is October, November, and December. Therefore, if my fiscal calendar started in July, I would use ’12-6′ or ‘6’. A similar argument would explain why I need a ‘3’ at the end of argument for months in the second calendar year to increase the fiscal month number of these months appropriately. Again I would use a ‘6’ for a fiscal year starting in July because there are 6 months in the first calendar year that are part of the current fiscal year.

The final column I’m going to show today is the calculation of the fiscal quarter. In my October-September scenario, the first quarter of the fiscal year includes the months October, November, and December. Since I already have calculated fiscal month numbers, I can simplify my calculation of the quarter by simply dividing the fiscal month number by ‘3’ and rounding the result up to the next whole number if necessary using the ROUNDUP() function. Thus the following figure shows an easy calculation of the fiscal quarter.

That’s all for this time. I hope this helps you create useful date tables with not just traditional calendar dates and calculated fields, but also fiscal calendar fields that you can use in your pivot tables and reports.

C’ya next time.

I Like Your Moves

Moving Files between SharePoint Libraries

You may occasionally encounter the need to move a file from one SharePoint library to another. While there is the brute force method (download the file to your local machine and then upload it back to the new library which may be in a different site), there are at least two other easier methods.

For the first method, you need to know the URL of the library to where you want to move the file before you begin the process. Then follow these steps.

Step 1: Navigate to the library containing the file you want to move. For the purposes of this article, I’m going to start from a library in our team collaboration site to move a file to our intranet site. The specific starting and ending destinations are not the important issues here. Rather the technique is what is important.

Step 2: Locate the file you want to copy or move from one library to the other and hover over the right side of the document’s Title. Click the down arrow to open the document menu as shown in the following figure and select the option: Send To and then select Other Location from the secondary fly out menu.

Step 3: This opens the Copy dialog in which you need to specify the destination library or folder as well as the filename to be used in the destination library. (If you do not know the URL of the destination library, you may need to close this dialog and find that URL first. Don’t say I didn’t warn you.).

Step 4: Note that the default location is the site in which the current library resides. If you simply want to copy a file from one library to another library within the same site, you can just add the library name to this URL. If the library name has spaces, you must replace each space with the characters %20. The following figure shows that the file is being copied to a library named: “Server Training”. It also shows that the name of the file will be changed to ‘Add a new user role’. (Yes, this means that you can create a copy of a file in the same library by giving the copy a new name.)

Step 5: When you click OK, the Copy Progress dialog appears. You can review the information on the copy here before you press OK. After you click OK, the copy proceeds and if successful updates the Copy Progress by showing the text string that: The copy operation was successful.

If you wanted to move rather than copy the file, you must go back to the source library and manually delete the original copy of the document.

If the copy fails either because you entered the URL incorrectly or you do not have Add rights to the destination library or there already was a file in the destination library with the same name, you will get a message that the copy operation failed.

This method may be okay if you only have a few files to copy, but if you have a large number files, copying them one at a time from one location to another can be time consuming. A better way to transfer multiple files from one location to another may be to open the library using Explorer. To do this, follow these steps.

Step 1: Open the source library and display the Library ribbon. In the Connect & Export group, locate and click on the Open with Explorer command/icon. (I could be wrong, but I believe you must have Internet Explorer on your local machine for this to work.)

SharePoint opens an instance of Windows explorer that looks just like the display of any local folder from your desktop, but displays the contents of the SharePoint site.

Step 2: In a similar fashion, open another tab on your browser or even another browser instance and navigate to the destination library. Again open the library and then click Open with Explorer from the Connect & Export group of the Library ribbon. You should now have two separate instances of Windows Explorer open, each displaying a different SharePoint library. Now either close or minimize your browser windows to get them out of the way.

Step 3: Arrange the two Windows Explorer instances so you can see the source file as well as the destination library.

Step 4: Now simply click on the file or files you want to move and drag them from the source window to the destination library.

Note, as the image below shows, you can even copy SharePoint files back to your local computer by opening Windows Explorer from your local computer and navigating to the folder where you want to copy or move the files.

If you just click on the files and drag them from one window to another, the files will be copied. However, if you drag the files by pressing and holding down the right mouse button while you drag the files, when you release the mouse button you will be prompted with the following dialog.

In this menu Copy here is bolded because that is the default option. However, by clicking the Move here button, you can copy the files to the new location and then have the original files from their original library automatically deleted.

You can even create shortcuts to the files and place them on your desktop by pressing the right mouse button and dragging the files from their source to the Desktop within Windows Explorer and then select Create shortcuts here. Your copy of Windows Explorer may also support other options like the ability to create a zip file from the copied files and placing just the zip file in the destination folder.

That’s all for this week.

C’ya next time.

Populating Your Data Model with A Query

Over the last two weeks we talked about normalizing your logical data model in PowerPivot to help make it easier for users to understand and work with (https://sharepointmike.wordpress.com/2015/03/21/your-logical-data-model-is-not-normal/) and (https://sharepointmike.wordpress.com/2015/03/28/dealing-with-blanks-in-your-data-model/). However, in each of these models, we did all the data manipulation inside the data model. That left a lot of technical fields and tables that we could not remove from the model, but which took extra memory which could limit the size of our data model. So how do we solve that problem? We could do much of the data manipulation outside of the data model or as part of the import step. Let’s see how.

I’m going to start by importing product data from the Access database version of Contoso into the Power Pivot data model.

In the Table Import Wizard, I’m going to select the option to write a query that will specify the data to import. This option allows me to combine data from the dimProduct, dimProductCategory, and dimProductSubCategory tables into a single file. I can also embed the calculation to fix the blank Class fields we visited last time.

Clicking Next, I see the Specify a SQL Query dialog. This is an intimidating dialog. I could manually enter a SQL query here. However, I would have to know all the field names from all the tables that I want to bring together. Not impossible, but there has got to be an easier way. Wait. What is this Design button in the bottom right of the dialog?

When I click the Design button, I get an equally intimidating dialog that looks like it has an area at the top to again manually enter a SQL statement and a grid area at the bottom to display the results of the SQL statement if I were to test run it by clicking the exclamation point (!). Obviously, this screen is not very friendly toward building Access database queries. Nowhere near as friendly as the query builder inside Access itself.

If I go back into Access and build my query there using the query builder, I can visually design my query. First, I add the tables I want to use from the Show Table dialog that appears when I open the Query Designer from the Create ribbon. Then I can add the fields I want to include from any of the included tables by double clicking on their names in the tables at the top of the dialog.

I can even create my calculated column here by entering a new name in the row of the design grid followed by a colon (:). Then I can use the same expression as I did last time to test the value in the ClassName field.

NewClassName: IIF(([ClassName]=””),”[No Class]”, [ClassName])

Note that the IIF() function in Access has a double ‘I’. Otherwise, it works the same. Also note that I’m using a different name for the new column here to keep it separate from the field in the table dimProduct.

After I have finished creating the query, I can test it by clicking the Run button in the Query Tools Design window. If the query executes correctly, I am ready to copy my query to my Power Pivot data load screen. If I open the dropdown menu under SQL View in the Query Tools Design window, I see an option: SQL View as well as some other views.

This option displays the SQL statement which I can copy and then paste back in the Excel Data Model Table Import Wizard as shown below.

If I were to try to run this query directly, it would fail. Admittedly, it took me awhile to figure out why. I finally found that it did not like the one field definition: dimProduct.Size. To test what was going on, I clicked on the Design button to take me to the screen which displays both the SQL query and a grid of results from testing the query. After some testing, I found that if I redefined this field as: dimProduct.[Size], the parsing engine was happy and gave me the results I expected as shown below. The square brackets are used to clarify to the browser that the text within the brackets represent a field name, not a command or reserve word.

I then clicked OK for this screen and Finish on the next screen. I was rewarded with the data engine reading and loading my Product data. Note that it is a single table now. By including the name of the product category and the product subcategory in my query, I no longer need those two additional tables thus reducing the overall size of my data model a bit.

Next I still need to load the Sales data from Contoso along with the date and channel tables. The follow screen shows that I can open the Access database a second time to extract the other tables leaving out the product and related tables.

After the FactSales table is loaded, I still need to create a relationship with the new Product table I pulled in with the query. Even though FactSales and dimProduct were linked in the original Access database, by loading the tables separately, the relationship is not automatically discovered. Therefore, I have to use the Create Relationship dialog to link these two tables.

If I were to display the diagram view of my Power Pivot data model, you can see that the model has been simplified down to only four tables rather than six as we used before.

If I now build the same Pivot table that I did last week to shows sales by class, you can see in the following figure that the results are the same.

I can even create a hierarchy in my product table to go from category to subcategory to product. I covered building hierarchies before (). After building the hierarchy, I strongly recommend hiding the supporting columns so as to not confuse the end-users. (See: https://sharepointmike.wordpress.com/2012/07/21/power-pivot-hierarchies-part-2-of-powerpivot-2012-series/)

Denormalizing your data as you import it may sound a bit more complex, but it can save you memory and Power Pivot’s main limitation is that it must hold the entire data model in memory. So the smaller you can make that model, the more data you can load into it.

Some people might ask how far do I denormalize the model. Of course, in theory, you could denormalize to a perfectly flat single table. However this is not practical as it can serious limit the flexibility of the model and it can even cause other data errors. I tend to think in terms of model objects. For example, Products are an object in the model. Therefore, collapsing the Product Subcategory and Product Category tables into Product makes sense to me. If I had tables Stores, Cities, Countries, and Geography, I might be tempted to collapse them together as well since they represent a Location object for sales. On the other hand, I would not denormalize all the product information into the FactSales table.

So if you are thinking that there is no hard and fast answer to when should you denormalize a model, that is somewhat true. It is a bit of an art. However, if you have a table that is used just as a lookup for a name or perhaps one or two other associated fields and it is not a role playing table (See: https://sharepointmike.wordpress.com/2012/07/21/power-pivot-hierarchies-part-2-of-powerpivot-2012-series/), denormalize that table.

That’s it for this series. C’ya next time.