Happy ‘NULL’ Year

SQL Server has provided support for NULL values in table columns for… well just about forever.  A NULL value is not really a specific value through.  Rather it indicates the absence of a value.  More to the point, one NULL is not equal to another NULL.  If two values are unknown, how can you say that they are equal or even that they are not equal to each other?  In fact, the result of comparing two NULLs must logically be a NULL indicating that the result of the comparison is unknown.

Why go through all this discussion about nothing?  Well actually, it is not nothing because nothing is a specific value also known as ‘0’ for numbers and an empty string for string variables.  NULL is not a specific value as I said before.  So you might say that a discussion about NULLs is a discussion about something that you don’t know.

Confused?  Ok, enough of that.  Suppose you had a table of customers that included their address information.  You have fields for the customer’s street address, city, state, and country.  Unfortunately, not every country has states.  They may have provinces, districts, regions, or perhaps nothing at all.  For that reason, the table designer for the customer table decided to allow NULL values for the state column so that it could be ignored in countries that it does not apply.

This method sounded great at first.  In fact, for simple queries to list or count customers by state, you could use a simple query like the following to get all customers from the state of Florida in the United States.

SELECT custname, address, city, state, country
FROM Customer
WHERE state = ‘FL’

This statement would work great.  It would work even better perhaps if you had an index on the state column, but that’s another discussion.  But what if you wanted to know all the customers from United States in which the state was not entered.  You may be tempted to write the following query.

SELECT custname, address, city, state, country
FROM Customer
WHERE COUNTRY = ‘United States’
AND state = NULL

Notice in this SQL statement that the second Where clause tries to compare the column value in State to the NULL constant.  Unfortunately, this will not work because you cannot compare one NULL value to another without resulting in a NULL as discussed earlier.  By resulting in a NULL value for the comparison, even the records you want to see will not be returned by this query.  You might even triy the following statement to return customers not in Florida but in the United States.

SELECT custname, address, city, state, country
FROM Customer
WHERE COUNTRY = ‘United States’
AND state <> ‘FL’

Again because NULL values are strictly speaking unknown, any record in which the value for state was NULL would return a NULL value for the Boolean expression: state <> ‘FL’ and therefore not be returned by the query.  Sure you would get all of the customers from other states in the United States, but you would not get any records where the value for state was omitted.  This might lead you to believe that all United States customers had properly identified states, but you would be wrong.

So how do you filter on NULL values?  You must use a query like the following:

SELECT custname, address, city, state, country
FROM Customer
WHERE COUNTRY = ‘United States’

In this expression, the SQL statement is not attempting to compare the value of state to a NULL value but rather is just checking if the value of state is NULL.  This seems like a subtle difference at first, but it is extremely important.

So when can you use the NULL constant if not in queries?  Think of the NULL constant as a way to set a value for a column in a table to NULL as in the following:

UPDATE customer
SET state = NULL
WHERE country = ‘Bahamas’

This statement sets the value of the column state to NULL for all customer records from the Bahamas.  Is that a specific value?  No, it merely indicates that the value is undefined.

After the holidays, I may come back to look at how to improved the performance of your SQL queries and some of the idiosyncrasies that you may encounter.  For example, how you compare dates depends on the data type of the date field.

C’ya next year.



That’s What Christmas is all About

“And there were in the same country shepherds abiding in the field, keeping watch over their flock by night. And lo, the angel of the Lord came upon them, and the glory of the Lord shone round about them: and they were sore afraid. And the angel said unto them, ‘Fear not: for behold, I bring unto you good tidings of great joy, which shall be to all people. For unto you is born this day in the City of David a Savior, which is Christ the Lord. And this shall be a sign unto you; Ye shall find the babe wrapped in swaddling clothes, lying in a manger.’ And suddenly there was with the angel a multitude of the heavenly host, praising God, and saying, ‘Glory to God in the highest, and on earth peace, good will toward men.'”

That’s What Christmas is all About.

SharePoint MetaData – 2

This week I’m back to finish my look at metadata as an alternative to using folders in not just your document libraries, but in any library or list.  If you missed the exciting first part of this discussion, just go back to the blog from December 15, 2012.

So to pick up from my last example, if all I could do was display my documents with metadata columns, I would not be that excited about switching from folders to metadata.  Sure I can use the metadata columns to sort and filter my view using the options in the view definition, but did you know that you can define ad hoc filters for any view by using the dropdown menu associated with most column headings.  Simply hover your mouse over a column heading and if a dropdown arrow appears, click on it.  The figure below shows the dropdown for the column named: Grant.


The dropdown displays a list of all the unique values in the selected column.  If more than one item has the same metadata value, that value only appears.  After selecting a specific grant, the list displays only the documents associated with that grant within the definition of the current view.  That could mean one or more documents, but by definition must include at least one document.

To take it a step further, I can also hover over the category column and select a document type.  When I select a document type, SharePoint adds this filter to the grant filter to display all documents of a specific document type for that specific grant.  Note that adding filters does not replace previous filters.  To clear a filter for any column, you must go to that column’s dropdown menu and select the Clear Filter from option.

One limitation of this method is that you can only define a single filter criteria for any one column.  In other words, it is not possible to define a filter on two or more different grants.  One possible solution to this limitation is to define another metadata column that groups grants in such a way that selecting values from this new column displays documents in multiple grants.

If I find myself filtering on the same criteria often, I usually consider creating a view for that filter.  Views can either be Public or Private.  Anyone who has at least view permission to a library can use a Public view when they access to the library.  On the other hand, Private views can only be used by the person who creates them.  Each library or list must have at least one public view to serve as the default view.  Once defined as a default view for the library/list, that view cannot be deleted.  Non-default views can be deleted.  Therefore, to remove a default view, first create a new public view or promote one of the other public views to be the new default view.  Then you can delete the view that previously served as the default.

A  library/list can have many different views associated with it.  Each view can have a different filter on the documents, a different set of metadata columns displayed, a different sort, a different style and even a different set of groupings.

Groupings are like another set of sort levels in some ways.  A view definition only supports two levels of sort.  However, groupings provide two additional levels that appear above the regular sort fields.  The primary advantage of groupings over sort fields is that I can expand or collapse each group individually using the small box with a plus or minus sign to the left of each group name thus showing or hiding documents to focus on only the ones of interest.

In the following figure, you can see how the grant named 21st CCLC Southwest/West was expanded along with the document type Application to show individual documents.  Everything else is compressed conserving screen real estate.  This is similar to the way filters help users focus on only the documents that are important at the moment.


One of the best values of the metadata approach for me can be illustrated by what happens when you need to add another year to the structure.  In a typical folder approach, I would have to go into each grant and then each category in each grant to create a folder for the new year.  In the metadata approach I simply add the year in the text field for the Year column when I add a document for the new year.  If I defined the Year column as a Choice column, a quick edit of the library’s column settings to add the new year value to the possible values for the Year column gets the job done.


Furthermore, if the values for the Year column come from a lookup table, I can simply add a new record in the lookup table for the new year value.


Given a list like the one below, any user with member rights (edit) can go into the list to add new values.  However, values should never be deleted from a list.  In fact, there are ways in SharePoint 2010 to define referential integrity so that values in a lookup list cannot be deleted if they are referenced by another list or library.


Anyone with Edit permission to the list can just click Add new item found at the bottom of the list to display a dialog to add a new item.

So using metadata provides greater scaling abilities as my document library grows with easier maintenance, easier search support for finding files, the ability to include identifying data in the document list to help users select the correct file and a variety of ways to display my document list including public and private views.

And speaking of supporting multiple views, check out the final image for this blog entry.  This figure shows an example of a single library that has multiple views available.  Of course, the same would apply to a list.


That is all for this Saturday.  Of course if you are reading this, we made it past the end of the Mayan calendar and you are just going to have to go out and buy those Christmas presents you were hoping not to have to buy. And if you already bought a bunch of Christmas presents thinking that you will not have to pay for them since the bills would not get to you until after the end of the world, guess again.  Oh, and to Congress, you really do have to worry about that fiscal cliff now.  If you are not reading this, oh well… I should have use the time for something else.

C’ya next time.

A Balanced Approach

There is no doubt that everyone in America has been thinking about the people and especially the children killed in Newtown Connecticut last week.  Over the next days, weeks, and months, I’m sure we are going to hear calls for gun control, maybe even making gun ownership by anyone in the public illegal.  Unfortunately, I truly do not believe that will solve anything.  If there is one thing that mankind has excelled in over the centuries is finding new and unique ways to kill his fell man.

It was only a little over a decade ago the country was rocked by the terrorist attack on America resulting in  two planes crashing into the Twin Towers in New York, the Pentagon in Washington, and a field in Pennsylvania.  The government immediately intervened to add extra inspections of people and luggage going on planes around the country.  How successful have these measures been?  It is impossible for us average citizens to know, but we do know that no additional planes have been flown into buildings, at least on purpose.  Does that mean that the terrorists have given up?  Or perhaps they are just planning a different approach?  Have our intelligence groups prevented some of these other methods?  I would like to think so.  However, I still worry that a determined terrorist group may eventually find another way to strike terror into our nation.

The same is true of these people who commit mass-murder.  Even if we were to eliminate all guns for private citizens in America (something I think will be difficult considering the number of legitimate hunters), they will just find another method to achieve their goals (and no, I’m not going to list all of the other options here.  Use your own imagination or watch TV or the movies.)

So what is the solution?  Dr. Oz  thinks that we need to focus more on mental health through better screening and testing.  I don’t know about you, but screening people for any propensity to commit mass-murder or any other crime is starting to sound a lot like the move Minority Report.  Just because someone is depressed, upset, or feels someone wronged them does not mean that they will turn to mass murder as their solution.

Stress.  I am sitting here thinking about what factors cause people to do evil things.  Honestly, I cannot think of just a single reason.  Rather, I suspect there are many factors.  One of those factors is stress.  No one will argue that stress has not increased over the last century for most people in the developed world if not all countries.  There is stress to get more done with less.  There is stress as families get smaller and the responsibility to care for sick family members or elderly family members have to be  shouldered by fewer and fewer family members.  There is stress from all the bad news on TV and the radio.  In fact many people report anxiety after watching hour after hour of disasters such as the school attack last week or the terrorists attacks of 9/11.

Greed.  Somebody said to find out who committed a crime, you often only have to follow the money.  Some people will do almost anything to get more money.  They would sell out their best friend, neighbor, or even a family member if they could get money for it.  Several months ago there was a story on a Freakanomics podcast where as an experiment, a person offered to buy someone else’s soul for $50.  He were not expecting anyone to really agree to such a contract.  But one person did.  The person who agreed to sell their soul argued that since they did not believe the soul actually existed, that getting $50 in exchange for something that did not exist in the first place just made economic sense.   They would have held out for more than $50, but no one else was willing to pay more.

Power.  Another common phrase is: ‘Power corrupts and absolute power corrupts absolutely’.  I suppose this statement could be applied to many of the ‘great’ dictators, kings, and other leaders throughout history, although in many cases, I’m not so sure that greed did not also play a part in that obsession with gathering more power since the two seem to often go hand-in-hand.

While these factors may all play a role in much of the evil in this world, there probably is also a component of just psychological imbalance because of some physical brain issue, but I don’t think I would recommend everyone going through some type of psychological test just to try to find those that might be included to violence.  After all, where would you draw the line?  Extremes are seldom the right answer.  Banning all guns from private citizens (they will still find a way to obtain guns). Submitting everyone to psychological testing (sounds like ‘Guilty until proven innocent’).  Elimining money to eliminate greed (without money, incentive to excell could also be eliminated).  Eliminate all power (without some type of power or government, there could be chaos and no planning for the common needs).  Eliminate all stress (without some stress, people might just sit around and do nothing since the stress related to providing for basic neesd is what makes things happen).

Perhaps, the answer is as it has always been a balanced approach.  Sometimes mankind swings too far in one direction or the other, but human forces tend to eventually brings things back to center.  So I guess for now I’m hoping that we can just get through the rest of this Christmas/Hanukkah/Holiday (you pick the one you want) season without any more incidents. Pray for the families of those who were murdered in Connecticut.  Pray also for our soldiers in foreign lands, our fire fighters, police, healthcare professionals and all the other service people that we depend on to keep us safe and well.  Finally, I wish you all Peace on Earth, Goodwill Toward All Mankind.

C’ya next time.

Why Use SharePoint Library Metadata Instead of Folders – Part 1

For the next two weeks, I’m going to go back to a SharePoint topic that is very important to me.  It is one topic that many people have trouble with. That topic is the use of metadata in libraries rather than nested hierarchies of folders.  Many of you know that I have been an advocate of using metadata to organize document libraries for some time.  I have tried to convince users that folders are bad.  In fact, folders within folders within folders down through a half dozen levels or more are not only bad, but I maintain that they are just plain evil.  In this blog entry and the next, I will explore some of the reasons why you want to avoid folders and never look back.

The problem with folders started back in the ancient history days when people first learned how to store files on the personal computer.  Originally, users tried to store everything in one large directory that we know as the root directory of a disk drive.  This practice soon encountered physical limitations in the size of the root directory and the number of file entries it could contain.  Rather than placing all of their files in the root directory of their hard disk (which many people actually did or at least tried to do), computer users were then taught to think of folders on their hard disk drives as folders in a filing cabinet.  Each file folder in the cabinet could hold actual documents, letters, etc.  Each folder could also hold other folders which could then hold documents or even additional folders.  This paradigm has become almost second nature to most computer users and is still in use today.

Unfortunately, as the number of folders and documents grew over time, it became easy to ‘lose’ things.  Microsoft, Apple, and other operating system vendors attempted to create utilities that could be used to search for documents.  However, these were often inefficient and at the very least slow.  SharePoint introduces a new paradigm to store documents without the use of folders while making it easy to find specific documents you may want.  This paradigm is based largely on metadata that defines the contents of the documents.  While I will show here how to manually use metadata to organize and fine documents in a library, the new search capabilities of SharePoint can also use the metadata to help narrow down your search results.  So with that said, let’s get started.

One of the problems with folders is that the number of nested levels tends to grow over time like a Mandelbrot set (think fractal image).  The result may look pretty, but it is almost impossible to navigate from one place to another.  In the land of libraries, users have trouble remembering whether the path to the file they want is:

Department —>Group —>Project —>Task —>Year —>User —>Category


Department —>Group —>Project —>Category —>User —>Year —>Task

As a result, users may save the same document in two or more places.  At that point, some users may update one version while other users update the other version.  Soon no one document has all of the updates.  Even if someone notices the problem and identifies each copy, the task of consolidating the versions back to a single version of the truth can be difficult if not impossible.

Another problem is the length of the URL to locate a file.  Several Internet sources pin the size limit of a URL within SharePoint using Internet Explorer at only 2083 characters.  That sounds like a lot of characters.  It would be if not for others who say the limit of a referenced URL including lists and folders is a mere 256 characters.  But then a URL is often more than just the path to the file when you consider query string information tacked on to the end of many URL references.  In addition, there are other related limits.  For example, a site name might be limited to 128 characters.  Document library names can grow to only 255 characters while a folder name is limited to 123 characters.  The bottom line to all of this is that it is possible to create an overall path to a file in a document library that could be too long if you ‘cheat’ and use Windows Explorer mode to build, populate, and work with the files in your SharePoint library.  If so, other users who work through the SharePoint interface may have a problem when they attempt to access files deeply buried under many folder levels with long folder and document names.

Both of these points make valid arguments for not using folders or at least minimizing their use, but like that TV infomercial there is more.  For this third argument, I will use a simplified version of a document library to build the case for using metadata while also touching on how the above two arguments are also addressed.  Our overall goal is to make finding and organizing your files so much more efficient that you will not be able to wait to get rid of your folders.


In the above figure, you see a small portion of folders within a folder named: Grants.  Imagine however that there could be dozens if not hundreds of grants rather than only the 4 shown here.  If we just dropped all documents related to a grant into the corresponding grant folder at least we would have all documents grouped together by grant rather than in one large pile.  However, because we could eventually have hundreds of documents within a single grant, we may decide to create a second level of folders to organize the files by file category or file type as shown in the next figure.


Perhaps we also have folders within each file type for the year that the document was created.  We may even have other levels, but let’s stop the madness here for now.

The first point I want to make about this type of structure is that it is difficult to add more levels or even more folders within a level.  For example, suppose we needed to add a category type of Resolution as shown in the following figure.


Sure we could add this new category to the one grant that needs it.  However, if we do that and someone managing a different grant needs a similar folder and calls it Decisions instead of Resolutions, how would we know whether the files in these two folder types were the same or different?  Even worse, what if someone added both folders to the same grant?  Where would you look first to find files of this file type?  In addition, should we add this folder to all grants whether it is needed or not to provide a constant structure to each of the grant folders?

In addition to adding the folder itself, consider our example in which the level under the document type is year.  Each of the other document type folders has one folder for each year as shown here.


To be strictly correct, would we need to add folders for years under Resolution as well?  If the year folder also had subfolders beneath it, the same argument would apply to each of those levels as well.

Imagine the amount of work adding these folders might entail if our grants folder consisted of dozens or a few hundred grants.  Managing such a structure would be burdensome.  On the other hand, failure to follow through could lead to inconsistent structure between grants making it more difficult to find documents if each grant folder’s structure is different.

So what can we do instead?  Within SharePoint, you have the ability to add columns to the default columns provided by SharePoint when you create a new library or list.  These additional columns can have different data types such as dates, numbers, text, etc.  If we define these columns using definitions that help separate data much like folders do, we can create a structure that functionally gives us the same benefits as folder and much more as well.

To begin defining additional columns, referred to as metadata for the library, we should look first at the types of subfolders used within grants.  The obvious choices are document type and year.  So, we might want to begin by modifying the library structure to add both of these columns to our library.  Perhaps you think that you need a character type column for the document type and a numeric type column for the year.  A good rule of thumb in database column definition that applies here is:

If a fields contains content that looks like numbers, but is never used in calculations, store it as a character field.

For that reason, fields like zip codes, course numbers, and many others are stored as character fields.  After all, when was the last time you needed to add two zip codes together?


You can see an example of what the library list might look like in the above figure.  However, we do not have to stop there.  We can create additional columns to provide more data about each document such as the business area of the grant, the application area, the fund number, or even a sub-category.  Since metadata means data about data, we can add just about any information that might be useful to the user so that it is not only visible, but as I will show next time, this can be used in ad-hoc filters and sorts as well as permanent views.

The Power of the President

Now that everyone has had time to think about the results of the election in this country, maybe it is time to ask will it matter?  Maybe just as important, where do we go from here?

Many people think the President of United States can just step in and make whatever changes he or she may think.  However, we got rid of kings almost two hundred years ago and most people would not want to return to that form of government.  Therefore, we must remember that whomever is elected president is not a miracle worker nor are they given unlimited powers to do anything they want.  Well, that is true at least as long as the other two branches of our government, the legislature and the judicial, exercise their constitutional authority to provide a check and balance to the executive branch.  It is that balance of power that forms the foundation of our government and one that we need to maintain.

Speaking of power, what power does the president actually have if they just cannot go out and do whatever they want?  Possibly the best known is the president’s power as Commander-in-Chief to order military action or to stop military action.  Since the Congress still controls the budget for the military, it is not as unilateral as you might think.  After all, you cannot fight a military action without funding.  However, the President, not Congress or the military can decide which conflicts to become involved in.

The President can also decide whom to appoint to many high level positions.  Although these appointments may need to be approved by Congress, the President, not Congress gets the ball rolling by deciding whom to appoint.

The President can also decide which laws to enforce and which to ignore.  However, doing so without some good reason can result in substantial and negative backlash from the public.

The President has the power to persuade Congress to do or not do any action.  We have seen many examples of this over the years, but most recently concerning passing new health care laws and some of the recent decisions over taxes.

The president also has power over foreign affairs including negotiating with foreign governments.  In the past, the United States has used this power to accomplish things like the opening of trade with China or helping to tear down communism in Eastern Europe such as the tearing down of the Berlin Wall.

We would like to think that every president wields their powers in the best interests of the Constitution, the United States, and its people.  Of course, we get the opportunity every four years to get rid of any president we don’t agree with through the peaceful process of voting.

That process works only as long as the American people are awake and that they study the issues that are important and the views of the candidates on those issues.  Unfortunately, many people do not really know the issues.  Rather, they merely vote along party lines without any idea about who the candidate really is or what they stand for.  In fact, many people simply vote on name recognition which is one of the reasons why candidates spend such huge amounts of money on their campaigns.  Of course, there are many people who vote based on what their other family members or friends say without any real investigation about the candidates or issues on their own.  Finally, there are many people who just do not vote out of apathy, their conviction that their vote simply does not count.

In our recent election here in the United States, several states declared a ‘winner’ before all write-in votes were counted.  What does that say about how the states and even the country feels whether your vote is important or not?  Even if the final count could not possibly change the decision as to who the ‘winner’ is, should such announcements be delayed to ‘show’ that everyone’s vote counts?

That is all I have to say today.  I am sitting in a hospital waiting room as my wife undergoes some exploratory surgery related to her cancer.  The news I got just a few moments ago is bad.  While we will continue to fight until the end, the light at the end of the tunnel has receded.  I only hope that the future of our country which has also been troubled for almost as long as my wife’s cancer diagnosis (some might say longer) stands a better chance of recovery.

I’m still hoping for a real leader who can inspire our nation to turn around the cancer that has been eating at our country’s greatness and returns it to the leadership role in the world it once had, a world where we can all help raise the lifestyle of all people around the world, a world in peace.

C’ya next time.

Calculated Column or Calculated Measure, 2

Last time I walked you through an example of how calculated columns and calculated measures in a pivot table can sometimes result in the same value and sometime in different values.  This time before I start, I need to show you an image that was accidentally omitted close to the end of that blog.  I talked about adding the calculated column profit margin and the calculated measure profit margin to show that the results were not only different, but that the calculated column was not correct.  That missing image is reproduced here:


In general terms, this example demonstrates the fact that you do not sum percentages of individual items to get an overall percentage.  You cannot even average individual percentages to get the overall percentage.  Rather you must calculate the percentage of a group of items by first summing the individual values the make up the calculation and then perform the calculation of the percentage.  That is why the measure calculation subtracts the sum of total cost from the sum of total sales (which is the same as saying the sum of total profit) and divides that result by the sum of total sales to get an overage percent.

However, not all calculated columns are necessarily incorrect.  Suppose I needed to apply a discount to sales based on the sales channel.  The following table shows the discount amount as a percentage that I need to apply by channel to sales in that channel.

Channel Discount Amount
Stores 15%
Catalog 8%
Reseller 0%
Online 4%

The following figure shows these discount amounts applied to our model using a calculated column with the name DiscountedSales based on the following equation:

=[TotalSales] * (1-[Discount])


In this figure, I can easily verify that the final discounted sales amount reduces the total sales amount by the indicated discount percent for each of the channels.  However, remembering that calculated columns must perform this calculation for every row in the model, I may decide to check the possibility of using a calculated measure which only performs the calculation for the cells needed in the pivot table.  (Yes, I know this is a small demo table, but imagine a table with hundreds of thousands of rows.)  The first question might be, “How to I define a measure for profit margin?”  You may begin with an equation like the following in which you just sum each of the terms:

= SUM([TotalSales]) * (1 – SUM([Discount])

This equation may seem reasonable at first. It does result in a Total Discounted Sales value of $6,557,300 on Total Sales of $14,255,000.   However, after a little thought, I realized that this discounted sales total would represent over 50% discount of the total sales.  Since none of the sales channels had a discount greater than 15%, such a result does not seem realistic.

The problem is that you cannot simply replace your column references from a row calculation with SUM(<column>) references when defining a measure.  So what should the calculation be?  In this case, it is rather easy to determine the correct calculation.  Remember that I began with the following equation to calculated the discounted amount for each channel:

[TotalSales] * (1-[Discount])

If I wanted to sum the discounted amounts across all channels, I would use the equation:

∑([TotalSales] * (1-[Discount]))

Notice the parentheses that specify the calculation of the individual discounted amount for each item before summing the result.  Using the distributed property from your math days, I can convert this equation to:

∑([TotalSales] – [TotalSales]*[Discount])

We also saw last week that

∑(A + B) = ∑(A) + ∑(B)

But it is also true that

∑(A - B) = ∑(A) - ∑(B)

Therefore, we should be able to write:

∑ [TotalSales] – ∑ ([TotalSales]*[Discount])

Translating this to DAX, you might try to write the equation as:

MyDiscountedSales:=SUM([TotalSales]) – SUM([TotalSales]*[Discount])

The first term in this equation is valid, but not the second term. SUM expects to see a column passed to it as a parameter, not an expression.  Fortunately, there is another DAX function that we can use, SUMX().  SUMX() supports two parameters, a table reference and an expression.  Therefore, we can replace the above equation with the following:

MyDiscSales:=SUM([TotalSales]) – SUMX(Sales,[TotalSales]*[Discount])

(where Sales is the name of table in my model)

This new measure calculates an overage discounted sales amount of $13,275,000 on sales of $14,255,000 which seems a whole lot more reasonable.  Adding these measures as columns to my pivot table results in the following:


The calculated column: Sum of DiscountedSales appears to be correct.  I can manually calculate the values for the individual channels and the sum for the channel and even the Grand Total can be easily verified.  On the other hand, the first calculated measure (MyDiscountedSales) while calculating the individual row correctly, does not correctly calculated the values by channel or the Grand Total.  The column MyDiscSales based on the measure of the same name displays values that match the calculated DiscountedSales column. The obvious column that must be incorrect is the MyDiscountedSales column because the channel sums are not correct.

In this example, the calculated column provides the correct answer, and after refactoring the equation used for the calculated measure, so does the calculated measure.

So can I do anything with the equation that calculates the profit margin column?  Unfortunately, I have not yet been able to find anything that transforms the equation to correctly calculate the values as a calculated column.  I believe this is because the values for the channel totals and the Grand Total cannot be represented by any simple function derived solely from the detailed row values.  You cannot add, count, or average the detailed values to get these roll-up values.  Rather they must be calculated individually for each cell.

So you have seen that calculations that only require addition or subtraction may be encoded as either calculated columns or calculated measures.  Even equations that include multiplication may be encoded as either calculated columns or calculated measures if you can transform the equation to something that looks like only addition or subtraction although you may have to use a mix of SUM and SUMX functions.

My best recommendation is to verify the calculated results as best you can, using common sense estimates of what the data should be as well as actual manual calculations to determine if the results are correct.  Don’t assume that just because an equation returns a value rather than an error, that it is correct.

C’ya next time.

Light Rail in Orlando and Seattle

Recently our town of Orlando has started building stations for a new light rail system.  The first phase of the project will run from north of town to south of town going through downtown itself. On the north side, it runs through the very populated area of Maitland and Altamonte Springs.  That will be great for their commuters coming into town to work.  We do not live in that part of town.  We live southwest of Orlando close to the Universal Studios and International Drive area, a big tourist area to say the least.  But the light rail will not come anywhere near us.

In fact, the southern route of the light rail will not go to Disney.  It will not go to the airport.  It will not go to the Florida mall.  It will not go anywhere near the outlets.  It will not go to the Orange County Convention Center.  Sadly, it will not go anywhere near anything that might help tourists get around.

Why do I want the light rail to help tourists?  Well, for one reason, we were just in Seattle Washington about a month ago and we took advantage of the light rail system there to get from the airport to downtown Seattle where our hotel was located as well as the convention center.  At a cost of only $2.75 for a one-way ticket, it certainly beats the $30 taxi ride alternative.  Sure there were several stops between where we got on and where we got off, but only about 10.  Depending on the time of day and thus the traffic conditions, the 45-minute ride may have even been faster than a taxi.

The train cars were a bit older than since the light rail in Seattle has been around for a while, but they were still comfortable.  My only complaint/recommendation to them is that since the line goes to the airport, an area to place luggage might keep the aisles from being cluttered with people’s travel bags.  I even met a person on my ride back to the end of the conference going to the airport who works for the marketing department of a major company in the Seattle area.  He told me that he takes the light rail every day to and from work.  It is less stressful than driving into town and he does not have to pay to park his car in one of the parking garages in part compensating him for the cost of riding the rail.  He usually spends the time reading the news on his Windows Phone or reviewing work documents or doing email so that when he gets home, he has more free time for his family.

I know the local light rail in Orlando tried to save money by using the existing train tracks used and owned by freight carriers.  They made a deal with them for access during certain daylight hours while freight trains were routed around the western side of the county to avoid the downtown tracks.  I’m sure that saved a lot of money, but if the train does not go where it is convenient for users, they will not use it.  Originally, one of the alternatives was to have the tracks go down the center of I-4 to the International Drive area.  I don’t know if cost or other facts squashed that proposal, but the new line reminds me of a road we use to call ‘The Road to Nowhere’ in Reading Pennsylvania.  That road was built as part of a proposed by-pass around town, but funding stopped after only a small segment was built which essentially went,… well… nowhere.

To get to the nearest station for the Orlando light rail from my house would be about the same distance as I would need to drive downtown and in fact, the traffic to and from the station over the existing road would be involve more traffic and probably more time.  Even if they eventually added a line from that station that extended from the Orlando airport at one end to International Drive/Universal area at the other end, it would be inconvenient and more time consuming because of transferring between trains. (Sort of like the difference between direct flights between two cities and flights with one or more stops where you have to run from one end of the airport to the other to get to your connecting flight.)

I wish them luck and success, but unfortunately, I don’t think the southern end at least will be of much help to me.

C’ya next time.

Calculated Column or Calculated Measure

No matter whether you are working with PowerPivot or the Tabular model of SQL Server Analysis Services, you should be familiar by now with creating both calculated columns and calculated measures.  At least that is true if you have reading my blog for the past several months.  You may be confused as to when to use a calculated column versus a calculated measure.  In some cases, they can yield the same results.  However, in other cases, the difference between the two can create very different results.  In fact, in some cases, no obvious error occurs, but the resulting values are very, very wrong.

For this case, I am going to use a very simple set of data rather than the full Contoso data set in order to make the calculations easier to understand and why some choices between calculated columns and calculated measures can critically affect your results.  Notice in the table below that there are only 8 records, 4 records for each of two years.  Each of the four records in each year represent the sales of a fictitious company through 4 channels (Catalog, Online, Reseller, and Stores).


For the first calculation, I will calculate the profit of each channel in each of the years (row context) using the formula:

=[TotalSales] – [TotalCost]

(Notice in this case I omitted the table reference since there is only a single table in my model.)

As you can easily verify, the total profit for stores in 2011 is $400,000 which is the result of subtracting $1,600,000 from a total sales of $2,000,000.  Similarly, you can validate the other calculated rows.  The point is that as a calculated column, Power Pivot performs the calculation once for each row of the table in which I define the calculation.  The resulting values become a part of the model and are only calculated once when the column is created and then stored in the model.  However, as you can imagine, a table with millions of rows of data could take a while to calculate.  The following figure shows the calculated TotalProfits column for my 8 row data table.  Note that in 2012, the total profit for store sales was also $400,000 but this profit was on a larger total sales amount meaning that the profit margin was less.


Now suppose I created a calculated measure instead of a calculated column as shown in the following figure.  In this case, the formula for the calculated measure MyTotalProfit is:

=SUM(Sales[TotalSales]) – SUM(Sales[TotalCost])

The main difference between a calculated measure and a calculated column is that the actual calculation takes place when the calculated measure is added to the pivot table not before.  This is because the context of the calculation is based on the filters in the current table.  Remember that each row, column, slicer, or report filter is a filter for each cell displaying the calculated measure.


While our data is relatively small, this means that our calculation of MyTotal Profit occurs 5 times.  One time for each of the sales channels in the final pivot table, and one time for the grand total.  However, you can quickly see that no matter whether we sum the calculated column for TotalProfit over the years for each sales channel or use the calculated measure, we get the same result.  The reason is because of the following mathematical fact:

(A-D) + (B-E) + (C-F) = (A+B+C) – (D+E+F)

In this expression, the left side of the equation represents the sum of the calculation column for TotalProfit while the right side of the equation represents the calculated measure which subtracts the sum of costs from the sum of sales.

So in this case, the two methods, calculated column and calculated measure, yield the same result.  Which method results in the better performance would depend on the total number of rows in the table and the filters applied to the final pivot table.


Now suppose instead of looking at total profit as a number, we want to look at percent margin.  Afterall, a large dollar profit on an item that sells extremely well may not return as much profit per dollar sold as an item that sells less, but at a high profit per sale.  Let’s first see what happens when we calculate the calculated column PercentMargin using the formula:

= (Sales[TotalSales] – Sales[TotalCost]) / Sales[TotalSales]

As you can see in the following figure, the percent margin (displayed as a decimal value here) can readily be validated using the TotalSales and TotalCost columns.  For example, in 2011, sales at stores resulted in a 20% margin while sales at stores in 2012 resulted in only a 18.1818% margin on each dollar sold.


We could also calculate a calculated measure called MyPercentMargin using a similar formula, but summing total sales and total costs first before calculating the margin.  The expression in this case is:

(SUM(Sales[TotalSales]) – SUM(Sales[TotalCost])) / SUM(Sales[TotalSales])

As you can see in the following figure, this formula results in a total profit margin of about 22.2589% which looks reasonable considering that the individual channel profit margins for each of the two years ranges from 18.1818% to 28.5714%.


Adding the calculated column profit margin and the calculated measure profit margin to a simple pivot table shows the results in the following table.


The first thing to note is that in this example the values in the last two columns are different.  Simply looking at the percent values may not in this case tell you which column is correct.  However, your first hint is that the Grand Total for the Sum of PercentMargin column is much larger than any of the individual columns.  This is not mathematically possible.  The overall percent margin for all sales must be between the lowest and the highest percent margin of any of the detail rows.  This makes the Sum of PercentMargin column suspect.  On the other hand, the grand total for the MyPercenMargin column does fall between the range created by the lowest and highest percent margin of the individual rows.  With the help of a calculator, you can quickly see that the correct value for the percent margin of store sales is:

((2,000,000 + 2,200,000) – (1,600,000 + 1,800,000)) / (2,000,000 + 2,200,000)



This verifies that the calculation in the MyPercentMargin column is probably correct.

Why the difference?  The key is that I introduced division into the calculation.  As long as I was just adding and subtracting values for the calculated column, I could use the distributed property to create a measure that performed the same task.  However, by introducing division to calculate margins, I cannot  simply sum the individual profit margin percentages of each channel for each year.  Rather I must calculate the total sales and total costs first and perform the margin calculation on these totals.

So what is the general rule?  If your calculated column formula merely adds or subtracts values, you can typically use either a calculated column or calculated measure.  However, as soon as you introduce multiplication or division, you might think that you must use a calculated measure in order to get the correct result as shown in this example.  Well, that is not quite true either.  Next time, I’ll show an example in which the calculated column provides the correct result and the calculated measure does not.

C’ya next time.