Time In A Cube


This is the first of a two part series on building and using time dimensions in SSAS and Excel Pivot Tables.  This week I focus on the SSAS side of the question.  My purpose is to show how to create a time dimension table from date fields in the fact table.  So let’s get started.

Let’s begin with the assumption that you have a table that you want to use in SSAS to build a simple cube.  The data I’m using comes from Texas vaccination data, but any table with dates will do.  The following illustration shows the column schema of my fact table along with the names of the other tables in the database.

Note that the table PatientVisits started as my primary fact table.  It contains links to several other supporting tables such as RaceTable, CountyData and Visit_Vaccinations.  Initially, the VisitDate field was not indexed, but I’ve already created a basic index which will be used as the foreign key into the date dimension table we will create here.

Let’s assume that we want to analyze vaccinations from different counties over time.  We first need to create a date dimension table.  In SSAS, we cannot use the date field in the fact table as a dimension.  Think about how we might use dates, we might also need to group visits based on other date criteria other than individual dates such as month, quarter or even year.  These fields obviously do not exist in the original fact table.  So the first thing you might want to do is to define what date groupings you might need or want.  I tend to use a generic date dimension which includes several of the more popular ways of grouping date data for all of my analysis.  In any specific case, I may add a few additional groupings, but at least it gives me a start.  These groups are shown in the following figure:

Because I frequently build this same table to get started with a project, I have stored the code needed to create my date dimension as the following script.

After running this script, I now have a table that I can populate with my date data, but what dates should I use.  My standard default is to look at the earliest and latest dates in the fact table as my end points for the table.  I usually extend these endpoints to include full years.  Therefore, I take the earliest date year and make the starting date January 1st of that year.  Similarly, I take the latest date year and extend it to the end of that year, December 31st.  I store these end point dates in variables in my script using the following equations.

My goal then is to use a script to loop through the dates between my starting and ending dates to add the corresponding date information to my date dimension table defined earlier for each day in that date range. (If I were working with a time dimension, I would use similar logic to define my earliest and latest times and then step through at some time interval for the times between these two points.)  In general, that means I’m looking at something like the following for a basic code block:

The challenge now is how to calculate each of the date (or time) columns that I need.  SQL Server T-SQL provides a useful date and time function,DATEPART() that can help you create almost any column you can imagine.  You may need to combine the results from the DATEPART() function with some additional functions to format the information or combine different parts together.  But it is still quite flexible.  For the columns that I needed here, the functions along with the parameters I used are:

In some cases, I need to change the format of the data returned by these functions to either combine it with other data (such as when combining text with numeric data) or to calculate other information I need for user friendly labels to the date data.  The following script shows my final version of the basic code block that loops through the dates to calculate my date information and to insert it into the date dimension table.

Running this script produces a date dimension table that includes not only all of the dates in my fact table, but also any dates missing from the fact table as well as dates at the beginning and ending of the period to fill out a full year of date data.

At this point, I’m almost ready to load my data into SSAS and start building my cube.  To work with just visits, I could directly continue from this point.  On the other hand, to work with vaccinations given by time period or by county, I also need to flatten the many-to-many relationship between PatientVisits and Vaccinations into a single table which I called Patient_Vaccinations.  This was required because a single patient visit could involve several vaccinations and obviously, each vaccination type was given to multiple patients.  I will discuss flattening (denormalizing) of tables in SSAS in a future article.]

Opening BIDS, I then create a new project for vaccinations.  The first step is to define a data source.  This is easy because I can simply define a connection to the database table as shown in the following figure:

Next, I define a data view.  In this step, I use the data source defined in the last step to select the tables I want to use.  This can be as simple as selecting the tables I want by click the checkbox to the immediate left of the table names.  After selecting the tables, any existing referential links between the selected tables will automatically be used.  If I have tables that are not linked, I must define the link.  I cannot have tables in the data model that are not linked.  In this case, I want to link the VisitDate column in the PatientVisits table with the PK_Date column in DateTable.  Note that the primary key side is DateTable since there is only one record in the table for each date.  The following figure shows the diagram linking all of the tables in my simple patient vaccination model.

Next, right click Dimensions in the Project to define the dimensions.  I need to create 3 dimensions for this example: Race, County, and Date.  When you create the County dimension, it automatically includes the region dimension as a hierarchy level (more about this at a future time as well).  So essentially, your data model should look like the following diagram.  Note that by flatting patient visits and vaccinations into a single table, I can effectively replace 3 tables with a single table.

I am now ready to define the cube itself.  Right click on Cube within the project to create a new cube.  First I must define the fact table.  This will be the Patient_Vaccinations table in my case.  Notice that I do not have to include all of the fields in the table.  I only include those fields that I know I might use a possible measure in the cube.  Keeping my active data small will improve performance.

Next I must select the Dimension tables I want to use.  Note that in this case, my model consists of a single fact table and all of the dimensions that I defined earlier.  This may not always be the case.  I can have a model which includes several cubes using different fact from the same or even different fact tables.  With each fact table, I can define a different set of dimensions needed to support and manipulate the data as the cube dimensions.  But I did say this was going to be a simple model.

When I am done defining the cube, I must perform two additional steps.  I cannot simply jump to the Browser tab to view the cube.  I must first select Build from the xxxxxx menu.  This process is generally fast and should only take a few seconds.  Then I must select Deploy from the same menu to prepare the cube to be viewed by a browser.  (Again I may go into details about what this means at a future time.)

If the cube Builds and Deploys successfully, I should see the following screen image which shows the basic framework of the cube along with a Field list of the available fields that can be used as measures and dimensions.

I generally select the measure I want to display first and add it to the Values box.  Then I begin playing with different dimensions to analyze how the measure changes as a function of the dimensions used.  The following figure shows one possible analysis of the number of vaccines given by quarter within year by county.  Of course by changing the dimensions, I can see many other relationships in your data.  In fact, the more dimensions I have the more possible relationships I can explore.

Obviously, there is a lot more you can do with cubes, but I’ll stop here for this time because next week I want to show how to create the same result using PowerPivot for Excel without needing to create a separate date table as needed here.  Also next Saturday is Orlando Code Camp (www.orlandocodecamp.com).  I will be presenting this topic live at one of the sessions.  Come see me and say, ‘Hi!’

See you then.

Advertisements

3 comments on “Time In A Cube

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s