Time in a Pivot Table

Last week we looked at how to build and use a Date dimension table based on dates from our selected fact table to be used within a SSAS cube.  Until relatively recently, building a cube in SSAS was the only way to analyze datasets with over a million records.  With the introduction of Microsoft Office 2010, a new add-in to Excel called PowerPivot has become a game changer.  It allows the average person, not just DBAs, to create large pivot tables based on tables having millions of records.  In addition, Power Pivot lets the user combine data from different data sources as long as a common field can be used to link the tables.

But one of the key differences between SSAS and PowerPivot that the average Office Poweruser can now build their own pivot tables using tools already on their desktop/laptop computer with the addition of the free PowerPivot add-in from Microsoft.  However, just like with SSAS, we must at least have a star or a snowflake schema for the tables.  The data I started from in the last example as well as here has a many-to-many relation between Patient Visits and Vaccinations that is resolved through the table Visits_Vaccinations as shown in the figure below

I denormalized this relationship creating a single table to replace the many-to-many relationship by using the following statement in the SQL Server database before connecting to any data through PowerPivot.

The schema now consists of only the following four tables:

Using PowerPivot for Excel as described in my previous blogs, I can now load the data into the PowerPivot window.  Next, I would define any necessary relationships between these four tables as shown in the following figure taken from the Manage Relationships dialog (Relationships group of the Design ribbon).

Before I can proceed to creating the pivot table, I need to define the other date dimensions.  However, unlike what I did last week, I will not create a separate date table.  (Note, I could do that, but there is a simpler method.  I can simply create the new date data in the table Patient_Vaccinations using DAX calculated fields.

In the Pivot Table window, open the Patient_Vaccinations table using the tabs at the bottom of the window.  Scroll to the right to the last column in the table.  The next column has the header: Add Column.  Double click this heading to edit the text changing it to Year.  When I press the Enter key, the selected cell automatically moves to the first row under that column heading.  I can now enter a formula for the column.

Formulas in the PivotTable window may look like normal Excel formulas when you first look at them, but they really use the DAX language.  DAX is a special set of functions specifically created for working with PivotTable data.  DAX consists of 135 functions, but 71 of those functions are very similar to functions you may already know from Excel.  In fact 69 have exactly the same names.  I will be using just a small number of these functions today but I will work with the FORMAT function which is similar to the TEXT function from Excel.

While DAX can be used to create calculated columns and measures, I will only use the functions here to create calculated columns.  A calculated column definition always begins with an equal sign ‘=’.  So with the first row under the new column selected, type the following but do not press Enter:

= YEAR(

Note that a tooltip appears helping you with the parameters that the function requires.  In this case, it is telling you that there is one parameter and it must be a date.  I could simply enter the cell reference for the date I want to use, but a simpler method is to click on the date in the same row and let Excel enter the cell reference.  In my case, the formula now looks like the following:

= YEAR(Patient_Vaccinations[VisitDate]

The equation is almost finished.  I only need to add the closing parenthesis to end the function and then press Enter.

This is where the magic happens.  Not only does Excel calculate the value for that first cell beneath the header, but it applies the formula to all of the cells in that column without you having to copy and paste the formula down through all the rows.  PowerPivot does this because there can only be a single formula in a column.  Therefore, it automatically does the work for you of applying the formula to all rows in the table.

In a similar way, you might want to add the following columns (and formulas) to build the various date components needed for your pivot table.

Year =YEAR(Patient_Vaccinations[VisitDate])
Day =DAY(Patient_Vaccinations[VisitDate])
Month_Number =format(month(Patient_Vaccinations[VisitDate]),”00″)
Month_Name =format(Patient_Vaccinations[VisitDate],”MMM”)
Quarter =int((Month(Patient_Vaccinations[VisitDate])-1)/3)+1
Quarter_ Name =”Qtr ”   & int((Month(Patient_Vaccinations[VisitDate])-1)/3)+1
WeekDay =WEEKDAY(Patient_Vaccinations[VisitDate])
WeekDay_Name =FORMAT(WEEKDAY(Patient_Vaccinations[VisitDate]),”dddd”)
Week_Number =WEEKNUM(Patient_Vaccinations[VisitDate])

As you can see, some functions take you directly to the value you want while other functions may need some formatting help.  Typically, you will use the FORMAT() function to convert a number to a name such as day of week or month name, etc. but you can also use it for other formatting options such as displaying the full 4-digit year (‘yyyy’) as opposed to just a 2-digit year (‘yy’).

The calculation for quarter was a bit more challenging because there is no function to return the quarter number.  However, it is quite easy to calculate.  Simply take the month number and subtract 1.  Then take that result and divide by 3 (the number of months in a quarter).  Then take the integer result and add ‘1’.

Of course you could build other calculated columns, but these will suit our need.  It is time to build the pivot table from the data.  From the Home ribbon, find and click on the PivotTable icon in the Reports group.  In the dialog that appears, choose whether to add the Pivot table to the current sheet or a new sheet.  It must however be in the current workbook as the Pivot Table Window and all of the data.

The initial view of the pivot table on the sheet (shown below) is very similar to the view of the cube in SSAS, but it is more like the pivot table feature in standard Excel which has been available for years.

In the top portion of the Field List panel, you will see a list of all your tables.  To access the fields within the table, click the box with the plus sign to the left of the table name.  You can now select using the checkboxes to the left of each field in the field list fields that you want to appear in the pivot table.  By default, Excel assumes that numeric values should be used as measures and string should be row labels.  You may not always agree.  Therefore, clicking on a field name and dragging it to the appropriate box in the lower portion of the Field List panel may be a more accurate way to select your fields.

Furthermore, Excel will assume that numeric measures placed in the Values box (measures) should be summed while string measures placed here should just be counted.  You can change this (count numeric fields, but not sum string fields) by using the dropdown menu to the right of the field name and selecting Edit Measure.  This pops up a dialog which lets you choose from a list of five predefined aggregations (Sum, Count, Min, Max, Average).  You cannot define a custom calculation here.  However, in a future blog, I show you how to use DAX to create your own custom measures.

In the figure below, you can see the result of a simple pivot table analyzing the number of vaccinations per year by each of the racial group.  You may also want to compare this result to the cube created last week to see that it generates exactly the same results.

Could I have used the date dimension table created last week in the same source database?  Yes, but I wanted to show that in PowerPivot tables, that is not necessary.

In closing, today I’m at Orlando Code Camp giving the presentation: Do you have the time?  This presentation roughly follows the steps you read here for last week and this week.  In future weeks, I will examine other ways to use DAX in your PowerPivot projects.

See you next time.

Advertisements

Why Does This Project Still Cost Me Money?

Does your management only budget time and money for the development of new software application development and not maintenance? Is management surprised at the ‘cost’ of maintaining applications after they have gone into production?  Do they view IT as a ‘money-pit’ because of the increasing costs of maintaining software they thought they paid for in full years ago?  Do they understand that maintenance and enhancements of older apps can over time take an increasing percentage of any development team’s time greatly reducing their ability to develop new applications or conversely requiring the IT department to continually add to their staff?

One analogy might be buying a house.  As with the initial purchase of a house, the up-front cost of building a new application can be substantial.  It is the hope that the software will return a benefit over cost over time typically in the form of improved efficiency, better accuracy, or greater profitability.  Initially after going into production, the software may run with little intervention for some time giving people the impression that the project is done.  But at some point, someone will request a new feature or a change to an existing feature.  In some cases, requests for change come in fast and furious right from the day the program goes into production.  In either case, IT management must make a decision whether to address each issue as it comes in or to accumulate changes and then go through a major revision of the application incorporating several of the changes at once.

Of course, getting developers to jump from one application to another to ‘fix’ change requests as they come in can be counterproductive and should be reserved for only critical ‘show-stopping’ issues.  The reason is that developers have to build a mental model of the application before they can begin making changes.  This takes time.  In fact, some studies have shown that even a simple interruption to a programmer to answer the phone or an email while they are working on a problem can result in a loss of 15-30 minutes as they attempt to get back into the ‘zone’.  Jumping from program to program is even worse.  Therefore, most major development organizations collect changes over a period of time and then dedicate one or more developers to implement all of the changes as a single release or new version.  It also minimizes the time needed to test the changes since testing can occur once rather than several times.  Moving the application from development to QA and then to production occurs less frequently with the corresponding savings in time and effort.  And finally it can make it easier on the end-user so they do not have to change the way they work with the application every couple of days as new features or changes are made one at a time.

If you think about both styles of handling changes to an application (immediate vs. grouped), you can visualize the effect on the overall cost of maintenance not just for one application, but for the entire development team.  But let’s say that you have a new application that immediately receives requests for a lot of changes and they need to get in right away.  From a project standpoint, you might ask instead why these problems or deficiencies were not found during testing before the product went into production.  Was it pushed into production to meet a time schedule rather than held back for all features to be completed and tested properly?  Perhaps, the original specifications were incomplete.  A common fallacy is that developers should know better about the way the program should work.  However, the reality is that developers create applications to a set of specifications.  If those specifications are not complete, it is not realistic to expect the developers to necessarily uncover all the deficiencies.

Finally, the last point for now is that any application that is actively used and has had numerous revisions made to the base code eventually reaches a point when making additional changes becomes too difficult.  Perhaps the code has become like a plate of spaghetti due to the number of changes.  Sometimes the problems are technical as in the case when the original programming language is no longer supported or the database engine is five releases behind the current version and no one really understands how to deal with the limited features of that version.  You may even encounter an application written in a language that no one in your organization still knows how to develop code in.  The difficulty in trying to untangle the spaghetti or finding a person knowledgeable in that ancient programming language used by the Mayans to create their calendar may lead you to the decision to re-write the application.  The advantage of re-writing the application comes from the ability to: eliminate features no longer needed; streamline the code using new functionality and brining all components (languages and databases) up to current standards.

So when is a software project done?  Some would argue, never.  At least it can never be considered done as long as the problem it is solving still exists.  To that extent, management must plan a certain amount of resources (time and money) to manage the changes that inevitably occur.  Often this is translated into a percentage of budget or a percentage of staff.  Some ways to minimize these ‘after-production’ costs are to insure that the original project (as well as all subsequent updates) are well documented.  But that doesn’t always happen.  So be prepared for projects with inadequate documentation to consume more time for even the simplest of changes.  Also consider that no developer likes to be dedicated to fixing other people’s problems all of the time which I’ve written about before.  Good programming standards and best practices can go a long way toward minimizing the time needed for changes and possibly even help reduces problems that need to be fixed.  Ultimately, it is the responsibility of IT management to help the appropriate business leaders understand the software cycle in relation to post-production maintenance so as to set realistic expectations.  Just remember, like the weather in Florida, change happens.

A last reminder:  This Saturday is Orlando Code Camp.  This is a free day of training on a large variety of topics ranging from traditional application development, web development, Windows phone application development, and SQL Server development.  I will be speaking on the topic: Do You Have the Time?  This session is an overview of how to create and use a date/time dimension in SSAS and direct time data usage in relational tables with PowerPivot for Excel (with a little help from my friend, DAX).  Hope to see you there.

Time In A Cube

This is the first of a two part series on building and using time dimensions in SSAS and Excel Pivot Tables.  This week I focus on the SSAS side of the question.  My purpose is to show how to create a time dimension table from date fields in the fact table.  So let’s get started.

Let’s begin with the assumption that you have a table that you want to use in SSAS to build a simple cube.  The data I’m using comes from Texas vaccination data, but any table with dates will do.  The following illustration shows the column schema of my fact table along with the names of the other tables in the database.

Note that the table PatientVisits started as my primary fact table.  It contains links to several other supporting tables such as RaceTable, CountyData and Visit_Vaccinations.  Initially, the VisitDate field was not indexed, but I’ve already created a basic index which will be used as the foreign key into the date dimension table we will create here.

Let’s assume that we want to analyze vaccinations from different counties over time.  We first need to create a date dimension table.  In SSAS, we cannot use the date field in the fact table as a dimension.  Think about how we might use dates, we might also need to group visits based on other date criteria other than individual dates such as month, quarter or even year.  These fields obviously do not exist in the original fact table.  So the first thing you might want to do is to define what date groupings you might need or want.  I tend to use a generic date dimension which includes several of the more popular ways of grouping date data for all of my analysis.  In any specific case, I may add a few additional groupings, but at least it gives me a start.  These groups are shown in the following figure:

Because I frequently build this same table to get started with a project, I have stored the code needed to create my date dimension as the following script.

After running this script, I now have a table that I can populate with my date data, but what dates should I use.  My standard default is to look at the earliest and latest dates in the fact table as my end points for the table.  I usually extend these endpoints to include full years.  Therefore, I take the earliest date year and make the starting date January 1st of that year.  Similarly, I take the latest date year and extend it to the end of that year, December 31st.  I store these end point dates in variables in my script using the following equations.

My goal then is to use a script to loop through the dates between my starting and ending dates to add the corresponding date information to my date dimension table defined earlier for each day in that date range. (If I were working with a time dimension, I would use similar logic to define my earliest and latest times and then step through at some time interval for the times between these two points.)  In general, that means I’m looking at something like the following for a basic code block:

The challenge now is how to calculate each of the date (or time) columns that I need.  SQL Server T-SQL provides a useful date and time function,DATEPART() that can help you create almost any column you can imagine.  You may need to combine the results from the DATEPART() function with some additional functions to format the information or combine different parts together.  But it is still quite flexible.  For the columns that I needed here, the functions along with the parameters I used are:

In some cases, I need to change the format of the data returned by these functions to either combine it with other data (such as when combining text with numeric data) or to calculate other information I need for user friendly labels to the date data.  The following script shows my final version of the basic code block that loops through the dates to calculate my date information and to insert it into the date dimension table.

Running this script produces a date dimension table that includes not only all of the dates in my fact table, but also any dates missing from the fact table as well as dates at the beginning and ending of the period to fill out a full year of date data.

At this point, I’m almost ready to load my data into SSAS and start building my cube.  To work with just visits, I could directly continue from this point.  On the other hand, to work with vaccinations given by time period or by county, I also need to flatten the many-to-many relationship between PatientVisits and Vaccinations into a single table which I called Patient_Vaccinations.  This was required because a single patient visit could involve several vaccinations and obviously, each vaccination type was given to multiple patients.  I will discuss flattening (denormalizing) of tables in SSAS in a future article.]

Opening BIDS, I then create a new project for vaccinations.  The first step is to define a data source.  This is easy because I can simply define a connection to the database table as shown in the following figure:

Next, I define a data view.  In this step, I use the data source defined in the last step to select the tables I want to use.  This can be as simple as selecting the tables I want by click the checkbox to the immediate left of the table names.  After selecting the tables, any existing referential links between the selected tables will automatically be used.  If I have tables that are not linked, I must define the link.  I cannot have tables in the data model that are not linked.  In this case, I want to link the VisitDate column in the PatientVisits table with the PK_Date column in DateTable.  Note that the primary key side is DateTable since there is only one record in the table for each date.  The following figure shows the diagram linking all of the tables in my simple patient vaccination model.

Next, right click Dimensions in the Project to define the dimensions.  I need to create 3 dimensions for this example: Race, County, and Date.  When you create the County dimension, it automatically includes the region dimension as a hierarchy level (more about this at a future time as well).  So essentially, your data model should look like the following diagram.  Note that by flatting patient visits and vaccinations into a single table, I can effectively replace 3 tables with a single table.

I am now ready to define the cube itself.  Right click on Cube within the project to create a new cube.  First I must define the fact table.  This will be the Patient_Vaccinations table in my case.  Notice that I do not have to include all of the fields in the table.  I only include those fields that I know I might use a possible measure in the cube.  Keeping my active data small will improve performance.

Next I must select the Dimension tables I want to use.  Note that in this case, my model consists of a single fact table and all of the dimensions that I defined earlier.  This may not always be the case.  I can have a model which includes several cubes using different fact from the same or even different fact tables.  With each fact table, I can define a different set of dimensions needed to support and manipulate the data as the cube dimensions.  But I did say this was going to be a simple model.

When I am done defining the cube, I must perform two additional steps.  I cannot simply jump to the Browser tab to view the cube.  I must first select Build from the xxxxxx menu.  This process is generally fast and should only take a few seconds.  Then I must select Deploy from the same menu to prepare the cube to be viewed by a browser.  (Again I may go into details about what this means at a future time.)

If the cube Builds and Deploys successfully, I should see the following screen image which shows the basic framework of the cube along with a Field list of the available fields that can be used as measures and dimensions.

I generally select the measure I want to display first and add it to the Values box.  Then I begin playing with different dimensions to analyze how the measure changes as a function of the dimensions used.  The following figure shows one possible analysis of the number of vaccines given by quarter within year by county.  Of course by changing the dimensions, I can see many other relationships in your data.  In fact, the more dimensions I have the more possible relationships I can explore.

Obviously, there is a lot more you can do with cubes, but I’ll stop here for this time because next week I want to show how to create the same result using PowerPivot for Excel without needing to create a separate date table as needed here.  Also next Saturday is Orlando Code Camp (www.orlandocodecamp.com).  I will be presenting this topic live at one of the sessions.  Come see me and say, ‘Hi!’

See you then.

Microsoft SQL Server 2008 Certification Study Group for BI Track

In conjunction with the south Orlando chapter of PASS, MagicPASS, I will be leading a virtual study group to prepare for the MCTS Business Intelligence Development and Maintenance certification exam (70-448).  To join, you must be a member of PASS, but you can join for free at: http://www.sqlpass.org.  Look for the link along the left side of the main page to sign up for a free PASS membership.  We are just getting started so you haven’t missed anything yet.  For details, send me an e-mail at:  MPA_SharePoint@hotmail.com.

Flipping the Classroom

I was scanning through iTunes for a podcast to listen to over the weekend and found an interesting one on TED Talks.  The one I finally settled on had an interesting title: Let’s use Video to Reinvent Education presented by Salman Khan at TED 2011.  Perhaps you might think that I found the title interesting because I work for a school district.

Not really.

I found it interesting because I’ve always been a self-learner.  Except my idea of being a self-learning was buying books and magazines and reading them.  Of course, more recently it has been e-books and Internet sites.  The speaker, Salman Khan talked about flipping the classroom.  In essence, he meant that through the use of training videos stored on U-Tube, students would get their lecture at home, not in the classroom.  In the classroom, they would focus on ‘homework’ where they could ask questions from their peers or the teacher when they got stuck on a topic.

What a concept.  How many of you have sat in a classroom listening to a teacher drone on in a monotone voice better suited for putting you to sleep than paying attention to the subject.  Or maybe you had a teacher or professor like I had in one of my college math class who wrote equations with one hand while he erased them with his other hand.  Most of us were so busy trying to write down everything he wrote on the board that I don’t think any of us remember more than two words that he spoke during the entire semester.

So I’m sitting on the living room sofa watching this video and thinking, ‘If only I had a tool available like this when I was high school and college.’  Imagine being able to pause your teacher when you just became so over-saturated with the words coming out of their mouth that they were beginning to blur into a dull background noise.  Now imagine hitting a pause button on your teacher to take a few minutes to digest what was said before continuing.  Or maybe your teacher introduced a topic today that referred to something covered last week, last month, or even last year.  Maybe you did not quite get it then, but you need to understand it now in order to move forward.  Just replay that old video and watch it again or maybe watch it several times until you do get it.  No one is counting how many time you re-watch the video. Imagine trying to do that to your live teacher.  On second thought, I’d rather not.  Once was enough.

At the time the video was recorded during TED 2011 the site: www.khanacademy.org had a little over 2,200 videos on topics ranging from math to science to history, but when I check the site last weekend, they said they have over 3,000 videos and even have videos on finance and economics.  They claim they have over a million students a month viewing their videos and that number is growing.

Of course this site would be great if you have kids currently in school even if only as a supplement to their regular class, especially to cover topics they are having problems with.  In fact, some school districts (not ours) are experimenting with using the videos on this site to replace or at least partially augment their regular classes.  But here is the cool thing.  Even adults like you and me can go out to this site and learn (relearn?) these topics so that when your kids ask a question, you might even be able to explain what an integral means or how to evaluate a polynomial.  You might even find some of these classes enjoyable for the sake of just simple self-enrichment.  Maybe you always wanted to learn a little about art history, astronomy, or perhaps you want to understand venture capital, the Geithner Plan or the Paulson Bailout.  Maybe you just want to get a basic understanding of banking and money or you just want to make some sense out of current economics.  I found some interesting and timely videos on how China affects our economy.

All of that is great, but there is another part to this site.  Students who take classes through this site can take tests.  But in these tests, the student has to score a perfect 10 questions in a row before they can move on to the next module.  Why?  Well, as Salman says, what happens when a student gets only 70% of the questions right, or even what happens to the student who gets 95% of the questions right?  If you guess that they get moved on to the next topic without any real attempt to fill in those knowledge gaps, you are right (and can move on).  The problem is this ‘swiss cheese’ approach to knowledge eventually catches up with even the best students as they need to understand 100% of earlier topics to master new topics.  (Imagine only knowing 95% of the alphabet.)  This would seem to make perfect sense.

Yes this system is self-paced and depends a great deal on the motivation of the student, but if conducted in a ‘class’ setting, there is also a peer factor that will create a healthy competative environment.  Reports are available for the teacher to identify which students are having problems with different topics.  The teacher can then spend class time either inviting other students who have completed a particular module to help other students who are struggling with it.  If it is a new module for all students, of course the teacher can provide that same mentoring.  The site also provides ways for parents and even non-parent volunteers to join the site and serve as mentors or coaches.  This means that you could be helping not just your own kids, but perhaps you could be helping someone on the other side of the world learn geometry.

I don’t know if there are other sites like this, but if any of my readers know of any, please post information about them in the comments for all of us to benefit.  Oh, did I mentions using the site is FREE!  So how does the Khan Academy make money to pay salaries and provide a site like this that handles hundreds of thousands of students a day?  Well I haven’t been able to find absolute proof of it, but based on the videos available about the academy, it appears that Bill Gates and/or his foundation are providing the ‘grant’ or investment money to get this concept up and running.  And if that is the case, then I have to give a big salute to Bill for supporting a very noble and worthwhile cause.

I’m going to end this post here because there are a couple of videos I want to watch about current economics.

See you all Saturday for our next weekly technical topic.

Using External Content Types in SharePoint with Microsoft Office – SQLSat #110

Last Saturday was SQL Saturday #110 in Tampa FL.  I was scheduled to present my session on using SQL Server as an External Content Type in SharePoint and Microsoft Office.  I’ve done this session before.  Several times in fact.  It is loose based on a series of blog entries I created back in March of 2011.  I practiced going through the demo several times before the event and thought that everything was OK except for the demo segment where I connect an external list with contact information to Outlook so that it show up as a new contact folder.  But I’ve determined that problem is related to some changes made to my demo machine in the security settings, but the rest worked fine.

Well I got up early Saturday morning and drove to Tampa (only a little over an hour from our house) an was there before the start of the first session.  I felt ready.  I even did not mind having my session in an adjacent building, or at least I did not mind at first.  When my session time arrived (I had the second session of the day in this session room), I connected my laptop to the provided projector and the first thing I noticed is that my resolution just went crazy.  I’m use to working in an environment with at least 1680 by 1050 with my laptop typically set at 1920 by 1200.  With most development environments today having menu bars and property panels and toolbars, and work areas, you really need a large screen resolution to see anything meanful.  So my first sign of potential problems came when the projector dumbed down my screen to 800 x 600.  I knew that would make scrolling around to find things a problem, but that wasn’t the only problem.

In fact, it that was the limit of my problems, I probably could have gotten by.  The real problem came when in some of the demos, dialog boxes popped up that were larger than 800 x 600.  In fact I hit one dialog box while building the filters for the first external content type definition that was so large that I could not see the buttons at the bottom of the screen.  The dialog box was a fixed size box so I could not resize it.  I tried to move the window up but would not work either.  So I started using the TAB key to try to guess which button was activated before pressing the ENTER key.  Well that sort of worked, but it was scarry.  Then opening the SharePoint site, I noticed that all of the Silverlight extensions would not display.  For example, when you select Create to define a new list or library, I got the old style column and link display of different types of objects I could create.  In fact, the further I went into the demo, the worse it became.  In one part of the demo that I tried to jump to just to get something to work, I need to open the template for the document library so I can use Quick Parts to insert document properties from the library metadata into the template so it can be used when a person creates a new document in the library.  Now even Word refused to open.

I yanked out the cable to the project and everything worked again.  But of course I could not expect the attendees to crowd around me to view my screen to watch what I was doing.  So apologized profusely and decided to just end the pain and the session.  Trust me when I say that no one was more disappointed than I was.  I knew I could go back and reproduce all of my presentation later, capture key screens, wrap the images up with a little text description and make the resulting document available as a PDF for anyone interested to download and view.  Therefore, this week’s technical blog entry is really the PDF referenced by the hyperlink following this paragraph.  It takes you step-by-step through what would have been my presentation.  It is a token of my appreciation to those who came and were disappointed by not being able to see my presentation.  In addition, this blog represents by 100th blog entry here.  So as a present to all of you faithful readers, here is the link:

Using External Content Types in SharePoint with Microsoft Office

 

 

The Coming Educational Software War

Back in the days before there were PCs, before Microsoft existed as a mega-force in the software industry, there were Apple II computers, TRS-80s and the Commodore PET.  Ok, for the purist, Microsoft did exist, but they were still trying to find their way in 1980 creating a XENIX operating system for the Intel 8086 and trying to get a contract with IBM to develop an operating system and some programming languages.  Anyway, I had been out of high school for several years and had been asked to participate in a committee consisting of school staff and community members to evaluate and choose a personal computer platform for a computer lab.

 

During the first two meetings of the committee, we saw presentations by representatives for  Commodore and Radio Shack with both showing off their best educational software for these new personal computers.  Ok, by today’s standards, that software was pretty primitive.  But you’ve got to remember that most personal computers of the day had maybe 16K and worked with only a handful of colors on screens that were forty characters wide by 23 lines or less of text.  The next meeting was going to be the local computer store’s chance to show off the Apple II.  Having bought an early Apple II that supported only Integer BASIC natively, I was an early Apple computer advocate (My first book was for the Apple II).  So I talked with the store owner to see what software he wanted to show.  It was sad.  I had seen a lot more interesting software from our local Apple computer user’s group.  So I suggest that he talk to the group to see what they might suggest.  The result was several of the group’s members helped do a presentation of some of the latest educational software for the Apple II and the Apple II was ultimately selected by the school for its first personal computers.

 

It seemed like Apple Computer also independently realized that the market for personal computers could be driven by students who used computers in schools and then would demand to use the same computers when they started working.  Of course, we know that IBM was not going to let Apple Computer take the entire market.  But the interesting thing to notice is that Apple computers still dominate many school classrooms although the generic PC (now no longer built by IBM) has gained market share.  While Apple may have lost its way for several years, the latest Apple Computers and now the iPhones and iPads are commanding increased attention not only in schools, but in businesses as well.

 

So what is the point of this trip down memory lane?  My answer is: “Follow the money.”  And money is something that most school districts do not have a lot of right now.  Across the country, school districts have to make critical decisions on how to spend their dwindling allocations from taxes, tuition, and other income sources.  At the same time, students are demanding an increase in the technology they use to learn.  Most students bring some type of electronic device to school.  Some may have their own computers, iPads, or other tablets, but almost all of them have some type of smartphone that can connect to the Internet.   Many school districts have resisted this trend and have tried to prohibit students from bringing in their own electronic devices.  Ultimately this will be as effective as the VP of Information at a company I started at many years ago absolutely forbidding any department from purchasing or bringing in the early IBM PCs to work because after all, they were just toys and no one would ever do any real work on them.  Ultimately, he was forced into retirement.  Similarly, school districts who try to resist this tidal wave of student’s need to work through and with electronic devices will cause these districts to lose their best students as they transfer to private schools that are more enlightened.

 

But some school districts are already adapting.  They are promoting BYOD (Bring Your Own Device) to class and are finding ways to provide more instruction electronically.  In the state of Florida, there is a requirement to deliver a major percentage of education materials electronic within the next few years.  Most districts are not ready for this.  Some have their heads in the sand hoping the requirement will go away.  Well, it won’t go away.  It’s time to adapt or become irrelevant.

 

A related issue is the software that will be used to support these students with their diversity of devices.  It will be difficult to mandate specific software products unless they work across multiple devices and are available at very little cost or free.  Google is already attending educational conferences like FETC to promote Google Docs and their related family of products and tools.  Microsoft has made available Sky Drive and versions of their four most popular office products (Word, Excel, PowerPoint, and OneNote) through Office Web Apps.  Other companies are assembling on-line tools for email, collaboration and safe social networking for students.  Many products work with Twitter, U-Tube, and FaceBook, while some districts still actively block student access to these sites effectively making them unusable as a classroom tool.  Yes, there is a lot of inappropriate material on these sites and society expects our school to protect the children from them.  But we all know they still get to these sites from their personal devices and from devices in their homes.  It may be up to these companies or others to develop save sites to provide the same social networking.  Actually, some of these already exist.

 

So what is the answer?  I really think it is going to come down to who will give the best deal to educational institutions.  When you have no money, loyalty to products of the past means nothing.  You may even find yourself scrapping systems you used in the past because you can no longer afford to pay the maintenance fees or royalties.  In any real business decision, it really doesn’t matter how much you spent in the past on hardware or software.  The ONLY thing that matters is what it will cost you to supply those services in the future.  If you are using Oracle databases and SQL Server is cheaper, you’ll switch.  If you can get by with MySQL, you may switch.  If you cannot afford to pay the licensing on Microsoft Office, maybe Google Docs or Open Office will be your choice.  In some ways, it does not even matter if these are the best choices.  When you have no money, they may be your only choices.  What will happen to your staff if you make these switches?  Will an Oracle DBA want to work with MySQL or will they leave to work somewhere else using their skills and getting paid significantly more?

 

Education technology is at a turning point, a turning point driven by money or the lack of it.  Eventually this trend will translate into the products that students of today will want to use in their future workplace. While I personally don’t particularly care for many of Google’s policies especially related to privacy, perhaps they understand this trend better than others and they are poised to fight for this market.  I wonder who else is?

 

Using DAX to Join Tables on Multiple Fields

Last year, October 29th to be exact, I published a blog that I called “Getting Started With DAX Functions”.  It was originally suppose to be the start of a series of articles on DAX, but then I got side-tracked.  Between work issues and personal issues that served as great fodder for discussion, the rest of the DAX series got lost as I covered several Excel and SharePoint related topics.  In fact, I was about to launch into a another set of SharePoint web parts for displaying data different ways when I remembered that I never finished the DAX series.  To make completing this series even more important, I just received word that I will be doing a session at the Orlando Code Camp at the end of March on using DAX to create Date and Time dimension data.  As I went back to the DAX article from last October, I realized that I may have started at the wrong point.

In that first article, I jumped right into the date and time functions supported by DAX.  But perhaps we should have covered some additional fundamentals like why does DAX even exist and where did it come from?

The simple answer to that question is that DAX is a set based language that helps you manipulate data in Microsoft’s Vertipaq engine.  That’s what makes PowerPivot work in Excel, but we will get back to that a little later.  The purpose of pivot tables is to allow the user to easily analyze facts and compare the value of those facts as you change either horizontal or vertical parameters also known as dimension.  For example, if you look at football scores, the total points made by each team is a fact.  Some obvious dimensions are the names of the teams, the dates of the games, and even the names of the players who scored the points.  By using different dimensions to define the vertical columns and horizontal rows, you can analyze how teams performed as a function of the date, how players performed as a function of the date, or even how players performed as a function of the team they were on or even better, the team they played against.   For a company, a fact may be sales while dimensions might include the date, the product, the type of store the product was sold in, the state, or even the region of the country where the stores are located.  You might even want to compare sales to weather.  For example, Disney probably sells a lot more rain ponchos on rainy summer days in Florida than dry spring days.

So that is what a pivot table is.  How did they get started?  Actually pivot tables were not initially created by Microsoft.  Rather a simple form of a pivot table first appeared in Lotus Improv for the NeXT computer in 1991.  Other versions became available for PCs but it was not until 1993 that Microsoft Excel 5 introduced a new feature called a pivot table.

Without going through all of the versions of Excel to see how pivot tables have evolved becomming more powerful, let’s just jump to Excel 2010.  By this time, Excel supported worksheets with up to a million records.  This also defined the limit for the number of rows a pivot table could use as its source.  Excel basic pivot tables can work with data from only a single table, although that table could come from a variety of data sources.  But getting data from a single table probably means a trip to your friendly DBA with a box of fresh donuts to ask them to build for you a single table with all of the measures and dimensions you would need in your pivot table.  If they had to gather data from multiple data sources or even multiple databases, you might have to bring two or more boxes of donuts.

PowerPivot for Excel 2010 solves many of these problems by letting you work with individual tables with more than a million records.  In fact, some Excel experts such as Bill Jelen claim that it is possible to have a pivot table with 900 million records.  Of course there still are limits, but these limits are based more on current physical file storage limitations of the compacted Excel workbook rather than limits of the underlying Microsoft engine for PowerPivot named Vertipaq.  This strange sounding name actually has meaning in that the data for a PowerPivot table is compressed based on common column values across the rows rather than the typical database compression that tries to compress individual rows.  Since columns are considered vertical in a grid, the name Vertipaq for vertical packing/compression seemed like a natural choice.  BTW, Rob Collie can probably give you a much better explaining of how Vertipaq compresses data.

But PowerPivot does not stop there.  It allows you to use multiple data tables eliminating the need to build custom single tables with all the data.  That in itself should save you a lot of money by not having to buy all those boxes of donuts.  In fact, you can connect to and link these tables together using relationships just as you would in a regular relational database.

But the fun doesn’t stop there. These multiple tables can come from different data sources.  You can combine data from a SQL Server database with data from Oracle, Access, Excel spreadsheets, text files and at least a dozen other common data sources.  All you need is a common linking field from each table to create the relationship.

This last statement actually hints at one of the limitations of the current PowerPivot implementations.  Relationships between tables can only be created using a single column from each table.  What if the relationship is more complex than a single column requiring two or more columns?  That is one of the ways DAX can come to your rescue.

The following figure shows a portion of a table that holds the schedule for college football games.  No one column defined a unique or primary key.  For example, there are multiple records for each institution (team) since they play more than one game.  Also there are multiple records for each game date since more than one game is played on most days, especially weekends.

If we had two or more text columns, we could just concatenate the two columns to create a calculated column.  But in this case, we have a text column and a date column.  Fortunately for us, PowerPivot can perform some automatic data type changes (typically to the text type) so that we can create a calculated column from two fields and PowerPivot automatically converts the game date to a string before appending the institution name to it to provide a unique value (any institution can only have one game on any one day).

To create a calculated column, scroll horizontally to the last column on the right of the table and click in the header row of the first empty column.  Here you can enter the name for the new field.  Since this is just going to be a connecting column, I’ll name it: InstitutionGame.  To perform the concatenation, click in the first row beneath the header and enter an equal sign “=”.  This tells PowerPivot you want to define a calculation.  For this simple calculation we can simple enter the field references for the two fields we want to concatenate.  While there are several ways to enter field references, I prefer to begin by entering the table name.  Immediately a context sensitive dropdown appears with the field names within all tables that begin with the letter(s) you entered so far.  Simply enter more letters or scroll down through the list to highlight the field you want and press the Tab key to select that field.

After selecting the first field, enter the concatenation operator “&” and then repeat the previous steps to select the other field.  In my case, I see the following expression being built in the Expression box.

=DIVISIONBSchedules[Game_Date]&DIVISIONBSchedules[Institution]

When I press Enter, PowerPivot calculates the values for the entire column.  The calculation is really a set operation across the entire table rather than a cell-by-cell calculation as it would be in tranditional Excel, even simple Excel pivot tables.  However, the calculation is amazingly fast.  Next I would switch to the Division Offense table to get information on points scored.  Here I would create a similar calculated column using the corresponding game date and institution name fields for this table.

With both calculated columns defined, I can click on the Design ribbon and click on Create Relationship in the Relationships group

Since I left the focus of my cursor in the first row of the calculated InstitutionGame column of the DIVISIONBOffense table, the relationship dialog prefills this information into the first table and column name as shown below.

Next, select DIVISIONBSchedules as the related lookup table.  Since we named the calculated column the same, the Create Relationships wizard automatically supplies this field name as the Related Lookup Column. If the related column names do not match, you must use the dropdown to select the related lookup column.

In this case, we defined the direction of the relationship correctly.  However, even if we had started with the DIVISIONBSchedules table and tried to relate it to the DIVISIONBOffense table, the wizard is smart enough to recognize this situation and automatically correct it after popping up an informational warning that you got the relationship backwards.  (What a friendly wizard!)

At this point, you have used DAX to relate two tables using more than one column to define the relationship.  But that is not all DAX can do.  In the coming weeks, we will look at some other ways you can use DAX.

Sunny Award

Today, our Orange County Public School web site (https://www.ocps.net) was officially awarded a Sunny Award for the second year in a row.  This award is given to honor the most transparent government websites in the nation.  To receive an award, you must obtain an ‘A’ grade or better on the Sunshine Review’s Transparency Checklist.  This year we join 213 other government websites out of over 6,000 websites that were reviewed earning our second A+ award in two years.  Thanks go out to our entire team from those that manage the site to those who provide the content.  We have come a long way in 5 years.

Too Little Staff? Too Little Skills?

I think it is fair to say that every company has been hurt by the economy over the last several years.  Many have reduced the sizes of their staff.  Some maybe too far.  Others have cut back on expenses by eliminating things like training to conserve dollars rather than cut staff.  But now that the economy appears to be turning around (afterall, didn’t the DOW break 13,000 the other day?), you may be thinking about what your company should be doing to take advantage the upturn.  Maybe you think you don’t have enough staff to do all of the projects you want to take on to ride the wave of the upturn.  Perhaps the projects you want to do require new skills that your staff just does not have.  Maybe you think your staff is too small and the ones who are left don’t have skills you need? So what do you do?

I know some managers who will tell you that the solution is to bring in consultants to get the job done, to handle the tasks that require skills that your current staff does not have.  Maybe those consultants can bring one or two of your current staff in near the end of project so they can take over future maintenance tasks.  But what these managers are really saying is that they don’t have the faith in their existing staff to handle the project from the start.

Wow!  If you were an employee under that manager who may want to learn new things and take on interesting challenges that can make a difference, would you be satisfied?  Do you want to just be a clean-up player on the team while someone from the outside gets to come in get the big hits, earn big bucks and then walk out and leave you with the fun and exciting maintenance tasks?  We have a saying when management suggests this.  That saying is, “We don’t want to be the foster parents of every orphaned application whose parent took the money and ran.”

But seriously, if you have team members who show potential, they will leave you in a heartbeat as the economy improves and other organizations start competing for good talent.  So what do you do now?

First you have to admit that there is no magic solution.  However, there are some things that you can do to bring out the best in your staff but it will take some work on your part.  Start by evaluating the strengths and weaknesses of your staff.  Then look at your current and upcoming projects and match up your team’s strengths with your project needs.  The matches don’t have to be perfect.  Nothing ever is.  In fact, if there is room for your staff to stretch, learn new skills and challenge them to grow professionally and maybe personally, you may have a happy staff member, and you know what that means?  Loyalty.

What about current staff members who don’t quite meet your expectations?  If you remember a few weeks ago I mentioned that an important consideration in hiring a new employee is how they feel about learning something new.  If you have an employee who is willing to learn new things, now is the time to get them some training in the skills you think you are going to need.  Then turn them loose with some training and some simple but useful projects to let them gain the skills to later tackle the big projects you know are going to be coming.  I know some people think that you should toss people into the big tough projects right from the get go because if they succeed there, they can succeed everywhere.  That may be true for a small percentage of truly exceptional staff.  But you can achieve a lot more success with more people if you help to develop staff members that show potential and willingness to learn.

What about the rest?  Sometimes there is no alternative than to be honest with someone if their skills just don’t fit your needs and you really don’t think there is a way to use their skills in some other areas and they may be unwilling to learn new skills.  Sometimes you can still salvage the situation, but often they may just be happier somewhere else where their skills are needed and will someday thank you for helping them admit that to themselves.

The bottom line is there is never a way to guarantee that your best employees will not leave.  In any case, now is a great time to evaluate where you are and where you want to go.  One last point.  I don’t buy for a minute that you should not spend money training your staff with new skills because they will just take those skills to another company and you will get no return on that investment.  Whether you train your staff or not, they could leave.  Whether they get the ‘cool’ projects or just maintenance, they may leave.  A lot of things are outside of your control.  Many management consultants say that employees don’t leave jobs, they leave bosses.  You don’t have to be their best friend, but you have to show them that you really care about their career development.  Just because you cannot guarantee their loyalty, however, is never a good reason for not trying to do everything you can do to attract and keep the best talent by doing all you can to show that their contributions are valuable.  And even if they do leave eventually, you will hopefully have gotten some good work out of them while they were around.  Isn’t that better than the alternative?