Finding Out When Enough Is Enough

Last week I asked the question, ‘How many people must you ask in a survey?’  While I talked about the topic in generalities at that time, I also mentioned that it would be interesting to test the hypothesis that a survey only needs to query a small percentage of the population to get meaningful results.

To test that theory, I took some data from a recent survey that was conducted over several months and disguised the question, but kept the results.  The question was a basic Likert scale question, in which the question itself postulates a specific position and asks the survey taker whether they agree or disagree with the statement.  This survey was conducted using the SharePoint Survey list and was set to allow each user to answer the question only once, so as not to stuff the ballot box, so to speak.

The total possible population of respondents was around 12,000.  Of course, the survey owners wanted to get as many respondents as possible, which is why they conducted the survey over several months.  However, I have always been of the opinion that, for this survey, keeping it open for more than about a month did not seriously affect the overall results.  By that I mean that the responses after about a month were a true representation of the total population and that there was no need to try to get 100% participation.

The question I chose to use for this study contained the five possible responses listed below:

  • Strongly Agree
  • Agree
  • Neither Agree nor Disagree
  • Disagree
  • Strongly Disagree

Since I was collecting the data with SharePoint, I also stored the date on which each survey was taken.  Therefore, I could tell, for any given date, how many responses had been entered since the start of the survey.  Knowing the total population, I could very easily determine the percent participation.  By exporting the data from SharePoint to an Excel spreadsheet, an extremely valuable option for a SharePoint survey, I could load the data into a PowerPivot data model and then create a variety of tables and charts based on the data.
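
For readers who would rather see the arithmetic than the spreadsheet, here is a minimal Python sketch of that participation calculation.  It is not what I actually ran (I used PowerPivot); the function name and the sample dates are made up for illustration, and the 12,000 figure is the approximate population mentioned above.

```python
from collections import Counter
from datetime import date

POPULATION = 12_000  # approximate total population (assumed constant)


def cumulative_participation(response_dates, population=POPULATION):
    """Return (day, cumulative_count, percent_participation) rows in date order."""
    per_day = Counter(response_dates)  # number of responses on each day
    running = 0
    rows = []
    for day in sorted(per_day):
        running += per_day[day]
        rows.append((day, running, 100.0 * running / population))
    return rows


# Hypothetical response dates, NOT the real survey data.
sample = [date(2000, 1, 1)] * 3 + [date(2000, 1, 2)] * 2 + [date(2000, 1, 3)]
rows = cumulative_participation(sample)
for day, count, pct in rows:
    print(day, count, f"{pct:.2f}%")
```

Each row carries the running total, so the percent participation as of any date can be read straight off the list.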

The first figure I show below is the final tabulated results after three and a half months of data collection.  You can see that the total count of responses was only 8,203 out of a possible 12,000.  This represents a little more than 67% of the population.  Of the people who responded to the question (yes, the question was changed to protect the guilty), ‘I believe Pivot Tables help me analyze data at work,’ 63.7% strongly agreed with the statement.  In fact, over 96% agreed or strongly agreed with the statement.  But my question was, did I need to poll 67% of the population to discover that?


Going back to my PowerPivot table, I added a report filter.  (For those who don’t have PowerPivot, this data set is small enough that a simple Excel Pivot table would also work fine.)


When I open the filter dropdown, as shown in the next figure, I can expand the All node of the value tree to show all the possible values in the table.  Note that each date is represented as a separate entry.


In order to select multiple dates as my filter, I need to click the checkbox at the bottom of the list box: Select Multiple Items.  This action places a checkbox next to each date as well as the All node.  By default, all records (dates in my case) are selected.


I first need to unselect the checkbox next to the All node.  Then I can select only the dates that I want to appear in my table.  For example, in the next figure, I select only the first three days of the survey.


When I click OK, my table updates and shows a total count of 214 survey responses, of which 76.64% strongly agreed with the statement.  While this is in the neighborhood of the final 63.7% at the end of the survey period, it is still about 13 percentage points away.  Obviously, 3 days of a survey are not enough.
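
If you do not have Excel handy, the same kind of date-filtered tabulation can be sketched in a few lines of Python.  This is only an illustration of the idea, not the PowerPivot model itself; the function name and the handful of sample rows are hypothetical.

```python
from collections import Counter
from datetime import date


def tabulate(responses, start, end):
    """Count answers dated within [start, end]; return (total, {answer: percent})."""
    counts = Counter(answer for day, answer in responses if start <= day <= end)
    total = sum(counts.values())
    percents = {a: 100.0 * c / total for a, c in counts.items()} if total else {}
    return total, percents


# Hypothetical rows of (response date, answer), NOT the real survey data.
sample_rows = [
    (date(2000, 1, 1), "Strongly Agree"),
    (date(2000, 1, 1), "Agree"),
    (date(2000, 1, 2), "Strongly Agree"),
    (date(2000, 1, 3), "Strongly Agree"),
    (date(2000, 1, 4), "Disagree"),
]

# Keep only the first three days, like ticking three dates in the report filter.
total, percents = tabulate(sample_rows, date(2000, 1, 1), date(2000, 1, 3))
print(total, percents)
```

Widening the end date plays the same role as ticking more dates in the filter list.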


I then chose 10 more days through February 2nd.


This time, with 1,103 responses, my result for Strongly Agree was 65.55% and my total for Strongly Agree and Agree was 96.7%.  Now I am getting really close to my final results, and after only 13 days rather than three and a half months.


I added another 10 days, bringing my survey count up to 4,023, nearly half of the three-and-a-half-month total, and my Strongly Agree percentage is starting to settle in at 63.81%, only about a tenth of a percentage point off the final result.


So, just for fun (statistics is fun, isn’t it?), I decided to chart the cumulative percentage of Strongly Agree responses as a function of the survey date.  I noticed that by the time I hit a month into the survey, my results had flattened out to around 64%, plus or minus less than half a percentage point.
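
The chart itself is just a running percentage, and the ‘flattening’ can be checked with a simple rule.  Here is a rough Python sketch under my own assumptions: the daily counts are invented, the function names are hypothetical, and the half-point tolerance simply mirrors the band I eyeballed on the chart.

```python
def running_share(daily_counts, answer="Strongly Agree"):
    """daily_counts: list of (label, {answer: count}) in date order.

    Returns (label, cumulative_total, cumulative_percent_for_answer) rows.
    """
    total = 0
    hits = 0
    out = []
    for label, counts in daily_counts:
        total += sum(counts.values())
        hits += counts.get(answer, 0)
        out.append((label, total, 100.0 * hits / total if total else 0.0))
    return out


def first_stable_index(shares, tolerance=0.5):
    """Index from which every later running percent stays within tolerance of the final one."""
    final = shares[-1][2]
    stable_from = len(shares) - 1
    # Walk backward from the end until a point falls outside the band.
    for i in range(len(shares) - 1, -1, -1):
        if abs(shares[i][2] - final) <= tolerance:
            stable_from = i
        else:
            break
    return stable_from


# Hypothetical daily tallies, NOT the real survey data.
days = [
    ("day 1", {"Strongly Agree": 80, "Agree": 20}),
    ("day 2", {"Strongly Agree": 50, "Agree": 50}),
    ("day 3", {"Strongly Agree": 65, "Agree": 35}),
    ("day 4", {"Strongly Agree": 65, "Agree": 35}),
]
shares = running_share(days)
stable = first_stable_index(shares)
print(shares[stable][0], "with", shares[stable][1], "responses")
```

Note that this rule looks back from the final value, just as I did with the finished survey.  Someone watching a live survey would not know the final value yet and would need some other stopping rule.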


I then plotted the percent response rate, assuming a maximum of 12,000 possible responders, and found that the point where the results flattened out corresponded to only about a 15-17% response rate.


So after surveying only about 15% of the population, I could say that the additional survey results over the next two and a half months would not significantly affect my results.  Therefore, I could also say that it would be reasonable to assume that even though I only surveyed 67% of the total population, getting responses from the remaining 33% would probably not significantly change my results.

That is the power of surveys.  The trick is determining when the survey results begin to flatten out.  Every survey can be a little different and the number of possible answers to the survey will also affect the result (something we can maybe test in a future blog entry).

If I had been plotting this data on a daily basis, I would have been able to see when my results began to flatten and been able to ‘declare a winner’ with a great degree of certainty after a month and a half, or perhaps less.  In fact, with greater experience with similar types of data and by using questions with fewer possible answers, the size of the survey can be greatly reduced while retaining a high level of accuracy in the result.

I hope you found this interesting.  I chose to give the Tuesday blog a bit more of a technical twist this week because I am about to go on a summer writing schedule.  What does that mean?  I may drop back to one blog entry a week for most weeks.  There are just so many other things to do in the summer that are more fun than writing a blog, like cutting the grass and pulling weeds from the garden or even trimming overgrown bushes.  Anyway, I’ll try to keep a few nontechnical blogs in the mix each month to lighten up the reading from the dry technical topics.  When fall comes, I will switch back to two entries a week.

C’ya later.

How Many People Must You Ask in a Survey?

Have you ever conducted a survey to help make a decision?  Perhaps you helped someone else build a survey to help them make a decision.  I’m sure you have taken surveys.  I seem to get several every week in my email from various companies and organizations.  Then there are the surveys on the back side of your receipts from restaurants and stores.  You might even get surveys in the mail.  I remember when they used to include a nickel in the envelope with the survey to make you feel guilty about keeping the nickel and not filling in the survey.  I guess a nickel doesn’t buy a lot of guilt today.

But how many of the surveys do you think they get back?  Ninety percent?  Seventy-five percent?  How many people need to respond to a survey in order to make valid predictions?

I guess the answer depends on who you send the survey to and what questions you ask.  If you are an auto manufacturer and you want to evaluate your customers’ satisfaction with a new model, you do not want to send the survey to all automobile owners.  On the other hand, if you want to find out what features would entice owners of automobiles from other manufacturers to switch and buy one of your vehicles, you may want to exclude owners of your cars.

You see the dilemma?  The type of question should be closely tied to the audience to whom you send the survey.  If you cannot narrow down your audience, perhaps the questions in your survey are too broad and you should consider making two or more surveys to target specific audience groups with questions that would be important to them.

How many questions should you put on a survey?  The more questions you include, the less likely someone will take the time to answer them.  For myself, anything more than a half dozen questions and I’m bored and ready to stop taking the survey.  One way you can counterbalance this tendency is to offer greater rewards for completing the survey.  Restaurants often offer free appetizers or desserts or even menu items for completing their surveys.  Stores may offer a certain percent off your next purchase.  Internet surveys have offered everything from cash and gift cards to thumb drives and even iPads.  Would you fill in a 100-question survey for the chance to win a 4 GB thumb drive?  What if they were to offer an iPad to 5 lucky survey takers?  You might want to guess at how many people are willing to take that survey.  If you think only 500 people will take the survey, you might be more willing to spend the next half hour completing it than if fifty thousand people were to take it.  On the other hand, a survey that offers nothing in exchange for my time will probably wind up getting filed in my circular temporary storage bin, otherwise known as a trash can.

So you have identified your survey questions and you have a targeted list of people you will be asking to take the survey.  You even have a reward program set up to encourage people to trade their time for a chance at a ‘gift’.  What percent of your target audience do you need to get a response from in order to have a reasonable chance that the survey will represent the total population?  Take the last presidential election as an example.  Did any of the pollsters ask you whom you might be voting for?  Probably not.  In all of the years that I’ve been voting, I’ve never once been asked by a pollster who I was going to vote for or who I did vote for.  So whom are they asking?  Would it surprise you to know that they can get a pretty good idea of the way the election will turn out by asking only a few thousand people?  Makes me wonder if we could not just randomly select a few thousand people from around the country to cast their ballots instead of trying to get all of us to drive to the polls and then wait in lines for hours to cast our votes.  If the odds are that we would get the same results, imagine the time we would save.  After all, isn’t that what manufacturers do when they conduct product surveys to determine what goods to make and sell to us?
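
To put a rough number on ‘a few thousand people’, here is a small Python sketch of the standard textbook margin of error for a proportion.  It assumes a simple random sample, which real polls only approximate and which a volunteer survey like the ones discussed here does not guarantee, so treat it as a ballpark rather than a promise.  The function name and the sample sizes are my own choices for illustration.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error (as a fraction) for a proportion p from n random responses."""
    # p = 0.5 is the worst case; z = 1.96 is the usual 95% normal multiplier.
    return z * math.sqrt(p * (1 - p) / n)


# Illustrative sample sizes only.
for n in (100, 1_000, 2_500, 10_000):
    print(n, f"+/- {100 * margin_of_error(n):.1f} percentage points")
```

The margin shrinks with the square root of the sample size, which is why going from a thousand responses to ten thousand buys much less than you might expect.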

I was recently involved in a survey with a very well-defined maximum population.  The survey was open for several weeks, but I can evaluate the cumulative data as of any day, beginning with only a percent or two and running up to the final 60+% results.  The survey owners were still trying to get eligible people to complete the survey before the deadline even though they already had results from over 60% of the eligible survey takers.  Was this overkill?  Would a few more percent make a difference in the outcome of the decision?

Over the next couple of days (or weeks if I get busy doing something else), I’m going to pull the data into a PowerPivot model and evaluate the results for some of the questions over time as more and more people take the survey.  I’m curious to see if getting more people to take the survey really made a difference.  I’ll report back to you what I discover in a future Tuesday blog.

Until then, think about it and try to reason out what the results will look like.  C’ya!