What Is Survey Sampling?
Surveys would be meaningless and incomplete without accounting for the respondents that they’re aimed at. The best survey design practices keep the target population at the core of their thought process.
‘All the residents of the Dharavi slums in Mumbai’, ‘every NGO in Calcutta’ and ‘all students below the age of 16 in Manipur’ are examples of a population; they are countable, finite and well-defined.
When the population is small enough, researchers have the resources to reach out to all of them. This would be the best case scenario, making sure that everybody who matters to the survey is represented accurately. A survey that covers the entire target population is called a census.
However, most surveys cannot survey the entire population. This is when sampling techniques become crucial to your survey.
Why Is It Important?
If the target population is not small enough, or if the resources at your disposal don’t give you the bandwidth to cover the entire population, it is important to identify a subset of the population to work with – a carefully identified group that is representative of the population. This process is called survey sampling, and it is one of the most important aspects of survey design.
Whatever the sample size, every survey carries fixed costs. Once the survey has begun, the marginal cost of gathering more information from more people is proportional to the size of the sample, so a smaller, well-chosen sample directly lowers the cost of the survey.
Drawing Inferences About the Population
Researchers are not interested in the sample itself, but in the understanding that they can potentially infer from the sample and then apply across the entire population.
A sample survey usually offers greater scope than a census. Working within a given resource constraint, sampling may make it possible to study the population of a larger geographical area or to find out more about the same population by examining an area in greater depth through a smaller sample.
Before we dive into the survey sampling methods at our disposal, we need a clear picture of what an effective sample should look like.
3 Features to Keep in Mind While Constructing a Sample
Consistency: It is important that researchers understand the population on a case-by-case basis and test the sample for consistency before going ahead with the survey. This is especially critical for surveys that track changes across time and space, where we need to be confident that any change we see in our data reflects real change across consistent and comparable samples.
Diversity: Ensuring diversity of the sample is a tall order, as reaching some portions of the population and convincing them to participate in the survey can be difficult. But to be truly representative of the population, a sample must be as diverse as the population itself and sensitive to the local differences that are unavoidable as we move across the population.
Transparency: Several constraints dictate the size and structure of the sample. It is imperative that researchers discuss these limitations and maintain transparency about the procedures followed while selecting the sample, so that the results of the survey are seen in the right perspective.
Now that we understand the necessity of choosing the right sample and have a vision of what an effective sample for your survey should be like, let’s explore the various methods of constructing a sample and understand the relative pros and cons of each of these approaches.
Sampling methods can broadly be classified as probability and non-probability.
3 Probability Sampling Techniques
When each entity of the population has a definite, non-zero probability of being incorporated into the sample, the sample is known as a probability sample.
Probability samples are selected in such a way as to be representative of the population. They provide the most valid or credible results because they reflect the characteristics of the population from which they are selected.
Probability sampling techniques include random sampling, systematic sampling, and stratified sampling.
Simple Random Sampling
When: There is a very large population and it is difficult to identify every member of the population.
How: The entire process of sampling is done in a single step with each subject selected independently of the other members of the population. The term random has a very precise meaning and you can’t just collect responses on the street and have a random sample.
Pros: In this technique, each member of the population has an equal chance of being selected as a subject.
Cons: With very large populations, it is often difficult to identify every member, so the pool from which subjects are drawn becomes biased. Dialing numbers from a phone book, for instance, may not give a truly random sample: the numbers, though picked at random, all correspond to one localized region. A sample created this way might leave out many sections of the population that are significant to the study.
Use Case: Want to study and understand the rice consumption pattern across rural India? While it might not be possible to cover every household, you could draw meaningful insights by building your sample from different districts or villages (depending on the scope).
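To make the idea of "an equal chance of being selected" concrete, here is a minimal Python sketch. It assumes you already have a complete list of the population (a sampling frame); the household IDs and the function name are made up for illustration.

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Draw sample_size distinct members; every member is equally likely to be picked."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(population, sample_size)

# Hypothetical sampling frame: 1,000 household IDs.
households = [f"HH-{i:04d}" for i in range(1, 1001)]
sample = simple_random_sample(households, 50, seed=42)
print(len(sample), len(set(sample)))  # prints "50 50": 50 households, no repeats
```

The hard part in practice is not the draw itself but building the list it draws from. If the list misses part of the population, the sample inherits that gap.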
Systematic Sampling
When: Your population is logically homogeneous.
How: After you decide the sample size, arrange the elements of the population in some order and select elements at regular intervals from the list.
Pros: The main advantage of systematic sampling over simple random sampling is its simplicity. It also ensures that the population is sampled evenly. Simple random sampling can, by chance, produce a clustered selection of subjects; systematic sampling avoids this.
Cons: The main weakness of the method is periodicity in the list: if the list has a repeating pattern that lines up with your sampling interval, the sample is no longer random. This can be avoided by randomizing the list of your population entities, as you would shuffle a deck of cards, before you proceed with systematic sampling.
Use Case: Suppose a supermarket wants to study buying habits of their customers. Using systematic sampling, they can choose every 10th or 15th customer entering the supermarket and conduct the study on this sample.
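The "every nth element" rule can be sketched in a few lines of Python. This is only an illustration: it assumes the list is already in a suitable order, and it adds a random starting point within the first interval, which is a common refinement rather than something described above. The customer list is hypothetical.

```python
import random

def systematic_sample(ordered_population, sample_size, seed=None):
    """Pick every k-th element after a random start within the first interval."""
    n = len(ordered_population)
    if not 0 < sample_size <= n:
        raise ValueError("sample_size must be between 1 and the population size")
    interval = n // sample_size                      # the sampling interval k
    start = random.Random(seed).randrange(interval)  # random start in [0, k)
    return [ordered_population[start + i * interval] for i in range(sample_size)]

# Hypothetical: 200 customers numbered in order of arrival, sample of 20.
customers = list(range(1, 201))
picked = systematic_sample(customers, 20, seed=1)
print(picked[1] - picked[0])  # prints 10: every 10th customer is chosen
```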
Stratified Sampling
When: You can divide your population into groups based on characteristics that matter for the research.
How: A stratified sample, in essence, tries to recreate the statistical features of the population on a smaller scale. Before sampling, the population is divided into groups based on characteristics of importance for the research (for example, gender, social class, education level or religion). Then the population is randomly sampled within each category or stratum. If 38% of the population is college-educated, then 38% of the sample is randomly selected from the college-educated subset of the population.
Pros: This method attempts to overcome the shortcomings of random sampling by splitting the population into various distinct segments and selecting entities from each of them. This ensures that every category of the population is represented in the sample. Stratified sampling is often used when one or more of the sections in the population have a low incidence relative to the other sections.
Cons: Stratified sampling is the most complex of these three methods. It lays down criteria that may be difficult to fulfill and places a heavy strain on your available resources.
Use Case: If 38% of the population is college-educated and 62% of the population have not been to college, then 38% of the sample is randomly selected from the college-educated subset of the population and 62% of the sample is randomly selected from the non-college-going population. Maintaining the ratios while selecting a randomized sample is key to stratified sampling.
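The 38% / 62% split can be sketched in Python as follows. This is a rough illustration under stated assumptions: the population of 1,000 people, the stratum names and the function are all invented, and each stratum's share is simply rounded to the nearest whole person.

```python
import random

def stratified_sample(strata, sample_size, seed=None):
    """Sample each stratum at random, in proportion to its share of the population."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        share = round(sample_size * len(members) / total)  # proportional allocation
        sample[name] = rng.sample(members, share)
    return sample

# Hypothetical population of 1,000 people: 38% college-educated, 62% not.
population = {
    "college": [f"C{i}" for i in range(380)],
    "no_college": [f"N{i}" for i in range(620)],
}
result = stratified_sample(population, 100, seed=7)
print({name: len(members) for name, members in result.items()})
# prints {'college': 38, 'no_college': 62}
```

With shares that do not divide evenly, simple rounding can leave the total one or two off the target sample size, so a real study would need a rule for assigning the remainder.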
3 Non-Probability Sampling Techniques
Non-probability sampling techniques include convenience sampling, snowball sampling and quota sampling.
In these techniques, the units that make up the sample are collected with no specific probability structure in mind. Because the selection is not randomized, the resulting sample cannot be assumed to be representative of the population.
Convenience Sampling
When: During preliminary research efforts.
How: As the name suggests, the elements of such a sample are picked only on the basis of convenience in terms of availability, reach and accessibility.
Pros: The sample is created quickly without adding any additional burden on the available resources.
Cons: This approach is unlikely to produce a sample that is truly representative of the population.
Use Case: This method is often used during preliminary research efforts to get a gross estimate of the results, without incurring the cost or time required to select a random sample.
Snowball Sampling
When: You can rely on your initial respondents to refer you to the next respondents.
How: Just as the snowball rolls and gathers mass, the sample constructed in this way will grow in size as you move through the process of conducting a survey. In this technique, you rely on your initial respondents to refer you to the next respondents whom you may connect with for the purpose of your survey.
Pros: The costs associated with this method are significantly lower, and you will end up with a sample that is very relevant to your study.
Cons: The clear downside of this approach is that you may restrict yourself to only a small, largely homogeneous section of the population.
Use Case: Snowball sampling can be useful when you need the sample to reflect certain features that are difficult to find. To conduct a survey of people who go jogging in a certain park every morning, for example, snowball sampling would be a quick, accurate way to create the sample.
Quota Sampling
When: You can characterize the population based on certain desired features.
How: Quota sampling is the non-probability equivalent of stratified sampling that we discussed earlier. It starts with characterizing the population based on certain desired features and assigns a quota to each subset of the population.
Pros: This process can be extended to cover several characteristics and varying degrees of complexity.
Cons: Though the method is superior to convenience and snowball sampling, it does not offer the statistical insights of any of the probability methods.
Use Case: If a survey requires a sample of fifty men and fifty women, a quota sample will survey respondents until the right number of each type has been surveyed. Unlike stratified sampling, the sample isn’t necessarily randomized.
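The fifty-men, fifty-women example can be sketched in Python to show where quota sampling differs from stratified sampling: respondents are taken in whatever order they turn up, with no random draw. The respondent stream, field names and function below are hypothetical.

```python
def quota_sample(respondents, quotas):
    """Accept respondents in arrival order until every group's quota is full."""
    remaining = dict(quotas)
    sample = []
    for person in respondents:
        group = person["gender"]
        if remaining.get(group, 0) > 0:  # still room in this group's quota
            sample.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):  # all quotas filled, stop surveying
            break
    return sample

# Hypothetical stream of 300 willing respondents (one man for every two women).
stream = [{"id": i, "gender": "male" if i % 3 == 0 else "female"} for i in range(300)]
sample = quota_sample(stream, {"male": 50, "female": 50})
print(len(sample))  # prints 100
```

Because the first fifty people of each type are accepted, whoever is easiest to reach ends up in the sample, which is why the result is not randomized.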
Probability sampling techniques give more reliable results, but the costs can be prohibitive. For the initial stages of a study, non-probability sampling techniques might be sufficient to give you a sense of what you’re dealing with. For detailed insights and results that you can rely on, move on to the more sophisticated techniques as the study gathers pace and takes a more concrete structure.
Once you have created your sample, go ahead and start creating an effective survey by choosing the right survey question types.
Note: This article was originally published on 27 April 2015, then refreshed and updated on 25 July 2017.
Great post. I just stumbled upon your blog and wanted to mention that I have really loved browsing your blog posts. In any case I will be subscribing to your RSS feed and I hope you write again soon!
Hi Margarito! Thanks for the message, and I’m really glad to hear that you like our blogs.
Please tell me about the implications of multistage, multiphase and cluster sampling.
Great insight Christine. I am a master’s student and have never understood validity and reliability in research well. If we can have a blog on the same, I will really appreciate it.
Thanks Paul! I’ll put that topic on our list.
That’s so helpful. Written in a very clear style with good examples.
thanks Christine very informative and clear.
A great insight for those of us who do surveys now and then, with a number of indicators that require a survey.
I had three villages with two enumerators in each village, and I wanted 20 respondents from each enumerator. Would that be quota sampling or convenience sampling?
The first question of the survey, for one to be interviewed, was, “Did you attend any of the trainings provided by the project?”
Hi Daud, thanks for the question! You can use either. If you just want quick feedback on who showed up, convenience sampling is the quickest, easiest way to get your sample. However, quota sampling is more rigorous and will give you more insight into the demographics you’re reaching for just a little extra effort.
Informative and very nicely written! Thanks for taking the time and effort to put this piece together. I have circulated it within my network.
Hi. I am initiating a study on cognitive development in a certain country, in children between 6 and 14 years old who are given a test set. In addition, I want to calculate the average score obtained by age. The country has 19 states. Each age has approximately 50,000 children, giving a total of 450,000. So what would be the best way to select the sample? I mean, how many states would be enough? What would be the right size per age? If I did it by age I would get 384 * 9 = 3,456, which seems a lot in terms of cost. But if I considered the population as a whole I would get 384, which would give me about 40 per age group. This seems too little to make population inferences. So what would be the right reasoning? Thank you very much for your help.
Thanks for your interest in our input. It is hard to say how many states need to be included in your sample without knowing much about the population you are working with. A lot of it depends on how diverse the population is within your country of study. Make sure you try to represent the demographics of your target population as much as you can.
With that being said, you will likely get more accurate results if you sample based on age. 40 people is definitely not enough to generalize across a population of 50,000. Do you have access to local schools for finding children to survey? Using schools to perform clustered sampling would allow you to decrease your cost per survey. Hope this helps!
Very impressive programme. Want to be part.
Hi, thank you for your blog, it was very helpful. I want to know the potential users of a product in a state.
Which sample design would be good to use, and why?
Very well explained. Good job.
I would like to ask: if I want to sample NGOs in my country, Malaysia, should I use stratified random sampling (probability sampling) or purposive sampling (non-probability sampling)? My target group is NGOs involved in the conservation of biodiversity only.
After I stratify NGOs down to those involved in the conservation of biodiversity only, how should I randomly sample them?
I would love it if you could help me clarify this. Many thanks.
Hi Su, thanks for the note. First, you can use all NGOs involved in the conservation of biodiversity as your population (rather than all NGOs). No stratification needed there! After that, you can choose from probability or non-probability sampling.
Why are you sampling these NGOs? If you’re just looking to get information easily and quickly (for example, if you want to learn about those NGOs’ data needs so you can build a trial product to help them), non-probability sampling will probably be easier. Usually, it will let you get lots of information far more quickly. But if you’re looking to do more rigorous analysis (for example, if you’re writing a national report on the financial status of these NGOs in 2018), then probability sampling is usually better.
Great Job. Simple and to the point
Can we do probability sampling in some modified way for a population where I do not have a sampling frame, for example walk-in patients in the OPD of a hospital?
What about systematic sampling? You could discreetly sample every nth person who walks into the OPD. As long as there’s no inherent order that people walk in, this should lead to a representative sample.
Hi I am Jerry from the Philippines. I am currently working on a module for introduction to research and scientific writing intended for high school students, more like research made super easy kind of thing (stress on the “super” . . . hehe). It’s a step by step guide in conducting research for beginners. I intend to use it in the classes I handle. I find your discussion very simple yet informative, the same goal I have for my module. I would like to ask permission to use your blog as one of my reference in discussing sampling techniques. Thanks in advance!
Hi Jerry, glad to hear this blog would be helpful for students. Feel free to use it!
P.S. We’ll be publishing an entire ebook on sampling in the next couple of months. Should we send that to you when it’s out?
Yey!!! That would be great. Looking forward to that 🙂
I’m planning to conduct a survey regarding citizens’ awareness on water supply issues, whether the community is aware or unaware with the amount of available water supply. The total population is 8.27 million. It’s a very huge number so how should I proceed with my sampling technique to make sure my survey is representative of the total population?
Thanks a lot!
Hey Richard, good question! The short answer is that 400-1,000 people would be a reasonable sample.
The long answer — take a look at the “Calculating Sample Size” section on page 26 of our data collection ebook (https://atlan.com/ebooks/data-collection/). If you put your population into the sample size formula (with a 5% margin of error and 95% confidence level), you get a sample of 384 people. That’s the default sample size for big populations.
However, that sample size formula is best for homogenous (i.e. internally similar) populations. If you think that your population has diversity that’s relevant to your research, it’s often a good idea to increase your sample size. This gives you a bit more peace of mind and greater statistical confidence. Plus, if you’re using stratified sampling, a larger sample allows you to stratify your population as much as you need without making your sub-samples too small.
If you’re increasing your sample, you probably won’t need to go above 1,000 people. (That’s generally where private organizations stop. It gets expensive and time consuming to go above that, and that sample size gives you a margin of error of 3%.) Of course, you can choose to make your sample as big as you like. Try playing around with the sample size formulas to see how changing your sample affects your margin of error.
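The figures quoted above (384 people, and a 3% margin of error at 1,000 people) can be reproduced with a short Python sketch. The ebook’s formula is not reproduced here, so this assumes the standard Cochran sample size formula with a finite population correction, z = 1.96 for a 95% confidence level, and the most conservative proportion of 0.5. The function names are made up.

```python
import math

def sample_size(population, margin=0.05, z=1.96, p=0.5):
    """Cochran's formula with a finite population correction (assumed formula)."""
    n0 = z**2 * p * (1 - p) / margin**2        # sample size for an unlimited population
    return n0 / (1 + (n0 - 1) / population)    # shrink slightly for a finite population

def margin_of_error(n, z=1.96, p=0.5):
    """Margin of error for a simple random sample of n people from a large population."""
    return z * math.sqrt(p * (1 - p) / n)

print(round(sample_size(8_270_000)))    # prints 384
print(round(margin_of_error(1000), 3))  # prints 0.031, i.e. about 3%
```

Changing `margin` or `n` in these calls is a quick way to see the trade-off: going from roughly 400 to 1,000 respondents narrows the margin of error from about 5% to about 3%.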
Hope that helps! Let me know if you have any other questions.
P.S. Probabilistic sampling is the best way to go for your project. It’ll give much greater confidence that your sample is truly representative of the 8 million people you’re trying to understand.
Thank you for writing this blog, it’s great! I have a complicated question: I’m working with a group that has conducted a country-wide survey at the district level using a probability proportional to size (PPS) sampling technique of villages constructed based on population estimates. They then conducted several surveys over time and chose 20 villages with a randomly selected starting place on lists that were organized by size of the village (so, intrinsic stratification based on size), and then once villages were chosen, enumerators randomly selected households within the village to conduct the surveys. So, if all was done correctly, one can calculate statistics and confidently assume they are representative at the district level.
However, now the investigators are asking for statistics to be generated at a level under the district (tehsil) and we are no longer confident that the sample will be representative of the population at this smaller geographic level. So, I’m wondering if there is a way to assess randomness of the sample at the tehsil level? Or, what would you do to find a representative sample at the tehsil level? Any way to weight the sample in some way or do something else? Or, are we stuck with essentially reporting from a “convenience sample” at the tehsil level and simply stating this as a caveat?
Also, we are considering using certain criteria like: time (at least X number of villages within a particular tehsil were included in X number of surveys over time), geography (villages in a particular tehsil are distributed across the tehsil), resampling of villages (villages were not resampled excessively and if they were we will choose data from only one survey instance), urban/rural (villages are adequately distributed across urban and rural areas). What do you think about this approach to choosing tehsils that would give us fairly representative data to analyze? Note that we do not have population size at the tehsil level either. Thank you!
Glad you enjoyed the blog! This is definitely an interesting question. Your list of possible criteria is certainly extensive, but I think it’s actually not needed. You should hold your sample to the same criteria, no matter what geographic level you’re working at. If your tehsil-level sample fulfills the same criteria as the district-level sample you created, then the tehsil-level sample is just as valid.
To clarify – your original sample used PPS sampling, meaning that you created a list of villages for each district, ranked each list by village size, and chose villages from each list at regular intervals. So your core criterion was that your chosen villages were evenly distributed across a list that was ranked by population.
Do your surveyed villages still fit this criterion at tehsil level? You can check this by creating a list of villages for each tehsil, ranked by population (just like before). Then highlight the villages on each tehsil list that you already surveyed. How are these villages distributed on each tehsil list?
Are they fairly evenly spread out? If so, you’re all set! Your sample will hold up well at the tehsil level, and you can calculate tehsil-level statistics. (Though, for full transparency, you should add a caveat that the sample was originally created at the district level).
Alternatively, are the villages for some tehsils clustered at the top or bottom of the list? If so, then your sample unfortunately isn’t rigorous at the tehsil level. I suppose you could try weighting the villages to fix this, but it’ll be quite tricky and prone to error. It would be much safer to report tehsil-level stats as a convenience sample.
Does this make sense? Let me know if you have any questions!
I am doing a comparative study on the behaviour of children in classes 3 and 4, so I want to take 25 students from each class. What kind of sampling is suitable? Please give me an idea.
Hey Arpita! First, probability sampling is the way to go. It’s a lot more reliable, and school is a nice controlled environment where probability sampling is definitely possible. You have a couple of options within probability sampling.
In a school, it’s easy to get a list of all students. So the easiest option is systematic sampling. First, get a list of all the kids’ names in each class. (Make sure the list of names has been randomized!) Then you’ll pick children’s names off the list at regular intervals. For example, if there are 100 kids on each class list, and you need 25 kids per class, you’ll pick every 4th name. That’ll give you a random sample of 25 kids per class.
Systematic sampling assumes that you don’t care about any internal differences between kids in the same class. If you do care about these differences, then you want to use stratified sampling.
How would this work? Here’s an example. Say you think that kids’ grades are probably related to their behavior. First, split the kids in each class into the groups you care about, i.e. divide them into 3 groups based on their class ranking. One group will have the top third of the class, one group will have the middle third, and one group will have the bottom third of the class. Then you can choose one third of your sample from each group. (Like above, randomize each group’s list, then choose 8 or 9 kids off each list so that the three groups add up to your 25 per class.)
What if the characteristic you care about isn’t evenly distributed? Then make sure your sample reflects the distribution of your group in each class. For example, say you think that boys and girls behave differently, and you know that 40% of each class is male and 60% is female. You would split each class’ student list into separate lists — one with male students, and one with female students. Then randomize each list, and choose 40% of your sample from the male list and 60% from the female list.
Hope that helps! Let me know if you have any other questions.
Thanks a lot. It really helped me to select my sample.
Hey please guide me in this Scenario:
The college administration wants to do research to measure English reading skills of intermediate students of second year. College has currently 3000 registered students in second year, 50% of which are boys and 50% are girls. There are 4 groups; pre-engineering, pre-medical, general science and commerce and each group has 5 sections and each section has 150 students. In order to draw inferences, test of reading skills will be conducted on 500 students. The maximum budget to conduct this study is Rs.3000. Administration claims that only expense which will be incurred is the cost to administer the test. The administration wants to conduct the study with precision but with constraint of low budget.
What is the best suited sampling technique for this study? Justify your stance with three sound arguments.
Please help me…
Hey Aftab, this sounds like a question from an assignment or test. I can’t do your homework for you, sorry!
As per the above-given scenario, it is convenience sampling that is suitable for this type of preliminary research. This is the maximum help for you. Remaining you need to do research by your own self.
Would this be purposive or quota sampling? Kindly give me some idea.
Hi, I am currently doing a survey on the association between physical activity level and dietary supplement use among female workers in a city. Basically, I plan to collect my sample from some organizations in that city. Roughly 5 organizations will be selected out of 20. After that, I will select all the female workers in each organization as my respondents. But I don’t know which sampling method I am using. Is it convenience sampling?
My sample size for the research is too large, around 357, and it is not manageable. Is there any method to reduce the sample size?
In my country in Africa, power supply to homes is irregular and I am trying to find out how many people would be willing to pay more for constant 24/7 power supply. I have narrowed down to a potential population of 10 million people spread across two major cities, using income as the criterion. What sampling technique and sample size do you suggest for my research, as I can’t reach everyone?
I am carrying out a correlation study on Mass Literacy Education as a panacea for effective Antenatal Clinic attendance. Please give me an idea of the appropriate sampling technique. Thanks.
Hi, I’m writing an exam on research methodology and have a slight understanding of sampling, but I struggle with applying the sampling techniques to various scenarios.
A company is considering operating an on-site kindergarten facility. But before taking further steps, it wants to get the reactions of four groups to the idea: (1) Employees who are parents of kindergarten-age children, and where both are working outside of the home, (2) employees who are parents of kindergarten-age children but where one of them is not working outside of the home, (3) single parents with kindergarten-age children, and (4) all those without children of kindergarten-age.
This is an exam preparation question I was given. Which sampling method would be best suited for this scenario, and why?
Thank you for your help.
I was reading this a few hours before entering my research examination and it was so helpful. Great job.
Very well explained, but I’m still confused about choosing the sampling method for my thesis. I am studying for a Master’s in Disaster Management and my thesis is related to vulnerability assessment. I’m selecting different types of houses and surveying every vulnerability factor, like soft storey, short column, heavy overhang, etc. If possible, can you tell me which sampling method is better for this rapid visual screening?
Thank you in advance for your help with my problem.
Hi, very enlightening post I must say, thanks. But I see nothing on the checklist method of sampling.
Hello christine, this is very informative post for us. I just want to know that I am going to prepare a survey questionnaire on overall firm performance impact of supply chain partnership strategy through external environmental factors by the manager level of manufacturing industries, and in this study we have independent and dependent variables. Kindly guide me which sampling technique will be used here?
Thanks in advance!
Your article was quite clear and crisp. Thanks for the same.
I am doing my research on the “Risks faced by Start-up firms”. I need to collect response from all the stakeholders associated with Start-up ( there are around 8 different types of stakeholders).
The population size of each type of stakeholder is large. In addition, I have a challenge in getting the people (Stakeholders of Start-ups) to respond to my survey.
Therefore, I am planning to:
a) Restrict my survey to one Geographic area.
b) Undertake Quota Sampling. Take samples from all stakeholders, but it may not be in proportion with the actual population, since I am not aware of the proportion of each stakeholders in the actual population.
c) Undertake survey of around 50 samples (all stakeholders put together).
Please advise me if my approach is right.
As partial fulfillment of a master’s in occupational health, I am planning a cross-sectional survey of work-related stress in a military environment. The total population is 1,000, so I am looking at getting a representative sample, but I am totally confused and not sure what type of sampling to use. I thought I could survey all 1,000, but my tutor wants me to use a sampling technique. Hope you can be of help.
Thank you. It is helpful
I would like to conduct a study of patients’ experience in four hospitals. The average total number of monthly admissions is 3,800.
I got a sample size of 352.
I have allocated the 352 proportionally across all four hospitals.
For one hospital I got only 42 patients.
I think that is not enough. Please advise me.
Very meaningful and simplified for everyone to understand.
Quite an interesting article
Assuming I want to conduct research on ‘The attitude on federal university otuoke, when going home’:
1. What steps should I take in conducting this research?
2a. With your knowledge of sampling techniques, suggest the best sampling technique you think is adequate for the study.
2b. Why do you think it’s the best technique to adopt?
I don’t know, please: if I have 7,831.300 as the total population of subjects in the area of my study and I decide to target only 50 participants, how can I explain the criterion by which I arrived at choosing 50 participants out of the total population above? I am an undergraduate student. Thanks.
I am doing my research on employees’ deviant behavior. My respondents will be the administrative staff at universities. So far I have decided to go for the purposive sampling technique since I don’t have a sampling frame. Is it possible to cluster the universities as a first stage, to choose where I’ll extract my data from, and then use purposive techniques to choose respondents? In other words, to what extent is it acceptable to combine probability and non-probability techniques in a single piece of research?
If so, are there any good reading materials I can refer to about mixing probability and non-probability sampling in a single study and how it is done?
I intend to do a retrospective study using the case notes of patients admitted into a program in 2014, comparing them with patients seen in 2017.
Of particular interest would be demographic information, diagnosis, HIV status and analgesia prescribed.
What is the recommended sample size, and what term can I use for the sampling technique?
Really nice article on sampling techniques.
Thank you very much. Very helpful blog. Thanks once again.
I shall soon send in my question.
Thanks Christine very informative Post. Great Job.