ICD - Sampling in Data Lesson
Sampling in Data
Let's look at how to gather the data we want to analyze. Data consists of information coming from observations, counts, measurements, or responses.
Statistics is the science of collecting, organizing, and interpreting data in order to make decisions.
There are two types of data sets:
- Population —collection of all outcomes, responses, measurements, or counts that are of interest.
- Sample —a subset (part) of a population.
In statistics, we use numerical descriptions to describe characteristics of either a population or a sample.
Quantitative Data | Qualitative Data |
---|---|
Age: 31 | Gender: Female |
Height: 5'5 | Job Title: Teacher |
Weight: 120 | Hair Color: Brown |
Salary: $50,000 | Eye Color: Blue |
GPA: 3.2 | Marital Status: Single |
Population and Sample Means (Videos 1 and 2)
Watch the following videos showing the difference between a sample mean and a population mean.
In video 1, mean, median, and mode are reviewed, as well as population mean and sample mean.
In video 2, the topics of population mean and sample mean are explored in more depth.
The numerical description is either a parameter or a statistic.
- A parameter is a numerical description of a population characteristic.
- A statistic is a numerical description of a sample characteristic.
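The distinction between a parameter and a statistic can be sketched in a few lines of Python. The ages below are a hypothetical population invented for illustration, and the sample size of 4 and the seed are arbitrary assumptions.

```python
import random
from statistics import mean

# Hypothetical population: the ages of all 10 members of a small club.
population_ages = [31, 25, 47, 38, 52, 29, 41, 36, 60, 33]

# Parameter: a numerical description of the whole population.
population_mean = mean(population_ages)

# Statistic: the same description computed from a sample (a subset).
rng = random.Random(1)  # fixed seed so the example is repeatable
sample_ages = rng.sample(population_ages, k=4)
sample_mean = mean(sample_ages)

print("population mean (parameter):", population_mean)
print("sample:", sample_ages)
print("sample mean (statistic):", sample_mean)
```

The parameter is fixed for a given population, while the statistic changes whenever a different sample is drawn.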
There are two types of data:
- Qualitative data consist of attributes, labels, or non-numerical entries. Examples of qualitative data are:
- Gender
- Job Title
- Hair color
- Eye color
- Marital status
- Quantitative data consist of numerical measurements or counts. Examples of quantitative data are:
- Age
- Height
- Weight
- Salary
- Grade point average
Types of Sampling (Videos 1 and 2)
Watch the following two videos, which introduce and explore various types of samples.
Reasonable Samples
The following video explores examples of whether or not situations are reasonable samples.
Representative vs. Random Samples
The following video explores examples of selecting a representative sample of a population.
The table below shows how information can be separated into the two types of data. It lists the model (qualitative) and base price (quantitative) of various vehicles.
Model (qualitative data) | Base Price (quantitative data) |
---|---|
Escort LX | $11,430 |
Ranger 4 x 2 XL | $11,485 |
Contour LX | $14,460 |
Taurus LX | $18,445 |
Windstar | $19,380 |
Explorer XL 4 x 2 | $21,560 |
Crown Victoria | $21,135 |
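As a rough illustration of why the distinction matters, the sketch below stores the vehicle table as pairs and shows that arithmetic only makes sense on the quantitative column. The variable names are illustrative choices, not part of the lesson.

```python
from statistics import mean

# (model, base price in dollars) pairs copied from the vehicle table.
vehicles = [
    ("Escort LX", 11430),
    ("Ranger 4 x 2 XL", 11485),
    ("Contour LX", 14460),
    ("Taurus LX", 18445),
    ("Windstar", 19380),
    ("Explorer XL 4 x 2", 21560),
    ("Crown Victoria", 21135),
]

models = [model for model, _ in vehicles]  # qualitative: labels
prices = [price for _, price in vehicles]  # quantitative: numbers

# Arithmetic is meaningful for quantitative data.
average_price = mean(prices)
price_range = max(prices) - min(prices)

# Qualitative data can be counted or grouped, but not averaged.
lx_models = [m for m in models if m.endswith("LX")]

print("average base price:", round(average_price, 2))
print("price range:", price_range)
print("models with the LX trim label:", lx_models)
```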
Collecting Data
There are several ways to collect data. The focus of the study dictates the best way to collect the data. The following table shows six methods of data collection.
Method | Characteristics | Examples |
---|---|---|
Perform an experiment | A treatment is applied to some of the subjects and their responses are observed. Experiments are often "double blind," meaning that neither the researcher nor the subjects know which subjects are receiving the treatment and which are receiving a placebo (a treatment with no active effect). | Testing the effect of imposing a new marketing strategy in a certain region. |
Complete an observational study | The researcher observes the values of variables for the subjects without doing anything to them. | Conducting a marketing study to see how people rate a new product. |
Conduct a survey | An investigation of one or more characteristics of a population. | Determining whether a school should adopt school uniforms. |
Use a simulation | A mathematical or physical model is used to reproduce the conditions of a process. | Automobile manufacturers use simulations with dummies to study the effects of crashes on humans. |
Take a census | A count or measure of an entire population. | Determining the population of Gwinnett County. |
Use sampling | A count or measure of part of a population. | Determining the population of a city in Gwinnett County to predict the population of Gwinnett County. |
The following chart shows examples of different methods of data collection used in statistical studies.
Statistical Study | Data collection method |
---|---|
The effect of an asteroid colliding with Earth | Simulation, because it is impractical to create this situation |
The effect of aspirin on preventing heart attacks | Experiment, because the effect of a treatment (taking aspirin) is being measured |
The weights of all linemen in the National Football League | Census, because teams keep accurate records of all players |
Americans' approval rating of the U.S. president | Sampling, because it would be nearly impossible to talk to every American |
To determine if a campus dining hall should open a pizza bar | Survey, because the need is to find out if it would be used |
Randomness
An important concept in collecting data is randomness. To get a sample that fairly represents the population, the selection process must be random. Simulations use random number generators to simulate the results.
Look at this example:
From a class containing 12 girls and 10 boys, three students are to be selected to serve on a school advisory panel. Here are four different methods of making the selection.
I. Select the first three names on the class roll.
II. Select the first three students who volunteer.
III. Place the names of the 22 students in a hat, mix them thoroughly, and select three names from the mix.
IV. Select the first three students who show up for class tomorrow.
Which is the best sampling method among these four if you want the school panel to present a fair and representative view of the opinions of your class? Explain the weaknesses of the three you did not select as the best.
- Solution: Choice III is the best in terms of fairness because none of the other methods gives every possible group of three students an equal chance of selection. Explanations as to why the others are unfair may include comments such as the following:
- Names beginning with the same letter may belong to the same family or the same ethnic group.
- Volunteers may have a special interest in a particular issue on which they want to focus.
- Prompt students may be the more serious students and, perhaps, would be the more conscientious members of a panel, but they may not be typical of the students in the class.
In which methods would you have had a chance to be picked?
Failing to select a sample at random often introduces bias into the sample.
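The class panel example can be sketched in Python to compare Method I with Method III. The student names are placeholders, and the seed and number of trials are arbitrary assumptions. With 22 students there are 1,540 possible panels of three, and only the hat method gives each of them the same chance.

```python
import random
from collections import Counter
from math import comb

# Hypothetical class roll: 12 girls and 10 boys (names are placeholders).
class_roll = [f"Girl {i}" for i in range(1, 13)] + [f"Boy {i}" for i in range(1, 11)]

def first_three_on_roll(roll):
    """Method I: the same three students are chosen every time."""
    return roll[:3]

def names_from_hat(roll, rng):
    """Method III: draw three names without replacement."""
    return rng.sample(roll, k=3)

rng = random.Random(2024)  # fixed seed so the run is repeatable
trials = 5000

# Repeat the hat draw many times and count how often each student is picked.
hat_counts = Counter()
for _ in range(trials):
    hat_counts.update(names_from_hat(class_roll, rng))

# Students who have no chance at all under Method I.
fixed_panel = first_three_on_roll(class_roll)
never_picked_by_roll = [s for s in class_roll if s not in fixed_panel]

print("possible panels of three:", comb(len(class_roll), 3))
print("students Method I can never pick:", len(never_picked_by_roll))
print("students Method III picked at least once:", len(hat_counts))
```

Over many repeated draws, every student turns up on a panel under the hat method, while the roll-order method shuts out 19 of the 22 students entirely.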
Bias
A biased sample is one in which the method used to create the sample results in samples that are NOT representative of the population. For instance, consider a research project on attitudes toward water pollution. Collecting the data by publishing a questionnaire in an environmental magazine and asking people to fill it out and send it in would produce a biased sample. People interested enough to spend their time and energy filling out and sending in the questionnaire are likely to have different attitudes toward water pollution than those not taking the time to fill out the questionnaire.
Also, the people reading a magazine about the environment would likely have similar attitudes about water pollution.
Another example of a biased sample is one consisting only of 18- to 22-year-old college students when the study is meant to research the entire 18- to 22-year-old population of the country.
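The effect of a biased sample can be illustrated with a small sketch. The population below is entirely synthetic and invented for this example (it is not real survey data): it assumes magazine readers rate their concern about water pollution higher than non-readers do.

```python
import random
from statistics import mean

# Synthetic population (an assumption for illustration only):
# 100 magazine readers who rate their concern 4 or 5 out of 5,
# and 900 non-readers whose ratings run from 1 to 3.
readers = [4, 5] * 50
non_readers = [1, 2, 3] * 300
population = readers + non_readers

population_mean = mean(population)

# Biased sample: only magazine readers see the questionnaire.
biased_mean = mean(readers)

# Simple random sample of the same size, drawn from everyone.
rng = random.Random(0)  # fixed seed so the run is repeatable
random_sample_mean = mean(rng.sample(population, k=100))

print("population mean:", population_mean)
print("magazine-only sample mean:", biased_mean)
print("simple random sample mean:", random_sample_mean)
```

Under these assumptions the magazine-only sample overstates the population's concern, while the random sample lands much closer to the population mean.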
The following table shows the different types of sampling techniques.
Sampling Technique | Characteristics |
---|---|
Simple random sample | Every member of the population has an equal chance of being selected, and every possible sample of the same size is equally likely. |
Stratified sample | Members of the population are separated into groups (strata) with similar characteristics, and a random sample is taken within each stratum. |
Cluster sample | The population is divided into subgroups (clusters), some clusters are selected at random, and every member of each selected cluster is used in the sample. |
Systematic sample | Each member of the population is assigned a number, and a pattern (such as every 5th person) is used to select the sample. |
Convenience sample | Consists only of the available people. This often leads to biased studies. |
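The five techniques in the table can be sketched as short Python functions. This is a minimal illustration under stated assumptions: the population is a hypothetical list of 20 numbered students, and the same four homerooms are reused as both strata and clusters for brevity, although in practice strata are chosen for shared characteristics.

```python
import random

# Hypothetical population: 20 students numbered 1-20 in 4 homerooms of 5.
population = list(range(1, 21))
homerooms = {
    "A": population[0:5],
    "B": population[5:10],
    "C": population[10:15],
    "D": population[15:20],
}

rng = random.Random(42)  # fixed seed so the run is repeatable

def simple_random_sample(members, k):
    # Every group of k members is equally likely.
    return rng.sample(members, k)

def stratified_sample(groups, k_per_group):
    # Take a random sample within every group (stratum).
    return [m for members in groups.values() for m in rng.sample(members, k_per_group)]

def cluster_sample(groups, n_clusters):
    # Randomly pick whole groups, then keep every member of each one.
    chosen = rng.sample(sorted(groups), n_clusters)
    return [m for name in chosen for m in groups[name]]

def systematic_sample(members, step):
    # Random starting point, then every step-th member after it.
    start = rng.randrange(step)
    return members[start::step]

def convenience_sample(members, k):
    # Whoever is first at hand; no randomness, so prone to bias.
    return members[:k]

print("simple random:", simple_random_sample(population, 5))
print("stratified:", stratified_sample(homerooms, 2))
print("cluster:", cluster_sample(homerooms, 2))
print("systematic:", systematic_sample(population, 5))
print("convenience:", convenience_sample(population, 5))
```

Note that the stratified sample draws a few members from every homeroom, whereas the cluster sample takes all members of only some homerooms.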
Generating Random Samples
The following video explores the concept of generating simple random samples.
Random Sampling and Avoiding Bias
The following video explores the types of random sampling, as well as ways to avoid bias in the sampling process.
Population vs. Sample and Sampling Techniques from a Population
Video 1 explores the difference between population vs. sample, as well as parameter vs. statistic.
Video 2 explores types and techniques of samples: simple random, stratified, cluster, systematic, and convenience.
Sampling Presentation
It's now time for us to explore and practice working examples of types of sampling, random sampling, and bias.
IMAGES CREATED BY GAVS