ICD - Sampling in Data Lesson

Sampling in Data

Let's look at how to gather the data we want to analyze. Data consists of information coming from observations, counts, measurements, or responses.

Statistics is the science of collecting, organizing, and interpreting data in order to make decisions.

There are two types of data sets:

  1. Population: the collection of all outcomes, responses, measurements, or counts that are of interest.
  2. Sample: a subset (part) of a population.

In statistics, we use numerical descriptions to describe characteristics of either a population or a sample.

Quantitative Data | Qualitative Data
Age: 31 | Gender: Female
Height: 5'5" | Job Title: Teacher
Weight: 120 | Hair Color: Brown
Salary: $50,000 | Eye Color: Blue
GPA: 3.2 | Marital Status: Single

Population and Sample Means (Videos 1 and 2)

Watch the following videos, which show the difference between a sample mean and a population mean.

In video 1, mean, median, and mode are reviewed, as well as population mean and sample mean.

In video 2, the topics of population mean and sample mean are explored in more depth.
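For reference, the two means from the videos can be written as formulas. The symbols below follow a common textbook convention and are an assumption here (the videos may use different notation): N is the population size, n is the sample size, and the x_i are the data values.

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i \qquad\qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
```

For example, if the whole population is 2, 4, 6, 8, 10, then μ = 30/5 = 6; for the sample 2, 4, 10 taken from it, x̄ = 16/3 ≈ 5.33. The calculation is the same; what changes is whether the values come from the entire population or only part of it.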

A numerical description is either a parameter or a statistic.

  • A parameter is a numerical description of a population characteristic.
  • A statistic is a numerical description of a sample characteristic.
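The Python sketch below makes the distinction concrete: the mean of the whole population is a parameter, and the mean of a random sample drawn from it is a statistic. The population of test scores is invented for illustration only.

```python
import random
import statistics

# Hypothetical population: the test scores of every student in one small class.
population = [72, 85, 90, 66, 78, 95, 88, 70, 81, 79]

# A parameter describes the whole population.
population_mean = statistics.mean(population)  # mu, here 80.4

# A statistic describes a sample drawn from the population.
random.seed(1)  # fixed seed so the run is repeatable
sample = random.sample(population, 4)
sample_mean = statistics.mean(sample)  # x-bar

print("Parameter (population mean):", population_mean)
print("Sample:", sample)
print("Statistic (sample mean):", sample_mean)
```

Rerunning with a different seed gives a different sample and usually a different sample mean, while the population mean never changes.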

There are two types of data:

  1. Qualitative data consist of attributes, labels, or non-numerical entries. Examples of qualitative data are:
    • Gender
    • Job Title
    • Hair color
    • Eye color
    • Marital status
  2. Quantitative data consist of numerical measurements or counts. Examples of quantitative data are:
    • Age
    • Height
    • Weight
    • Salary
    • Grade point average

Types of Sampling (Videos 1 and 2)

Watch the following two videos, which introduce and explore various types of samples.

Reasonable Samples

The following video explores examples of situations and whether or not they produce reasonable samples.

Representative vs. Random Samples

The following video explores examples of selecting a representative sample of a population.

The table below shows the model and base price of various vehicles, and how this information can be separated into two data sets: the models (qualitative data) and the base prices (quantitative data).

Model (qualitative data) | Base Price (quantitative data)
Escort LX | $11,430
Ranger 4 x 2 XL | $11,485
Contour LX | $14,460
Taurus LX | $18,445
Windstar | $19,380
Explorer XL 4 x 2 | $21,560
Crown Victoria | $21,135
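To show the split in code, the Python sketch below stores the vehicle data as (model, base price) pairs, separates them into a qualitative list and a quantitative list, and averages the prices. Averaging only makes sense for the quantitative data; there is no meaningful "average model name."

```python
# Vehicle data from the table: (model, base price in dollars).
vehicles = [
    ("Escort LX", 11430),
    ("Ranger 4 x 2 XL", 11485),
    ("Contour LX", 14460),
    ("Taurus LX", 18445),
    ("Windstar", 19380),
    ("Explorer XL 4 x 2", 21560),
    ("Crown Victoria", 21135),
]

# Qualitative data set: labels, which can be listed or counted but not averaged.
models = [model for model, _ in vehicles]

# Quantitative data set: numbers, so arithmetic such as an average makes sense.
prices = [price for _, price in vehicles]
average_price = sum(prices) / len(prices)

print(models)
print(prices)
print(f"Average base price: ${average_price:,.2f}")  # Average base price: $16,842.14
```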

Collecting Data

There are several ways to collect data, and the focus of the study dictates the best way to collect it. The following overview describes six methods of data collection, with their characteristics and an example of each.

Perform an experiment. Experiments are often "double blind": neither the researcher nor the subjects know which subjects are receiving the treatment and which are receiving NO treatment or a PLACEBO (a treatment that has no value to the experiment).

  • Treatment is applied to PART of the population.
  • The other PART of the population is used as a control group and is given NO treatment.
  • Responses from both groups are recorded.
  • Results are compared.

Example: Testing the effect of imposing a new marketing strategy in a certain region.

Complete an observational study: the researcher observes the values of variables for the subjects without doing anything to them.

  • Used when it is unethical or impossible to assign people to receive a specific treatment.
  • Certain variables are inherent traits and cannot be randomly assigned.
  • Variables the researcher does not control can affect the results.

Example: Conducting a marketing study to see how people rate a new product.

Conduct a survey: an investigation of one or more characteristics of a population.

  • Carried out on people by asking them questions.
  • The wording of the questions can lead to biased results.
  • Can be done as a census or a sampling.

Example: Determining whether a school should adopt school uniforms.

Use a simulation: the use of a mathematical or physical model to reproduce the conditions of a process.

  • Computers, tables, or calculators are used in the collection of data.
  • Helps the researcher study situations that are impractical or dangerous to create in real life.
  • Saves time and money.

Example: Automobile manufacturers use simulations with crash-test dummies to study the effects of crashes on humans.

Take a census: a count or measure of an entire population.

  • Provides complete information.
  • Costly.
  • Difficult to perform.
  • Takes an enormous amount of time.
  • Gives good estimates of probabilities.

Example: Determining the population of Gwinnett County.

Use sampling: a count or measure of PART of a population.

  • Used to predict population parameters.
  • More practical than a census.

Example: Determining the population of a city in Gwinnett County to predict the population of Gwinnett County.
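In an experiment, the treatment and control groups are commonly formed by random assignment, so that the two groups differ only in whether they receive the treatment. The Python sketch below shows one way to do that; the subject IDs and the function name `assign_groups` are hypothetical, chosen only for illustration.

```python
import random


def assign_groups(subjects, seed=None):
    """Randomly split subjects into a treatment group and a control group.

    Assumption for this sketch: the groups are as equal in size as possible,
    with any odd subject out going to the control group.
    """
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical subject IDs S01-S10 for a small study.
subjects = [f"S{i:02d}" for i in range(1, 11)]
treatment, control = assign_groups(subjects, seed=7)

print("Treatment group:", treatment)
print("Control group:  ", control)
```

After assignment, the responses of both groups are recorded and compared, as the experiment bullets describe.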

The following table shows examples of the data collection methods used in different statistical studies.

Statistical Study | Data Collection Method
The effect of an asteroid colliding with Earth | Simulation, because it is impractical to create this situation
The effect of aspirin on preventing heart attacks | Experiment, because the effect of a treatment (taking aspirin) is being measured
The weights of all linemen in the National Football League | Census, because teams keep accurate records of all players
Americans' approval rating of the U.S. president | Sampling, because it would be nearly impossible to talk to every American
Whether a campus dining hall should open a pizza bar | Survey, because the goal is to find out whether it would be used

Randomness

An important concept in collecting data is randomness. To get a representative sample, the collection process must be random. Simulations use random number generators to produce random outcomes.
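As a small illustration of a simulation driven by a random number generator, the Python sketch below flips a simulated fair coin and reports the fraction of heads. The function name `simulate_coin_flips` and the seed values are illustrative assumptions, not part of the lesson.

```python
import random


def simulate_coin_flips(num_flips, seed=None):
    """Estimate P(heads) by flipping a simulated fair coin num_flips times."""
    rng = random.Random(seed)
    # rng.random() returns a float in [0.0, 1.0); below 0.5 counts as heads.
    heads = sum(1 for _ in range(num_flips) if rng.random() < 0.5)
    return heads / num_flips


for n in (10, 100, 10_000):
    estimate = simulate_coin_flips(n, seed=42)
    print(f"{n} flips -> estimated P(heads) = {estimate}")
```

With more flips, the estimate tends to settle close to 0.5; the seed only makes a run repeatable.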

Look at this example:

From a class containing 12 girls and 10 boys, three students are to be selected to serve on a school advisory panel. Here are four different methods of making the selection.

I. Select the first three names on the class roll.

II. Select the first three students who volunteer.

III. Place the names of the 22 students in a hat, mix them thoroughly, and select three names from the mix.

IV. Select the first three students who show up for class tomorrow.

Which is the best sampling method among these four if you want the school panel to present a fair and representative view of the opinions of your class? Explain the weaknesses of the three methods you did not select.

  • Solution: Choice III is the best method in terms of fairness because none of the other methods gives every possible group of three students an equal chance of being selected. Explanations of why the other methods are unfair may include comments such as the following:
    1. Method I: Names at the top of the class roll may begin with the same letter and may belong to the same family or the same ethnic group.
    2. Method II: Volunteers may have special interest in a particular issue on which they want to focus.
    3. Method IV: Prompt students may be the more serious students and, perhaps, would be the more conscientious members of a panel, but they may not be the typical students in the class.
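Method III can be mimicked in Python with `random.sample`, which draws names without replacement so that every group of three is equally likely. The roster below is a hypothetical stand-in (the lesson does not give names); the sketch also counts how many different three-person panels exist.

```python
import math
import random

# Hypothetical roster for the class in the example: 12 girls and 10 boys.
girls = [f"Girl {i}" for i in range(1, 13)]
boys = [f"Boy {i}" for i in range(1, 11)]
class_roll = girls + boys

# Method III, "names in a hat": random.sample draws 3 distinct names,
# and every group of 3 is equally likely to be chosen.
panel = random.sample(class_roll, 3)
print("Panel:", panel)

# How many different 3-person panels are possible?
num_groups = math.comb(len(class_roll), 3)
print("Possible panels:", num_groups)  # 1540
print("Chance of any one panel:", 1 / num_groups)
```

Because there are 1,540 possible panels, the hat method gives each one a 1 in 1,540 chance, and every student has a 3 in 22 chance of being picked.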

In which methods would you have had a chance to be picked?  

Failing to use random sampling often introduces bias into a sample.

Bias

A biased sample is one in which the method used to create the sample results in samples that are NOT representative of the population. For instance, consider a research project on attitudes toward water pollution. Collecting the data by publishing a questionnaire in an environmental magazine and asking people to fill it out and send it in would produce a biased sample. People interested enough to spend their time and energy filling out and sending in the questionnaire are likely to have different attitudes toward water pollution than those not taking the time to fill out the questionnaire.

Also, the people reading a magazine about the environment would likely have similar attitudes about water pollution.

Another example of a biased sample would be a sample consisting only of 18- to 22-year-old college students when the study is researching the entire 18- to 22-year-old population of the country, since many people in that age group are not in college.
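To see how a biased sample can mislead, the Python sketch below builds a made-up population in which environmental-magazine readers care more about water pollution than everyone else, then compares a sample of readers only with a simple random sample. All the numbers (1,000 people, 15% readers, the score ranges) are assumptions chosen for illustration, not data.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population of 1,000 people. Each person has a concern score
# (1-10) about water pollution and a flag for reading an environmental
# magazine. Assumption: readers score higher (6-10) than non-readers (1-8).
population = []
for i in range(1000):
    reader = i < 150  # assume 15% of the population are magazine readers
    score = rng.randint(6, 10) if reader else rng.randint(1, 8)
    population.append((reader, score))

true_mean = statistics.mean(score for _, score in population)

# Biased: only magazine readers who return the questionnaire.
readers = [score for reader, score in population if reader]
biased_sample = rng.sample(readers, 50)

# Random: every person has the same chance of being chosen.
random_sample = [score for _, score in rng.sample(population, 50)]

print("Population mean:", round(true_mean, 2))
print("Biased sample mean:", round(statistics.mean(biased_sample), 2))
print("Random sample mean:", round(statistics.mean(random_sample), 2))
```

In runs like this, the readers-only mean lands well above the population mean, while the random-sample mean usually falls close to it.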

The following table shows the different types of sampling techniques.

Sampling Technique | Characteristics
Simple random sample | Every member of the population has an equal chance of being selected, and every possible sample of the same size is equally likely.
Stratified sample | Members of the population are separated into groups (strata) with similar characteristics, and a random sample is taken within each stratum.
Cluster sample | The population is divided into subgroups (clusters), some of the subgroups are selected, and every member of each selected subgroup is used in the sample.
Systematic sample | Each member of the population is assigned a number, and a pattern (such as every 5th person) is used to select the sample.
Convenience sample | Consists only of the people who are readily available. This often leads to biased studies.
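The five techniques in the table can be sketched in Python as below. This is a minimal illustration under assumed conditions: a hypothetical school of 100 students numbered 1 to 100 in four homerooms of 25. The same four homerooms serve both as strata (stratified takes a few students from every room) and as clusters (cluster takes every student from a randomly chosen room). The function names are made up for this example, and the random starting point in the systematic sample is one common variant.

```python
import random


def simple_random_sample(population, n, rng):
    """Every member, and every group of size n, is equally likely."""
    return rng.sample(population, n)


def stratified_sample(strata, n_per_stratum, rng):
    """Take a simple random sample of the same size inside each stratum."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample


def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters and keep every member of each one."""
    chosen = rng.sample(list(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]


def systematic_sample(population, k, rng):
    """Pick a random start among the first k members, then every k-th member."""
    start = rng.randrange(k)
    return population[start::k]


def convenience_sample(population, n):
    """Take whoever is easiest to reach (here, the first n). Often biased."""
    return population[:n]


# Hypothetical school: 4 homerooms of 25 students, numbered 1-100.
homerooms = {f"Room {r}": list(range(25 * r - 24, 25 * r + 1)) for r in range(1, 5)}
students = [s for room in homerooms.values() for s in room]
rng = random.Random(2024)

print("Simple random:", simple_random_sample(students, 8, rng))
print("Stratified:   ", stratified_sample(homerooms, 2, rng))
print("Cluster:      ", cluster_sample(homerooms, 1, rng))
print("Systematic:   ", systematic_sample(students, 5, rng))
print("Convenience:  ", convenience_sample(students, 8))
```

The convenience sample always returns the same first students, which is exactly why it tends to be unrepresentative.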

Generating Random Samples

The following video explores the concept of generating simple random samples.

Random Sampling and Avoiding Bias

The following video explores the types of random sampling, as well as ways to avoid bias in the sampling process.

Population vs. Sample and Sampling Techniques from a Population

Video 1 explores the difference between population vs. sample, as well as parameter vs. statistic.

Video 2 explores types and techniques of samples: simple random, stratified, cluster, systematic, and convenience.

Sampling Presentation

Now it's time to explore and practice worked examples of types of sampling, random sampling, and bias.


IMAGES CREATED BY GAVS