SS - Using Statistical Studies to Make Decisions Module Overview

Statistical Reasoning Overview

Quick poll! How many people like surveys? Believe it or not, surveys can help businesses improve their products and give customers more of what they need. Surveys are not all about questions and answers, though. The data must be collected carefully and analyzed so that the results support sound conclusions, and they are often displayed graphically to make the information easier to interpret. In this unit, you will demonstrate that the field of statistics provides us with the tools to understand where data come from and how numbers are used to make decisions.

Essential Questions

    1. What components make up a statistical investigation?
    2. How should data and subjects be treated during statistical investigations?
    3. What sampling techniques are used in poll results?  
    4. How is data expressed graphically?  
    5. What does a histogram represent?
    6. What characteristics do we look for in a graphical display of data?
    7. What values determine the center, shape, and spread of data?
    8. What makes for a good survey question?  
    9. What makes for a good survey conclusion?
    10. What concerns should be taken into account when collecting data for statistical investigation?

Key Terms

The following key terms will help you understand the content in this module.

  • 5-Number Summary - A summary for a variable consisting of the minimum, the lower quartile, the median, the upper quartile, and the maximum number.
  • Bias - Any systematic failure of a sampling method to represent its population.
  • Biased Sampling Method - A sampling method that tends to over- or underrepresent parts of a population.
  • Biased Statistic - A statistic whose expected value does not equal the parameter or quantity being estimated.
  • Bivariate - Two pieces of information, or variables, recorded for each participant.
  • Blinding - Any individual associated with an experiment who is not aware of how subjects have been allocated to treatment groups is said to be blind.
  • Block Design - When groups of experimental units are similar, it is often a good idea to gather them together into blocks.   By blocking we isolate the variability attributable to the differences between the blocks so that we can see the differences caused by the treatments more clearly.
  • Boxplot - Displays the 5-number summary as a central box with whiskers that extend to the non-outlying data values.   Boxplots are particularly effective for comparing groups of different sizes.
  • Categorical Variable - A variable that names categories (whether with words or numerals).
  • Census - A sample that consists of the entire population.
  • Cluster Sample - A sampling design in which entire groups, or clusters, are chosen at random.   Cluster sampling is usually selected as a matter of convenience, practicality, or cost.   Each cluster should be heterogeneous (and representative of the population), so all the clusters should be similar to each other.
  • Control Group - The group of participants, assigned at random, that receives no treatment or a fake treatment. Comparing the treatment group against it helps control for the placebo effect.
  • Convenience Sample - Consists of the individuals who are conveniently available.   Convenience samples often fail to be representative because every individual in the population is not equally convenient to sample.
  • Dotplot - A graphical display with a dot for each case plotted against a single axis.
  • Double-Blind - A study in which neither the researchers nor the participants know which participants are in the control group. Participants sometimes inaccurately report on their improvement. Likewise, researchers sometimes subconsciously try harder to find improvement in treatment-group participants versus control-group participants.
  • Experiment Study - Research in which the researcher separates the participants into at least two groups, applies some sort of treatment and then compares results.
  • Experimental Unit - The person or object to which the treatment is applied.
  • Explanatory Variable - A variable that serves to explain changes in the response. It may also be called the predictor or independent variable.
  • Frequency - The count or number of times an event occurs.
  • Histogram - Uses adjacent bars to show the distribution of values in a quantitative variable.   Each bar represents the frequency of values falling in an interval of values.
  • Hypothesis - A formal research question.
  • Lurking Variable - A variable other than x and y that simultaneously affects both variables, accounting for the correlation between the two.
  • Margin of Error - A statistic expressing the amount, often in percentage points, by which sample results are expected to differ from the real population value.
  • Mean - Found by summing all the data values and dividing by the count.
  • Median - The middle value of an ordered data set, with half of the data above and half below it.
  • Natural Variability - Variation that occurs when natural factors, rather than human intervention or error, cause changes in the data.
  • Nonrepresentative Sampling - When the sample does not represent the differences in a population.
  • Normal Curve - The bell-shaped curve that describes the distribution of a data set that is unimodal and symmetric.
  • Null Hypothesis - Specifies a population model parameter of interest and proposes a value for that parameter.
  • Observational Study - Research in which data is collected about some characteristic(s) of the population.   The data can be collected by observation or by a survey or interview.
  • Outlier - Extreme values that don't appear to belong with the rest of the data.   They may be unusual values that deserve further investigation, or just mistakes; there's no obvious way to tell.
  • Parameter - A numerical value that describes a characteristic of the population.
  • Placebo - A treatment known to have no effect, administered so that all groups experience the same conditions.
  • Placebo Effect - The tendency of many human subjects (often 20% or more of the experiment subjects) to show a response even when administered a placebo.
  • Quartile - The lower quartile, the median, and the upper quartile divide ordered data into four equal parts.
  • Range - The difference between the lowest and highest values in a data set (maximum - minimum).
  • Response Variable - The variable about which the researcher is posing the question. It may also be called the outcome or the dependent variable.
  • Sample - A (representative) subset of the population, examined in hope of learning about the population.
  • Simple Random Sample (SRS) - A sample drawing process in which each set of n elements has an equal chance of selection.
  • Simulation - A simulation models random events by using random numbers to specify event outcomes with relative frequencies that correspond to the true real-world relative frequencies we are trying to model.
  • Spread - A summary of a distribution using the standard deviation, interquartile range, and range.
  • Statistic - A value calculated from sample data.
  • Stratified Random Sample - A sampling design in which the population is divided into several subpopulations, or strata, and random samples are then drawn from each stratum.
  • Systematic Random Sample - A sample drawn by selecting individuals systematically from a sampling frame.
  • Treatment - The process, intervention, or controlled circumstance applied to randomly assigned experimental units.
  • Under-coverage - A sampling scheme that biases the sample in a way that gives a part of the population less representation than it has in the population.
  • Univariate - One piece of information or variable recorded for each participant.
  • Variable - Any characteristic, number, or quantity that can be measured, counted, or observed and recorded.
  • Voluntary Response Sample - A sample in which individuals choose whether to participate, which introduces bias. Samples based on voluntary responses are generally not representative of the population, and a larger sample size does not correct the problem.
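
To see how a few of these key terms turn into calculations, here is a minimal Python sketch. It is an illustration, not part of the module's required materials. The test scores and class rosters are made-up data, and the function names (five_number_summary, simple_random_sample, stratified_sample) are hypothetical. The quartiles are found with the "median of each half" method; other textbooks and software use slightly different rules, so quartile values can differ a little from tool to tool.

```python
import random
import statistics


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum).

    Assumption: quartiles are the medians of the lower and upper halves,
    leaving out the middle value when the count is odd.
    Needs at least two data values.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (
        ordered[0],
        statistics.median(lower),
        statistics.median(ordered),
        statistics.median(upper),
        ordered[-1],
    )


def simple_random_sample(population, n, seed=None):
    """Draw n individuals so that every set of n is equally likely."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(population, n)


def stratified_sample(strata, per_stratum, seed=None):
    """Draw a random sample of the same size from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}


# Made-up test scores, used only for illustration.
scores = [62, 70, 75, 78, 81, 84, 88, 90, 95]

minimum, q1, median, q3, maximum = five_number_summary(scores)
print("5-number summary:", (minimum, q1, median, q3, maximum))
# prints: 5-number summary: (62, 72.5, 81, 89.0, 95)
print("Mean:", statistics.mean(scores))
print("Range:", maximum - minimum)
print("Interquartile range:", q3 - q1)

# Made-up rosters: student ID numbers grouped by grade level (the strata).
rosters = {"grade 9": list(range(1, 31)), "grade 10": list(range(31, 61))}
print("SRS of 5 students:", simple_random_sample(list(range(1, 61)), 5, seed=1))
print("Stratified sample:", stratified_sample(rosters, 3, seed=1))
```

Notice the difference between the two sampling functions: the simple random sample could, by chance, pick all five students from one grade, while the stratified sample always includes students from every grade.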

[CC BY-NC-SA 4.0] UNLESS OTHERWISE NOTED | IMAGES: LICENSED AND USED ACCORDING TO TERMS OF SUBSCRIPTION - INTENDED ONLY FOR USE WITHIN LESSON.