ICD - Inferences and Conclusions from Data Module Overview

Inferences and Conclusions from Data

Introduction

Why do we study statistics? What is statistics? What types of jobs are available in the field of statistics?
We study statistics because it is used in fields ranging from medical studies to research experiments. From continuously orbiting satellites to social network sites like Facebook, Instagram, or LinkedIn, data are being collected all over the world, all the time. Knowledge of statistics gives us the tools and concepts in quantitative reasoning needed to find specific information in a huge pool of data.
Statistics is the science of "learning from data." It is specifically concerned with the collection, analysis, and interpretation of data, as well as the communication and presentation of results that rely on data. Statistics is a specific type of quantitative reasoning necessary for making important advances in various fields of science, for example medicine and genetics research. It is also used to make decisions in business and public policy.
Statisticians have opportunities in many fields, such as government, business, industry, universities, and research laboratories. Statisticians are involved in developing new "life-saving" medicines, helping shape public policy in government, planning market strategies in business, and managing investment portfolios in finance. Statisticians can enjoy job opportunities that are both wonderfully fulfilling and financially lucrative (quite often earning six-figure incomes).

Essential Questions

  • How do I choose summary statistics that are appropriate to the data distribution?
  • How can I find a standard deviation?
  • How do I decide if the normal distribution describes a set of data?
  • When do I use the normal distribution to estimate probabilities?
  • How can I find the sampling distribution of a sample proportion?
  • How can I find the sampling distribution of a sample mean?
  • How do I use theoretical and empirical results to determine if a treatment was effective?
  • How does the way I collected data affect the conclusions that can be drawn?
  • How do I use statistics to explain the variability and randomness in a set of data?
  • How do I interpret the margin of error of a confidence interval?
  • How do I use a margin of error to find a confidence interval?

Inferences and Conclusions from Data Key Terms

The following key terms will help you understand the content in this module.

Bias - A systematic error in how data are collected or measured that causes results that are not representative of the population.

Center - Measures of center are the summary measures used to describe the most "typical" value in a set of data. The two most common measures of center are the median and the mean.

Census - A census occurs when everyone in the population is contacted.

Central Limit Theorem - When the sample size is large enough, the CLT allows us to use normal calculations to determine probabilities about sample proportions and sample means, even when the samples come from populations that are not normally distributed.

Confidence Interval - An interval for a parameter, calculated from the data, usually in the form estimate ± margin of error. The confidence level gives the probability that the interval will capture the true parameter value in repeated samples.

Continuous Random Variables - Can take any value within an interval, so they have an infinite number of possible values.

Data - Information about a product or process, usually in numerical form.

Descriptive Statistics - This involves the organization, summarization, and display of data.

Discrete Random Variables - Have a finite number of distinct values or counts.

Empirical Rule - If a distribution is normal, then approximately:

  • 68% of the data will be located within one standard deviation of the mean.
  • 95% of the data will be located within two standard deviations of the mean.
  • 99.7% of the data will be located within three standard deviations of the mean.
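
The three percentages in the Empirical Rule can be checked against the standard normal curve. The following is a minimal sketch using Python's standard library; the helper name `share_within` is a hypothetical name chosen for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    # Prints roughly 68.27%, 95.45%, and 99.73%.
    print(f"within {k} standard deviation(s): {share_within(k):.2%}")
```

The rule's 68%, 95%, and 99.7% are rounded versions of these values.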

Frequency Distribution - Instead of listing every data point, a frequency distribution lists each value with its associated frequency.

Inferential Statistics - This involves using a sample to draw conclusions about a population.

Interquartile Range - The difference between the first and third quartile values. This IQR measures the spread of the middle half of the data.
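
As a rough illustration of the interquartile range, the sketch below finds quartiles by taking the median of the lower and upper halves of the ordered data. This is one common classroom convention (other software may use slightly different quartile rules), and the function name `quartiles` and the data set are chosen only for illustration.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves convention."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # values below the middle
    upper = ordered[-half:]   # values above the middle
    return median(lower), median(ordered), median(upper)

data = [2, 3, 6, 7, 10, 12, 14, 15, 22, 120]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1                       # spread of the middle half of the data
data_range = max(data) - min(data)  # greatest value minus least value

print("Q1 =", q1, "median =", q2, "Q3 =", q3)
print("IQR =", iqr, "range =", data_range)
```

In this data set the single large value 120 makes the range much larger than the IQR, which is why the IQR is often preferred when outliers are present.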

Margin of Error - The value in the confidence interval that says how accurate we believe our estimate of the parameter to be. The margin of error is the product of the critical value (a z-score) and the standard deviation of the statistic (or its standard error). The margin of error can be decreased by increasing the sample size or decreasing the confidence level.
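
To make the "estimate ± margin of error" form concrete, here is a small sketch for a sample proportion. The sample values (60% successes in a sample of 400) and the function name `margin_of_error` are assumptions made up for illustration, and the standard error formula used is the usual one for a sample proportion.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(p_hat, n, confidence=0.95):
    """Margin of error for a sample proportion: z-score times standard error."""
    # z-score that leaves (1 - confidence) / 2 in each tail of the normal curve.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    return z * standard_error

p_hat, n = 0.60, 400  # hypothetical sample results
moe = margin_of_error(p_hat, n)
interval = (p_hat - moe, p_hat + moe)  # estimate ± margin of error

print(f"margin of error: {moe:.3f}")
print(f"95% confidence interval: ({interval[0]:.3f}, {interval[1]:.3f})")

# A larger sample or a lower confidence level gives a smaller margin of error.
print(f"n = 1600:       {margin_of_error(p_hat, 1600):.3f}")
print(f"90% confidence: {margin_of_error(p_hat, n, 0.90):.3f}")
```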

Mean Absolute Deviation - A measure of variation in a set of numerical data, computed by adding the distances between each data value and the mean, then dividing by the number of data values. Example: For the data set {2, 3, 6, 7, 10, 12, 14, 15, 22, 120}, the mean is 21.1 and the mean absolute deviation is 19.96, or approximately 20.
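
The example data set in the definition above can be worked through step by step. This is a short sketch using plain Python; the variable names are chosen for illustration.

```python
data = [2, 3, 6, 7, 10, 12, 14, 15, 22, 120]

# Step 1: find the mean of the data.
mean = sum(data) / len(data)

# Step 2: find the distance of each value from the mean.
distances = [abs(x - mean) for x in data]

# Step 3: average those distances.
mad = sum(distances) / len(data)

print(f"mean = {mean:.1f}")
print(f"mean absolute deviation = {mad:.2f}")
```

The exact result is 19.96, which rounds to 20.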

Normal Distribution - A frequency distribution that forms a symmetric, bell-shaped curve, with the data spread evenly around a specific center.

Outliers - Data that are far away from most of the data points.

Parameters - These are numerical values that describe the population. The population mean is symbolically represented by the parameter $\mu_x$. The population standard deviation is symbolically represented by the parameter $\sigma_x$.

Population - The entire set of items from which data can be selected.

Qualitative Data - Consist of attributes, labels, or non-numerical entries.

Quantitative Data - Consist of numerical measurements or counts.

Quartiles - Divide an ordered data set into four equal parts.

Random - Events are random when individual outcomes are uncertain. However, there is a regular distribution of outcomes in a large number of repetitions.

Range - The difference between the greatest data element and the least data element.

Sample - A subset, or portion, of the population.

Sample Mean - A statistic measuring the average of the observations in the sample. It is written as $\overline{x}$. The mean of the population, a parameter, is written as $\mu$.

Sample Proportion - A statistic indicating the proportion of successes in a particular sample. It is written as $\widehat{p}$. The population proportion, a parameter, is written as $p$.

Sampling Distribution - The distribution of values taken by a statistic in all possible samples of the same size from the same population.
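
Listing every possible sample is rarely practical, so a sampling distribution is often approximated by simulation. The sketch below draws many random samples and records the sample proportion from each one. The population proportion, sample size, and number of repetitions are hypothetical values chosen only for illustration.

```python
import random
from statistics import mean, pstdev

random.seed(1)  # fixed seed so the simulation is repeatable

P = 0.3      # hypothetical population proportion of successes
N = 50       # size of each sample
REPS = 5000  # number of repeated samples

def sample_proportion(p, n):
    """Draw one random sample of size n and return its proportion of successes."""
    successes = sum(random.random() < p for _ in range(n))
    return successes / n

# Each entry is the statistic from one sample; together they approximate
# the sampling distribution of the sample proportion.
p_hats = [sample_proportion(P, N) for _ in range(REPS)]

# Standard deviation of the sample proportion predicted by theory.
theoretical_sd = (P * (1 - P) / N) ** 0.5

print(f"mean of sample proportions: {mean(p_hats):.3f} (population value {P})")
print(f"standard deviation: {pstdev(p_hats):.3f} (theory {theoretical_sd:.3f})")
```

The sample proportions differ from sample to sample (sampling variability), but they center near the population proportion.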

Sampling Variability - The fact that the value of a statistic varies in repeated random sampling.

Shape - The shape of a distribution is described by symmetry, number of peaks, direction of skew, or uniformity.

Spread - The spread of a distribution refers to the variability of the data. If the data cluster around a single central value, the spread is smaller. The further the observations fall from the center, the greater the spread or variability of the set. The range, interquartile range, mean absolute deviation, and standard deviation all measure the spread of data.

Standard Deviation - The square root of the variance. $\sigma=\sqrt{\frac{1}{n}\sum\left(x_i-\overline{x}\right)^2}$

Statistic - A numerical description of a sample characteristic.

Statistics - The science of collecting, organizing, and interpreting data in order to make decisions.

Survey - An investigation of one or more characteristics of a population, either through census or sampling.

Variance - The average of the squares of the deviations of the observations from their mean. $\sigma^2=\frac{1}{n}\sum\left(x_i-\overline{x}\right)^2$
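
The variance and standard deviation formulas can be followed step by step in code. This sketch uses a small made-up data set and divides by n, as in the formulas in this module; the results are compared with `pvariance` and `pstdev` from Python's standard library, which use the same divide-by-n definition.

```python
from math import sqrt
from statistics import pstdev, pvariance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set
n = len(data)
mean = sum(data) / n

# Variance: average of the squared deviations from the mean.
variance = sum((x - mean) ** 2 for x in data) / n

# Standard deviation: square root of the variance.
standard_deviation = sqrt(variance)

print("mean =", mean)                              # 5.0
print("variance =", variance)                      # 4.0
print("standard deviation =", standard_deviation)  # 2.0

# The library functions should agree with the hand calculation.
print("library check:", pvariance(data), pstdev(data))
```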

IMAGES CREATED BY GAVS