ID - Investigating Data (Overview)
Investigating Data
Introduction
Mrs. Wood is doing an experiment with her Algebra I class. She is going to try teaching one of her classes outside to see if the fresh air helps to improve their test scores. She teaches one class inside and one class outside and then compares their scores from that unit's test. What important statistics should she compare? If one class has a higher average than the other, does that mean every student in that class did better? In this unit, we will learn what statistics like mean, median, range, IQR and standard deviation imply about a data set. We'll also learn how to plot and analyze data to help compare different data sets!
Essential Questions
- How do I summarize, represent, and interpret data?
- How can I use visual representations and measures of center and spread to compare two data sets?
- Why is technology valuable when making statistical models?
- How do you determine the regression line or line of best fit for a scatter plot of data?
- Why are linear models used to study many important real-world phenomena?
- How do I determine if linear or exponential regression is more appropriate for a scatter plot?
- How can I apply what I have learned about statistics to summarize and analyze real data?
Key Terms
Association - A connection between data values.
Bivariate data - Pairs of linked numerical observations. An example would be to list the heights and weights of each player on a football team.
Categorical Variables - Variables that take on values that are names or labels, such as the color of a ball (red, green, blue), gender (male or female), or year in school (freshman, sophomore, junior, senior). These data cannot be averaged or represented by a scatter plot because they have no numerical meaning.
Center - Measures of center are the summary measures used to describe the most "typical" value in a set of data. The two most common measures of center are the median and the mean.
Correlation coefficient - A measure of the strength of the linear relationship between two variables, defined as the (sample) covariance of the variables divided by the product of their (sample) standard deviations. Its value always falls between -1 and 1.
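The definition above can be turned into a short Python sketch. The function name and the sample data (hours studied and test scores) are made up for illustration; in class you would normally let a calculator or spreadsheet do this work.

```python
from math import sqrt

def correlation(xs, ys):
    """Sample covariance divided by the product of the sample standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (sd_x * sd_y)

# Hypothetical data: every extra hour adds exactly 2 points, so the
# points lie on a straight line and r should be (very nearly) 1.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 6, 8, 10]
r = correlation(hours, scores)
print(round(r, 4))
```

A value of r near 1 or -1 signals a strong linear relationship, and a value near 0 signals a weak one.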
First Quartile (Q1) - For an ordered data set with median M, the first quartile is the median of the data values less than M. Example: For the data set {1, 3, 6, 7, 10, 12, 14, 15, 22, 120}, the first quartile is 6.
Five-Number Summary - Minimum, lower quartile, median, upper quartile, maximum.
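As a rough illustration, the five-number summary can be computed in Python using the "median of each half" rule from the quartile definitions in this list. The function names are illustrative, and calculators or spreadsheets may use slightly different quartile rules, so their answers can differ a little.

```python
def median(values):
    """Middle value of the ordered data (average of the two middle values if even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(values)
    m = median(ordered)
    lower_half = [v for v in ordered if v < m]   # values less than the median
    upper_half = [v for v in ordered if v > m]   # values greater than the median
    return (ordered[0], median(lower_half), m, median(upper_half), ordered[-1])

data = [1, 3, 6, 7, 10, 12, 14, 15, 22, 120]
summary = five_number_summary(data)
print(summary)  # (1, 6, 11.0, 15, 120)
```

This sketch assumes the data set has values on both sides of the median; it is not meant to handle every edge case.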
Histogram - Graphical display that subdivides the data into class intervals and uses a rectangle to show the frequency of observations in each interval. For example, you might use intervals of 0-3, 4-7, 8-11, and 12-15.
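To see how the class intervals work, here is a small Python sketch that counts how many whole-number data values fall in each interval 0-3, 4-7, 8-11, and 12-15. The function name and the data values are hypothetical; the counts are the heights of the histogram's rectangles.

```python
def histogram_counts(values, start, width, bins):
    """Count whole-number values in equal-width intervals beginning at `start`."""
    counts = [0] * bins
    for v in values:
        index = (v - start) // width   # which interval this value belongs to
        if 0 <= index < bins:          # ignore values outside all intervals
            counts[index] += 1
    return counts

# Hypothetical data values
values = [1, 3, 6, 7, 10, 12, 14, 15]
counts = histogram_counts(values, start=0, width=4, bins=4)

labels = ["0-3", "4-7", "8-11", "12-15"]
for label, count in zip(labels, counts):
    print(f"{label:>5}: {'#' * count}")   # a simple text version of the bars
```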
Interquartile Range (IQR) - A measure of variation in a set of numerical data. The interquartile range is the distance between the first and third quartiles of the data set. Example: For the data set {1, 3, 6, 7, 10, 12, 14, 15, 22, 120}, the interquartile range is 15 - 6 = 9.
Line of Best Fit (trend or regression line) - A straight line that best represents the data on a scatter plot. This line may pass through some of the points, none of the points, or all of the points. An exponential model will produce a curved fit.
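One common way technology finds a line of best fit is the least-squares method, which chooses the line that minimizes the squared vertical distances from the points. The sketch below assumes that method; the function name and the data points are made up for illustration.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x   # the line passes through (mean_x, mean_y)
    return slope, intercept

# Hypothetical scatter plot data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = least_squares_line(xs, ys)
print(f"y = {slope:.1f}x + {intercept:.1f}")
```

Notice that this line passes through only one of the five points, which matches the definition: a line of best fit does not have to hit every point.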
Mean Absolute Deviation (MAD) - A measure of variation in a set of numerical data, computed by adding the distances between each data value and the mean, then dividing by the number of data values. Example: For the data set {1, 3, 6, 7, 10, 12, 14, 15, 22, 120}, the mean absolute deviation is 20.
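The steps in the MAD definition can be sketched in Python as follows, using the data set {1, 3, 6, 7, 10, 12, 14, 15, 22, 120} from the interquartile range example. The function name is illustrative.

```python
def mean_absolute_deviation(values):
    """Average distance between each data value and the mean."""
    mean = sum(values) / len(values)
    distances = [abs(v - mean) for v in values]   # distance is never negative
    return sum(distances) / len(values)

data = [1, 3, 6, 7, 10, 12, 14, 15, 22, 120]
mad = mean_absolute_deviation(data)
print(mad)  # 20.0 (the mean is 21 and the distances add up to 200)
```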
Mean (X bar) - The mean (X bar) is a common measure of center. To find the mean, add the data values and divide the sum by the number of values in the data set.
Median - The median (M) is a common measure of center. The median is the middle number in a set of data values that have been placed in order from highest to lowest or lowest to highest. To find the median of a set with an odd number of values, place the set in order and choose the middle number. To find the median of a set with an even number of values, place the set in order and average the two middle numbers.
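A quick Python check, using the standard `statistics` module, shows how differently the mean and the median can behave on the same data. The data set is the one from the interquartile range example; the variable names are illustrative.

```python
import statistics

data = [1, 3, 6, 7, 10, 12, 14, 15, 22, 120]

mean_value = statistics.mean(data)      # sum of the values divided by how many there are
median_value = statistics.median(data)  # average of the two middle values, 10 and 12

print(mean_value, median_value)

# Remove the extreme value 120 and compare again.
trimmed = data[:-1]
print(statistics.mean(trimmed), statistics.median(trimmed))
```

The single extreme value 120 pulls the mean from 10 up to 21, while the median moves only from 10 to 11.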
Outlier - Sometimes, distributions are characterized by extreme values that differ greatly from the other observations. These extreme values are called outliers. As a rule, an extreme value is considered to be an outlier if it is at least 1.5 interquartile ranges below the lower quartile (Q1), or at least 1.5 interquartile ranges above the upper quartile (Q3).
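The 1.5 × IQR rule above can be sketched in Python. This example takes Q1 = 6 and Q3 = 15 from the quartile examples in this list instead of recomputing them; the function name is illustrative.

```python
def find_outliers(values, q1, q3):
    """Return the values at least 1.5 IQRs below Q1 or above Q3."""
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v <= low_fence or v >= high_fence]

data = [1, 3, 6, 7, 10, 12, 14, 15, 22, 120]
outliers = find_outliers(data, q1=6, q3=15)
print(outliers)  # [120]
```

Here the IQR is 9, so the cutoffs are 6 - 13.5 = -7.5 and 15 + 13.5 = 28.5, and only 120 falls outside them.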
Quantitative Variables - Numerical variables that represent a measurable quantity. For example, when we speak of the population of a city, we are talking about the number of people in the city - a measurable attribute of the city. Therefore, population would be a quantitative variable. Other examples could be scores on a set of tests, height and weight, temperature at the top of each hour, etc.
Range - The range is the difference between the greatest data value and the least data value.
Relative Frequency Table - Displays frequency counts as a proportion of the total.
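As an illustration, a relative frequency table can be built in Python by dividing each count by the total. The function name and the ball colors below are hypothetical.

```python
from collections import Counter

def relative_frequencies(observations):
    """Map each category to its count divided by the total number of observations."""
    counts = Counter(observations)
    total = len(observations)
    return {category: count / total for category, count in counts.items()}

# Hypothetical categorical data: colors of 8 balls drawn from a bag
colors = ["red", "green", "red", "blue", "red", "green", "blue", "red"]
table = relative_frequencies(colors)

for category, proportion in table.items():
    print(f"{category:>6}: {proportion:.2f}")
```

The proportions in a relative frequency table always add up to 1.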
Second Quartile (Q2) - The median value in the data set.
Shape - The shape of a distribution is described by symmetry, number of peaks, direction of skew, or uniformity.
Symmetry - A symmetric distribution can be divided at the center so that each half is a mirror image of the other.
Scatter plot - A graph in the coordinate plane that represents a set of bivariate data, with each pair of values plotted as a single point. For example, the heights and weights of a group of people could be displayed on a scatter plot.
Spread - The spread of a distribution refers to the variability of the data. If the data cluster around a single central value, the spread is smaller. The further the observations fall from the center, the greater the spread or variability of the set. Range, interquartile range, mean absolute deviation, and standard deviation all measure the spread of data.
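The sketch below shows why spread matters when comparing two data sets. The two lists of test scores are invented for illustration; they have the same mean but very different spreads. It uses Python's standard `statistics` module.

```python
import statistics

# Hypothetical test scores for two classes
inside = [70, 75, 80, 85, 90]
outside = [60, 70, 80, 90, 100]

for name, scores in [("inside", inside), ("outside", outside)]:
    mean = statistics.mean(scores)
    data_range = max(scores) - min(scores)
    sd = statistics.stdev(scores)   # sample standard deviation
    print(f"{name:>7}: mean={mean}, range={data_range}, sd={sd:.2f}")
```

Both classes average 80, but the second set of scores is spread out twice as far, so the mean alone does not tell the whole story.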
Third Quartile (Q3) - For an ordered data set with median M, the third quartile is the median of the data values greater than M. Example: For the data set {1, 3, 6, 7, 10, 12, 14, 15, 22, 120}, the third quartile is 15.
Trend - A change (positive, negative, or constant) in data values over time.
IMAGES CREATED BY GAVS