AAD - Module Overview
All about Distributions Overview
Introduction
This unit focuses on describing and analyzing quantitative data. The FIRST STEP in data analysis is to LOOK at the data in a graph chosen according to the TYPE of variable present. The SECOND STEP is to describe the graph verbally, using the correct terminology to explain its main features or characteristics. Next, we analyze the data numerically, using formal measures such as the mean or median and the standard deviation or IQR. The POWER of statistics is beginning to emerge!
Essential Questions
- How do we display quantitative data (numbers) in a meaningful way?
- What is the difference between quantitative data and qualitative (categorical) data?
- What are the best graph choices for each type of variable?
- What terms are used to describe a distribution verbally?
- What statistics are used to describe the distributions numerically?
- What are the procedures for comparing two or more data distributions?
Key Terms
The following key terms will help you understand the content in this module.
variable- holds information about the same characteristic for many cases.
categorical variable- a variable that names categories, whether with words or with numerals used as labels (as in ordinal categories).
quantitative variable- a variable in which the numbers act as numerical values and include units. You can perform operations on them.
distribution- gives the values the variable can assume and tells how frequently it takes on each of those values.
range- a measure of spread found by subtracting the minimum value from the maximum value.
spread- the variability of the values in a distribution, summarized by the standard deviation, interquartile range, or range.
outlier- an extreme value that does not appear to belong with the rest of the data; it may be a mistake or an unusual value requiring further investigation.
center- the general concept of the middle of the values in a distribution; common measures of center include the mean, median, mid-quartile, and mode.
shape- the overall form of a distribution, described by the number of modes (unimodal versus multimodal), symmetry versus skewness, and any outliers, clusters, or gaps.
skewed left- distribution that is not symmetric and has the thinner end (tail) on the left or at the low end of the scale.
skewed right- distribution that is not symmetric and has the thinner end (tail) on the right or at the high end of the scale.
symmetric- distribution can be folded along a vertical line through the middle and have halves that match closely.
uniform- distribution that is roughly flat (roughly the same frequency for every value or interval).
dot plot- data display placing a dot for each case along a horizontal axis.
histogram- uses adjacent bars to show the distribution of values in a quantitative variable, where the height of each bar represents the frequency (or relative frequency) of values falling in an interval.
stem plot- also called a stem-and-leaf plot; displays the distribution of a single variable by splitting each data value into its leftmost digits (the stem) and its rightmost digit (the leaf). Unlike a histogram, it preserves the actual data values. It is generally not useful for very large data sets (above a few hundred values).
back-to-back stem plot- compares two data sets using the same stem values.
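As an illustration of how a stem plot splits each value, here is a minimal Python sketch. It assumes non-negative whole numbers, with the last digit as the leaf and all remaining digits as the stem. The function name `stem_plot` and the scores are hypothetical, invented for this example.

```python
from collections import defaultdict

def stem_plot(values):
    """Return stem-plot lines for non-negative whole numbers.

    Assumption: the stem is every digit except the last (value // 10)
    and the leaf is the last digit (value % 10).
    """
    leaves = defaultdict(list)
    for value in sorted(values):
        leaves[value // 10].append(value % 10)
    lines = []
    # Include stems with no leaves so gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        leaf_text = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem} | {leaf_text}".rstrip())
    return lines

# Hypothetical test scores, invented for illustration.
scores = [62, 75, 78, 81, 84, 84, 97]
print("\n".join(stem_plot(scores)))
```

Every original score can be read back from the display (stem 8 with leaves 1, 4, 4 stands for 81, 84, 84), which is what preserving the actual data values means.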
time plot- displays data that change over time; values are usually connected with lines to show trends more clearly.
mean- is found by summing all the data values and dividing by the count
non-resistant- describes a statistic that is strongly affected by extreme values; the mean is non-resistant and is pulled towards those values.
median- the middle value, with half of the data above it and half below it. The median is resistant to large deviations and is called a robust estimator of center.
midrange- the average of the maximum and minimum values; a measure of center that is very sensitive to outlying values.
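To see what resistant and non-resistant mean in practice, the short Python sketch below computes the mean, median, and midrange for a small data set, then again after one value is replaced by an extreme value. The data and the helper name `midrange` are hypothetical, chosen only for illustration.

```python
from statistics import mean, median

def midrange(values):
    """Average of the minimum and maximum values."""
    return (min(values) + max(values)) / 2

# Hypothetical data set, invented for illustration.
scores = [2, 4, 5, 7, 7]
# Same data with the largest value replaced by an extreme value.
scores_with_outlier = [2, 4, 5, 7, 70]

for label, data in [("original", scores), ("with outlier", scores_with_outlier)]:
    print(label, "mean:", mean(data), "median:", median(data),
          "midrange:", midrange(data))
```

With the extreme value, the mean moves from 5 to 17.6 and the midrange from 4.5 to 36, while the median stays at 5.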
quartiles- divide a distribution into 4 parts, each containing about 25% of the data. Q1 is the median of the lower half of the data and Q3 is the median of the upper half of the data.
IQR- the interquartile range, a measure of spread that is especially useful for skewed distributions: IQR = Q3 - Q1.
five-number summary- consists of the minimum, Q1, the median, Q3, and the maximum.
minimum- lowest data value
maximum- highest data value
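The quartile definition above can be sketched in Python as follows. One assumption is made that the glossary does not specify: when the count is odd, the overall median is left out of both halves before Q1 and Q3 are found (textbooks and software differ on this). The function name `five_number_summary` and the data are hypothetical, and the sketch needs at least two values.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Assumption: when the count is odd, the overall median is left out
    of both halves before Q1 and Q3 are found.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]    # lower half of the sorted data
    upper = data[-half:]   # upper half of the sorted data
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Hypothetical data set, invented for illustration.
data = [3, 5, 7, 8, 12, 13, 14, 18, 21]
minimum, q1, med, q3, maximum = five_number_summary(data)
iqr = q3 - q1  # IQR = Q3 - Q1
print(minimum, q1, med, q3, maximum, iqr)
```

For this data the five-number summary is 3, 6, 12, 16, 21, so the IQR is 16 - 6 = 10.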
box plot- displays the five-number summary as a central box with whiskers that extend to the non-outlying data values. A modified box plot also displays outliers using special symbols.
variance- the sum of the squared deviations from the mean, divided by (count - 1).
standard deviation- the square root of the variance.
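The variance and standard deviation definitions can be followed step by step in a few lines of Python. This is a minimal sketch with a hypothetical data set; the helper name `sample_variance` is invented here, and the result is compared with `statistics.variance` and `statistics.stdev` from Python's standard library, which also divide by (count - 1).

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by (count - 1)."""
    mean = sum(values) / len(values)
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (len(values) - 1)

# Hypothetical data set, invented for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

variance = sample_variance(data)
standard_deviation = math.sqrt(variance)

print("variance:", variance)
print("standard deviation:", standard_deviation)
# Cross-check against the standard library.
print("library values:", statistics.variance(data), statistics.stdev(data))
```

Here the mean is 5 and the squared deviations sum to 32, so the variance is 32 / 7 (about 4.57) and the standard deviation is about 2.14.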