AAD - Boxplots and Advanced Data Comparisons Lesson

Both the mean and the median are used to describe the center of a distribution. The mean works well for symmetric distributions, while the median is the better choice for skewed distributions or those with outliers. We also want to consider spread, or variability, which we have previously measured with the "range" (maximum - minimum). For symmetric distributions described by the mean, the standard deviation is the matching measure of spread. BUT to measure spread with respect to a median we need a new method.
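To see why the median is preferred when outliers are present, here is a minimal Python sketch. The data values are hypothetical, chosen only for illustration: adding one extreme value moves the mean a great deal but barely moves the median.

```python
from statistics import mean, median

# Hypothetical data: five typical values, then the same values plus one extreme value.
values = [2, 3, 3, 4, 5]
with_outlier = values + [40]

print(mean(values), median(values))              # 3.4 3
print(mean(with_outlier), median(with_outlier))  # 9.5 3.5
```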

Another summary centers on the middle 50% of the data. We will use quartiles for this.

Five Number Summary  

Minimum

Q1: First quartile - the median of the lower half, with 25% of the data below it

Q2: Median - 50% of the data below it

Q3: Third quartile - the median of the upper half, with 25% of the data above it

Maximum
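The five numbers above can be computed with a short Python sketch. It assumes the "median of each half" rule described in the list and, when the number of data points is odd, leaves the overall median out of both halves. That is one common convention, and other software may compute quartiles slightly differently. The function name and the data are hypothetical.

```python
from statistics import median

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves rule."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]           # values below the median position
    upper = xs[half + n % 2:]   # skip the middle value when n is odd
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

scores = [1, 3, 4, 6, 7, 8, 10, 12, 30]  # hypothetical data set
print(five_number_summary(scores))  # (1, 3.5, 7, 11.0, 30)
```

The sketch needs at least two data points, since each half must contain at least one value.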

When plotted as a graph, this information translates into a "box plot." Box plots show less detail than histograms or stem plots but are excellent for showing side-by-side comparisons of data sets. The IQR (Interquartile Range) = Q3 - Q1 measures the spread of the middle 50% of the data and is used in the definition of an outlier. If a data point is between Q1 and Q3, it is NOT unusually high or low.

BY DEFINITION: An observation is an OUTLIER if it falls MORE THAN 1.5 times the IQR above Q3 or below Q1. That is, it is greater than Q3 + 1.5 x IQR or less than Q1 - 1.5 x IQR.
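The outlier rule can be applied directly once Q1 and Q3 are known. The Python sketch below uses hypothetical function names and a hypothetical data set, and the quartile values are simply taken as given for that data set.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) cutoffs: 1.5 x IQR below Q1 and above Q3."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(data, q1, q3):
    """Return the observations that fall beyond either cutoff."""
    low, high = outlier_fences(q1, q3)
    return [x for x in data if x < low or x > high]

scores = [1, 3, 4, 6, 7, 8, 10, 12, 30]  # hypothetical data set
q1, q3 = 3.5, 11.0                       # quartiles of scores, taken as given

print(outlier_fences(q1, q3))         # (-7.75, 22.25)
print(find_outliers(scores, q1, q3))  # [30]
```

Here IQR = 11.0 - 3.5 = 7.5, so the cutoffs sit 1.5 x 7.5 = 11.25 units beyond each quartile, and only the value 30 lies outside them.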

In a box plot, symmetry can be observed when Q1 and Q3 are EQUIDISTANT from the median.

When we say box plot, we will mean a modified box plot, which shows outliers as individual points. Using the "trace" command on the graphing calculator displays all the values of the five-number summary.

Please view this video to learn more about some "special" features of distributions.

All About Distribution Review 

Every math course strives to encourage mathematical communication through the same methods: words, pictures, and symbols. Statistics is no different. Data display VARIABILITY. The "pattern" of the variability is the distribution.

You may present data visually with a graph, follow it with a verbal description, and conclude with a numerical summary. It is a good idea to use more than one descriptive METHOD, including tabular, graphical, verbal, and numerical. To describe a distribution, we consider shape, center, spread, and any unusual features like gaps or outliers. These characteristics give a good description of the overall pattern. The acronym to help us remember is SOCS: shape, outliers, center, spread. We already know about shape: symmetric or skewed. The most common measure of center is our ordinary arithmetic average (MEAN).
