EBD - Module Overview

Exploring Bivariate Data Module Overview

Introduction

The most effective way to display the relationship between two quantitative variables measured on the same individuals is a scatterplot.

  • 2-variable data require a 2-dimensional graph using the typical x/y coordinate plane.
  • First plot the data, and then add numerical summaries.
  • Look for overall patterns and deviations from those patterns.
  • When the overall pattern is quite regular, use a compact mathematical model (equation) to describe it.

The values of one variable appear on the horizontal axis, and the values of the other variable appear on the vertical axis. Each individual in the data appears as the point in the plot fixed by the values of both variables for that individual.

We will name our variables differently than in other math courses. In statistics, the independent variable is called the explanatory variable and the dependent variable is called the response variable. The explanatory variable goes on the horizontal axis (x-axis) and the response variable goes on the vertical axis (y-axis). If there is no distinction between the explanatory and response variables, you may plot either one on the horizontal axis. When time is one of the variables, it is typically plotted as the explanatory variable along the x-axis.

In summary, in this unit you will examine paired data values to determine whether any association exists between the two variables. You will look for the association by analyzing the data with graphs and regression methods. If an association appears to be present, we describe it verbally and numerically using summary statistics. Many associations are linear in nature, while others are nonlinear. We will learn how to describe these associations using a mathematical model, which is the algebraic equation that describes the relationship. A good model should fit the data closely and be a reliable predictor of future data pairs.
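As a rough illustration of describing an association numerically and with a mathematical model, the sketch below computes a correlation coefficient and a least squares line by hand. The data set (hours studied and exam score) and the function names are made up for this example; they are not part of the course materials.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def correlation(xs, ys):
    """Correlation coefficient r for paired data (between -1 and +1)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def least_squares_line(xs, ys):
    """Return (a, b) for the line yhat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return a, b

# Hypothetical paired data: explanatory = hours studied, response = exam score
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

r = correlation(hours, scores)
a, b = least_squares_line(hours, scores)

print(f"r = {r:.3f}")
print(f"predicted score = {a:.1f} + {b:.1f} * hours")
```

An r close to +1 here suggests a strong, positive, linear association for this made-up data. The numbers should still be checked against a scatterplot, since a single summary value can hide a curved pattern or an outlier.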

Essential Questions

  • Can a coordinate graph, plotting two variables against each other, indicate the existence of a relationship between them?
  • If two variables are related, can we determine how strong the relationship is? 
  • Are all relationships linear in nature?
  • Are all relationships positive in nature?  
  • If not linear, what other types of relationships exist?
  • Does a very strong association prove anything?

Key Terms

The following key terms will help you understand the content in this module.

Response variable- the variable whose values are predicted or described in terms of the explanatory variable.

Explanatory variable- also referred to as independent variable.

Independent variable- also referred to as the predictor.

Dependent variable- also referred to as the response variable.

Scatterplot- shows the relationship between two quantitative variables in a 2-dimensional graph.

Positive association- pattern that moves from lower left to upper right.

Negative association- pattern that moves from upper left to lower right.

Linear correlation- pattern in a 2-dimensional graph that resembles a line having a slope and an intercept.

R-value (correlation coefficient)- decimal value between -1 and +1 indicating the strength and direction of a linear association.

Mathematical model- any equation, graph, or diagram used to represent a problem.

Regression line- line of best fit that describes the overall trend of the data.

Least squares regression line (LSRL)- linear equation of the form yhat = a + bx that results from minimizing the sum of the squares of the residuals.

R-square (coefficient of determination)- gives the fraction of the variability in the response variable that is accounted for by the regression on the explanatory variable.

Residual- difference between the observed value (y) and the predicted value (yhat).

Residual plot- graph of residuals versus the explanatory variable.

Influential observation- outlier that, when removed, dramatically changes the slope of the regression line.

Outlier- surprising point that stands away from the overall pattern of the scatterplot.

Extrapolation- using a model to draw conclusions that go beyond the scope of the original data; such conclusions are questionable.

Lurking variable- variable that may affect the apparent relationship of two variables when not controlled.

Causation- occurs whenever variable A changes and variable B changes in a corresponding way for every possible change. A correlation is required but is not sufficient to establish causation, which can only be concluded through a well-designed experiment.

Confounding- occurs when the effects of two or more predictor variables on a response variable cannot be separated from one another.
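Several of the key terms above (residual, R-square, influential observation) can be illustrated with a short calculation. The sketch below is only one way to do it; the data set, the added point (15, 60), and the function names are assumptions made up for this example.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least squares line yhat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

def residuals(xs, ys, a, b):
    """Observed value (y) minus predicted value (yhat) for each individual."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def r_squared(xs, ys, a, b):
    """Fraction of the variability in y accounted for by the regression on x."""
    y_bar = sum(ys) / len(ys)
    sse = sum(e ** 2 for e in residuals(xs, ys, a, b))  # unexplained variation
    sst = sum((y - y_bar) ** 2 for y in ys)             # total variation
    return 1 - sse / sst

# Hypothetical paired data: explanatory = hours studied, response = exam score
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

a, b = fit_line(hours, scores)
resid = residuals(hours, scores, a, b)
r2 = r_squared(hours, scores, a, b)

# Add one made-up point far to the right of the others and refit
a_plus, b_plus = fit_line(hours + [15], scores + [60])

print("residuals:", resid)
print(f"r-square = {r2:.3f}")
print(f"slope without extra point = {b:.2f}, with extra point = {b_plus:.2f}")
```

Plotting `resid` against `hours` would give the residual plot. The large drop in slope after adding the single point at x = 15 is what marks that point as an influential observation in this example.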
