IRV - Inference for Regression on Slope Lesson
Earlier in the course we learned a regression technique to create a predictive equation in the form of a line, yhat = a + bx (or, even better, written with the variable names in context). Occasionally you may see the notation written as yhat = β0 + β1x when talking about the population. When we generalize our predictive equation to the population we use Greek letters, so yhat = a + bx becomes yhat = β0 + β1x, or possibly y = β0 + β1x + ε, where β0 is the y-intercept and β1 represents the SLOPE. Certain descriptive values were used to describe how "well" the line fit the original data (the correlation coefficient r) and what proportion of the variability in the response variable is explained by the linear relationship with the explanatory variable (the coefficient of determination, rsq). What we DID NOT CONSIDER was how accurate our slope was, or rather how accurate our ESTIMATE of the slope was. We know better than to think that the data would line up perfectly on a straight line. Our model is an idealized regression line.
Recall the importance of RESIDUALS, which are the errors defined as (observed y - predicted yhat). Now we designate the error as ε, so ε = (y - yhat). We use the standard error (SE), which is calculated from the sample residuals, to help ASSESS the regression model. Of course, there are assumptions to be made.
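To make the residual and standard error ideas concrete, here is a minimal Python sketch. The five (x, y) pairs are hypothetical values chosen only for illustration (they are NOT the data from this lesson), and the variable names are illustrative. It uses the standard least-squares formulas, with n - 2 degrees of freedom for the residual standard deviation.

```python
import math

# Hypothetical illustration data (NOT from the lesson's example)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products about the means
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least-squares estimates: slope b and intercept a in yhat = a + bx
b = s_xy / s_xx
a = y_bar - b * x_bar

# Residual = observed y - predicted yhat
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Standard deviation of the residuals, using n - 2 degrees of freedom
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# Standard error of the slope
se_b = s / math.sqrt(s_xx)

print(f"yhat = {a:.2f} + {b:.2f}x")
print("residuals:", [round(e, 2) for e in residuals])
print(f"SE of slope = {se_b:.4f}")
```

Notice that the residuals from a least-squares line sum to zero, and that SE of the slope shrinks when the residuals are small or the x values are spread out.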
We hypothesize that a slope (rate of change) of ZERO indicates that there really is no linear relationship between the two variables.
H0: β1 = 0 there is NO linear relationship (slope = 0 in context)
Ha: β1 ≠ 0 there IS a linear relationship (slope ≠ 0 in context)
Where β1 is the SLOPE of the linear relationship in the population
EXAMPLE: (using computer printout information)
A local utility company surveys 101 randomly selected customers. For each survey participant, the company collects the following information: annual electric bill (in $) and home size (in square feet).
Extracting information and drawing conclusions from computer printouts is a very important skill that is tested on the AP Stat Exam.
Output from a regression analysis appears below.
http://stattrek.com/AP-Statistics-4/Estimate-Slope.aspx?Tutorial=AP
Find a 99% confidence interval for the slope of the regression line.
Here are some answer choices but let's work through the problem completely and not simply guess.
(A) 0.25 to 0.85
(B) 0.02 to 1.08
(C) -0.08 to 1.18
(D) 0.20 to 1.30
(E) 0.30 to 1.40
SOLUTION:
Let's do it all...regression equation too. yhat = 15 + .55x or even better using the variable names
[predicted electric bill in $] = 15 + .55 [home size in sq ft]
We can also see other important values in the table like SEslope, test statistic t, and the P-value. The values are .24, 2.29 and .01 respectively. Remember we are doing inference about SLOPE related to the explanatory variable which is home size.
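As a quick check on how the printout values fit together, the test statistic is the estimated slope divided by its standard error. The short Python sketch below uses only the values quoted above; the variable names are illustrative.

```python
# Values read from the printout as quoted in the lesson
b1 = 0.55      # sample slope
se_b1 = 0.24   # standard error of the slope (SEslope)

# Test statistic for H0: beta1 = 0 is (estimate - hypothesized value) / SE
t_stat = (b1 - 0) / se_b1

print(round(t_stat, 2))  # 2.29, matching the printout
```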
Using our earlier formula, CI = estimate +/- ME, we can edit it to become CI = b1 +/- (t*)(SE b1). The critical value t* is the t score with (101 - 2) = 99 degrees of freedom and an upper-tail area of (1 - .99)/2 = 0.005 (a cumulative probability of 0.995).
From the t* table, we find that the critical value is approximately 2.63.
Substituting into what we already know, we get
.55 +/- (2.63)(.24)
.55 +/- .6312
(-.0812, 1.1812)
The correct answer is (C).
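The arithmetic above can be reproduced with a few lines of Python. This is only a sketch: the critical value 2.63 is typed in from the t* table as in the solution (it is not computed here), and the variable names are illustrative.

```python
n = 101          # sample size from the problem
b1 = 0.55        # sample slope from the printout
se_b1 = 0.24     # standard error of the slope from the printout
t_star = 2.63    # critical value from the t* table (df = 99, 99% confidence)

df = n - 2                       # degrees of freedom for slope inference
margin = t_star * se_b1          # margin of error
lower, upper = b1 - margin, b1 + margin

# Does the interval include a slope of zero?
contains_zero = lower <= 0 <= upper

print(f"df = {df}")
print(f"99% CI: ({lower:.4f}, {upper:.4f})")
print("contains zero:", contains_zero)
```

The interval comes out to (-0.0812, 1.1812), which matches choice (C) after rounding to two decimal places.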
INTERPRETATION: Based on this survey we can be 99% confident that the true slope value lies between -.0812 and 1.1812. Observing that this interval contains the ZERO value, it is possible that the slope really does equal zero and there is no relationship. Continue with a test of significance to investigate further.
Doing a hypothesis test for this same data summary
H0: slope = 0 there is no association between electric bill and square footage of home OR the variables are independent
Ha: slope ≠ 0 there IS an association between electric bill and square footage of home OR the variables are NOT independent
Using the printout we can see that the P-value for the slope = .01. Because this P-value is smaller than both alpha = .02 and alpha = .05, the result is statistically significant and we reject the null hypothesis at either level. We conclude that there is convincing evidence of an association between electric bill and square footage of home.
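The decision rule used here (reject H0 when the P-value is less than alpha) can be written out as a small Python sketch. The P-value and alpha levels are the ones stated above; the variable names are illustrative.

```python
p_value = 0.01          # P-value for the slope, from the printout
alphas = [0.02, 0.05]   # significance levels considered in the lesson

# Reject H0 whenever the P-value is smaller than alpha
decisions = {alpha: p_value < alpha for alpha in alphas}

for alpha, reject in decisions.items():
    verdict = "reject H0" if reject else "fail to reject H0"
    print(f"alpha = {alpha}: {verdict}")
```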
IMAGES CREATED BY GAVS