IRV - More Uses for the Chi-square Distribution Lesson

The textbook reading and exercises involve 3 other types of chi-square tests.

Goodness of Fit

The Goodness of Fit test determines whether the observed counts "match" the expected counts produced by a predictive model. This test is used when we have "expectations" for counts based on "claims", expectations for population percentages based on demographic information, or even expectations based on the zodiac calendar, which has 12 birth signs.

Wording of hypotheses could appear as:

H0: births are uniformly distributed over the 12 zodiac signs

Ha: births are not uniformly distributed over the 12 zodiac signs

H0: % color distribution of M&M's package is as the company advertises

Ha: % color distribution of M&M's package is different from the company's claim

Since we are still performing significance tests, it is necessary to begin each test by checking conditions. Some of the conditions are familiar: categorical data and independent observations obtained from a random sample. A NEW and important condition is that the data be in COUNTED form, not percent form.

Because of the formula required for this test, there are expectations about the values that are contained in the cells of the table or matrix.

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

Since the formula calls for dividing by expected values it would be disastrous to the math if any cell were equal to ZERO. In fact, the calculations are only meaningful when the expected cell count is at least five (5).
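
As a minimal sketch of how the formula is applied in a goodness-of-fit setting, the Python snippet below computes the statistic by hand. The claimed proportions and observed counts are hypothetical numbers chosen only for illustration (they are not real M&M's data), and the function name `chi_square_statistic` is our own, not something from the textbook or the calculator.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    # Condition from the lesson: every expected count should be at least 5.
    if any(e < 5 for e in expected):
        raise ValueError("every expected count should be at least 5")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical example: 100 candies sorted into four colors.
claimed_proportions = [0.40, 0.30, 0.20, 0.10]  # assumed claim, illustration only
observed_counts = [45, 25, 20, 10]              # assumed COUNTS, not percents

n = sum(observed_counts)
# Goodness of fit: expected count = sample size * claimed proportion.
expected_counts = [n * p for p in claimed_proportions]

statistic = chi_square_statistic(observed_counts, expected_counts)
print(round(statistic, 4))
```

With these made-up numbers the statistic comes out to roughly 1.46. A small value like this means the observed counts sit close to the expected counts.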

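For a goodness-of-fit test, the expected count for each category is the sample size multiplied by the claimed proportion for that category. For the two-way tables used in the Homogeneity and Independence tests described later, calculate the expected count for each cell by multiplying the row total by the column total and dividing by the table total: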

$$\text{expected count} = \frac{\text{row total} \times \text{column total}}{\text{table total}}$$
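
The row-total-times-column-total rule can be sketched in a few lines of Python. This is only an illustration of what the calculator does when it builds the expected matrix. The function name `expected_counts` and the small 2 x 2 table are hypothetical.

```python
def expected_counts(observed):
    """Expected count for each cell: row total * column total / table total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]


# Hypothetical observed matrix (2 rows, 2 columns), for illustration only.
hypothetical_table = [
    [30, 20],
    [10, 40],
]

for row in expected_counts(hypothetical_table):
    print(row)
```

Each row of the result has the same total as the matching row of the observed matrix, which is a quick way to check the arithmetic.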

In reality, we will be letting the calculator do these computations for us.

Repeating...the chi-square statistic is the SUM over all r x c cells in the table:

$$\chi^2 = \sum_{\text{all cells}} \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

For a two-way table, the degrees of freedom are (r - 1)(c - 1), where r is the number of rows and c is the number of columns; for a goodness-of-fit test with k categories, the degrees of freedom are k - 1. As with other test statistics, a table of chi-square probabilities exists. The P-value is the area to the right of the chi-square statistic under the chi-square density curve. Once you understand how the calculations are done, we will rely on the calculator to do the math involved.
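
For readers curious about what the calculator does behind the scenes, here is a rough sketch of the degrees-of-freedom rule and the right-tail area, using only the Python standard library. The function names are hypothetical, and the series used for the area (a standard expansion of the incomplete gamma function) is our own choice of numerical method, not something the lesson or the calculator manual specifies. It is adequate for classroom-sized statistics, not a general-purpose routine.

```python
import math


def table_df(rows, cols):
    """Degrees of freedom for a two-way table: (r - 1)(c - 1)."""
    return (rows - 1) * (cols - 1)


def chi_square_p_value(statistic, df):
    """Area to the right of `statistic` under the chi-square curve with `df` degrees of freedom."""
    if statistic <= 0:
        return 1.0
    a = df / 2.0
    x = statistic / 2.0
    # Series for the regularized lower incomplete gamma function P(a, x):
    # P(a, x) = x^a * e^(-x) / Gamma(a) * sum over n of x^n / (a (a+1) ... (a+n))
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= x / (a + n)
        total += term
        n += 1
    left_area = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return 1.0 - left_area  # right-tail area


print(table_df(2, 3))
print(chi_square_p_value(3.841, 1))
```

The second call uses the familiar critical value 3.841 for 1 degree of freedom, so the printed area should be very close to 0.05.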

You will have an opportunity to investigate the use of the formula for Goodness of Fit based on an M&M's Investigative Task. Two other types of chi-square tests exist depending on the amount of data being compared.

Homogeneity

The test for Homogeneity evaluates whether several populations share the same distribution of a categorical variable. The test asks whether 3 or more populations are equal with respect to some characteristic, making table form ideal for organizing and displaying the data. Mathematically speaking, we can call the storage arrangement a MATRIX. For calculator-assisted testing, the data are expected in matrix form. Luckily, we only enter one matrix (observed values) and let the calculator determine the expected values matrix.

Since we are testing over 3 or more groups, the hypotheses are written as:

H0: the distribution does not change from group to group (however many), or all groups are equal (HOMOGENEOUS)

Ha: the distributions are DIFFERENT for one or more groups

Independence

The test for Independence is used to determine if two categorical variables are related in some way or not. The hypotheses statements clearly reflect that goal. This test is limited to considering TWO variables in table form, one variable represented by the columns, and the other represented by the rows.

Voting records from any-town USA are presented in the table:

[Table: voting status by gender; the original image is not available]

Hypotheses:

H0: voting habits and gender are independent (not related)

Ha: voting habits and gender are not independent (related)

Don't get too confused...inference is inference...and significance tests are significance tests...what changes is the shape of the distribution which controls the areas and probability, along with a change in wording of the hypotheses.

Perhaps a complete EXAMPLE will help to demonstrate how all this fits together and how the calculator can be used for the computations.

EXAMPLE

A recent study investigated the relationship between smoking and urinary incontinence. Of 322 subjects in the study who were incontinent, 113 were smokers, 51 were former smokers, and 158 had never smoked. Of 284 control subjects who were not incontinent, 68 were smokers, 23 were former smokers, and 193 had never smoked. Do a significance test to see if there is a relationship between smoking and incontinence. Representing all that information in table form looks like:

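                  Smoker   Former smoker   Never smoked   Total
Incontinent          113              51            158     322
Not incontinent       68              23            193     284
Total                181              74            351     606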

Conditions:
Data are counts, all expected cell counts are at least 5, and we assume the observations are random and independent

Hypotheses:

H0: smoking status and incontinence are independent

Ha: smoking status and incontinence are not independent

Calculator assisted solution:

Chi Square = 22.98, df = 2, p < .0001
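
The calculator result can be checked by hand. The sketch below uses the observed counts stated in the example, builds the expected counts, and sums the chi-square contributions. The variable names are our own. The P-value line relies on a shortcut that holds only when df = 2, where the right-tail area equals exp(-statistic / 2); it is not a general formula.

```python
import math

# Observed counts from the example.
# Rows: incontinent, not incontinent. Columns: smoker, former smoker, never smoked.
observed = [
    [113, 51, 158],
    [68, 23, 193],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
table_total = sum(row_totals)

# Expected count = row total * column total / table total.
expected = [[r * c / table_total for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Shortcut valid for df = 2 only: right-tail area = exp(-chi_square / 2).
p_value = math.exp(-chi_square / 2)

print(round(chi_square, 2), df, p_value < 0.0001)
```

Running this reproduces the values reported above: a statistic of 22.98 with 2 degrees of freedom and a P-value below .0001. The smallest expected count is about 34.7, so the expected-count condition is comfortably met.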

Conclusion:
A P-value < .0001 is statistically significant at any commonly used alpha level and provides VERY STRONG evidence against the null hypothesis. We reject the null hypothesis and conclude that smoking status and incontinence are NOT independent but are related in some way. Note: since this possible connection is suggested by the data, further investigation is warranted to determine the exact relationship between the two variables.

IMAGES CREATED BY GAVS