Verified for the 2025 AP Statistics exam • Last updated on June 18, 2024
When you realize you are working with categorical data that involves more than one variable, the first step is to decide whether a test for independence or a test for homogeneity is appropriate.
Once you determine which test is appropriate, the next step is to write your hypotheses. Regardless of the test, be sure to include context in your hypotheses, either by using meaningful subscripts or identifying the parameters of interest. ✍️
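The appropriate hypotheses for a chi-square test for homogeneity are:
H0: There is no difference in the distribution of the categorical variable across the populations (or treatments).
Ha: There is a difference in the distribution of the categorical variable across the populations (or treatments).
The appropriate hypotheses for a chi-square test for independence are:
H0: There is no association between the two categorical variables in the population.
Ha: There is an association between the two categorical variables in the population.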
When writing hypotheses for a chi-square test for independence, your null hypothesis is that there is no association between the two categorical variables in the given population. Your alternative hypothesis is that there IS an association between the two categorical variables of interest.
For example, let's say that we want to know whether a student's favorite sport is associated with their grade in an AP Statistics class. We could take a random sample of 100 AP Statistics students at XYZ High School and ask each one for their favorite sport (football, basketball, or baseball) along with their letter grade for the class. 🏈
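Our hypotheses would be as follows:
H0: There is no association between favorite sport and letter grade among AP Statistics students at XYZ High School.
Ha: There is an association between favorite sport and letter grade among AP Statistics students at XYZ High School.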
Since this problem involves one population (AP Statistics students at XYZ High School), this would require a test for independence.
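To see what "one sample, two variables" looks like as data, here is a minimal sketch that cross-classifies each sampled student by favorite sport and letter grade. The responses are invented for illustration, and `build_two_way_table` is a hypothetical helper name, not part of any library.

```python
from collections import Counter

# Hypothetical responses (invented): one (favorite sport, letter grade)
# pair per student in a single random sample.
responses = [
    ("football", "A"), ("basketball", "B"), ("baseball", "A"),
    ("football", "B"), ("football", "A"), ("basketball", "C"),
    ("baseball", "B"), ("basketball", "A"),
]

def build_two_way_table(pairs):
    """Count how many individuals fall into each (row, column) combination."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table

rows, cols, table = build_two_way_table(responses)
for sport, row in zip(rows, table):
    print(sport, dict(zip(cols, row)))
```

Every student appears exactly once in the table and is classified on both variables, which is the data structure a test for independence works from.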
When writing hypotheses for a chi-square test for homogeneity, your null hypothesis is that there is no difference in the distribution of the categorical variable between population 1 and population 2. The alternative hypothesis is that there is a difference in the distribution of the categorical variable between the two populations of interest.
For example, if we wanted to investigate whether the distribution of sports preference differs between AP Statistics students and AP Calculus students, we could take separate random samples of 100 Stats students and 100 Calculus students and determine whether the distribution of football, baseball, and basketball preference differs between these two groups. ⚾
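Our hypotheses would be as follows:
H0: There is no difference in the distribution of sports preference between AP Statistics students and AP Calculus students at XYZ High School.
Ha: There is a difference in the distribution of sports preference between AP Statistics students and AP Calculus students at XYZ High School.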
Since this problem involves two populations (AP Statistics students at XYZ High School and AP Calculus students at XYZ High School), this would require a test for homogeneity (we are looking to see if the two populations are homogeneous in terms of sports preference).
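As a rough illustration of what "the distribution of sports preference" means for each group, the sketch below converts each sample's counts into proportions. The counts are invented for illustration, and `conditional_distribution` is a hypothetical helper name.

```python
# Hypothetical counts (invented): sports preference in two separate samples.
counts = {
    "AP Statistics": {"football": 45, "basketball": 30, "baseball": 25},
    "AP Calculus": {"football": 35, "basketball": 40, "baseball": 25},
}

def conditional_distribution(group_counts):
    """Convert one group's counts into proportions that sum to 1."""
    total = sum(group_counts.values())
    return {category: n / total for category, n in group_counts.items()}

distributions = {
    group: conditional_distribution(group_counts)
    for group, group_counts in counts.items()
}

for group, dist in distributions.items():
    print(group, {sport: round(p, 2) for sport, p in dist.items()})
```

Under the null hypothesis, the two population distributions are identical. Sample proportions will usually differ a little by chance, so the test asks whether the observed differences are larger than chance alone would plausibly produce.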
A test for homogeneity is also used in a randomized experiment, since random assignment creates two (or more) groups that we treat as separate "populations": for instance, individuals receiving a new drug treatment and individuals receiving a placebo. 💉
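Chi-square tests require two familiar conditions for inference:
When sampling without replacement, we should check the 10% condition so that observations can be treated as independent: the sample size n should be less than 10% of the population size N (n < 0.10N).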
For our large counts condition, we need to verify that all of our expected counts are at least 5 (similar to other chi-square test set-ups). 🗼
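The two checks above can be sketched in a few lines of Python. This is a minimal sketch, not a complete procedure: it assumes the standard expected-count formula for a two-way table (row total × column total ÷ table total), and the observed counts, population sizes, and function names are all hypothetical.

```python
def expected_counts(table):
    """Expected count for each cell: (row total * column total) / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def large_counts_ok(table, minimum=5):
    """True when every expected count is at least `minimum`."""
    return all(e >= minimum for row in expected_counts(table) for e in row)

def ten_percent_ok(sample_size, population_size):
    """True when the sample is less than 10% of the population."""
    return sample_size < 0.10 * population_size

# Hypothetical observed counts (invented): rows are samples, columns are sports.
observed = [
    [45, 30, 25],  # AP Statistics sample
    [35, 40, 25],  # AP Calculus sample
]

print(expected_counts(observed))
print(large_counts_ok(observed))
print(ten_percent_ok(100, 1500))  # 1500 is a made-up population size
```

Note that the large counts condition is checked on the expected counts, not the observed counts, so a table with a small observed cell can still pass.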
For our test for independence, we need to verify that our data was collected using a simple random sample.
To verify that your data was collected using a simple random sample, you can check that the following conditions have been met:
For our test for homogeneity, we need to verify that our data was collected using a stratified random sample or treatments were randomly assigned (experimental design).
To verify that your data was collected using a stratified random sample, you can check that the following conditions have been met:
Alternatively, if you are conducting an experimental study, you can verify that treatments were randomly assigned by checking that the following conditions have been met: