4 min read•june 18, 2024
Harrison Burnside
Jed Quiaoit
Another way to check a statistical claim is to perform a significance test for the difference in two population proportions. As with any significance test, we have to write hypotheses, check our conditions, and then calculate and conclude. 📲
Still lost? Let's do a refresher!
A significance test for the difference in two population proportions is used to decide whether the observed difference between two sample proportions is statistically significant (that is, gives convincing evidence of a real difference between the population proportions) or whether it could plausibly have occurred by chance.
To perform a significance test for the difference in two population proportions, you first need to write your null and alternative hypotheses. The null hypothesis states that there is no difference between the two population proportions, while the alternative hypothesis states that there is a difference.
Next, you need to check that the conditions for the test are met. These include having large enough samples and having random, independent samples.
Once you have checked the conditions, you can calculate the test statistic and determine the p-value. The p-value is the probability of obtaining a test statistic at least as extreme as the one observed, given that the null hypothesis is true. If the p-value is less than the significance level α (usually 0.05), you reject the null hypothesis and conclude that the difference between the two population proportions is statistically significant. If the p-value is greater than the significance level, you fail to reject the null hypothesis: the data do not provide convincing evidence of a difference, which is not the same as showing that the two proportions are equal. 😄
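The p-value and decision step can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function names `upper_tail`, `p_value_from_z`, and `decide` are hypothetical, and the standard Normal tail area is computed with the standard library's `math.erfc`.

```python
import math

def upper_tail(z: float) -> float:
    """P(Z >= z) for a standard Normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def p_value_from_z(z: float, alternative: str = "two-sided") -> float:
    """Probability, assuming H0 is true, of a z at least as extreme as the observed one.

    alternative: "greater" (Ha: p1 > p2), "less" (Ha: p1 < p2),
    or "two-sided" (Ha: p1 != p2).
    """
    if alternative == "greater":
        return upper_tail(z)
    if alternative == "less":
        return upper_tail(-z)
    if alternative == "two-sided":
        # Count both tails: values at least as far from 0 as |z|
        return 2 * upper_tail(abs(z))
    raise ValueError(f"unknown alternative: {alternative!r}")

def decide(p_value: float, alpha: float = 0.05) -> str:
    """Compare the p-value with the significance level alpha."""
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"
```

For example, a z of 2.1 with Ha: p1 > p2 gives a p-value of about 0.018, which is below 0.05, so `decide` returns "reject H0".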
The first thing we need to do when setting up a significance test for the difference in two population proportions is to write out our hypotheses. Our null hypothesis always sets the two population proportions equal to each other, while our alternative hypothesis says that the first is greater than, less than, or not equal to the second. 🏆
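In symbols, the hypotheses described above take this form, where p1 and p2 are the two population proportions and the alternative uses exactly one of the three options:

```latex
H_0: p_1 = p_2 \quad (\text{equivalently } p_1 - p_2 = 0)
H_a: p_1 > p_2 \quad \text{or} \quad p_1 < p_2 \quad \text{or} \quad p_1 \neq p_2
```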
It is also important in this stage of setting up the test to identify what p1 and p2 represent. We have to define our parameters so the reader knows what we are truly comparing.
We also must check our conditions for inference. The same three conditions apply as for confidence intervals, with one small change in the Normal check.
Probably the most important condition is that both of our samples must be random samples. If we don't take a random sample from each population, our findings may suffer from bias, and we can't generalize them to the populations. 😞
To check that our samples are independent, we need to make sure that each population is at least 10 times the size of its sample (the 10% condition). Also, if we are dealing with a randomized experiment, the random assignment of treatments lets us treat the groups as independent. 🔟
When dealing with proportions, we always check the Normal condition using the Large Counts condition, which states that our expected successes and failures are each at least 10. With a significance test, we have to combine our two sample proportions to create a combined (pooled) p-hat. This is what we use to find our expected successes and failures. 🎩
Then we have to verify that each of the expected successes and failures in both samples is at least 10.
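Written as inequalities, the 10% condition for sampling without replacement reads as follows, where n1 and n2 are the sample sizes and N1 and N2 are the (often only estimated) population sizes:

```latex
n_1 \le 0.10\,N_1 \qquad \text{and} \qquad n_2 \le 0.10\,N_2
```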
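As a formula, the combined p-hat pools the successes x1 and x2 from both samples, and the Large Counts check then requires all four expected counts to reach 10:

```latex
\hat{p}_c = \frac{x_1 + x_2}{n_1 + n_2}
n_1 \hat{p}_c \ge 10, \quad n_1 (1 - \hat{p}_c) \ge 10, \quad n_2 \hat{p}_c \ge 10, \quad n_2 (1 - \hat{p}_c) \ge 10
```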
This is because the test uses a pooled sample. The null hypothesis assumes the two population proportions are equal, so we combine the two samples into a single "pooled" sample and calculate a single proportion for the combined sample as our best estimate of that common value. The test statistic is then calculated from the difference between the two sample proportions and the pooled proportion. 🏊
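The two-proportion z statistic described here puts the pooled proportion into the standard error; the 0 in the numerator is the difference p1 − p2 assumed by the null hypothesis:

```latex
z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}_c (1 - \hat{p}_c) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}
```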
Let's return to our MJ vs. LeBron problem from earlier... again. Recall that MJ made 836/1623 shots and LeBron made 622/1493 shots. Instead of testing this claim with a confidence interval, let's test it using a 2-Prop Z-Test to verify our results. 🏀
Another great idea when writing our hypotheses is to use meaningful subscripts, such as MJ and L, that make clear which proportion belongs to which population.
Next, we have to check our Large Counts condition using the pooled p-hat, (836 + 622)/(1623 + 1493) = 1458/3116 ≈ 0.468.
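With those subscripts, and assuming (this section doesn't restate it) that the claim being tested is that MJ made a greater proportion of his shots than LeBron, the hypotheses would read as follows. Here p_MJ and p_L are the true proportions of shots made by MJ and LeBron:

```latex
H_0: p_{\text{MJ}} = p_{\text{L}}
H_a: p_{\text{MJ}} > p_{\text{L}}
```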
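The whole MJ vs. LeBron calculation can be sketched in Python using only the standard library. This is a minimal illustration rather than a definitive implementation: it assumes the one-sided alternative Ha: p_MJ > p_L (the claim's direction isn't restated in this section), and the variable names are our own.

```python
import math

# Shot data from the MJ vs. LeBron example
x_mj, n_mj = 836, 1623   # MJ: made shots, attempted shots
x_l, n_l = 622, 1493     # LeBron: made shots, attempted shots

p_hat_mj = x_mj / n_mj
p_hat_l = x_l / n_l

# Pooled (combined) p-hat: under H0 the two population proportions are equal
p_hat_c = (x_mj + x_l) / (n_mj + n_l)

# Large Counts check with the pooled p-hat: all four expected counts must be at least 10
expected_counts = [
    n_mj * p_hat_c, n_mj * (1 - p_hat_c),
    n_l * p_hat_c, n_l * (1 - p_hat_c),
]
large_counts_ok = all(count >= 10 for count in expected_counts)

# Standard error built from the pooled p-hat, then the z test statistic
se = math.sqrt(p_hat_c * (1 - p_hat_c) * (1 / n_mj + 1 / n_l))
z = (p_hat_mj - p_hat_l) / se

# One-sided p-value for the ASSUMED alternative Ha: p_MJ > p_L, i.e. P(Z >= z)
p_value = 0.5 * math.erfc(z / math.sqrt(2))

print(f"pooled p-hat = {p_hat_c:.4f}")
print(f"large counts ok: {large_counts_ok}")
print(f"z = {z:.2f}, p-value = {p_value:.2e}")
```

All four expected counts are in the hundreds, so the Large Counts condition is met, and z comes out to roughly 5.5, giving a p-value far below 0.05.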