5.6 Sampling Distributions for Differences in Sample Proportions

4 min read • June 18, 2024

Josh Argo

Jed Quiaoit

Brianna Bukowski

Differences (Non-Distribution) Recap

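To find the sampling distribution for the difference of two sample proportions or two sample means, remember that when the two samples are independent, the variances always add, even though the statistics themselves are subtracted. If you need the standard deviation, take the square root of the combined variance. The centers, however, simply subtract: the mean of the difference is the difference of the two means. ➖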

Proportion Differences

To find the standard deviation of the difference in sample proportions, divide each variance, p(1 - p), by its sample size, add the two results, and then take the square root: σ = √( p1(1 - p1)/n1 + p2(1 - p2)/n2 ). If you are given only the standard deviations of the two sample proportions, square both, add them, and then take the square root. This is sometimes called the “Pythagorean Theorem of Statistics.” 📐

Source: NEW AP Statistics Formula Sheet
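The "variances add, then take the square root" rule can be sketched in a few lines of Python. This is only an illustration: the function names `sd_diff_proportions` and `combine_sds` and the input values are made up for this example, and the calculation assumes the two samples are independent.

```python
import math

def sd_diff_proportions(p1, n1, p2, n2):
    """Standard deviation of (p-hat1 - p-hat2), assuming independent samples."""
    var1 = p1 * (1 - p1) / n1   # variance of the first sample proportion
    var2 = p2 * (1 - p2) / n2   # variance of the second sample proportion
    return math.sqrt(var1 + var2)  # variances add, then square root

def combine_sds(sd1, sd2):
    """Combine two given standard deviations: square, add, square root."""
    return math.sqrt(sd1 ** 2 + sd2 ** 2)

# Hypothetical inputs chosen only to illustrate the arithmetic.
print(round(sd_diff_proportions(0.5, 100, 0.5, 100), 4))  # 0.0707
print(combine_sds(3, 4))  # 5.0, just like a 3-4-5 right triangle
```

The second call shows why the nickname fits: the combined standard deviation behaves like the hypotenuse of a right triangle whose legs are the two individual standard deviations.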

For any inference about proportions, you must check the Large Counts condition to confirm approximate normality. The Central Limit Theorem check applies only to quantitative data (means).

For a categorical variable, when randomly sampling with replacement from two independent populations with population proportions p1 and p2, the sampling distribution of the difference in sample proportions, p̂1 - p̂2, has mean µ = p1 - p2 and standard deviation σ = √( p1(1 - p1)/n1 + p2(1 - p2)/n2 ).

Additionally, the sampling distribution of the difference in sample proportions, p̂1 - p̂2, will be approximately normal provided the sample sizes are large enough:

  • n1p1 ≥ 10
  • n1(1 - p1) ≥ 10
  • n2p2 ≥ 10
  • n2(1 - p2) ≥ 10

Here is a review of types of distributions: (Be sure to save this somewhere!) ⭐

Source: The AP Statistics CED
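The four Large Counts checks listed earlier can be bundled into one small helper. This is a minimal sketch, assuming the usual "at least 10" threshold; the function name `large_counts_ok` and the second set of inputs are hypothetical.

```python
def large_counts_ok(n1, p1, n2, p2, threshold=10):
    """Return True if all four expected counts are at least `threshold`."""
    expected_counts = [
        n1 * p1,        # expected successes, sample 1
        n1 * (1 - p1),  # expected failures, sample 1
        n2 * p2,        # expected successes, sample 2
        n2 * (1 - p2),  # expected failures, sample 2
    ]
    # A tiny tolerance guards against floating-point rounding on borderline
    # cases such as 100 * (1 - 0.9), which should count as exactly 10.
    return all(count >= threshold - 1e-9 for count in expected_counts)

print(large_counts_ok(1000, 0.6, 1000, 0.7))  # True: 600, 400, 700, 300
print(large_counts_ok(20, 0.3, 1000, 0.7))    # False: 20 * 0.3 = 6 is too small
```

All four counts must pass. One small expected count in either sample is enough to make the normal approximation unreliable.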

Practice Problem

Suppose that you are conducting a survey to compare the proportion of people in two different cities who support a new public transportation system. You decide to use simple random samples of 1000 people from each city, and you ask them whether or not they support the new system. After collecting the data, you find that 600 people out of the 1000 respondents from City A support the system, and 700 people out of the 1000 respondents from City B support the system. 🚂

a) Calculate the sample proportions of respondents who support the new system in each city.

b) Explain what the sampling distribution for the difference in sample proportions represents and why it is useful in this situation.

c) Suppose that the true population proportion of people in City A who support the new system is actually 0.6, and the true population proportion of people in City B who support the new system is actually 0.7. Describe the shape, center, and spread of the sampling distribution for the difference in sample proportions in this case.

d) Explain why the Central Limit Theorem applies to the sampling distribution for the difference in sample proportions in this situation.

e) Discuss one potential source of bias that could affect the results of this study, and explain how it could influence the estimate. (Hint: slightly different when thinking about working with one sample vs. two samples)

Answer

a) The sample proportion of respondents who support the new system in City A is 600/1000 = 0.6, and the sample proportion of respondents who support the new system in City B is 700/1000 = 0.7.

b) The sampling distribution for the difference in sample proportions represents the distribution of possible values for the difference between the sample proportions if the study were repeated many times. It is useful in this situation because it allows us to make inferences about the difference between the population proportions in the two cities based on the sample data.

c) If the true population proportion of people in City A who support the new system is 0.6, and the true population proportion in City B is 0.7, the sampling distribution for the difference in sample proportions (taking City B minus City A) would be approximately normal, with center 0.7 - 0.6 = 0.1 and standard deviation √( 0.6(0.4)/1000 + 0.7(0.3)/1000 ) = √0.00045 ≈ 0.021. If the difference were taken as City A minus City B, the center would be -0.1 and the spread would be unchanged.
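One way to see the shape, center, and spread described in part (c) is to simulate the study many times. The sketch below is illustrative only: the function name `simulate_differences`, the number of repetitions, and the random seed are arbitrary choices, and it takes the difference as City B minus City A.

```python
import math
import random
import statistics

def simulate_differences(p_a, p_b, n, reps, seed=0):
    """Simulate `reps` repetitions of the two-city survey and return
    the list of differences in sample proportions (City B minus City A)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    diffs = []
    for _ in range(reps):
        # Each respondent supports the system with the city's true proportion.
        count_a = sum(rng.random() < p_a for _ in range(n))
        count_b = sum(rng.random() < p_b for _ in range(n))
        diffs.append(count_b / n - count_a / n)
    return diffs

diffs = simulate_differences(p_a=0.6, p_b=0.7, n=1000, reps=2000)
sim_center = statistics.mean(diffs)
sim_spread = statistics.stdev(diffs)

# Theoretical spread from the formula: sqrt(p1(1-p1)/n1 + p2(1-p2)/n2)
theory_spread = math.sqrt(0.6 * 0.4 / 1000 + 0.7 * 0.3 / 1000)

print(f"simulated center: {sim_center:.4f} (theory: 0.1)")
print(f"simulated spread: {sim_spread:.4f} (theory: {theory_spread:.4f})")
```

The simulated center and spread should land close to the theoretical values of 0.1 and about 0.021, and a histogram of `diffs` would look roughly bell-shaped.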

d) The normal approximation is justified for the sampling distribution of the difference in sample proportions because the sample sizes (n1 = 1000 and n2 = 1000) are large enough to satisfy the Large Counts condition: 1000(0.6) = 600, 1000(0.4) = 400, 1000(0.7) = 700, and 1000(0.3) = 300 are all at least 10. For proportions, this check plays the role that the Central Limit Theorem plays for means.

e) One potential source of bias in this study could be nonresponse bias, which occurs when certain groups of individuals are more or less likely to respond to the survey. For example, if people in City A who support the new system are more likely to respond to the survey, the sample from City A could be biased toward higher levels of support and produce an overestimate of the population proportion. 

On the other hand, if people in City B who do not support the new system are more likely to respond, the sample from City B could be biased toward lower levels of support and produce an underestimate of the population proportion. This could lead to an incorrect estimate of the difference in population proportions between the two cities.