6 min read • June 18, 2024
Avanish Gupta
Jed Quiaoit
A bivariate quantitative data set records two quantitative variables for each individual, and the two are often related or dependent in some way. One of the variables, referred to as the "independent" or "explanatory" (x) variable, is thought to have an effect on the other variable, which is referred to as the "dependent" or "response" (y) variable. The explanatory variable is often used to explain or predict the value of the response variable.
For example, in a study examining the relationship between age and blood pressure, age might be the explanatory variable and blood pressure the response variable. In this case, the value of the explanatory variable (age) might be used to predict the value of the response variable (blood pressure).
We can organize this data into a scatterplot, which is a graph of the paired data. The explanatory variable goes on the horizontal axis (also called the x-axis) and the response variable goes on the vertical axis (the y-axis). Here are two examples below:
The form of a scatterplot refers to the general shape of the plotted points on the graph. A scatterplot may have a linear form, in which the points form a straight line, or a curved form, in which the points follow a curved pattern. The form of a scatterplot can be useful for understanding the relationship between the two variables and for identifying patterns or trends in the data. ✊
For example, a scatterplot with a linear form indicates that the relationship between the two variables is well described by a straight line: each increase in one variable is associated with a roughly constant increase (or decrease) in the other. Form alone does not tell you the direction or strength of the relationship. A scatterplot with a curved form indicates a nonlinear relationship, such as a quadratic relationship, where a straight line does not describe the pattern well.
In the scatterplots above, Graph 1 is best described as curved, while Graph 2 is linear.
The direction of a scatterplot is the general trend that you see when going from left to right. Graph 1 is decreasing, as the values of the response variable tend to go down from left to right, while Graph 2 is increasing, as the values of the response variable tend to go up from left to right. ➡️
In a linear model, the direction of the relationship between two variables is often described in terms of positive or negative correlation. Positive correlation means that as one variable increases, the other variable also tends to increase. Negative correlation means that as one variable increases, the other variable tends to decrease.
The slope of the line that fits the data can be used to determine the direction of the correlation. If the slope is positive, the correlation is positive, and if the slope is negative, the correlation is negative.
For example, consider a linear model that shows the relationship between age and height. If the slope of the line is positive, it indicates that as age increases, height tends to increase as well. This would indicate a positive correlation between age and height. On the other hand, if the slope of the line is negative, it would indicate a negative correlation between age and height, where an increase in age is associated with a decrease in height.
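To see how the sign of the slope gives the direction, here is a minimal Python sketch. It assumes the usual least-squares formula for the slope, and the ages and heights are made-up numbers for illustration only, not data from this guide. The function names are our own.

```python
def least_squares_slope(xs, ys):
    """Slope of the least-squares line through the points (xs[i], ys[i])."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def direction(xs, ys):
    """Describe the direction of a linear relationship from the slope's sign."""
    slope = least_squares_slope(xs, ys)
    if slope > 0:
        return "positive"
    if slope < 0:
        return "negative"
    return "none"


# Hypothetical data: ages (years) and heights (cm) of five children
ages = [2, 4, 6, 8, 10]
heights = [86, 102, 115, 128, 138]

print(direction(ages, heights))  # prints "positive"
```

Reversing the heights list would give a negative slope, so the same function would report a negative direction.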
The strength of a scatterplot describes how closely the points fit a certain model, and it can be strong, moderate, or weak. How we figure this out numerically is covered in the next section on correlation and the correlation coefficient. In our case, Graph 1 shows a moderate association while Graph 2 shows a strong association. 🥋
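As a preview of the numerical side, the sketch below computes the correlation coefficient r for a linear relationship and turns it into a label. The cutoffs of 0.8 and 0.5 are an assumption chosen for illustration; there is no single official cutoff, so treat the labels as a rough guide rather than a rule.

```python
import math


def correlation(xs, ys):
    """Correlation coefficient r for paired data (measures linear strength)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def describe_strength(r, strong=0.8, moderate=0.5):
    """Label |r| using illustrative cutoffs (assumed, not an official rule)."""
    size = abs(r)
    if size >= strong:
        return "strong"
    if size >= moderate:
        return "moderate"
    return "weak"


# Hypothetical data that falls exactly on the line y = 2x
xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]

r = correlation(xs, ys)
print(describe_strength(r))  # prints "strong"
```

Note that r only measures how well a straight line fits, so a clearly curved pattern can have a modest r even when the points follow the curve closely.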
Lastly, we have to discuss unusual features on a scatterplot. The two types you should know are clusters and outliers, which are similar to their single-variable counterparts. 👽
Clusters are groups of points that are close together on the scatterplot. They may indicate that there are subgroups or patterns within the data that are different from the overall trend.
Outliers are points that are far from the other points on the scatterplot and may indicate unusual or unexpected values in the data. Outliers can be caused by errors in data collection or measurement, or they may indicate a genuine difference in the population being studied.
It's important to consider unusual features on a scatterplot when analyzing the data, as they can influence the interpretation of the relationship between the two variables and the results of statistical analyses.
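One possible way to screen for outliers in a roughly linear scatterplot is to fit a line and flag points that sit unusually far from it. The sketch below is only an illustration: the cutoff of 2 residual standard deviations (the parameter `k`) is an assumption, the data is made up, and a flagged point still needs to be judged in context.

```python
import math


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def flag_outliers(xs, ys, k=2.0):
    """Return points whose vertical distance from the line exceeds k spreads.

    k = 2.0 is an assumed cutoff for illustration, not a standard rule.
    """
    slope, intercept = fit_line(xs, ys)
    # Residual = observed y minus the y predicted by the line
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Typical size of a residual (standard deviation about the line)
    spread = math.sqrt(sum(e ** 2 for e in residuals) / (len(xs) - 2))
    return [(x, y) for x, y, e in zip(xs, ys, residuals) if abs(e) > k * spread]


# Hypothetical data: y = 2x everywhere except the point at x = 5
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2, 4, 6, 8, 30, 12, 14, 16, 18, 20]

print(flag_outliers(xs, ys))  # prints [(5, 30)]
```

Keep in mind that the outlier itself pulls the fitted line toward it, so a check like this can miss unusual points, especially ones at the far left or right of the plot. Looking at the scatterplot is still the first step.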
Describe the scatterplot in context of the problem.
Notice that this response is IN CONTEXT of the problem. This is a great way to maximize your credit on the AP Statistics exam.
🎥 Watch: AP Stats - Scatterplots and Association
© 2024 Fiveable Inc. All rights reserved.