Scatter plots help us visualize relationships between variables. We can see if they're connected positively, negatively, or not at all. They also show us if the relationship is linear or nonlinear, which is crucial for understanding data patterns.

Linear regression finds the best-fitting straight line through our data points. This line helps us make predictions and understand how changes in one variable affect another. We can also measure how well our model fits the data using tools like the coefficient of determination.

Scatter Plots and Linear Relationships

Scatter plots for variable relationships

  • Graphical representation of data points on a coordinate plane
    • Each point represents a pair of values for two variables (x and y)
    • Independent variable plotted on x-axis, dependent variable plotted on y-axis
  • Visualize relationship between two variables
    • Positive correlation: As x increases, y tends to increase
    • Negative correlation: As x increases, y tends to decrease
    • No correlation: No apparent relationship between x and y
  • Assess strength of correlation visually
    • Strong correlation: Data points closely follow clear pattern
    • Weak correlation: Data points more scattered and deviate from pattern
  • Identify potential outliers that may affect the overall relationship

Linear vs nonlinear relationships

  • Linear relationships:
    • Data points in scatter plot appear to follow straight line
    • Change in y proportional to change in x
    • Example: Relationship between distance traveled and time at constant speed
  • Nonlinear relationships:
    • Data points in scatter plot do not follow straight line pattern
    • Change in y not proportional to change in x
    • Examples:
      • Exponential: Relationship between population growth and time
      • Quadratic: Relationship between height of thrown object and time
      • Logarithmic: Relationship between perceived loudness and actual intensity of sound
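
One rough way to see the difference between the two cases is to compare the slope between consecutive points: in a linear relationship it stays constant, while in a nonlinear one it changes. The sketch below assumes exact, noise-free data with distinct x-values; the function names and the numbers (a speed of 60 and a simple thrown-object formula) are hypothetical and chosen only for illustration. Real data are noisy, so this check is not a substitute for looking at the scatter plot.

```python
def successive_slopes(xs, ys):
    """Return the slope between each pair of consecutive points."""
    return [
        (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
        for i in range(len(xs) - 1)
    ]


def looks_linear(xs, ys, tol=1e-9):
    """True if all consecutive slopes agree to within tol (noise-free data only)."""
    slopes = successive_slopes(xs, ys)
    return max(slopes) - min(slopes) <= tol


times = [0, 1, 2, 3, 4]

# Distance traveled at a constant speed of 60 (hypothetical units): linear.
distance = [60 * t for t in times]

# Height of a thrown object (hypothetical numbers): quadratic.
height = [20 * t - 5 * t**2 for t in times]

print(successive_slopes(times, distance))  # [60.0, 60.0, 60.0, 60.0]
print(successive_slopes(times, height))    # [15.0, 5.0, -5.0, -15.0]
print(looks_linear(times, distance))       # True
print(looks_linear(times, height))         # False
```

The constant slope in the first case is the "change in y proportional to change in x" property; the steadily changing slope in the second case is what a curved pattern looks like numerically.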

Linear Regression and Predictions

Line of best fit interpretation

  • Line of best fit (least squares regression line) minimizes sum of squared vertical distances between line and data points
  • Equation of line of best fit given by $y = mx + b$
    • $m$: slope of line, represents change in y per unit change in x
    • $b$: y-intercept, represents value of y when x is zero
  • Calculate slope ($m$) and y-intercept ($b$) using formulas:
    • $m = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$
    • $b = \bar{y} - m\bar{x}$
      • $\bar{x}$ and $\bar{y}$: means of x and y values
      • $x_i$ and $y_i$: individual data points
      • $n$: number of data points
  • Use line of best fit to make predictions about dependent variable (y) for given value of independent variable (x)
  • Residuals represent the difference between observed and predicted y-values
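
The slope and intercept formulas above can be sketched in a few lines of plain Python. The function name `fit_line` and the five data points are made up for illustration; the code assumes the x-values are not all identical, since the slope would otherwise involve division by zero.

```python
def fit_line(xs, ys):
    """Return (m, b) for the least squares line y = m*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator and denominator of the slope formula.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    b = y_bar - m * x_bar
    return m, b


# Hypothetical data set used only for this example.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

m, b = fit_line(xs, ys)
print(round(m, 3), round(b, 3))  # 0.6 2.2

# Residual = observed y minus predicted y.
residuals = [y - (m * x + b) for x, y in zip(xs, ys)]
print([round(r, 3) for r in residuals])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
```

For a least squares line the residuals sum to zero (up to floating-point rounding), which is a quick sanity check on the fit.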

Linear models for predictions

  • Make prediction using linear model:
    1. Determine equation of line of best fit ($y = mx + b$)
    2. Substitute given x-value into equation to calculate predicted y-value
  • Assess accuracy of linear model using coefficient of determination ($R^2$)
    • $R^2$: proportion of variance in dependent variable explained by linear model
    • $R^2$ ranges from 0 to 1, values closer to 1 indicate better fit
    • $R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$
      • $y_i$: actual y-value for given x-value
      • $\hat{y}_i$: predicted y-value for given x-value
      • $\bar{y}$: mean of y-values
  • Limitations of linear models:
    • May not be appropriate for nonlinear relationships
    • Extrapolation (making predictions outside range of observed data) can lead to inaccurate results
    • Interpolation (making predictions within the range of observed data) is generally more reliable
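
As a minimal sketch of the steps above, the code below fits a line, computes $R^2$ from the formula, and labels each prediction as interpolation or extrapolation. The helper names (`fit_line`, `r_squared`, `predict`) and the data are hypothetical, and flagging a prediction as extrapolation only signals that it deserves extra caution; it does not measure how wrong the prediction might be.

```python
def fit_line(xs, ys):
    """Return (m, b) for the least squares line y = m*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    return m, y_bar - m * x_bar


def r_squared(xs, ys, m, b):
    """R^2 = 1 - (sum of squared residuals) / (total sum of squares)."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def predict(x, m, b, xs):
    """Return (predicted y, True if x lies outside the observed x-range)."""
    y_hat = m * x + b
    is_extrapolation = not (min(xs) <= x <= max(xs))
    return y_hat, is_extrapolation


# Hypothetical data set used only for this example.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

m, b = fit_line(xs, ys)
r2 = r_squared(xs, ys, m, b)
print(round(r2, 3))  # 0.6

inside, inside_flag = predict(3.5, m, b, xs)    # within 1..5: interpolation
outside, outside_flag = predict(10, m, b, xs)   # beyond 5: extrapolation
print(round(inside, 3), inside_flag)    # 4.3 False
print(round(outside, 3), outside_flag)  # 8.2 True
```

Here the line explains about 60% of the variance in y, so predictions carry noticeable uncertainty even inside the observed range.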

Measures of Model Fit and Correlation

  • Standard error of estimate: Measures the average deviation of observed y-values from the predicted y-values
  • Pearson correlation coefficient: Measures the strength and direction of the linear relationship between two variables
  • Both measures provide additional insight into the accuracy and reliability of the linear model
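
Both measures can be computed directly from the data, as the sketch below shows. It assumes the common convention of dividing the sum of squared residuals by $n - 2$ for the standard error of estimate, which the text above does not specify; the function names and data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient, between -1 and 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def standard_error_of_estimate(xs, ys):
    """sqrt(SS_res / (n - 2)); the n - 2 divisor is an assumed convention."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    b = y_bar - m * x_bar
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(ss_res / (n - 2))


# Hypothetical data set used only for this example.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
see = standard_error_of_estimate(xs, ys)
print(round(r, 4))       # 0.7746
print(round(r ** 2, 4))  # 0.6
print(round(see, 4))     # 0.8944
```

For a line fitted with a single independent variable, squaring the Pearson coefficient gives $R^2$, which is why `r ** 2` comes out to 0.6 here.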

Key Terms to Review (24)

Coefficient of Determination: The coefficient of determination, denoted as $R^2$, is a statistical measure that represents the proportion of the variance in the dependent variable that is predictable from the independent variable(s) in a linear regression model. It provides an indication of the goodness of fit of the regression line to the observed data.
Correlation: Correlation is a statistical measure that describes the degree and direction of the linear relationship between two variables. It quantifies how changes in one variable are associated with changes in another variable.
Dependent Variable: The dependent variable is the outcome or response variable that is measured or observed in a study. It is the variable that depends on or is influenced by the independent variable.
Exponential Relationship: An exponential relationship is a mathematical function where the independent variable is the exponent, and the dependent variable grows or decays at a rate that is proportional to its current value. This type of relationship is characterized by rapid, accelerating growth or decay over time.
Extrapolation: Extrapolation is the process of using a known set of data or information to estimate or predict values or outcomes beyond the original range of observation. It involves extending a trend or pattern observed within a dataset to make inferences about future or unobserved values.
Independent Variable: The independent variable is the variable that is manipulated or changed in an experiment or study to observe its effect on the dependent variable. It is the factor that the researcher has control over and deliberately varies to measure its impact on the outcome.
Interpolation: Interpolation is the process of estimating or predicting the value of a variable between two known data points. It is a technique used to estimate the value of a function or a set of data at an intermediate point based on the values at surrounding points.
Least Squares Regression Line: The least squares regression line is the straight line fitted to a set of data points by minimizing the sum of the squared differences between the observed values and the predicted values. It is a widely used tool for analyzing the relationship between two variables.
Line of Best Fit: The line of best fit, also known as the regression line, is a line that best represents the relationship between two variables in a scatter plot. It is used to make predictions and analyze the strength of the linear relationship between the variables.
Linear Regression: Linear regression is a statistical technique used to model the linear relationship between a dependent variable and one or more independent variables. It is a widely used method for fitting a straight line to a set of data points in order to predict or estimate the value of the dependent variable based on the values of the independent variables.
Linear Relationship: A linear relationship is a mathematical association between two variables where the change in one variable is proportional to the change in the other variable. This type of relationship can be represented by a straight line on a graph, indicating a constant rate of change between the variables.
Logarithmic Relationship: A logarithmic relationship is a mathematical relationship between two variables where one variable is the logarithm of the other. It is the inverse of an exponential relationship: the dependent variable changes rapidly at first and then more slowly as the independent variable increases.
Negative Correlation: Negative correlation is a statistical relationship between two variables where an increase in one variable is associated with a decrease in the other variable. It indicates an inverse or opposite relationship between the variables.
Nonlinear Relationship: A nonlinear relationship is a type of relationship between two variables where the change in one variable is not proportional to the change in the other variable. This means the relationship between the variables is not linear and cannot be represented by a straight line.
Outliers: Outliers are data points that lie an abnormal distance from other values in a data set. They are observations that are numerically distant from the rest of the data, and can have a significant impact on statistical analyses and the fitting of linear models.
Pearson Correlation Coefficient: The Pearson correlation coefficient is a statistical measure that quantifies the strength and direction of the linear relationship between two variables. It ranges from -1 to 1, with -1 indicating a perfect negative correlation, 0 indicating no linear correlation, and 1 indicating a perfect positive correlation.
Positive Correlation: Positive correlation is a statistical relationship where two variables tend to move in the same direction. As one variable increases, the other variable also increases, and vice versa. This relationship is commonly observed when analyzing data and fitting linear models.
Quadratic Relationship: A quadratic relationship is a mathematical relationship between two variables where one variable is a function of the square of the other variable. This type of relationship is commonly represented by a parabolic curve and is often seen in various scientific and real-world applications.
R-squared: R-squared, also known as the coefficient of determination, is a statistical measure that represents the proportion of the variance in the dependent variable that is predictable from the independent variable(s) in a linear regression model. It is a key metric used to assess the goodness of fit of a linear regression model.
Residuals: Residuals, in the context of statistical modeling, refer to the differences between the observed values and the predicted or fitted values from a model. They represent the unexplained portion of the data and provide insights into the quality and fit of the model.
Scatter Plot: A scatter plot is a type of data visualization that displays the relationship between two numerical variables by plotting individual data points on a coordinate plane. It allows for the identification of patterns, trends, and the strength of the association between the variables.
Slope-Intercept Form: The slope-intercept form is a way to express the equation of a linear function, where the slope and y-intercept of the line are explicitly shown. This form allows for easy interpretation of the line's behavior and is widely used in the study of linear functions, their graphs, and the fitting of linear models to data.
Standard Error of Estimate: The standard error of estimate is a measure of the accuracy of predictions made with a regression line. It represents the average amount that the observed values vary from the predicted values of the dependent variable.
Y-intercept: The y-intercept is the point where a line or curve intersects the y-axis, representing the value of the function when the independent variable (x) is equal to zero. It is a critical parameter that describes the behavior of various functions, including linear, quadratic, polynomial, and exponential functions.