Scatterplots and Regression

Overview

Scatterplots and Regression is a medium-to-hard topic in the Problem-Solving and Data Analysis domain of the digital SAT. Questions ask students to identify the direction and strength of the association between two variables, interpret the slope and y-intercept of a line of best fit in a real-world context, compute and interpret residuals, distinguish among linear, quadratic, and exponential models, and tell correlation apart from causation. For the May 2026 SAT, test-takers can expect 2–3 regression questions per Math module. A calculator is allowed on every Math question, and the built-in Desmos graphing calculator can fit regression lines.

Key Points

1. Types of Association

Association	Pattern on Scatterplot
Positive	Points trend upward left to right
Negative	Points trend downward left to right
No association	Points scattered with no pattern
Linear	Points cluster around a straight line
Nonlinear	Points cluster around a curve

Strength of association: the closer the points cluster to the trend line or curve, the stronger the association.

2. Line of Best Fit

The line of best fit (least-squares regression line) minimizes the sum of squared vertical distances from all points to the line.

Equation form: y = mx + b

Parameter	Meaning in Context
Slope (m)	Predicted change in y for each 1-unit increase in x
y-intercept (b)	Predicted value of y when x = 0
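
To make the least-squares idea concrete, here is a minimal Python sketch that computes m and b from the standard formulas (slope = sum of cross-deviations divided by sum of squared x-deviations). The function name `fit_line` and the data set are hypothetical, chosen only for illustration; on the SAT itself you would read the line from the graph or let Desmos fit it.

```python
def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: sum of cross-deviations over sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    b = mean_y - m * mean_x
    return m, b


# Hypothetical data: hours studied vs. test score.
hours = [1, 2, 3, 4, 5]
scores = [62, 68, 71, 77, 82]

m, b = fit_line(hours, scores)
print(f"slope = {m:.2f}, y-intercept = {b:.2f}")
# prints: slope = 4.90, y-intercept = 57.30
```

For this made-up data set, the slope says each extra hour is associated with a predicted gain of about 4.9 points, and the y-intercept is the predicted score at 0 hours.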

Slope interpretation (3-step method):

Calculate or read the slope value
Identify the units of both axes
Write: “For every 1 [x-unit] increase in [x-variable], the predicted [y-variable] [increases/decreases] by [|m|] [y-unit]”

Example: If a line of best fit for (study hours, test score) has slope = 4.5: “For every additional hour of study, the predicted test score increases by 4.5 points.”
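
The sentence template above can be sketched as a small Python helper. This is only an illustration of the 3-step method; the function name `interpret_slope` and its parameter names are assumptions made for this example.

```python
def interpret_slope(m, x_unit, x_var, y_var, y_unit):
    """Build the context sentence for a slope m using the template above."""
    if m == 0:
        return f"The predicted {y_var} does not change as {x_var} increases."
    # The sign of m gives the direction; the size of the change is |m|.
    direction = "increases" if m > 0 else "decreases"
    return (
        f"For every 1 {x_unit} increase in {x_var}, "
        f"the predicted {y_var} {direction} by {abs(m)} {y_unit}."
    )


sentence = interpret_slope(4.5, "hour", "study time", "test score", "points")
print(sentence)
# prints: For every 1 hour increase in study time, the predicted test score increases by 4.5 points.
```

Note that the sentence always says "predicted": the slope describes the line, not any individual data point.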

3. Residuals

$\text{Residual} = \text{Actual value} - \text{Predicted value}$

Residual sign	Meaning
Positive	Actual point is ABOVE the line
Negative	Actual point is BELOW the line
Zero	Actual point is exactly ON the line

A residual plot (residuals vs. x-values) with random scatter around zero indicates a good model fit. A clear pattern in the residual plot, such as a U-shape, suggests that a different type of model would fit the data better.
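
As a quick check of the sign convention, the following Python sketch computes residuals for a few points against a given line. The line y = 2x + 1 and the data points are hypothetical, and the helper names `residuals` and `position` are made up for this example.

```python
def residuals(xs, ys, m, b):
    """Residual = actual y minus the y predicted by the line y = m*x + b."""
    return [y - (m * x + b) for x, y in zip(xs, ys)]


def position(residual):
    """Translate the sign of a residual into the point's position."""
    if residual > 0:
        return "above the line"
    if residual < 0:
        return "below the line"
    return "on the line"


# Hypothetical line of best fit and data points.
m, b = 2.0, 1.0
xs = [1, 2, 3, 4]
ys = [4, 5, 6, 9]

res = residuals(xs, ys, m, b)
for x, y, r in zip(xs, ys, res):
    print(f"({x}, {y}): residual = {r:+.1f}, {position(r)}")
```

Here the predicted values are 3, 5, 7, and 9, so the residuals are +1, 0, −1, and 0.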

4. Correlation Coefficient r

r ranges from −1 to +1
r close to +1 → strong positive linear association
r close to −1 → strong negative linear association
r close to 0 → weak or no linear association
|r| ≥ 0.8 → generally considered a strong association

r measures only the direction and strength of linear association. A curved relationship (for example, points along a symmetric parabola) may have r ≈ 0 even though the association is very strong.
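
The SAT does not ask you to compute r by hand, but a short Python sketch can show why r misses curved relationships. The function `pearson_r` below implements the standard Pearson formula; both data sets are invented for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [-3, -2, -1, 0, 1, 2, 3]
r_line = pearson_r(xs, [2 * x + 1 for x in xs])  # points exactly on a line
r_curve = pearson_r(xs, [x ** 2 for x in xs])    # points exactly on a parabola

print(f"perfect line:     r = {r_line:.2f}")
print(f"perfect parabola: r = {r_curve:.2f}")
```

The parabola data follow a perfectly predictable pattern, yet r comes out as 0 because the upward and downward halves cancel in the linear measure.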

5. Choosing the Right Model

Model	Shape	Context clues
Linear	Straight line	Constant rate of change; “increases by X per unit”
Quadratic	Parabola (U or arch)	Projectile motion; “slows then reverses”
Exponential	J-curve or decay	Growth/decay rates; “doubles every,” “half-life”
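
When a question gives a table with equally spaced x-values, one common check is to compare successive y-values: constant first differences point to a linear model, constant second differences to a quadratic model, and constant ratios to an exponential model. The Python sketch below applies that check to exact (noise-free) tables; the function name `classify` and the sample tables are hypothetical, and real scatterplot data will rarely match this cleanly.

```python
def classify(ys, tol=1e-9):
    """Guess the model type from y-values taken at equally spaced x-values."""

    def is_constant(values):
        return max(values) - min(values) <= tol

    first = [b - a for a, b in zip(ys, ys[1:])]
    if is_constant(first):
        return "linear"       # constant rate of change

    second = [b - a for a, b in zip(first, first[1:])]
    if is_constant(second):
        return "quadratic"    # constant second differences

    # Ratios are only defined when no y-value is zero.
    if all(y != 0 for y in ys):
        ratios = [b / a for a, b in zip(ys, ys[1:])]
        if is_constant(ratios):
            return "exponential"  # constant growth or decay factor

    return "unclear"


print(classify([5, 8, 11, 14]))   # adds 3 each step
print(classify([1, 4, 9, 16]))    # differences 3, 5, 7
print(classify([3, 6, 12, 24]))   # doubles each step
```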

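Desmos regression (for the digital SAT): enter the data in a table with columns x₁ and y₁, then type one of the following, using ~ in place of =: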

Linear: type y₁ ~ mx₁ + b
Quadratic: type y₁ ~ ax₁² + bx₁ + c
Exponential: type y₁ ~ ab^x₁
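
For readers curious about what an exponential fit does, here is a minimal Python sketch of one common approach: take the natural log of each y-value and fit a straight line to (x, ln y). This is an illustrative assumption, not a description of how Desmos computes its result, and a calculator's regression may use a different fitting procedure, so coefficients on noisy data need not match exactly. The function name `fit_exponential` and the data are hypothetical.

```python
from math import exp, log


def fit_exponential(xs, ys):
    """Fit y = a * b**x by least squares on (x, ln y). Requires every y > 0."""
    logs = [log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_l = sum(logs) / n
    slope = (
        sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_l - slope * mean_x
    # ln y = intercept + slope * x  means  y = e**intercept * (e**slope)**x
    return exp(intercept), exp(slope)


# Hypothetical data that doubles at every step, starting from 3.
a, b = fit_exponential([0, 1, 2, 3, 4], [3, 6, 12, 24, 48])
print(f"a = {a:.3f}, b = {b:.3f}")
```

For this exact doubling table the fit recovers a starting value of about 3 and a growth factor of about 2.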

6. Correlation vs. Causation

A correlation between X and Y does NOT mean X causes Y.

Possible explanations when X and Y are correlated:

X causes Y
Y causes X
A third variable Z causes both X and Y (confounding/lurking variable)
Coincidence

The SAT frequently presents a scenario and asks what can or cannot be concluded. Key language: “cannot be concluded from this study” or “the data suggest but do not prove.”
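
The third-variable explanation can be illustrated with a small simulation. In the Python sketch below, X and Y never influence each other; both are built from a lurking variable Z plus independent noise. All numbers, the seed, and the variable names are assumptions chosen for illustration only.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


random.seed(0)  # fixed seed so the run is repeatable

z = [random.uniform(0, 10) for _ in range(500)]  # lurking variable Z
x = [zi + random.gauss(0, 1) for zi in z]        # X depends on Z only
y = [zi + random.gauss(0, 1) for zi in z]        # Y depends on Z only

r_xy = pearson_r(x, y)

# Remove Z's contribution; what is left is independent noise.
x_left = [xi - zi for xi, zi in zip(x, z)]
y_left = [yi - zi for yi, zi in zip(y, z)]
r_left = pearson_r(x_left, y_left)

print(f"r between X and Y:           {r_xy:.2f}")
print(f"r after removing Z's effect: {r_left:.2f}")
```

X and Y come out strongly correlated even though neither causes the other, and the correlation nearly vanishes once Z is accounted for.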

7. Interpolation vs. Extrapolation

Type	Definition	Reliability
Interpolation	Prediction within the observed data range	More reliable
Extrapolation	Prediction beyond the observed data range	Less reliable; model may not hold
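
The distinction in the table comes down to one comparison: is the new x-value inside the range of observed x-values? A minimal Python sketch, with a hypothetical function name and made-up data:

```python
def prediction_type(x_new, observed_xs):
    """Label a prediction by whether x_new lies inside the observed x-range."""
    if min(observed_xs) <= x_new <= max(observed_xs):
        return "interpolation"
    return "extrapolation"


# Hypothetical observed x-values (for example, hours studied).
observed = [1, 2, 3, 4, 5]

print(prediction_type(3.5, observed))  # inside 1 to 5
print(prediction_type(12, observed))   # beyond 5
```

A prediction at x = 3.5 is interpolation, while a prediction at x = 12 is extrapolation and should be treated with caution.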

Pitfalls and Common Mistakes

Mistake 1: Confusing correlation with causation. A strong r between two variables does not prove that one causes the other. Fix: Look for a third-variable explanation; for observational data, the correct SAT answer typically says “an association exists” without claiming causation.

Mistake 2: Misidentifying the sign of a residual. Students confuse which direction is positive. Fix: Residual = Actual − Predicted. If the point is above the line, actual > predicted, so residual > 0.

Mistake 3: Interpreting the y-intercept as meaningful when x = 0 is outside the data range. For a model of adult heights vs. ages, the y-intercept (age = 0) may not make real-world sense. Fix: Note whether x = 0 falls within the data range; if not, the y-intercept is a mathematical artifact, not a meaningful prediction.

Mistake 4: Assuming a high |r| means a linear model is the best choice. r measures linear association only, so curved data can still produce a fairly large |r|, and a quadratic or exponential model may fit better. Fix: Always inspect the shape of the scatterplot before choosing a model.

Mistake 5: Extrapolating far beyond the data and trusting the prediction. The line of best fit may not hold outside the data range. Fix: Flag any prediction as extrapolation when it is outside the observed x-values, and treat it with caution.

Quick Reference Card

Concept	Formula / Rule
Residual	Actual − Predicted (positive = above line)
Slope interpretation	Change in y per 1-unit increase in x
r range	−1 ≤ r ≤ +1; closer to ±1 = stronger
Correlation ≠ causation	Association only; a causal claim requires a randomized experiment
Linear model	y = mx + b (constant rate)
Exponential model	y = ab^x (percent-based growth/decay)
Interpolation	Within data range: more reliable
Extrapolation	Beyond data range: less reliable
Desmos linear regression	y₁ ~ mx₁ + b
