Chapter 4

Correlation and Regression: The Science of Relationships


Correlation and Regression: Connecting the Dots

Data points never exist in isolation. Many pairs of variables, such as height and weight, education and income, or temperature and ice cream sales, influence or vary with each other. Statistics quantifies these relationships.

1. Correlation

Correlation indicates the extent to which two variables change together. The Correlation Coefficient (r) ranges from -1 to +1.

  • r > 0 (Positive Correlation): Both variables increase or decrease together.
  • r < 0 (Negative Correlation): As one increases, the other decreases.
  • r = 0 (No Correlation): No apparent linear relationship between the two.
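The coefficient can be computed directly from its definition: the covariance of the two variables scaled by both standard deviations. A minimal sketch in Python (the `pearson_r` helper is illustrative, not from the chapter):

```python
from math import sqrt

def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # co-variation of x and y
    sxx = sum((x - mx) ** 2 for x in xs)                     # variation of x
    syy = sum((y - my) ** 2 for y in ys)                     # variation of y
    return sxy / sqrt(sxx * syy)

# A noisy but positive relationship: r lands well above 0 but below +1.
print(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 6]))
```

A perfectly linear pair such as `[1, 2, 3]` and `[2, 4, 6]` returns exactly 1.0, matching the endpoints of the scale described above.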

Important
There is a strong positive correlation between ice cream sales and drowning incidents, but eating ice cream doesn’t cause drowning. A third variable, ‘Summer (High Temperature),’ affects both phenomena.

2. Simple Linear Regression

While correlation shows that a link exists, regression analysis describes the form of that link and lets us use it to predict one variable from the other.

y=ax+by = ax + b

It’s the process of finding the single best-fitting line (the Regression Line) that explains the relationship between data points.

Study Hours vs. Exam Scores (Sample Data)

| Study Hours (x) | Actual Score (y) | Predicted Score (ŷ) | Error (y − ŷ) |
| --- | --- | --- | --- |
| 2 hrs | 55 | 58 | -3 |
| 5 hrs | 75 | 73 | +2 |
| 8 hrs | 92 | 88 | +4 |
| 10 hrs | 95 | 98 | -3 |
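The best-fitting line is the one that minimizes the squared errors in the last column. A sketch of the least-squares formulas applied to the table's data (`fit_line` is a hypothetical helper; the table's predicted scores are rounded for illustration, so an exact fit will differ slightly):

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Least-squares slope a and intercept b for the line y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx  # the line always passes through the point of means
    return a, b

hours = [2, 5, 8, 10]
scores = [55, 75, 92, 95]
a, b = fit_line(hours, scores)
print(f"score ≈ {a:.2f} * hours + {b:.2f}")
```

Plugging a new value of x into the fitted line gives the prediction ŷ; the gap between y and ŷ is the error shown in the table.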

3. Accuracy of the Model: R²

R² (R-squared) measures how much of the variation in the actual data is explained by our regression line. The closer it is to 1, the higher the predictive power of the model.
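R² can be computed as one minus the ratio of unexplained variation (squared errors around the line) to total variation (squared deviations from the mean). A sketch using the table's illustrative numbers (`r_squared` is a hypothetical helper, not from the chapter):

```python
def r_squared(ys: list[float], preds: list[float]) -> float:
    """R² = 1 - SS_res / SS_tot: share of the variation explained by the model."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))  # unexplained variation
    ss_tot = sum((y - my) ** 2 for y in ys)                # total variation
    return 1 - ss_res / ss_tot

# Actual and predicted scores from the study-hours table above.
r2 = r_squared([55, 75, 92, 95], [58, 73, 88, 98])
print(f"R² = {r2:.3f}")
```

Here the small residuals (-3, +2, +4, -3) relative to the large spread in scores yield an R² close to 1, i.e. the line explains most of the variation.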


💡 Professor’s Tip

Regression analysis is the root of almost all quantitative analysis, from calculating Beta (β) in finance to predicting risk rates in actuarial science. Modern machine learning algorithms are, at their core, sophisticated extensions of this fundamental regression analysis.
