Correlation and Regression: The Science of Relationships
Correlation and Regression: Connecting the Dots
Data points rarely exist in isolation. Many pairs of variables, such as height and weight, education and income, or temperature and ice cream sales, influence or vary with each other. Statistics quantifies these ‘relationships.’
1. Correlation
Correlation indicates the extent to which two variables change together. The Correlation Coefficient (r) ranges from -1 to +1; the closer its absolute value is to 1, the stronger the linear relationship.
- r > 0 (Positive Correlation): Both increase or decrease together.
- r < 0 (Negative Correlation): As one increases, the other decreases.
- r ≈ 0 (No Correlation): No apparent linear relationship between the two.
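As a minimal sketch of how such a coefficient can be computed, the snippet below implements the Pearson correlation coefficient (written `r` here) in plain Python. The function name `pearson_r` is hypothetical, and the input values are the study-hours and exam-score figures from this chapter's sample data, used purely for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Illustrative values: study hours and exam scores
hours = [2, 5, 8, 10]
scores = [55, 75, 92, 95]

r = pearson_r(hours, scores)
print(f"r = {r:.3f}")  # r = 0.982
```

A value this close to +1 indicates a strong positive correlation. Negating one of the two lists flips the sign of the result without changing its magnitude.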
Important
Correlation does not imply causation. There is a strong positive correlation between ice cream sales and drowning incidents, but eating ice cream doesn’t cause drowning. A third variable, ‘Summer (High Temperature),’ affects both phenomena.
2. Simple Linear Regression
While correlation shows “there is a link, and how strong it is,” regression analysis shows “how much one variable changes when the other changes, and how we can make predictions with it.”
It’s the process of finding the single best-fitting line (the Regression Line) that explains the relationship between data points.
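One common way to make "best-fitting" precise is ordinary least squares, which picks the line that minimizes the sum of squared errors. The text above does not name a fitting method, so treat this as an assumption. Under that assumption, with intercept $a$ and slope $b$ (symbols chosen here for illustration), the line and its coefficients are:

$$
\hat{y} = a + b\,x, \qquad
b = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i}(x_i - \bar{x})^2}, \qquad
a = \bar{y} - b\,\bar{x}
$$

Here $\bar{x}$ and $\bar{y}$ are the sample means of the two variables, and $\hat{y}$ is the score the line predicts for a given $x$.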
Study Hours vs. Exam Scores (Sample Data)
| Study Hours (x) | Actual Score (y) | Predicted Score (ŷ) | Error (y-ŷ) |
|---|---|---|---|
| 2 hrs | 55 | 58 | -3 |
| 5 hrs | 75 | 73 | +2 |
| 8 hrs | 92 | 88 | +4 |
| 10 hrs | 95 | 98 | -3 |
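The Predicted Score column is reproduced exactly by the line ŷ = 48 + 5x. The sketch below, assuming an ordinary least-squares fit (the helper name `fit_line` is hypothetical), fits a line to the same four points and then recomputes the table's predictions and errors. The exact least-squares coefficients come out slightly different from 48 and 5, which suggests the table uses rounded values.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Sample data from the table
hours = [2, 5, 8, 10]
scores = [55, 75, 92, 95]

intercept, slope = fit_line(hours, scores)
print(f"least-squares line: y_hat = {intercept:.2f} + {slope:.2f} * x")
# least-squares line: y_hat = 46.98 + 5.16 * x

# The rounded line 48 + 5x reproduces the table's columns
table_pred = [48 + 5 * x for x in hours]
errors = [y - p for y, p in zip(scores, table_pred)]
print(table_pred)  # [58, 73, 88, 98]
print(errors)      # [-3, 2, 4, -3]
```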
3. Accuracy of the Model: R²
R² (R-squared) measures how much of the variation in the actual data is explained by our regression line. The closer it is to 1, the better the line fits the data and the smaller the prediction errors are relative to the overall spread of the scores.
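A common definition is R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared errors and SS_tot is the sum of squared deviations of the actual values from their mean. The following sketch applies that definition to the actual and predicted scores from the sample table; the function name `r_squared` is hypothetical.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(actual) / len(actual)
    # Unexplained variation: squared gaps between actual and predicted values
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    # Total variation: squared gaps between actual values and their mean
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1 - ss_res / ss_tot

# Actual and predicted scores from the sample table
actual = [55, 75, 92, 95]
predicted = [58, 73, 88, 98]

r2 = r_squared(actual, predicted)
print(f"R^2 = {r2:.3f}")  # R^2 = 0.963
```

For this sample, the line accounts for roughly 96% of the variation in exam scores.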
💡 Professor’s Tip
Regression analysis underlies much of quantitative analysis, from calculating Beta (β) in finance to predicting risk rates in actuarial science. Many modern machine learning algorithms can be seen as sophisticated extensions of this fundamental regression analysis.