Skip to main content
Chapter 6

Multiple Regression Analysis: Variables of a Complex World

#Multiple Regression#Independent Variable#Multicollinearity (VIF)#Adjusted R-squared

Multiple Regression Analysis: One Reason is Rarely Enough

Real-world events are seldom explained by a single cause. For example, apartment prices are determined not just by size (simple regression) but also by age, distance to transit, and school district quality. Statistics addresses this through Multiple Regression Analysis.

1. Structure of the Multiple Regression Equation

y=β0+β1x1+β2x2+...+βkxk+ϵy = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_k x_k + \epsilon

Each β\beta value represents the unique influence a variable has on the dependent variable (yy) when other variables are held constant, specifically for a one-unit change in that variable.

2. The Conflict Between Variables: Multicollinearity

The most significant pitfall to watch out for in multiple regression analysis is ‘multicollinearity.’ This occurs when independent variables are too strongly correlated with each other.

Case Study: Accounting Data and Multicollinearity

Variable AVariable BStatusProblem
Ad SpendBrand AwarenessHigh CorrelationInability to distinguish which variable contributed to sales
HeightWeightHigh CorrelationOne variable already contains the information of the other
TemperatureHumidityModerate CorrelationCommon scenario, manageable

Important
Multicollinearity causes the statistical significance of individual regression coefficients to decrease and results to become unstable. To check for this, we use the index; typically, a value of 10 or higher indicates a problem.

3. Evaluating Model Quality: Adjusted R2R^2

While simple regression uses the coefficient of determination (R2R^2), multiple regression has a problem where R2R^2 automatically increases as more variables are added. Adjusted R2R^2 corrects for this by penalizing the addition of unnecessary variables.


💡 Professor’s Tip

A good regression model is not one with ‘many variables,’ but one that ‘explains the phenomenon best with the fewest number of key variables.’ This is often called Occam’s Razor, and statisticians measure this efficiency using indices such as AIC or BIC.

🔗 Next Step