Non-parametric Statistics and PCA: Data Analysis Beyond Common Sense
Non-parametric Statistics and PCA: Getting Closer to the Essence of Data
Up until now, we have performed analysis assuming that data follows a neat bell shape (the normal distribution). In reality, however, plenty of data is far from normal, and when there are too many variables it is difficult to tell which ones matter.
1. Non-parametric Statistics: Setting Aside Assumptions to Gain Freedom
When the amount of data is too small or the distribution is extremely skewed, we use ‘ranks’ instead of the ‘values’ of the data.
Comparison of Parametric vs. Non-parametric Statistics
| Category | Parametric | Non-parametric |
|---|---|---|
| Assumptions | Data follows a specific distribution (typically normal) | No specific distribution shape assumed |
| Data Type | Continuous numerical data | Ordinal, rank, nominal data |
| Representative Analysis | t-test, ANOVA | Wilcoxon, Kruskal-Wallis |
| Pros and Cons | Higher statistical power when the assumptions hold, but results can be misleading when they are broken | Somewhat lower power when the data really is normal, but applicable to a much wider range of data |
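
To make the idea of using ranks instead of values concrete, here is a minimal sketch of the statistic behind the Wilcoxon rank-sum test (also known as the Mann-Whitney U test). The function names `average_ranks` and `mann_whitney_u` and both samples are made up for illustration; in practice a statistics library would also supply the p-value.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of sorted positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """U statistic for group_a: its rank sum minus the smallest possible rank sum."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a = len(group_a)
    rank_sum_a = sum(ranks[:n_a])
    return rank_sum_a - n_a * (n_a + 1) / 2


# Hypothetical samples; 250.0 is a deliberately extreme outlier.
group_a = [12.0, 15.0, 14.0, 250.0]
group_b = [9.0, 11.0, 10.0, 13.0]

u_with_outlier = mann_whitney_u(group_a, group_b)
# Replacing the outlier with 16.0 leaves every rank unchanged, so U is the same.
u_without_outlier = mann_whitney_u([12.0, 15.0, 14.0, 16.0], group_b)
```

Because only the ordering enters the calculation, the outlier has no more influence than any other largest value would. That is the sense in which rank-based methods tolerate skewed data.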
2. Principal Component Analysis (PCA): Compression and Summarization of Information
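Data with 100 variables is very hard to visualize or interpret directly. PCA (Principal Component Analysis) is a technique that replaces the original variables with a small number of new ones, often just 2-3, called principal components. Each principal component is a weighted combination of the original variables, chosen to preserve as much of the variance (information) in the data as possible.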
Standardization: because the variables have different units, rescale each one to mean 0 and variance 1.
Covariance matrix: create a 'map' of how the variables change together.
Principal component directions: find the directions (the eigenvectors of the covariance matrix) along which the data is most spread out.
Selection: retain only the most important components, such as the 1st and 2nd, and discard the rest.
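
The four steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `pca` is hypothetical, the data is synthetic (random numbers generated from two hidden factors), and a real analysis would more likely call a library implementation.

```python
import numpy as np


def pca(data, n_components=2):
    """PCA via eigen-decomposition of the covariance matrix of standardized data."""
    # Step 1: standardize each column to mean 0 and variance 1.
    standardized = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    # Step 2: covariance matrix of the standardized data
    # (this equals the correlation matrix of the original data).
    cov = np.cov(standardized, rowvar=False)
    # Step 3: eigenvectors are the principal directions,
    # eigenvalues are the variances along those directions.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]  # eigh returns ascending order
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    # Step 4: keep the leading components and project the data onto them.
    components = eigenvectors[:, :n_components]
    scores = standardized @ components
    return scores, components, eigenvalues


# Synthetic data (an assumption for illustration): 200 samples of 5 variables
# driven by 2 hidden factors plus a little noise.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
data = hidden @ mixing + 0.1 * rng.normal(size=(200, 5))

scores, components, eigenvalues = pca(data, n_components=2)
```

Here `scores` holds the 200 samples expressed in the two new variables. Since the data was standardized, the eigenvalues add up to the number of original variables (5 in this sketch).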
3. Effect of PCA: Amount of Explained Variance
The scree plot below shows how much of the total variance each principal component explains when 10 variables are reduced to 5 principal components through PCA.
Proportion of Explained Variance by Principal Component (Scree Plot)
The plot shows that the 1st and 2nd principal components alone explain approximately 75% of the total variance (information).
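
The proportion of explained variance is simply each eigenvalue divided by the sum of all eigenvalues. The sketch below shows that arithmetic. The eigenvalues are hypothetical numbers chosen so that the first two components add up to 75%; they are not the data behind the plot, and the helper names are made up for illustration.

```python
from itertools import accumulate


def explained_variance_ratio(eigenvalues):
    """Share of the total variance carried by each principal component."""
    total = sum(eigenvalues)
    return [value / total for value in eigenvalues]


def components_needed(eigenvalues, threshold):
    """Smallest number of leading components whose cumulative share reaches threshold."""
    cumulative = accumulate(explained_variance_ratio(eigenvalues))
    for count, share in enumerate(cumulative, start=1):
        if share >= threshold - 1e-12:  # small tolerance for floating-point rounding
            return count
    return len(eigenvalues)


# Hypothetical eigenvalues for 10 standardized variables, sorted from largest
# to smallest (their sum equals the number of variables).
eigenvalues = [4.5, 3.0, 0.8, 0.5, 0.4, 0.3, 0.2, 0.15, 0.1, 0.05]

ratios = explained_variance_ratio(eigenvalues)
first_two = ratios[0] + ratios[1]  # combined share of the 1st and 2nd components
```

A helper like `components_needed(eigenvalues, 0.9)` answers the practical question of how many components to keep in order to reach a chosen share of the variance.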
💡 Professor’s Tip
PCA is not simply about 'reducing data'; it is about 'finding the skeleton of the data.' It helps you find the directions along which truly meaningful signals extend amid all the noise.