import seaborn as sns
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
AE 16: Principal component analysis
In this application exercise, we will:
- Learn about Principal Component Analysis (PCA).
- Load the Penguins Dataset: Import and explore the dataset to understand its structure and the features available for analysis.
- Preprocess the Data: Clean the data by handling missing values and standardize the numerical features for PCA.
- Perform PCA: Apply Principal Component Analysis to reduce the dimensionality of the data and extract the principal components.
- Visualize the PCA Result: Create a scatter plot of the principal components to visualize the clustering of different penguin species.
Learn about PCA
Exercise 1
Watch this video on Principal Component Analysis:
- What were three takeaways from this video? Include how you think linear algebra contributes to PCA:
Add response here.
PCA in Python
Packages
We will primarily use the seaborn and sklearn packages.
Exercise 2
Load the Penguins Dataset using seaborn
# add code here
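One possible starting point, offered as a sketch rather than the expected answer: seaborn ships a loader, `sns.load_dataset`, that fetches the penguins data by name (it downloads the file on first use, so an internet connection is assumed). The helper name `load_and_summarize` is hypothetical and only wraps the call so that nothing is downloaded until you run it.

```python
import seaborn as sns


def load_and_summarize():
    """Load the penguins data and print a quick overview (hypothetical helper)."""
    # Downloads the dataset on first use, then reads it from a local cache.
    penguins = sns.load_dataset("penguins")

    # First few rows: shows the available columns and their types of values.
    print(penguins.head())

    # Number of missing values per column, useful for the preprocessing step.
    print(penguins.isna().sum())

    return penguins
```

Calling `penguins = load_and_summarize()` in a cell returns a pandas DataFrame that the later exercises can build on.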
Exercise 3
Preprocess the data
We need to handle missing values, select the numerical features, and standardize them for PCA.
# add code here
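A minimal sketch of one way to do this, assuming the column names used by seaborn's penguins dataset. The small `penguins` frame below is a made-up stand-in so the block runs on its own; in the exercise you would use the DataFrame loaded in Exercise 2 instead. Dropping incomplete rows is only one option for handling missing values.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the real data: the values are invented,
# only the column names follow the seaborn penguins dataset.
penguins = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Chinstrap", "Chinstrap", "Gentoo", "Gentoo"],
    "bill_length_mm": [39.1, None, 46.5, 50.0, 46.1, 50.0],
    "bill_depth_mm": [18.7, 17.4, 17.9, 19.5, 13.2, 16.3],
    "flipper_length_mm": [181.0, 186.0, 192.0, 196.0, 211.0, 230.0],
    "body_mass_g": [3750.0, 3800.0, 3500.0, 3900.0, 4500.0, 5700.0],
})

numeric_cols = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]

# Drop rows that are missing any numeric feature or the species label.
penguins_clean = penguins.dropna(subset=numeric_cols + ["species"]).reset_index(drop=True)

# Standardize each feature to mean 0 and unit variance so that no single
# measurement scale (for example grams vs. millimetres) dominates the PCA.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(penguins_clean[numeric_cols])
```

Keeping `penguins_clean` alongside `X_scaled` preserves the species labels in the same row order, which is handy when coloring the plot later.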
Exercise 4
Perform PCA
Use PCA from sklearn to reduce the dimensionality of the data. Hint: use two principal components.
# add code here
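As a sketch of what this step might look like, the block below fits a two-component PCA on a standardized matrix. The random `X` is a hypothetical stand-in with four correlated features so the example is self-contained; in the exercise you would pass your own standardized penguin features instead.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data: 50 rows, 4 features built from 2 underlying signals.
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 2))
X = np.column_stack([
    base[:, 0],
    base[:, 0] + 0.1 * rng.normal(size=50),
    base[:, 1],
    base[:, 1] + 0.1 * rng.normal(size=50),
])
X_scaled = StandardScaler().fit_transform(X)

# Keep the two directions of greatest variance.
pca = PCA(n_components=2)
components = pca.fit_transform(X_scaled)

# Share of the total variance captured by each retained component.
print(pca.explained_variance_ratio_)
```

The `explained_variance_ratio_` attribute is worth checking: if the two components capture most of the variance, a 2D scatter plot is a reasonable summary of the data.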
Exercise 5
Visualize the PCA Result
Use seaborn to visualize the principal components.
# add code here
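One way to draw the plot, sketched under the assumption that the PCA scores and species labels have been collected into a DataFrame. The frame `pca_df`, its column names `PC1` and `PC2`, and the values in it are hypothetical placeholders, not real PCA output.

```python
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the PCA scores joined with the species labels.
pca_df = pd.DataFrame({
    "PC1": [-1.8, -1.5, -0.4, -0.1, 1.9, 2.2],
    "PC2": [0.3, -0.2, 1.1, 0.9, -0.6, -0.8],
    "species": ["Adelie", "Adelie", "Chinstrap", "Chinstrap", "Gentoo", "Gentoo"],
})

# One point per penguin, colored by species, to see how the groups separate.
ax = sns.scatterplot(data=pca_df, x="PC1", y="PC2", hue="species")
ax.set_xlabel("Principal component 1")
ax.set_ylabel("Principal component 2")
ax.set_title("PCA of penguin measurements")
```

In a notebook the figure renders when the cell finishes; in a plain script you would typically add a `matplotlib.pyplot.show()` call at the end.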