Principal Component Analysis (PCA)
PCA is used to reduce the dimensionality of a data set by finding a new, smaller set of variables that retains most of the information in the sample, which makes it useful for regression and classification. So basically, compression while keeping the substance.
It is, at heart, an optimization problem where we need to maximize a sum. For example, when you project a point $x$ onto a unit vector $u$, you get a new point whose magnitude is:
$$u^\top x$$
If $\|u\| = 1$ and $(u^\top x)^2$ is the amount of information stored about a point $x$, then the optimization problem we need to solve is
$$\max_{\|u\| = 1} \; \sum_{i=1}^{n} \left(u^\top x_i\right)^2$$
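As a quick, made-up numeric illustration of this objective (the points and the candidate direction `u` below are arbitrary, not taken from any particular data set), the sum of squared projection magnitudes can be computed directly:

```python
import numpy as np

# Arbitrary, roughly centered 2-D points, purely for illustration
X = np.array([[ 2.0,  1.0 ],
              [-1.5, -0.5 ],
              [ 0.5,  0.25],
              [-1.0, -0.75]])

# A candidate direction, normalized so that ||u|| = 1
u = np.array([1.0, 1.0])
u = u / np.linalg.norm(u)

# Magnitude of each point's projection onto u, i.e. u^T x
projections = X @ u

# The sum that PCA maximizes over all unit vectors u
print("sum of squared projections:", np.sum(projections ** 2))
```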
There are a few steps to carrying out PCA, and they involve Lagrange functions, eigenvalues, and eigenvectors:
- Standardization: Ensuring that each variable has a mean of 0 and a standard deviation of 1.
The standardized value of each feature is
$$z = \frac{x - \mu}{\sigma}$$
Here,
- $\mu$ is the mean of the independent features,
- $\sigma$ is the standard deviation of the independent features.
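A minimal sketch of this step, assuming the data sits in a NumPy array `X` with one sample per row (the array values and variable names are illustrative only):

```python
import numpy as np

# Illustrative data matrix: rows are samples, columns are features
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2],
              [3.1, 3.0]])

# Standardize each feature: subtract its mean, divide by its standard deviation
mu = X.mean(axis=0)
sigma = X.std(axis=0)
X_std = (X - mu) / sigma
```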
- Covariance Matrix Computation
To find the covariance between two features $x$ and $y$, we can use the formula:
$$\operatorname{cov}(x, y) = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)$$
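Continuing the same sketch, the covariance matrix of the standardized data `X_std` from the previous snippet can be computed with NumPy's `np.cov`, or equivalently from the formula above:

```python
# Covariance matrix of the standardized data (reuses X_std from the sketch above).
# rowvar=False tells np.cov that columns, not rows, are the variables.
C = np.cov(X_std, rowvar=False)

# Equivalent, since the columns of X_std have zero mean:
# C = X_std.T @ X_std / (len(X_std) - 1)
```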
- Compute the eigenvalues ($\lambda$) and eigenvectors ($v$) of the covariance matrix to identify the principal components
For the covariance matrix $C$, an eigenvector $v$ and its eigenvalue $\lambda$ satisfy
$$Cv = \lambda v \quad\Rightarrow\quad (C - \lambda I)\,v = 0,$$
where $C - \lambda I$ needs to be a singular matrix (i.e. non-invertible) for a non-zero $v$ to exist. Therefore, we can find the eigenvalues by using the equation:
$$\det(C - \lambda I) = 0$$
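Still continuing the sketch, NumPy's `np.linalg.eigh` solves this eigenvalue problem for the symmetric covariance matrix `C`; sorting by eigenvalue and projecting onto the leading eigenvectors then yields the reduced data (the choice of `k` below is illustrative):

```python
# Eigenvalues and eigenvectors of the covariance matrix C from the sketch above.
# eigh is used because the covariance matrix is symmetric.
eigenvalues, eigenvectors = np.linalg.eigh(C)

# Sort from largest to smallest eigenvalue; the leading eigenvectors
# are the principal components.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Keep the first k principal components and project the data onto them
k = 1
X_reduced = X_std @ eigenvectors[:, :k]
```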
Further down the line, I will return with a full C++ or Python implementation of PCA, as promised.