Multivariate Statistics
Understanding relationships among many variables.
In real-world Machine Learning, we rarely deal with a single variable. Data is high-dimensional. Multivariate statistics provides the tools to analyze, visualize, and reduce the complexity of such data.
1. Multivariate Normal Distribution
The generalization of the 1D Gaussian (bell curve) to $d$ dimensions. Instead of a scalar mean $\mu$ and variance $\sigma^2$, we have a mean vector $\boldsymbol{\mu}$ and a Covariance Matrix $\Sigma$.
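Its density is fully specified by these two parameters:

$$
f(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^\top \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})\right)
$$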
Why is this used in ML?
It is the default assumption for many algorithms, including **Gaussian Mixture Models (GMM)**, **Linear Discriminant Analysis (LDA)**, and the latent spaces of **Variational Autoencoders (VAE)**.
Code Implementation
# Scipy: Multivariate Normal PDF
from scipy.stats import multivariate_normal
import numpy as np
# Define Mean Vector (2D) and Covariance Matrix
mean = np.array([0, 0])
cov = np.array([[1, 0.5],
                [0.5, 1]])
# Create distribution
dist = multivariate_normal(mean=mean, cov=cov)
# Calculate Probability Density at point (1, 1)
pdf_val = dist.pdf([1, 1])
# PDF Value: 0.0944
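As a sanity check, the same value can be computed directly from the density formula above. This is a minimal sketch reusing `mean` and `cov` from the block above; the variable names are illustrative.

# Manual evaluation of the multivariate normal density at (1, 1)
x = np.array([1, 1])
d = len(mean)
diff = x - mean
norm_const = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
pdf_manual = norm_const * np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff)
# pdf_manual ≈ 0.0944, matching dist.pdf([1, 1])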
2. Covariance & Correlation Matrix
How do two variables change together?
- Covariance: Unscaled measure of joint variability. Positive means the variables move together, negative means they move in opposite directions.
- Correlation: The covariance scaled by the standard deviations, so it always lies between -1 and 1 (see the np.corrcoef example after the covariance code below).
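Formally, for two variables $X$ and $Y$:

$$
\mathrm{Cov}(X, Y) = \mathbb{E}\big[(X - \mu_X)(Y - \mu_Y)\big], \qquad \mathrm{Corr}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \, \sigma_Y}
$$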
Code Implementation
# Numpy: Covariance Matrix
import numpy as np
# Data: Height (cm) vs Weight (kg) for 5 people
data = np.array([
    [170, 65],
    [180, 80],
    [160, 55],
    [175, 70],
    [165, 60]
])
# Calculate Covariance (rowvar=False means columns are variables)
cov_matrix = np.cov(data, rowvar=False)
# Variance of Height: 62.5
# Variance of Weight: 92.5
# Covariance: 75.0
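The correlation matrix mentioned above can be computed the same way with np.corrcoef. This short companion snippet reuses the `data` array from the block above; dividing the covariance by the product of the standard deviations gives the same number.

# Numpy: Correlation Matrix
corr_matrix = np.corrcoef(data, rowvar=False)
# Correlation (Height vs Weight): 0.986
# Equivalently: 75.0 / (sqrt(62.5) * sqrt(92.5)) ≈ 0.986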
3. Eigenvalues & Eigenvectors
For a square matrix $A$, an eigenvector $\mathbf{v}$ is a non-zero vector that changes only in scale when $A$ is applied to it: $A\mathbf{v} = \lambda\mathbf{v}$, where the scalar $\lambda$ is the corresponding eigenvalue.
Why is this used in ML?
They represent the **"principal axes"** along which a matrix acts by pure stretching; for a covariance matrix, these are the directions of greatest variance in the data. They are at the core of **PCA**, **SVD**, and **Google's PageRank** algorithm.
Code Implementation
# Numpy: Eigendecomposition of the Covariance Matrix
# (eigenvectors are returned as columns: eigenvectors[:, i] pairs with eigenvalues[i])
eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)
# 1st Eigenvalue (Variance along PC1): 153.99
# 1st Eigenvector (Direction of PC1): [-0.63, -0.77]
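A quick check of the defining relation $A\mathbf{v} = \lambda\mathbf{v}$ for the first eigenpair, reusing `cov_matrix`, `eigenvalues`, and `eigenvectors` from the block above:

# Verify cov_matrix @ v == lambda * v for the first eigenpair
v1 = eigenvectors[:, 0]
lam1 = eigenvalues[0]
print(np.allclose(cov_matrix @ v1, lam1 * v1))  # True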
4. PCA (Principal Component Analysis)
A technique to reduce the dimensionality of data while preserving as much variance as possible. It does this by projecting the data onto the leading eigenvectors of the covariance matrix (the Principal Components).
Code Implementation
# Scikit-Learn: PCA
from sklearn.decomposition import PCA
# Reduce 2D data (Height, Weight) to 1D
pca = PCA(n_components=1)
pca.fit(data)
explained_variance = pca.explained_variance_ratio_[0]
# Explained Variance Ratio: 0.9935
# (Meaning this single dimension captures ~99% of the total variance)
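To actually obtain the reduced representation, project the data onto that principal component. A small sketch reusing the fitted `pca` object and the `data` array from above; the variable name is illustrative.

# Project the 2D points (Height, Weight) onto the single principal component
data_1d = pca.transform(data)  # shape (5, 1)
# Each value is the centered position of a person along the combined Height-Weight axis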
5. SVD (Singular Value Decomposition)
A general matrix factorization method applicable to *any* matrix (not just square ones).
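It factorizes any $m \times n$ matrix $A$ into three pieces:

$$
A = U\, S\, V^\top
$$

where $U$ and $V$ have orthonormal columns and $S$ is diagonal with non-negative entries, the singular values.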
Why is this used in ML?
SVD is the computational engine behind **PCA**. It is also used in **Recommender Systems** (Matrix Factorization) and **Natural Language Processing** (LSA/LSI).
Code Implementation
# Numpy: SVD
U, S, Vt = np.linalg.svd(data, full_matrices=False)
# Singular Values: [408.35, 12.31]
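To see why SVD is the engine behind PCA, center the data first: the squared singular values of the centered matrix, divided by $n-1$, reproduce the eigenvalues of the covariance matrix. A small sketch reusing `data` and `cov_matrix` from the earlier sections:

# SVD of the *centered* data recovers the PCA eigenvalues
centered = data - data.mean(axis=0)
U_c, S_c, Vt_c = np.linalg.svd(centered, full_matrices=False)
print(S_c**2 / (len(data) - 1))  # ≈ [153.99, 1.01], the eigenvalues of cov_matrix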