Multivariate Statistics

Understanding relationships between many variables.

In real-world Machine Learning, we rarely deal with a single variable. Data is high-dimensional. Multivariate statistics provides the tools to analyze, visualize, and reduce the complexity of such data.

1. Multivariate Normal Distribution

The generalization of the 1D Gaussian (bell curve) to $d$ dimensions. Instead of a scalar mean $\mu$ and variance $\sigma^2$, we have a vector mean $\mathbf{\mu}$ and a Covariance Matrix $\Sigma$.

$$ f(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^d |\Sigma|}} \exp\left(-\frac{1}{2}(\mathbf{x}-\mathbf{\mu})^T \Sigma^{-1} (\mathbf{x}-\mathbf{\mu})\right) $$

Why is this used in ML?

It is the default assumption for many algorithms, including **Gaussian Mixture Models (GMM)**, **Linear Discriminant Analysis (LDA)**, and the latent spaces of **Variational Autoencoders (VAEs)**.

Code Implementation


# Scipy: Multivariate Normal PDF
from scipy.stats import multivariate_normal
import numpy as np

# Define Mean Vector (2D) and Covariance Matrix
mean = np.array([0, 0])
cov = np.array([[1, 0.5], 
                [0.5, 1]])

# Create distribution
dist = multivariate_normal(mean=mean, cov=cov)

# Calculate Probability Density at point (1, 1)
pdf_val = dist.pdf([1, 1])
print(f"PDF Value: {pdf_val:.4f}")  # PDF Value: 0.0944
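
As a quick sanity check (a minimal sketch, not part of the core recipe), we can also draw samples from the frozen distribution and confirm that their sample covariance approximates $\Sigma$:

# Scipy: Sampling from the Multivariate Normal
samples = dist.rvs(size=10_000, random_state=0)
sample_cov = np.cov(samples, rowvar=False)
# sample_cov ≈ [[1.0, 0.5],
#               [0.5, 1.0]]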

2. Covariance & Correlation Matrix

How do two variables change together?

  • Covariance: Unscaled measure of joint variability. Positive means the variables move together, negative means they move in opposite directions.
  • Correlation: Covariance rescaled by the two standard deviations, so it always lies between -1 and 1.

$$ Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)], \qquad \rho_{XY} = \frac{Cov(X, Y)}{\sigma_X \sigma_Y} $$

Code Implementation


# Numpy: Covariance Matrix
import numpy as np

# Data: Height (cm) vs Weight (kg) for 5 people
data = np.array([
    [170, 65],
    [180, 80],
    [160, 55],
    [175, 70],
    [165, 60]
])

# Calculate Covariance (rowvar=False treats columns as variables)
cov_matrix = np.cov(data, rowvar=False)

# cov_matrix = [[62.5, 75.0],
#               [75.0, 92.5]]
# Variance of Height: 62.5
# Variance of Weight: 92.5
# Covariance(Height, Weight): 75.0
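
The correlation matrix is this covariance matrix rescaled by the standard deviations; NumPy computes it directly (a short sketch on the same `data` array):

# Numpy: Correlation Matrix (scale-free version of the covariance)
corr_matrix = np.corrcoef(data, rowvar=False)

# Correlation(Height, Weight) = 75.0 / sqrt(62.5 * 92.5) ≈ 0.986
# A near-perfect linear relationship in this small sample.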

3. Eigenvalues & Eigenvectors

For a square matrix $A$, an eigenvector $\mathbf{v}$ is a non-zero vector that changes only in scale (by $\lambda$) when $A$ is applied to it.

$$ A\mathbf{v} = \lambda\mathbf{v} $$

Why is this used in ML?

They represent the **principal axes** along which the data varies. They are the core of **PCA**, **SVD**, and **Google's PageRank** algorithm.

Code Implementation


# Numpy: Eigendecomposition of the Covariance Matrix
eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)

# 1st Eigenvalue (Variance along PC1): 153.99
# 1st Eigenvector (Direction of PC1): eigenvectors[:, 0] ≈ [-0.63, -0.77]
# Note: np.linalg.eig does not sort eigenvalues; here the largest comes first.
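
To tie this back to the definition, a one-line check (continuing with `cov_matrix`, `eigenvalues`, and `eigenvectors` from above) that the first eigenpair satisfies $A\mathbf{v} = \lambda\mathbf{v}$:

# Verify A @ v == lambda * v for the first eigenpair
v, lam = eigenvectors[:, 0], eigenvalues[0]
assert np.allclose(cov_matrix @ v, lam * v)  # v is only rescaled, never rotated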

4. PCA (Principal Component Analysis)

A technique to reduce the dimensionality of data while preserving as much variance as possible. It works by projecting the data onto the leading eigenvectors of the covariance matrix (the Principal Components).

Code Implementation


# Scikit-Learn: PCA
from sklearn.decomposition import PCA

# Reduce 2D data (Height, Weight) to 1D
pca = PCA(n_components=1)
pca.fit(data)

explained_variance = pca.explained_variance_ratio_[0]
# Explained Variance Ratio: 0.9935
# (This single dimension captures ~99% of the variance)
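
To obtain the reduced representation itself, project with `transform` (a short sketch continuing from the fitted `pca` above). Note that `pca.components_[0]` is, up to sign, the eigenvector found in Section 3:

# Project Height/Weight onto PC1: shape (5, 2) -> (5, 1)
data_1d = pca.transform(data)

# pca.components_[0] is the same axis as Section 3's eigenvector: ±[0.63, 0.77]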

5. SVD (Singular Value Decomposition)

A general matrix factorization method applicable to *any* matrix (not just square ones).

$$ A = U \Sigma V^T $$

Why is this used in ML?

SVD is the computational engine behind **PCA**: centering the data matrix and taking its SVD yields the principal components directly, without ever forming the covariance matrix. It is also used in **Recommender Systems** (Matrix Factorization) and **Natural Language Processing** (LSA/LSI).

Code Implementation


# Numpy: SVD
U, S, Vt = np.linalg.svd(data, full_matrices=False)

# Singular Values: [408.35, 12.31]
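
These singular values are dominated by the uncentered mean of `data`. Centering first recovers the PCA picture, since $\lambda_i = s_i^2 / (n-1)$ for a centered matrix (a small sketch reusing `data`):

# SVD of the *centered* data matrix reproduces PCA
X_centered = data - data.mean(axis=0)
U_c, S_c, Vt_c = np.linalg.svd(X_centered, full_matrices=False)

# S_c ≈ [24.82, 2.01]
# S_c**2 / (5 - 1) ≈ [153.99, 1.01] -> the covariance eigenvalues from Section 3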

6. References & Further Reading