Multivariate Statistics

Understanding relationships between many variables.

In real-world Machine Learning, we rarely deal with a single variable. Data is high-dimensional. Multivariate statistics provides the tools to analyze, visualize, and reduce the complexity of such data.

1. Multivariate Normal Distribution

The generalization of the 1D Gaussian (bell curve) to $d$ dimensions. Instead of a scalar mean $\mu$ and variance $\sigma^2$, we have a vector mean $\mathbf{\mu}$ and a Covariance Matrix $\Sigma$.

$$ f(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^d |\Sigma|}} \exp\left(-\frac{1}{2}(\mathbf{x}-\mathbf{\mu})^T \Sigma^{-1} (\mathbf{x}-\mathbf{\mu})\right) $$

Why is this used in ML?

It is the default assumption for many algorithms, including **Gaussian Mixture Models (GMM)**, **Linear Discriminant Analysis (LDA)**, and the latent spaces of **Variational Autoencoders (VAEs)**.

Code Implementation


# Scipy: Multivariate Normal PDF
from scipy.stats import multivariate_normal
import numpy as np

# Define Mean Vector (2D) and Covariance Matrix
mean = np.array([0, 0])
cov = np.array([[1, 0.5], 
                [0.5, 1]])

# Create distribution
dist = multivariate_normal(mean=mean, cov=cov)

# Calculate Probability Density at point (1, 1)
pdf_val = dist.pdf([1, 1])
print(f"PDF Value: {pdf_val:.4f}")  # PDF Value: 0.0944
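
As a quick sanity check (a minimal sketch, not part of the core recipe), we can also draw samples from the frozen distribution and confirm that their sample covariance approximates $\Sigma$:

# Scipy: Sampling from the Multivariate Normal
samples = dist.rvs(size=10_000, random_state=0)
sample_cov = np.cov(samples, rowvar=False)
# sample_cov ≈ [[1.0, 0.5],
#               [0.5, 1.0]]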

2. Covariance & Correlation Matrix

How do two variables change together?

  • Covariance: Unscaled measure of joint variability. Positive means the variables move together, negative means they move in opposite directions.
  • Correlation: Covariance rescaled by the two standard deviations, so it always lies between -1 and 1.

$$ Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)], \qquad \rho_{XY} = \frac{Cov(X, Y)}{\sigma_X \sigma_Y} $$

Code Implementation


# Numpy: Covariance Matrix
import numpy as np

# Data: Height (cm) vs Weight (kg) for 5 people
data = np.array([
    [170, 65],
    [180, 80],
    [160, 55],
    [175, 70],
    [165, 60]
])

# Calculate Covariance (rowvar=False treats columns as variables)
cov_matrix = np.cov(data, rowvar=False)

# cov_matrix = [[62.5, 75.0],
#               [75.0, 92.5]]
# Variance of Height: 62.5
# Variance of Weight: 92.5
# Covariance(Height, Weight): 75.0
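
The correlation matrix is this covariance matrix rescaled by the standard deviations; NumPy computes it directly (a short sketch on the same `data` array):

# Numpy: Correlation Matrix (scale-free version of the covariance)
corr_matrix = np.corrcoef(data, rowvar=False)

# Correlation(Height, Weight) = 75.0 / sqrt(62.5 * 92.5) ≈ 0.986
# A near-perfect linear relationship in this small sample.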

3. Eigenvalues & Eigenvectors

For a square matrix $A$, an eigenvector $\mathbf{v}$ is a non-zero vector that changes only in scale (by $\lambda$) when $A$ is applied to it.

$$ A\mathbf{v} = \lambda\mathbf{v} $$

Why is this used in ML?

They represent the **principal axes** along which the data varies. They are the core of **PCA**, **SVD**, and **Google's PageRank** algorithm.

Code Implementation


# Numpy: Eigendecomposition of the Covariance Matrix
eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)

# 1st Eigenvalue (Variance along PC1): 153.99
# 1st Eigenvector (Direction of PC1): eigenvectors[:, 0] ≈ [-0.63, -0.77]
# Note: np.linalg.eig does not sort eigenvalues; here the largest comes first.
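
To tie this back to the definition, a one-line check (continuing with `cov_matrix`, `eigenvalues`, and `eigenvectors` from above) that the first eigenpair satisfies $A\mathbf{v} = \lambda\mathbf{v}$:

# Verify A @ v == lambda * v for the first eigenpair
v, lam = eigenvectors[:, 0], eigenvalues[0]
assert np.allclose(cov_matrix @ v, lam * v)  # v is only rescaled, never rotated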

4. PCA (Principal Component Analysis)

A technique to reduce the dimensionality of data while preserving as much variance as possible. It works by projecting the data onto the leading eigenvectors of the covariance matrix (the Principal Components).

Code Implementation


# Scikit-Learn: PCA
from sklearn.decomposition import PCA

# Reduce 2D data (Height, Weight) to 1D
pca = PCA(n_components=1)
pca.fit(data)

explained_variance = pca.explained_variance_ratio_[0]
# Explained Variance Ratio: 0.9935
# (This single dimension captures ~99% of the variance)
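
To obtain the reduced representation itself, project with `transform` (a short sketch continuing from the fitted `pca` above). Note that `pca.components_[0]` is, up to sign, the eigenvector found in Section 3:

# Project Height/Weight onto PC1: shape (5, 2) -> (5, 1)
data_1d = pca.transform(data)

# pca.components_[0] is the same axis as Section 3's eigenvector: ±[0.63, 0.77]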

5. SVD (Singular Value Decomposition)

A general matrix factorization method applicable to *any* matrix (not just square ones).

$$ A = U \Sigma V^T $$

Why is this used in ML?

SVD is the computational engine behind **PCA**: centering the data matrix and taking its SVD yields the principal components directly, without ever forming the covariance matrix. It is also used in **Recommender Systems** (Matrix Factorization) and **Natural Language Processing** (LSA/LSI).

Code Implementation


# Numpy: SVD
U, S, Vt = np.linalg.svd(data, full_matrices=False)

# Singular Values: [408.35, 12.31]
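
These singular values are dominated by the uncentered mean of `data`. Centering first recovers the PCA picture, since $\lambda_i = s_i^2 / (n-1)$ for a centered matrix (a small sketch reusing `data`):

# SVD of the *centered* data matrix reproduces PCA
X_centered = data - data.mean(axis=0)
U_c, S_c, Vt_c = np.linalg.svd(X_centered, full_matrices=False)

# S_c ≈ [24.82, 2.01]
# S_c**2 / (5 - 1) ≈ [153.99, 1.01] -> the covariance eigenvalues from Section 3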

6. References & Further Reading