Dimensionality reduction techniques go beyond PCA, offering diverse methods to simplify complex datasets. From non-linear approaches like t-SNE and UMAP to linear methods like LDA and ICA, these tools tackle different data challenges.
Neural network-based techniques like autoencoders provide powerful alternatives for reducing dimensions. These methods, along with matrix factorization and manifold learning, expand our toolkit for handling high-dimensional data effectively.
Manifold Learning Techniques
t-SNE and UMAP: Non-linear Dimensionality Reduction
- t-SNE (t-Distributed Stochastic Neighbor Embedding) is a non-linear dimensionality reduction technique
- Preserves local structure of high-dimensional data in low-dimensional space
- Converts high-dimensional Euclidean distances between data points into conditional probabilities representing similarities
- Minimizes the Kullback-Leibler divergence between the joint probability distributions in the high-dimensional and low-dimensional spaces using gradient descent
- UMAP (Uniform Manifold Approximation and Projection) is another non-linear dimensionality reduction method
- Constructs a high-dimensional graph representation of the data and optimizes a low-dimensional graph to be as structurally similar as possible
- Assumes the data is uniformly distributed on a Riemannian manifold and tries to learn the manifold's local metric
- Faster than t-SNE and better preserves global structure (clusters at different scales); a usage sketch for both methods follows this list
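Below is a minimal sketch of how both methods are typically invoked, assuming scikit-learn and the umap-learn package are installed; the digits dataset and the perplexity / n_neighbors values are illustrative, not tuned choices.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE
import umap  # provided by the umap-learn package

X, y = load_digits(return_X_y=True)  # 1797 samples, 64 dimensions

# t-SNE: perplexity balances attention between local and global neighborhoods
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

# UMAP: n_neighbors plays a similar role; min_dist controls how tightly points cluster
X_umap = umap.UMAP(n_neighbors=15, min_dist=0.1, random_state=0).fit_transform(X)

print(X_tsne.shape, X_umap.shape)  # both (1797, 2)
```

Both calls return 2-D embeddings suitable for scatter plots; note that scikit-learn's TSNE only offers fit_transform, whereas a fitted UMAP model can also embed unseen points with transform.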
Multidimensional Scaling (MDS)
- MDS is a technique used for visualizing the level of similarity of individual cases in a dataset
- Aims to find a low-dimensional representation of the data where the distances between points are preserved as well as possible
- Classical MDS: Uses eigenvector decomposition of the double-centered distance matrix to preserve pairwise Euclidean distances as faithfully as possible in the low-dimensional space
- Non-metric MDS: Preserves the rank order of the pairwise distances (used for ordinal data)
- Stress function measures the discrepancy between the distances in the low-dimensional space and the original dissimilarities
- Applications include visualizing the relationships between objects (cities on a map) or individuals (based on survey responses); see the sketch after this list
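A short sketch with scikit-learn's MDS estimator; the 4x4 dissimilarity matrix below stands in for, say, road distances between four cities and is made up purely for illustration.

```python
import numpy as np
from sklearn.manifold import MDS

# Toy pairwise dissimilarities (e.g., road distances between four cities)
D = np.array([[0, 2, 5, 7],
              [2, 0, 4, 6],
              [5, 4, 0, 3],
              [7, 6, 3, 0]], dtype=float)

# Metric MDS on a precomputed dissimilarity matrix
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)     # 2-D coordinates whose distances approximate D
print(mds.stress_)                # stress value: lower means better preservation

# Non-metric MDS preserves only the rank order of the dissimilarities
nmds = MDS(n_components=2, metric=False, dissimilarity="precomputed", random_state=0)
coords_nm = nmds.fit_transform(D)
```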
Linear Dimensionality Reduction Methods
Supervised and Unsupervised Linear Methods
- Linear Discriminant Analysis (LDA) is a supervised dimensionality reduction method
- Finds a linear combination of features that best separates the classes
- Projects the data onto a lower-dimensional space while maximizing the separation between classes
- Assumes the data is normally distributed and the classes have equal covariance matrices
- Independent Component Analysis (ICA) is an unsupervised method for separating a multivariate signal into additive subcomponents
- Assumes the subcomponents are non-Gaussian and statistically independent
- Finds a linear transformation that minimizes the statistical dependence between the components
- Applications include blind source separation (the cocktail party problem) and feature extraction; a sketch contrasting LDA and ICA follows this list
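A compact sketch contrasting the two methods, assuming scikit-learn; the Iris data and the two-component settings are chosen only for illustration. The key difference visible in the code is that LDA consumes class labels while FastICA does not.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.decomposition import FastICA

X, y = load_iris(return_X_y=True)   # 150 samples, 4 features, 3 classes

# LDA (supervised): projects onto at most n_classes - 1 discriminant axes
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)     # labels guide the projection

# ICA (unsupervised): recovers statistically independent, non-Gaussian components
ica = FastICA(n_components=2, random_state=0)
X_ica = ica.fit_transform(X)        # no labels involved

print(X_lda.shape, X_ica.shape)     # (150, 2) (150, 2)
```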
Matrix Factorization Techniques
- Non-negative Matrix Factorization (NMF) is a dimensionality reduction technique that factorizes a non-negative matrix into two non-negative matrices
- Finds a low-rank approximation of the original matrix: $V \approx WH$, where $V$, $W$, and $H$ are non-negative
- Interpretable parts-based representation: Each column of $W$ is a basis vector, and each column of $H$ holds the coefficients that combine those basis vectors to approximate one column of $V$
- Applications include image processing (facial recognition), text mining (topic modeling), and recommender systems; a topic-modeling sketch follows this list
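As a sketch of the topic-modeling use case, the snippet below factorizes a tiny TF-IDF matrix with scikit-learn's NMF; the four toy documents and the choice of two components are assumptions for illustration. Because scikit-learn stores samples as rows, the basis vectors appear as rows of $H$ here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

docs = [
    "the goalkeeper saved a penalty in the final match",
    "the striker scored twice and the team won the league",
    "the central bank raised interest rates to curb inflation",
    "stock markets fell after the latest inflation report",
]

# Non-negative document-term matrix (TF-IDF weights are >= 0)
tfidf = TfidfVectorizer(stop_words="english")
V = tfidf.fit_transform(docs)

# Factorize V ~ W H with two latent "topics"
nmf = NMF(n_components=2, init="nndsvda", random_state=0)
W = nmf.fit_transform(V)   # per-document topic weights (coefficients)
H = nmf.components_        # per-topic term weights (basis vectors, one per row)

terms = tfidf.get_feature_names_out()
for k, row in enumerate(H):
    top = row.argsort()[::-1][:4]
    print(f"topic {k}:", [terms[i] for i in top])
```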
Neural Network-based Dimensionality Reduction
Autoencoders
- Autoencoders are neural networks trained to reconstruct their input data
- Consist of an encoder that maps the input to a lower-dimensional latent space and a decoder that reconstructs the input from the latent representation
- Bottleneck layer in the middle has a lower dimensionality than the input, forcing the network to learn a compressed representation
- Types of autoencoders:
- Undercomplete autoencoders: Latent space has lower dimensionality than the input, used for dimensionality reduction (see the sketch after this list)
- Regularized autoencoders: Add regularization terms to the loss function to learn more robust representations (sparse, contractive, or denoising autoencoders)
- Variational autoencoders (VAEs): Latent space is constrained to follow a prior distribution (usually Gaussian), enabling generation of new samples
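A minimal PyTorch sketch of an undercomplete autoencoder, assuming a 64-dimensional input; the layer sizes, the 2-D bottleneck, and the random placeholder data are illustrative choices, not a reference implementation.

```python
import torch
from torch import nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=64, latent_dim=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 32), nn.ReLU(),
            nn.Linear(32, latent_dim),        # bottleneck: compressed representation
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32), nn.ReLU(),
            nn.Linear(32, input_dim),         # reconstruction of the input
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

X = torch.rand(256, 64)                       # placeholder data; swap in real features
for epoch in range(100):
    loss = loss_fn(model(X), X)               # reconstruction error drives training
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

codes = model.encoder(X).detach()             # low-dimensional embeddings for downstream use
```

Replacing the plain reconstruction loss with a reconstruction term plus a KL penalty that pulls the latent code toward a Gaussian prior turns this setup into the VAE described above.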