Linear Algebra for Data Science Unit 10 – Tensors: Multi-dimensional Data Structures

Tensors are multi-dimensional arrays that extend vectors and matrices to higher dimensions. They're crucial in linear algebra, physics, and computer science, providing a powerful framework for representing and manipulating complex data structures. In machine learning and data science, tensors are fundamental for processing multi-dimensional data like images, videos, and text. They enable efficient computation on modern hardware, making them essential for neural networks and deep learning algorithms.

What Are Tensors?

  • Tensors are multi-dimensional arrays that generalize vectors and matrices to higher dimensions
  • Provide a powerful framework for representing and manipulating complex, high-dimensional data structures
  • Consist of a collection of numerical values arranged in a grid-like format with a specific number of axes or dimensions (the tensor's rank)
  • Fundamental mathematical objects in linear algebra, physics, and computer science
  • Essential tools in machine learning and deep learning for representing and processing data (images, videos, and natural language)
  • Offer a concise and expressive notation for describing mathematical operations and transformations on multi-dimensional data
  • Enable efficient computation and parallelization of large-scale numerical computations on modern hardware (GPUs and TPUs)

Tensor Basics and Notation

  • Tensors are denoted using bold uppercase letters ($\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$)
  • The number of dimensions or axes in a tensor is called its rank or order
    • Scalar: rank-0 tensor, a single numerical value (3, -1.5)
    • Vector: rank-1 tensor, a 1D array of values ($\mathbf{v} = [1, 2, 3]$)
    • Matrix: rank-2 tensor, a 2D array of values arranged in rows and columns ($\mathbf{A} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$)
    • Higher-order tensors: rank-3 and above, multi-dimensional arrays (RGB image as a 3D tensor with dimensions height × width × color channels)
  • The shape of a tensor describes the size of each dimension (vector of length 3, matrix of size 2×2, 3D tensor of shape 256×256×3)
  • Elements of a tensor are accessed using index notation ($a_{ij}$ for a matrix element at row $i$ and column $j$, $x_{ijk}$ for a 3D tensor element)
  • Einstein summation convention simplifies tensor notation by implicitly summing over repeated indices ($c_i = \sum_j a_{ij} b_j$ written as $c_i = a_{ij} b_j$)
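
A minimal NumPy sketch of these basics, using arbitrary example values, showing ranks, shapes, and the Einstein summation convention via np.einsum:

```python
import numpy as np

s = np.array(3.0)                      # rank-0 tensor (scalar), shape ()
v = np.array([1.0, 2.0, 3.0])          # rank-1 tensor (vector), shape (3,)
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])             # rank-2 tensor (matrix), shape (2, 2)
T = np.zeros((256, 256, 3))            # rank-3 tensor, e.g. an RGB image

# Einstein summation: c_i = a_ij b_j (the repeated index j is summed over)
b = np.array([5.0, 6.0])
c = np.einsum('ij,j->i', A, b)         # same result as A @ b
print(s.ndim, v.shape, A.shape, T.shape, c)   # 0 (3,) (2, 2) (256, 256, 3) [17. 39.]
```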

Tensor Operations and Algebra

  • Addition and subtraction: element-wise operations between tensors of the same shape ($\mathbf{A} + \mathbf{B}$, $\mathbf{C} - \mathbf{D}$)
  • Scalar multiplication: multiplying a tensor by a scalar value ($\alpha\mathbf{A}$)
  • Tensor product or outer product: multiplies two tensors to create a higher-order tensor ($\mathbf{A} \otimes \mathbf{B}$)
    • Outer product of two vectors ($\mathbf{u} \otimes \mathbf{v}$) produces a matrix ($A_{ij} = u_i v_j$)
    • Outer product of a matrix and a vector ($\mathbf{A} \otimes \mathbf{v}$) produces a 3D tensor
  • Tensor contraction: generalizes matrix multiplication to higher-order tensors by summing over pairs of indices ($C_{ik} = \sum_j A_{ij} B_{jk}$)
  • Transpose: swaps the order of two dimensions in a tensor ($(\mathbf{A}^T)_{ij} = A_{ji}$ for matrices)
  • Reshaping: changes the shape of a tensor while preserving its total number of elements (reshaping a 2×3 matrix into a 6-element vector)
  • Slicing and indexing: extracting subtensors or specific elements from a tensor ($\mathbf{A}[1:3, :]$ for selecting rows 1 and 2 of a matrix)
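
The operations above map directly onto NumPy calls; here is a minimal sketch with arbitrary example values (outer product, contraction, transpose, reshaping, and slicing):

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0, 7.0])
A = np.arange(6.0).reshape(2, 3)       # reshaping: 6 elements -> 2x3 matrix
B = np.arange(12.0).reshape(3, 4)

outer = np.outer(u, v)                 # outer product: A_ij = u_i v_j, shape (3, 4)
C = np.tensordot(A, B, axes=1)         # contraction over the shared index, shape (2, 4)
At = A.T                               # transpose, shape (3, 2)
sub = B[1:3, :]                        # slicing: rows 1 and 2 of B, shape (2, 4)
print(outer.shape, C.shape, At.shape, sub.shape)   # (3, 4) (2, 4) (3, 2) (2, 4)
```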

Tensors in Machine Learning and Data Science

  • Tensors are the fundamental data structures used to represent and process multi-dimensional data in machine learning and deep learning
  • Neural networks operate on tensors, with the weights of a dense layer stored as a rank-2 tensor (matrix) and its biases and activations as rank-1 tensors (vectors) for a single example
    • Input data (images, text, audio) are represented as tensors and transformed by the network layers
    • Convolutional neural networks (CNNs) use rank-4 tensors to represent filters and feature maps
    • Recurrent neural networks (RNNs) use rank-3 tensors to represent sequences and hidden states
  • Tensor operations are used to define the forward and backward passes of neural networks (see the sketch after this list)
    • Matrix multiplication (tensor contraction) for linear transformations between layers
    • Element-wise operations (addition, activation functions) for non-linear transformations
    • Gradient computation and backpropagation using tensor algebra
  • In data science, tensors can represent multi-dimensional datasets (e.g., time series data, geospatial data, social networks)
    • Tensor decomposition techniques (CP decomposition, Tucker decomposition) for dimensionality reduction, feature extraction, and data compression
    • Higher-order extensions of matrix factorization and PCA for analyzing multi-way data
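
As a concrete illustration of the forward-pass bullets above, here is a minimal sketch of a single dense layer applied to a batch of inputs; the layer sizes and random data are assumptions made purely for the example:

```python
import numpy as np

batch, d_in, d_out = 32, 784, 128              # assumed example dimensions
X = np.random.rand(batch, d_in)                # rank-2 input tensor (a batch of flattened images)
W = np.random.randn(d_in, d_out) * 0.01        # weights: rank-2 tensor (matrix)
b = np.zeros(d_out)                            # biases: rank-1 tensor (vector)

Z = X @ W + b                                  # tensor contraction (matrix multiplication) plus broadcast bias
H = np.maximum(Z, 0.0)                         # element-wise non-linearity (ReLU)
print(H.shape)                                 # (32, 128)
```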

Tensor Decomposition Techniques

  • Tensor decomposition methods generalize matrix decomposition techniques (SVD, PCA) to higher-order tensors
  • Canonical Polyadic (CP) decomposition (also known as PARAFAC or CANDECOMP)
    • Decomposes a tensor into a sum of rank-1 tensors (outer products of vectors)
    • Useful for identifying latent factors or components in multi-way data (e.g., identifying user preferences, item characteristics, and time dynamics in a user-item-time tensor)
    • Alternating least squares (ALS) algorithm for computing the CP decomposition
  • Tucker decomposition (often computed via the higher-order SVD, or HOSVD; see the sketch after this list)
    • Decomposes a tensor into a core tensor multiplied by factor matrices along each mode
    • Core tensor represents the interactions between the latent factors
    • Factor matrices represent the loadings or weights of each factor along each mode
    • Generalizes SVD to higher-order tensors and allows for different ranks along each mode
  • Tensor-train (TT) decomposition
    • Represents a high-order tensor as a product of lower-order tensors (cores) connected in a chain-like structure
    • Allows for efficient storage and computation of high-dimensional tensors with low TT-ranks
    • Useful for compressing and approximating large-scale tensors in physics, chemistry, and machine learning
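
To make the Tucker decomposition concrete, here is a minimal NumPy sketch of a truncated HOSVD; the tensor, its shape, and the chosen multilinear ranks are arbitrary examples, and this is only one simple way to compute a Tucker decomposition:

```python
import numpy as np

def unfold(X, mode):
    """Mode-n unfolding: move the given axis to the front and flatten the rest."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def hosvd(X, ranks):
    """Truncated higher-order SVD of a 3-way tensor X.
    Returns the core tensor G and the factor matrices (U1, U2, U3)."""
    factors = []
    for mode, r in enumerate(ranks):
        # Leading left singular vectors of the mode-n unfolding give the factor matrix
        U, _, _ = np.linalg.svd(unfold(X, mode), full_matrices=False)
        factors.append(U[:, :r])
    U1, U2, U3 = factors
    # Core tensor: project X onto the factor subspaces along each mode
    G = np.einsum('ijk,ia,jb,kc->abc', X, U1, U2, U3)
    return G, factors

# Example: compress a random (10, 20, 30) tensor to multilinear rank (5, 5, 5)
X = np.random.rand(10, 20, 30)
G, (U1, U2, U3) = hosvd(X, ranks=(5, 5, 5))
X_hat = np.einsum('abc,ia,jb,kc->ijk', G, U1, U2, U3)          # reconstruction
rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(G.shape, rel_err)                                        # (5, 5, 5) and the relative error
```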

Implementing Tensors in Python

  • NumPy: the fundamental package for scientific computing in Python; provides the ndarray object for representing tensors
    • Creating tensors: np.array([1, 2, 3]), np.zeros((3, 4)), np.ones((2, 3, 4))
    • Tensor operations: np.dot(A, B), np.tensordot(A, B, axes=1), np.transpose(A)
    • Slicing and indexing: A[0, :], B[:, 1:3, :]
  • TensorFlow: popular deep learning framework that uses tensors as its primary data structure
    • Creating tensors: tf.constant([1, 2, 3]), tf.zeros((3, 4)), tf.ones((2, 3, 4))
    • Tensor operations: tf.matmul(A, B), tf.tensordot(A, B, axes=1), tf.transpose(A)
    • Automatic differentiation and gradient computation using tf.GradientTape
  • PyTorch: another widely used deep learning framework, similar to TensorFlow but with a more dynamic computation graph
    • Creating tensors: torch.tensor([1, 2, 3]), torch.zeros(3, 4), torch.ones(2, 3, 4)
    • Tensor operations: torch.matmul(A, B), torch.tensordot(A, B, dims=1), torch.transpose(A, 0, 1)
    • Automatic differentiation and gradient computation using torch.autograd
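
A short end-to-end sketch tying these PyTorch calls together with automatic differentiation; the shapes and the toy loss are arbitrary:

```python
import torch

A = torch.randn(3, 4, requires_grad=True)   # rank-2 tensor tracked for gradients
x = torch.ones(4)                           # rank-1 tensor
y = torch.matmul(A, x)                      # tensor contraction: (3, 4) x (4,) -> (3,)
loss = y.pow(2).sum()                       # scalar loss
loss.backward()                             # backpropagation via torch.autograd
print(A.grad.shape)                         # gradients have the same shape as A: torch.Size([3, 4])
```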

Real-World Applications of Tensors

  • Computer vision: representing and processing images and videos as 3D or 4D tensors
    • Convolutional neural networks (CNNs) for image classification, object detection, and segmentation
    • Tensor-based techniques for image denoising, super-resolution, and style transfer
  • Natural language processing (NLP): representing text data as tensors
    • Word embeddings (word2vec, GloVe) as dense vector representations of words
    • Sequence-to-sequence models (RNNs, transformers) for machine translation, text summarization, and language generation
    • Tensor-based methods for sentiment analysis, named entity recognition, and relation extraction
  • Recommender systems: representing user-item interactions as a 2D matrix or higher-order tensor
    • Matrix factorization techniques (SVD, NMF) for collaborative filtering
    • Tensor factorization methods (CP decomposition, Tucker decomposition) for incorporating additional context (time, location, social network)
    • Deep learning-based recommender systems using tensor representations of user and item features
  • Physics and chemistry: representing quantum states, molecular structures, and physical fields as tensors
    • Quantum mechanics: wave functions and density matrices as complex-valued tensors
    • Molecular dynamics: representing atomic positions, velocities, and forces as tensors
    • Computational fluid dynamics: discretizing and solving partial differential equations using tensor fields
  • Social network analysis: representing social interactions and relationships as tensors
    • Adjacency tensor: capturing multi-relational data in social networks (e.g., friendship, communication, collaboration)
    • Tensor-based methods for community detection, link prediction, and anomaly detection in social networks

Key Takeaways and Practice Problems

  • Tensors are multi-dimensional arrays that generalize vectors and matrices to higher dimensions
  • Tensors provide a powerful framework for representing and manipulating complex, high-dimensional data structures in linear algebra, physics, and computer science
  • Tensor notation and algebra extend matrix operations to higher-order tensors, enabling concise and expressive mathematical descriptions
  • Tensors are the fundamental data structures used in machine learning and deep learning for representing and processing multi-dimensional data
  • Tensor decomposition techniques (CP decomposition, Tucker decomposition) generalize matrix factorization methods to higher-order tensors for dimensionality reduction, feature extraction, and data compression
  • Python libraries like NumPy, TensorFlow, and PyTorch provide efficient implementations of tensors and tensor operations for scientific computing and deep learning
  • Tensors find numerous real-world applications in computer vision, natural language processing, recommender systems, physics, chemistry, and social network analysis

Practice Problems:

  1. Given a 3D tensor $\mathbf{A}$ of shape (2, 3, 4) and a 2D tensor $\mathbf{B}$ of shape (4, 5), compute the tensor contraction $\mathbf{C} = \mathbf{A} \times \mathbf{B}$ along the last axis of $\mathbf{A}$ and the first axis of $\mathbf{B}$. What is the shape of the resulting tensor $\mathbf{C}$?

  2. Implement the CP decomposition of a 3D tensor $\mathbf{X}$ of shape (10, 20, 30) using the alternating least squares (ALS) algorithm in Python with NumPy. Assume a rank of 5 for the decomposition.

  3. Given a 4D tensor $\mathbf{T}$ of shape (batch_size, height, width, channels) representing a batch of RGB images, apply a 2D convolutional layer with 16 filters of size (3, 3) and a stride of (1, 1) to the tensor. What is the shape of the output tensor?

  4. Represent a set of user-item-time interactions as a 3D tensor $\mathbf{R}$ of shape (num_users, num_items, num_time_steps). Perform Tucker decomposition on $\mathbf{R}$ to obtain a core tensor and factor matrices. Interpret the results and discuss how they can be used for recommending items to users at specific time steps.

  5. Compute the tensor product (outer product) of a vector $\mathbf{u}$ of length 3 and a vector $\mathbf{v}$ of length 4. What is the shape of the resulting matrix? How can this operation be used to construct higher-order tensors from lower-order ones?


