Linear Algebra for Data Science

Unit 3 – Vector Spaces and Subspaces

Vector spaces and subspaces form the foundation of linear algebra, providing a framework for understanding multidimensional data structures. This unit explores the properties of vector spaces, including closure, associativity, and distributivity, while introducing key concepts like linear combinations, span, and linear independence. The study of subspaces, basis vectors, and dimensionality offers powerful tools for data analysis and machine learning. These concepts enable efficient data representation, feature selection, and dimensionality reduction, crucial for tackling high-dimensional datasets and uncovering underlying patterns in complex systems.

Key Concepts and Definitions

  • Vector space consists of a set of vectors and two operations (vector addition and scalar multiplication) that satisfy a standard set of axioms (listed under Properties of Vector Spaces below)
  • Subspace is a subset of a vector space that is closed under vector addition and scalar multiplication
  • Linear combination expresses a vector as a sum of scalar multiples of other vectors
  • Span refers to the set of all possible linear combinations of a given set of vectors
    • Geometrically represents the space "spanned" by the vectors
  • Linear independence means no vector in the set can be expressed as a linear combination of the others
    • Removing any vector from the set shrinks the span
  • Linear dependence occurs when one or more vectors can be expressed as linear combinations of the others
  • Basis is a linearly independent set of vectors that spans the entire vector space
  • Dimension equals the number of vectors in a basis for a vector space

Vector Space Fundamentals

  • Vector spaces are defined over a field (typically the real numbers $\mathbb{R}$ or the complex numbers $\mathbb{C}$)
  • Elements of a vector space are called vectors and can be represented as arrays or lists of numbers
  • Vector addition combines two vectors element-wise: $\mathbf{u} + \mathbf{v} = [u_1 + v_1, u_2 + v_2, \ldots, u_n + v_n]$
  • Scalar multiplication scales each element of a vector by a constant: $c\mathbf{v} = [cv_1, cv_2, \ldots, cv_n]$
  • Zero vector $\mathbf{0}$ has all elements equal to zero and serves as the identity element for vector addition
  • Negative of a vector $-\mathbf{v}$ satisfies $\mathbf{v} + (-\mathbf{v}) = \mathbf{0}$
  • Standard basis vectors $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$: each $\mathbf{e}_i$ has a 1 in the $i$-th position and 0s elsewhere
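The operations above translate directly into array code. A minimal NumPy sketch (the specific vectors are illustrative, not from the text):

```python
import numpy as np

# Vector addition is element-wise
u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])
print(u + v)          # [5. 7. 9.]

# Scalar multiplication scales every element
print(2 * v)          # [ 8. 10. 12.]

# Zero vector: the additive identity
zero = np.zeros(3)
assert np.array_equal(v + zero, v)

# Additive inverse: v + (-v) = 0
assert np.array_equal(v + (-v), zero)

# Standard basis vectors are the rows (or columns) of the identity matrix
e = np.eye(3)
print(e[0])           # e_1 = [1. 0. 0.]
```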

Properties of Vector Spaces

  • Closure under vector addition: $\mathbf{u} + \mathbf{v}$ is in the vector space for any $\mathbf{u}$ and $\mathbf{v}$ in the space
  • Closure under scalar multiplication: $c\mathbf{v}$ is in the vector space for any scalar $c$ and vector $\mathbf{v}$ in the space
  • Associativity of vector addition: $(\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})$
  • Commutativity of vector addition: $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$
  • Identity element for vector addition: $\mathbf{v} + \mathbf{0} = \mathbf{v}$ for any vector $\mathbf{v}$
  • Inverse elements for vector addition: for any $\mathbf{v}$, there exists $-\mathbf{v}$ such that $\mathbf{v} + (-\mathbf{v}) = \mathbf{0}$
  • Distributivity of scalar multiplication over vector addition: $c(\mathbf{u} + \mathbf{v}) = c\mathbf{u} + c\mathbf{v}$
  • Distributivity of scalar multiplication over field addition: $(c + d)\mathbf{v} = c\mathbf{v} + d\mathbf{v}$
  • Associativity of scalar multiplication: $c(d\mathbf{v}) = (cd)\mathbf{v}$
  • Identity element for scalar multiplication: $1\mathbf{v} = \mathbf{v}$
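These axioms can be checked numerically for concrete vectors. A small sketch using random vectors in $\mathbb{R}^4$ (the vectors and scalars here are arbitrary test values, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
u, v, w = rng.standard_normal((3, 4))  # three random vectors in R^4
c, d = 2.0, -3.0

# Associativity and commutativity of vector addition
assert np.allclose((u + v) + w, u + (v + w))
assert np.allclose(u + v, v + u)

# Distributivity of scalar multiplication (both kinds)
assert np.allclose(c * (u + v), c * u + c * v)
assert np.allclose((c + d) * v, c * v + d * v)

# Associativity and identity for scalar multiplication
assert np.allclose(c * (d * v), (c * d) * v)
assert np.allclose(1.0 * v, v)

print("all checked axioms hold (up to floating point)")
```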

Subspaces: Definition and Examples

  • Subspace is a non-empty subset of a vector space that is closed under vector addition and scalar multiplication
    • Inherits the vector space properties from the parent space
  • Examples of subspaces include lines and planes passing through the origin in $\mathbb{R}^2$ and $\mathbb{R}^3$
  • Set of all polynomials of degree at most $n$ forms a subspace of the vector space of all polynomials
  • Null space (kernel) of a matrix $A$ is a subspace of the domain, defined as $\{\mathbf{x} : A\mathbf{x} = \mathbf{0}\}$
  • Column space (range) of a matrix $A$ is a subspace of the codomain, spanned by the columns of $A$
  • Row space of a matrix $A$ is a subspace of the domain, spanned by the rows of $A$
  • Eigenspaces corresponding to eigenvalues of a matrix are subspaces
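The null space and column space can be extracted numerically via the singular value decomposition. A sketch with a hypothetical rank-1 matrix (the matrix and tolerance are illustrative choices):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])   # rank 1: second row = 2 * first row

# The SVD exposes both fundamental subspaces
U, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-10))
print("rank:", rank)                       # 1

# Null space basis: right singular vectors for (near-)zero singular values
null_basis = Vt[rank:].T                   # 3x2: null space dimension = 3 - 1 = 2
assert np.allclose(A @ null_basis, 0)      # A x = 0 for every null space vector

# Column space basis: the first `rank` left singular vectors
col_basis = U[:, :rank]
print("column space dimension:", col_basis.shape[1])
```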

Linear Combinations and Span

  • Linear combination of vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ is $c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \ldots + c_k\mathbf{v}_k$ for scalars $c_1, c_2, \ldots, c_k$
  • Span of a set of vectors is the set of all possible linear combinations of those vectors
    • Denoted as $\text{span}(\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k)$
  • Spanning set for a vector space is a set of vectors whose span equals the entire space
  • Trivial subspace $\{\mathbf{0}\}$ is spanned by the empty set
  • Span of the standard basis vectors $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ is the entire space $\mathbb{R}^n$
  • Span of a single non-zero vector is a line passing through the origin
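Asking whether a vector lies in a span amounts to a solvability question, which least squares answers directly. A sketch with hypothetical vectors (`in_span` and its tolerance are illustrative, not from the text):

```python
import numpy as np

# Is b in span(v1, v2)? Solve the least-squares problem and check the residual.
v1 = np.array([1.0, 0.0, 1.0])
v2 = np.array([0.0, 1.0, 1.0])
V = np.column_stack([v1, v2])

b_in = np.array([2.0, 3.0, 5.0])      # equals 2*v1 + 3*v2, so inside the span
b_out = np.array([0.0, 0.0, 1.0])     # not expressible as a combination of v1, v2

def in_span(V, b, tol=1e-10):
    """True if b is (numerically) a linear combination of the columns of V."""
    coeffs, *_ = np.linalg.lstsq(V, b, rcond=None)
    return np.linalg.norm(V @ coeffs - b) < tol

print(in_span(V, b_in))    # True
print(in_span(V, b_out))   # False
```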

Linear Independence and Dependence

  • Set of vectors is linearly independent if no vector can be expressed as a linear combination of the others
    • Equivalently, the only solution to $c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \ldots + c_k\mathbf{v}_k = \mathbf{0}$ is $c_1 = c_2 = \ldots = c_k = 0$
  • Set of vectors is linearly dependent if at least one vector can be expressed as a linear combination of the others
  • Standard basis vectors $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ are linearly independent
  • Any set containing the zero vector is linearly dependent
  • In $\mathbb{R}^n$, any set of more than $n$ vectors is linearly dependent (by the Steinitz exchange lemma)
  • Linearly independent sets are minimal spanning sets for their span
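In practice, linear independence is checked via matrix rank: vectors are independent exactly when the matrix with them as columns has full column rank. A sketch (the helper `linearly_independent` is an illustrative name, not a library function):

```python
import numpy as np

def linearly_independent(vectors):
    """Vectors are independent iff the matrix with them as columns has full column rank."""
    M = np.column_stack(vectors)
    return np.linalg.matrix_rank(M) == M.shape[1]

e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(linearly_independent([e1, e2]))            # True: the standard basis

v = np.array([1.0, 2.0])
print(linearly_independent([v, 2 * v]))          # False: 2v is a multiple of v

# Any set containing the zero vector is dependent
print(linearly_independent([e1, np.zeros(2)]))   # False

# In R^2, any three vectors are dependent
print(linearly_independent([e1, e2, v]))         # False
```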

Basis and Dimension

  • Basis for a vector space is a linearly independent spanning set
    • Minimal set of vectors that spans the entire space
  • Dimension of a vector space equals the number of vectors in any basis
    • All bases for a vector space have the same number of vectors
  • Standard basis $\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n\}$ is a common choice of basis for $\mathbb{R}^n$
  • Dimension of the trivial subspace $\{\mathbf{0}\}$ is 0
  • Dimension of a line is 1, of a plane is 2, and of $\mathbb{R}^n$ is $n$
  • Rank of a matrix equals the dimension of its column space (or row space)
    • Maximum number of linearly independent columns (or rows)
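The rank facts above can be verified numerically. A sketch with a hypothetical matrix whose third row is the sum of the first two, so the rank is 2:

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])   # row 3 = row 1 + row 2, so rank 2

r = np.linalg.matrix_rank(A)
print("rank:", r)                            # 2

# rank = dim(column space) = dim(row space), so transposing does not change it
assert np.linalg.matrix_rank(A.T) == r

# Rank-nullity theorem: dim(null space) = n - rank
n = A.shape[1]
print("null space dimension:", n - r)        # 1
```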

Applications in Data Science

  • Vector spaces provide a foundation for representing and manipulating data in machine learning and data analysis
  • High-dimensional data can be represented as vectors in $\mathbb{R}^n$, where each feature corresponds to a dimension
  • Subspaces can model lower-dimensional structures or patterns in data (principal components, clusters, manifolds)
  • Linear independence is crucial for feature selection and dimensionality reduction techniques (PCA, ICA, SVD)
    • Identifies non-redundant features that capture the essential information in data
  • Basis vectors serve as building blocks for representing data efficiently and compactly
    • Change of basis techniques enable data transformations and visualizations
  • Dimension measures the intrinsic complexity or degrees of freedom in a dataset
    • Curse of dimensionality: challenges arise as the number of features grows large relative to the sample size
  • Null space and column space of a data matrix reveal relationships between features and samples
    • Used in least squares regression, matrix factorization, and recommender systems
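These ideas come together in PCA via the SVD: centered data that lies near a low-dimensional subspace has few dominant singular values, and the leading right singular vectors form a basis for that subspace. A toy sketch (the synthetic data, noise level, and seed are illustrative assumptions):

```python
import numpy as np

# Synthetic points in R^3 that lie near a 1-D subspace (a line through the origin),
# illustrating how the SVD uncovers low-dimensional structure in data.
rng = np.random.default_rng(1)
t = rng.standard_normal(100)
direction = np.array([1.0, 2.0, 2.0]) / 3.0          # unit vector along the line
X = np.outer(t, direction) + 0.01 * rng.standard_normal((100, 3))

# Center the data, then use the top right singular vectors as the new basis (PCA)
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
print(s)    # one dominant singular value -> intrinsic dimension is about 1

# Project onto the leading principal direction: 3 features compressed to 1
k = 1
X_reduced = Xc @ Vt[:k].T
print(X_reduced.shape)   # (100, 1)
```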


© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.