Linear Algebra: Vectors, Matrices, and Transformations

Linear algebra sits at the foundation of modern computing, physics, economics, and machine learning — a discipline built around the elegant manipulation of vectors, matrices, and the transformations they encode. This page covers the core mechanics of the subject: how vectors and matrices behave, why certain structural rules produce predictable results, where the classification lines between matrix types fall, and what the field's most persistent misconceptions actually get wrong.


Definition and scope

A vector is a quantity with both magnitude and direction — a column of numbers that represents a point in space, a data record, or a physical force, depending on context. A matrix is an ordered rectangular array of numbers that can encode a system of equations, describe a network, or represent a transformation applied to a vector space. Linear algebra is the branch of mathematics studying these objects and the operations — addition, scaling, multiplication, decomposition — that preserve linearity.

The scope of linear algebra, as defined by the MIT OpenCourseWare 18.06 course taught by Gilbert Strang, spans finite-dimensional vector spaces, systems of linear equations, eigenvalue problems, and singular value decomposition. Those four pillars underpin applications from Google's PageRank algorithm to the principal component analysis routinely run on genomic datasets.

Linear algebra sits within the broader map of mathematics covered at Mathematics Authority, and its techniques connect directly to differential equations, calculus, and mathematical modeling.


Core mechanics or structure

Vectors and operations. An n-dimensional vector is an ordered list of n real (or complex) numbers. Vector addition is componentwise (corresponding components are summed), which geometrically is the parallelogram rule. Scalar multiplication stretches or flips a vector without changing the line it points along. These two operations satisfy eight axioms (among them commutativity and associativity of addition, distributivity, the existence of a zero vector, and additive inverses) that define a formal vector space, per the treatment in Axler's Linear Algebra Done Right (Springer, 4th edition, 2024).

Matrix multiplication. An m×n matrix multiplied by an n×p matrix produces an m×p matrix. Each entry of the result is the dot product of a row from the left matrix with a column from the right. This is not commutative — AB ≠ BA in general — a fact that trips up students who import arithmetic intuitions into this territory.
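The row-times-column rule can be sketched in a few lines of plain Python (illustrative lists of rows, not a numerics library), and it makes the non-commutativity concrete:

```python
def matmul(A, B):
    """Multiply an m x n matrix A by an n x p matrix B (lists of rows)."""
    m, n, p = len(A), len(B), len(B[0])
    assert all(len(row) == n for row in A), "inner dimensions must agree"
    # Entry (i, j) is the dot product of row i of A with column j of B.
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

A = [[1, 2], [0, 1]]
B = [[1, 0], [1, 1]]
print(matmul(A, B))  # [[3, 2], [1, 1]]
print(matmul(B, A))  # [[1, 2], [1, 3]]  -- AB != BA
```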

Linear transformations. Every matrix represents a linear transformation: a function T such that T(u + v) = T(u) + T(v) and T(cu) = cT(u). Geometrically, matrix multiplication rotates, scales, shears, reflects, or projects vectors. The identity matrix I leaves every vector unchanged. A matrix with determinant zero collapses the space into a lower dimension — the transformation literally squashes volume to nothing.
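The two defining properties can be checked numerically for any concrete matrix; a small sketch using a rotation matrix (chosen here purely for illustration):

```python
import math

def apply(M, v):
    # Apply the linear map encoded by the 2x2 matrix M to the vector v.
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

theta = math.pi / 2  # a 90-degree rotation
R = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]

u, v = [1.0, 0.0], [0.0, 2.0]
Tu, Tv = apply(R, u), apply(R, v)
Tuv = apply(R, [u[0] + v[0], u[1] + v[1]])
# Linearity: T(u + v) == T(u) + T(v), up to floating-point rounding.
ok = all(abs(a - b) < 1e-12 for a, b in zip(Tuv, [Tu[0] + Tv[0], Tu[1] + Tv[1]]))
print(ok)  # True
```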

Determinants and invertibility. The determinant of a square matrix measures signed volume scaling. A 2×2 matrix with entries [[a, b], [c, d]] has determinant ad − bc. When that value is zero, the matrix is singular — no inverse exists, and the associated linear system either has no solution or infinitely many.
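The 2×2 case is small enough to compute directly; a quick Python check of the ad − bc formula and the singular case:

```python
def det2(M):
    # Determinant of a 2x2 matrix [[a, b], [c, d]] is ad - bc.
    (a, b), (c, d) = M
    return a * d - b * c

print(det2([[3, 1], [4, 2]]))  # 3*2 - 1*4 = 2  -> invertible
print(det2([[2, 4], [1, 2]]))  # 2*2 - 4*1 = 0  -> singular: the columns are parallel
```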

Eigenvalues and eigenvectors. For a square matrix A, an eigenvector v satisfies Av = λv, where λ is the corresponding eigenvalue. The line the vector points along is preserved by the transformation; the vector is only scaled by λ (and flipped if λ is negative). Eigendecomposition rewrites a diagonalizable matrix as PDP⁻¹, where D is diagonal, which makes repeated matrix powers cheap: Aᵏ = PDᵏP⁻¹.
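One standard way to approximate the dominant eigenpair is power iteration: repeatedly apply A and renormalize. A minimal Python sketch, assuming the dominant eigenvalue is positive and simple:

```python
def power_iteration(A, steps=50):
    """Estimate the dominant eigenvalue/eigenvector of a square matrix A
    (sketch: assumes the dominant eigenvalue is positive and simple)."""
    n = len(A)
    v = [1.0] * n
    lam = 0.0
    for _ in range(steps):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]  # w = Av
        lam = max(abs(x) for x in w)   # current eigenvalue estimate
        v = [x / lam for x in w]       # renormalize so iterates don't overflow
    return lam, v

# [[2, 0], [0, 1]] stretches the x-axis by 2: its dominant eigenvalue is 2,
# with eigenvector along [1, 0].
lam, v = power_iteration([[2.0, 0.0], [0.0, 1.0]])
print(round(lam, 6))  # 2.0
```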


Causal relationships or drivers

The properties of a linear system cascade from one structural fact: whether the system's coefficient matrix has full rank. Rank — the dimension of the column space — determines solvability. An n×n system Ax = b has a unique solution precisely when A has rank n, which is precisely when det(A) ≠ 0, which is precisely when A is invertible. These three conditions are logically equivalent, not merely correlated.

Ill-conditioning causes a related but distinct failure mode. A matrix with a very high condition number (the ratio of its largest to smallest singular values) is technically invertible but numerically unreliable — small perturbations in b produce large swings in the computed solution x. The NIST Digital Library of Mathematical Functions (DLMF) documents the numerical analysis context for these condition number relationships.
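The amplification effect is easy to see with a deliberately ill-conditioned diagonal matrix (illustrative values; for a diagonal matrix the singular values are just the absolute diagonal entries, so the condition number can be read off directly):

```python
# Diagonal matrix diag(1, eps): singular values are 1 and eps,
# so kappa = 1 / eps.
eps = 1e-8
d = [1.0, eps]
cond = max(abs(x) for x in d) / min(abs(x) for x in d)
print(cond)  # ~1e8: invertible, but numerically fragile

# Perturbation demo: solving diag(d) x = b reduces to x_i = b_i / d_i.
b  = [1.0, 1.0]
b2 = [1.0, 1.0 + 1e-6]  # tiny perturbation in the last component of b
x  = [b[i] / d[i] for i in range(2)]
x2 = [b2[i] / d[i] for i in range(2)]
print(x2[1] - x[1])  # ~100: the 1e-6 perturbation, amplified by 1/eps
```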

Orthogonality drives stability. When a matrix's columns are mutually orthogonal unit vectors, it is orthogonal (Q^T Q = I), and its condition number is exactly 1, the best possible. Gram-Schmidt orthogonalization converts an arbitrary basis into an orthonormal one, which explains why QR decomposition is a preferred computational strategy over naive Gaussian elimination in many applied settings.
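A sketch of the procedure (the modified Gram-Schmidt variant, which is numerically steadier than the textbook classical one; plain Python lists, no numerics library):

```python
def gram_schmidt(vectors):
    """Modified Gram-Schmidt: turn a linearly independent list of vectors
    into an orthonormal basis for the same span."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            # Subtract the component of w along each existing basis vector.
            coeff = sum(wi * qi for wi, qi in zip(w, q))
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        norm = sum(wi * wi for wi in w) ** 0.5
        basis.append([wi / norm for wi in w])
    return basis

Q = gram_schmidt([[1.0, 1.0], [1.0, 0.0]])
# The output vectors are orthonormal: pairwise dot products are 0, norms are 1.
dot = sum(a * b for a, b in zip(Q[0], Q[1]))
print(abs(dot) < 1e-12)  # True
```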


Classification boundaries

Matrix types partition by their structural symmetries, and the distinctions matter because each class admits specific decompositions and guarantees:

Matrix type            | Defining property                       | Key guarantee
Square                 | Same row and column count               | May have an inverse
Symmetric              | A = A^T                                 | Real eigenvalues, always
Orthogonal             | A^T = A⁻¹                               | Preserves length and angle
Diagonal               | Non-zero entries only on main diagonal  | Trivially invertible if every diagonal entry ≠ 0
Upper/lower triangular | Zeros below/above diagonal              | Back-substitution solves in O(n²) time
Positive definite      | x^T Ax > 0 for all x ≠ 0                | All eigenvalues strictly positive
Idempotent             | A² = A                                  | Represents a projection
Singular               | det(A) = 0                              | No inverse; system underdetermined or inconsistent

Positive definite matrices occupy a particularly important position in machine learning — every covariance matrix is positive semi-definite by construction, ensuring that variance is always non-negative.


Tradeoffs and tensions

Exact vs. numerical methods. Gaussian elimination produces an exact solution (over exact arithmetic) but accumulates floating-point errors in practice. Iterative methods — Jacobi, Gauss-Seidel, conjugate gradient — trade exactness for scalability, converging on solutions for systems with millions of variables where direct methods become computationally prohibitive.
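The simplest of the iterative family, Jacobi iteration, fits in a few lines; a sketch (not production code) that converges for strictly diagonally dominant systems:

```python
def jacobi(A, b, steps=100):
    """Jacobi iteration for Ax = b. Each sweep solves equation i for x_i
    using the previous iterate's values for the other unknowns.
    Converges when A is strictly diagonally dominant."""
    n = len(A)
    x = [0.0] * n
    for _ in range(steps):
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return x

# Strictly diagonally dominant 2x2 system: 4x + y = 9, x + 3y = 5.
# Exact solution: x = 2, y = 1.
x = jacobi([[4.0, 1.0], [1.0, 3.0]], [9.0, 5.0])
print([round(v, 6) for v in x])  # [2.0, 1.0]
```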

Dense vs. sparse representation. Storing every entry of a 10,000 × 10,000 matrix requires 10⁸ entries (800 MB at 8 bytes each). Many real matrices, from finite element simulations, social networks, or adjacency structures, are sparse, with fewer than 1% non-zero entries. Sparse storage formats (CSR, CSC) save memory at the cost of code complexity and irregular memory access patterns.
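A minimal illustration of the CSR layout and the matrix-vector product it enables, assuming a tiny 3×3 example (real sparse libraries wrap exactly this structure):

```python
# CSR (compressed sparse row) stores only the non-zero entries, row by row,
# plus their column indices and per-row offsets.
# Dense matrix:      CSR representation:
# [[5, 0, 0],
#  [0, 0, 3],
#  [0, 2, 0]]
data    = [5.0, 3.0, 2.0]  # non-zero values, in row order
indices = [0,   2,   1  ]  # column index of each stored value
indptr  = [0, 1, 2, 3]     # row i occupies data[indptr[i]:indptr[i+1]]

def csr_matvec(data, indices, indptr, x):
    """y = Ax using the CSR layout; touches only the non-zero entries."""
    n = len(indptr) - 1
    y = [0.0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y

print(csr_matvec(data, indices, indptr, [1.0, 1.0, 1.0]))  # [5.0, 3.0, 2.0]
```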

Abstraction vs. computation. The abstract vector space formalism generalizes beautifully — any set satisfying the eight axioms qualifies, including polynomial spaces and function spaces — but that generality can obscure what computations actually cost. Students who master theoretical elegance sometimes struggle when the same concepts appear in the applied mathematics settings of engineering or finance, where a 512×512 matrix multiply carries a concrete wall-clock time.

Decomposition choice. LU, QR, SVD, Cholesky, and eigendecomposition all factor matrices, but each suits different problems. SVD is the most general and most expensive; Cholesky applies only to symmetric positive definite matrices but runs roughly twice as fast as LU. Choosing the wrong decomposition for the structure at hand wastes cycles or introduces numerical instability.


Common misconceptions

"Matrix multiplication is commutative." It is not, and this is not a technicality. For 2×2 matrices A = [[1,2],[0,1]] and B = [[1,0],[1,1]], AB = [[3,2],[1,1]] while BA = [[1,2],[1,3]]. The order encodes which transformation is applied first.

"A matrix with a small determinant is nearly singular." The determinant scales with matrix size in ways that make this intuition unreliable. A 100×100 identity matrix has determinant 1 — well-conditioned — but multiplying each row by 0.9 produces det = 0.9¹⁰⁰ ≈ 2.66 × 10⁻⁵, which sounds tiny yet the matrix remains perfectly well-conditioned. The condition number, not the determinant magnitude, measures numerical stability.

"The null space of A is just the zero vector." Only when A has full column rank. For any matrix with rank deficiency, the null space contains infinitely many vectors — every solution to Ax = 0 other than x = 0. This matters practically: it means a linear system Ax = b has either 0 or infinitely many solutions if the null space is non-trivial.

"Eigenvalues are only a theoretical tool." Google's original PageRank algorithm, as described in Brin and Page's 1998 Stanford technical report The Anatomy of a Large-Scale Hypertextual Web Search Engine, computes the dominant eigenvector of a web link matrix to rank pages. Eigenvalue computation is among the most industrially consequential operations in modern computing.


Checklist or steps

The following sequence describes the standard process for solving a linear system Ax = b by row reduction:

  1. Write the augmented matrix [A | b] with the coefficient matrix and right-hand-side column joined.
  2. Apply elementary row operations — row scaling, row swapping, row addition — to produce row echelon form (REF).
  3. Check for inconsistency: a row of zeros on the A side with a non-zero entry on the b side signals no solution.
  4. Continue reduction to reduced row echelon form (RREF) by eliminating entries above each pivot.
  5. Identify pivot columns (basic variables) and free columns (free variables).
  6. Express each basic variable in terms of free variables to write the general solution.
  7. If no free variables exist and no inconsistency appeared, the unique solution is read directly from RREF.
  8. Verify by substituting back into the original equation Ax = b.
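The steps above can be sketched as a Gauss-Jordan routine for the unique-solution case (partial pivoting added for numerical safety; the free-variable bookkeeping of steps 5-6 is omitted for brevity):

```python
def solve_rref(A, b):
    """Solve Ax = b by Gauss-Jordan elimination on the augmented matrix.
    Sketch for the unique-solution case; raises on (near-)singular input."""
    n = len(A)
    # Step 1: form the augmented matrix [A | b].
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        # Partial pivoting: bring up the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular: no unique solution")
        M[col], M[pivot] = M[pivot], M[col]
        # Step 2: scale the pivot row so the pivot entry becomes 1.
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        # Steps 2 and 4: eliminate this column from every other row (REF -> RREF).
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [v - factor * pv for v, pv in zip(M[r], M[col])]
    # Step 7: with no free variables, the solution is the last column of RREF.
    return [row[n] for row in M]

# 2x + y = 3, x + 3y = 5  ->  x = 0.8, y = 1.4
x = solve_rref([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0])
print([round(v, 6) for v in x])  # [0.8, 1.4]
```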

Reference table or matrix

Concept               | Symbol / notation         | Core result                                     | Primary application
Vector space          | V over field F            | Closed under addition and scalar multiplication | Foundations of all linear algebra
Dot product           | u · v = Σuᵢvᵢ             | Zero iff vectors are orthogonal                 | Projection, angle computation
Matrix rank           | rank(A)                   | Dimension of column space                       | Solvability of Ax = b
Determinant           | det(A) or |A|             | Zero iff singular                               | Invertibility test, volume scaling
Eigenvalue equation   | Av = λv                   | Decomposition of transformation                 | PCA, PageRank, stability analysis
SVD                   | A = UΣV^T                 | Always exists for any m×n matrix                | Data compression, least squares
Condition number      | κ(A) = σ_max/σ_min        | Higher = more numerically unstable              | Numerical methods assessment
Null space            | ker(A) = {x : Ax = 0}     | Dimension = n − rank(A) (rank-nullity)          | Solution set structure
Span                  | span(S)                   | Set of all linear combinations of S             | Basis construction
Orthogonal projection | proj_u v = (v·u/u·u)u     | Nearest point in subspace                       | Least squares, signal processing
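The projection formula in the last row can be checked directly; a small Python sketch that also verifies the "nearest point" property (the residual is orthogonal to the line):

```python
def proj(v, u):
    """Orthogonal projection of v onto the line spanned by u:
    proj_u(v) = (v . u / u . u) u."""
    vu = sum(a * b for a, b in zip(v, u))
    uu = sum(a * a for a in u)
    return [vu / uu * a for a in u]

p = proj([3.0, 4.0], [1.0, 0.0])  # project onto the x-axis
print(p)  # [3.0, 0.0]

# The residual v - p is orthogonal to u, so p is the nearest point on the line.
r = [3.0 - p[0], 4.0 - p[1]]
print(sum(a * b for a, b in zip(r, [1.0, 0.0])))  # 0.0
```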

The rank-nullity theorem — stating that rank(A) + nullity(A) = n for an m×n matrix — is one of linear algebra's most structurally revealing identities, connecting the two fundamental subspaces associated with any matrix. For a deeper treatment of how these concepts extend into applied mathematics or appear within mathematics in engineering, both areas show how the abstract machinery described here becomes load-bearing infrastructure.

The subject also forms a gateway into pure vs. applied mathematics debates, since linear algebra sits unusually comfortably on both sides of that line — rigorous enough for a proof-based abstract algebra course, computational enough to run at the core of every major machine learning framework.


References