LinAlg Notes

Type cmd + P, then "Fold all" to fold bullet points.

  • Properties of Euclidean Norms
    • Cauchy-Schwarz: $|x^T y| \le \|x\| \|y\|$
      • Implies the triangle inequality $\|x + y\| \le \|x\| + \|y\|$: the direct way is always shorter than or as long as the way with a stop in between
  • Linear Transformation; Matrix Multiplication
    • linear transformation: $f(\alpha x + \beta y) = \alpha f(x) + \beta f(y)$
    • For every linear transformation $f$ there exists a matrix $A$ that acts like it: $f(x) = Ax$
    • Matrix mult. is distributive and associative (why? b/c composition of linear functions is)
    • Why is matrix mult. defined the way it is? So that $AB$ represents the composition: $(AB)x = A(Bx)$
  • Transposes and Inverses
      • $(Ax)^T = x^T A^T$. Makes sense: a matrix is simply a series of vectors.
      • $(AB)^{-1} = B^{-1} A^{-1}$: When you perform $ABx$, you first perform $B$, then $A$. Therefore, to undo, you first have to undo $A$, then $B$.
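Both reversal rules can be sanity-checked numerically; a quick sketch with random matrices (seed arbitrary, invertibility assumed):

```python
import numpy as np

# Random 3x3 matrices; Gaussian matrices are invertible with probability 1.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

# (AB)^T = B^T A^T: transposing reverses the order
transpose_rule = np.allclose((A @ B).T, B.T @ A.T)

# (AB)^{-1} = B^{-1} A^{-1}: to undo AB, first undo A, then B
inverse_rule = np.allclose(np.linalg.inv(A @ B),
                           np.linalg.inv(B) @ np.linalg.inv(A))
```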
  • Rank, Column Space, Nullspace
      • $x$ in nullspace $\Leftrightarrow$ $Ax = 0$. $x$ is therefore orthogonal to every row of $A$.
      • Since the rows span the rowspace, $x$ must be orth. to the rowspace.
    • $N(A) = C(A^T)^\perp$ (and therefore $N(A^T) = C(A)^\perp$)
      • follows because the same argument applies to $A^T$
    • $\text{rank}(A) = \text{rank}(A^T)$ (and therefore col rank $=$ row rank)
      • Result of reduced row echelon form. There are always as many pivot rows as pivot columns.
    • $\text{rank}(A^T A) = \text{rank}(A)$ (and therefore $\text{rank}(AA^T) = \text{rank}(A^T)$)
      • $N(A^T A) = N(A)$: $A^T A x = 0 \Rightarrow x^T A^T A x = \|Ax\|^2 = 0 \Rightarrow Ax = 0$, and $Ax = 0 \Rightarrow A^T A x = 0$ by def.
      • All four are equal b/c of $\text{rank}(A) = \text{rank}(A^T)$
    • $Ax = b$ solvable $\Rightarrow$ every solution is $x = x_r + x_n$ with unique $x_r \in C(A^T)$ and $x_n \in N(A)$
      • You can add an arbitrary nullspace element to a unique solution in the row space and still have a solution.
      • Intuition: We can move a solution up and down the nullspace b/c adding a vector in the nullspace to a solution doesn’t change the validity ($A(x + x_n) = Ax + Ax_n = Ax = b$). Since the nullspace is the orth. complement of the rowspace, it must intersect the rowspace in exactly one position ($x_r$, which gets moved along $N(A)$)
      • Direct result of the decomposition above:
      • If there exists a solution $x$, then $x + x_n$ is also a solution, where $x_n \in N(A)$
      • Every solution can be written as $x_r + x_n$ for some $x_n \in N(A)$
    • underdetermined ($A$ wide) w/ full row rank $\Rightarrow$ $AA^T$ full rank $\Rightarrow$ $x = A^T (AA^T)^{-1} b$ solves $Ax = b$ (so this $x$ is unique b/c matrix mult. is well defined)
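A small NumPy check of the rank identities and the nullspace-orthogonality fact (made-up rank-deficient matrix):

```python
import numpy as np

# 4x3 example with col3 = col1 + col2, so rank 2.
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [2., 1., 3.],
              [1., 2., 3.]])

# rank(A) = rank(A^T) = rank(A^T A) = rank(A A^T)
ranks = [int(np.linalg.matrix_rank(M)) for M in (A, A.T, A.T @ A, A @ A.T)]

# A nullspace vector is orthogonal to every row of A.
x = np.array([1., 1., -1.])   # col1 + col2 - col3 = 0
in_nullspace = np.allclose(A @ x, 0)
```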
  • CR-Decomposition
    • Any $A$ can be decomposed into $A = CR$
      • $C$: Lin. ind. columns of $A$
      • $R$: Each col contains the coefficients of the lin. comb. of the cols of $C$ that make up the cols of $A$ (How much of each $C$ col goes into each $A$ col)
    • $C(A^T) = C(R^T)$, and $C(C) = C(A)$ by construction b/c $C$'s cols are the lin. ind. cols of $A$
      • rowspace of $A$ $\subseteq$ rowspace of $R$: Each row of $A$ is a lin. comb. of the rows of $R$, with each row of $C$ offering the coefficients
      • rowspace of $R$ $\subseteq$ rowspace of $A$: $C$ has full col. rank $\Rightarrow$ left inverse $C^+$ (see pseudoinverse), so $R = C^+ A$. Therefore: Each row of $R$ is a lin. comb. of the rows of $A$, with each row of $C^+$ offering the coefficients
    • Decomposition can be done using Gauss-Jordan elimination: $C$ = the cols of $A$ where $\text{rref}(A)$ has pivots, and $R$ = the non-zero rows of $\text{rref}(A)$
      • $C$: cols follow def (since cols with pivots are lin. ind.)
      • $R$: We can write the Gauss-Jordan steps into an $m \times m$ matrix $E$: $EA = \text{rref}(A)$ w/ as many 0 rows as $m - \text{rank}(A)$, which yields 1s on the diag. of the pivot cols & 0 rows below
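The Gauss-Jordan construction can be sketched with SymPy's exact `rref` (assuming SymPy is available; `cr_decompose` is a made-up helper name):

```python
import numpy as np
import sympy as sp

def cr_decompose(A):
    """A = C R: C = pivot columns of A, R = non-zero rows of rref(A)."""
    rref, pivots = sp.Matrix(A).rref()
    A = np.asarray(A, dtype=float)
    C = A[:, list(pivots)]                               # lin. ind. cols of A
    R = np.asarray(rref.tolist(), dtype=float)[:len(pivots), :]
    return C, R

# Rank-2 example: row2 = 2*row1
A = [[1, 2, 3],
     [2, 4, 6],
     [1, 1, 2]]
C, R = cr_decompose(A)
reconstructs = np.allclose(C @ R, np.asarray(A, dtype=float))
```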
  • Certificates
      • $x$ w/ $Ax = b$ certifies: $Ax = b$ has a solution b/c $b \in C(A)$
      • $y$ w/ $y^T A = 0$ and $y^T b = 1$ certifies: $Ax = b$ has no solution ($y^T A x = 0 \neq 1 = y^T b$). Any nonzero $y$ that satisfies $y^T A = 0$, $y^T b \neq 0$ can be scaled by $\frac{1}{y^T b}$ without leaving $N(A^T)$, i.e. while maintaining the condition $y^T A = 0$ (vectorspace axioms).
  • Projections onto Subspaces (https://youtu.be/Y_Ac6KiQ1t0)
    • $Ax = b$ may have no solution. Solve $A\hat{x} = p$ instead, where $p$ is the projection of $b$ onto the col. space. projection = closest point in col. space to $b$.
    • $e = b - p$ is the error of $b$, i.e. the vector from $b$ to its projection. $e$ is orth. to $C(A)$ b/c that’s the directest and therefore shortest way to reach the col. space. $A^T(b - A\hat{x}) = 0$ (normal equation) $\Rightarrow$ $\hat{x} = (A^T A)^{-1} A^T b$ $\Rightarrow$ projection matrix $P = A (A^T A)^{-1} A^T$
    • $P^2 = P$ b/c projecting something in the col. space (the first projection) changes nothing
    • $P^T = P$ b/c $((A^T A)^{-1})^T = (A^T A)^{-1}$ (inverse of a sym. matrix is sym.)
    • Intuitively: Projection matrix annihilates the orth. vector component
      • $b = p + e$ for some $e \in C(A)^\perp$ ($= N(A^T)$) (dot prod. between $e$ & every col. of $A$ is 0; $Pb = Pp + Pe = p$)
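The three projection-matrix facts can be verified numerically; a sketch with a made-up tall matrix (full column rank assumed so $(A^T A)^{-1}$ exists):

```python
import numpy as np

A = np.array([[1., 0.],
              [1., 1.],
              [1., 2.]])
b = np.array([6., 0., 0.])

P = A @ np.linalg.inv(A.T @ A) @ A.T    # projection matrix onto C(A)
p = P @ b                               # projection of b
e = b - p                               # error component

idempotent = np.allclose(P @ P, P)      # P^2 = P
symmetric = np.allclose(P.T, P)         # P^T = P
e_orthogonal = np.allclose(A.T @ e, 0)  # e is orth. to every col of A
```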
  • Least Squares Regression
    • Goal: calculate coefficients of function so that we minimize the squared error, i.e. the sum of the squared distances from our function output to actual datapoints
    • Example: we have datapoints $(x_i, y_i)$ and want a quadratic function $f(x) = c_0 + c_1 x + c_2 x^2$. If we plug them in: $Ac = y$, where row $i$ of $A$ is $(1, x_i, x_i^2)$
    • Minimizing now becomes reducing the sum of squared differences from every coordinate of $Ac$ (all our predictions) to the corres. coordinate of $y$ (all our expected results): $\min_c \|Ac - y\|^2$ (norm is always positive, so minimizing the norm and the squared norm is the same)
    • $Ac - y = -e$ from projections. We therefore know how to minimize the norm of $e$: pick $c$ so that $e$ is orth. to the col. space. Hence, $c = (A^T A)^{-1} A^T y$.
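A minimal sketch of the quadratic fit via the normal equation, on made-up data that lies exactly on $y = 1 + 2x^2$:

```python
import numpy as np

x = np.array([0., 1., 2., 3., 4.])
y = np.array([1., 3., 9., 19., 33.])   # exactly 1 + 2*x^2

# Rows (1, x_i, x_i^2): plug every datapoint into c0 + c1*x + c2*x^2
A = np.column_stack([np.ones_like(x), x, x**2])

# Normal equation: A^T A c = A^T y
c = np.linalg.solve(A.T @ A, A.T @ y)

# np.linalg.lstsq minimizes ||Ac - y||^2 directly, same answer
c_ref, *_ = np.linalg.lstsq(A, y, rcond=None)
```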
  • Orthogonal Matrices and Gram-Schmidt
    • Orthonormal columns: $Q^T Q = I$ b/c rows of $Q^T$ are always orthogonal to cols of $Q$ except if row idx = col idx, in which case we have $q_i^T q_i = 1$
    • If square: called orthogonal matrix. $Q$ then also has a right inverse (left inverse $\Rightarrow$ matrix injective: a vector first gets multiplied with $Q$, then with $Q^T$. If $Q$ reduced multiple values to the same value, $Q^T$ could not reverse it $\Rightarrow$ matrix bijective if square) since $QQ^T = I$ ($Q^T$ is left inverse & right inverse)
    • orthogonal $\Rightarrow$ norm preserving: $\|Qx\| = \|x\|$ and $(Qx)^T (Qy) = x^T y$
    • Gram-Schmidt: $q_1 = \frac{a_1}{\|a_1\|}$; $\tilde{q}_k = a_k - \sum_{i<k} (q_i^T a_k)\, q_i$; $q_k = \frac{\tilde{q}_k}{\|\tilde{q}_k\|}$
      • We project $a_k$ onto the already created subspace and then normalize the error
      • The sum above is the same as subtracting $Q_{k-1} Q_{k-1}^T a_k$, where $Q_{k-1} Q_{k-1}^T$ is the proj. matrix: $Q_{k-1}(Q_{k-1}^T Q_{k-1})^{-1} Q_{k-1}^T = Q_{k-1} Q_{k-1}^T$ since $Q_{k-1}^T Q_{k-1} = I$ by definition. $Q_{k-1} Q_{k-1}^T \neq I$ b/c $Q_{k-1}$ is not yet square.
    • For orthonormal matrices: $\hat{x} = Q^T b$ is the least-squares solution to $Qx = b$ (normal equation: $(Q^T Q)^{-1} Q^T b = Q^T b$)
    • A matrix $A$ can be decomposed into $A = QR$
      • $Q$ has orthonormal columns (not necessarily an orthogonal matrix b/c it may not be square).
      • $R = Q^T A$ denotes how much each column of $Q$ contributes to a column of $A$. Since every column of $A$ is a linear combination of all the already created orthonormal columns + a new one in Gram-Schmidt, we get an upper triangular matrix.
        • If $A$ has linearly independent columns, then $R$ has non-zero values along the diagonal: each col. adds a new orthonormal vector to $Q$. It is therefore invertible. This is part of the def. in this class.
        • Otherwise, we add a column w/o pivot (we don’t add a new col to $Q$ b/c the col is already in the span, so the error is the 0-vector). We get an upper triangular matrix with pivots set back.
      • Since $Q^T Q = I$, $QQ^T$ is a projection matrix.
      • $\hat{x} = R^{-1} Q^T b$ is the least-squares solution to $Ax = b$ ($A = QR$ and $A$ has full col. rank $\Rightarrow$ has left inverse)
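The Gram-Schmidt loop above can be written out directly; a sketch assuming linearly independent columns (`gram_schmidt` is a made-up helper name):

```python
import numpy as np

def gram_schmidt(A):
    """Classical Gram-Schmidt on the columns of A (lin. independence assumed).
    Returns Q (orthonormal columns) and upper-triangular R with A = Q R."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        v = A[:, k].copy()
        for i in range(k):
            R[i, k] = Q[:, i] @ A[:, k]  # how much of q_i is in a_k
            v -= R[i, k] * Q[:, i]       # subtract the projection onto q_i
        R[k, k] = np.linalg.norm(v)      # length of the error
        Q[:, k] = v / R[k, k]            # normalize the error
    return Q, R

A = np.array([[1., 1., 0.],
              [1., 0., 1.],
              [0., 1., 1.]])
Q, R = gram_schmidt(A)
```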
  • Determinant Formula (https://youtu.be/Sv7VseMsOQc)
    • We can easily derive 3 axioms from volume:
      • $\det(I) = 1$: a hypercube with edge length 1 has volume 1
      • Linearity in every single column is a direct result of volume:
        • A volume is spanned by multiple vectors. If you scale any one of the vectors by $\alpha$, your volume increases by the factor $\alpha$: $\det(\dots, \alpha a, \dots) = \alpha \det(\dots, a, \dots)$
        • A vector always determines two parallel edges in the case of a parallelogram and four parallel edges in the case of a 3d parallelepiped. If you split a vector into two ($a = b + c$), that equally bulges the shape in on one side and out at the other. Therefore, the volume stays the same: $\det(\dots, b + c, \dots) = \det(\dots, b, \dots) + \det(\dots, c, \dots)$
      • If a spanning vector is linearly dependent on the others, then the parallelepiped collapses into one dimension less $\Rightarrow$ volume of 0.
    • From these 3 axioms, it follows that the sign must switch if we swap two columns: expand $0 = \det(\dots, a + b, \dots, a + b, \dots)$ using linearity, then drop the terms with repeated columns (linear dependence)
    • After using our linearity and linear dependence axioms multiple times, we are left with only permutations of $I$’s columns. By sign flipping per col. switch, we get each term back to $\det(I)$. Hence: $\det(A) = \sum_{\sigma \in S_n} \text{sgn}(\sigma) \prod_i a_{\sigma(i) i}$
      • $S_n$: set of all possible permutation functions
    • Trick for sign of permutation: Draw two sets of dots from 1 to $n$. Connect input and output of permutation. $(-1)^{\#\text{crossings}}$ is the sign.
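The permutation-sum formula and the crossing-count sign trick, written out as a sketch (made-up helper names; factorial cost, so only for tiny matrices):

```python
import numpy as np
from itertools import permutations
from math import prod

def perm_sign(p):
    """(-1)^(number of crossings/inversions): the dot-connecting trick."""
    inversions = sum(p[i] > p[j]
                     for i in range(len(p)) for j in range(i + 1, len(p)))
    return -1 if inversions % 2 else 1

def det_leibniz(A):
    """det(A) = sum over permutations sigma of sgn(sigma) * prod_i A[sigma(i)][i]."""
    n = len(A)
    return sum(perm_sign(p) * prod(A[p[i]][i] for i in range(n))
               for p in permutations(range(n)))

A = [[2., 1., 0.],
     [1., 3., 1.],
     [0., 1., 2.]]
```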
  • Properties of Determinants
    • $\det(A^T) = \det(A)$
      • $\det(A) = \sum_P \det(P) \prod (\text{entries of } A \text{ picked by } P)$, where the sum is over the set of all permutation matrices.
      • Each permutation matrix is like a hole puncher: it picks one entry per row and per column.
      • If you turn both the page and the holes you punch (transpose $A$ and $P$), you still get the same holes.
    • $\det(AB) = \det(A)\det(B)$
      • Applying $B$ scales the volume by $\det(B)$ first, then $A$ scales it by $\det(A)$
    • $\det(\alpha A) = \alpha^n \det(A)$
      • Follows from linearity: we can extract the $\alpha$ of each of the $n$ columns separately
    • $\det(A^{-1}) = \frac{1}{\det(A)}$
      • $\det(A)\det(A^{-1}) = \det(AA^{-1}) = \det(I) = 1$, then divide by $\det(A)$.
  • Cofactors, Cramer’s rule and beyond
    • Cofactor: det of matrix = sum of products of row or col entries w/ cofactors: $\det(A) = \sum_j a_{ij} C_{ij}$ with $C_{ij} = (-1)^{i+j} \det(M_{ij})$. $M_{ij}$ is the minor obtained by deleting row $i$ and column $j$ from $A$.
      • Performs linearity on row or col: Split the row or col into $n$ vectors with one entry and otherwise only zeros. Delete row and column b/c of lin. dep.
      • Swapping that entry up to position $(1,1)$, which is part of the diagonal, swaps the sign $(i-1) + (j-1)$ times $\Rightarrow$ factor $(-1)^{i+j}$.
      • Example cofactor expansion of a $3 \times 3$ matrix along the first col: $\det(A) = a_{11} C_{11} + a_{21} C_{21} + a_{31} C_{31}$
    • $A^{-1} = \frac{1}{\det(A)} C^T$, where $C$ is the matrix with the cofactors of $A$ as entries.
      • Equivalent: $A C^T = \det(A) I$. Look at entry $(i,j)$ of $AC^T$:
      • Case 1: $i = j$
        • Then $(AC^T)_{ii} = \sum_k a_{ik} C_{ik}$
        • This is exactly the cofactor expansion of $\det(A)$ along row $i$.
        • So $(AC^T)_{ii} = \det(A)$.
      • Case 2: $i \neq j$. Then $(AC^T)_{ij} = \sum_k a_{ik} C_{jk}$
        • This is like taking the determinant of a matrix where two rows are equal (row $i$ and row $j$ after replacing row $j$ with row $i$)
        • Determinant = 0 in this case.
    • Efficient way to compute det: subtract and add multiples of rows until upper triangular form (similar to Gauss), then multiply the diagonal entries. If swapping rows: multiply final det by $-1$. Scaling not allowed.
      • Intuition: det measures volume of parallelepiped. Adding a multiple of one row to another just slides one face along another, which doesn’t change the volume ($\det(A) = \det(A^T)$, so rows can be seen as spanning vectors as well).
    • Cramer’s rule: $x_i = \frac{\det(B_i)}{\det(A)}$, where $x_i$ is the $i$-th component of the sol. vector for $Ax = b$ and $B_i$ has the $i$-th col of $A$ replaced by $b$
      • $\det(B_i) = \det(\dots, a_{i-1}, b, a_{i+1}, \dots)$ ($b =$ lin. comb. of $A$’s cols) $= x_i \det(A)$ (after applying linearity, all other dets $= 0$ b/c of linearly dep. cols)
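Cramer's rule as a direct sketch (`cramer_solve` is a made-up helper; for intuition only, far slower than Gaussian elimination):

```python
import numpy as np

def cramer_solve(A, b):
    """x_i = det(B_i) / det(A), where B_i is A with col i replaced by b."""
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for i in range(len(b)):
        Bi = A.copy()
        Bi[:, i] = b          # replace the i-th column by b
        x[i] = np.linalg.det(Bi) / d
    return x

A = np.array([[2., 1.],
              [1., 3.]])
b = np.array([3., 5.])
x = cramer_solve(A, b)
```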
  • Eigenvalues and Eigenvectors
    • $Av = \lambda v$ for $v \neq 0$ $\Leftrightarrow$ $\det(A - \lambda I) = 0$ (because $A - \lambda I$ must have nontrivial nullspace)
      • The pair $(\lambda, 0)$ does not count. Only if $A$ is singular is $\lambda = 0$ an eigenvalue b/c then some $v \neq 0$ satisfies $Av = 0 = 0v$.
    • There are always as many (not necessarily unique) eigenvalues as dimensions b/c $\det(A - \lambda I)$ results in a polynomial of degree $n$, which has $n$ (possibly complex) roots (fundamental theorem of algebra)
    • Characteristic polynomial: $\det(\lambda I - A)$ instead of $\det(A - \lambda I)$ b/c we want a monic polynomial (leading coefficient $1$)
      • Only the diagonal contributes to the leading coefficient (see proof for the trace). If we have an uneven $n$, then we have an uneven number of $(-\lambda)$ factors in the diagonal of $A - \lambda I$ $\Rightarrow$ $-1$ as leading coefficient
      • Monic polynomial can always be factored into $(\lambda - \lambda_1) \cdots (\lambda - \lambda_n)$ b/c the coefficient of $\lambda^n$ after combining all factors must be pos. ($= 1$)
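Numerically, the roots of the monic characteristic polynomial are exactly the eigenvalues; a sketch (`np.poly` returns the characteristic-polynomial coefficients of a square matrix):

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])

# Monic characteristic polynomial det(lambda*I - A);
# for a 2x2 matrix: lambda^2 - tr(A)*lambda + det(A)
coeffs = np.poly(A)
roots = np.sort(np.roots(coeffs))      # roots = eigenvalues
eigs = np.sort(np.linalg.eigvals(A))
```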
  • Properties of Eigenvalues and Eigenvectors
    • If $k \in \mathbb{N}$ and $(\lambda, v)$ eigenpair of $A$: $(\lambda^k, v)$ eigenpair of $A^k$
      • If you repeat the same matrix operation $k$ times, evecs are scaled by $\lambda$ $k$ times.
    • If $A$ invertible and $(\lambda, v)$ eigenpair of $A$: $(\frac{1}{\lambda}, v)$ eigenpair of $A^{-1}$
      • If multiplying by $A$ scales $v$ by $\lambda$, then reversing should scale by $\frac{1}{\lambda}$
    • Eigenvalues distinct $\Rightarrow$ corresponding evecs lin. ind.
      • Assume $v_1, v_2, v_3$ are evecs of $A$ w/ distinct eigenvalues $\lambda_1, \lambda_2, \lambda_3$, and suppose $v_3$ is a linear combination of lin. ind. $v_1$ and $v_2$, i.e. $v_3 = \alpha v_1 + \beta v_2$ for $\alpha, \beta \neq 0$. Then $\lambda_3 v_3 = A v_3 = \alpha \lambda_1 v_1 + \beta \lambda_2 v_2$ and $\lambda_3 v_3 = \alpha \lambda_3 v_1 + \beta \lambda_3 v_2$ $\Rightarrow$ $\alpha \lambda_1 = \alpha \lambda_3$ ($v_1, v_2$ lin. ind.); $\alpha$ non-zero, so $\lambda_1 = \lambda_3$. Contradiction. Analogous for $\beta$.
      • Proof above generalizes for any num of vectors (can be written using sum notation)
      • Result: if all $n$ eigenvalues are distinct, then we can create a basis of evecs
    • $(\lambda, v)$ eigenpair of $A$ $\Leftrightarrow$ $(\lambda - \mu, v)$ eigenpair of $A - \mu I$
      • $(A - \mu I)v = \lambda v - \mu v = (\lambda - \mu)v$ if $Av = \lambda v$
    • Eigenvalues for $AB$ = eigenvalues for $BA$
      • Set $w = Bv$ for an eigenpair $(\lambda, v)$ of $AB$: $BAw = B(ABv) = \lambda Bv = \lambda w$. Then you get the claimed result.
    • $\text{tr}(A) = \sum_i \lambda_i$ and $\det(A) = \prod_i \lambda_i$
      • Characteristic polynomial: $\prod_i (\lambda - \lambda_i)$
        • Coefficient of $\lambda^{n-1}$ is $-\sum_i \lambda_i$ since we pick every eigenvalue once and multiply it with the $\lambda$ of all the other factors
      • We can calculate the characteristic polynomial by using the Leibniz formula on $\det(\lambda I - A)$.
        • Results in a sum of products of matrix entries
        • Since we add these products, each product can only contribute to coefficients of powers of variables that it contains
        • Therefore: we are looking for products with a $\lambda^{n-1}$. Since only the diagonal contains $\lambda$, we need to pick $n-1$ elements from the diagonal to get the $\lambda^{n-1}$ in. Picking $n-1$ elements forces the last one $\Rightarrow$ only the diagonal is relevant.
        • The diagonal: $\prod_i (\lambda - a_{ii})$, where $a_{ii}$ is $A$’s $i$-th diagonal entry
        • Coefficient of $\lambda^{n-1}$ is $-\sum_i a_{ii}$ ($= -\text{tr}(A)$) b/c we pick $\lambda$ $n-1$ times and then must pick one of the $-a_{ii}$’s. Comparing coefficients: $\text{tr}(A) = \sum_i \lambda_i$.
      • If diagonalizable: $\text{tr}(A) = \text{tr}(X \Lambda X^{-1}) = \text{tr}(\Lambda X^{-1} X) = \sum_i \lambda_i$ (using $\text{tr}(AB) = \text{tr}(BA)$)
      • and $\det(A) = \det(X)\det(\Lambda)\det(X^{-1}) = \det(\Lambda) = \prod_i \lambda_i$
    • $A^T A$ and $AA^T$ share non-zero eigenvalues
      • Since $A^T A v = \lambda v$, $AA^T (Av) = A (A^T A v) = \lambda (Av)$. Symmetric argument for $AA^T$.
      • $v$ evec of $A^T A$: $Av$ evec of $AA^T$ (if $Av \neq 0$)
      • $w$ evec of $AA^T$: $A^T w$ evec of $A^T A$ (if $A^T w \neq 0$)
  • Diagonalization and Powers of A
    • $X$ := the matrix with the evecs as cols. Then $AX = X\Lambda$, where $\Lambda$ has the eigenvalues corresponding to the evecs in $X$ on the diagonal, and $A = X \Lambda X^{-1}$ if $X$ invertible
      • $AX = \Lambda X$ would be wrong b/c you would then multiply each component of every evec with a different eigenvalue
    • To calculate power of $A$: $A^k = X \Lambda^k X^{-1}$ b/c the $X^{-1}$ and the $X$ in between cancel out
      • Can be used for closed formulas for recursions
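As a sketch of the closed-formula idea, Fibonacci via diagonalization of the recursion matrix (`fib` is a made-up helper name):

```python
import numpy as np

# Fibonacci step as a matrix: (F_{n+1}, F_n)^T = A (F_n, F_{n-1})^T
A = np.array([[1., 1.],
              [1., 0.]])

lam, X = np.linalg.eig(A)       # A = X diag(lam) X^{-1}
X_inv = np.linalg.inv(X)

def fib(n):
    # A^n = X diag(lam^n) X^{-1}; entry (0, 1) of A^n is F_n
    An = X @ np.diag(lam ** n) @ X_inv
    return round(An[0, 1].real)

fibs = [fib(n) for n in range(1, 11)]
```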
  • Spectral Theorem
    • Any symmetric matrix has real eigenvalues and an orthonormal basis of $\mathbb{R}^n$ consisting of its evecs.
      • Induction proof in script
    • Symmetric matrices can be diagonalized as $A = Q \Lambda Q^T$ since $X$ can be made orthogonal
  • Positive (Semi-)Definite Matrices
    • All eigenvalues of a symmetric matrix $> 0$ (positive definite, PD) or $\geq 0$ (positive semi-definite, PSD)
    • Rayleigh-Quotient $R(x) = \frac{x^T A x}{x^T x}$
      • Projects $Ax$ onto $x$, then norms by $x^T x$
      • Measures how much $A$ scales $x$ in $x$’s direction
      • $\lambda_{\min} \leq R(x) \leq \lambda_{\max}$ (evecs of sym. matrix can form orth. matrix $Q$)
      • Let $z = Q^T x$, which is a vector. Then $R(x) = \frac{x^T Q \Lambda Q^T x}{x^T x} = \frac{z^T \Lambda z}{z^T z} = \frac{\sum_i \lambda_i z_i^2}{\sum_i z_i^2}$ ($Q^T$ is orth. and therefore norm preserving)
      • This is a positively weighted average of the eigenvalues $\lambda_i$, so $\lambda_{\min} \leq R(x) \leq \lambda_{\max}$
    • Matrix PD $\Leftrightarrow$ $x^T A x > 0$ for all $x \neq 0$
      • Implies matrix invertible: if $Ax = 0$ for $x \neq 0$, then $x^T A x = 0$, meaning $0$ were an eigenvalue, contradicting PD.
    • Matrix PSD $\Leftrightarrow$ $x^T A x \geq 0$ for all $x$
    • $A^T A$ and $AA^T$ are always positive semi-definite
      • $x^T A^T A x = \|Ax\|^2 \geq 0$ (and the denominator $x^T x$ of $R(x)$ is always $> 0$, so all eigenvalues $\geq 0$)
  • Gram Matrices and Cholesky Decomposition
    • Gram matrix: $S = A^T A$ for some matrix $A$
    • $S$ PSD matrix $\Leftrightarrow$ $S$ Gram matrix $\Leftrightarrow$ $S = R^T R$, where $R$ upper triangular (Cholesky Decomposition)
      • $S = Q \Lambda Q^T = (\sqrt{\Lambda} Q^T)^T (\sqrt{\Lambda} Q^T)$ b/c PSD $\Rightarrow$ symmetric, and $\sqrt{\Lambda}$ exists b/c eigenvalues non-negative
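A quick sketch: build a Gram matrix, confirm it is PSD, and recover an upper-triangular factor (NumPy's `cholesky` returns the lower-triangular factor, so we transpose):

```python
import numpy as np

# Any B gives a PSD Gram matrix S = B^T B (here even PD, so Cholesky exists).
B = np.array([[1., 2.],
              [0., 1.],
              [1., 0.]])
S = B.T @ B

L = np.linalg.cholesky(S)   # lower-triangular L with S = L L^T
R = L.T                     # upper-triangular factor: S = R^T R

psd = bool((np.linalg.eigvalsh(S) >= 0).all())
factors = np.allclose(R.T @ R, S)
upper_triangular = np.allclose(R, np.triu(R))
```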
  • Singular Value Decomposition
    • We’re looking for orthogonal matrices $U$ and $V$ s.t. $A = U \Sigma V^T$, where $\Sigma$ is diagonal, with only positive elements along the diagonal (the “singular values”). $V^T = V^{-1}$ b/c $Q^{-1} = Q^T$ for orthogonal matrices
      • $U$ is $m \times m$ b/c the left most matrix in matrix mult. determines num rows (it telescopes)
      • $\Sigma$ is $m \times n$ to match dim
      • $V^T$ is $n \times n$ b/c the right most matrix determines num cols
    • To find $U$ and $V$: $A^T A = V \Sigma^T \Sigma V^T$ and $AA^T = U \Sigma \Sigma^T U^T$
      • Since $A^T A$ and $AA^T$ are sym., they each have a full set of orthogonal evecs (spectral theorem) $\Rightarrow$ above is their diagonalization, where $\Sigma^T \Sigma$ and $\Sigma \Sigma^T$ correspond to $\Lambda$
      • Works b/c both $A^T A$ and $AA^T$ are PSD and have the same non-zero eigenvalues. $\Sigma$ is filled from the top left with the square roots of the shared eigenvalues of $A^T A$ and $AA^T$. By convention: descending order. Rest of matrix is filled with zeros.
        • $m < n$: $A^T A$ is larger than $AA^T$, so it has additional zero eigenvalues. Therefore, $V$ is larger than $U$. $\Sigma$ is wide.
        • $m > n$: Reverse. $\Sigma$ is tall.
    • Intuition: $U \Sigma V^T$ tells you how much the input is rotated ($V^T$), how much it is stretched ($\Sigma$), and how much the output is rotated ($U$). Since $U$ and $V$ are orth., they don’t stretch at all.
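The shape bookkeeping and the eigenvalue connection can be checked on a small wide matrix:

```python
import numpy as np

A = np.array([[3., 0., 1.],
              [0., 2., 0.]])            # m=2 < n=3, so Sigma is wide

U, s, Vt = np.linalg.svd(A)             # s: singular values, descending
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)    # fill from the top left

shapes_ok = U.shape == (2, 2) and Vt.shape == (3, 3)
reconstructs = np.allclose(U @ Sigma @ Vt, A)

# Singular values squared = shared non-zero eigenvalues of A A^T (and A^T A)
eigs = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1]
squares_match = np.allclose(s ** 2, eigs)
```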
  • Change of Basis
    • A vector is a series of numbers that each scale a basis vector. Therefore, the same vector looks different in different bases.
    • $B$ and $B'$: two bases
    • $T$ := matrix w/ $B'$’s basis vectors as cols, as expressed in $B$. Multiplying a vector in $B'$-coordinates by $T$ therefore maps it to $B$-coordinates.
    • $M$ := matrix of a lin. transformation in $B$. To apply it to a vector in $B'$-coordinates, we must first convert to $B$ ($T$), then apply it ($M$) and then convert back to $B'$ ($T^{-1}$): $M' = T^{-1} M T$
      • $M$ and $T^{-1} M T$ are called similar
    • Application: Image Compression
      • Represent entire image or parts of image as vector or matrix, where each coordinate or row has a grayscale/color value.
      • Change basis so that basis vectors capture larger parts of the image, for example all pixels black, half black/half white, and so on. We then discard the ones with low coefficients since they don’t contribute much.
      • Common choices:
        • Fourier matrix: You have a wave going across the entire vector, with different frequencies for each basis vector.
        • Wavelet matrix: You have one wave cycle somewhere in the column vector. The crests have a couple different widths and locations.
  • Left and Right Inverses; Pseudoinverse
    • Goal: Always get the most relevant answer to $Ax = b$
    • Tall matrix, full col. rank ($= n$)
      • Does not collapse information: Converts vectors into a higher dimension vectorspace; full column rank $\Rightarrow$ the subspace that the vectors inhabit has the same dim as their original vectorspace, so the transformation is injective. Therefore, $A$ should have a left inverse.
      • $A^T A$ is an $n \times n$ square matrix. Since $\text{rank}(A^T A) = \text{rank}(A) = n$, $A^T A$ is full rank and therefore invertible.
        • $A^+ = (A^T A)^{-1} A^T$ is left inverse of $A$
      • Same as projection / least squares: $A^+ b = (A^T A)^{-1} A^T b = \hat{x}$
        • Intuition: The inverse must work for all points, not just ones on the hyperspace formed by $C(A)$. The inverse must therefore project along $N(A^T)$ onto the hyperspace.
    • Wide matrix, full row rank ($= m$)
      • $A^T$ does not collapse information: it converts vectors into a higher dimension vectorspace; full row rank $\Rightarrow$ the subspace that the vectors inhabit has the same dim as their original vectorspace, so $A^T$ is injective and $A$ is surjective. Therefore, $A$ should have a right inverse.
      • $AA^T$ is an $m \times m$ square matrix. Since $\text{rank}(AA^T) = \text{rank}(A) = m$, $AA^T$ is full rank and therefore invertible.
        • $A^+ = A^T (AA^T)^{-1}$ is right inverse of $A$
      • $A^+ b$ is min norm solution to $Ax = b$
        • $x = x_r + x_n$, where $x_r \in C(A^T)$ and $x_n \in N(A)$, b/c they’re orth. complements
        • $x_r$ is min-norm solution b/c $\|x\|^2 = \|x_r\|^2 + \|x_n\|^2$ ($x_r \perp x_n$)
        • $A^+ b = A^T (AA^T)^{-1} b = A^T w$ for some $w$ $\Rightarrow$ $A^+ b \in C(A^T)$, so $A^+ b = x_r$
        • Taking the pseudoinverse of a wide matrix is the same as transposing it into a tall matrix, taking the pseudoinverse, and then transposing again: $A^+ = ((A^T)^+)^T$.
    • For any matrix: $A^+ = R^+ C^+$
      • Use CR-decomposition: $A = CR$
      • $C$ is a tall matrix w/ full col. rank. $R$ is a wide matrix w/ full row rank
      • $A^+ = R^+ C^+ = R^T (RR^T)^{-1} (C^T C)^{-1} C^T$ ($RR^T$ and $C^T C$ invertible $\Rightarrow$ both factors exist)
      • $A^+ b$ is least squares sol:
        • least squares sol satisfies normal equation $A^T A \hat{x} = A^T b$
        • $A^T A A^+ b = R^T C^T C R R^+ C^+ b = R^T C^T C C^+ b$ ($R^+$ right inverse) $= R^T C^T C (C^T C)^{-1} C^T b = R^T C^T b = A^T b$ (def. $C^+$; def. transpose)
      • $A^+ b$ is min norm solution:
        • We know from wide matrices: the min norm solution lies in the row space
        • $A^+ b = R^T w$ for some $w$ $\Rightarrow$ $A^+ b \in C(R^T) = C(A^T)$ ($A$ and $R$ have the same row space)
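A sketch of the general pseudoinverse through a full-rank factorization (the $C$ and $R$ below are made up by hand rather than computed via Gauss-Jordan):

```python
import numpy as np

# Full-rank factorization A = C R (C tall w/ full col. rank, R wide w/ full row rank)
C = np.array([[1., 0.],
              [1., 1.],
              [0., 1.]])
R = np.array([[1., 0., 2.],
              [0., 1., 1.]])
A = C @ R                                # 3x3 matrix of rank 2

C_pinv = np.linalg.inv(C.T @ C) @ C.T    # left inverse of C
R_pinv = R.T @ np.linalg.inv(R @ R.T)    # right inverse of R
A_pinv = R_pinv @ C_pinv                 # A^+ = R^+ C^+

left_ok = np.allclose(C_pinv @ C, np.eye(2))
right_ok = np.allclose(R @ R_pinv, np.eye(2))
matches_numpy = np.allclose(A_pinv, np.linalg.pinv(A))
```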
  • Properties of the Pseudoinverse
    • $AA^+$ and $A^+ A$ are projection matrices: $AA^+$ projects onto $C(A)$ and $A^+ A$ onto $C(A^T)$
      • $AA^+ = C R R^+ C^+ = C C^+$ ($R^+$ = right inverse of $R$) and $A^+ A = R^+ C^+ C R = R^+ R$ ($C^+$ = left inverse of $C$)
    • $(A^+)^+ = A$.