LinAlg Notes
Type cmd + P Fold all to fold bullet points.
- Properties of Euclidean Norms
- $|x^\top y| \le \|x\| \, \|y\|$ (Cauchy-Schwarz)
- $\|x + y\| \le \|x\| + \|y\|$ (triangle inequality)
- The direct way is always shorter or as long as the way with a stop in between
- Follows from Cauchy-Schwarz
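A quick numeric check of the two norm inequalities above, on hypothetical example vectors:

```python
# Check Cauchy-Schwarz and the triangle inequality numerically
# (x and y are arbitrary hypothetical vectors).
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

x = [1.0, -2.0, 3.0]
y = [4.0, 0.5, -1.0]

# Cauchy-Schwarz: |x^T y| <= ||x|| * ||y||
assert abs(dot(x, y)) <= norm(x) * norm(y)

# Triangle inequality: ||x + y|| <= ||x|| + ||y||
s = [a + b for a, b in zip(x, y)]
assert norm(s) <= norm(x) + norm(y)
```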
- Linear Transformation; Matrix Multiplication
- $f$ linear transformation $\iff f(\alpha x + \beta y) = \alpha f(x) + \beta f(y)$
- For every linear transformation $f$ there exists a matrix $A$ that acts like $f$: $f(x) = Ax$
- Matrix mult. distributive and associative
- Why? Matrix mult. represents composition of linear maps: composition is associative, and distributivity follows from linearity.
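A minimal pure-Python sketch of matrix multiplication as composition, $(AB)x = A(Bx)$, on hypothetical matrices:

```python
# Matrix multiplication composes linear maps: (AB)x equals A(Bx).
# A, B, x are hypothetical example data.

def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 1]]
x = [5, 7]

# applying AB at once equals applying B first, then A
assert matvec(matmul(A, B), x) == matvec(A, matvec(B, x))
```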
- Transposes and Inverses
- $(AB)^\top = B^\top A^\top$
- Makes sense: $(Ax)^\top = x^\top A^\top$. A matrix is simply a series of vectors.
- $(AB)^{-1} = B^{-1} A^{-1}$
- When you perform $ABx$, you first perform $B$, then $A$. Therefore, to undo, you first have to undo $A$, then $B$.
- $(A^\top)^{-1} = (A^{-1})^\top$
- Rank, Column Space, Nullspace
- $N(A) \perp C(A^\top)$
- $x$ in nullspace $\Rightarrow Ax = 0$. $x$ is therefore orthogonal to every row of $A$.
- Since the rows span the rowspace, $x$ must be orth. to the rowspace.
- (and therefore $N(A)$ is the orth. complement of $C(A^\top)$)
- $N(A^\top) \perp C(A)$ follows because the same argument applies to $A^\top$
- (and therefore $N(A^\top)$ is the orth. complement of $C(A)$)
- $\operatorname{rank}(A) = \operatorname{rank}(A^\top)$ (row rank $=$ col rank)
- Result of reduced row echelon form: there are always as many pivot rows as pivot columns.
- (and therefore $\dim C(A) = \dim C(A^\top)$)
- $\dim C(A) = \operatorname{rank}(A)$, and $\dim C(A^\top) = \operatorname{rank}(A^\top)$ by def.
- All four are equal b/c of the above.
- $Ax = b$ solvable $\Rightarrow$ unique solution $x_r \in C(A^\top)$; full solution set: $x_r + n$ with $n \in N(A)$
- You can add an arbitrary element of $N(A)$ to the unique solution in the row space and still have a solution.
- Intuition: We can move a solution up and down the nullspace b/c adding a vector in the nullspace to a solution doesn’t change the validity ($A(x + n) = Ax + An = Ax = b$). Since the nullspace is the orth. complement of the rowspace, it must intercept the rowspace in exactly one position ($x_r$, which gets moved along $N(A)$)
- Direct result:
- If there exists a solution $x$, then $x + n$ is also a solution, where $An = 0$
- $x = x_r + n$ for some $n \in N(A)$
- underdetermined ($A$ wide) $\wedge$ full row rank $\Rightarrow AA^\top$ full rank $\Rightarrow AA^\top y = b$ has a unique solution $y$ (so $x_r = A^\top y$ unique b/c matrix mult. is well defined)
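A quick numeric check of the solution-plus-nullspace fact above, on a hypothetical wide matrix:

```python
# Adding a nullspace vector to a solution of Ax = b keeps it a solution:
# A(x + n) = Ax + An = b + 0 = b. (A, x, n are hypothetical example data.)

def matvec(M, x):
    return [sum(row[j] * x[j] for j in range(len(x))) for row in M]

A = [[1, 1, 0], [0, 1, 1]]   # wide, full row rank
x = [1, 1, 2]                # one particular solution
n = [1, -1, 1]               # a vector in N(A)
b = matvec(A, x)

assert matvec(A, n) == [0, 0]                                 # n is in the nullspace
assert matvec(A, [xi + ni for xi, ni in zip(x, n)]) == b      # x + n still solves Ax = b
```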
- CR-Decomposition
- Any $A \in \mathbb{R}^{m \times n}$ with rank $r$ can be decomposed into $A = CR$
- $C \in \mathbb{R}^{m \times r}$: lin. ind. columns of $A$
- $R \in \mathbb{R}^{r \times n}$: each col contains the coefficients of the lin. comb. of the cols of $C$ that make up the corresponding col of $A$ (how much of each $C$ col goes into each $A$ col)
- $\operatorname{rank}(C) = r$ by construction b/c its cols are the lin. ind. cols of $A$
- $C(A^\top) = C(R^\top)$: $A$ and $R$ have the same rowspace
- $A = CR$: each row of $A$ is a lin. comb. of the rows of $R$, with each row of $C$ offering the coefficients
- $R = C^+ A$: $C$ has full col. rank $\Rightarrow$ left inverse $C^+$ (see pseudoinverse). Therefore: each row of $R$ is a lin. comb. of the rows of $A$, with each row of $C^+$ offering the coefficients
- Decomposition can be done using Gauss-Jordan elimination: $C$ = the cols of $A$ where $\operatorname{rref}(A)$ has pivots, and $R$ = the non-zero rows of $\operatorname{rref}(A)$
- $C$: cols follow the def. (since cols with pivots are lin. ind.)
- $R$: We can write the Gauss-Jordan steps into an invertible matrix $E$ with $EA = \operatorname{rref}(A)$, w/ as many 0 rows as $m - r$; the pivot cols yield 1s on the diag. & 0 rows below
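A sketch of the Gauss-Jordan route to $A = CR$ on a small hypothetical rank-2 matrix, using exact fractions to avoid rounding:

```python
# CR decomposition: C = pivot columns of A, R = nonzero rows of rref(A).
# The matrix A below is hypothetical example data.
from fractions import Fraction

def rref(M):
    M = [[Fraction(v) for v in row] for row in M]
    pivots, r = [], 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        M[r] = [v / M[r][c] for v in M[r]]          # normalize pivot row
        for i in range(len(M)):
            if i != r and M[i][c] != 0:             # eliminate the column elsewhere
                M[i] = [a - M[i][c] * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    return M, pivots

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 2, 3], [2, 4, 6], [1, 1, 1]]               # rank 2
E, pivots = rref(A)
C = [[row[c] for c in pivots] for row in A]         # lin. ind. cols of A
R = E[:len(pivots)]                                 # nonzero rows of rref(A)
assert matmul(C, R) == A                            # reconstruction is exact
```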
- Certificates
- Certificate that $Ax = b$ has no solution: a $y$ with $A^\top y = 0$ and $y^\top b = 1$
- $Ax = b$ has a solution $\iff b \in C(A)$ b/c $Ax$ is a lin. comb. of $A$’s cols
- Therefore: $Ax = b$ has no solution if such a $y$ exists: $y$ is orth. to every col of $A$ but not to $b$, so $b \notin C(A)$. Requiring $y^\top b = 1$ loses nothing, since any nonzero $y$ that satisfies $A^\top y = 0, y^\top b \neq 0$ can be scaled by $1/(y^\top b)$ without leaving $N(A^\top)$, i.e. while maintaining the condition $A^\top y = 0$ (vectorspace axioms).
- Projections onto Subspaces (https://youtu.be/Y_Ac6KiQ1t0)
- $Ax = b$ may have no solution. Solve $A\hat{x} = p$ instead, where $p$ is the projection of $b$ onto the col. space. projection = closest point in col. space to $b$.
- $e = b - A\hat{x}$ is the error of $b$, i.e. the vector from $b$ to its projection. $e$ is orth. to $C(A)$ b/c that’s the most direct and therefore shortest way to reach the col. space. $A^\top(b - A\hat{x}) = 0 \Rightarrow \hat{x} = (A^\top A)^{-1} A^\top b$ (normal equation) $\Rightarrow$ projection matrix $P = A(A^\top A)^{-1} A^\top$, $p = Pb$
- $P^2 = P$ b/c projecting something in the col. space (the first projection) changes nothing
- $P^\top = P$ b/c $(A(A^\top A)^{-1} A^\top)^\top = A((A^\top A)^{-1})^\top A^\top = A(A^\top A)^{-1} A^\top$
- Intuitively: Projection matrix annihilates orth. vector component
- $b = p + e$ with $p = Aw$ for some $w$ ($p \in C(A)$): $Pb = Pp + Pe = p$ (dot prod. between $e$ & every col. of $A$ is 0; $A^\top e = 0 \Rightarrow Pe = 0$)
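A numeric sketch of the projection matrix on a hypothetical tall matrix with two independent columns, checking $P^2 = P$ and $e \perp C(A)$:

```python
# P = A (A^T A)^{-1} A^T projects onto C(A).
# A and b are hypothetical example data; inv2 inverts a 2x2 matrix.

def transpose(M):
    return [list(r) for r in zip(*M)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
At = transpose(A)
P = matmul(matmul(A, inv2(matmul(At, A))), At)

b = [[6.0], [0.0], [0.0]]
p = matmul(P, b)                                  # projection of b onto C(A)
e = [[b[i][0] - p[i][0]] for i in range(3)]       # error vector

# P is idempotent, and the error is orthogonal to every column of A
assert all(abs(matmul(P, P)[i][j] - P[i][j]) < 1e-9
           for i in range(3) for j in range(3))
assert all(abs(matmul(At, e)[i][0]) < 1e-9 for i in range(2))
```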
- Least Squares Regression
- Goal: calculate the coefficients of a function so that we minimize the squared error, i.e. the sum of the squared distances from our function output to the actual datapoints
- Example: we have datapoints $(t_i, b_i)$ and want a quadratic function $c_0 + c_1 t + c_2 t^2$. If we plug them in: one row $[1 \;\, t_i \;\, t_i^2] \, [c_0 \ c_1 \ c_2]^\top = b_i$ per datapoint, i.e. $Ax = b$
- Minimizing now becomes reducing the sum of squared differences from every coordinate of $Ax$ (all our predictions) to the corres. coordinate of $b$ (all our expected results): $\min_x \|Ax - b\|^2$ (norm is always positive)
- $Ax - b = -e$ from projections. We therefore know how to minimize the norm of $Ax - b$: choose $\hat{x}$ so that the error is orth. to the col. space. Hence, $\hat{x} = (A^\top A)^{-1} A^\top b$.
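A minimal sketch of least squares via the normal equation, fitting a line $c_0 + c_1 t$ (simpler than the quadratic above) to hypothetical datapoints that happen to lie exactly on a line:

```python
# Least squares via A^T A x = A^T b. The datapoints are hypothetical and
# lie exactly on b = 1 + 2t, so the fit recovers c0 = 1, c1 = 2.

def transpose(M):
    return [list(r) for r in zip(*M)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def solve2(M, y):  # solve a 2x2 system M x = y
    (a, b), (c, d) = M
    det = a * d - b * c
    return [(y[0] * d - b * y[1]) / det, (a * y[1] - y[0] * c) / det]

ts = [0.0, 1.0, 2.0]
bs = [1.0, 3.0, 5.0]
A = [[1.0, t] for t in ts]           # one row [1, t_i] per datapoint
At = transpose(A)
AtA = matmul(At, A)
Atb = [sum(At[i][j] * bs[j] for j in range(3)) for i in range(2)]

c0, c1 = solve2(AtA, Atb)
assert abs(c0 - 1.0) < 1e-9 and abs(c1 - 2.0) < 1e-9
```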
- Orthogonal Matrices and Gram-Schmidt
- Orthonormal columns: $Q^\top Q = I$ b/c rows of $Q^\top$ are always orthogonal to cols of $Q$ except if row idx = col idx, in which case we have $q_i^\top q_i = 1$
- If square: called orthogonal matrix. $Q$ then also has a right inverse (left inverse $\Rightarrow$ matrix injective: a vector first gets multiplied with $Q$, then with $Q^\top$. If $Q$ reduced multiple values to the same value, $Q^\top$ could not reverse $\Rightarrow$ matrix bijective if square) since $Q^\top Q = QQ^\top = I$ ($Q^\top$ is left inverse and right inverse)
- $Q$ orthogonal $\Rightarrow$ norm preserving: $\|Qx\|^2 = x^\top Q^\top Q x = x^\top x = \|x\|^2$ and $(Qx)^\top (Qy) = x^\top y$
- Gram-Schmidt: $q_1 = a_1 / \|a_1\|$; $e_k = a_k - \sum_{i<k} (q_i^\top a_k)\, q_i$; $q_k = e_k / \|e_k\|$
- We project $a_k$ onto the already created subspace and then norm the error
- The sum above is the same as subtracting $Q_{k-1} Q_{k-1}^\top a_k$, where $Q_{k-1} Q_{k-1}^\top$ is the proj. matrix: $Q(Q^\top Q)^{-1} Q^\top = QQ^\top$ since $Q^\top Q = I$ by definition. $QQ^\top \neq I$ b/c $Q$ is not yet square.
- For matrices with orthonormal columns: $\hat{x} = Q^\top b$ is the least-squares solution to $Qx = b$
- A matrix $A$ can be decomposed into $A = QR$
- $Q$ has orthonormal columns (not necessarily an orthogonal matrix b/c it may not be square).
- $R$ denotes how much each column of $Q$ contributes to a column of $A$. Since every column of $A$ is a linear combination of all the already created orthonormal columns + a new one in Gram-Schmidt, we get an upper triangular matrix.
- If $A$ has linearly independent columns, then $R$ has nonzero values along the diagonal: each col. adds a new orthonormal vector to $Q$. It is therefore invertible. This is part of the def. in this class.
- Otherwise, we add a column w/o pivot (we don’t add a new col to $Q$ b/c $a_k$ already lies in the span, so the error is the 0-vector). We get an upper triangular matrix with pivots set back.
- Since $Q^\top Q = I$, $QQ^\top$ is a projection matrix.
- $\hat{x} = R^{-1} Q^\top b$ ($A^\top A \hat{x} = A^\top b \Rightarrow R^\top Q^\top Q R \hat{x} = R^\top Q^\top b \Rightarrow R\hat{x} = Q^\top b$; $A = QR$ and $R$ has full col. rank $\Rightarrow$ has left inverse)
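A sketch of Gram-Schmidt producing $Q$ and $R$ for a small hypothetical matrix, checking $Q^\top Q = I$ and $A = QR$:

```python
# Gram-Schmidt QR: project each column onto the span of the previous q's,
# store the coefficients in R, and normalize the error into the next q.
import math

def col(M, j):
    return [row[j] for row in M]

def gram_schmidt_qr(A):
    m, n = len(A), len(A[0])
    Q = [[0.0] * n for _ in range(m)]
    R = [[0.0] * n for _ in range(n)]
    for k in range(n):
        v = col(A, k)
        for i in range(k):
            qi = col(Q, i)
            R[i][k] = sum(qi[r] * v[r] for r in range(m))    # coefficient q_i^T a_k
            v = [v[r] - R[i][k] * qi[r] for r in range(m)]   # subtract the projection
        R[k][k] = math.sqrt(sum(x * x for x in v))           # norm of the error
        for r in range(m):
            Q[r][k] = v[r] / R[k][k]
    return Q, R

A = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]     # hypothetical example matrix
Q, R = gram_schmidt_qr(A)

# Q^T Q = I and A = QR (up to float error)
assert all(abs(sum(Q[r][i] * Q[r][j] for r in range(3)) - (1.0 if i == j else 0.0)) < 1e-9
           for i in range(2) for j in range(2))
assert all(abs(sum(Q[r][k] * R[k][c] for k in range(2)) - A[r][c]) < 1e-9
           for r in range(3) for c in range(2))
```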
- Determinant Formula (https://youtu.be/Sv7VseMsOQc)
- We can easily derive 3 axioms from volume:
- $\det(I) = 1$: a hypercube with edge length 1 has volume 1
- Linearity in each column: $\det(\dots, \alpha a + \beta b, \dots) = \alpha \det(\dots, a, \dots) + \beta \det(\dots, b, \dots)$. Linearity is a direct result of volume:
- A volume is spanned by multiple vectors. If you scale any one of the vectors by $c$, your volume increases by the factor $c$.
- A vector always determines two parallel edges in the case of a parallelogram and four parallel edges in the case of a 3d parallelepiped. If you split a vector into two, that equally bulges the shape in on one side and out at the other. Therefore, the volume stays the same.
- $\det = 0$ for lin. dep. cols: If a spanning vector is linearly dependent on the others, then the parallelepiped collapses into one dimension less $\Rightarrow$ volume of 0.
- From these 3 axioms, it follows that the sign must switch if we swap two columns: $0 = \det(\dots, a{+}b, \dots, a{+}b, \dots) = \det(\dots, a, \dots, b, \dots) + \det(\dots, b, \dots, a, \dots)$ (linear dependence)
- After using our linearity and linear dependence axioms multiple times, we are left with only $I$’s column permutations. By using sign flipping col. switching, we get each term to $\det(I) = 1$. Hence: $\det(A) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} a_{i, \sigma(i)}$
- $S_n$: set of all possible permutation functions
- Trick for sign of permutation: Draw two sets of dots from 1 to $n$. Connect input and output of permutation. $(-1)^{\#\text{crossings}}$ is the sign.
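The Leibniz formula can be sketched directly, computing the sign from the number of inversions (equivalent to counting crossings) and checking against a hand-computed $3 \times 3$ determinant:

```python
# Leibniz formula: det(A) = sum over permutations sigma of
# sgn(sigma) * prod_i A[i][sigma(i)]. O(n!) — for illustration only.
from itertools import permutations

def sign(p):
    # sign = (-1)^(number of inversions)
    inv = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inv % 2 else 1

def det(A):
    n = len(A)
    total = 0
    for p in permutations(range(n)):
        prod = 1
        for i in range(n):
            prod *= A[i][p[i]]
        total += sign(p) * prod
    return total

# hypothetical example matrix; cofactor expansion by hand gives 25
A = [[2, 0, 1], [1, 3, 0], [0, 1, 4]]
assert det(A) == 25
assert det([[1, 0], [0, 1]]) == 1      # axiom: det(I) = 1
```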
- Properties of Determinants
- $\det(A^\top) = \det(A)$
- $\det(A) = \sum_{P \in \mathcal{P}_n} \det(P) \prod_{i,j:\, P_{ij} = 1} a_{ij}$, where $\mathcal{P}_n$ is the set of all permutation matrices.
- Each permutation matrix is like a hole puncher.
- If you turn both the page and the holes you punch, you still get the same holes.
- $\det(AB) = \det(A) \det(B)$
- Scale volume by $B$ first, then by $A$
- Follows from linearity: we can extract entries separately
- $\det(A^{-1}) = \frac{1}{\det(A)}$
- $\det(A) \det(A^{-1}) = \det(AA^{-1}) = \det(I) = 1$, then divide by $\det(A)$.
- Cofactors, Cramer’s rule and beyond
- Cofactor: det of matrix = sum of products of row or col entries w/ cofactors: $\det(A) = \sum_j a_{ij} C_{ij}$, with $C_{ij} = (-1)^{i+j} M_{ij}$. $M_{ij}$ is the minor obtained by deleting row $i$ and column $j$ from $A$.
- Performs linearity on a row or col: Split the row or col into vectors with one entry and otherwise only zeros. Delete that row and column b/c of lin. dep.
- Swapping that vector’s entry up to position $(1,1)$, which is part of the diagonal, swaps the sign $(i - 1) + (j - 1)$ times $\Rightarrow$ sign $(-1)^{i+j}$.
- Example $3 \times 3$ matrix cofactor expansion along the first col: $\det(A) = a_{11} M_{11} - a_{21} M_{21} + a_{31} M_{31}$
- $A^{-1} = \frac{1}{\det(A)} C^\top$, where $C$ is the matrix with the cofactors of $A$ as entries.
- Equivalent: $A C^\top = \det(A) I$. Case 1: diagonal entry $(i, i)$ of $AC^\top$
- Then $(AC^\top)_{ii} = \sum_j a_{ij} C_{ij}$
- This is exactly the cofactor expansion of $\det(A)$ along row $i$.
- So $(AC^\top)_{ii} = \det(A)$.
- Case 2: off-diagonal entry $(i, k)$, $i \neq k$. Then $(AC^\top)_{ik} = \sum_j a_{ij} C_{kj}$
- This is like taking the determinant of a matrix where two rows are equal (row $i$ and row $k$, after replacing row $k$ with row $i$)
- Determinant = 0 in this case, so $AC^\top = \det(A) I$.
- Efficient way to compute det: subtract and add rows until upper triangular form (similar to Gauss), then multiply the diagonal entries. If swapping rows: multiply final det by $-1$. Scaling not allowed.
- Intuition: det measures volume of parallelepiped. Adding a multiple of one row to another just slides one face along another, which doesn’t change the volume ($\det(A^\top) = \det(A)$, so rows can be seen as spanning vectors as well).
- Cramer’s rule: $x_i = \frac{\det(B_i)}{\det(A)}$, where $x_i$ is the $i$-th component of the sol. vector for $Ax = b$ and $B_i$ has the $i$-th col replaced by $b$
- $\det(B_i) = \det(\dots, b, \dots)$ ($b$ = lin. comb. of $A$’s cols) $= \det(\dots, \textstyle\sum_j x_j a_j, \dots) = x_i \det(A)$ (after applying linearity, all other dets $= 0$ b/c of linearly dep. cols)
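A minimal sketch of Cramer's rule on a hypothetical $2 \times 2$ system:

```python
# Cramer's rule: x_i = det(B_i) / det(A), where B_i is A with its
# i-th column replaced by b. A and b are hypothetical example data.

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def cramer2(A, b):
    d = det2(A)
    B0 = [[b[0], A[0][1]], [b[1], A[1][1]]]   # col 0 replaced by b
    B1 = [[A[0][0], b[0]], [A[1][0], b[1]]]   # col 1 replaced by b
    return [det2(B0) / d, det2(B1) / d]

A = [[2, 1], [1, 3]]
b = [5, 10]
x = cramer2(A, b)
assert x == [1.0, 3.0]
# sanity check: A x = b
assert [A[0][0] * x[0] + A[0][1] * x[1],
        A[1][0] * x[0] + A[1][1] * x[1]] == b
```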
- Eigenvalues and Eigenvectors
- $Av = \lambda v$ for $v \neq 0 \iff \det(A - \lambda I) = 0$
- (because $A - \lambda I$ must have a nontrivial nullspace)
- The pair $(\lambda, 0)$ does not count. Only if $A$ is singular is $\lambda = 0$ an eigenvalue b/c then any $v \in N(A), v \neq 0$ satisfies $Av = 0 = 0v$.
- There are always as many (not necessarily unique) eigenvalues as dimensions b/c $\det(A - \lambda I)$ results in a polynomial of degree $n$, which has $n$ (possibly complex) roots (fundamental theorem of algebra)
- Characteristic polynomial: $p(\lambda) = \det(\lambda I - A)$ b/c we want a monic polynomial (leading coefficient $1$)
- Only the diagonal contributes to the leading coefficient (see proof for the trace below). If we have an uneven $n$, then we have an uneven number of $-\lambda$’s in the diagonal of $A - \lambda I$ $\Rightarrow -1$ as leading coefficient
- A monic polynomial can always be factored into a series of factors $(\lambda - \lambda_i)$ b/c the leading coefficient after combining all factors must be pos.
- Properties of Eigenvalues and Eigenvectors
- If $(\lambda, v)$ eigenpair of $A$: $(\lambda^k, v)$ eigenpair of $A^k$
- If you repeat the same matrix operation $k$ times, evecs are scaled by $\lambda$ $k$ times.
- If $A$ invertible and $(\lambda, v)$ eigenpair of $A$: $(\frac{1}{\lambda}, v)$ eigenpair of $A^{-1}$
- If multiplying by $A$ scales $v$ by $\lambda$, then reversing should scale by $\frac{1}{\lambda}$
- Eigenvalues distinct $\Rightarrow$ corresponding evecs lin. ind.
- Assume $v_1, v_2, v_3$ are evecs of $A$ w/ distinct eigenvalues $\lambda_1, \lambda_2, \lambda_3$, and suppose $v_3$ is a linear combination of lin. ind. $v_1$ and $v_2$, i.e. $v_3 = \alpha v_1 + \beta v_2$ for $\alpha, \beta \neq 0$. Then $\lambda_3 v_3 = A v_3 = \alpha \lambda_1 v_1 + \beta \lambda_2 v_2$, and also $\lambda_3 v_3 = \alpha \lambda_3 v_1 + \beta \lambda_3 v_2$; ($v_1, v_2$ lin. ind.) coefficients must match: $\alpha \lambda_1 = \alpha \lambda_3$ with $\alpha$ non-zero, so $\lambda_1 = \lambda_3$. Contradiction. Analogous for $\lambda_2$.
- Proof above generalizes for any num of vectors (can be written using sum notation)
- Result: if all $n$ eigenvalues are distinct, then we can create a basis of evecs
- $(\lambda, v)$ eigenpair $\Rightarrow (\bar{\lambda}, \bar{v})$ eigenpair
- if $A$ real
- Eigenvalues for $A$ = eigenvalues for $A^\top$
- $\det(A^\top - \lambda I) = \det((A - \lambda I)^\top) = \det(A - \lambda I)$
- Set $B = A - \lambda I$ and use $\det(B^\top) = \det(B)$. Then you get the claimed result.
- $\operatorname{tr}(A) = \sum_i \lambda_i$
- Characteristic polynomial: $p(\lambda) = \prod_i (\lambda - \lambda_i)$
- Coefficient of $\lambda^{n-1}$ is $-$(sum of the eigenvalues) since we pick every eigenvalue once and multiply it with the $\lambda$’s of all the other factors
- We can calculate the characteristic polynomial by using the Leibniz formula on $\det(\lambda I - A)$.
- Results in a sum of products of matrix entries
- Since we add these products, each product can only contribute to coefficients of powers of variables that it contains
- Therefore: we are looking for products with a $\lambda^{n-1}$. Since only the diagonal contains $\lambda$, we need to pick $n - 1$ elements from the diagonal to get the $\lambda^{n-1}$ in. Picking $n - 1$ elements forces the last one $\Rightarrow$ only the diagonal is relevant.
- The diagonal: $\prod_i (\lambda - d_i)$, where $d_i$ is $A$’s $i$-th diagonal entry
- Coefficient of $\lambda^{n-1}$ is $-$(sum of diagonal entries) ($-\operatorname{tr}(A)$) b/c we pick $n - 1$ $\lambda$’s and then must pick one of $A$’s diagonal entries.
- If diagonalizable: $\det(A) = \prod_i \lambda_i$
- $\det(A) = \det(V \Lambda V^{-1}) = \det(V) \det(\Lambda) \det(V^{-1}) = \det(\Lambda)$ and $\det(\Lambda) = \prod_i \lambda_i$
- Characteristic polynomial: $p(0) = \prod_i (0 - \lambda_i) = (-1)^n \prod_i \lambda_i$ and $p(0) = \det(-A) = (-1)^n \det(A)$, so the result holds for any $A$
- $AB$ and $BA$ share non-zero eigenvalues
- Since $\lambda \neq 0$ and $ABv = \lambda v \neq 0$, $Bv \neq 0$. Symmetric for $Aw$.
- $v$ evec of $AB$: $BA(Bv) = B(ABv) = \lambda (Bv)$
- $w$ evec of $BA$: $AB(Aw) = A(BAw) = \lambda (Aw)$
- Diagonalization and Powers of A
- $V$ = the matrix with the evecs as cols. Then $AV = V\Lambda$, where $\Lambda$ has the eigenvalues corresponding to the evecs in $V$ on the diagonal
- $A = V \Lambda V^{-1}$ and $\Lambda = V^{-1} A V$ if $V$ invertible
- $\Lambda V$ would be wrong b/c you would then multiply each component of every evec with a different eigenvalue
- To calculate a power of $A$: $A^k = V \Lambda^k V^{-1}$ b/c the $V^{-1}$’s and $V$’s in between cancel out
- Can be used for closed formulas for recursions
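The recursion point can be sketched with the Fibonacci matrix $A = [[1,1],[1,0]]$, whose eigenvalues are the golden ratio $\varphi$ and its conjugate; $A^n = V \Lambda^n V^{-1}$ gives a closed form:

```python
# A^n = V Lambda^n V^{-1} for A = [[1, 1], [1, 0]];
# A^n = [[F(n+1), F(n)], [F(n), F(n-1)]], so entry (0, 1) is F(n).
import math

phi = (1 + math.sqrt(5)) / 2
psi = (1 - math.sqrt(5)) / 2
V = [[phi, psi], [1.0, 1.0]]                    # eigenvectors as columns
Vinv = [[1 / (phi - psi), -psi / (phi - psi)],  # inverse of V (det = phi - psi)
        [-1 / (phi - psi), phi / (phi - psi)]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def fib(n):
    Ln = [[phi ** n, 0.0], [0.0, psi ** n]]     # Lambda^n: just power the diagonal
    An = matmul(matmul(V, Ln), Vinv)            # A^n = V Lambda^n V^{-1}
    return round(An[0][1])

assert [fib(n) for n in range(1, 11)] == [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```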
- Spectral Theorem
- Any symmetric matrix has real eigenvalues and an orthonormal basis of $\mathbb{R}^n$ consisting of its evecs.
- Induction proof in script
- Symmetric matrices can be diagonalized as $A = Q \Lambda Q^\top$ since $V$ can be made orthogonal
- Positive (Semi-)Definite Matrices
- All eigenvalues of a symmetric matrix $> 0$ (positive definite, PD) or $\geq 0$ (positive semi-definite, PSD)
- Rayleigh quotient: $R(x) = \frac{x^\top A x}{x^\top x}$
- Projects $Ax$ onto $x$, then norms by $x^\top x$
- Measures how much $A$ scales $x$ in $x$’s direction
- $\lambda_{\min} \leq R(x) \leq \lambda_{\max}$
- $A = Q \Lambda Q^\top$ (evecs of sym. matrix can form orth. matrix)
- Let $y = Q^\top x$, which is a vector. Then $R(x) = \frac{y^\top \Lambda y}{y^\top y}$ ($Q^\top$ is orth. and therefore norm preserving)
- $\frac{\sum_i \lambda_i y_i^2}{\sum_i y_i^2}$: This is a positively weighted average of the eigenvalues $\lambda_i$, so the bounds hold
- Matrix PD $\iff x^\top A x > 0$ for all $x \neq 0$
- Implies matrix invertible: if $Ax = 0$ for $x \neq 0$, then $x^\top A x = 0$, meaning $0$ were an eigenvalue, contradicting PD.
- Matrix PSD $\iff x^\top A x \geq 0$ for all $x$
- $A^\top A$ and $A A^\top$ are always positive semi-definite
- $x^\top A^\top A x = \|Ax\|^2 \geq 0$ (the denominator of $R(x)$ is always $> 0$, so the sign is determined by the numerator)
- Gram Matrices and Cholesky Decomposition
- Gram matrix: $A = B^\top B$ for some matrix $B$
- PSD matrix $\iff$ Gram matrix, with $A = R^\top R$ possible, where $R$ upper triangular (Cholesky Decomposition)
- $A = Q \Lambda Q^\top = (\sqrt{\Lambda} Q^\top)^\top (\sqrt{\Lambda} Q^\top)$ b/c PSD $\Rightarrow$ symmetric; $\sqrt{\Lambda}$ real b/c eigenvalues non-negative
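A minimal Cholesky sketch, computing the lower-triangular $L$ with $A = L L^\top$ (so $R = L^\top$ is the upper-triangular factor with $A = R^\top R$) for a hypothetical PD matrix:

```python
# Cholesky factorization of a PD matrix: A = L L^T, L lower triangular.
# The example matrix is hypothetical; sqrt requires positive definiteness.
import math

def cholesky(A):
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)      # diagonal entry
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]     # below-diagonal entry
    return L

A = [[4.0, 2.0], [2.0, 3.0]]
L = cholesky(A)
assert L[0][0] == 2.0 and L[1][0] == 1.0

# reconstruct: A = L L^T
for i in range(2):
    for j in range(2):
        assert abs(sum(L[i][k] * L[j][k] for k in range(2)) - A[i][j]) < 1e-9
```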
- Singular Value Decomposition
- We’re looking for orthogonal matrices $U$ and $V$ s.t. $A = U \Sigma V^\top$, where $\Sigma$ is diagonal, with only positive elements along the diagonal (the “singular values”).
- $A = U \Sigma V^\top \iff AV = U \Sigma$ b/c $V^{-1} = V^\top$ for orthogonal matrices
- $U \in \mathbb{R}^{m \times m}$ b/c the left-most matrix in matrix mult. determines the num rows (it telescopes)
- $\Sigma \in \mathbb{R}^{m \times n}$ to match dims
- $V^\top \in \mathbb{R}^{n \times n}$ b/c the right-most matrix determines the num cols
- To find $U$ and $V$: $A^\top A = V \Sigma^\top \Sigma V^\top$ and $A A^\top = U \Sigma \Sigma^\top U^\top$
- Since $A^\top A$ and $A A^\top$ are sym., they each have a full set of orthogonal evecs (spectral theorem). Above is their diagonalization, where $\Sigma^\top \Sigma$ resp. $\Sigma \Sigma^\top$ corresponds to $\Lambda$
- Works b/c both $A^\top A$ and $A A^\top$ are PSD and have the same non-zero eigenvalues. $\Sigma$ is filled from the top left with the square roots of the shared eigenvalues of $A^\top A$ and $A A^\top$. By convention: descending order. Rest of the matrix is filled with zeros.
- $n > m$: $A^\top A$ is larger than $A A^\top$, so it has additional zero eigenvalues. Therefore, $V$ is larger than $U$. $\Sigma$ is wide.
- $m > n$: Reverse. $\Sigma$ is tall.
- Intuition: $V^\top$ tells you how much the input is rotated, $\Sigma$ how much it is stretched, and $U$ how much the output is rotated. Since $U$ and $V$ are orth., they don’t stretch at all.
- Change of Basis
- A vector is a series of numbers that each scale a basis vector. Therefore, the same vector looks different in different bases.
- $B$ and $C$: two bases
- $T$ = matrix w/ $B$’s basis vectors as cols, as expressed in $C$. Multiplying a vector in $B$-coordinates by $T$ therefore maps it to $C$-coordinates.
- $A$ = matrix of a lin. transformation in $C$. To apply it to a vector in $B$-coordinates, we must first convert to $C$ ($T$), then apply the transformation ($A$), and then convert back to $B$ ($T^{-1}$): $T^{-1} A T$
- $A$ and $T^{-1} A T$ are called similar
- Application: Image Compression
- Represent entire image or parts of image as vector or matrix, where each coordinate or row has a grayscale/color value.
- Change basis so that basis vectors capture larger parts of the image, for example all pixels black, half black/half white, and so on. We then discard the ones with low coefficients since they don’t contribute much.
- Common choices:
- Fourier matrix: You have a wave going across the entire vector, with different frequencies for each basis vector.
- Wavelet matrix: You have one wave cycle somewhere in the column vector. The crests have a couple different widths and locations.
- Left and Right Inverses; Pseudoinverse
- Goal: Always get the most relevant answer to $Ax = b$
- Tall matrix, full col. rank ($\operatorname{rank}(A) = n$)
- Does not collapse information in $x$: Converts vectors into a higher-dimension vectorspace; full column rank $\Rightarrow$ the subspace that the vectors inhabit has the same dim as their original vectorspace, so the transformation is injective. Therefore, $A$ should have a left inverse.
- $A^\top A$ is an $n \times n$ square matrix. Since $\operatorname{rank}(A^\top A) = \operatorname{rank}(A) = n$, it is full rank and therefore invertible.
- $A^+ = (A^\top A)^{-1} A^\top$ is a left inverse of $A$: $A^+ A = (A^\top A)^{-1} A^\top A = I$
- Same as projection: $A^+ b$ is the least-squares solution $\hat{x}$
- Intuition: The inverse must work for all points, not just ones on the hyperspace formed by $C(A)$. The inverse must therefore project along the nullspace of $A^\top$ onto the hyperspace.
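A quick numeric check of the left inverse on a hypothetical tall full-column-rank matrix:

```python
# Left inverse of a tall full-column-rank matrix: A^+ = (A^T A)^{-1} A^T,
# so A^+ A = I. The example matrix is hypothetical.

def transpose(M):
    return [list(r) for r in zip(*M)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3x2, full column rank
At = transpose(A)
A_plus = matmul(inv2(matmul(At, A)), At)   # left inverse (A^T A)^{-1} A^T

I = matmul(A_plus, A)                      # should be the 2x2 identity
assert all(abs(I[i][j] - (1.0 if i == j else 0.0)) < 1e-9
           for i in range(2) for j in range(2))
```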
- Wide matrix, full row rank ($\operatorname{rank}(A) = m$)
- $A^\top$ does not collapse information: converts vectors into a higher-dimension vectorspace; full row rank $\Rightarrow$ the subspace that the vectors inhabit has the same dim as their original vectorspace, so the transformation $A^\top$ is injective. Therefore, $A$ should have a right inverse.
- $A A^\top$ is an $m \times m$ square matrix. Since $\operatorname{rank}(A A^\top) = \operatorname{rank}(A) = m$, it is full rank and therefore invertible.
- $A^+ = A^\top (A A^\top)^{-1}$ is a right inverse of $A$: $A A^+ = A A^\top (A A^\top)^{-1} = I$
- $A^+ b$ is the min norm solution to $Ax = b$
- Any solution splits as $x = x_r + n$, where $x_r \in C(A^\top)$ and $n \in N(A)$, b/c they’re orth. complements
- $x_r$ is the min-norm solution b/c $\|x\|^2 = \|x_r\|^2 + \|n\|^2$ (Pythagoras)
- $A^+ b = A^\top (A A^\top)^{-1} b = A^\top y$ for some $y$ ($A^+ b \in C(A^\top)$)
- $A A^+ b = b$, so $A^+ b$ is a solution in the row space $\Rightarrow$ it is $x_r$
- $A^+$ is the same for a wide matrix as transposing it into a tall matrix, taking the pseudoinverse, and then transposing again: $A^+ = ((A^\top)^+)^\top$.
- For any matrix $A$: $A^+ = R^+ C^+$
- Use the CR-decomposition: $A = CR$
- $C$ is a tall matrix w/ full col. rank. $R$ is a wide matrix w/ full row rank
- $A^+ = R^+ C^+ = R^\top (R R^\top)^{-1} (C^\top C)^{-1} C^\top$ ($R R^\top$ and $C^\top C$ invertible $\Rightarrow$ both bijections)
- $A^+ b$ is a least squares sol.:
- A least squares sol. satisfies the normal equation $A^\top A \hat{x} = A^\top b$
- $A^\top A A^+ b = R^\top C^\top C R R^+ C^+ b$ ($R R^+ = I$, $R^+$ right inverse) $= R^\top C^\top C (C^\top C)^{-1} C^\top b$ (def. $C^+$) $= R^\top C^\top b = A^\top b$ (def. transpose)
- $A^+ b$ is the min norm solution:
- We know from wide matrices: min norm $\iff$ solution in the row space
- $A^+ b = R^+ C^+ b = R^\top z$ for some $z$ $\Rightarrow A^+ b \in C(R^\top) = C(A^\top)$ ($R$ and $A$ have the same row space)
- Properties of the Pseudoinverse
- $A A^+ A = A$ and $A^+ A A^+ = A^+$
- ($R^+$ = right inverse of $R$ and $C^+$ = left inverse of $C$; e.g. $A A^+ A = C R R^+ C^+ C R = CR = A$)
- .