# Practical Linear Algebra for Data Science - Cohen M.

![rw-book-cover|200x400](https://readwise-assets.s3.amazonaws.com/media/uploaded_book_covers/profile_40759/f8e12188-3975-4a7b-b31a-3c619a53d9d5.png)

## Metadata

- Author: **Cohen M.**
- Full Title: Practical Linear Algebra for Data Science
- Category: #books
- Tags: #ai

## Highlights

- Linear algebra has an interesting history in mathematics, dating back to the 17th century in the West and much earlier in China. Matrices—the spreadsheets of numbers at the heart of linear algebra—were used to provide a compact notation for storing sets of numbers like geometric coordinates (this was Descartes’s original use of matrices) and systems of equations (pioneered by Gauss). (Location 63)
- Modern linear algebra is computational, whereas traditional linear algebra is abstract. (Location 68)
- A proof in mathematics is a sequence of statements showing that a set of assumptions leads to a logical conclusion. Proofs are unquestionably important in pure mathematics. (Location 112)
- To be clear: intuition from code is no substitute for a rigorous mathematical proof. The point is that “soft proofs” allow you to understand mathematical concepts without having to worry about the details of abstract mathematical syntax and arguments. This is particularly advantageous to coders who lack an advanced mathematics background. (Location 129)
- In abstract linear algebra, vectors may contain other mathematical objects, including functions. (Location 167)
- Here’s an important rule: transposing twice returns the vector to its original orientation. In other words, $\mathbf{v}^{\mathrm{TT}} = \mathbf{v}$. That may seem obvious and trivial, but it is the keystone of several important proofs in data science and machine learning, including creating symmetric covariance matrices as the data matrix times its transpose. (Location 320)
- The magnitude of a vector—also called the geometric length or the norm—is the distance from tail to head of a vector, and is computed using the standard Euclidean distance formula: the square root of the sum of squared vector elements. (Location 340)
- $\|\mathbf{v}\| = \sqrt{\sum_{i=1}^{n} v_i^2}$ (Location 343)
- In mathematics, the dimensionality of a vector is the number of elements in that vector, while the length is a geometric distance; in Python, the function len() (where len is short for length) returns the dimensionality of an array, while the function np.linalg.norm() returns the geometric length (magnitude). (Location 346)
- Creating an associated unit vector is easy; you simply scalar multiply by the reciprocal of the vector norm. (Location 361)
- Orthogonal vectors have a dot product of zero (that claim goes both ways—when the dot product is zero, the two vectors are orthogonal). So, the following statements are equivalent: two vectors are orthogonal; two vectors have a dot product of zero; two vectors meet at a 90° angle. (Location 439)
- The outer product is quite different from the dot product: it produces a matrix instead of a scalar, and the two vectors in an outer product can have different dimensionalities, whereas the two vectors in a dot product must have the same dimensionality. (Location 464)
- Gram-Schmidt procedure (Location 486)
- We break apart one mathematical object into a combination of other objects. The details of the decomposition depend on our constraints (in this case, orthogonal and parallel to a reference vector), which means that different constraints (that is, different goals of the analysis) can lead to different decompositions of the same vector. (Location 528)
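A couple of the highlights above translate directly into NumPy. For the norm, len(), and unit-vector notes (Locations 340–361), here is a minimal sketch; the vector `v` is a made-up example, not from the book:

```python
import numpy as np

v = np.array([3.0, 4.0])   # made-up example vector

# Dimensionality vs. geometric length (magnitude):
print(len(v))              # 2   -> number of elements
print(np.linalg.norm(v))   # 5.0 -> sqrt(3**2 + 4**2)

# Unit vector: scalar multiply by the reciprocal of the norm.
v_unit = v / np.linalg.norm(v)
print(np.linalg.norm(v_unit))  # 1.0
```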
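And for the decomposition note just above (Location 528), a sketch of splitting a vector into pieces parallel and orthogonal to a reference vector; the helper `decompose` and all values are my own illustration, not the book's code:

```python
import numpy as np

def decompose(v, r):
    """Split v into components parallel and orthogonal to reference r."""
    v_par = (np.dot(v, r) / np.dot(r, r)) * r  # projection of v onto r
    v_perp = v - v_par                         # the leftover, orthogonal part
    return v_par, v_perp

v = np.array([4.0, 1.0])
r = np.array([2.0, 0.0])
v_par, v_perp = decompose(v, r)

print(v_par, v_perp)                   # [4. 0.] [0. 1.]
print(np.dot(v_perp, r))               # 0.0: the orthogonality constraint holds
print(np.allclose(v_par + v_perp, v))  # True: the two pieces rebuild v
```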
- Linear weighted combination simply means scalar-vector multiplication and addition: take some set of vectors, multiply each vector by a scalar, and add them to produce a single vector. (Location 593)
- In dimension-reduction procedures such as principal components analysis, each component (sometimes called factor or mode) is derived as a linear weighted combination of the data channels, with the weights (the coefficients) selected to maximize the variance of the component. (Location 621)
- Independence is a property of a set of vectors. That is, a set of vectors can be linearly independent or linearly dependent. (Location 647)
- In fact, many problems in applied linear algebra can be conceptualized as finding the best set of basis vectors to describe some subspace. (Location 733)
- The basis needs to span a subspace for it to be used as a basis for that subspace, because you cannot describe something that you cannot measure. (Location 746)
- Linear weighted combination means to scalar multiply and add vectors in a set. (Location 765)
- A set of vectors is linearly dependent if a vector in the set can be expressed as a linear weighted combination of other vectors in the set. And the set is linearly independent if there is no such linear weighted combination. (Location 766)
- A basis is a ruler for measuring a space. A vector set can be a basis for a subspace if it (1) spans that subspace and (2) is linearly independent. (Location 768)
- A correlation coefficient is a single number that quantifies the linear relationship between two variables. (Location 801)
- Kernels are carefully constructed to optimize certain criteria, such as smooth fluctuations, sharp edges, particular waveform shapes, and so on. (Location 840)
- “Shifting” a matrix has two primary (extremely important!) applications: it is the mechanism of finding the eigenvalues of a matrix, and it is the mechanism of regularizing matrices when fitting models to data. (Location 1058)
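A few of the highlights above also lend themselves to short sketches. First, linear weighted combination (Locations 593 and 765): scale each vector in a set by its own scalar, then add. The vectors and weights below are made-up examples, not from the book:

```python
import numpy as np

# Made-up vectors and weights for illustration.
v1 = np.array([1.0, 0.0, 2.0])
v2 = np.array([0.0, 1.0, 1.0])
v3 = np.array([3.0, 1.0, 0.0])
w1, w2, w3 = 4.0, -2.0, 0.5  # one scalar weight per vector

# Linear weighted combination: scale each vector, then add.
combo = w1 * v1 + w2 * v2 + w3 * v3
print(combo)  # [ 5.5 -1.5  6. ]
```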
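Next, the correlation coefficient (Location 801): a single number quantifying a linear relationship. The simulated variables here are my own invented data:

```python
import numpy as np

# Two made-up variables with a linear relationship plus noise.
rng = np.random.default_rng(0)
x = rng.standard_normal(100)
y = 2 * x + rng.standard_normal(100)

r = np.corrcoef(x, y)[0, 1]  # single number in [-1, 1]
print(r)
```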
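Finally, “shifting” a matrix (Location 1058) means adding a scaled identity matrix, $\mathbf{A} + \lambda \mathbf{I}$. This sketch, with a made-up singular matrix and shift amount, shows why shifting works as a regularizer:

```python
import numpy as np

# A made-up rank-deficient (singular) matrix and a small shift amount.
A = np.array([[1.0, 1.0],
              [1.0, 1.0]])
lam = 0.01

# "Shifting": add a scaled identity matrix; only the diagonal changes.
A_shifted = A + lam * np.eye(A.shape[0])

print(np.linalg.matrix_rank(A))          # 1: A is singular
print(np.linalg.matrix_rank(A_shifted))  # 2: the shifted matrix is full rank
```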