# Deep Learning With Python - François Chollet

## Metadata
- Author: **François Chollet**
- Full Title: Deep Learning With Python
- Category: #books
## Highlights
- https://github.com/fchollet/deep-learning-with-python-notebooks (Location 294)
- Concisely, AI can be described as the effort to automate intellectual tasks normally performed by humans. (Location 347)
- Could a general-purpose computer “originate” anything, or would it always be bound to dully execute processes we humans fully understand? Could it ever be capable of any original thought? Could it learn from experience? Could it show creativity? (Location 366)
- Machine learning turns this around: the machine looks at the input data and the corresponding answers, and figures out what the rules should be (Location 375)
- Therefore, the central problem in machine learning and deep learning is to meaningfully transform data: in other words, to learn useful representations of the input data at hand—representations that get us closer to the expected output. (Location 401)
- At its core, it’s a different way to look at data—to represent or encode data. For instance, a color image can be encoded in the RGB format (red-green-blue) or in the HSV format (hue-saturation-value): these are two different representations of the same data. (Location 403)
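The RGB/HSV example above is easy to reproduce. A minimal sketch in Python using the standard-library `colorsys` module (my choice for illustration; not from the book):

```python
import colorsys

# The same color, encoded two different ways.
# colorsys works on floats in [0, 1], so scale 8-bit RGB values first.
r, g, b = 255 / 255, 128 / 255, 0 / 255    # a pure orange in RGB

h, s, v = colorsys.rgb_to_hsv(r, g, b)     # the same orange in HSV
print(f"RGB: ({r:.2f}, {g:.2f}, {b:.2f})")
print(f"HSV: ({h:.2f}, {s:.2f}, {v:.2f})")

# Some questions are easier in one representation than the other:
# "is this pixel roughly orange?" becomes a simple range check on the hue channel.
print("orange-ish:", 0.05 < h < 0.12)
```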
- Decision trees learned from data began to receive significant research interest in the 2000s, and by 2010 they were often preferred to kernel methods. (Location 615)
- Tags: #ai
- 1. Draw a batch of training samples, x, and corresponding targets, y_true.
  2. Run the model on x to obtain predictions, y_pred (this is called the forward pass).
  3. Compute the loss of the model on the batch, a measure of the mismatch between y_pred and y_true.
  4. Compute the gradient of the loss with regard to the model’s parameters (this is called the backward pass).
  5. Move the parameters a little in the opposite direction from the gradient—for example, W -= learning_rate * gradient—thus reducing the loss on the batch a bit. The learning rate (learning_rate here) would be a scalar factor modulating the “speed” of the gradient descent process. (Location 1478)
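As a concrete sketch of that loop, here is a single training step written with TensorFlow's `GradientTape` (the framework the book uses); the tiny model, loss, and random batch are placeholders I added so the snippet runs on its own:

```python
import tensorflow as tf

# Toy setup so the step below runs on its own: a tiny linear model and MSE loss.
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
loss_fn = tf.keras.losses.MeanSquaredError()
learning_rate = 1e-3

def training_step(x, y_true):
    # Steps 2-3: forward pass and loss computation, recorded on a gradient tape.
    with tf.GradientTape() as tape:
        y_pred = model(x)
        loss = loss_fn(y_true, y_pred)
    # Step 4: backward pass -- gradient of the loss w.r.t. each parameter.
    gradients = tape.gradient(loss, model.trainable_weights)
    # Step 5: move each parameter a little against its gradient.
    for w, g in zip(model.trainable_weights, gradients):
        w.assign_sub(learning_rate * g)
    return loss

# Step 1: draw a batch (random placeholder data, just to exercise the step).
x = tf.random.normal((32, 4))
y_true = tf.random.normal((32, 1))
print(float(training_step(x, y_true)))
```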
- Then the chain rule states that grad(y, x) == grad(y, x1) * grad(x1, x). This enables you to compute the derivative of fg as long as you know the derivatives of f and g. The chain rule is named as it is because when you add more intermediate functions, it starts looking like a chain: (Location 1543)
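The identity is easy to check numerically. A small sketch in plain Python using central finite differences (the functions `f` and `g` are arbitrary ones picked for illustration):

```python
import math

def g(x):   return x ** 2          # an arbitrary inner function
def f(x1):  return math.sin(x1)    # an arbitrary outer function
def fg(x):  return f(g(x))         # the composition y = fg(x) = f(g(x))

def grad(fn, x, eps=1e-6):
    """Numerical derivative of fn at x (central difference)."""
    return (fn(x + eps) - fn(x - eps)) / (2 * eps)

x = 2.0
x1 = g(x)
lhs = grad(fg, x)               # grad(y, x), computed directly on the composition
rhs = grad(f, x1) * grad(g, x)  # grad(y, x1) * grad(x1, x), per the chain rule
print(lhs, rhs)                 # both are approximately cos(4) * 4, about -2.61
```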
- Computation graphs have been an extremely successful abstraction in computer science because they enable us to treat computation as data: a computable expression is encoded as a machine-readable data structure that can be used as the input or output of another program. (Location 1554)
- Tags: #computer-science #ai
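As a small illustration of "computation as data": a toy expression graph encoded as ordinary Python tuples, which another piece of code can then walk and evaluate. This is my own sketch, not the book's implementation:

```python
# A tiny computation graph: each node is a tuple (op, inputs).
# The expression encoded here is (x + y) * 2.
graph = ("mul", [("add", ["x", "y"]), ("const", [2])])

def evaluate(node, env):
    """Walk the graph -- which is just data -- and compute its value."""
    if isinstance(node, str):           # a named input, looked up in env
        return env[node]
    op, inputs = node
    if op == "const":
        return inputs[0]
    values = [evaluate(i, env) for i in inputs]
    if op == "add":
        return values[0] + values[1]
    if op == "mul":
        return values[0] * values[1]
    raise ValueError(f"unknown op: {op}")

# Because the graph is a data structure, other programs can inspect,
# transform, or (as here) evaluate it.
print(evaluate(graph, {"x": 3, "y": 4}))   # (3 + 4) * 2 = 14
```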