# [2011.11152] Understanding and Scheduling Weight Decay - cs

![rw-book-cover|200x400](https://readwise-assets.s3.amazonaws.com/static/images/article4.6bc1851654a0.png)

## Metadata
- Author: **cs**
- Full Title: [2011.11152] Understanding and Scheduling Weight Decay
- Category: #articles
- URL: https://arxiv.org/abs/2011.11152

## Highlights
- Weight decay is a popular and even necessary regularization technique for training deep neural networks that generalize well. Previous work usually interpreted weight decay as a Gaussian prior from the Bayesian perspective. However, weight decay sometimes shows mysterious behaviors beyond the conventional understanding. For example, the optimal weight decay value tends to be zero given long enough training time. Moreover, existing work typically failed to recognize the importance of scheduling weight decay during training.
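
As a reading aid for the "Gaussian prior" interpretation mentioned in the highlight, the standard textbook argument can be sketched as follows. The symbols here are assumptions introduced for illustration and are not taken from the highlight: $\theta$ for the weights, $\mathcal{D}$ for the data, $L(\theta) = -\log p(\mathcal{D} \mid \theta)$ for the training loss, $\sigma^2$ for the prior variance, $\lambda$ for the weight decay coefficient, and $\eta$ for the learning rate. With a zero-mean isotropic Gaussian prior $\theta \sim \mathcal{N}(0, \sigma^2 I)$, the MAP estimate is an $L_2$-penalized loss minimizer:

$$
\theta_{\mathrm{MAP}}
= \arg\max_{\theta} \big[ \log p(\mathcal{D} \mid \theta) + \log p(\theta) \big]
= \arg\min_{\theta} \Big[ L(\theta) + \frac{\lambda}{2} \lVert \theta \rVert_2^2 \Big],
\qquad \lambda = \frac{1}{\sigma^2}.
$$

For plain gradient descent (no momentum, no adaptive scaling), a step on this penalized objective is the same as shrinking the weights before the usual update, which is where the name "weight decay" comes from:

$$
\theta_{t+1}
= \theta_t - \eta \big( \nabla L(\theta_t) + \lambda \theta_t \big)
= (1 - \eta \lambda)\, \theta_t - \eta \nabla L(\theta_t).
$$

This equivalence is only the conventional view under the stated assumptions; it generally does not hold for adaptive optimizers, and it is not a description of the scheduling method the paper proposes.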