Tokenization - nlp.stanford.edu

![rw-book-cover|200x400](https://readwise-assets.s3.amazonaws.com/static/images/article2.74d541386bbf.png)

## Metadata

- Author: **nlp.stanford.edu**
- Full Title: Tokenization
- Category: #articles
- Tags: #ai
- URL: https://nlp.stanford.edu/IR-book/html/htmledition/tokenization-1.html

## Highlights

- Given a character sequence and a defined document unit, tokenization is the task of chopping it up into pieces, called tokens, perhaps at the same time throwing away certain characters, such as punctuation.
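The highlighted definition can be sketched in code. The following is a minimal illustration (not from the article): it chops a character sequence into pieces on whitespace and throws away surrounding punctuation characters; the `tokenize` function name and the punctuation set are assumptions for this sketch, not part of the source.

```python
def tokenize(text: str) -> list[str]:
    """Chop a character sequence into tokens, discarding punctuation.

    A deliberately simple whitespace tokenizer: real tokenizers also
    handle apostrophes, hyphens, numbers, and language-specific rules.
    """
    punctuation = ".,;:!?\"'()[]"  # assumed set of characters to strip
    tokens = []
    for piece in text.split():
        # Strip punctuation from both ends; drop pieces that were
        # nothing but punctuation.
        token = piece.strip(punctuation)
        if token:
            tokens.append(token)
    return tokens


print(tokenize("Friends, Romans, Countrymen, lend me your ears;"))
# → ['Friends', 'Romans', 'Countrymen', 'lend', 'me', 'your', 'ears']
```

Even this tiny example shows the design decision the article alludes to: punctuation is discarded rather than kept as separate tokens, which is one of several reasonable choices.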