Tokenization - nlp.stanford.edu

## Metadata
- Author: **nlp.stanford.edu**
- Full Title: Tokenization
- Category: #articles
- Tags: #ai
- URL: https://nlp.stanford.edu/IR-book/html/htmledition/tokenization-1.html
## Highlights
- Given a character sequence and a defined document unit, tokenization is the task of chopping it up into pieces, called tokens, perhaps at the same time throwing away certain characters, such as punctuation.
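
As a rough illustration of the definition above, the following minimal sketch chops a character sequence into tokens and throws away punctuation. The function name `tokenize` and the regex-based rule are assumptions made for this example, not a method prescribed by the source; real tokenizers need more careful handling of apostrophes, hyphens, and language-specific conventions.

```python
import re


def tokenize(text: str) -> list[str]:
    """Split text into tokens, discarding punctuation and whitespace.

    Hypothetical helper for illustration: a token is taken to be any
    maximal run of word characters (letters, digits, underscore).
    """
    # Everything that is not a word character acts as a separator
    # and is dropped from the output.
    return re.findall(r"\w+", text)


tokens = tokenize("Friends, Romans, Countrymen, lend me your ears;")
print(tokens)
# ['Friends', 'Romans', 'Countrymen', 'lend', 'me', 'your', 'ears']
```

Note that this simple rule also splits on apostrophes, so `tokenize("O'Neill")` yields `['O', 'Neill']`; whether that is the desired behavior depends on the application.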