# Metadata
Source URL:: https://nlp.stanford.edu/IR-book/html/htmledition/tokenization-1.html
Topics:: #ai
---
# Tokenization
## Highlights
> [!quote]+ Updated on 171022_194950
>
> Given a character sequence and a defined document unit, tokenization is the task of chopping it up into pieces, called tokens, perhaps at the same time throwing away certain characters, such as punctuation.
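
As an illustration of the definition above, here is a minimal sketch of such a tokenizer in Python. The function name `tokenize` and the particular punctuation set are assumptions for illustration, not from the source; the sample sentence is the "Friends, Romans, Countrymen" input used in the IR book's tokenization chapter.

```python
def tokenize(text: str) -> list[str]:
    """Chop a character sequence into tokens, throwing away punctuation.

    A naive sketch of the quoted definition; real tokenizers handle
    hyphens, apostrophes, numbers, etc. far more carefully.
    """
    tokens = []
    for piece in text.split():                 # chop on whitespace
        token = piece.strip(".,;:!?\"'()[]")   # drop edge punctuation
        if token:                              # skip punctuation-only pieces
            tokens.append(token)
    return tokens

print(tokenize("Friends, Romans, Countrymen, lend me your ears;"))
# ['Friends', 'Romans', 'Countrymen', 'lend', 'me', 'your', 'ears']
```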