# Metadata

Source URL:: https://nlp.stanford.edu/IR-book/html/htmledition/tokenization-1.html
Topics:: #ai

---

# Tokenization

## Highlights

> [!quote]+ Updated on 171022_194950
>
> Given a character sequence and a defined document unit, tokenization is the task of chopping it up into pieces, called tokens, perhaps at the same time throwing away certain characters, such as punctuation.
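The highlighted definition can be sketched as a minimal tokenizer. This is an illustrative assumption, not the book's implementation: lowercasing and a regex word-split stand in for the many language-specific decisions real tokenizers make.

```python
import re

def tokenize(text):
    """Chop a character sequence into tokens, throwing away punctuation.

    Minimal sketch: \\w+ keeps runs of word characters and drops
    everything else (commas, semicolons, whitespace).
    """
    return re.findall(r"\w+", text.lower())

print(tokenize("Friends, Romans, Countrymen, lend me your ears;"))
# prints ['friends', 'romans', 'countrymen', 'lend', 'me', 'your', 'ears']
```

Note that even this toy version makes normalization choices (lowercasing, dropping all punctuation) that the chapter goes on to treat as separate, nontrivial decisions.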