What Does Tokenization Mean?
Tokenization is the act of breaking up a stream of text into pieces such as words, keywords, phrases, symbols and other elements called tokens. Tokens can be individual words, phrases or even whole sentences. During tokenization, some characters, such as punctuation marks, may be discarded. The tokens then become the input for another process, such as parsing or text mining.
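To make this concrete, here is a minimal sketch of such a tokenizer in Python, assuming a simple heuristic that keeps runs of word characters and discards punctuation; the function name simple_tokenize and the sample sentence are illustrative, not part of the original definition:

    import re

    def simple_tokenize(text):
        # Extract runs of word characters (letters, digits, underscore).
        # Punctuation marks fall between matches and are discarded,
        # mirroring the behavior described above.
        return re.findall(r"\w+", text)

    tokens = simple_tokenize("The tokens become the input for parsing, text mining, etc.")
    print(tokens)
    # ['The', 'tokens', 'become', 'the', 'input', 'for', 'parsing', 'text', 'mining', 'etc']

A real tokenizer for parsing or text mining would typically handle more cases (contractions, hyphenated words, sentence boundaries), but the core idea is the same: split on boundaries and drop characters that carry no token value.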
Techopedia Explains Tokenization
Tokenization relies mostly on simple heuristics to separate tokens, following a few steps: