
References

(2024 May) Nearest Neighbor Speculative Decoding for LLM Generation and Attribution

(2024 Dec) Byte Latent Transformer: Patches Scale Better Than Tokens

  • Group bytes into patches, with patch boundaries driven by a small byte-level LM (a minimal sketch follows this list)
  • Patches play a role similar to the chunks in the dynamic chunking paper below
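
A minimal sketch of the patching idea: a new patch opens where the next byte is hard to predict. The `threshold` value and the `next_byte_probs` interface are assumptions for illustration, not the paper's actual API:

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a next-byte distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def group_bytes_into_patches(byte_seq, next_byte_probs, threshold=2.0):
    """Open a new patch whenever the byte LM's next-byte entropy spikes.

    byte_seq: the raw bytes as a list of ints.
    next_byte_probs: next_byte_probs[i] is the LM's distribution over
        byte_seq[i + 1] given the prefix byte_seq[:i + 1].
    threshold: illustrative entropy cutoff; the real one is tuned.
    """
    patches, current = [], [byte_seq[0]]
    for nxt, probs in zip(byte_seq[1:], next_byte_probs):
        if entropy(probs) > threshold:  # hard-to-predict byte -> boundary
            patches.append(current)
            current = []
        current.append(nxt)
    patches.append(current)
    return patches
```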

(2025 Mar) SuperBPE: Space Travel for Language Models

(2025 Jul) Dynamic Chunking for End-to-End Hierarchical Sequence Modeling

  • Dynamically segments the input tokens into chunks, with each chunk pooled into a single representation, based on the similarity between adjacent tokens
  • Keeps soft segmentation "probabilities" for the chunk boundaries, so that gradients can flow through the otherwise discrete chunking operations
  • Expands back to the original number of tokens in the final layers by copying each chunk's representation into the positions of its tokens (a minimal sketch follows this list)
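
A minimal sketch of the chunk-merge-expand pipeline, assuming cosine similarity between adjacent tokens as the boundary signal and p-weighted mean pooling within chunks (both are illustrative choices, not necessarily the paper's):

```python
import torch
import torch.nn.functional as F

def chunk_and_pool(hidden, threshold=0.5):
    """hidden: (seq_len, dim) token states -> (num_chunks, dim) chunk states.

    Returns the pooled chunks, the soft boundary probabilities p (kept so
    gradients can flow through the segmentation), and each token's chunk id
    (kept so we can expand back to seq_len positions later).
    """
    # Low similarity between neighbors -> likely chunk boundary.
    sim = F.cosine_similarity(hidden[1:], hidden[:-1], dim=-1)
    p = torch.cat([torch.ones(1), (1.0 - sim) / 2.0])  # p[0] = 1: always a boundary

    boundary = p > threshold
    chunk_id = torch.cumsum(boundary.long(), dim=0) - 1  # chunk index per token

    # Mean-pool each chunk, weighting tokens by p so the (discrete)
    # segmentation still passes gradients to the boundary scores.
    num_chunks = int(chunk_id.max()) + 1
    chunks = torch.zeros(num_chunks, hidden.size(-1))
    chunks.index_add_(0, chunk_id, hidden * p.unsqueeze(-1))
    counts = torch.bincount(chunk_id, minlength=num_chunks).clamp(min=1)
    return chunks / counts.unsqueeze(-1), p, chunk_id

def expand(chunks, chunk_id):
    """Upsample in the final layers: copy each chunk's representation
    back to every token position it covers."""
    return chunks[chunk_id]
```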

(2024 Feb) Break the Sequential Dependency of LLM Inference Using Lookahead Decoding

https://github.com/hao-ai-lab/LookaheadDecoding

  • To read (only skimmed so far)
  • Good GIF animation -> need to make something similar
  • Similar to CD-LM in that the current token is matched against a pool of n-grams to select candidate continuations (a minimal sketch of the n-gram pool follows this list)
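
A minimal sketch of the n-gram pool side of the method (the Jacobi-style lookahead branch that actually generates the n-grams is omitted; the names and verification interface are illustrative):

```python
from collections import defaultdict

class NGramPool:
    """N-grams harvested from lookahead windows, keyed by their first token."""

    def __init__(self, n=4):
        self.n = n
        self.pool = defaultdict(set)

    def add(self, tokens):
        """Harvest every n-gram in a decoded window into the pool."""
        for i in range(len(tokens) - self.n + 1):
            gram = tuple(tokens[i:i + self.n])
            self.pool[gram[0]].add(gram[1:])

    def candidates(self, last_token):
        """Continuations whose n-gram begins with the current last token."""
        return list(self.pool[last_token])

def verify(candidate, actual_next_tokens):
    """Accept the longest prefix of a speculated continuation that matches
    the tokens the model actually predicts (checked in one parallel pass)."""
    accepted = []
    for spec, actual in zip(candidate, actual_next_tokens):
        if spec != actual:
            break
        accepted.append(spec)
    return accepted

# Usage: after each decoding step, look up candidates(last_token), verify them
# in a single batched forward pass, and accept the longest matching prefix,
# skipping that many sequential decoding steps.
```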

Blog post (2025 Sep): https://x.com/linguist_cat/status/1971231846907498582

There is no such thing as a tokenizer-free lunch

https://huggingface.co/blog/catherinearnett/in-defense-of-tokenizers