
References

(2024 May) Nearest Neighbor Speculative Decoding for LLM Generation and Attribution

(2024 Dec) Byte Latent Transformer: Patches Scale Better Than Tokens

  • Group bytes into patches, with patch boundaries driven by a small byte-level LM (a minimal sketch follows this list)
  • Patches play a role similar to the chunks in the dynamic chunking paper below
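
A minimal sketch of the patching idea: a new patch opens where the next byte is hard to predict. The `threshold` value and the `next_byte_probs` interface are assumptions for illustration, not the paper's actual API:

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a next-byte distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def group_bytes_into_patches(byte_seq, next_byte_probs, threshold=2.0):
    """Open a new patch whenever the byte LM's next-byte entropy spikes.

    byte_seq: the raw bytes as a list of ints.
    next_byte_probs: next_byte_probs[i] is the LM's distribution over
        byte_seq[i + 1] given the prefix byte_seq[:i + 1].
    threshold: illustrative entropy cutoff; the real one is tuned.
    """
    patches, current = [], [byte_seq[0]]
    for nxt, probs in zip(byte_seq[1:], next_byte_probs):
        if entropy(probs) > threshold:  # hard-to-predict byte -> boundary
            patches.append(current)
            current = []
        current.append(nxt)
    patches.append(current)
    return patches
```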

(2025 Mar) SuperBPE: Space Travel for Language Models

(2025 Jul) Dynamic Chunking for End-to-End Hierarchical Sequence Modeling

  • Dynamically segments the input tokens into chunks, with each chunk pooled into a single representation, based on the similarity between adjacent tokens
  • Keeps soft segmentation "probabilities" for the chunk boundaries, so that gradients can flow through the otherwise discrete chunking operations
  • Expands back to the original number of tokens in the final layers by copying each chunk's representation into the positions of its tokens (a minimal sketch follows this list)
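
A minimal sketch of the chunk-merge-expand pipeline, assuming cosine similarity between adjacent tokens as the boundary signal and p-weighted mean pooling within chunks (both are illustrative choices, not necessarily the paper's):

```python
import torch
import torch.nn.functional as F

def chunk_and_pool(hidden, threshold=0.5):
    """hidden: (seq_len, dim) token states -> (num_chunks, dim) chunk states.

    Returns the pooled chunks, the soft boundary probabilities p (kept so
    gradients can flow through the segmentation), and each token's chunk id
    (kept so we can expand back to seq_len positions later).
    """
    # Low similarity between neighbors -> likely chunk boundary.
    sim = F.cosine_similarity(hidden[1:], hidden[:-1], dim=-1)
    p = torch.cat([torch.ones(1), (1.0 - sim) / 2.0])  # p[0] = 1: always a boundary

    boundary = p > threshold
    chunk_id = torch.cumsum(boundary.long(), dim=0) - 1  # chunk index per token

    # Mean-pool each chunk, weighting tokens by p so the (discrete)
    # segmentation still passes gradients to the boundary scores.
    num_chunks = int(chunk_id.max()) + 1
    chunks = torch.zeros(num_chunks, hidden.size(-1))
    chunks.index_add_(0, chunk_id, hidden * p.unsqueeze(-1))
    counts = torch.bincount(chunk_id, minlength=num_chunks).clamp(min=1)
    return chunks / counts.unsqueeze(-1), p, chunk_id

def expand(chunks, chunk_id):
    """Upsample in the final layers: copy each chunk's representation
    back to every token position it covers."""
    return chunks[chunk_id]
```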

(2024 Feb) Break the Sequential Dependency of LLM Inference Using Lookahead Decoding

https://github.com/hao-ai-lab/LookaheadDecoding

  • To read (only skimmed so far)
  • Good GIF animation -> need to make something similar
  • Similar to CD-LM in that the current token is matched against a pool of n-grams to select candidate continuations (a minimal sketch of the n-gram pool follows this list)
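
A minimal sketch of the n-gram pool side of the method (the Jacobi-style lookahead branch that actually generates the n-grams is omitted; the names and verification interface are illustrative):

```python
from collections import defaultdict

class NGramPool:
    """N-grams harvested from lookahead windows, keyed by their first token."""

    def __init__(self, n=4):
        self.n = n
        self.pool = defaultdict(set)

    def add(self, tokens):
        """Harvest every n-gram in a decoded window into the pool."""
        for i in range(len(tokens) - self.n + 1):
            gram = tuple(tokens[i:i + self.n])
            self.pool[gram[0]].add(gram[1:])

    def candidates(self, last_token):
        """Continuations whose n-gram begins with the current last token."""
        return list(self.pool[last_token])

def verify(candidate, actual_next_tokens):
    """Accept the longest prefix of a speculated continuation that matches
    the tokens the model actually predicts (checked in one parallel pass)."""
    accepted = []
    for spec, actual in zip(candidate, actual_next_tokens):
        if spec != actual:
            break
        accepted.append(spec)
    return accepted

# Usage: after each decoding step, look up candidates(last_token), verify them
# in a single batched forward pass, and accept the longest matching prefix,
# skipping that many sequential decoding steps.
```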

Blog post (2025 Sep): https://x.com/linguist_cat/status/1971231846907498582

There is no such thing as a tokenizer-free lunch

https://huggingface.co/blog/catherinearnett/in-defense-of-tokenizers