Show HN: Tinker with Meta's "tokenizer-free" patcher
3 points by lucalp | 0 comments | 5/21/2025, 2:25:38 PM | huggingface.co
If the future is that we move away from current tokenisation, I wanted to build intuition around one of the core contributions of Meta's Byte Latent Transformer: entropy-based patching.
What are its strengths and weaknesses? No better way to find out than tinkering with visualisations in an HF space, so I thought I'd share!
A few things emerge that you can try yourself:
1. robustness - high entropy means more compute gets dedicated to those bytes, which covers cases like low-resource languages, spelling tasks, etc.
2. compute efficiency
2a. low entropy means less compute spent on those bytes
2b. in-context learning applies to tokenisation! Earlier context induces low-entropy regions later in the sequence, so the model wastes less compute!
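To get a feel for the mechanism itself, here's a minimal sketch of the global-threshold variant of entropy-based patching from the BLT paper (the paper also describes a relative-change variant not shown here). It's only a sketch: next_byte_probs is a hypothetical stand-in for BLT's small byte-level entropy model, not its actual API.

    import math

    def next_byte_entropy(prefix, next_byte_probs):
        # Shannon entropy (in bits) of the predicted next-byte distribution.
        # next_byte_probs is a hypothetical stand-in for BLT's small
        # byte-level entropy model: it maps a byte prefix to a 256-way
        # probability distribution over the next byte.
        probs = next_byte_probs(prefix)
        return -sum(p * math.log2(p) for p in probs if p > 0.0)

    def entropy_patch(data, next_byte_probs, threshold=2.0):
        # Global-threshold patching: start a new patch at every byte
        # whose next-byte entropy exceeds the threshold.
        patches, start = [], 0
        for i in range(1, len(data)):
            if next_byte_entropy(data[:i], next_byte_probs) > threshold:
                patches.append(data[start:i])
                start = i
        patches.append(data[start:])
        return patches

Predictable spans (repetition, common words) stay under the threshold and collapse into long patches, while surprising bytes (rare words, low-resource scripts) cut many short patches - exactly the compute-allocation behaviour in points 1 and 2 above.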
I'm writing a blog post on an expanded version of this, updates via https://lucalp.dev or https://x.com/lucalp__