Show HN: Context Rot Technical Report – How Input Length Impacts LLM Performance (research.trychroma.com)

This has a lot of caveats and limitations. However, the model is available for download via a script in the repo, and the exact benchmarks I used are available as well. The white paper covers theory and application, and reveals a lot of limitations and interesting differences from transformers in training and prompting behavior. It also includes extensive appendices (over 100 pages) on the training datasets used and on performance on the ~260 (I think?) NIV2 tasks in its validation dataset.

Running inference for the DSRU model + BGE embedding model together takes a bit shy of 10GB of VRAM, and the reference comparison model -- Zephyr 7B -- takes about 15GB of VRAM.

Comments (4)

tripplyons · 1h ago

How does this model compare to just using a linear classifier trained on BGE embeddings?
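For context, the baseline in this question can be sketched as follows. This is a minimal illustration, not anything from the repo: it assumes a binary task, uses synthetic vectors as stand-ins for frozen BGE embeddings (no embedding model is loaded), and fits a logistic-regression classifier with plain gradient descent. The names `train_linear_classifier`, `predict`, and `fake_embedding` are hypothetical.

```python
import math
import random


def train_linear_classifier(embeddings, labels, epochs=200, lr=0.5):
    """Fit binary logistic regression (a linear classifier) by batch gradient descent."""
    dim = len(embeddings[0])
    n = len(embeddings)
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(embeddings, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
            err = p - y                     # gradient of the log loss w.r.t. z
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Return 1 if the linear score is positive, else 0."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z > 0 else 0


# Assumption: synthetic 8-dimensional vectors stand in for real sentence
# embeddings. In practice each vector would come from the embedding model.
rng = random.Random(0)


def fake_embedding(label, dim=8):
    center = 1.0 if label == 1 else -1.0
    return [center + rng.gauss(0.0, 0.3) for _ in range(dim)]


labels = [i % 2 for i in range(40)]
embeddings = [fake_embedding(y) for y in labels]

weights, bias = train_linear_classifier(embeddings, labels)
accuracy = sum(
    predict(weights, bias, x) == y for x, y in zip(embeddings, labels)
) / len(labels)
print(f"training accuracy: {accuracy:.2f}")
```

A classifier like this is trained per task on a fixed label set, so every new task needs its own labeled data and training run.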

throwawayffffas · 4h ago

Can I ask why you have a single model for all these tasks?

Wouldn't it be easier and more ergonomic for users to have a dedicated model for each of these tasks?

orderone_ai · 4h ago

Thank you for the question!

I would say that ease of use and deployment is actually a good reason to have a single model.

We don't train 20 LLMs for different purposes - we train one (or, I guess 3-4 in practice, each with their own broad specialization), and then prompt it for different tasks.

This simplifies deployment, integration, upgrading, etc.

This model is basically the same: instead of being restricted to single-task classification, it can be prompted for different tasks. This means that a user can complete new tasks using a new prompt, not a new model.

throwawayffffas · 3h ago

While I agree with the general reasoning, isn't it harder for the user to prompt the model correctly as opposed to selecting a specialized model that they wish to use?

That's the feeling I have when I try to use LLMs for more general language processing.

Have you run into cases where the model "forgets" the task at hand and switches to another one mid-stream?

Regardless of all of the above, it looks to me like your choice of doing reasoning and problem solving in the latent space is a great one, and it is where we should collectively be focusing our efforts. Keep up the good work.

Show HN: A reasoning model that infers over whole tasks in 1ms in latent space
