Looks like Meta just hit the llama of diminishing marginal returns. When even $72 billion can’t buy breakthrough progress, you know the curve’s gone flat.
RiDiracTid · 22m ago
It confirms my understanding that lab culture and researcher quality are incredibly important in training a good model, and while Meta seems to have the money to hire the latter, there's just something rotten that makes them unable to execute well.
This gwern comment seems to describe the situation well:
> FB is buying data from the same places everyone else does, like Scale (which we know from anecdotes like when Scale delivered FB a bunch of blatantly-ChatGPT-written 'human rating data' and FB was displeased), and was using datasets like books3 that are reasonable quality. The reported hardware efficiency numbers have never been impressive, they haven't really innovated in architecture or training method (even the co-distillation for Llama-4 is not new, eg. ERNIE was doing that like 3 years ago), and insider rumors/gossip don't indicate good things about the quality of the research culture. (It's a stark contrast to things like Jeff Dean overseeing a big overhaul to ensure bit-identical reproducibility of runs and Google apparently getting multi-datacenter training working by emphasizing TPU interconnect.) So my guess is that if it's bad, it's not any one single thing like 'we trained for too few tokens' or 'some of our purchased data was shite': it's just everything in the pipeline being a bit mediocre and it multiplying out to a bad end-product which is less than the sum of its parts.
> Remember Karpathy's warning: "neural nets want to work". You can screw things up and the neural nets will still work, they will just be 1% worse than they should be. If you don't have a research culture which is rigorous about methodology or where people just have good enough taste/intuition to always do the right thing, you'll settle for whatever seems to work... (Especially if you are not going above and beyond to ensure your metrics aren't fooling yourself.) Now have a 1% penalty on everything, from architecture to compute throughput to data quality to hyperparameters to debugging implementation issues, and you wind up with a model which is already obsolete on release with no place on the Pareto frontier and so gets 0% use.
https://www.lesswrong.com/posts/uPi2YppTEnzKG3nXD/nathan-hel...
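The compounding argument in that last paragraph is easy to sketch numerically. The stage names and the flat 1% figure below are illustrative assumptions, not anything from the quote; the point is just that small multiplicative penalties stack:

```python
# Illustrative sketch: small per-stage quality penalties compound
# multiplicatively across a training pipeline. Stage names and the
# 1% figure are hypothetical, chosen to mirror the quote's list.
stages = ["architecture", "compute throughput", "data quality",
          "hyperparameters", "implementation debugging"]
penalty = 0.01  # assumed 1% quality loss at each stage

quality = 1.0
for stage in stages:
    quality *= (1 - penalty)

print(f"relative quality after {len(stages)} stages: {quality:.3f}")
# 0.99^5 ≈ 0.951 — roughly a 5% deficit, which can be the difference
# between sitting on the Pareto frontier and being strictly dominated.
```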