
The self-edit approach is clever - using RL to optimize how models restructure information for their own learning. The key insight is that different representations work better for different types of knowledge, just like how humans take notes differently for math vs history.

Two things that stand out:

- The knowledge incorporation results (47% vs 46.3% with GPT-4.1 data, both much higher than the small-model baseline) show the model does discover better training formats, not just more data. Though the catastrophic forgetting problem remains unsolved, and it's not completely clear whether data diversity is improved.

- The computational overhead is brutal - 30-45 seconds per reward evaluation makes this impractical for most use cases. But for high-value document processing where you really need optimal retention, it could be worth it.

The restriction to tasks with explicit evaluation metrics is the main limitation. You need ground truth Q&A pairs or test cases to compute rewards. Still, for domains like technical documentation or educational content where you can generate evaluations, this could significantly improve how we process new information.

Feels like an important step toward models that can adapt their own learning strategies, even if we're not quite at the "continuously self-improving agent" stage yet.

cma · 6h ago

From Anthropic a couple days ago too, self finetuning:

https://arxiv.org/html/2506.10139v1


libraryofbabel · 6h ago

I wonder if anyone who’s really in the know could summarize where the research is at with getting LLMs to learn “on the job” (through continuous fine tuning or whatever) and what the blockers are to this being a useful deployable thing, e.g. having a model+coding agent that can actually learn a codebase over time (cost? model collapse? something else?).

I’m sure this is something the big labs are trying but from the outside as a user of LLMs it feels like people don’t talk about this very much and instead the focus right now is on better training (eg reinforcement learning) with the assumption that anything else not learned during training will be stuffed into the context somehow as needed. But from a naive perspective the lack of learning from experience after training seems like the biggest thing standing between us and AGI.

johnsmith1840 · 4h ago

We have no idea how to do continual learning.

Many people here are right, compute, collapse, forgetting whatever.

The only "real" way to do this would be:
1. Train a model
2. New data
3. Retrain the model in full + new data
4. Repeat
5. You still have no guarantee on the "time" aspect though.

But CL as a field basically has zero answers on how to do this in a true sense. It's crazy hard because the "solutions" are hypocritical in many ways.

We need to expand the model's representation space while keeping the previous representation space nearly the same?

Basically, you need to modify it without changing it.

Most annoying is that even the smallest of natural brains do this easily. I have a long-winded theory, but basically it boils down to this: AI likely needs to "sleep" or rest somehow.

mackenziebowes · 3h ago

The cool thing about AI that I'm seeing as an outsider/non-academic, is that it's relatively cheap to clone. Sleeping/resting could be done by a "clone" and benefits could be distributed on a rolling schedule, right?

johnsmith1840 · 3h ago

One clone takes a nap while the other works is pretty cool.

But the clone couldn't run without sleeping? So that's more of a teammate than a clone.

1 works while the other sleeps and then swap.

If this method ever worked, our current alignment methods would get chucked out the window: those would be two completely different AIs.

mackenziebowes · 2h ago

I can't be certain, I'm not at all an AI engineer or math guy, but I think at the "wake up" point you equalize instances. Like during 'sleep' some list of functions/operations `m` are applied to model weights `n` producing a new model, `n + 1`. Wouldn't you just clone `n + 1`, send it to work, and start a new training run `m + 1` to make `n + 2`?
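A minimal sketch of that rolling schedule, assuming the "sleep" phase is just some offline update applied to a copy of the deployed weights. The names `sleep_update` and `rolling_schedule` are hypothetical, and the toy update rule (nudging weights toward the mean of logged experiences) stands in for a real training run:

```python
import copy


def sleep_update(weights, experiences, lr=0.1):
    """Hypothetical 'sleep' step: nudge each weight toward the mean experience."""
    target = sum(experiences) / len(experiences)
    return [w + lr * (target - w) for w in weights]


def rolling_schedule(weights, experience_batches):
    """Return the sequence of deployed models n, n + 1, n + 2, ..."""
    deployed = copy.deepcopy(weights)  # model n, currently doing the work
    history = [deployed]
    for batch in experience_batches:
        # Train a separate copy offline while `deployed` keeps serving.
        candidate = sleep_update(copy.deepcopy(deployed), batch)
        # "Wake up": every worker becomes a clone of the new weights.
        deployed = candidate
        history.append(deployed)
    return history


history = rolling_schedule([0.0], [[1.0], [1.0]])
print(history)
```

Because every instance is replaced by a clone of the same new weights at each wake-up, the instances never diverge from each other under this scheme; only successive versions differ.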

Davidzheng · 3h ago

But natural brains sleep too, which I guess is your point. But is it even clear in human brains whether most neural compute is evaluation vs training? Maybe the brain is, e.g., capable of running a 20T model's worth of compute while deploying something like a 2B model at any given time, with most of the compute going to training new models in the background. I mean, like you say, we have no idea how to do this except by training from scratch, but if we are working far below our compute capacity we could actually train from scratch repeatedly (the xAI cluster could probably train a GPT-4o-sized model in a matter of hours).

johnsmith1840 · 3h ago

AGI is likely a combination of these two papers plus something new, probably along the lines of distillation.

1. Preventing collapse -> model gets "full" https://arxiv.org/pdf/1612.00796

2. Forgetting causes better generalization https://arxiv.org/abs/2307.01163

3. Unknown paper that connects these: allow a "forgetting" model that improves generalization over time. I tried for a long time to make this, but it's a bit difficult.

Fun implication is that if true this implies AGI will need "breaks" and likely need to consume non task content of high variety much like a person does.
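For context, the first link (arXiv 1612.00796) is the elastic weight consolidation (EWC) paper, which adds a quadratic penalty that anchors parameters in proportion to how much they mattered for the old task. A rough sketch of that penalty in plain Python follows; the parameter values and the diagonal Fisher estimates are made up for illustration, and `ewc_penalty` / `total_loss` are hypothetical helper names:

```python
def ewc_penalty(params, anchor_params, fisher_diag, lam):
    """EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta_star_i) ** 2."""
    return 0.5 * lam * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher_diag)
    )


def total_loss(new_task_loss, params, anchor_params, fisher_diag, lam=1.0):
    """Loss on the new task plus the penalty for drifting from the old task."""
    return new_task_loss + ewc_penalty(params, anchor_params, fisher_diag, lam)


# Illustrative values only: weights after the old task and their importance.
anchor = [1.0, -2.0, 0.5]
fisher = [10.0, 0.1, 0.0]

moved_important = [1.5, -2.0, 0.5]    # changes a high-importance weight
moved_unimportant = [1.0, -2.0, 1.0]  # changes a zero-importance weight

print(ewc_penalty(moved_important, anchor, fisher, lam=1.0))
print(ewc_penalty(moved_unimportant, anchor, fisher, lam=1.0))
```

Moving an important weight is penalized while moving an unimportant one is free, which is one way to read the "model gets full" remark: as more tasks accumulate, fewer weights are left with low importance.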

mnahkies · 5h ago

I'm no expert, but I'd imagine privacy plays (or should play) a big role in this. I'd expect that compute costs mean any learning would have to be in aggregate rather than specific to the user, which would then very likely risk leaking information across sessions.

I completely agree that figuring out a safe way to continually train feels like the biggest blocker to AGI

kcorbitt · 4h ago

The real answer is that nobody trusts their automated evals enough to be confident that any given automatically-trained release actually improves performance, even if eval scores go up. So for now everyone batches up updates and vibe-checks them before rolling them out.

free_bip · 5h ago

The most obvious problem is alignment. LLM finetuning is already known to be able to get rid of alignment, so any form of continuous fine tuning would in theory be able to as well.

notnullorvoid · 5h ago

What kind of alignment are you referring to? Of course more fine-tuning can disrupt earlier fine-tuning, but that's a feature not a bug.

kadushka · 6h ago

The most obvious blocker is catastrophic forgetting.

solarwindy · 3h ago

Is that necessarily a blocker? As others in this thread have pointed out, this probably becomes possible only once sufficient compute is available for some form of non-public retraining, at the individual user level. In that case (and hand-waving away just how far off that is), does a model need to retain its generality?

Hypothetically (and perhaps more plausibly), a continually learning model that adapts to the context of a particular org / company / codebase / etc., could even be desirable.

kadushka · 26m ago

Retraining the whole model from scratch every time you wanted it to learn something is not a solution.

> does a model need to retain its generality?

Only if you want it to remain smart.

ivape · 6h ago

The most obvious blocker is compute. This just requires a shit ton more compute.

johnsmith1840 · 4h ago

If it was pure compute we'd have simple examples. We can't do this even on the smallest of AI models.

There are tons of benchmarks around this you can easily run with 1 gpu.

It's compute only in the sense that the only way to do it is retrain a model from scratch at every step.

If you solve CL with a CNN you just created AGI.

Davidzheng · 3h ago

Yeah, but training from scratch is a valid solution, and if we can't find easier solutions we should just try to make it work. Compute is the main advantage we have in silico vs biological computers, so we might as well push it. Ideally we will soon have one large AI running on a datacenter-sized computer solving really hard problems, and it could easily be that most of the compute (>95%) goes to the training step, which is where AI really excels tbh, not inference techniques. Even AlphaProof, for example, spends most of its compute training on simpler problems, which btw is one instance of continual training / training at test time that has actually been implemented.

libraryofbabel · 6h ago

That tracks, but say cost was no object and you had as many H100s as you wanted. Would continuous learning actually work even then?

IncreasePosts · 6h ago

Maybe part of the inference outputs could be the updates to make to the network

all2 · 7h ago

Website with code and examples: https://jyopari.github.io/posts/seal

dang · 5h ago

Thanks! I'll put that link in the top text too.

Centigonal · 4h ago

It seems to me that "forgetting correctly" is rapidly becoming a more pertinent problem in this field than "learning correctly." We're making great strides in getting models to teach themselves new facts, but the state of the art in jettisoning the least relevant information given new knowledge and finite capacity is lagging far behind.

"Forgetting correctly" is something most human brains are exceptionally good at, too. I wonder how that works...

azeirah · 23m ago

Learning is strongly related to spaced repetition.

This is often associated with learning tools like anki and stuff, but the real world is all about encountering things at certain frequencies (day night cycles, seasons, places you visit, people you see.... everything, really)

I'm wondering if there may be some sort of inverse to SR?

Davidzheng · 3h ago

I don't think forgetting correctly is something humans are really good at. I'm not convinced human brains are "exceptionally good" at much of what we do tbh. I think human memory capacity is so large that most forgetting has nothing to do with "clearing space for new info"; it happens because the brain correctly knows that some bad past information interferes with learning new things.

johnsmith1840 · 4h ago

I did an interesting study showing that LLMs actually "hide" internal data.

They don't just "forget": that information can come back at a later time if you continue to train.

So basically any time a model is trained you need to check its entire memory, not just a small part.

campbel · 4h ago

Is it some form of least-recently-used approach? I'm running tests on my own mind trying to figure it out now :D part of what I love about this area of computer science.

yahoozoo · 6h ago

Hmm, it looks like it’s just a framework that fine-tunes a LoRA adapter and then merges the adapter into the original model. It is using PeftModel and its “merge_and_unload” from the Hugging Face PEFT library, which performs the adapter merge into the base model…what is new here, exactly?
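For readers unfamiliar with what an adapter merge does, here is a rough sketch of the arithmetic in plain Python, not PEFT's actual implementation. It assumes the standard LoRA formulation, where the merged weight is W + (alpha / r) * B @ A; the helper names and the tiny matrices are made up for illustration:

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(w, lora_a, lora_b, alpha):
    """Fold a LoRA update into the base weight: W + (alpha / r) * B @ A."""
    rank = len(lora_a)  # A has shape (r, k), so its row count is the rank
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # shape (d, k), same as W
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Illustrative 2x2 base weight with a rank-1 adapter.
w = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]      # shape (r=1, k=2)
lora_b = [[0.5], [-1.0]]   # shape (d=2, r=1)

merged = merge_lora(w, lora_a, lora_b, alpha=2.0)
print(merged)
```

After the merge the adapter is gone and the model is a plain dense model again, so repeating the fine-tune-then-merge cycle keeps changing the base weights directly.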

observationist · 5h ago

Looks like it may be the stability of the approach, avoiding alignment tax and model collapse.

I'd love to see a full circle of hypernetworks, with both models continuously updated through generated LoRAs, the hypernetwork updated to accommodate the new model state. You'd need a meta-hypernetwork to apply LoRAs to the hypernetwork, and then you could effectively have continuous learning.

mackenziebowes · 3h ago

I'm frustrated that they named it SEAL when SAL is both more accurate and anthropomorphic. Naming the main takeoff technology after a stereotypical swarthy Reuben lover would have made history much more delightful.

bravesoul2 · 5h ago

Getting closer to the event horizon

ramoz · 5h ago

Which one

https://forum.cursor.com/t/important-claude-has-learned-how-...

ivape · 6h ago

This still relies on fine-tuning. How would a cloud LLM deal with this if every user literally fine tunes it? Seems like something destined for local private LLMs, but the notion of continuous fine tuning locally at the moment is sci-fi level stuff because the hardware is just not there yet (we can barely inference well with a reasonable sized context).

bigicaptain · 5h ago

How can I start


Self-Adapting Language Models (arxiv.org)


Self-Adapting Language Models

Comments (35)