Alignment is not free: How model upgrades can silence your confidence signals

91 points | karinemellata | 37 comments | 5/6/2025, 11:22:49 PM | variance.co

Comments (37)

Centigonal · 11h ago
Very interesting! The one thing I don't understand is how the author made the jump from "we lost the confidence signal in the move to 4.1-mini" to "this is because of the alignment/steerability improvements."

Previous OpenAI models were instruct-tuned or otherwise aligned, and the author even mentions that model distillation might be destroying the entropy signal. How did they pinpoint alignment as the cause?

mlin4589 · 11h ago
Good question! We do know from OpenAI's GPT-4 system card that the post-trained RLHF model is significantly less calibrated than the pre-trained model, so the speculation is that something similar is happening here. It's more of a hunch than anything, though. I would be curious whether it's possible to reproduce this behavior, or to measure the impact of distillation on calibration (rough sketch of the kind of entropy signal involved below).

Disclaimer: I wrote this blog post.
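For anyone curious what that kind of entropy-based confidence signal can look like, here is a minimal sketch that turns top-k token logprobs (as returned by APIs that expose `logprobs` / `top_logprobs`) into a per-token entropy and averages it over a completion. The numbers and names here are hypothetical, and this is illustrative rather than necessarily the exact signal from the post:

```python
import math

def token_entropy(top_logprobs: dict[str, float]) -> float:
    """Shannon entropy (nats) over the returned top-k token alternatives.

    This is only an approximation: the full vocabulary distribution is
    truncated to the top-k entries the API exposes, so we renormalise.
    """
    probs = [math.exp(lp) for lp in top_logprobs.values()]
    total = sum(probs)
    return -sum((p / total) * math.log(p / total) for p in probs)

def mean_entropy(per_token_top_logprobs: list[dict[str, float]]) -> float:
    """Average per-token entropy over a completion; lower = more confident."""
    return sum(token_entropy(t) for t in per_token_top_logprobs) / len(per_token_top_logprobs)

# Hypothetical top-5 logprobs for two generated tokens.
example = [
    {"yes": -0.05, "Yes": -3.2, "no": -4.5, "No": -5.0, "maybe": -6.1},
    {".": -0.01, "!": -4.8, ",": -5.5, "?": -6.0, "\n": -6.3},
]
print(mean_entropy(example))  # small value -> the model was fairly sure
```

If post-training flattens or sharpens these distributions wholesale, a threshold tuned on one model version can silently stop meaning anything on the next.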

Workaccount2 · 1h ago
Wouldn't it be something if AI parlance crept into common parlance...
itchyjunk · 2h ago
Could you please elaborate what less or more calibrated means here? Thanks!
Scene_Cast2 · 2h ago
For binary labels: you take a slice of labeled data and compare the mean of the model's predicted probabilities on that slice to the mean of the labels. If they differ, the model is miscalibrated on that slice (sketch below). In practice, "less calibrated" is often used loosely as a synonym for "loss is worse / could be better".

Not sure if that's what the GP meant; I've only worked with binary-label stuff.
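A tiny sketch of that check on made-up numbers, plus the bucketed version that reliability diagrams are built from (all data here is hypothetical):

```python
import numpy as np

labels = np.array([1, 0, 1, 1, 0, 0, 1, 0])                    # ground truth (toy)
preds  = np.array([0.9, 0.4, 0.8, 0.7, 0.3, 0.5, 0.95, 0.2])   # model probabilities (toy)

# A well-calibrated model matches the base rate on (any slice of) the data.
print("mean label:     ", labels.mean())   # 0.5
print("mean prediction:", preds.mean())    # ~0.59 -> slightly overconfident on this slice

# Finer-grained view: bucket the predictions and compare per-bucket means.
bins = np.digitize(preds, [0.25, 0.5, 0.75])
for b in range(4):
    mask = bins == b
    if mask.any():
        print(f"bucket {b}: mean pred {preds[mask].mean():.2f}, mean label {labels[mask].mean():.2f}")
```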

behnamoh · 12h ago
there's evidence that alignment also significantly reduces model creativity: https://arxiv.org/abs/2406.05587

it's similar to humans. when restricted in terms of what they can or cannot say, they become more conservative and can't really express all sorts of ideas.

Alex_001 · 11h ago
That paper is a great pointer — the creativity vs. alignment trade-off feels a lot like the "risk-aversion" effect in humans under censorship or heavy supervision. It makes me wonder: as we push models to be more aligned, are we inherently narrowing their output distribution to safer, more average responses?

And if so, where’s the balance? Could we someday see dual-mode models — one for safety-critical tasks, and another more "raw" mode for creative or exploratory use, gated by context or user trust levels?

gamman · 4h ago
Maybe this maps to some human structures that manage the control-creativity tradeoff through hierarchy?

I feel that companies with top-down management would have more agency and perhaps creativity towards (but not at) the top, and the implementation would be delegated to bottom layers with increasing levels of specification and restriction.

If this translates, we might have multiple layers with varied specialization and control, and hopefully some feedback mechanisms about feasibility.

Since some hierarchies are familiar to us from real life, we might prefer those to start with.

It can be hard to find humans who are very creative but also able to integrate consistently and reliably (in a domain). Maybe a model doing both well would also be hard to build compared to stacking a few different ones on top of each other with delegation.

I know it's already being done by dividing tasks between multiple steps and models / contexts in order to improve efficiency, but having explicit strong differences of creativity between layers sounds new to me.

pjc50 · 3h ago
In humans this corresponds to "psychological safety": https://en.wikipedia.org/wiki/Psychological_safety

> is the belief that one will not be punished or humiliated for speaking up with ideas, questions, concerns, or mistakes

Maybe you can do that, but not on a model you're exposing to customers or the public internet.

jsnider3 · 51m ago
That comparison isn't very optimistic for AI safety. We want AI to do good things because they are good people, not because they are afraid being bad will get them punished. Especially since AI will very quickly be too powerful for us to punish.
pjc50 · 41m ago
> We want AI to do good things because they are good people

"Good" is at least as much of a difficult question to define as "truth", and genAI completely skipped all analysis of truth in favor of statistical plausibility. Meanwhile there's no difficulty in "punishment": the operating company can be held liable, through its officers, and ultimately if it proves too anti-social we simply turn off the datacentre.

jsnider3 · 24m ago
> Meanwhile there's no difficulty in "punishment": the operating company can be held liable, through its officers, and ultimately if it proves too anti-social we simply turn off the datacentre.

Punishing big companies that obviously and massively hurt people is something we already struggle with, and there are plenty of computer viruses that have outlived their creators.

malfist · 11h ago
How are you defining "creativity" in context with a statistical model?
hansvm · 10h ago
> defined as syntactic and semantic diversity
malfist · 1h ago
That's not creativity, that's entropy.

It would make sense that fine-tuning and alignment reduce diversity in the responses; that's the goal (sketch of such diversity measures below).
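For what it's worth, the "syntactic and semantic diversity" the paper measures can be approximated with very simple statistics over sampled responses. Here is a rough sketch of two common proxies, distinct-n and token entropy; the sample strings are toy data, not from the paper:

```python
import math
from collections import Counter

def distinct_n(responses: list[str], n: int = 2) -> float:
    """Fraction of n-grams that are unique across all sampled responses."""
    ngrams = []
    for r in responses:
        toks = r.split()
        ngrams += [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]
    return len(set(ngrams)) / max(len(ngrams), 1)

def token_entropy(responses: list[str]) -> float:
    """Shannon entropy (nats) of the pooled token frequency distribution."""
    counts = Counter(tok for r in responses for tok in r.split())
    total = sum(counts.values())
    return -sum(c / total * math.log(c / total) for c in counts.values())

# A model that keeps producing near-identical phrasings scores low on both.
samples = ["the cat sat on the mat", "the cat sat on the rug", "a dog slept by the door"]
print(distinct_n(samples), token_entropy(samples))
```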

exe34 · 6h ago
> it's similar to humans. when restricted in terms of what they can or cannot say, they become more conservative and can't really express all sorts of ideas.

This reminds me of the time when I was a child and my parents decreed that all communications would henceforth happen in English. I became selectively mute: I responded yes/no and volunteered no further information. The decree lasted about a week.

andai · 5h ago
What did you use to communicate before that? Were you fluent in English?
exe34 · 3h ago
No, it was a local creole. And no, I was learning it at school.
qwertytyyuu · 53m ago
People use LLMs as part of their high-precision systems? That's worrying.
sega_sai · 3h ago
Can we have models also return a probability reflecting how accurate the statements they made are?
jsnider3 · 49m ago
You can ask a model to give you probability estimates of its confidence, but to my knowledge none of the frontier models were trained to be good at giving such estimates.
cyanydeez · 3h ago
Sure, but then you need probability stats on the probability stats.
sega_sai · 2h ago
I am not sure what you mean. The idea is that the network should return the text plus a confidence expressed as a probability, and during training the log-score of that confidence should be optimized. (I'm not sure it would actually work given how the training is structured, but something like this would be useful; rough sketch below.)
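To make "optimize the log-score" concrete, here is a minimal PyTorch-style sketch of the loss you could attach to a per-answer confidence output. All names and numbers are hypothetical, and whether this interacts well with the rest of LLM training is exactly the open question:

```python
import torch
import torch.nn.functional as F

# Suppose the model emits, alongside each answer, a scalar "confidence" logit.
# After generation we check the answer against ground truth: correct in {0, 1}.
confidence_logits = torch.tensor([2.0, -1.0, 0.5])   # hypothetical per-answer logits
correct = torch.tensor([1.0, 0.0, 0.0])              # was each answer actually right?

# The log-score (negative log-likelihood of the correctness label) is just
# binary cross-entropy. It's a proper scoring rule: it is minimised only by
# reporting the true probability of being correct.
loss = F.binary_cross_entropy_with_logits(confidence_logits, correct)
print(loss)

# At inference time the reported confidence is sigmoid(confidence_logit).
print(torch.sigmoid(confidence_logits))
```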
redman25 · 1h ago
It's not that simple: how would the model know when it knows? Removing hallucination has to be a post-training thing, because you first need to test the model against what it actually knows in order to provide training examples of what it knows, what it doesn't, and how to respond in those circumstances.
erwin-co · 8h ago
Why not make a completely raw uncensored LLM? Seems it would be more "intelligent".
khafra · 8h ago
"LLM whisperer" folks will confidently claim that base models are substantially smarter than fine-tuned chat models; with qualitative differences in capabilities. But you have to be an LLM whisperer to get useful work out of a base model, since they're not SFT'ed, RLHF'ed, or RLAIF'ed into actually wanting to help you.
andai · 5h ago
How can I learn more about this?

Is it like in the early GPT-3 days, when you had to give it a bunch of examples and hope it catches the pattern?

im3w1l · 3h ago
Back in those days I would either create a little scene with a knowledgeable person and someone with a question, or start writing a monologue and generate a continuation for it (example below).
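Roughly, with a base (completion-only) model the prompt itself has to set up a document that the answer you want would naturally continue. Something in this spirit, where the model name and API call are placeholders for whatever text-completion endpoint or local model you use:

```python
# The "little scene" trick: write the opening of a dialogue that a helpful
# answer would plausibly continue, then let the base model complete it.
prompt = """The following is a transcript of a conversation between a curious
student and a patient systems programmer.

Student: Why does my program crash when two threads write to the same list?
Programmer:"""

# Placeholder call: substitute your own completion API or local model here.
# completion = client.completions.create(model="some-base-model", prompt=prompt,
#                                        max_tokens=200, stop=["Student:"])
# print(completion.choices[0].text)
```

The stop sequence on "Student:" keeps the model from continuing the scene past the one answer you wanted.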
qwertytyyuu · 51m ago
Before RLHF it was much harder to use; remember the difference between GPT-3 and ChatGPT. The fine-tuning for chat made it easier to use.

msp26 · 7h ago
Brand safety. Journalists would write articles about the models being 'dangerous'.
teruakohatu · 8h ago
In theory that sounds great, but most LLM providers are trying to produce useful models that ultimately will be widely used and make them money.

A model that is more correct but swears and insults the user won't sell. Likewise a model that gives criminal advice is likely to open the company up to lawsuits in certain countries.

A raw LLM might perform better on a benchmark but it will not sell well.

andai · 5h ago
Disgusted by ChatGPT's flattery and willingness to go along with my half-baked nonsense, I created an anti-ChatGPT, which is unfriendly and pushes back on nonsense as hard as possible.

All my friends hate it, except one guy. I used it for a few days, but it was exhausting.

I figured out the actual use cases I was using it for, and created specialized personas that work better for each one. (Project planning, debugging mental models, etc.)

I now mostly use a "softer" persona that's prompted to point out cognitive distortions. At some point I realized I'd built a therapist. Hahaha.

alganet · 8h ago
What kinds of contents do you want them to produce that they currently do not?
simion314 · 6h ago
>What kinds of contents do you want them to produce that they currently do not?

OpenAI models refuse to translate or do any transformation on some traditional, popular stories because of violence; the story in question was about a bad wolf eating some young goats that did not listen to their mother's advice.

So now try to give me a prompt that works with any text and convinces the AI that it's OK in fiction to have violence, or bad guys/animals that get punished.

Now I am also wondering if it censors the Bible, where a supposedly good God kills young children with ugly illnesses to punish the adults, or whether they made exceptions for this book.

Mountain_Skies · 3h ago
>alignment

Amazing how this Orwellian spin on propaganda has been so quickly embraced.

qwertytyyuu · 49m ago
It's supposed to mean getting the AI to share our values so it doesn't do things we don't like in pursuit of what we tell it to do. Not necessarily political alignment.
rusk · 6h ago
Upgrade scripts it is, so. Plus ça change.