Possibly a Serious Possibility

140 points by samclemens · 5/5/2025, 7:11:33 PM · kucharski.substack.com · 58 comments

Comments (58)

chaps · 3h ago
The City of Chicago's lawyers went the opposite direction in response to @tpacek's affidavit that the release of table/column names would have "marginal value" to an attacker. The city latched onto that to get a trial that eventually went to the IL Supreme Court and lost.

    [I]n my affidavit, I wrote that SQL schemas would provide “only marginal value” to an attacker. Big mistake. Chicago jumped on those words and said “see, you yourself agree that a schema is of some value to an attacker.” Of course, I don’t really believe that; “only marginal value” is just self-important message-board hedging. I also claimed on the stand that “only an incompetently built application” could be attacked with nothing but its schema. Even I don’t know what I meant by that.
His post: https://sockpuppet.org/blog/2025/02/09/fixing-illinois-foia/

My post: https://mchap.io/losing-a-5yr-long-illinois-foia-lawsuit-for...
snitty · 1h ago
>The City of Chicago's lawyers went the opposite direction

Not really.

>I wrote that SQL schemas would provide “only marginal value” to an attacker. Big mistake. Chicago jumped on those words and said “see, you yourself agree that a schema is of some value to an attacker.”

The City of Chicago's argument was that something of ANY value, no matter how insignificant, would help an attacker exploit their system, and was therefore possible to keep secret under the FOIA law.

fc417fc802 · 1h ago
Such a literal interpretation isn't reasonable. There are all sorts of patterns that can be indirectly leaked through supposedly unrelated data. Yet FOIA exists and is obviously intended to be useful.

So obviously there must be some threshold for the value to an attacker. He attempted to communicate that schemas are clearly below such a threshold and they used his wording to attempt to argue the opposite.

chaps · 9m ago
Yes really. Our argument, upheld by a judge, was that there was no value to an attacker. Their point stands legally, but nothing else.

Despite all that, Chicago still pushes back aggressively. Here's a fun one from a recent denial letter they sent for data within the same database:

    "When DOF referred to reviewing over 300 variable CANVAS pages, these are not analog sequential book style pages of data. Instead, they are 300 different webpages with unique file layouts for which there is no designated first page."
This is after I requested every field reflected within the 300 different pages, because it would be unduly burdensome to go through them individually. I'm waiting for the city's response about the TOP page, rather than the FIRST page. It's asinine that we have to do this in order to understand how these systems can blindly ruin the lives of many.

They also argued the same 7(1)(g) exemption despite me being explicit about not wanting the column names.

https://www.documentcloud.org/documents/25930500-foia-burden...

https://www.documentcloud.org/documents/25930501-foia-burden...

numpad0 · 46m ago
> “only marginal value” to an attacker

> “see, you yourself agree that a schema is of some value to an attacker.”

IANAL, but it appears justice systems universally interpret this type of "technically yes, if that makes you happy, but honestly unlikely" statement as a "yes, with technical bonus", not a "no with extra steps" at all. It has to be shortened to just "unlikely, from my professional perspective", or something similarly lawyer-approved, for the intended effect. Courts are weird.

mcphage · 1h ago
> The City of Chicago's argument was that something of ANY value, no matter how insignificant, would help an attacker exploit their system, and was therefore possible to keep secret under the FOIA law.

I’m glad that argument lost, since it totally subverts the purpose and intention of the FOIA. Any piece of information could be of value to some attacker, but that doesn’t outweigh the need for transparency.

hlieberman · 3h ago
It’s not just the UK that has standardized this language; the U.S. intelligence community also has a list of required terminology to use for different confidence levels and different likelihoods, and for distinguishing between the two. It’s all laid out in ICD-203, publicly available at https://www.dni.gov/files/documents/ICD/ICD-203.pdf

I’ve found it very helpful, in the same vein as RFC 2119 terminology (MUST, SHOULD, MAY, etc.), when you need your meaning to be understood by a counterparty and you can agree on a common language to use.
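
For reference, the ICD-203 yardstick maps terms to probability ranges roughly as follows (transcribed from memory of the 2015 revision; verify against the PDF above before relying on it):

    # Approximate ICD-203 likelihood terms -> probability ranges
    # (from memory; check the linked ICD-203 PDF for the authoritative table)
    ICD_203_LIKELIHOOD = {
        "almost no chance / remote":               (0.01, 0.05),
        "very unlikely / highly improbable":       (0.05, 0.20),
        "unlikely / improbable":                   (0.20, 0.45),
        "roughly even chance / roughly even odds": (0.45, 0.55),
        "likely / probable":                       (0.55, 0.80),
        "very likely / highly probable":           (0.80, 0.95),
        "almost certain / nearly certain":         (0.95, 0.99),
    }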

bo1024 · 8m ago
Interesting. This terminology really makes no sense without more shared context, in my view. For example, I would not describe something that happens to me every month as a "remote possibility". Yet for a 3% chance event, repeated every day, monthly occurrences are what we expect. Similarly, someone who describes events as "nearly certain" would surely be embarrassed when one of the first 20 fails to happen, no?
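
To put rough numbers on those two examples (a quick Python sanity check; the 3% figure and the 20 observations are from the comment above, while reading "nearly certain" as 95% and a month as 30 days are my assumptions):

    # A 3% "remote possibility" event, repeated daily: chance of at
    # least one occurrence in a 30-day month.
    p_daily = 0.03
    print(f"{1 - (1 - p_daily) ** 30:.0%}")  # ~60%: monthly occurrences are expected

    # A 95% "nearly certain" event, observed 20 times: chance that at
    # least one of the first 20 fails to happen.
    p_event = 0.95
    print(f"{1 - p_event ** 20:.0%}")  # ~64%: an early miss is actually likely
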
senderista · 2h ago
I was so frustrated when I tried to get doctors to quantify their assessment of risk for a surgery my sister was about to undergo. They simply wouldn't give me a number, not even "better or worse than even odds". Finally an anesthesiologist privately told me she thought my sister had maybe a one-third chance of dying on the table and that was enough for me. I'm not sure how much fear of liability had to do with this reluctance, or if it was just a general aversion to discussing risk in quantitative terms (which isn't that hard, gamblers do it all the time!).
chychiu · 1h ago
Doctor here

1. It’s generally difficult to quantify such risks in any meaningful manner

2. Providing any number adds liability, and puts you in a damned-if-it-does, damned-if-it-doesn’t-work-out situation

3. The operating surgeon is not the best to quantify these risks - the surgeon owns the operation, and the anaesthesiologist owns the patient / theatre

4. Gamblers quantify risk because they make money from accurate assessment of risk. Doctors are in no way incentivised to do so

5. The returned chance of 1/3 probably had an error margin of +/-33% itself

fc417fc802 · 1h ago
> It’s generally difficult to quantify such risks in any meaningful manner

According to the literature, 33 out of 100 patients who underwent this operation in the US within the past 10 years died. 90% of those had complicating factors. You [ do / do not ] have such a factor.

Who knows if any given layman will appreciate the particular quantification you provide but I'm fairly certain that data exists for the vast majority of serious procedures at this point.

I've actually had this exact issue with the veterinarian. I've worked in biomed. I pulled the literature for the condition. I had lots of different numbers but I knew that I didn't have the full picture. I'm trying to quantify the possible outcomes between different options being presented to me. When I asked the specialist, who handles multiple such cases every day, I got back (approximately) "oh I couldn't say" and "it varies". The latter is obviously true but the entire attitude is just uncooperative bullshit.

> puts you in a damned-if-it-does, damned-if-it-doesn’t-work-out situation

Not really. Don't get me wrong, I understand that a litigious person could use just about anything to go after you and so I appreciate that it might be sensible to simply refuse to answer. But from an academic standpoint the future outcome of a single sample does not change the rigor of your risk assessment.

> Doctors are in no way incentivised to do so

Don't they use quantifications of risk to determine treatment plans to at least some extent? What's the alternative? Blindly following a flowchart? (Honest question.)

> The returned chance of 1/3 probably had an error margin of +/-33% itself

What do you mean by this? Surely there's some error margin on the assessment itself but I don't see how any of us commenting could have any idea what it might have been.

christiangenco · 2h ago
I've had the same sort of difficulty with phrases like "most" or "almost all" or "hardly any"; I crave having these map to unambiguous numbers, like the probability yardstick referenced in this article.

I spun up a quick survey[1] that I sent out to friends and family to try to get some numbers on these sorts of phrases. Results so far are inconclusive.

1. https://www.vaguequantifiers.com/

SAI_Peregrinus · 1h ago
"Almost all" is an interesting one, because it has family of mathematical definitions in addition to any informal definitions. If X is a set, "almost all elements of X" means "all elements of X except those in a negligible subset of X", where "negligible" depends on context but is well-defined.

If there's a finite subset of an infinite set, almost all members of the infinite set are not in the finite set. E.g. Almost all integers are not 5: the set of integers equal to five is finite and the set of integers not equal to five is countably infinite.

Likewise for two infinite sets of different size: Almost all real numbers are not integers.

Etc.
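
In symbols, the measure-theoretic version (a minimal paraphrase, not a quote from any particular textbook):

    % P holds for almost all x in X (with respect to a measure mu)
    % iff the set of exceptions has measure zero:
    P \text{ holds for almost all } x \in X
        \iff \mu\bigl(\{\, x \in X : \neg P(x) \,\}\bigr) = 0

    % Example: the rationals have Lebesgue measure zero in the reals,
    % so almost all real numbers are irrational.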

mannykannot · 1h ago
The more precisely they are defined, the less frequently will you see them used correctly.
jbaber · 2h ago
"Almost all" in math can mean "except at every integer or fraction" :)
tejtm · 1h ago
I would expect almost NO numbers are rational (integer or fraction), given the infinite number of reals between each pair.
noqc · 56m ago
In between any two real numbers there is a rational number, and vice versa.
JadeNB · 1h ago
> I would expect almost NO numbers are rational (integer or fraction) with an infinite number of Reals between each.

You're right (technically correct, which is the best etc.)! That is why "almost all" can mean everything except rational numbers.

layer8 · 2h ago
The semantics are almost always reasonable: https://en.wikipedia.org/wiki/Almost_all
dullcrisp · 1h ago
Sure but that’s because 100% of real numbers, by any standard measure, aren’t integers or fractions. It bothers me if it’s used to mean 95% of something though.
JadeNB · 1h ago
> "Almost all" in math can mean "except at every integer or fraction" :)

I am a mathematician, but, even so, I think that this is one of those instances where we have to admit that we have mangled everyday terminology when appropriating it, and so non-measure theoretic users should just ignore our definition. (Similarly with "group," where, at the risk of sounding tongue-in-cheek because it's so obvious, if I were trying to analyze how people usually understand its everyday meaning I wouldn't include the requirement that every element have an inverse.)

tempestn · 44m ago
Interesting. Two things jumped out at me: 1) Why do the regions of the standardization line not overlap, or at least meet? 2) What's up with the small but clear minority of people who took all the 'unlikely' phrasings to mean somewhere in the realm of 90 to 100%? My guess would be that they're misreading the question, and that is their estimate of unlikelihood.
pictureofabear · 10m ago
Because many people cannot or will not accept ambiguity. Charitably, I suppose this comes from a desire to logically deduce risk by multiplying the severity of the consequences by the chance that something will happen. Uncharitably, it gives decisionmakers a scapegoat should they need one.
hunter2_ · 2h ago
"Rare" versus "common" is an interesting one. They sound like antonyms, but I don't think the typical probabilities are really symmetrical. Maybe something like 0%-10% for rare (although some sources say 5%) and something like 40%-100% for common.
konstantinua00 · 1h ago
"common" has such a large spread because meaning behind it is sort of "at least one in each sample", where that sample can be anything (graspable)

if you're a teacher and one student per class does the same thing - it's common. Even though it's only 1/25 or 1/30 of all students

Macha · 1h ago
Maybe it's the amount of video games I played in childhood that influenced this, but common and rare are just two points on a spectrum (with at least "uncommon" in between).
bmurray7jhu · 56m ago
Text of NIE 29-51 "Probability of an Invasion of Yugoslavia in 1951"

Partial HTML: https://history.state.gov/historicaldocuments/frus1951v04p2/...

Full text PDF scan: https://www.cia.gov/readingroom/docs/CIA-RDP79R01012A0007000...

Macha · 1h ago
Who are the people that have a small bump of believing "better than even" is 10-20%? Why?
tempestn · 33m ago
You also see the opposite bump for most of the negative assessments. My assumption is that they're likely reading the question backwards, i.e. "how unlikely" vs "how likely" or similar.
dejobaan · 3h ago
That was a good read (and short, with a cool graph—I want to know who tagged "Almost No Chance" as 95% likely; a would-be Pratchett fan, perhaps). In biz, that's part of why I like to separate out goals ("we'll focus on growing traffic") and concrete objectives ("25% audience growth between now and June 1st").
patrickmay · 2h ago
But is it EXACTLY a million to one chance?
forrestthewoods · 2h ago
I hate “one in a million” because its meaning depends on how many times you’re rolling the die!

I’ll never forget old World of Warcraft discussions about crit probability. If a particular sequence is “one in a million” and there are 10 million players and each player encounters hundreds or thousands of sequences per day, then “one in a million” is pretty effing common!
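
For scale, a back-of-the-envelope sketch (the exact attempt count is my illustrative pick from the "hundreds or thousands" range above):

    p_sequence = 1 / 1_000_000           # "one in a million" per attempt
    players = 10_000_000                 # stated player base
    attempts_per_player_per_day = 1_000  # illustrative, from "hundreds or thousands"

    expected_per_day = p_sequence * players * attempts_per_player_per_day
    print(expected_per_day)              # 10000.0 occurrences per day, population-wide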

gpcz · 2h ago
In functional safety, probabilities are usually clamped to an hour of use.
JadeNB · 1h ago
> I hate “one in a million” because its meaning depends on how many times you’re rolling the die!

I'd argue that it doesn't depend on that at all; its meaning is the same whether you're performing the trial once, a million times, ten million times, or whatever. It's just whether its implication is "the possibility may be disregarded" or "this should be expected to happen a few times" that depends on how many times you're performing the trial.

ModernMech · 1h ago
My feeling is it's a measure of the number of people who read the question wrong.
chipsrafferty · 40m ago
Why not just actually list the number you have in mind, so everyone's on the same page: "we consider it a serious possibility - about 60% - that bla bla bla"?
tasuki · 18m ago
Because then it doesn't happen and (dumb) people will say "see you were wrong".
SoftTalker · 2h ago
I have a habit of saying "almost definitely" which I have tried to break but I still fall back to it occasionally. And I know several people who will say something is "definitely possible" or "certainly a possibility" or something along those lines. It's all insecure language we use to avoid making a statement that might turn out to be wrong.
rekenaut · 2h ago
I often say "definitely possible" when I am not sure what the chance of something happening is but I ought to acknowledge that it is possible. It is definitely possible that I should choose better language to communicate this.
smitty1e · 1h ago
When they won't quit asking, I'm "willing to commit to a definite maybe".
didgetmaster · 1h ago
"The odds are more like a million to one!"

"So...you're telling me there is a chance!"

jMyles · 3h ago
> Since then, some governments have tried to clean up the language of probability. After the Iraq War—which was influenced by misinterpreted intelligence

While I laud the gracious application of Hanlon's Razor here, I also think that, for at least some actors, the imprecision was the feature they needed, rather than the bug they mistakenly implemented.

mempko · 3h ago
It's strange to map language to probability ranges. The guidance should be to just say the probability range. No ambiguity. Clear. Actionable and also measurable.
throwaway81523 · 3h ago
It's still a subjective estimate, but Samotsvety (the forecasting group) does seem to work that way, and HPMOR suggested something similar. Basically, assign probabilities to less complex unknowns using numbers pulled out of your butt if that's all you can do. Then you can compute conditional probabilities of various more complicated events using those priors. At least then, a consistent set of numbers has carried through the calculation, even if those numbers were wrong at the outset. It's supposed to help your mental clarity. I guess you can also perturb the initial numbers to guess something like a hyperdistribution at the other end.

I haven't tried this myself and haven't run across a situation to apply it to lately, but I thought it was interesting.
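
A minimal sketch of that workflow, with the event structure and every number invented purely for illustration:

    import random

    # Step 1: assign made-up priors to the simpler unknowns.
    p_a = 0.30              # P(A)
    p_b_given_a = 0.60      # P(B | A)
    p_b_given_not_a = 0.10  # P(B | not A)

    # Step 2: carry them through consistently (law of total probability).
    p_b = p_a * p_b_given_a + (1 - p_a) * p_b_given_not_a
    print(f"P(B) = {p_b:.2f}")

    # Step 3: perturb the initial guesses to see how sensitive the answer
    # is -- a crude stand-in for the "hyperdistribution" idea.
    samples = []
    for _ in range(10_000):
        a = min(max(random.gauss(p_a, 0.05), 0.0), 1.0)
        ba = min(max(random.gauss(p_b_given_a, 0.05), 0.0), 1.0)
        bna = min(max(random.gauss(p_b_given_not_a, 0.05), 0.0), 1.0)
        samples.append(a * ba + (1 - a) * bna)
    samples.sort()
    print(f"~90% of perturbed runs fall in [{samples[500]:.2f}, {samples[9500]:.2f}]")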

andrewflnr · 3h ago
> a consistent set of numbers has carried through the calculation, even if those numbers were wrong at the outset

I kind of see how this might be useful, but what I've actually seen is an illusion of certainty from looking at numbers and thinking that means you're being quantitative instead of, you know, pulling things out of your butt. Garbage in, garbage out still applies.

throwaway81523 · 3h ago
Yes, the potential illusion is most dangerous if you show someone else the numbers and they take them seriously. If they're only for your own calculations then you can remember what they are made of.
plorkyeran · 2h ago
In practice people seem to be very bad at remembering that. Pretty universally people act as though doing math on made up numbers makes them less erroneous rather than more.
chrisweekly · 2h ago
Yeah, mistaking precision for accuracy is a common fallacy.
JadeNB · 1h ago
> Yeah, mistaking precision for accuracy is a common fallacy.

I remember an appealing description of the difference being that a precise archer might sink all their arrows at the same spot on the edge of the target, whereas an accurate archer might sink all their arrows near the bull's eye without always hitting the same spot.

widforss · 2h ago
That's the whole point of Fermi estimates. Find a plausible number given uncertain inputs.
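
E.g. the classic "piano tuners in Chicago" Fermi estimate, where every input is a deliberately rough round number (all of them my own loose guesses):

    # Fermi estimate: how many piano tuners work in Chicago?
    population = 2_500_000
    households = population / 2.5    # ~2.5 people per household
    pianos = households * 0.05       # ~1 in 20 households has a piano
    tunings_per_year = pianos * 1    # ~1 tuning per piano per year
    tunings_per_tuner = 2 * 5 * 50   # 2/day, 5 days/week, 50 weeks/year
    print(round(tunings_per_year / tunings_per_tuner))  # ~100 tuners
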
Muromec · 3h ago
That's the other way around -- there was no probability range to begin with.
layer8 · 1h ago
How would you possibly measure the “Probability of an Invasion of Yugoslavia in 1951”, in March 1951?
konstantinua00 · 1h ago
9/12

3 months have passed, 9 to go :)

csours · 2h ago
Or use a histogram.
a3w · 1h ago
How was this not on lesswrong.com? They are all about ]0..1[
photochemsyn · 3h ago
This problem crops up everywhere, especially when it's a consequential claim. E.g., when the US Department of Energy says with 'low confidence' that the SARS-CoV-2 outbreak and pandemic was 'most likely' the result of a laboratory leak, what number does that translate to on the certainty scale?

Also, what likelihood can we assign to claims that the virus was deliberately modified at the furin cleavage site as part of a gain-of-function research program aimed at assessing the risks of species-jumping behavior in bat coronaviruses? This is a separate question from the lab-escape issue, which could have involved either a collected wild-type virus or one that had been experimentally modified.

Perhaps experts in the field 'misinterpreted the evidence' back in the early months of the pandemic, much as happened with the CIA and its 'intelligence on Iraq'?

https://interestingengineering.com/health/us-doe-says-covid-...

nightpool · 3h ago
I highly recommend that you read through https://www.astralcodexten.com/p/practically-a-book-review-r... and watch the underlying debate (starting here https://www.youtube.com/watch?v=Y1vaooTKHCM), it does a really good job of laying out the arguments for and against lab leak in a very thorough and evidence-based way like you're asking for here.
photochemsyn · 19m ago
"Viral" by Alina Chan and Matt Ridley is worth reading. But I don't think there's much doubt now that Sars-CoV2 was the result of reckless gain-of-function research conducted jointly between China's Wuhan Institute of Virology, America's Baric Lab in North Carolina, and facilitated by funding through EcoHealth Alliance, the NIH and the NIAID. Whoopsie.