This is a SaaS problem, not an LLM problem. If you have a local LLM that nobody is upgrading behind your back, it will compute the same thing on the same inputs.
Unless there is a bug somewhere, like reading uninitialized memory, the floating-point calculations, the token embeddings, and all the rest do the same thing each time.
Cilvic · 4h ago
So could SaaS LLM or cloud/api LLMs not offer this as an option? A guarantee that the "same prompt" will always produce the same result.
Also, I usually interpret "non-deterministic" a bit more broadly.
Say I have slightly different prompts: "what's 2+2?" vs. "can you please tell me what's 2 plus 2", or even "2+2=?" or "2+2". For most applications it would be useful if they all produced the same result.
alphan0n · 3h ago
The form of the question determines the form of the outcome, even if the answer is the same. Asking the same question in a different way should yield an answer that adheres to the form of the question:
2+2 is 4
2 plus 2 is 4
4=2+2
4
Having the LLM pass the input to a tool (python) will result in deterministic output.
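A minimal sketch of that tool-delegation pattern (the dispatch is illustrative, not any particular provider's API): the model only has to normalize the phrasing to an expression string, and a deterministic evaluator does the arithmetic.

```python
import ast
import operator

# Deterministic arithmetic "tool": evaluates a restricted expression tree,
# so every phrasing the model normalizes to "2+2" yields the same answer.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def calc(expr):
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported syntax")
    return ev(ast.parse(expr, mode="eval"))

# However the user phrases the question, the tool call is the same:
print(calc("2+2"))  # 4
```

Restricting evaluation to a whitelist of AST nodes (rather than calling `eval`) is what keeps the tool's behavior a fixed function of its input.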
nativeit · 1h ago
Doesn’t that imply that LLMs are just “if then, then that” but bigger?
ezst · 1h ago
Sure, why would you expect it to be different?
lsy · 3h ago
There are two additional aspects that are even more critical than the implementation details here:
- Typical LLM usage involves the accretion of context tokens from previous conversation turns. The likelihood that you will type prompt A twice but all of your previous context will be the same is low. You could reset the context, but accretion of context is often considered a feature of LLM interaction.
- Maybe more importantly, because the LLM abstraction is statistical, getting the correct output for e.g. "3 + 5 = ?" does not guarantee you will get the correct output for any other pair of numbers, even if all of the outputs are invariant and deterministic. So even if the individual prompt + output relationship is deterministic, the usefulness of the model output may "feel" nondeterministic between inputs, or have many of the same bad effects as nondeterminism. For the article's list of characteristics of deterministic systems, per-input determinism only solves "caching", and leaves "testing", "compliance", and "debuggability" largely unsolved.
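The caching point can be made concrete with a toy stand-in for a deterministic model call (the names here are illustrative, not a real API): exact repeats hit the cache, while a semantically identical paraphrase is a distinct input and misses.

```python
from functools import lru_cache

# Toy stand-in for a per-input-deterministic model: same prompt string in,
# same completion out, so memoization is sound.
@lru_cache(maxsize=None)
def model(prompt):
    return f"answer({prompt})"

model("what's 2+2?")
model("what's 2+2?")          # exact repeat: served from cache
model("what is 2 plus 2?")    # semantically identical, but a cache miss
print(model.cache_info())     # hits=1, misses=2
```

This is why per-input determinism buys you caching but not testing: a test suite would have to enumerate every phrasing, not just every meaning.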
jqpabc123 · 4h ago
Probabilistic processes are not the most appropriate way to produce deterministic results. And definitely not if the system is designed to update, grow or "learn" from inputs.
redsymbol · 3h ago
There may be something I do not understand about LLMs. But it seems it is more correct to say LLMs are chaotic - in the mathematical sense of sensitive dependence on initial conditions.
The only actual nondeterminism is deliberately injected. E.g. the temperature parameter. Without that, it is deterministic but chaotic. This is the case both in training LLMs, and in using the trained models.
If I missed something, someone point it out please.
trod1234 · 2h ago
You aren't understanding the properties of Determinism, and many people, even graduates of a Computer Science program, often don't have a working knowledge of this (the most competent do).
It's more correct to say that determinism occurs because the mathematical property is preserved, or closed, under its domain and the related operations. This connection becomes clear once you've taken an abstract algebra (modern algebra) course. It was a critical leap towards computers, based in the design of emergent systems.
The property can be broken quite easily by failing to preserve it, but then you have no way to tell the output from randomness thereafter, and there is no concept of correctness in stochastic environments (where one token can stand for more than one thing, and tokens are not 'unique').
To put it plainly, Determinism is mathematical relabeling (i.e. a function test on the domain of operations that are performed).
While the constraints hold true, and the ISA and related stack maintain those constraints (i.e. are closed over those operations), you get reliable consistency. The property acts as an abstract guide rail to do work, which is how such simple combinations of circuit logic controlled by software can perform all the magical things we do and see today.
Time invariance usually goes hand-in-hand with Determinism and is needed for troubleshooting; that usually requires memoryless properties, though it depends on where you are in the stack. Determinism is required at any automatic layer for reliability, and that is over the entire domain of possible things that can happen. Without Determinism, you run into the halting and incompleteness problems of classical Computer Science, which have stood a good long test of time.
Error handling also generally stops working because you need to know and specify a state to match in order to handle a state, and that requires a determinable state in the first place.
A mapping of one unique input to one unique output, and projection onto are required for relabeling. The electronics are designed to preserve the property up to the logic layer.
The moment you have a 'unique' item which is not actually unique, this is broken, and it's quite subtle. ldd on Linux, for example, has two different but similar errors of this type that have remained unfixed for over 10 years because the maintainers didn't view them as errors. This is to say that even long-term professional programmers (likely non-engineers) often fail to recognize these types of foundational errors.
The result is that the utility's output can't usefully be passed to any further automation because of its non-deterministically structured output. Specifically, the null token and in-memory kernel structure tokens. Regex also requires these properties. You'll find at least one easily reproduced instance of ldd on the ssh binary where you can't simply use grep -Ev to separate or filter material (to try to pigeonhole the output into a deterministic state), and even adding a DFA program downstream can't reverse this; a patch must occur at the point of error.
These crop up in production automation all the time, and are usually the most costly to fix given the expertise required to recognize the error. If determinism isn't present, no automation further downstream can be guaranteed to work. Determinism lets you constrain or expand the scope of a system of systems to narrow and home in on where the failure occurs.
Troubleshooting is an abstract application of testing for determinism, and you can easily tell when a problem won't have this tool available by probing inputs and outputs. In the absence of this property, you only have guess and check which requires intimate knowledge of the entirety of the system at all levels of abstraction. This is most costly in time given such documentation is almost never available.
As a final real-world example, consider an Excel roster of employees at a large company, where you are given only the name of a person whose account you must shut down. What do you do when two people have the same name? What can you do without further input? Nothing. If you shut down both accounts, you're fired; if you shut down the wrong account, you're fired. You have an indeterminable state.
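The roster ambiguity can be sketched as a lookup keyed on a non-unique field (names and accounts below are invented): there is no determinable answer, so the only correct automatic behavior is to refuse and ask for more input.

```python
# Hypothetical roster: "name" is not a unique key.
roster = [
    {"name": "Alex Kim", "account": "akim1", "dept": "Sales"},
    {"name": "Alex Kim", "account": "akim2", "dept": "IT"},
]

def account_for(name):
    matches = [r["account"] for r in roster if r["name"] == name]
    if len(matches) != 1:
        # Indeterminable state: refuse to act rather than guess.
        raise LookupError(f"{len(matches)} matches for {name!r}; need more input")
    return matches[0]

try:
    account_for("Alex Kim")
except LookupError as e:
    print(e)  # 2 matches for 'Alex Kim'; need more input
```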
The interactive layer is a lot more forgiving than the automatic layer because people can recognize when we need to get or provide more information.
Hopefully this clarifies your understanding.
kbelder · 1h ago
I don't see anything you said that indicates the OP was incorrect in any way.
trod1234 · 1h ago
If that is the case, then you didn't read or comprehend what was actually said, and no one can tailor a response to people who can't read and comprehend.
There are important distinctions; it's beyond my scope to guess at where that failure of comprehension might lie for an individual such as yourself.
Basic reading comprehension would note:
Properties are not individual inputs; they apply to the whole system as a relationship between input and output. Individual inputs cannot define properties.
"Chaos" has a very rigorous definition (changes in small inputs lead to large changes in outputs).
"Injection of non-determinism" is only correct if it includes a reference to the fact that determinism is built into all computation, which is not a common understanding. Without that reference, the context improperly includes an indeterminable indirection, resulting in fallacy.
The two are unrelated and independent of the context of the conversation or determinism, so defining such understanding in those terms results in fallacy (by improper isolation), delusion, or hallucination.
These are fundamental errors in reasoning and by extension understanding.
The correct understanding, on firm foundations, was provided. It is on the individual without knowledge to come into a conversation with the bare minimum requirements for comprehension, based in rational thought and practice.
Edit: No amount of down-voting will change the truth of this, though I understand why someone would want useful knowledge to be hidden.
tacker2000 · 48m ago
I downvoted you because your tone is unnecessarily harsh and rude.