LLMs for Engineering: Teaching Models to Design High Powered Rockets

124 tamassimond 45 4/30/2025, 10:03:03 PM arxiv.org ↗

Comments (45)

Workaccount2 · 66d ago

My hypothesis is until they can really nail down image to text and text to image, such that training on diagrams and drawings can produce fruitful multi modal output, classic engineering is going to be a tough nut to crack.

Software engineering lends itself greatly to LLMs because it just fits so nicely into tokenization. Whereas mechanical drawings or electronic schematics are sort of more like a visual language. Image art but with very exacting and important pixel placement, with precise underlying logical structure.

In my experience so far, only O3 can kind of understand an electronic schematic, but really only at a "Hello World!" level difficulty. I don't know how easy it will be to get to the point where it can render a proper schematic or edit one it is given to meet some specified electronic characteristics.

There are programming languages that are used to define drawings, but the training data would be orders of magnitude less than what is written for humans to learn from.

heisenzombie · 66d ago

My experience is that SOTA LLMs still struggle to read even the metadata from a mechanical drawing. They're getting better -- they now are mostly ok at reading things like a BOM or revision table -- but moderately complicated title blocks often trip them up.

As for the drawings themselves, I have found them pretty unreliable at reading even quite simple things (e.g. what's the ID of the thru hole?), even when they're specifically dimensioned. As soon as spatial reasoning is required (e.g. there's a dimension from A to B and from A to C and one asks for the dimension B to C), they basically never get it right.

This is a place where there's a LOT of room for improvement.

Terr_ · 66d ago

I'm scared of something like the Xerox number-corruption bug [0], where some models will subtly fuck everything up in a way that is too expensive to recover from by the time it's discovered.

[0] https://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres...

flipflipper · 66d ago

Try having it output the circuit in SPICE. It actually works surprisingly well: it does a good job picking component values for parts and can describe the connectivity well. It falls apart when it writes the SPICE (professionally, there isn’t really one well-accepted syntax) and makes the wires to connect your components; like you say, it's missing the mind's eye. But I can imagine adding a ton of SPICE schematics with detailed descriptions, maybe with an LLM-optimized SPICE syntax, to the training data set… it’ll be designing and simulating circuits in no time.

kurthr · 66d ago

Yeah, how do you think that schematic is represented internally? How do you think the netlist is modeled? It's SPICE and HDL all the way down!

There are good reasons not to vibecode Verilog, but a lot of test cases are already being written by LLMs and the big EDA vendors (Cadence, Synopsys, Siemens) all tout their new AI capabilities.

It's like saying it can't read handwritten mathematical formulas, when it solves most math problems in markup (and if you aren't using it you're asking for trouble).

flipflipper · 65d ago

I brainfarted a bit and mixed up my attempts at making LTspice .asc schematics (the text representation of the GUI schematic, with wires) with the normal node-based SPICE syntax. I just tried this again, specifically asking for a SPICE netlist to run with ngspice from the CLI. Seemed to run great! Going to play around with this for a bit now…
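
For anyone who wants to try the node-based route, here is a minimal sketch of what such a netlist looks like, generated from Python. The circuit (a first-order RC low-pass), the component values, and the helper names `rc_lowpass_netlist` and `cutoff_hz` are all assumptions for illustration, not anything from the experiment above.

```python
import math


def rc_lowpass_netlist(r_ohms: float, c_farads: float) -> str:
    """Return a node-based SPICE netlist for a first-order RC low-pass filter."""
    lines = [
        "* RC low-pass filter (illustrative values)",  # first line is the title
        "V1 in 0 DC 0 AC 1",        # AC source from node 'in' to ground (node 0)
        f"R1 in out {r_ohms:g}",    # resistor between nodes 'in' and 'out'
        f"C1 out 0 {c_farads:g}",   # capacitor from 'out' to ground
        ".ac dec 10 10 1meg",       # AC sweep, 10 points per decade, 10 Hz to 1 MHz
        ".print ac vdb(out)",       # report the output magnitude in dB
        ".end",
    ]
    return "\n".join(lines) + "\n"


def cutoff_hz(r_ohms: float, c_farads: float) -> float:
    """Analytic -3 dB frequency, handy for sanity-checking the simulation."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)


netlist = rc_lowpass_netlist(1e3, 100e-9)
print(netlist)
print(f"expected cutoff: {cutoff_hz(1e3, 100e-9):.1f} Hz")
```

Saving the printed netlist to a file and running it in ngspice batch mode (`ngspice -b <file>`) should print the sweep, and the response should roll off near the analytic cutoff.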

tintor · 66d ago

Problem #1 with text-to-image models is that focus is on producing visually attractive photo-realistic artistic images, which is completely orthogonal from what is needed for engineering: accurate, complete, self-consistent, and error-free diagrams.

Problem #2 is low control over outputs of text-to-image models. Models don't follow prompts well.

discordance · 66d ago

Mechanical drawings and schematics are visualizations for humans.

If you look at the data structure of a Gerber or DWG file, it’s vectors and metadata. These happen to be great for LLMs.

My hypothesis is that we haven’t done the work on that yet because the market is more interested in things like Ghibli imagery.

jayd16 · 66d ago

More like there isn't a resource of trillions of user generated schematics uploaded to the big tech firms that they can train on for free by skirting fair use laws.

danielbln · 66d ago

Are you being facetious or is that really your hypothesis?

notahacker · 66d ago

Not the OP, but Ghibli imaging doesn't kill people or make things stop working if it falls into uncanny valley territory, so the bar for a useful product is lower than a "designer" based on a NN which has ingested annotated CAD files...

rjsw · 66d ago

Programming languages don't really define drawings. There are several standards for the data models behind the exchange file formats used in engineering though.

Someone could try training a LLM on a combination of a STEP AP242 [1] data model and sample exchange files, or do the same for the Building Information Model [2].

[1] http://www.ap242.org/ [2] https://en.wikipedia.org/wiki/Industry_Foundation_Classes

slicktux · 66d ago

Electrical schematics can be represented with linear algebra and Boolean logic… Maybe their being able to “understand” such schematics is just a matter of them becoming better at mathematical logic…which is pretty objective.
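
A minimal sketch of the linear-algebra view, assuming a hypothetical three-resistor ladder driven by a 10 V source: nodal analysis turns the schematic into a conductance matrix G and a current vector i, and the node voltages are the solution of G v = i. The component values and the small `solve` helper are illustrative assumptions.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


# Hypothetical circuit: Vs -> R1 -> node 1 -> R2 -> node 2 -> R3 -> ground.
VS, R1, R2, R3 = 10.0, 1e3, 2e3, 3e3

# Conductance matrix: diagonal = sum of conductances at the node,
# off-diagonal = minus the conductance linking the two nodes.
G = [
    [1 / R1 + 1 / R2, -1 / R2],
    [-1 / R2, 1 / R2 + 1 / R3],
]
# The source in series with R1 is replaced by its Norton equivalent,
# a current VS / R1 injected into node 1.
i = [VS / R1, 0.0]

v1, v2 = solve(G, i)
print(f"v1 = {v1:.4f} V, v2 = {v2:.4f} V")
```

The result can be checked by hand: the three resistors are in series, so the current is 10 V / 6 kΩ, which gives 5 V across R3 and about 8.33 V at node 1.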

nyrikki · 66d ago

This paper works because it explicitly targets a problem domain that was intentionally constrained to ensure safety in the amateur high-power rocket hobby: specifically, constraints and standards developed so that teenagers of various skill levels could work with paper and pen, well before they had access to computers. While modern applications have added more functions, those core constraints remain.

It works explicitly because it doesn't hit the often counter-intuitive limitations with generalization in pure math.

Remember that Boolean circuit satisfiability is NP-complete, and is beyond UHAT's + poly length CoT expressibility, which is capped at PTIME.

Even int logic with boolean circuits is in PSPACE.

When you start to deal with values, you are going to have to add in heuristics and/or find reductions that will cost your generalizability.

Even if you model analog circuits as finite labelled directed graphs with labelled vertices, similar to what Shannon used; removing some of the real world electrical impacts and focus on them as computational units, the complexity can get crazy fast.

Those circuits, with specific constraints (IIRC local feedback, etc.), can be simulated by a Turing machine, but require ELEMENTARY space or time, and despite its name ELEMENTARY is iterated exponential: 2^2^2^...^2^n, a tower of k 2's topped by n.
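
To get a feel for how fast that grows, here is a small sketch that evaluates the tower directly, reading it in the usual way as k 2's stacked under n. The function name `tower` is just an illustrative choice.

```python
def tower(k: int, n: int) -> int:
    """Evaluate 2^(2^(...^(2^n))) with k twos stacked under n."""
    value = n
    for _ in range(k):
        value = 2 ** value  # each level exponentiates the previous result
    return value


for k in range(4):
    digits = len(str(tower(k, 3)))
    print(f"k={k}: {digits} decimal digit(s)")
```

With n = 3 the value is already 2^256 at k = 3, and the k = 4 level (2 raised to that number) cannot be written out on any real machine, which is the point being made about "elementary" bounds.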

Also note that P/poly, viewed as problems that can be solved by small circuits is not a practical class and in fact contains all of the unary languages that we know are unsolvable by real computers in the general case.

That apparent paradox that P/poly, which has small bool circuits, also contains all of those undecidable unary languages is a good starter into that rat hole.

While we will have tools and models that are better at math logic, the constraints are actually limits on computation in the general case. Generalization often has these types of costs, and the RL benefits in this case relate to demonstrating that IMHO.

davemp · 66d ago

Not entirely true. Routing is a very important part of electrical schematics.

echoangle · 66d ago

Is it? Isn’t that more like PCB design? The schematic is just the abstract connection of components, right?

davemp · 66d ago

I would consider a PCB schematic to be part of an electrical schematic. Even if you don’t, you still have to consider final layout because some lines will need EMF protection. The linear equations and boolean algebra are just an (extremely useful) model after all.

imranq · 66d ago

You can describe a diagram with markdown like mermaid, so you can at least understand state changes and processes which are core to engineering.

neodypsis · 66d ago

Try one of the models with good vision capabilities and ask it to output code using build123d.

yieldcrv · 66d ago

Tell it how to read schematics in the prompt

akomtu · 66d ago

Imagine a fake engineer who read books about engineering as scifi, and thanks to his superhuman memory, he's mastered the engineer-speak so well that he sounds more engineery than top engineers in the world. Except that he has no clue about engineering and to him it's the same as literature or prose. Now he's tasked with designing a bridge. He pauses for a second and starts speaking, in his usual polished style: "sure, let me design a bridge for you." And while he's talking, he's staring at you with his perfectly blank expression, for his mind is blank as well.

Think of the absurdity of trying to understand the Pi number by looking at its first billion digits and trying to predict the next digit. And think of what it takes to advance from memorizing digits of such numbers and predicting continuation with astrology-style logic to understanding the math behind the digits of Pi.

DaiPlusPlus · 66d ago

> Think of the absurdity of trying to understand the Pi number by looking at its first billion digits and trying to predict the next digit. And think of what it takes to advance from memorizing digits of such numbers and predicting continuation with astrology-style logic to understanding the math behind the digits of Pi.

I'm prepared to believe that a sufficiently advanced LLM around today will have some "neural" representation of a generalization of a Taylor Series, thus allowing it to "natively predict" digits of Pi.

walleeee · 66d ago

Anthropic had a recent paper on why LLMs can't even get e.g. simple arithmetic consistently correct, much less generalize the concept of infinite series. The finding was that they don't find a way to represent the mechanics of an operation; they build chains of heuristics that sometimes happen to work.

discreteevent · 66d ago

> I'm prepared to believe that a sufficiently advanced LLM

This is the opposite of engineering/science. This is animism.

otabdeveloper4 · 66d ago

I want to believe, man. Just two more layers and this thing finally becomes a real boy.

kneegerman · 66d ago

Sometimes I feel this website, very much like LLMs themselves, proves that handling of language in general and purple prose in particular has absolutely no (as in 0) correlation with intelligence.

DaiPlusPlus · 66d ago

I suspect your definition of "intelligence" differs from mine.

weq · 66d ago

You have described enron musk perfectly without probably even meaning to. I concur that we have "software engineers" in every role at our tech company now that the general populace has learnt how to use ChatGPT. This leads to some interesting conversations as above.

buescher · 66d ago

It's worse than that. Imagine he's consistently plausibly wrong about everything, but when you point that out, people think it's just sour grapes at how smart he is.

imtringued · 66d ago

That's not even the worst part. The worst part is that there are people who fit this description as well, and the singularity crowd anthropomorphizes the "human" flaws of the AI as proof of human level intelligence.

frumiousirc · 66d ago

A fundamental problem with this entire class of machine learning is that it is based on a model / simulation of reality. "RocketPy, a high-fidelity trajectory simulation library for high-power rocketry" in this case.

Nothing against this sim in particular but all such simulations that attempt to model any non-trivial system are imperfect. Nature is just too complex to model precisely and accurately. The LLM (or other DL network architecture) will only learn information that is presented to it. When trained on simulation the network can not help but infer incorrectly about messy reality.

For example, if RocketPy lacks any model of cross breezes, the network would never learn to design to counter them. Or, if it does model variable winds but does so with the wrong mean, or variance, or skew (of intensity, period, etc) the network can not properly learn and the design will not be optimal. The design will fail when it faces reality that differs from model.

Replace "rocket" with any other thing and you have AI/ML applied to science and engineering - fundamentally flawed, at least at some level of precision/accuracy.

At the least, real learning on reality is required. Once we can back-propagate through nature, then perhaps DL networks can begin to be actually trustworthy for science and engineering.

diggan · 66d ago

I don't think it's a "problem" as much as it is a "tradeoff". You basically have two approaches to take here: 1) try to simulate as best as you can, iteratively improve the simulation after trying it out in real life, and go back and forth, or 2) skip the simulation step and run the same process only in real life, relying solely on real scenarios, but few of them.

Considering how fast you can go with simulations vs real launches, I'm not surprised they took the first approach.

theptip · 66d ago

> A fundamental problem with this entire class of machine learning is that it is based on a model / simulation of reality… all such simulations that attempt to model any non-trivial system are imperfect

Depends on what your goal is. If you are trying to solve the narrow problem of rocketry or whatever, sure. But maybe not if your goal is making models smarter.

The broader context is that we need new oracles beyond math and programming in order to exercise CoT models on longer planning-horizon tasks.

In this case, if working with a toy world model lets you learn generalizable strategies (I bet it does, as video games do too) then this sort of eval can be a useful addition.

londons_explore · 66d ago

> all such simulations that attempt to model any non-trivial system are imperfect.

I believe the future of such simulation is to start from the lowest level, i.e. Schrödinger's equation, and get the simulator to derive all higher level stuff.

Obviously the higher level models are imperfect, but then it's the AI's job to decide if a pile of soil needs to be simulated as a load of grains of sand, or as crystals of quartz, or as atoms of silicon, or as quarks...

The AI can always check its answer by redoing a lower level simulation of a tiny part of the result, and check it is consistent with a higher level/cheaper simulation.

xigency · 66d ago

> I believe the future of such simulation is to start from the lowest level, i.e. Schrödinger's equation, and get the simulator to derive all higher level stuff.

I do hate to burst your bubble here but I've been doing real-time simulation (in the form of games, 2D, 3D, VR) for enough decades to know this is only a pipe-dream.

Maybe at the point when we have a Dyson sphere and have all universally agreed upon the principles that cause an airfoil to generate lift this would be possible, otherwise it's orders of magnitude beyond all of the terrestrial compute that we have now.

To quote Han Solo, the way we do effective and convincing science and simulation now is ... "a lot of simple tricks and nonsense."

londons_explore · 65d ago

I don't think it's a pipe dream from an 'amount of compute' perspective.

Any competent person can simulate 100 atoms in a crystal of some material, and say "whoa, it seems the bulk of this material behaves like a spring with f=kx, let's replace the individual atom simulation with a bulk simulation which is computationally far cheaper", and then we can simulate trillions of atoms.
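
As a toy version of that coarse-graining step, here is a sketch that assumes a 1D chain of identical harmonic bonds in static equilibrium (a big simplification of a real crystal). It computes the chain's stretch bond by bond, fits f = kx to the result, and recovers a single bulk stiffness. The bond stiffness and function names are hypothetical.

```python
def chain_extension(force: float, bond_stiffness: float, n_bonds: int) -> float:
    """Total stretch of n_bonds identical springs in series under one force."""
    # In static equilibrium every bond carries the same force, so each one
    # stretches by force / bond_stiffness and the stretches add up.
    return sum(force / bond_stiffness for _ in range(n_bonds))


def fit_bulk_stiffness(forces, extensions):
    """Least-squares fit of f = k * x through the origin."""
    num = sum(f * x for f, x in zip(forces, extensions))
    den = sum(x * x for x in extensions)
    return num / den


BOND_STIFFNESS = 50.0  # N/m, hypothetical per-bond value
N_BONDS = 100

forces = [0.1, 0.2, 0.5, 1.0]  # applied loads in newtons
extensions = [chain_extension(f, BOND_STIFFNESS, N_BONDS) for f in forces]

k_bulk = fit_bulk_stiffness(forces, extensions)
print(f"fitted bulk stiffness: {k_bulk:.4f} N/m")
print(f"series-spring formula: {BOND_STIFFNESS / N_BONDS:.4f} N/m")
```

The fitted value matches the series-spring formula k_bond / N, so the 100-bond "simulation" can be swapped for one spring. Real materials add anharmonicity, defects and 3D structure, which is where the hard part of the argument lives.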

I don't see why AI couldn't do the same.

xigency · 65d ago

One trillion atoms of the heaviest element is less than a nanogram. I get your point; it's just that we can't even simulate all the blades of grass on a one-acre lawn with every shortcut we have.
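
The nanogram figure is easy to check. A quick sketch, assuming "heaviest element" means oganesson with an atomic mass of roughly 294 u:

```python
ATOMIC_MASS_CONSTANT_KG = 1.66053906660e-27  # 1 u in kilograms (CODATA 2018)
OGANESSON_MASS_U = 294.0                     # assumption: Og-294, about 294 u
N_ATOMS = 1e12                               # one trillion atoms

mass_kg = N_ATOMS * OGANESSON_MASS_U * ATOMIC_MASS_CONSTANT_KG
mass_ng = mass_kg * 1e12  # 1 kg = 1e12 ng

print(f"{mass_ng:.3f} ng")
```

That comes out to roughly half a nanogram, consistent with the claim.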

Really I think it would be cool to explore -- I've been working on a procedural game engine (conceptually at least) for a long time and want to incorporate even "basic" things like chemistry. I think it's still decades away for that, not even considering quantum phenomena.

1W6MIC49CYX9GAP · 66d ago

Accurate simulation is also an AI problem, but that should be a separate paper

tmaly · 56d ago

I am waiting for a new type of LLM that can understand primitives in either 2D or 3D and can construct vector art or 3D models.

I have seen some demos of Claude being connected to Blender etc. But when I dug into the code, it was using another LLM to generate the objects rather than building the objects from fundamental shapes.

simianwords · 66d ago

More evidence that we need fine-tuned domain-specific models. Someone should come up with a medical LLM fine-tuned on a 640b model. What better RL dataset can you have than a patient with symptoms and the correct diagnosis?

FilosofumRex · 66d ago

Established engineering firms are trying to incorporate LLMs into their fancy simulation software, but that's counterproductive, just like professors who use LLMs to write new textbooks!

We need innovative disruptors to train LLMs to do engineering from ground up and to make calls to simulation software/routines when they need specialized/unique datapoints.

revskill · 66d ago

How about the halting problem? I often see LLMs get stuck in infinite recursion.

rel_ic · 66d ago

I think doing stuff like this probably has more downsides than upsides.

aaron695 · 66d ago

I think what might work is people coming together around this LLM like a God.

Similar to Rod of Iron Ministries (The Church of the AR-15): taking what it says, fine-tuning it, testing it, feeding it back in, and mostly waiting as LLMs improve.

LLMs will never be smarter than humans, but they can be a meeting place where people congregate to work on goals and worship.

Like QAnon, that's where the collective IQ and power comes from, something to believe in. At the micro level this is also mostly how LLMs are used in practical ways.

If you look to the Middle East there is a lot of work on rockets but a limited community working together.

otabdeveloper4 · 66d ago

Okay. As long as they don't start sacrificing virgins to the Prompt Gods.
