My hypothesis is that until they can really nail down image-to-text and text-to-image, such that training on diagrams and drawings can produce fruitful multimodal output, classic engineering is going to be a tough nut to crack.
Software engineering lends itself well to LLMs because it fits so nicely into tokenization, whereas mechanical drawings or electronic schematics are more like a visual language: image art, but with very exacting and important pixel placement and a precise underlying logical structure.
In my experience so far, only O3 can kind of understand an electronic schematic, but really only at a "Hello World!" level of difficulty. I don't know how easy it will be to get to the point where it can render a proper schematic, or edit one it is given to meet some specified electronic characteristics.
There are programming languages that are used to define drawings, but the training data available in them would be orders of magnitude smaller than what is written for humans to learn from.
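To make the "drawings defined in code" idea concrete, here is a minimal sketch in Python that emits a drawing as plain SVG text. The function name `plate_with_hole_svg`, its parameters, and the part itself are made up for illustration; real CAD and EDA description languages are far richer than this.

```python
def plate_with_hole_svg(width_mm: float, height_mm: float, hole_dia_mm: float) -> str:
    """Return an SVG string for a rectangular plate with a centred through hole.

    Hypothetical helper for illustration only; one SVG user unit = 1 mm.
    """
    if hole_dia_mm >= min(width_mm, height_mm):
        raise ValueError("hole does not fit inside the plate")

    # Hole centre and radius follow exactly from the plate dimensions.
    cx, cy, r = width_mm / 2, height_mm / 2, hole_dia_mm / 2

    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'viewBox="0 0 {width_mm} {height_mm}">'
        f'<rect x="0" y="0" width="{width_mm}" height="{height_mm}" '
        f'fill="none" stroke="black"/>'
        f'<circle cx="{cx}" cy="{cy}" r="{r}" fill="none" stroke="black"/>'
        f'</svg>'
    )


# Example: an 80 x 40 mm plate with a 10 mm hole in the middle.
svg = plate_with_hole_svg(80, 40, 10)
print(svg)
```

The point is that the geometry is exact and fully tokenizable in this form, which is the part an image-only model has to recover from pixels.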
heisenzombie · 2h ago
My experience is that SOTA LLMs still struggle to read even the metadata from a mechanical drawing. They're getting better -- they now are mostly ok at reading things like a BOM or revision table -- but moderately complicated title blocks often trip them up.
As for the drawings themselves, I have found them pretty unreliable at reading even quite simple things (e.g. what's the ID of the thru hole?), even when they're specifically dimensioned. As soon as spatial reasoning is required (e.g. there's a dimension from A to B and from A to C and one asks for the dimension B to C), they basically never get it right.
This is a place where there's a LOT of room for improvement.
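For reference, the A/B/C question above is trivial arithmetic once the dimensions are read correctly. A minimal sketch, assuming both dimensions are measured from the same datum A along the same axis and in the same direction (the function name and the numbers are made up for illustration):

```python
def derived_dimension(a_to_b: float, a_to_c: float) -> float:
    """Distance from B to C, given two dimensions taken from a shared datum A.

    Assumes B and C lie on the same axis, on the same side of A.
    """
    return abs(a_to_c - a_to_b)


# Example: A to B is 25 mm and A to C is 60 mm, so B to C is 35 mm.
b_to_c = derived_dimension(25.0, 60.0)
print(b_to_c)
```

So the hard part for the model is not the subtraction, it is working out from the image which features each dimension actually connects.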
tintor · 2h ago
Problem #1 with text-to-image models is that the focus is on producing visually attractive, photo-realistic, artistic images, which is completely orthogonal to what is needed for engineering: accurate, complete, self-consistent, and error-free diagrams.
Problem #2 is low control over outputs of text-to-image models. Models don't follow prompts well.
slicktux · 2h ago
Electrical schematics can be represented with linear algebra and Boolean logic…
Maybe their being able to “understand” such schematics is just a matter of them becoming better at mathematical logic…which is pretty objective.
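As a rough illustration of the linear algebra view, here is a sketch of nodal analysis on a tiny resistor network in plain Python. The circuit (a 10 V source feeding three 1 kΩ resistors in series, with two unknown node voltages) and all names are assumptions picked for the example. It captures connectivity and component values only, nothing about how the schematic is drawn.

```python
def solve_2x2(a, b):
    """Solve a 2x2 linear system a @ x = b using Cramer's rule."""
    (a11, a12), (a21, a22) = a
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("singular system")
    x1 = (b[0] * a22 - a12 * b[1]) / det
    x2 = (a11 * b[1] - a21 * b[0]) / det
    return x1, x2


# Hypothetical circuit: Vs -- R1 -- node1 -- R2 -- node2 -- R3 -- ground
vs = 10.0                     # source voltage in volts
r1 = r2 = r3 = 1_000.0        # resistances in ohms
g1, g2, g3 = 1 / r1, 1 / r2, 1 / r3   # conductances in siemens

# Kirchhoff's current law at each unknown node, written as G @ v = i:
#   node1: (g1 + g2) * v1 - g2 * v2 = g1 * vs
#   node2: -g2 * v1 + (g2 + g3) * v2 = 0
conductance = [[g1 + g2, -g2],
               [-g2, g2 + g3]]
current = [g1 * vs, 0.0]

v1, v2 = solve_2x2(conductance, current)
print(f"node1 = {v1:.3f} V, node2 = {v2:.3f} V")
```

With equal resistors the node voltages come out at two thirds and one third of the source voltage, which is what you'd expect from a three-way divider.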
davemp · 28m ago
Not entirely true. Routing is a very important part of electrical schematics.