I got fooled by AI-for-science hype–here's what it taught me

185 qianli_cs 135 5/20/2025, 4:57:00 AM understandingai.org ↗

Comments (135)

plasticeagle · 4h ago
Does anybody else find it peculiar that the majority of these articles about AI say things like "of course I don't doubt that AI will lead to major discoveries", and then go on to explain how they aren't useful in any field whatsoever?

Where are the AI-driven breakthroughs? Or even the AI-driven incremental improvements? Do they exist anywhere? Or are we just using AI to remix existing general knowledge, while making no progress of any sort in any field using it?

strogonoff · 3h ago
There is rarely a constructive discussion around the term “AI”. You can’t say anything useful about what it might lead to or how useful it might be, because it is purely a marketing term without a specific meaning (nor do the two words it abbreviates have one).

Interesting discussions tend to avoid “AI” in favour of specific terms such as “ML”, “LLM”, “GAN”, “stable diffusion”, “chatbot”, “image generation”. These terms refer to specific tech and applications of that tech, and they make it possible to argue about specific consequences for science or society (use of ML in biotech vs. the proliferation of chatbots).

However, certain sub-industries prefer “AI” precisely because it’s so vague: it offers seemingly unlimited potential (please give us more investment money/stonks go up), and it creates a certain vibe of a conscious being, which is useful when pretending not to be working around IP laws while building tools on data obtained without relevant licensing agreements (cf. the countless “humans have the freedom to read, therefore it’s unfair to restrict the uses of a software tool” fallacies, perpetuated even by seemingly technically literate people in pretty much every relevant forum thread).

roenxi · 14m ago
Also, the strong predictions about AI use a vague term because the tech often doesn't exist yet. There isn't a chatbot right now that I feel confident can out-perform me at systems design, but I'm pretty certain something that can is coming. Odds are also good that in 2-4 years there will be a new hotness to replace LLMs that is much more functional (maybe MLLMs, maybe called something else). We can start to predict and respond to its potential even though it doesn't exist yet; it just takes a little extrapolating. But it doesn't have a name yet.

Which is to agree - obviously if people are talking about "AI" they don't want to talk about something that exists right this second. If they did it'd be better to use a precise word.

Closi · 44m ago
I think AI is a useful term which usually means a neural-network architecture, without specifying the exact architecture.

I think machine learning doesn't mean this, as it can also refer to linear regression, non-linear optimisation, decision trees, Bayesian networks, etc.

That's not to say that AI isn't abused as a term - but I do think a more general term for the last five years of advances in using neural networks to solve problems is useful. Particularly as it's not obvious which model architectures would apply to which fields without more work (or even whether novel architectures will be required for frontier science applications).

mnky9800n · 2h ago
This article is all about PINNs being overblown. I think it’s a reasonable take. I’ve seen way too many people put all their eggs in the PINNs basket when there are plenty of options out there. Those options just don’t include a ticket to the hype train.
simianparrot · 4h ago
It’s why it keeps looking exactly like NFT’s and crypto hype cycles to me: Yes the technology has legitimate uses, but the promises of groundbreaking use cases that will change the world are obviously not materialising and to anyone that understands the tech it can’t.

It’s people making money off hype until it dies and move on to the next scam-with-some-use.

Flamentono2 · 3h ago
We already have breakthroughs: benchmark results that were unheard of before ML.

Language translation alone got so much better, and the same goes for voice synthesis and voice transcription.

All my meetings are now searchable, and I can ask 'AI' to summarize them in a relatively accurate way, which was impossible before.

Alphafold made a breakthrough in protein folding.

Image and Video generation can now do unbelievable things.

Realtime voice communication with a computer.

Our internal company search suddenly became useful.

I have zero use cases for NFTs and crypto. I have tons of use cases for ML.

Yoric · 2h ago
That is absolutely correct.

The problem is that the hype assumes that all of this is a baseline (or even below the baseline), while there are no signs that it can go much further in the near future – and in some cases, it's actually cutting-edge research. This leads to a pushback that may be disproportionate.

vanattab · 49m ago
Which AI program do you use for live video meeting translation?
exe34 · 2h ago
You have to understand, real AI will never exist. AI is that which a machine can't do yet. Once it can do it, it's engineering.

uludag · 2h ago
I'm sure there's many people out there who could say that they hardly use AI but that crypto has made them lots of money.

At the end of the day, searching work documents and talking with computers are only desirable inasmuch as they are economically profitable. Crypto, at the end of the day, is responsible for a lot of people getting wealthy. Was a lot of this wealth obtained on sketchy grounds? Probably, but the same could be said of AI (for example, the recent sale of Windsurf for an obscene amount of money).

Flamentono2 · 5m ago
Crypto is not making people rich; it is just moving money from person A to person B.

And sure, everyone who got money from others by gambling is biased. Fine with me.

But in comparison to crypto, people around me actually use AI/ML (most of them).

littlestymaar · 2h ago
> are only desirable inasmuch as they are economically profitable.

The big difference is that they are profitable because they create value, whereas cryptocurrencies are a zero-sum game between participants. (It is in fact a negative-sum game, since some people are getting paid to make the thing work so that others can gamble on the system.)

StopDisinfo910 · 3h ago
I don’t remember NFTs and crypto ever helping me draft an email, write my meeting minutes, or easily search information previously locked in various documents.

I think there is this weird take among some on HN where LLMs are either completely revolutionary and making breakthroughs, or utterly useless.

The truth is that they are useful already as a productivity tool.

ktallett · 3h ago
The hype surrounding them is not as a personal assistant, and tbh a lot of these use cases already have existing methods that work just fine. There are already ways to find key information in files, and speedy meeting minutes are really just a template away.
Flamentono2 · 3h ago
Absolutely not true.

I was never able to get meeting transcription of that quality that cheaply before. I've followed dictation software for over a decade, and thanks to ML the open source tools are suddenly a lot better than ever before.
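To make that concrete, here's roughly all it takes nowadays with the open-source Whisper model (one example of the kind of tool I mean; "meeting.wav" is a placeholder file name):

```python
# Sketch using openai-whisper (pip install openai-whisper).
import whisper

model = whisper.load_model("base")        # small model, runs on a laptop
result = model.transcribe("meeting.wav")  # dict with full text and timestamped segments
print(result["text"])
```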

Our internal company search, with state-of-the-art search indexes and search software, was always shit. Now I ask an agent about a product standard and it just finds it.

Image generation never existed before.

Building a chatbot that actually does what you expect, beyond answering the same 10 canned questions, used to be hard and never worked really well; now it just works.

I'm also not aware of any earlier software rewriting or even writing documents for me, structuring them, etc.

ktallett · 3h ago
A lot of these issues you have had are simply user error or not using the right tool for the job.
jodrellblank · 2h ago
You know what they were doing and what tools they were using… how?
ktallett · 2h ago
OK, take transcription: they were trying to use free-as-in-cost tools instead of software that works efficiently and has been effective for decades now.
StopDisinfo910 · 2h ago
Well, LLMs are the right tool for the job. They just work.

I mean if you are going to deny their usefulness in the face of plenty of people telling you they actually help, it’s going to be impossible to have a discussion.

ktallett · 2h ago
They can be useful, however for admin tasks there are plenty of valid alternatives that really take no longer time-wise, so why bother using all that computing power?

They don't just work though; they are not foolproof and definitely require double-checking.

StopDisinfo910 · 1h ago
> valid alternatives that really take no longer time-wise

That’s not my experience.

We use them more and more at my job. They were already great for most office tasks, including brainstorming simple things, but now suppliers are starting to sell us agents which pretty much just work, and honestly there are a ton of things for which LLMs seem really suited.

CMDB queries? Annoying SAP requests for which you have to delve through dozens of menus? The stupid interface of my travel management and expense software? Please give me a chatbot for all of that which can actually decipher what I’m trying to do. These are hours of productivity unlocked.

We are also starting to deploy more and more RAG on select core business datasets, and it's more useful than even I anticipated; I'm already convinced. You ask, you get a brief answer and the documents back. This used to be either hours of delving through search results or emails with experts.

As imperfect as they are now, the potential value of LLMs is already tremendous.
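To make "you ask, you get a brief answer and the documents back" concrete, the retrieval half looks roughly like this (a toy sketch: the documents are invented, the hashing-trick "embedding" stands in for a real embedding model, and the final prompt would go to an LLM):

```python
import numpy as np

# Toy RAG retrieval: embed documents, embed the query, return the closest docs.
docs = [
    "Expense reports must be filed within 30 days of travel.",     # invented
    "CMDB changes require a change ticket and manager approval.",  # invented
    "Supplier onboarding follows the procurement checklist.",      # invented
]

def embed(text, dim=256):
    # Hashing-trick bag of words; a real system would call an embedding model.
    v = np.zeros(dim)
    for token in text.lower().split():
        v[hash(token) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

doc_vecs = np.stack([embed(d) for d in docs])

def retrieve(query, k=2):
    sims = doc_vecs @ embed(query)  # cosine similarity (vectors are unit length)
    return [docs[i] for i in np.argsort(sims)[::-1][:k]]

query = "How long do I have to file my travel expenses?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)  # this prompt is what gets sent to the LLM
```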

ktallett · 1h ago
How do you check the accuracy of these? You cited brainstorming as an example of something they're great at, but obviously experts are experts for a reason.

My issue here is that a lot of this is solved by good practice. For example, travel management and expenses have been solved: company credit card. I don't need one slightly better piece of software to manage one terrible piece of software to solve an issue that already has a solution.

StopDisinfo910 · 2h ago
Microsoft is absolutely selling them as personal assistants, and already selling a lot. I think HNers, being mostly software developers, live in a bubble when it comes to the reality of what LLMs are actually used for.

Speedy minutes are absolutely not a template away. Anyone who has ever had to write minutes for a complicated meeting knows it's hard and requires a lot of back and forth for everyone to agree about what was said and decided.

Now you just turn on Copilot and you get both a transcript and an adequate basis for good minutes. Bonus point: it’s made by a machine so no one complains it has bias.

Some people here are blind to how useful that is.

IanCal · 2h ago
There are so many tasks in the world that

1. Involve a computer

2. Do not require incredible intelligence

3. Involve the messiness of the real world enough that you can't write exact code to do it without it being insanely fragile

LLMs suddenly start to tackle these, and tackle them kind of all at once. Additionally, they are "programmed" in plain English, so you don't need a specialist to do something like change the tone of the summary or the format; you just write what you want.

Assuming the models never get any smarter or even cheaper, and all we get is neater integrations, I still think this is all huge.

ktallett · 2h ago
Do you really believe the outlay in terms of computing power is worth it to change the tone of an email? If it never gets better, this is a vast waste of an enormous amount of resources.
IanCal · 34m ago
That's not what I've talked about them being for, but regardless, it surely depends on the impact. If it can show you how someone may misunderstand your point, and either help correct it or just show the problem, then yes, that can easily be worth spending a few cycles on. The additional energy cost of further back-and-forths caused by a misunderstanding could very easily be higher. At full whack, my GPU draws something like 10x what my monitor does, so fixing something quickly and automatically can easily use less power than sorting it out manually.

Again though, that's not at all what I've talked about.

ktallett · 2h ago
This is a business practice issue and a staff issue, not a meeting minutes issue. I have meetings daily and have never had this problem. You make it clear what is decided during the meeting, give everyone a chance to query or question it, and then no one can argue.
rkuodys · 3h ago
What I see in LLMs at this point is simplified input and output with reduced barriers to entry, so applications could become more widespread.

Now that I think of it, maybe this AI era is not electricity but rather the GUI - like the time when Jobs (or whoever) figured out and adopted the modern GUI on computers, allowing more widespread use of computers.

flir · 1h ago
It's a good analogy, because the key development does seem to have been the interface. Instead of wrapping it up as text autocomplete (a la Google search), OpenAI wrapped it up as an IM client, and we were off to the races.
ktallett · 3h ago
Do they only have reduced barriers to entry if you aren't fussed about the accuracy of the output? If you care that everything works correctly and is factually correct, don't you need the same competency as just doing the task by hand?
cornholio · 2h ago
For now, the reasoning abilities of the best and largest models are somewhat on par with those of a human crackpot with an internet connection, that misunderstands some wild fact or theory and starts to speculate dumb and ridiculous "discoveries". So the real world application to scientific thought is low, because science does not lack imbeciles.

But of course, models always improve and they never grow tired (if enough VC money is available), and even an idiot can stumble upon low hanging fruits overlooked by the brightest minds. This tireless ability to do systematic or brute-force reasoning about non-frontier subjects is bound to produce some useful results like those you mention.

The comparison with a pure financial swindle and speculative mania like NFTs is of course an exaggeration.

Xmd5a · 1h ago
I see myself in these words:

>that misunderstands some wild fact or theory and starts to speculate dumb and ridiculous "discoveries"

>even an idiot can stumble upon low hanging fruits overlooked by the brightest minds.

lazide · 3h ago
Having tried to use various tools - in those specific examples - I found them either pointless or actively harmful.

Writing emails - once I knew what I wanted to convey, the rest was so trivial as to not matter, and any LLM tooling just got in the way of actually expressing it as I ended up trying to tweak the junk it was producing.

Meeting minutes - I have yet to see one that didn’t miss something important while creating a lot of junk that no one ever read.

And while I’m sure someone somewhere has had luck with the document search/extract stuff, my experience has been that the hard part was understanding something, and then finding it in the doc or being reminded of it was easy. If someone didn’t understand something, the AI summary or search was useless because they didn’t know what they were seeing.

I’ve also seen a LOT of both junior and senior people end up in a haze because they couldn’t figure out what was going on - and the AI tooling just allowed them to produce more junk that didn’t make any sense, rather than engage their brain. Which causes more junk for everyone to get overwhelmed with.

IMO, a lot of the ‘productivity’ isn’t actually productivity; it’s just semi-coherent noise.

silon42 · 3h ago
> Writing emails - once I knew what I wanted to convey, the rest was so trivial as to not matter, and any LLM tooling just got in the way of actually expressing it as I ended up trying to tweak the junk it was producing.

+1. LLMs will help you produce the "filler" nobody wants to read anyway.

denvrede · 2h ago
+1 for all of the above.

> Meeting minutes - I have yet to see one that didn’t miss something important while creating a lot of junk that no one ever read.

Especially that one. For very structured meetings with a small number of participants it seemed OK at first, but once meetings got more crowded, included non-native speakers, and ran longer than 30 minutes (like workshops), it went bad.

apwell23 · 2h ago
> write my meeting minutes

Why is this such a posterchild for LLMs? Everyone always leads with this.

How boring are these meetings, and do people actually review these notes? I never ever saw anyone reading meeting minutes or even mentioning them.

Why is this use case even mentioned in LLM ads?

batty_alex · 1h ago
I think the same thing every time. I've never had anyone read my meeting notes, and they're better off in some sort of work-order system anyway.

All I'm hearing is an appeal to making the workplace more isolating. Don't talk to each other, just talk to the machine that might summarize it wrong

SiempreViernes · 59m ago
Indeed, it seems doubtful that an org with meetings so structureless that it struggles to write minutes is capable of having meetings for which minutes serve any purpose beyond covering ass.
bgnn · 3h ago
Exactly this. What we expect from them is our speculation. In reality nobody knows the future and there's no way to know the future.
jstummbillig · 2h ago
I think people are mostly bad at value judgements, and AI is no exception.

What they naively wished the future was like: Flying cars. What they actually got (and is way more useful but a lot less flashy): Cheap solar energy.

aleph_minus_one · 1h ago
> What they naively wished the future was like: Flying cars.

This future is already there:

We have flying cars: they are called "helicopters" (see also https://xkcd.com/1623/).

AstralStorm · 1h ago
Oh, they're not even close to cars in availability, they're much harder to operate, much more expensive, and they tend to fall out of the sky.

Thank you for providing an example that directly maps to the usefulness of ANNs in most research, though.

Voloskaya · 3h ago
> to anyone that understands the tech it can’t.

This is a ridiculous take that makes me think you might not « understand the tech » as much as you think you do.

Is AI useful today? That depends on the exact use case, but overall it seems pretty clear the hype is currently greater than the usefulness. But sometimes I feel like everyone forgets that ChatGPT isn't even 3 years old; 6 years ago we were stuck with GPT-2, whose most impressive feat was writing a nonsense poem about a unicorn, and AlphaGo is not even 10 years old.

If you can’t see the trend and just think that what we have today is the best we will ever achieve, thus the tech can’t do anything useful, you are getting blinded by contrarianism.

helloplanets · 19m ago
I'd be interested in reading some more from the people you're referring to when talking about experts who understand the field. At least to the extent I've followed the discussion, even the top experts are all over the place when it comes to the future of AI.

As a counterpoint: Geoffrey Hinton. You could say he's gone off the deep end on a tangent, but I definitely don't think his incentive is to make money off of hype. Then there's Yann LeCun saying AI "could actually save humanity from extinction". [0]

Are these guys just washed-out talking heads at this point? And who are the "new guard" people should read up on?

[0]: https://www.theguardian.com/technology/2024/dec/27/godfather...

guardian5x · 3h ago
AI looks exactly like NFTs to you? I don't understand what you mean by that. AI already has tons more uses.
waldrews · 3h ago
One is a technical advance as important as anything in human history, realizing a dream most informed thinkers thought would remain science fiction long past our lifetimes, upending our understanding of intelligence, computation, language, knowledge, evolution, prediction, psychology... before we even mention practical applications.

The other is worse than nothing.

littlestymaar · 2h ago
> It’s why it keeps looking exactly like NFT’s and crypto hype cycles to me: Yes the technology has legitimate uses

AI has legitimate uses; cryptocurrency only has "regulation evasion", and NFTs have literally no use at all, though.

But it's very true that the AI ecosystem is crowded with grifters who feed on baseless hype, and many of them actually came from cryptocurrencies.

montebicyclelo · 2h ago
> then go on to explain how they aren't useful in any field whatsoever

> Where are the AI-driven breakthroughs

> are we just using AI to remix existing general knowledge, while making no progress of any sort in any field using it?

The obvious example of a highly significant AI-driven breakthrough is Alphafold [1]. It has already had a large impact on biotech, helping with drug discovery, computational biology, protein engineering...

[1] https://blog.google/technology/ai/google-deepmind-isomorphic...

swyx · 3h ago
> Where are the AI-driven breakthroughs? Or even the AI-driven incremental improvements?

literally last week

https://deepmind.google/discover/blog/alphaevolve-a-gemini-p...

dwroberts · 3h ago
But it only seems to be labs and companies with a vested interest in selling it as a product that are able to achieve these breakthroughs. Which is a little suspect, right?
swyx · 2h ago
Too tinfoil-hat. Google is perfectly happy to spend billions dogfooding its own TPUs and not give the leading edge to the public.
dwroberts · 1h ago
I’m not saying they’re phoney - just that we need to take this stuff with a big pinch of salt.

The Microsoft paper around the quantum “breakthrough” is in a different field, but it's maybe a good example of why we need to be a little more cautious about research-as-marketing.

Wilsoniumite · 2h ago
Some new-ish maths has been discovered. It's up to you whether this is valid or impressive enough, but I think it's significant for things to come: https://youtu.be/sGCmu7YKgPA?si=EG9i0xGHhDu1Tb0O
snodnipper · 1h ago
Personally, I have been very pleased with the results despite the limitations.

Like many here (I suspect), I have had several users comment that the AI processes I have defined have made a meaningful impact on their daily lives, often saving them double-digit hours of effort per week. Progress.

eschaton · 4h ago
If they didn’t say that, the rah-rah-AI crowd would come for them with torches and pitchforks. It’s a ward against that, nothing more.
Sharlin · 2h ago
Similar to the way many Trump supporters, when daring to criticize him, feel the need to assert that they still love him and would vote for him again.

(See, eg. r/LeopardsAteMyFace for examples. It’s fascinating.)

NotCamelCase · 51m ago
Or any time one dares to criticize Israel for their recent contributions to peace on Earth (wink wink) -- it has to be prefaced with "Let me say that I'm the biggest defender of the Jews and fight against anti-Semitism".

It's moot.

isaacfrond · 2h ago
The article itself lists protein folding, weather forecasting, and drug discovery as successful, even breakthrough, applications of AI.
PurpleRamen · 21m ago
> Where are the AI-driven breakthroughs?

Define breakthrough. When is the improvement big enough to count as one?

Define AI. Are you talking about modern LLM, or is old school ML also in that question?

I mean, Google's AI company made quite an impact with AlphaFold and other projects.

> Or are we just using AI to remix existing general knowledge

Is remixing bad? Isn't much of science today "just" remixing with slight improvements? I mean, there is a reason why we have theoretical and practical scientists. Doing boring lab work and accidentally discovering something exciting is not the only way science happens. Analysing data and remixing information, building new theories, is also important.

And don't forget, we don't have AGI yet. Whatever AI is doing today is limited by what humans are using it for. Another question is whether LLMs are already so normalized that we no longer see them as very special when they're used somewhere. So we might not even notice when AI has a significant impact on a breakthrough.

nerdponx · 55m ago
New numerical computing algorithms are being developed with AI assistance, which probably would not have been discovered otherwise. There was an article here a few days ago about one of those. It's incremental but it's not nothing.
perlgeek · 3h ago
> Where are the AI-driven breakthroughs?

The only thing that seems to live up to the hype is AlphaFold, which predicts protein folding based on amino acid sequences, and of which people say that it actually makes their work significantly easier.

But, disclaimer, this is only from second-hand knowledge, I'm not working in the field.

rafaelmn · 3h ago
This is another dimension of the problem: what even counts as AI? AlphaFold is a very specialized model, and I feel the AI boom is driven by the hypothesis that general models eventually outperform specialized ones given enough size/data/whatever.
_0ffh · 2h ago
While I hate the apparent renaming of everything ML to "AI", things like AlphaFold would be "narrow AI".

As to the common idea of having to wait for general AI (AGI) to bring the gains, I have been quite sure since the start of the recent AI hype cycle that narrow AI will have silently transformed much of the world before AGI even hits the town.

rsynnott · 2h ago
They do mention that it has been somewhat useful in protein folding.

> Or are we just using AI to remix existing general knowledge, while making no progress of any sort in any field using it?

AIUI they are generally not talking about LLMs here.

fabian2k · 3h ago
I suspect that people saying this are avoiding making broad conclusions based only on the AI tools that exist right now, at this moment. So they leave a lot of room for the next versions to improve.

Maybe too much room, but it's hard to predict if AI tools will overcome their limitations in the near future.

phpnode · 1h ago
It's the new "I love my Tesla but here are 15 reasons why it's broken": if you don't provide platitudes, the mob provides pitchforks.
moffkalast · 4h ago
> AlphaEvolve’s procedure found an algorithm to multiply 4x4 complex-valued matrices using 48 scalar multiplications, improving upon Strassen’s 1969 algorithm that was previously known as the best in this setting. This finding demonstrates a significant advance over our previous work, AlphaTensor, which specialized in matrix multiplication algorithms, and for 4x4 matrices, only found improvements for binary arithmetic.

> To investigate AlphaEvolve’s breadth, we applied the system to over 50 open problems in mathematical analysis, geometry, combinatorics and number theory. The system’s flexibility enabled us to set up most experiments in a matter of hours. In roughly 75% of cases, it rediscovered state-of-the-art solutions, to the best of our knowledge.

> And in 20% of cases, AlphaEvolve improved the previously best known solutions, making progress on the corresponding open problems. For example, it advanced the kissing number problem. This geometric challenge has fascinated mathematicians for over 300 years and concerns the maximum number of non-overlapping spheres that touch a common unit sphere. AlphaEvolve discovered a configuration of 593 outer spheres and established a new lower bound in 11 dimensions.

https://storage.googleapis.com/deepmind-media/DeepMind.com/B...

(this is an LLM-driven pipeline)
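For context on what fewer scalar multiplications buys: Strassen's classic 2x2 scheme uses 7 multiplications instead of the naive 8, and applied recursively to 4x4 it gives 7x7 = 49, the count AlphaEvolve's 48 improves on. A quick illustration of the 2x2 scheme:

```python
import numpy as np

# Strassen's 7-multiplication scheme for 2x2 matrices (vs. 8 naively).
def strassen_2x2(A, B):
    a, b, c, d = A[0, 0], A[0, 1], A[1, 0], A[1, 1]
    e, f, g, h = B[0, 0], B[0, 1], B[1, 0], B[1, 1]
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    return np.array([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4, m1 - m2 + m3 + m6]])

A, B = np.random.rand(2, 2), np.random.rand(2, 2)
assert np.allclose(strassen_2x2(A, B), A @ B)  # matches the naive product
```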

Ygg2 · 3h ago
That's less LLM and more three projects by the DeepMind team.

And it's far from commercial availability.

moffkalast · 3h ago
Well, to me personally it at least proves something that's long been touted as impossible: that the current architecture can in fact do better than all humans at novel tasks, even if it needs a crutch at the moment.

An LLM-based system now holds the SOTA approach on several math problems; how crazy is that? I wasn't convinced before, but now I guess it won't be many decades before we view making new scientific advances as about as viable as winning against Stockfish.

Ygg2 · 1h ago
Yeah, but those rely on setting up the evolve function. And they aren't guaranteed to be better than humans: they might find you an improvement, but they might not, as shown here[1] (green means better, red means worse, gray means same).

[1] https://youtu.be/sGCmu7YKgPA?t=480

shusaku · 2h ago
> Or even the AI-driven incremental improvements?

You have no idea what you are talking about. Every day there is plenty of research published that used AI to help achieve scientific goals.

LLMs are another matter, though, and we're probably a ways off from reaping benefits beyond day-to-day programming/writing productivity.

apples_oranges · 3h ago
"AI is a competent specialist in all fields except in mine."
yfontana · 3h ago
From the article:

> Besides protein folding, the canonical example of a scientific breakthrough from AI, a few examples of scientific progress from AI include:

> Weather forecasting, where AI forecasts have had up to 20% higher accuracy (though still lower resolution) compared to traditional physics-based forecasts.

> Drug discovery, where preliminary data suggests that AI-discovered drugs have been more successful in Phase I (but not Phase II) clinical trials. If the trend holds, this would imply a nearly twofold increase in end-to-end drug approval rates.
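To see why a Phase I improvement alone could imply a near-twofold end-to-end change (illustrative arithmetic only; these per-phase rates are made up, not from the article):

```python
# End-to-end approval is roughly the product of per-phase success rates,
# so doubling Phase I (with later phases unchanged) doubles the product.
baseline = {"phase1": 0.45, "phase2": 0.30, "phase3": 0.55, "approval": 0.90}
ai_drugs = {**baseline, "phase1": 0.90}  # only Phase I improves

def end_to_end(rates):
    p = 1.0
    for r in rates.values():
        p *= r
    return p

print(f"baseline: {end_to_end(baseline):.3f}")  # ~0.067
print(f"AI-found: {end_to_end(ai_drugs):.3f}")  # ~0.134, about 2x
```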

nicoco · 5h ago
I am not an AI booster at all, but the fact that negative results are not published and that everyone oversells their stuff in research papers is unfortunately not limited to AI. This is just a consequence of the way scientists are evaluated and of the scientific publishing industry, which basically suffers from the same shit as traditional media does (craving for audience).

Anyway, winter is coming, innit?

moravak1984 · 4h ago
Sure, it's not. But often in AI papers one sees remarks that actually mean: "...and if you throw in one zillion GPUs and make them run until the end of time, you get {magic_benchmark}". Or "if you evaluate this very smart algo on our super-secret, real-life dataset, which we claim is available on request but would ghost you for daring to ask about, then you will see this chart that shows how smart we are".

Sure, it is often flag-planting, but when these papers come from big corps, you cannot "just ignore them and keep on" even when there are obvious flaws/issues.

It's a race over resources. As a (former) researcher at a low-budget university, we just cannot compete. We are coerced into believing whatever figure is passed on in the literature as a "benchmark", without any possibility of replication.

nicoco · 4h ago
I agree with that. Classically used "AI benchmarks" need to be questioned. In my field, these guys have dropped a bomb, and no one seems to care: https://hal.science/hal-04715638/document
baxtr · 4h ago
Can you give a brief summary of why this paper is a breakthrough, for an outsider to the field?
mzl · 4h ago
Having a quick look (I hadn't seen the paper before), this seems to be a very good analysis of how results are reported, specifically for medical imaging benchmarks.

As is often the case with statistics, selecting just a single number to report (whatever that number is) will hide a lot of different behaviours. Here, they show that the mean alone is a bad way to report results, because the confidence intervals (reconstructed by the methods in the paper in most cases) show that the models can't really be distinguished by their means.
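A minimal sketch of the kind of reporting the paper argues for (my own illustration; the per-case scores are synthetic placeholders):

```python
import numpy as np

# Bootstrap a 95% confidence interval for a model's mean benchmark score.
# If two models' intervals overlap heavily, reporting means alone suggests
# a ranking the data doesn't actually support.
rng = np.random.default_rng(0)
scores = rng.normal(0.82, 0.05, size=50)  # placeholder per-case Dice scores

boot_means = [rng.choice(scores, size=scores.size, replace=True).mean()
              for _ in range(10_000)]
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"mean={scores.mean():.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```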

amarcheschi · 2h ago
Hell, I was asked to use confidence intervals as well as average values for my bachelor's thesis when doing ML benchmarks, and scientists publishing results in medical fields aren't doing it...

How can something like that happen? I mean, I had a supervisor tell me "add the confidence interval to the results as well" and explain why. I guess nobody ever told them? Or they didn't care? Or it's just an honest mistake.

nicoco · 34m ago
I don't think it qualifies as a breakthrough. In short:

1. Segmentation is a very classical task in medical image processing.
2. Every day there are papers claiming that they beat the state of the art.
3. This paper says that most of the time, the state of the art has not been beaten, because the improvements actually fall within the margin of error.

KurSix · 2h ago
AI just happens to be the current hype magnet, so the cracks show more clearly
croes · 5h ago
But AI makes it easier to write convincing-looking papers.
Flamentono2 · 3h ago
I'm not sure why people on HN (of all places) are so divided regarding the perception of AI/ML.

I have not seen anything like it before. We literally had no system or way of doing things like code generation based on text input.

Just last week I asked for a script to do image segmentation with a basic UI and claude just generated that for me in under 1 minute.
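The script was something along these lines (a reconstruction for illustration, not the actual output; the file name and the slider-threshold approach are my assumptions):

```python
import cv2

# Threshold-based segmentation with a slider UI; "input.png" is a placeholder.
img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)
assert img is not None, "put an image at input.png"

def on_change(t):
    # Re-segment and redraw whenever the slider moves.
    _, mask = cv2.threshold(img, t, 255, cv2.THRESH_BINARY)
    cv2.imshow("segmentation", mask)

cv2.namedWindow("segmentation")
cv2.createTrackbar("threshold", "segmentation", 128, 255, on_change)
on_change(128)
cv2.waitKey(0)
cv2.destroyAllWindows()
```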

I could list tons of examples which are groundbreaking. The whole image generation stack is completely new.

That blog article is fair enough; there is hype around this topic for sure. But just for researchers who need to write code for their research, AI can already make them a lot more efficient.

But I do believe that we have entered a new era: an era where we take data very seriously again. A few years back, you said 'the internet doesn't forget'; then we realized that yes, the internet starts to forget. Google deleted pages and removed the cache feature, and it felt like we stopped caring about data because we didn't know what to do with it.

Then AI came along. And not only is data king again, but we are now in the midst of the reinforcement era: we now give feedback, and the systems incorporate that feedback into their training/learning.

And every single aspect of the AI/ML topic is being worked on: hardware, algorithms, use cases, data, tools, protocols, etc. We are in the middle of incorporating it and building for and on it. This takes a little bit of time. Still, the progress is exhausting to keep up with.

We will only see in a few years whether there is a real ceiling. We do need more GPUs and bigger datacenters to do a lot more experiments on AI architectures and algorithms; we have a clear bottleneck there. Big companies train one big model for weeks and months.

whyowhy3484939 · 50m ago
> Just last week I asked for a script to do image segmentation with a basic UI and claude just generated that for me in under 1 minute.

Thing is, we just see that it's copy-pasting Stack Overflow, but now in a fancy way, so this sounds like "I asked Google for a nearby restaurant and it found it in like 500ms; my C64 couldn't do that". It sounds impressive (and it is) because it sounds like "it learned about navigating the real world and can now solve everything related to that", but what it actually solved is "fancy lookup in a GIS database". It's useful, damn sure it is, but once the novelty wears off you start seeing it for what it is instead of what you imagine it is.

Edit: to drive the point home.

> claude just generated that

What you think happened is that the AI was "thinking", building an ontology over which it reasoned, and coming to the logical conclusion that this script was the right output. What actually happened is that your input correlates with this output according to the trillion examples it saw. There is no ontology. There is no reasoning. There is nothing. Of course this is still impressive and useful as hell, but the novelty will wear off in time. The limitations are obvious by this point.

callc · 2h ago
> “I'm not sure why people on HN (of all places) are so divided regarding the perception of AI/ML.”

Everyone is a rational actor from their individual perspective. The people hyping AI, and the people dismissing the hype both have good reasons.

There is justification to see this new tech as groundbreaking. There is justification to be wary of the massive theft of data and the dismissiveness of privacy.

First, acknowledge and respect that there are so many opinions on any issue. Take yourself out of the equation for a minute. Understand the other side. Really understand it.

Take a long walk in other people’s shoes.

KurSix · 2h ago
But on the flip side, the "AI will revolutionize science" narrative feels way ahead of what the evidence supports
Barrin92 · 1h ago
> But just for researchers who need to write code for their research, AI can already make them a lot more efficient.

Scientists don't need to be efficient, they need to be correct. Software bugs were already a huge cause of scientific error and of the lack of reproducibility; see for example cases like this (https://www.vice.com/en/article/a-code-glitch-may-have-cause...)

Programming in research environments is done with notoriously questionable variation in quality, as is the case in industry to be fair, but in research minor errors can ruin the results of entire studies. People are fed up and come to much harsher judgements on AI because, in an environment like a lab, you cannot write software with the attitude of an impressionist painter or the AI equivalent; you need to actually know what you're typing.

AI can make you more efficient if you don't care whether you're right, which is maybe cool if you're generating images for your summer beach volleyball event, but it's a disastrous idea if you're writing code in a scientific environment.

raesene9 · 4h ago
Interesting article. There is always a risk that a new hot technique will get more attention than it ultimately warrants.

For me the key quote in the article is:

"Most scientists aren’t trying to mislead anyone, but because they face strong incentives to present favorable results, there’s still a risk that you’ll be misled."

Understanding people's incentives is often very useful when you're looking at what they're saying.

ktallett · 3h ago
There are those who have realised they can make a lot of cash from it, and can also get funding by using the term AI. But at the end of the day, what software doesn't have some machine learning built in? It's nothing new, nor are the current implementations particularly extraordinary or accurate.
rhubarbtree · 5h ago
I think this is mostly just a repeat of the problems of academia: no longer truth-seeking, instead focused on citations and careerism. AI is just another topic where that is happening.
geremiiah · 4h ago
I don't want to generalize, because I do not know how widespread this pattern is, but my job has me hopping between a few HPC centers around Germany, and a pattern I notice is that a lot of these places are chock-full of reject physicists, a lot of the AI funding that gets distributed gets gobbled up by these people, and the consequence is a lot of these ML4Science projects. I personally think it is a bit of a shame, because HPC centers are not there to serve only physicists, and especially with AI funding we in Germany should be doing more AI-core research.
ktallett · 3h ago
HPC centers are usually in collaboration with universities for specific science research. Using up their resources to hop on the bandwagon damages other research, for the sake of an industry (AI) which is neither new nor anywhere close to being anything more than a personal assistant at the moment. Not even a great one at that.
shusaku · 2h ago
> a pattern I notice is that a lot of these places are chock-full of reject physicists

Utter nonsense, these are some of the smartest people in the world who do incredibly valuable science.

barrenko · 4h ago
Seriously don't understand what "no longer" does here.
omneity · 4h ago
The article initially appears to suggest that all AI in science (or at least in the author's field) is hype. But the author's gripe seems specific to one overhyped architecture, the PINN (physics-informed neural network), since they mention at the end that they ended up using other DL models to successfully solve PDEs faster than traditional numerical methods.
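(For context: a PINN bakes the governing equation into the loss, penalizing the network wherever it violates the physics. A minimal sketch, assuming the 1D heat equation u_t = u_xx as the example PDE and synthetic placeholder data:)

```python
import torch
import torch.nn as nn

# Minimal PINN sketch for u_t = u_xx. Input is (x, t); output is u(x, t).
net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(),
                    nn.Linear(64, 64), nn.Tanh(),
                    nn.Linear(64, 1))

def pde_residual(xt):
    xt = xt.clone().requires_grad_(True)
    u = net(xt)
    du = torch.autograd.grad(u, xt, torch.ones_like(u), create_graph=True)[0]
    u_x, u_t = du[:, 0:1], du[:, 1:2]
    u_xx = torch.autograd.grad(u_x, xt, torch.ones_like(u_x), create_graph=True)[0][:, 0:1]
    return u_t - u_xx  # zero wherever the net satisfies the PDE

data_xt = torch.rand(128, 2)                    # labeled points (placeholder)
data_u = torch.sin(torch.pi * data_xt[:, 0:1])  # fake measurements
colloc_xt = torch.rand(1024, 2)                 # unlabeled collocation points

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(1000):
    opt.zero_grad()
    loss = ((net(data_xt) - data_u) ** 2).mean() \
        + (pde_residual(colloc_xt) ** 2).mean()  # data loss + physics loss
    loss.backward()
    opt.step()
```
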
geremiiah · 3h ago
It's more widespread than PINNs. PINNs have been widely known to be rubbish for a long time. But the general failure of using ML for physics problems is much more widespread.

Where ML generally shines is when you have a relatively large amount of experimental data in a fairly narrow domain. This is the case for machine-learned interatomic potentials (MLIPs), which have been a thing since the '90s. It is also potentially the case for weather modelling (but I do not want to comment on that). Or when you have absolutely insane amounts of data and you train a really huge model; this is what we refer to as AI. This is basically why AlphaFold is successful, and AlphaFold still fails to produce good results when you query it on inputs that are far from any data points in its training data.

But most ML-for-physics problems tend to be somewhere in between: lacking experimental data, working with too little simulation data because it is so expensive to produce, and training models that are not large enough (because inference would be too slow anyway if they were bigger), while expecting these models to learn a very wide range of physics.

And then everyone jumps in on the hype train, because it is so easy to give it a shot. And everyone gets the same dud results. But then they publish anyway. And if the lab/PI is famous enough, or if they formulate the problem in a way that is unique and looks sciency or mathy, they might even get their paper into a good journal/conference and get lots of citations. But in the end, they still only end up with the same results as everyone else: the model replicates the training data to some extent, and somebody else should work on the generalizability problem.

hyttioaoa · 4h ago
He published a whole paper providing a systematic analysis of a wide range of models; there's a whole section on that. So it's not specific to PINNs.
nottorp · 4h ago
Replace PINN with any "AI" solution for anything and you'll still find it overhyped.

The only realistic evaluations of "AI" so far are those that admit it's only useful for letting experts skip some boring work, and then triple-check the output after.

eviks · 2h ago
> I found that AI methods performed much worse than advertised.

Lesson learned: don't trust ads

> Most scientists aren’t trying to mislead anyone

More learning ahead, the exciting part of being a scientist!

therebase · 2h ago
It is not only about the results we create with these tools, but also about the effect they have on us in the process.

This is just about tech engineering, but I do think it transfers to science as well.

https://dev.to/sebs/the-quiet-crisis-how-is-ai-eroding-our-t...

pawanjswal · 4h ago
Appreciate the honesty. AI isn’t magic, and it’s refreshing to see someone actually say it out loud.
wrren · 1h ago
AI companies are hugely motivated to show beyond-human levels of intelligence in their models, even if it means flubbing the numbers. If they manage to capture the news cycle for a bit, it's a boost to confidence in their products and maybe their share price if they're public. The articles showing that these advances are largely junk aren't backed by corporate marketing budgets or the desires of the investor class like the original announcements were.
sublimefire · 2h ago
Great analysis and spot-on examples. Another issue with AI-related research is that a lot of papers are new and not that many get published in "proper" places, yet they get quoted right, left, and center; just look at Google Scholar. It is hard to reproduce the results and check the validity of some statements, not to mention that research done 4 years ago used one set of models, and now another set of models with different training data is used in tests. It is hard to establish what really affects the results, and whether the conclusions stem from some specific property of an outdated model or generalise at all.
shalmanese · 3h ago
This is less an article about AI and more about one of the less talked-about functions of a PhD program: becoming literate at "reading" academic claims beyond their face value.

None of the claims made in the article are surprising, because they're the natural outgrowth of the hodgepodge of incentives that has accreted into what we call "science" over time. You just need practice to be able to place the output of science in its proper context, and to understand that a "paper" is an artifact of a sociotechnical system, with all the complexity that entails.

tonii141 · 3h ago
This article addresses the misconception that arises when someone lacks a clear understanding of the underlying mathematics of neural networks and mistakenly believes they are a magical solution capable of solving every problem. While neural networks are powerful tools, using them effectively requires knowledge and experience to determine when they are appropriate and when alternative approaches are better suited.
sgt101 · 55m ago
I think that while the mathematics of neural networks is clearly completely understood, we do not really understand why neural networks behave the way they do when combined with large amounts of real-world data.

In particular, the ability of auto-regressive transformer-based networks to produce sequences of speech while being immutable still shocks me whenever I think about it. Of course, this says as much about what we think of ourselves and other humans as it does about the matrices. I also think the weather-forecasting networks are quite shocking; the compression they have achieved in modeling the physical system that produces weather is frankly... wrong... but it obviously does actually work.

constantcrying · 3h ago
This does not apply to PINNs, though. They were used and investigated by people deeply knowledgeable about numerics and neural networks; they just totally failed to live up to expectation.
tonii141 · 3h ago
"they just totally failed to live up to expectation"

Because the expectation was too high. If you are aiming for precision, neural networks might not be the best solution for you. That is why generative AI works so well: it doesn't need to be extremely precise. On the other hand, you don't see people use neural networks in system control for critical processes.

sgt101 · 30m ago
Apart from self driving and autonomous flying...
tonii141 · 16m ago
AI is used for scene understanding in those applications, but there is no neural network steering the wheel.
intended · 4h ago
Verification is at the heart of economic and intellectual activity.
reify · 4h ago
Extremely diplomatic:

Most scientists aren’t trying to mislead anyone, but because they face strong incentives to present favorable results, there’s still a risk that you’ll be misled.

In other words, scientists are trying to mislead everyone, because there are a lot of incentives: money and professional status, to name just two.

A common problem across all disciplines of science.

thrdbndndn · 4h ago
I understand lots of people would (rightfully) say "no shit", but I think it's good to actually describe how it is in detail. So kudos to the author.
KurSix · 2h ago
The comparison to the replication crisis is spot on
spwa4 · 3h ago
TL;DR: AI is like any new method in software engineering. It is not a general solution, and by itself it is not that useful, only as an addition. Unless an expert human takes a LOT of time to fine-tune the method (i.e. automatically selecting what works well in each case, using the best method in almost all cases), it only performs well in a very small subset of cases.
kumarvvr · 4h ago
Are complex math problems just solvable by LLMs, as a stream of language tokens?

I mean, there ought to be an element of abstract thought, abstract reasoning, abstract inter-linking of concepts, etc, to enable mathematicians to solve complex math theorems and problems.

What am I missing?

geremiiah · 4h ago
LLMs are not involved anywhere. You start with some data, either simulation data or experimental data. Then you train a model to learn either a time-evolution operator or a force field. Then you apply it to more input data and visualize the results.

One typical use case is that the simulation data takes months to generate, so for experimental use cases it is very slow. The idea was to train a model that can learn the underlying physics, small enough that inference won't be prohibitively expensive, so you can then use the ML model in lieu of the classical physics-based model.

Where this usually fails is that while ML models can be trained well enough to replicate the training data, they typically fail to generalize well outside of the domain and regime of the training data. So unless your experimental problems are entirely within the same domains and regimes as the training data, your model is of not much use.

So claims of generalizability and applicability are always dubious.

Lots of publications on this topic follow the same pattern: conceive of a new architecture or formalism, train an ML model on widely available data, results show that it can reproduce the training data to some extent, mention generalizability in the discussion but never test it.
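Schematically, the surrogate setup looks like this (a toy illustration; the "expensive solver" is a trivial stand-in and all sizes are made up):

```python
import torch
import torch.nn as nn

# Learn a time-evolution operator f(u_t) ~ u_{t+1} from solver trajectories,
# then roll the cheap surrogate out in place of the expensive solver.
def expensive_solver_step(u):
    return 0.95 * u + 0.05 * torch.tanh(u)  # stand-in for months of simulation

states = [torch.randn(64, 32)]  # 64 trajectories, 32-dim state
for _ in range(20):
    states.append(expensive_solver_step(states[-1]))
x = torch.cat(states[:-1])      # u_t
y = torch.cat(states[1:])       # u_{t+1}

surrogate = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 32))
opt = torch.optim.Adam(surrogate.parameters(), lr=1e-3)
for _ in range(500):
    opt.zero_grad()
    loss = ((surrogate(x) - y) ** 2).mean()
    loss.backward()
    opt.step()

# Rollout: errors compound and inputs drift away from the training
# distribution -- exactly the generalization failure described above.
u = torch.randn(1, 32)
for _ in range(100):
    u = surrogate(u)
```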

yapyap · 4h ago
Glad to see some people who bought into the nonsense are waking up
BenFranklin100 · 5h ago
The author is a Princeton PhD grad working in physics. Funding for this type of work usually comes from the NSF. NSF is under attack by DOGE, and Trump has proposed slashing the NSF budget by 55%.

A reason used to justify these massive cuts is that AI will soon replace traditional research. This post demonstrates this assumption is likely false.

thrdbndndn · 4h ago
Kinda off-topic, but about the cutting itself:

I used to work in academia and was involved in NSF-funded programs, so I have mixed feelings about this. Waste and inefficiency are rampant. BTW, I'm not talking about failed projects or research deemed "not important", but things like buying million-dollar equipment just to use up the budget, which then sits idle.

That said, slashing NSF funding by 50% won’t fix any of that. It’ll just shrink both the waste and the genuinely valuable research proportionally. So it's indeed a serious blow to real scientific progress.

I don't really have a point, just wanted to put it here. Also, to be fair, this kind of inefficiency isn't unique to academia; it's common anywhere public money is involved.

nottorp · 4h ago
> but things like buying million-dollar equipment just to use up the budget, which then sits idle.

That's not specific to the US, and it's because of a perverse incentive coming from those who assign the funds.

If you don't use them they cut your funding for the next cycle.

thrdbndndn · 2h ago
This is a common saying, and I’m sure there’s some truth to it. But IMHO, the main reason is that PIs just want to use the money.

Because… why not? It's free money. Having the newest shiny equipment, at the very least, boosts your group’s reputation. Not to mention that straight-up corruption (pocketing funds for personal gain) is not unheard of.

dgb23 · 1h ago
Even considering these inefficiencies, which are certainly to be taken seriously, there aren't a lot of things that have such a high ROI as research and education.
Upvoter33 · 59m ago
Yeah companies never waste money.

I’m tired of all the complaining about waste and overhead in academics. Companies waste money all the time…

surfingdino · 5h ago
When politicians get involved in research, scientific proof and reason don't always win.
adastra22 · 5h ago
Politicians have been involved in research for over a century.
toolslive · 5h ago
Didn't a politician invent the internet?
eschaton · 3h ago
That’s a meme created and spread by pseudo-journalist Declan McCullagh specifically to tar Al Gore in the lead-up to the 2000 election.

Specifically, Gore said in an interview that he “took the initiative in creating the Internet” by introducing the bill to allow commercial traffic on ARPAnet, which McCullagh twisted in an article into “Al Gore claimed he invented the Internet” in order to smear him.

sundarurfriend · 2h ago
Given how close that election turned out to be, this smear campaign likely changed the presidency, and given George WMD Bush's actions, changed the course of the world for the worse in many ways. (For those who were too young or not yet born at the time: these jokes were MASSIVE, to the extent that they became largely what Al Gore was known for, for years after. So it's not much of an exaggeration to say they had a material impact on his perception and hence on the votes.)

Al Gore understood technology and the internet, and was a champion for the environment; it's unbelievable today that he came that close to the presidency (and then lost). When people say "we live in the bad timeline", one of the closest good timelines is probably one where that election went differently.

TypingOutBugs · 4h ago
Who?
sundarurfriend · 2h ago
Al Gore played a big role in getting political (and hence economic) support for the expansion of the Internet.

https://en.wikipedia.org/wiki/Al_Gore_and_information_techno... :

> Al Gore, a strong and knowledgeable proponent of the Internet, promoted legislation that resulted in President George H.W Bush signing the High Performance Computing and Communication Act of 1991. This Act allocated $600 million

> In the early 1990s the Internet was big news ... In the fall of 1990, there were just 313,000 computers on the Internet; by 1996, there were close to 10 million. The networking idea became politicized during the 1992 Clinton–Gore election campaign, where the rhetoric of the information highway captured the public imagination.

Your parent comment is either joining in on the ridicule or at least misquoting:

> Gore became the subject of controversy and ridicule when his statement, "I took the initiative in creating the Internet", was widely quoted out of context. It was often misquoted by comedians and figures in American popular media who framed this statement as a claim that Gore believed he had personally invented the Internet.[54] Gore's actual words were widely reaffirmed by notable Internet pioneers, such as Vint Cerf and Bob Kahn, who stated, "No one in public life has been more intellectually engaged in helping to create the climate for a thriving Internet than the Vice President."

nathias · 4h ago
People will continue to publish cope articles about how AI is useless long after superintelligence has been reached.
vaylian · 3h ago
citation needed
rajnathani · 1h ago
Just 2 days ago, there was an HN post about an AI-aided discovery of a fast matrix multiplication algorithm ("X X^t can be faster" | 198 points, 61 comments): https://news.ycombinator.com/item?id=44006824