The new 2.5 Pro (05-06) definitely does not have any sort of meaningful 1 million context window, as many users have pointed out. It does not even remember to generate its reasoning block at 50k+ tokens.
Their new pro model seems to have traded off fluid intelligence and creativity for performance on close-ended coding tasks (and hence benchmarks), which unfortunately seems to be a general pattern in LLM development now.
russfink · 4h ago
Why don’t companies publish hashes of emitted answers so that we, eg teachers, could verify if the AI produced this result?
perdomon · 3h ago
Hashes of every answer to every question and every variation of that question? If that were possible, you’d still need to account for the extreme likelihood of the LLM providing a differently worded answer (it virtually always will). This isn’t how LLMs or hashing algorithms work. I think the answer is that teachers need to adjust to the changing technological landscape. It’s long overdue, and LLMs have almost ruined homework.
fuddy · 1h ago
Hashing every answer you ever emit is exactly the kind of thing hashing algorithms are good at. The trouble is that the user can trivially produce an equally good variant with virtually any change (there's an unlimited number of possible ones), and that variant has never been hashed.
evilduck · 3h ago
Local models are possible and nothing in that area of development will ever publish a hash of their output. The huge frontier models are not reasonably self-hosted but for normal K-12 tasking a model that runs on a decent gaming computer is sufficient to make a teacher's job harder. Hell, a small model running on a newer phone from the last couple of years could provide pretty decent essay help.
haiku2077 · 3h ago
Heck, use a hosted model for the first pass, send the output to a local model with the prompt "tweak this to make it sound like it was written by a college student instead of an AI"
haiku2077 · 3h ago
Ever heard of the meme:
"can I copy your homework?"
"yeah just change it up a bit so it doesn't look obvious you copied"
BriggyDwiggs42 · 3h ago
There’s an actual approach from years ago where you have the LLM choose slightly less likely words in a specific pattern, which can then be detected easily. They don’t want to do any of that stuff because cheating students are their users.
subscribed · 2h ago
This is exactly how users of English as a second language end up accused of cheating -- we didn't grow up with the living language, but learned it from movies, classic books, and (the luckiest of us) in school.
We use rare or uncommon words because of how we learned and were taught. Weaponising it against us is not just a prejudice, it's idiocy.
You're proposing a metric that measures how far someone deviates from the bog standard, and that will also discriminate against smart, homegrown erudites.
This approach is utterly flawed.
BriggyDwiggs42 · 1h ago
I’m referencing a paper I saw in passing multiple years ago, so forgive me for not recalling the exact algorithm. The LLM varies its word selection in a patterned way, e.g. most likely word, 2nd most likely, 1st, 2nd, and so on. It’s statistically impossible for an ESL speaker to happen to do this by accident.
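A toy sketch of the idea (not the paper's actual algorithm -- the candidate lists and the alternating rank pattern here are invented for illustration; real schemes derive the pattern cryptographically from preceding tokens): the generator picks the 1st, 2nd, 1st, 2nd... most likely word at each position, and the detector measures how often a text follows that pattern.

```python
# Toy watermark: pick candidate words following a fixed rank pattern (0, 1, 0, 1, ...).
# Illustrative only -- real watermarking schemes hash prior tokens to derive the pattern.

def generate(candidates):
    """candidates: one rank-ordered word list per position; follow the 0,1,0,1 pattern."""
    return [ranked[i % 2] for i, ranked in enumerate(candidates)]

def detect(words, candidates):
    """Fraction of positions whose word matches the expected rank pattern."""
    hits = sum(1 for i, (w, ranked) in enumerate(zip(words, candidates))
               if w == ranked[i % 2])
    return hits / len(words)

# Hypothetical rank-ordered candidates at each position
cands = [["the", "a"], ["cat", "dog"], ["sat", "slept"],
         ["on", "under"], ["the", "a"], ["mat", "rug"]]

watermarked = generate(cands)   # ['the', 'dog', 'sat', 'under', 'the', 'rug']
print(detect(watermarked, cands))   # 1.0 -- every position follows the pattern

human = ["the", "cat", "sat", "on", "the", "mat"]  # always the most likely word
print(detect(human, cands))         # 0.5 -- matches only by chance
```

A human (ESL or not) who always reaches for the most natural word lands at roughly chance, which is why the detection is statistical rather than a judgment about vocabulary.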
haiku2077 · 2h ago
I remember when my parents sent me to live with my grandparents in India for a bit, all the English language books available were older books, mostly British authors. I think the newest book I read that summer that wasn't a math book was Through the Looking Glass.
dietr1ch · 3h ago
I see the problem you face, but I don't think it's that easy. It seems you could rely on hashes being brittle and alter the questions or answers a little bit to get around the LLM homework naughty list.
staticman2 · 3h ago
It would be pretty trivial to paraphrase the output wouldn't it?
fenesiistvan · 3h ago
Change one character and the hash will not match anymore...
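A minimal illustration with Python's standard hashlib (the sentence is made up): removing a single character produces a completely unrelated SHA-256 digest, so exact-match lookup fails on any variant.

```python
import hashlib

a = "The mitochondria is the powerhouse of the cell."
b = "The mitochondria is the powerhouse of the cell"  # trailing period removed

ha = hashlib.sha256(a.encode()).hexdigest()
hb = hashlib.sha256(b.encode()).hexdigest()

print(ha)
print(hb)
print(ha == hb)  # False -- the digests share no recognizable structure
```

That avalanche behavior is by design in cryptographic hashes; anything tolerant of rewording would need fuzzy/semantic matching, not hashing.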
Atotalnoob · 3h ago
There are the issues others mentioned, but also you could independently write, word for word, the same thing an LLM says.