Scaling RNNs to Billions of Parameters with Zero Order

7 points | fchaubard | 2 comments | 5/26/2025, 5:35:03 PM | arxiv.org

Comments (2)

impossiblefork · 1d ago
Naturally the authors emphasize that this can make RNNs a competitor to big transformers, but it also means you can do things like feed part of a transformer's output back into its input at the next step, or turn transformers into RNNs in other ways. So RNNs don't have to be all about speed.

I think this has every chance of being an enabler for much more powerful architectures.

The depth of a transformer is its number of layers. The depth of a transformer with a recurrent connection from the previous token's output to the current input is the number of layers times the number of timesteps.
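
Roughly, the wiring looks like this (a minimal sketch assuming a standard PyTorch encoder stack; the feedback projection and its name are made up for illustration, not taken from the paper):

    import torch
    import torch.nn as nn

    class RecurrentTransformer(nn.Module):
        # Hypothetical sketch: an ordinary encoder stack whose output at each
        # step is projected and added back into the next step's input, giving
        # an effective depth of n_layers * n_timesteps.
        def __init__(self, d_model=256, n_layers=4, n_heads=4):
            super().__init__()
            layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)
            self.feedback = nn.Linear(d_model, d_model)  # hypothetical recurrent projection

        def forward(self, token_embs):
            # token_embs: (batch, seq_len, d_model), processed one step at a time
            state = torch.zeros_like(token_embs[:, :1, :])  # carried across steps
            outputs = []
            for t in range(token_embs.size(1)):
                x = token_embs[:, t:t + 1, :] + self.feedback(state)
                state = self.encoder(x)  # this output is fed back at the next step
                outputs.append(state)
            return torch.cat(outputs, dim=1)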

If it works as well as I imagine, it's going to make for much more powerful models.

fchaubard · 1d ago
Layman Abstract: Transformers keep around all previous tokens for each generated token, so they incur ENORMOUS GPU memory and compute costs during inference. Humans don't do that: we page information in and out of a small, fixed-size "working memory", keeping only the important parts of the past.

RNNs are more like us: they compress all previous tokens into a small, fixed-size memory. However, we can't train them with legacy backprop through time (BPTT), because it doesn't scale and suffers from exploding/vanishing gradients.
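
For a sense of the memory gap at inference time, here's a toy back-of-the-envelope comparison (numbers chosen for illustration, not taken from the paper):

    # Toy memory accounting with made-up model sizes.
    d_model, n_layers, seq_len = 4096, 32, 100_000

    # Transformer inference: the KV cache stores keys and values for every
    # past token in every layer, so memory grows linearly with context length.
    kv_cache_floats = 2 * n_layers * seq_len * d_model   # ~26 billion floats

    # RNN inference: one fixed-size hidden state per layer, independent of
    # how many tokens have been consumed so far.
    rnn_state_floats = n_layers * d_model                 # ~131 thousand floats

    print(f"KV cache: {kv_cache_floats:,} floats vs RNN state: {rnn_state_floats:,} floats")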

So we dug up a 1992 zero-order algorithm to replace BPTT, and not only does it scale amazingly well, in some cases it trains 19x faster than BPTT! So maybe, with this, RNNs can replace transformers?
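
The 1992 method is presumably Spall's SPSA; a generic zero-order update of that flavor looks like the sketch below (the general technique, not necessarily the paper's exact recipe; the toy loss and learning rate are made up):

    import numpy as np

    def spsa_step(params, loss_fn, lr=1e-3, eps=1e-2, rng=np.random.default_rng(0)):
        # SPSA-style zero-order update (Spall, 1992): estimate the gradient from
        # two loss evaluations along a random +/-1 direction, so no gradients
        # ever need to flow backward through time.
        delta = rng.choice([-1.0, 1.0], size=params.shape)
        # For +/-1 entries, dividing by delta is the same as multiplying by it.
        g_hat = (loss_fn(params + eps * delta) - loss_fn(params - eps * delta)) / (2 * eps) * delta
        return params - lr * g_hat

    # Toy usage on a quadratic; a real run would flatten an RNN's weights into
    # `params` and use the training loss over a sequence as `loss_fn`.
    theta = np.ones(10)
    for _ in range(1000):
        theta = spsa_step(theta, lambda p: float(np.sum(p ** 2)))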