Necessary tool? Async LoRA for distributed systems
• Aggregator/worker setup where the aggregator hands out small “leases” of work, sized in tokens rather than time slices (rough sketch after this list)
• Async checkpointing so progress is saved continuously without pausing training (second sketch below).
• Preemption handling — when a worker dies, whatever it managed to do still counts, and the remaining work just gets reassigned.
• Training-aware logic (steps, tokens, loss) instead of treating jobs like black-box containers.
• Out-of-the-box hooks for PyTorch/DeepSpeed so you don’t have to glue it all together yourself.

My goal is to make sketchy clusters behave more like reliable ones.
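To make the lease idea concrete, here’s a rough sketch of what the aggregator side could look like. It’s deliberately simplified, and every name in it (Lease, Aggregator, grant, complete, reap_expired) is illustrative, not the actual API:

```python
import itertools
import time
from dataclasses import dataclass, field


@dataclass
class Lease:
    lease_id: int
    shard: str           # which data shard the worker reads from
    start_token: int     # offset into the shard, in tokens
    token_budget: int    # lease size, measured in tokens, not wall-clock time
    issued_at: float = field(default_factory=time.time)


class Aggregator:
    def __init__(self, shards, token_budget=50_000, lease_timeout_s=300):
        self._ids = itertools.count()
        self._next_offset = {s: 0 for s in shards}  # next fresh token per shard
        self.outstanding = {}                       # lease_id -> Lease
        self.requeue = []                           # unfinished tails to hand out again
        self.token_budget = token_budget
        self.lease_timeout_s = lease_timeout_s

    def grant(self, shard):
        """Hand a worker its next slice: requeued tails first, else fresh tokens."""
        if self.requeue:
            lease = self.requeue.pop()   # any requeued tail, to keep the sketch simple
            lease.issued_at = time.time()
        else:
            lease = Lease(next(self._ids), shard,
                          self._next_offset[shard], self.token_budget)
            self._next_offset[shard] += self.token_budget
        self.outstanding[lease.lease_id] = lease
        return lease

    def complete(self, lease_id, tokens_done):
        """Credit whatever the worker finished; requeue the unfinished tail."""
        lease = self.outstanding.pop(lease_id)
        leftover = lease.token_budget - tokens_done
        if leftover > 0:
            self.requeue.append(Lease(next(self._ids), lease.shard,
                                      lease.start_token + tokens_done, leftover))

    def reap_expired(self):
        """Reassign leases whose workers went silent (e.g. spot preemption)."""
        now = time.time()
        for lease_id, lease in list(self.outstanding.items()):
            if now - lease.issued_at > self.lease_timeout_s:
                # tokens_done=0 is the pessimistic fallback; with async
                # checkpointing the worker's last checkpoint supplies the real count.
                self.complete(lease_id, tokens_done=0)
```

Because leases are counted in tokens, a preempted worker never forfeits the tokens it already got through: only the remainder goes back into the queue.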
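And the async-checkpoint side, per worker, could be as small as a background writer thread plus a SIGTERM handler, assuming the LoRA adapter state is cheap to clone. Again, names like AsyncCheckpointer are illustrative, not the real hook API; you’d call ckpt.save(adapter.state_dict(), tokens_done) from the training loop:

```python
import signal
import threading

import torch


class AsyncCheckpointer:
    def __init__(self, path):
        self.path = path
        self._pending = None
        self._lock = threading.Lock()
        self._wake = threading.Event()
        self._stop = False
        threading.Thread(target=self._writer, daemon=True).start()

    def save(self, adapter_state, tokens_done):
        """Called from the training loop; returns immediately."""
        # Clone to CPU on the caller's thread so later optimizer steps
        # can't mutate the tensors while the writer is serializing them.
        snapshot = {k: v.detach().cpu().clone() for k, v in adapter_state.items()}
        with self._lock:
            self._pending = {"adapter": snapshot, "tokens_done": tokens_done}
        self._wake.set()

    def _writer(self):
        # Background thread: write the most recent snapshot, drop stale ones.
        while not self._stop:
            self._wake.wait()
            self._wake.clear()
            with self._lock:
                payload, self._pending = self._pending, None
            if payload is not None:
                torch.save(payload, self.path)  # atomic rename / upload omitted

    def flush_and_stop(self):
        """Write whatever is pending synchronously, then shut the writer down."""
        with self._lock:
            payload, self._pending = self._pending, None
        if payload is not None:
            torch.save(payload, self.path)
        self._stop = True
        self._wake.set()


# Spot/preemptible VMs usually send SIGTERM shortly before shutdown;
# flushing here is what lets a preempted worker's partial lease still count.
ckpt = AsyncCheckpointer("adapter.ckpt")
signal.signal(signal.SIGTERM, lambda *_: ckpt.flush_and_stop())
```

In this sketch the checkpoint would also want to carry the lease_id, so the aggregator’s complete() knows exactly which lease to credit.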
I’d love feedback from people here:
• If you run training on spot/preemptible GPUs, how do you usually handle checkpoints/failures?
• What would make this easier to drop into an existing pipeline (Airflow, K8s, Ray, etc.)?
• For monitoring, would you rather see native training metrics (loss, tokens, staleness), or have the tool just surface logs/events so you can plug into your own stack?