Fastgen – SOTA LLM inference in 3k lines of Python

Comments (1)

mpu · 7h ago

We just released a tiny (~3k LOC) Python library that implements state-of-the-art LLM inference algorithms on GPU and delivers performance similar to vLLM. We believe it's a great learning vehicle for inference techniques, and the code is quite easy to hack on!
