Show HN: I'm an airline pilot – I built interactive graphs/globes of my flights (jameshard.ing)

1431 points by jamesharding 1d ago 190 comments

Show HN: AGL a toy language that compiles to Go (github.com)

48 points by alain_gilbert 3d ago 11 comments

Show HN: Vet – A tool for safely running remote shell scripts (getvet.sh)

51 points by a10r 8h ago 13 comments

Show HN: A news app where you define your algorithm

5 points by vincentyyy 3h ago 0 comments

Show HN: Sink – Sync any directory with any device on your local network (github.com)

138 points by sirbread 1d ago 89 comments

Show HN: Do you know RGB? (maxwellito.github.io)

87 points by maxwellito 4d ago 57 comments

Show HN: Zenta – Mindfulness for Terminal Users (github.com)

190 points by ihiep 1d ago 38 comments

Show HN: Leveraging Google ADK for Cyber Threat Intelligence (manta.black)

4 points by blackmanta 7h ago 1 comments

Show HN: Anti-Cluely – Detect virtual devices and cheating tools on exam systems (anti-cluely.com)

4 points by kkxingh 8h ago 0 comments

Show HN: Magnitude – Open-source AI browser automation framework (github.com)

132 points by anerli 2d ago 40 comments

Show HN: Query your Rust codebase and generate types for anything (github.com)

5 points by reaching4jack 9h ago 0 comments

Show HN: Open-Source outcome- / usage-based billing engine for AI Agents (github.com)

5 points by florentmsl 9h ago 0 comments

Show HN: I built an AI dataset generator (github.com)

163 points by matthewhefferon 2d ago 32 comments

Show HN: Clai - Vendor agnostic Claude Code/Gemini CLI written in Go (github.com)

4 points by baalimago 12h ago 0 comments

Show HN: eKilo – Super lightweight terminal text editor based (github.com)

4 points by antoniofoti 12h ago 0 comments

Show HN: Reactylon – Open-source framework for building 3D/XR apps with React (github.com)

7 points by lookingman_ 19h ago 0 comments

Show HN: PILF, The ultimate solution to catastrophic oblivion on AI models (github.com)

28 points by NetRunnerSu 1d ago 9 comments

Show HN: Hijacking Wi-Fi Direct for Seamless Wi-Fi Onboarding on Raspberry Pi (gist.github.com)

4 points by goodburb 14h ago 0 comments

Show HN: I made a 2D game engine in Dart (bullseye2d.org)

4 points by joemanaco 14h ago 0 comments

Show HN: Pobshell – A Bash‑like shell for live Python objects (github.com)

4 points by cogs 15h ago 1 comments

Show HN: A self-hosting compiler from WHILE to WebAssembly (github.com)

6 points by avatarluca 17h ago 0 comments

Show HN: Oasis – An open-source, 3D-printed smart terrarium (github.com)

141 points by jbuch 4d ago 18 comments

Show HN: I built Hispi, an app to design custom jewellery (hispi.app)

14 points by camjw 4d ago 14 comments

Show HN: Dungeon Master in Your Console (github.com)

11 points by spacecadet 1d ago 0 comments

Show HN: PRSS Site Creator – Create Blogs and Websites from Your Desktop (prss.co)

25 points by volted 2d ago 21 comments

Show HN: IssuePay – Get paid for open-source contributions (issuepay.app)

7 points by Mario10 1d ago 4 comments

Show HN: Scream to Unlock – Blocks social media until you scream “I'm a loser”

240 points by madinmo 3d ago 129 comments

Show HN: I open-sourced a $1M engine for closing loops in embedding space (github.com)

8 points by WFGY 20h ago 5 comments

Show HN: Wayland Speech-to-Text Tool (github.com)

16 points by artur_roszczyk 1d ago 2 comments

Show HN: FOHO – Safer rentals for foreigners in Korea with escrow and AI (foreignerhome.com)

6 points by FOHO 21h ago 1 comments

Show HN: AIOps MCP – Log anomaly detection using Isolation Forest (github.com)

5 points by kkorathaluri 21h ago 0 comments

Show HN: Elelem, a tool-calling CLI for Ollama and DeepSeek in C (codeberg.org)

45 points by atjamielittle 3d ago 3 comments

Show HN: Self-host your data anonymization pipeline (github.com)

5 points by dwa3592 1d ago 2 comments

Show HN: Samarium – open-source business management tool for developers (github.com)

3 points by gioeee 10h ago 0 comments

Show HN: Comparator - I built a free, open-source app to compare job offers (comparator-one.vercel.app)

84 points by MediumD 5d ago 42 comments

Show HN: I built a JSON-RPC library for Zig with compile time reflection (github.com)

12 points by ww520 2d ago 2 comments

Show HN: Weather Watching (walzr.com)

93 points by walz 5d ago 17 comments

Show HN: Autumn – Open-source infra over Stripe (github.com)

139 points by ayushrodrigues 4d ago 46 comments

Show HN: Anytype – a local and collaborative database with API and MCP server (zhanna.any.org)

19 points by sharipova 2d ago 0 comments

Show HN: AI-SDK-Cpp – Unified C++ SDK for OpenAI, Anthropic, and More (github.com)

5 points by cauchyk 1d ago 0 comments

Show HN: Nightcrawler – A mitmproxy-based scanner to find low-hanging fruit (github.com)

4 points by thesp0nge 1d ago 0 comments

Show HN: We launched an AI builders podcast (builtthisweek.com)

3 points by Jmetz1 1d ago 1 comments

Show HN: VSCan - Detect Malicious VSCode Extensions (vscan.dev)

52 points by shadow-ninja 4d ago 27 comments

Show HN: I built an AI app that counts the number of R's in strawberrrrry (claude.ai)

3 points by thngkaiyuan 1d ago 1 comments

Show HN: Thockfactory – An Online Configurator for Custom Keycaps Enthusiasts (thockfactory.com)

4 points by ehov 1d ago 0 comments

Show HN: Split Vim Markdown Preview – Terminal-Based Markdown Preview for Vim (github.com)

4 points by drewipson 1d ago 0 comments

Show HN: GPU market is absurd! So I built a dashboard of pricing/restock trends (gpuisfine.singhkays.com)

4 points by singhkays 1d ago 0 comments

Show HN: Built a Food Scanner for Longevity (getbiohack.app)

7 points by Fbue 2d ago 1 comments

Show HN: Open-Source International Space Station Tracker ESP32/Arduino for $20 (github.com)

3 points by keyth72 1d ago 0 comments

Show HN: A Beautiful Python GUI Framework (github.com)

7 points by ebaadesque 1d ago 1 comments

Life of an inference request (vLLM V1): How LLMs are served efficiently at scale

107 points by samaysharma, 8 comments, 6/28/2025, 6:42:05 PM (ubicloud.com)

Comments (8)

r0b05 · 55m ago

Great write up!

Does batching add data from multiple requests into the same context, potentially decreasing perplexity? If so, are we trading off perplexity for lower operating costs?

gdiamos · 2h ago

Great write up. We use vLLM's KV cache and continuous batching as a foundation for requests in ScalarLM. On top of that, we add batching optimizations in a centralized queue and explicit batching support in our client.

https://www.scalarlm.com

There is more perf you can squeeze out of vLLM.

mhlakhani · 3h ago

Thanks for writing this up! I learnt a bunch from it. I noticed this didn’t discuss additional layers of caching - I can see how it would fit in, but is prompt caching out of the scope of this system?

0xjunhao · 6h ago

Hi, I'm the author of this post. Writing it was a great learning experience. I gained a lot of insight into vLLM. If you have any feedback or questions, feel free to drop a comment below!

criemen · 6h ago

Thanks for writing the article!

I didn't quite get this part:

Note that during the prefill phase, all prompt tokens from a request can be processed in one batch. This is possible because the query (Q) tensors, calculated from the tokens immediately before them, are available for each prompt token position.

I know that in practice prefill is much faster than inference. Would watching the 2h video from Karpathy help me understand why?

animan · 3h ago

That snippet is trying to say that you can calculate KV for all the input tokens at once, and you don't need to loop over them since you have them all available.

For decode, by contrast, you need to generate each token sequentially.
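
A minimal sketch of that difference, assuming a toy single-layer setup. The names `W_K`, `W_V`, `embed`, `prefill`, and `decode_step`, the hidden size, and the random weights are all made up for illustration; this is not vLLM code. Prefill projects every prompt token to K and V in one batched matrix multiply, while decode can only add one row to the cache per step:

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

random.seed(0)
D = 4  # toy hidden size (assumption, real models use thousands)
W_K = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(D)]
W_V = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(D)]

def embed(token_id):
    """Hypothetical deterministic embedding for a token id."""
    rng = random.Random(token_id)
    return [rng.uniform(-1, 1) for _ in range(D)]

def prefill(prompt_ids):
    """Compute K and V for all prompt tokens in one batched matmul."""
    x = [embed(t) for t in prompt_ids]  # shape: (n_tokens, D)
    return matmul(x, W_K), matmul(x, W_V)

def decode_step(token_id, k_cache, v_cache):
    """Append K and V for a single new token to the existing cache."""
    x = [embed(token_id)]  # shape: (1, D)
    k_cache.extend(matmul(x, W_K))
    v_cache.extend(matmul(x, W_V))

# Prefill: the whole prompt is known up front, so there is no per-token loop.
k_cache, v_cache = prefill([11, 22, 33])

# Decode: each new token depends on the previous one, so it is one step at a time.
for new_token in [44, 55]:  # stand-ins for sampled tokens
    decode_step(new_token, k_cache, v_cache)

print(len(k_cache), len(v_cache))
```

The cache ends up with one K row and one V row per token either way. The difference is only that the prompt rows can be produced in a single pass, while generated tokens have to arrive one by one.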

criemen · 6h ago

And on the topic of prefill: Do you know what the role of GPUs is in prefill vs. in inference?

animan · 3h ago

Prefill is part of Inference. It's the first major step where you calculate all the keys and values for the input tokens.

Decode is the next major step where you start generating output tokens one at a time.

Both run on GPUs but have different workloads:

1. Prefill is heavy on compute and does relatively little I/O between GPU memory (HBM) and the compute units.

2. Decode is light on compute but has to read the keys and values computed in the prefill stage from GPU memory for every output token.
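
A rough back-of-envelope sketch of why the two phases behave differently. It assumes a single square `d_model x d_model` weight matrix, fp16 values, and plain multi-head attention, and it ignores activation traffic. The function names and the example model sizes are hypothetical. The weights are read from memory once per pass, so the FLOPs done per byte read scale with the number of tokens in that pass:

```python
def arithmetic_intensity(n_tokens, d_model, bytes_per_param=2):
    """FLOPs per byte of weights read for one d_model x d_model linear layer."""
    flops = 2 * n_tokens * d_model * d_model          # one multiply and one add per weight per token
    weight_bytes = d_model * d_model * bytes_per_param
    return flops / weight_bytes

def kv_cache_bytes(n_context, n_layers, d_model, bytes_per_value=2):
    """Bytes of cached keys and values that one decode step has to read."""
    return 2 * n_layers * n_context * d_model * bytes_per_value  # factor 2: keys and values

# Prefill over a 1000-token prompt vs. a single-token decode step.
print(arithmetic_intensity(1000, 4096))  # 1000.0 FLOPs per byte
print(arithmetic_intensity(1, 4096))     # 1.0 FLOPs per byte

# KV cache read per decode step for a hypothetical 32-layer model at 1000 tokens of context.
print(kv_cache_bytes(1000, 32, 4096))    # 524288000 bytes
```

Under these assumptions a single-request decode step does about 1 FLOP per byte of weights it loads, which is why batching many requests into the same decode step helps so much.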