Show HN: GPU Profiling That's Useful in 60 Seconds

Posted by technoabsurdist on 8/9/2025 · keysandcaches.com
Hey HN! We're building a profiler for ML inference that actually shows what's happening at the hardware level, without manually parsing flame graphs or setting up nsys and ncu.

The problem: Current ML profilers either dump too much data (torch.profiler) or abstract away the details you need. You can't see why your model is actually slow - is it memory bandwidth? Kernel launch overhead? Cache misses?
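For contrast, here's the status quo with torch.profiler (standard PyTorch API; the model and inputs are placeholders): you get hundreds of rows of op timings, but no direct answer to which hardware resource is the bottleneck.

    import torch
    from torch.profiler import profile, ProfilerActivity

    # Placeholder workload: any model and input batch will do.
    model = torch.nn.Linear(4096, 4096).cuda().eval()
    inputs = torch.randn(64, 4096, device="cuda")

    # Standard torch.profiler usage: op-level CPU/CUDA timings only.
    with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
        with torch.no_grad():
            model(inputs)

    # Dumps a large table sorted by GPU time; interpreting it is on you.
    print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))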

Our approach: we're reverse-engineering GPU execution to trace from Python ops down to PTX instructions. One decorator gives you the full execution graph with the actual bottlenecks highlighted.
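A minimal sketch of that workflow (the import and decorator name here are illustrative, not the real API; see the docs below for actual usage):

    import torch
    # Illustrative only: the real package lives at github.com/Herdora/kandc,
    # and the `trace` name/signature here is a stand-in, not the actual API.
    from kandc import trace

    model = torch.nn.Linear(4096, 4096).cuda().eval()   # placeholder model
    inputs = torch.randn(64, 4096, device="cuda")

    @trace   # one decorator: record Python ops -> CUDA kernels -> PTX
    def run_inference(batch):
        with torch.no_grad():
            return model(batch)

    run_inference(inputs)   # trace data is captured on the decorated call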

Technical details:

- Traces Python → CUDA kernels → PTX with timing breakdowns
- Shows memory access patterns and bandwidth utilization (see the sketch after this list)
- Kernel occupancy and scheduling analysis
- Works with PyTorch/JAX; TensorFlow support coming
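To make the bandwidth point concrete, this is the kind of manual back-of-envelope check the tool replaces, written in plain PyTorch with CUDA events (sizes and the 2x bytes multiplier are illustrative for an elementwise op):

    import torch

    # Time a memory-bound op and compare achieved bytes/s to the GPU's peak.
    x = torch.randn(1 << 26, device="cuda")    # ~256 MiB of fp32
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)

    torch.cuda.synchronize()
    start.record()
    y = x * 2.0                                # elementwise: one read, one write
    end.record()
    torch.cuda.synchronize()

    ms = start.elapsed_time(end)                       # milliseconds
    bytes_moved = 2 * x.numel() * x.element_size()     # read x + write y
    print(f"achieved ~{bytes_moved / (ms / 1e3) / 1e9:.0f} GB/s "
          f"(compare against your GPU's peak memory bandwidth)")

If the achieved number is close to the hardware peak, the op is bandwidth-bound and no amount of kernel tuning will help; if it's far below, something else (launch overhead, occupancy, access patterns) is the bottleneck.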

We used this to optimize Llama inference and found bottlenecks we couldn't see before, getting a 50%+ speedup: https://www.herdora.com/blog/the-overlooked-gpu

Free beta with 10 hours of profiling: https://keysandcaches.com
GitHub: https://github.com/Herdora/kandc
Docs: https://www.keysandcaches.com/docs

Curious what inference bottlenecks others are hitting that current tools can't diagnose. What's your experience with existing profilers? We'd love to hear thoughts from the community :)
