
I know this question gets asked every now and then, but I'm curious what the latest recommendation is for handling AI usage in AI-dependent applications. For reference, I'm building something that processes real-time data on demand with each query. Each query will use ~50k tokens on average. Since the data changes with every query, I won't benefit from caching. I'm trying to figure out how to charge users fairly for AI usage in a simple way without running at a loss.

A couple of thoughts off the top of my head:

1. Credit-based pricing. Base service price + included "credits" per month, with the ability to purchase additional credits. This is the one I see most often. But it gets pretty confusing what a credit actually means. What if I want a follow-up question, is that 0.5 credits? Or what about using a reasoning model, is that 2 credits? What if I offer multiple providers, does OpenAI cost 1.5 credits while Gemini costs 1 credit? Do credits roll over each month? Do they expire?

2. Same as above, but the credits are actual USD. Since every API returns how many tokens were used per query, it's easy to calculate how much each question costs. It's essentially how any AI provider's API billing works. It would be easy to relay the cost to the end user and show an estimate of how much each query might cost. This allows users to make as many queries as they'd like. If they run out of credits, they can just top up. However, a usage meter and a per-query cost might be off-putting: users watch their balance drain with each question, as if they're losing something every time they ask.

3. Eat the cost and add generic limits. Base service price + average cost of anticipated AI usage. Similar to how AI providers' chatbots work: you pay a base price and usage is capped by a token bucket rate limiter. Makes sense if you own the API, but gets confusing as soon as you have more than one provider with different pricing. This one seems like the best because you can impose arbitrary limits and adjust them as needed. The one drawback is that it punishes power users. If a user relies heavily on this application, I want them to be able to use it as much as they'd like without running into rate limits. Maybe have multiple plans with extended limits? Not my preferred approach, but might be the best option imo.

4. BYOK Hybrid - bring your own key in addition to #3 above (doesn't make sense for #1 or #2). Regular users can just use the application as described in #3, while power users can bring their own key. I'd love to be able to offer this, but it comes with the responsibility of storing the user's API key securely. Are there any other drawbacks to BYOK? The only one I can think of is that your system prompt can be leaked if a provider keeps logs. Luckily there isn't really anything special in my prompt; the bulk of it is just the context, which is not easily replicable.
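
For what it's worth, the per-query cost math behind #2 (and the "average cost of anticipated usage" in #3) can be sketched in a few lines. This is only a rough illustration: the provider names and per-million-token prices below are made-up placeholders, not real rates, and the `margin` parameter is a hypothetical markup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Price per one million tokens, in USD (placeholder values)."""
    input_usd_per_mtok: float
    output_usd_per_mtok: float


# ASSUMPTION: hypothetical providers and prices, for illustration only.
PRICING = {
    "provider-a": Pricing(input_usd_per_mtok=2.50, output_usd_per_mtok=10.00),
    "provider-b": Pricing(input_usd_per_mtok=1.25, output_usd_per_mtok=5.00),
}


def query_cost_usd(provider: str, input_tokens: int, output_tokens: int,
                   margin: float = 0.0) -> float:
    """Cost of one query from the token counts the API reports.

    `margin` is an optional markup, e.g. 0.2 for 20% on top of raw cost.
    """
    p = PRICING[provider]
    raw = (input_tokens * p.input_usd_per_mtok
           + output_tokens * p.output_usd_per_mtok) / 1_000_000
    return raw * (1 + margin)


def monthly_ai_cost_usd(provider: str, queries_per_month: int,
                        avg_input_tokens: int, avg_output_tokens: int) -> float:
    """Expected AI spend per user per month, to fold into a base price (#3)."""
    return queries_per_month * query_cost_usd(
        provider, avg_input_tokens, avg_output_tokens)
```

With these placeholder prices, a 50k-token input query on "provider-a" with no output tokens comes to 50,000 × 2.50 / 1,000,000 = $0.125, so 100 such queries a month would add about $12.50 to the cost of serving that user.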

While #2 logically makes the most sense, it doesn't provide the best user experience. I am leaning towards #3/4 right now. Is there anything I missed or flaws with this approach? What has been working for you guys?
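
For anyone unfamiliar with the token bucket limiter mentioned in #3, here is a minimal sketch of the idea. The class name, capacity, and refill rate are all assumptions for illustration; a real deployment would keep one bucket per user in shared storage rather than in memory.

```python
import time
from typing import Optional


class TokenBucket:
    """Minimal in-memory token bucket (illustrative, not production code).

    The bucket holds up to `capacity` LLM tokens and refills continuously
    at `refill_per_sec`. A query is allowed only if enough tokens remain.
    """

    def __init__(self, capacity: float, refill_per_sec: float,
                 now: Optional[float] = None) -> None:
        self.capacity = float(capacity)
        self.refill_per_sec = float(refill_per_sec)
        self.level = float(capacity)  # start with a full bucket
        self.updated = time.monotonic() if now is None else now

    def try_consume(self, amount: float, now: Optional[float] = None) -> bool:
        """Refill for the elapsed time, then spend `amount` if available."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        self.level = min(self.capacity,
                         self.level + elapsed * self.refill_per_sec)
        self.updated = now
        if amount <= self.level:
            self.level -= amount
            return True
        return False


# Example: a 100k-token bucket allows two ~50k-token queries back to back,
# then makes the user wait for the refill. `now` is passed explicitly here
# only to make the example deterministic.
bucket = TokenBucket(capacity=100_000, refill_per_sec=10, now=0.0)
first = bucket.try_consume(50_000, now=0.0)    # allowed
second = bucket.try_consume(50_000, now=0.0)   # allowed, bucket now empty
third = bucket.try_consume(50_000, now=1.0)    # rejected, only 10 refilled
```

Offering "multiple plans with extended limits" would then just mean different `capacity` and `refill_per_sec` values per plan.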

Comments (1)

yamatokaneko · 49m ago

As a user, the issue with 1 and 2 is that you're constantly reminded of the cost, which discourages usage.

Personally, I prefer 3, even though I know I might be wasting some value by not hitting the limit each month. I think it’s because I know exactly how much I’ll pay upfront.

That said, when designing pricing and limits for 3, it’s important to ensure most users don’t hit the cap too quickly. Hitting the limit the day after paying would be a terrible experience. Finding the sweet spot % would be interesting.
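
One rough way to look for that sweet spot, assuming you already have per-user monthly query counts, is to measure what share of users would reach a given cap. The function names and the usage numbers below are hypothetical, purely to show the calculation.

```python
from typing import Sequence


def share_hitting_cap(monthly_queries: Sequence[int], cap: int) -> float:
    """Fraction of users whose monthly query count reaches the cap."""
    if not monthly_queries:
        return 0.0
    hit = sum(1 for q in monthly_queries if q >= cap)
    return hit / len(monthly_queries)


def smallest_cap_for_target(monthly_queries: Sequence[int],
                            max_share: float) -> int:
    """Smallest cap such that at most `max_share` of users reach it."""
    if not monthly_queries:
        return 1
    # The share only shrinks as the cap grows, so the first cap that
    # satisfies the target is the smallest one.
    for cap in range(1, max(monthly_queries) + 2):
        if share_hitting_cap(monthly_queries, cap) <= max_share:
            return cap
    return max(monthly_queries) + 1


# ASSUMPTION: made-up usage data for ten users, not real measurements.
usage = [5, 12, 20, 20, 35, 40, 80, 150, 300, 900]
share_at_100 = share_hitting_cap(usage, 100)      # 3 of 10 users reach it
cap_for_10pct = smallest_cap_for_target(usage, 0.10)
```

With this sample data, a cap of 100 queries would be reached by 3 of the 10 users, and the cap would need to be 301 before only one user in ten hits it.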

Ask HN: How do you handle charging users for AI usage?
