How I code with AI on a budget/free

246 indigodaddy 74 8/9/2025, 10:27:37 PM wuu73.org ↗

Comments (74)

Havoc · 6h ago
For anyone else confused: there are pages 2 and 3 in the post that you need to access via the arrow at the bottom.
radio879 · 6h ago
I am the person that wrote that. Sorry about the font. This is a bit outdated, AI stuff goes at high speed. More models so I will try to update that.

Every month so many new models come out. My new fav is GLM-4.5... Kimi K2 is also good, and Qwen3-Coder 480b, or 2507 instruct.. very good as well. All of those work really well in any agentic environment/in agent tools.

I made a context helper app ( https://wuu73.org/aicp ), linked from there, which helps jump back and forth between all the different AI chat tabs I have open (which are almost always totally free, and I get the best output from those) and my IDE. The app tries to remove all the friction and annoyances of working with the native web chat interfaces for all the AIs. It's free and has been getting great feedback, criticism welcome.

It helps with going back and forth between the IDE and web chat tabs. I made it for myself to save time, and I prefer the UI (PySide6 UI, so much lighter than a webview).

It has preset buttons to add text that you find yourself typing very often, and per-project state saves of the app's window size and which files were used for context. So next time, it opens in the same state.

Auto scans for code files, guesses likely ones needed, prompt box that can put the text above and below the code context (seems to help make the output better). One of my buttons is set to: "Write a prompt for Cline, the AI coding agent, enclose the whole prompt in a single code tag for easy copy and pasting. Break the tasks into some smaller tasks with enough detail and explanations to guide Cline. Use search and replace blocks with plain language to help it find where to edit"
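
A minimal sketch of the "prompt text above and below the code context" idea, for anyone who wants to try it by hand. This is not how aicp is implemented; the function name `build_context` and the `--- path ---` file delimiter are made up for illustration.

```python
def build_context(prompt: str, files: dict[str, str]) -> str:
    """Sandwich the code context between two copies of the same prompt.

    `files` maps a (hypothetical) file path to its source text.
    """
    prompt = prompt.strip()
    parts = [prompt, ""]
    for path, source in files.items():
        # Hypothetical delimiter so the model can tell files apart.
        parts.append(f"--- {path} ---")
        parts.append(source.rstrip())
        parts.append("")
    # Repeat the prompt after the code so it is the last thing the model reads.
    parts.append(prompt)
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_context("Fix the off-by-one bug.", {"main.py": "for i in range(10):\n    print(i)\n"}))
```

The output can be pasted straight into any web chat; the repeated prompt at the end is the part the comment says seems to improve results.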

What I do for problem solving and figuring out bugs: I'm usually in VS Code, and I type aicp in the terminal to open the app. I fine-tune which files are checked, type what I am trying to do or what problem I have to fix, click the Cline button, then click Generate Context!. I paste into GLM-4.5, sometimes o3 or o4-mini, GPT-5, Gemini 2.5 Pro.. if it's a super hard thing I'll try 2 or 3 models. I'll look and see which answer makes the most sense and just copy and paste it into Cline in VS Code, set to GPT 4.1, which is unlimited/free.. 4.1 isn't super crazy smart or anything, but it follows orders... it will do whatever you ask, reliably. AND it will correct minor mistakes in the bigger model's output. The bigger, smarter models can figure out the details, and they'll write a prompt that is a task list with how-to's and why's, perfect for 4.1 to go and do in agent mode....

You can code for free this way, unlimited, and it's the smartest the models will be. Anytime you throw some tools or MCPs at a model it dumbs it down.... AND you waste money on all the API costs of having to use Claude 4 for everything.

teiferer · 1h ago
> You can code for free this way

vs

> If you set your account's data settings to allow OpenAI to use your data for model training

So, it's not "for free".

can16358p · 15m ago
Many folks, especially if they are into getting things for free, don't really care much about the privacy narrative.

So yes, it is free.

Wilder7977 · 9m ago
This is not only a privacy concern (in fact, that might be a tiny part since the code might end up public anyway?). There is an element of disclosure of personal data, there are ownership issues in case that code was not - in fact - going to be public and more.

In any case, not caring about the cost (at a specific time) doesn't make the cost disappear.

bahmboo · 1h ago
I was going to downvote you but you are adding to the discussion. In this context this is free from having to spend money. Many of us don't have the option to pay for models. We have to find some way to get the state of the art without spending our food money.
frankzander · 54m ago
Hm why pay for something when I can get it for free? Being miserly is a skill that can save a lot of money.
hx8 · 16m ago
I live a pretty frugal life, and reached the FI part of FIRE in my early 30s as an averagely compensated software engineer.

I am very skeptical anytime something is 'free'. I specifically avoid using a free service when the company profits from my use of the service. These arrangements usually start mutually beneficial, and almost always become user hostile.

Why pay for something when you can get it for free? Because the exchange of money for service sets clear boundaries and expectations.

racecar789 · 1h ago
Small recommendation: The diagrams on [https://wuu73.org/aicp] are helpful, but clicking them does not display the full‑resolution images; they appear blurry. This occurs in both Firefox and Chrome. In the GitHub repository, the same images appear sharp at full resolution, so the issue may be caused by the JavaScript rendering library.
PeterStuer · 1h ago
Another data point: On Android Chrome they render without problem.
PeterStuer · 1h ago
Very nice article and thx for the update.

I would be very interested in an in-depth account of the differences you've experienced between Roo Code and Cline, if you feel you can share that. I've only tried Roo Code (with interesting but mixed results) thus far.

ya3r · 54m ago
Have you seen Microsoft's Copilot? It is essentially free OpenAI models.
hgarg · 2h ago
Qwen is totally useless for any serious dev work.
b2m9 · 1h ago
It’s really hit and miss for me. Well defined small tasks seem ok. But every time I try some “agentic coding”, it burns through millions of tokens without producing anything working.
indigodaddy · 5h ago
Is glm-4.5 air useable? I see it's free on Openrouter. Also pls advise what you think is the current best free openrouter model for coding. Thanks!
radio879 · 4h ago
Well, if you download Qwen Code https://github.com/QwenLM/qwen-code it is free up to 2000 API calls a day.

Not sure if GLM-4.5 Air is good, but the non-Air one is fabulous. For free API access I know there is the pollinations ai project. Also llm7. If you just use the web chats you can use most of the best models for free without an API. There are ways to 'emulate' an API automatically.. I was thinking about adding this to my aicodeprep-gui app so it could automatically paste and then cut. Some MCP servers exist that will automatically paste to or cut from those web chats and route it to an API interface.

OpenAI offers free tokens for most models, 2.5mil or 250k depending on the model. Cerebras has some free limits, Gemini... Meta has plentiful free API access for Llama 4 because.. let's face it, it sucks, but it is okay/not bad for stuff like summarizing text.

If you really wanted to code for exactly $0 you could use pollinations ai in the Cline extension (for VS Code), set to use "openai-large" (which is GPT 4.1). If you plan on using all the best web chats, like Kimi K2, z.ai's GLM models, Qwen 3 chat, Gemini in AI Studio, or the OpenAI playground with o3 or o4-mini, you can go forever without being charged money. Pollinations 'openai-large' works fine in Cline as an agent to edit files for you etc.

indigodaddy · 3h ago
Very cool, a lot to chew on here. Thanks so much for the feedback!
tonyhart7 · 1h ago
bro you are final boss of free tier users lol
radio879 · 1h ago
damn right !!!!
andai · 6h ago
My experience lines up with the article. The agentic stuff only works with the biggest models. (Well, "works"... OpenAI Codex took 200 requests with o4-mini to change like 3 lines of code...)

For simple changes I actually found smaller models better because they're so much faster. So I shifted my focus from "best model" to "stupidest I can get away with".

I've been pushing that idea even further. If you give up on agentic, you can go surgical. At that point even 100x smaller models can handle it. Just tell it what to do and let it give you the diff.

Also I found the "fumble around my filesystem" approach stupid for my scale, where I can mostly fit the whole codebase into the context. So I just dump src/ into the prompt. (Other people's projects are a lot more boilerplatey so I'm testing ultra cheap models like gpt-oss-20b for code search. For that, I think you can go even cheaper...)
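
A rough sketch of the "just dump src/ into the prompt" approach, with a sanity check on size. The helper names, the `# file:` header, and the 4-characters-per-token estimate are all assumptions for illustration; real tokenizers vary by model.

```python
import os


def dump_tree(root: str, exts: tuple[str, ...] = (".py",)) -> str:
    """Concatenate every matching file under `root` into one prompt-ready string."""
    chunks = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # walk subdirectories in a stable order
        for name in sorted(filenames):
            if not name.endswith(exts):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                rel = os.path.relpath(path, root)
                chunks.append(f"# file: {rel}\n{fh.read()}")
    return "\n\n".join(chunks)


def rough_tokens(text: str) -> int:
    """Very rough size estimate: about 4 characters per token (an assumption)."""
    return len(text) // 4


if __name__ == "__main__":
    context = dump_tree("src")
    print(f"about {rough_tokens(context)} tokens of context")
```

If the estimate is comfortably under the model's context window, the whole dump can go in one prompt with the instruction appended.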

Patent pending.

statenjason · 5h ago
Aider as a non-agentic coding tool strikes a nice balance on the efficiency vs effectiveness front. Using tree-sitter to create a repo map of the repository means less filesystem digging. No MCP, but shell commands mean it can use utilities I myself am familiar with. Combined with Cerebras as a provider, the turnaround on prompts is instant; I can stay involved rather than waiting on multiple rounds of tool calls. It's my go-to for smaller scale projects.
hpincket · 1h ago
I am developing the same opinion. I want something fast and dependable. Getting into a flow state is important to me, and I just can't do that when I'm waiting for an agentic coding assistant to terminate.

I'm also interested in smaller models for their speed. That, or a provider like Cerebras.

Then, if you narrow the problem domain you can increase the dependability. I am curious to hear more about your "surgical" tools.

I rambled about this on my blog about a week ago: https://hpincket.com/what-would-the-vim-of-llm-tooling-look-...

wahnfrieden · 1h ago
They don't allow model switching below GPT-5 in codex cli anymore (without API key), because it's not recommended. Try it with thinking=high and it's quite an improvement from o4-mini. o4-mini is more like gpt-5-thinking-mini but they don't allow that for codex. gpt-5-thinking-high is more like o1 or maybe o3-pro.
SV_BubbleTime · 3h ago
> (Well, "works"... OpenAI Codex took 200 requests with o4-mini to change like 3 lines of code...)

Let's keep things within reason: multiple times in my life I have spent days on what ended up being maybe three lines of code.

bambax · 19m ago
As the post says, the problem with coding agents is they send a lot of their own data + almost your entire code base for each request: that's what makes them expensive. But when used in a chat the costs are so low as to be insignificant.

I only use OpenRouter which gives access to almost all models.

Sonnet was my favorite until I tried Gemini 2.5 Pro, which is almost always better. It can be quite slow though. So for basic questions / syntax reminders I just use Gemini Flash: super fast, and good for simple tasks.

chromaton · 6h ago
If you're looking for free API access, Google offers access to Gemini for free, including for gemini-2.5-pro with thinking turned on. The limit is... quite high, as I'm running some benchmarking and haven't hit the limit yet.

Open weight models like DeepSeek R1 and GPT-OSS are also made available with free API access from various inference providers and hardware manufacturers.

gooosle · 6h ago
Gemini 2.5 pro free limit is 100 requests per day.

https://ai.google.dev/gemini-api/docs/rate-limits

panarky · 4h ago
I'm getting consistently good results with Gemini CLI and the free 100 requests per day and 6 million tokens per day.

Note that you'll need to either authorize with a Google Account or with an API key from AI Studio, just be sure the API key is from an account where billing is disabled.

Also note that there are other rate limits for tokens per request and tokens per minute on the free plan that effectively prevent you from using the whole million token context window.

It's good to exit or /clear frequently so every request doesn't resubmit your entire history as context or you'll use up the token limits long before you hit 100 requests in a day.
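
A back-of-the-envelope sketch of why resubmitting history eats the token budget. It assumes every turn adds the same number of tokens and that the whole history is resent on each request, so request k costs k times the per-turn size. The 5,000 tokens per turn is a made-up figure; the 6 million daily budget is the number quoted above.

```python
def requests_before_budget(tokens_per_turn: int, daily_budget: int) -> int:
    """Count requests that fit in the budget when each one resends all prior turns."""
    if tokens_per_turn <= 0:
        raise ValueError("tokens_per_turn must be positive")
    total = 0
    n = 0
    while True:
        cost = (n + 1) * tokens_per_turn  # request n+1 carries n+1 turns of history
        if total + cost > daily_budget:
            return n
        total += cost
        n += 1


if __name__ == "__main__":
    # Hypothetical 5,000 tokens per turn against a 6,000,000 token daily budget.
    print(requests_before_budget(5_000, 6_000_000))  # 48
```

Under these assumptions a single never-cleared session runs out of tokens after 48 requests, well short of the 100-request limit, which is the effect /clear avoids.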

tomrod · 5h ago
Doesn't it swap to a lower power model after that?
chiwilliams · 5h ago
I'm assuming it isn't sensitive for your purposes, but note that Google will train on these interactions, but not if you pay.
devjab · 1h ago
I think it'll be hard to find an LLM that actually respects your privacy, regardless of whether or not you pay. Even with the "privacy" enterprise Co-Pilot from Microsoft, with all their promises of respecting your data, it's still not deemed safe enough by legislation to be used in parts of the European energy sector. The way we view LLMs on any subscription is similar to how I imagine companies in the USA view Deepseek. Don't put anything into them you can't afford to share with the world. Of course with the agents, you've probably given them access to everything on your disk.

Though to be fair, it's kind of silly how much effort we go through to protect our mostly open source software from AI agents, while at the same time, half our OT has built-in hardware backdoors.

unnouinceput · 3h ago
I agree, Google is definitely the champion of respecting your privacy. Will definitely not train their model on your data if you pay them. I mean you should definitely just film yourself and give them everything, access to your files, phone records, even bank accounts. Just make sure to pay them those measly $200 and absolutely they will not share that data with anybody.
lern_too_spel · 2h ago
You're thinking of Facebook. A lot of companies run on Gmail and Google Docs (easy to verify with `dig MX [bigco].com`), and they would not if Google shared that data with anybody.
d1sxeyes · 1h ago
It’s not really in either Meta or Google’s interests to share that data. What they do is to build super detailed profiles of you and what you’re likely to click on, so they can charge more money for ad impressions.
gexla · 1h ago
Wow, there's a lot here that I didn't know about. Just never drilled that far into the options presented. For a change, I'm happy that I read the article rather than only the comments on HN. ;)

And lots of helpful comments here on HN as well. Good job everyone involved. ;)

reactordev · 6h ago
To the OP: I highly recommend you look into Continue.dev and ollama/lmstudio and running models on your own. Some of them are really good at autocomplete-style suggestions while others (like gpt-oss) can reason and use tools.

It's my go-to copilot.

navbaker · 5h ago
Same! I’ve been using Continue in VSCode and found most of the bigger Qwen models plus gpt-oss-120b to be great in agentic mode!
AstroBen · 4h ago
I've found Zed to be a step up from continue.dev - you can use your own models there also
radio879 · 1h ago
really - no monthly subscriptions? I hate those, but I am fine with bringing my own API URLs etc. and paying. I'm building a router that will track all the free tokens from all the different providers and auto-rotate between them when daily tokens or time limits run out.

Continue and Zed.. gonna check them out, the prompts in Cline are too long. I was thinking of just making my own VS Code extension, but I need to try Claude Code with GLM 4.5 first (heard it pairs nicely).
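
A minimal sketch of what such a quota-rotating router could look like. This is not the commenter's code: `Provider`, `QuotaRouter`, and the limits in the example are hypothetical, and a real version would also need time-window limits and actual API calls.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    daily_limit: int  # free tokens per day (placeholder values, not real quotas)
    used: int = 0

    def remaining(self) -> int:
        return self.daily_limit - self.used


class QuotaRouter:
    """Send each request to the first provider with enough free quota left."""

    def __init__(self, providers):
        self.providers = list(providers)

    def pick(self, tokens_needed: int) -> Provider:
        for provider in self.providers:
            if provider.remaining() >= tokens_needed:
                provider.used += tokens_needed  # reserve the tokens up front
                return provider
        raise RuntimeError("all free quotas exhausted for today")

    def reset_day(self) -> None:
        """Call once per day to restore every provider's quota."""
        for provider in self.providers:
            provider.used = 0


if __name__ == "__main__":
    router = QuotaRouter([Provider("provider-a", 100), Provider("provider-b", 50)])
    print(router.pick(80).name)  # provider-a
    print(router.pick(30).name)  # provider-b, because provider-a has only 20 left
```

Providers are tried in list order, so putting the preferred model first makes the others act as fallbacks.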

indigodaddy · 3h ago
Can you use your GH Copilot subscription with Zed to leverage the Copilot subscription-provided models?
nechuchelo · 3h ago
Yes, you can. IIRC both for the assistant/agent and code completions.
reactordev · 3h ago
Zed is supreme but I have a need that Zed can’t scratch so I’m in VSCode :(
hoerzu · 36m ago
To stop tab switching I built an extension to query all free models all at once: https://llmcouncil.github.io/llmcouncil/
yichuan · 2h ago
I think there’s huge potential for a fully local “Cursor-like” stack — no cloud, no API keys, just everything running on your machine.

The setup could be:
• Cursor CLI for agentic/dev stuff (example: https://x.com/cursor_ai/status/1953559384531050724)
• A local memory layer compatible with the CLI, something like LEANN (97% smaller index, zero cloud cost, full privacy, https://github.com/yichuan-w/LEANN) or Milvus (though Milvus often ends up cloud/token-based)
• Your inference engine, e.g. Ollama, which is great for running OSS GPT models locally

With this, you'd have an offline, private, and blazing-fast personal dev+AI environment. LEANN in particular is built exactly for this kind of setup: tiny footprint, semantic search over your entire local world, and Claude Code/Cursor-compatible out of the box, with Ollama for generation. I guess this solution is not only free but also does not need any API.

I do agree that this needs some effort to set up, but maybe someone can make it easy and fully open source.

joshdavham · 3h ago
> When you use AI in web chat's (the chat interfaces like AI Studio, ChatGPT, Openrouter, instead of thru an IDE or agent framework) are almost always better at solving problems, and coming up with solutions compared to the agents like Cline, Trae, Copilot.. Not always, but usually.

I completely agree with this!

While I understand that it looks a little awkward to copy and paste your code out of your IDE and into a web chat interface, I generally get better results that way than with GitHub copilot or cursor.

SV_BubbleTime · 3h ago
100% opposite experience.

Whether agentic, not… it’s all about context.

Either agentic with access to your whole project, “lives” in GitHub, a fine tune, or RAG, or whatever… having access to all of the context drastically reduces hallucinations.

There is a big difference between “write x” and “write x for me in my style, with all y dependencies, and considering all z code that exists around it”.

I honestly don't understand the defense of copy-and-paste AI coding… this is why agents are so massively popular right now.

b2m9 · 51m ago
I’m also surprised by this take. I found copy/paste between editor and external chats to be way less helpful.

That being said, I think everyone has probably different expectations and workflows. So if that’s what works for them, who am I to judge?

qustrolabe · 1h ago
I bet it's crazy to some people that others are okay with giving up so much of their data for free tiers. Like, yeah, it's better to self-host, but it takes so many resources to run a good enough LLM at home that I'd rather give up my code for some free usage. That code will eventually end up open source anyway.
jama211 · 1h ago
And as far as I’m concerned if my work is happy for me to use models to assist with code, then it’s not my problem
bravesoul2 · 5h ago
Windsurf has a good free model. Good enough for autocomplete level work for sure (haven't tried it for more as I use Claude Code)
b2m9 · 2h ago
You mean SWE-1? I used it like a dozen times and I gave up because the responses were so bad. Not even sure whether it’s good enough for autocomplete because it’s the slowest model I’ve tested in a while.
bravesoul2 · 2h ago
Not my experience for slowness. For smartness I am typically using it for simple "not worth looking that up" stuff rather than even feature implementation. Got it to write some MySQL SQL today, for example.
indigodaddy · 5h ago
Assuming you have to at least be logged into a windsurf account though?
bravesoul2 · 5h ago
Yeah. I didn't see not logged in as a requirement.
tonyhart7 · 1h ago
I replicated SDD from Kiro; it works wonders for switching between multiple models because I can just re-fetch from the specs folder.
hgarg · 2h ago
Just use Rovodev CLI. Gives you 20 million tokens for free per 24 hours and you can switch between sonnet 4 / gpt-5.
xvv · 2h ago
As of today, what is the best local model that can be run on a system with 32gb of ram and 24gb of vram?
CjHuber · 7h ago
Without tricks, Google AI Studio definitely has limits, though pretty high ones. gemini.google.com, on the other hand, gives you less than a handful of free 2.5 Pro messages.
sublinear · 47m ago
This all sounds a lot more complicated and time consuming than just writing the damn code yourself.
andrewmcwatters · 4h ago
I jump between Claude Sonnet 4 on GitHub Copilot Pro and now GPT-5 on ChatGPT. That seems to get me pretty far. I have gpt-oss:20b installed with ollama, but haven't found a need to use it yet, and it seems like it just takes too long on an M1 Max MacBook Pro 64GB.

Claude Sonnet 4 is pretty exceptional. GPT-4.1 asks me too frequently whether it should move forward. Yes! Of course! Just do it! I'll reject your changes or do something else later. The former gets a whole task done.

I wonder if anyone is getting better results, or comparable for cheaper or free. GitHub Copilot in Visual Studio Code is so good, I think it'd be pretty hard to beat, but I haven't tried other integrated editors.

GaggiX · 6h ago
OpenAI offering 2.5M free tokens daily for small models and 250k for big ones (tier 1-2) is so useful for random projects. I use them to learn Japanese, for example (by having a program that lists information about what the characters are saying: vocabulary, grammar points, nuances).
cammikebrown · 6h ago
I wonder how much energy this is wasting.
yen223 · 3h ago
Probably not as much as you think: https://www.sustainabilitybynumbers.com/p/ai-energy-demand

You are better off worrying about your car use and your home heating/cooling efficiency, all of which are significantly worse for energy use.

bravesoul2 · 5h ago
Untradable carbon tax (or carbon price for people who hate the T word) is needed.
robotsquidward · 4h ago
Right - free to you maybe.
sergiotapia · 3h ago
who cares. we can build more. energymaxx or the us will become like germany.