Grok Code Fast 1

166 points by Terretta | 96 comments | 8/29/2025, 1:01:45 PM | x.ai ↗

Comments (96)

NitpickLawyer · 2h ago
Tested this yesterday with Cline. It's fast, works well with agentic flows, and produces decent code. No idea why this thread is so negative (also got flagged while I was typing this?) but it's a decent model. I'd say it's at or above gpt5-mini level, which is awesome in my book (I've been maining gpt5-mini for a few weeks now, does the job on a budget).

Things I noted:

- It's fast. I tested it in EU tz, so ymmv

- It does agentic in an interesting way. Instead of editing a file whole or in many places, it does many small passes.

- Had a feature take ~110k tokens (parsing html w/ bs4). Still finished the task. Didn't notice any problems at high context.

- When things didn't work on the first try, it created a new file to test, did all the mocking / testing there, and then once it worked, edited the main module file. Nice. GPT5-mini would often edit working files, and then get confused and fail the task.

All in all, not bad. At the price point it's at, I could see it as a daily driver. Even for agentic stuff, with opus + gpt5 high as planners and this thing as an implementer. It's fast enough that it might be worth setting it up in parallel and basically replicating pass@x from research.
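That parallel setup can be sketched in a few lines. Everything here is a hypothetical stand-in: `call_model` for a real completions API client, `passes_tests` for running the project's test suite against a candidate:

```python
import concurrent.futures

def call_model(prompt, seed):
    # Hypothetical stand-in for a real API call to a fast model.
    return "candidate-%d" % seed

def passes_tests(candidate):
    # Hypothetical stand-in for running the test suite on a candidate patch.
    return candidate.endswith("3")

def best_of_n(prompt, n=5):
    # Fire n attempts in parallel and keep the first candidate that passes
    # tests -- essentially pass@n from the eval literature, applied at dev time.
    with concurrent.futures.ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(lambda s: call_model(prompt, s), range(n)))
    return next((c for c in candidates if passes_tests(c)), None)

print(best_of_n("implement the html parser"))
```

With a fast, cheap model, the wall-clock cost of n parallel attempts is roughly that of one.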

IMO it's good to have options at every level. Having many providers fight for the market is good; it keeps them on their toes and brings prices down. GPT5-mini is at $2/MTok, this is at $1.5/MTok. This is basically "free", in the grand scheme of things. I don't get the negativity.

jameshart · 4m ago
If the Grok brand wasn’t terminally tarnished for you by the ‘mechahitler’ incident, I’m not sure what more it would take.

This is an offering being produced by a company whose idea of responsible AI use involves prompting a chatbot that “You spend a lot of time on 4chan, watching InfoWars videos” - https://www.404media.co/grok-exposes-underlying-prompts-for-...

A lot of people rightly don’t want any such thing anywhere near their code.

matt-p · 5m ago
It does totally ridiculous things, very fast. That's not a good thing.

I imagine it might be good for something really tight, simple, and specific, like making some CRUD endpoints or i18n files, but otherwise..

coder543 · 28m ago
Qwen3-Coder-480B hosted by Cerebras is $2/Mtok (both input and output) through OpenRouter.

OpenRouter claims Cerebras is providing at least 2000 tokens per second, which would be around 10x as fast, and the feedback I'm seeing from independent benchmarks indicates that Qwen3-Coder-480B is a better model.

sdesol · 2m ago
As a bit of a side note, I want to like Cerebras, but using any of the models through OpenRouter that use them has led to too many throttling responses. It's like you can't make more than a few calls per minute. I'm not sure if Cerebras is throttling OpenRouter or if they are throttling everybody.

If somebody from Cerebras is reading this, are you having capacity issues?

aitchnyu · 43m ago
Is 50% of context length considered high performance? Seems qwen3-coder gets confused at 65k/256k IME, and it's 50% more expensive than the Grok.
dlachausse · 1h ago
> No idea why this thread is so negative (also got flagged while I was typing this?)

Grok is owned by Elon Musk. Anything positive that is even tangentially related to him will be treated negatively by certain people here. Additionally, it is an AI coding tool which is seen as a threat to some people’s livelihoods here. It’s a double whammy, so I’m not surprised by the reaction to it at all.

cosmicgadget · 52m ago
Elon aside, Grok has its own reputation issues.
seunosewa · 46m ago
And this model is arguably their least impressive model.
dewey · 1h ago
Claude Code threads are full of excited people so I’m not sure the second part is true.
supriyo-biswas · 1h ago
If we accept the "broken windows" theory, it'd seem that people love to pile onto a thread that already has negativity.

See also the Microsoft threads on HN where everyone threatens to switch to Linux, and by reading them you'd think Linux is finally about to have its infamous glory year on the desktop.

weaksauce · 13m ago
i’d love to switch to linux as a daily driver, but the mac cmd shortcuts for text editing and the generally well thought out text editing in the system make macos more compelling. i’d love to switch over for gaming on the windows computer, but the lack of performance for comparable specs hurts it. drivers are very important.

i’ve seen some that change it for copy and paste, but i don’t think it works for cmd-left/right/up/down, or the option versions of those.

jchw · 51m ago
People really are trying to switch to Linux right now, but it won't really matter if it doesn't stick, and spoiler alert, for most people it probably won't stick as a daily driver. Still, it's an interesting sort of unplanned experiment to watch.
fortyseven · 51m ago
It's kind of fascinating that other "certain people" are so casually dismissive of what a piece of trash Musk is. There is no universe where any money or attention that I have available would go to him or any of his endeavors. I don't care if his latest model is giving away free blowjobs, it's still being boosted and financed by a morally bankrupt man-child who, in case you forgot, was complicit in doing serious damage to this country.

No comments yet

boole1854 · 4h ago
It's interesting that the benchmark they are choosing to emphasize (in the one chart they show and even in the "fast" name of the model) is token output speed.

I would have thought it an uncontroversial view among software engineers that token quality is much more important than token output speed.

eterm · 3h ago
It depends how fast.

If an LLM is often going to be wrong anyway, then being able to try prompts quickly and then iterate on those prompts, could possibly be more valuable than a slow higher quality output.

Ad absurdum, if it could ingest and work on an entire project in milliseconds, then it has much greater value to me than a process which might take a day to do the same, even if the likelihood of success is also strongly affected.

It simply enables a different method of interactive working.

Or it could supply 3 different suggestions in-line while working on something, rather than a process which needs to be explicitly prompted and waited on.

Latency can have critical impact on not just user experience but the very way tools are used.

Now, will I try Grok? Absolutely not, but that's a personal decision due to not wanting anything to do with X, rather than a purely rational decision.

34679 · 3h ago
>If an LLM is often going to be wrong anyway, then being able to try prompts quickly and then iterate on those prompts, could possibly be more valuable than a slow higher quality output.

Before MoE was a thing, I built what I called the Dictator: one strong model working with many weaker ones to achieve a similar result to MoE. But all the Dictator ever got was Garbage In, so guess what came out?

_kb · 3h ago
You just need to scale out more. As you approach infinite monkeys, sorry - models, you'll surely get the result you need.
dingnuts · 51m ago
why's this guy getting downvoted? SamA says we need a Dyson Sphere made of GPUs surrounding the solar system and people take it seriously but this guy takes a little piss out of that attitude and he's downvoted?

this site is the fucking worst

postalcoder · 3h ago
Besides being a faster slot machine, to the extent that they're any good, a fast agentic LLM would be very nice to have for codebase analysis.
fmbb · 24m ago
For 10% less time you can get 10% worse analysis? I don’t understand the tradeoff.
giancarlostoro · 3h ago
> If an LLM is often going to be wrong anyway, then being able to try prompts quickly and then iterate on those prompts, could possibly be more valuable than a slow higher quality output.

Asking any model to do things in steps is usually better too, as opposed to feeding it three essays.

ffsm8 · 3h ago
I thought the current vibe was doing the former to produce the latter and then use the output as the task plan?
giancarlostoro · 3h ago
I don't know what other people are doing, I mostly use LLMs:

* Scaffolding

* Ask it what's wrong with the code

* Ask it for improvements I could make

* Ask it what the code does (amazing for old code you've never seen)

* Ask it to provide architect level insights into best practices

One area where they all seem to fail is lesser-known packages: they tend to reference old functionality that is not there anymore, or never was; they hallucinate. Which is part of why I don't ask them for too much.

Junie did impress me, but it was very slow, so I would love to see a version of Junie using this version of Grok, it might be worthwhile.

ffsm8 · 14m ago
> Ask it what's wrong with the code

That's phase 1. Ask it to "think deeply" (a Claude keyword; only works with the Anthropic models) while doing that. Then ask it to make a detailed plan for solving the issue, write that into current-fix.md, and add clearly testable criteria for when the issue is solved.

Now you manually check whether the criteria sound plausible; if not, its analysis failed and its output was worthless.

But if it sounds good, you can then start a new session and ask it to read the-markdown-file and implement the change.

Now you can plausibility-check the diff, and you're likely done.

But as the sister comment pointed out, agentic coding really breaks apart with large files like you usually have in brownfield projects.

miohtama · 20m ago
I hope in the future tooling and MCP will be better, so agents can directly check what functionality exists in the installed package version instead of hallucinating.
dingnuts · 48m ago
> amazing for old code you've never seen

not if you have too much! a few hundred thousand lines of code and you can't ask shit!

plus, you just handed over your company's entire IP to whoever hosts your model

miohtama · 18m ago
It's a fair trade-off for smaller companies where the IP or the software is a necessary evil, not the main unique value added. It's hard to see what evil anyone would do with crappy legacy code.

The IP risks taken may be well worth the productivity boosts.

jsheard · 3h ago
That's far from the worst metric that xAI has come up with...

https://xcancel.com/elonmusk/status/1958854561579638960

Rover222 · 2h ago
what's wrong with rapid updates to an app?
ori_b · 47m ago
It's like measuring how fast your car can go by counting how often you clean the upholstery.

There's nothing wrong with doing it, but it's entirely unrelated to performance.

cosmicgadget · 50m ago
They aren't a metric for showing you are better than the competition.
CuriouslyC · 42m ago
For agentic workflows, speed and good tool use are the most important things. Agents should use tools for things by design, and that can include reasoning tools and oracles. The agent doesn't need to be smart; it just needs a line to someone who is, one that can give it a hyper-detailed plan to follow.
peab · 3h ago
depends for what.

For autocompleting simple functions (string manipulation, function definitions, etc), the quality bar is pretty easy to hit, and speed is important.

If you're just vibe coding, then yeah, you want quality. But if you know what you're doing, I find having a dumber fast model is often nicer than a slow smart model that you still need to correct a bit, because it's easier to stay in flow state.

With the slow reasoning models, the workflow is more like working with another engineer, where you have to review their code in a PR

ojosilva · 2h ago
After trying the Cerebras free API (not affiliated), which delivers Qwen Coder 480b and gpt-oss-120b at a mind-boggling ~3000 tps, output speed is the first thing I check when considering a model for speed. I just wish Cerebras had a better overall offering on their cloud: usage is capped at 70M tokens / day, and people are reporting that it's easily hit and highly crippling for daily coding.
M4v3R · 3h ago
Speed absolutely matters. Of course if the quality is trash then it doesn't matter, but a model that's on par with Claude Sonnet 4 AND very speedy would be an absolute game changer in agentic coding. Right now you craft a prompt, hit send and then wait, and wait, and then wait some more, and after some time (anywhere from 30 seconds to minutes later) the agent finishes its job.

It's not long enough for you to context switch to something else, but long enough to be annoying and these wait times add up during the whole day.

It also discourages experimentation if you know that every prompt will potentially take multiple minutes to finish. If it instead finished in seconds then you could iterate faster. This would be especially valuable in the frontend world where you often tweak your UI code many times until you're satisfied with it.

defen · 3h ago
> I would have thought it uncontroversial view among software engineers that token quality is much important than token output speed.

We already know that in most software domains, fast (as in, getting it done faster) is better than 100% correct.

6r17 · 3h ago
Tbh I kind of disagree; there are certain use cases where speed would legitimately be much more interesting, such as generating a massive amount of HTML. Though I agree this makes it look like even more of a joke for anything serious.

They reduce the costs though!

jml78 · 3h ago
To a point. If gpt5 takes 3 minutes to output and qwen3 does it in 10 seconds, and the agent can iterate 5 times to finish before gpt5, why do I care if gpt5 one-shot it and qwen took 5 iterations?
wahnfrieden · 3h ago
It doesn’t though. Fast but dumb models don’t progressively get better with more iterations.
dmix · 2h ago
That very much depends on the usecase

Different models for different things.

Not everyone is solving complicated things every time they hit cmd-k in Cursor or use autocomplete, and they can easily switch to a different model when working harder stuff out via longer form chat.

giancarlostoro · 3h ago
I'm more curious if it's based on Grok 3 or what; I used to get reasonable answers from Grok 3. If that's the case, the trick that works for Grok and basically any model out there is to ask for things in order and piecemeal, not all at once. Some models will be decent at the 'all at once' approach, but when I and others have asked in steps, we got much better output. I'm not yet sure how I feel about Grok 4; I have not really been impressed by it.
esafak · 3h ago
I agree. Coding faster than humans can review it is pointless. Between fast, good, and cheap, I'd prioritize good and cheap.

Fast is good for tool use and synthesizing the results.

londons_explore · 3h ago
A a a a a a a a a a a a a a a.

At least this comment was written fast.

furyofantares · 3h ago
Fast can buy you a little quality by getting more inference on the same task.

I use Opus 4.1 exclusively in Claude Code but then I also use zen-mcp server to get both gpt5 and gemini-2.5-pro to review the code and then Opus 4.1 responds. I will usually have eyeballed the code somewhere in the middle here but I'm not fully reviewing until this whole dance is done.

I mean, I obviously agree with you in that I've chosen the slowest models available at every turn here, but my point is I would be very excited if they also got faster because I am using a lot of extra inference to buy more quality before I'm touching the code myself.

dotancohen · 3h ago

  > I use Opus 4.1 exclusively in Claude Code but then I also use zen-mcp server to get both gpt5 and gemini-2.5-pro to review the code and then Opus 4.1 responds.
I'd love to hear how you have this set up.
mchusma · 3h ago
This is a nice setup. I wonder how much it helps in practice? I suspect most of the problems opus has for me are more context related, and I’m not sure more models would help. Speculation on my part.
Workaccount2 · 4h ago
Is this the model that is the "Coding" version of Grok-4 promised when Grok-4 had awful coding benchmarks?

I guess if you cannot do well on benchmarks, you instead pick an easier one to pump up and run with that: speed. Looking online for benchmarks, the first thing that came up was a reddit post from an (obvious) spam account[1] gloating about how amazing it was on a bunch of subs.

[1]https://www.reddit.com/user/Suspicious_Store_137/

RedMist · 3h ago
I've been testing Grok for a few days, and it feels like a major step backward. It randomly deleted some of my code - something I haven't had happen in a long time.

While the top coding models have become much more trustworthy lately, Grok isn't there yet. It doesn't matter if it's fast and/or free; if you can't trust a tool with your code, you can't use it.

ewoodrich · 2h ago
Kilo Code has a free trial of Grok Code Fast 1 and I've had very poor results with it so far. Much less reliable than GPT 5 Mini, which was also faster, ironically.
mwigdahl · 3h ago
Full Self Coding?
ModernMech · 1h ago
Fell Self Coding beta (supervised)
RedMist · 3h ago
No, making edits to an existing codebase.

(If that's what you meant)

pdabbadabba · 3h ago
I think that was just a joke about "Full Self Driving" -- and how it still doesn't work.
innocentoldguy · 1h ago
What do you mean by “it still doesn’t work“? I never drive anymore because my Tesla does such a fine job of doing it for me.
vunderba · 26m ago
To me, "full self driving" means you can hop in the back seat and have a nap. If you have to keep your hands near the wheel and maintain attention to the road then... shrugs not really the same. IMHO we're in the "uncanny valley" of vehicular automation.
rkomorn · 24m ago
> the "uncanny valley" of vehicular automation

I think this is a very good description of where autonomous vehicles are right now.

cebert · 51m ago
I’ve had good long trips with my Model Y where I didn’t need to intervene once. 4+ hour end of summer road trips.
mplewis · 1h ago
Full Self Coding by next year at the latest
esafak · 3h ago
"On the full subset of SWE-Bench-Verified, grok-code-fast-1 scored 70.8% using our own internal harness."

Let's see this harness, then, because third party reports rate it at 57.6%

https://www.vals.ai/models/grok_grok-code-fast-1

hrdwdmrbl · 31m ago
It does still compare well against the others: https://www.vals.ai/benchmarks/swebench-2025-08-27
pdntspa · 17m ago
Pretty sure this was the "stealth" model behind Roo Code Sonic (I saw the name Grok Sonic floating around).

It's a good model for implementing instructions but don't let it try to architect anything. It makes terrible decisions.

cendyne · 3h ago
My experience with 'sonic' during the stealth phase had it do stuff plenty fast, but the quality was slightly off target for some things. It did create tests and then iterate on those tests, but the tests it wrote didn't actually verify intended behavior. They only verified that mocks were called with the intended inputs, while missing the larger picture of how the code is used.
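That failure mode, tests that only assert mocks were called rather than checking behavior, looks roughly like this (a contrived sketch; all names made up):

```python
from unittest import mock

def normalize(s):
    return s.strip().lower()

def greet(name, normalizer=normalize):
    return "hello, " + normalizer(name)

def test_mock_only():
    # The weak kind of test: it only checks that the collaborator was called
    # with the intended input, so it passes even though the mock's return
    # value makes the actual output wrong.
    m = mock.Mock(return_value="  WORLD  ")
    greet("  WORLD  ", normalizer=m)
    m.assert_called_once_with("  WORLD  ")

def test_behavior():
    # A behavior test verifies the observable result instead.
    assert greet("  WORLD  ") == "hello, world"

test_mock_only()
test_behavior()
```

The first test would keep passing even if `greet` were broken, which is exactly the "larger picture" the model missed.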
miohtama · 17m ago
Sounds like it excels at tasks like generating boilerplate.
lvl155 · 21m ago
I seriously question anyone supporting this enterprise with all the motives and agenda behind it.
lysace · 17m ago
Do you similarly question the Chinese AI companies regarding their involvement with the CCP?

(Not a Trump supporter.)

johnfn · 4h ago
Ah, so this is what the Sonic model that Cursor had was. I've been doing this personal bench where I ask each model to create a 3D render of a guy using a laptop on a desk. I haven't written up a post to show the different output from each model, yet, but it's been a fun way to test the capabilities. Opus was probably the best -- Sonic put the guy in the middle of the desk, and the laptop floating over his head. Sonic was very fast, though!
miohtama · 14m ago
Also interesting: Grok Code is not a general-purpose model. It knows coding only.
Incipient · 3h ago
I noticed it pop up on copilot so gave it about two attempts. Neither were fast, and both were incredibly average. Gpt4.1 and 5-mini do a better job, and 5-mini was faster...but I find speed of response varies hugely and seemingly randomly throughout the day.
Demiurge · 3h ago
I've actually seen really good outputs from the regular Grok 4. The issue seemed to be that it didn't explain anything and just made some changes, which, like I said, were pretty good. I never wanted a faster version; I just wanted a bit more feedback and explanation for suggested changes.

What I recently found much more valuable, and why I now prefer GPT-5 over Sonnet 4, is that if I start asking it to give me different architectural choices, it's really quite good at summarizing trade-offs and offering step-by-step navigation toward problem solving. I am liking this process a lot more than trying to "one-shot", or getting tons of code completely rewritten that's unrelated to what I am really asking for. That seems to be a really bad problem with Opus 4.1 Thinking, or even Sonnet Thinking. I don't think it's accurate to rate models on "one-shotting" a problem. Rate a model on how easy it is to work with, as an assistant.

Demiurge · 2h ago
Sometimes it's obvious, but in this case, why are you downmodding my comment? I'm genuinely curious, what am I saying, that is so offensive or wrong?

No comments yet

cft · 3h ago
I have the same experience, except while I agree that GPT-5 is better than Sonnet 4 for architecture and deep thinking, Sonnet 4 still seems to be better for just banging out code when you have a well-defined and a very detailed plan.
mchusma · 3h ago
Fast is cool! Totally has its place. But I use Claude code in a way right now where it’s not a huge issue and quality matters more.

Opus 4.1 is by far the best right now for most tasks. It’s the first model I think will almost always pump out “good code”. I do always plan first as a separate step, and I always ask it for plans or alternatives first and always remind it to keep things simple and follow existing code patterns. Sometimes I just ask it to double check before I look at it and it makes good tweaks. This works pretty well for me.

For me, I found Sonnet 3.5 to be a clear step up in coding, I thought 3.7 was worse, 2.5 pro equivalent, and 4 sonnet equal maybe tiny better than 3.5. Opus 4.1 is the first one to me that feels like a solid step up over sonnet 3.5. This of course required me to jump to Claude code max plan, but first model to be worth that (wouldn’t pay that much for just sonnet).

lostsock · 15h ago
Trying this out now via OpenCode. Seems to be pretty good so far, certainly quick! Free for the next week as well which is a bonus
ru552 · 1h ago
This is the model that was code named "Sonic" in Cursor last week. It received tons of praise. Then Cursor revealed it was a model from xAI. Then everyone hated it. :/ I miss the days where we just liked technology for advancement's sake.

*edit Case in point, downvotes in less than 30 seconds

hu3 · 3h ago
Interesting. Available in VSCode Copilot for free.

https://i.imgur.com/qgBq6Vo.png

I'm going to test it. My bottleneck currently is waiting for agent to scan/think/apply changes.

threeducks · 3h ago
I have been testing it since yesterday in VS Code and it seemed fine so far. But I am also happy with all the GPT-4 variants, so YMMV.
cft · 4h ago
it's free in Cursor till Sept 2. My experience is subpar so far
giancarlostoro · 4h ago
Its focus seems to be on faster responses, which Grok 3 is definitely good at. I have a different approach to LLMs and coding: I want to understand their proposed solutions, not just paste garbled-up code (unless it's scaffolding). If you treat every LLM as a piecemeal thing when designing code (or really when trying to figure out anything) and go step by step, you get better results from most models.
ceroxylon · 2h ago
According to the model card it is extremely fast, can be hijacked 25% of the time, has access to search tools, and has a propensity for dishonesty.

I also think it is optimistic to think the jailbreak percentage will stay at "0.00" after public use, but time will tell.

https://data.x.ai/2025-08-26-grok-code-fast-1-model-card.pdf

disposition2 · 3h ago
This will probably be an unpopular, wet-blanket opinion...

But any time I hear of Grok or xAI, the only thing I can think about is how it's hoovering up water from the Memphis municipal water supply and running natural gas turbines, all to power a chatbot.

Looks like they are bringing even more natural gas turbines online...great!

https://netswire.usatoday.com/story/money/business/developme...

zeropointsh · 21m ago
Have you heard of this at all? I thought it was an interesting way to tackle the problem. https://www.datacenterdynamics.com/en/news/elon-musk-xai-mem...
d0gsg0w00f · 3h ago
Where does OpenAI and Anthropic get their water?
tzs · 14m ago
It's not the water that is the big problem here. It is the gas turbines and the location.

They started operating the turbines without permits and they were not equipped with the pollution controls normally required under federal rules. Worse, they are in an area that already led the state in people having to get emergency treatment for breathing problems. In their first 11 months they became one of the largest polluters in an area already noted for high pollution.

They have since got a permit, and said that pollution controls will be added, but some outside monitors have found evidence that they are running more turbines than the permit allows.

Oh, and of course 90% of the people bearing the brunt of all this local pollution are poor and Black.

onlyrealcuzzo · 3h ago
Why can't it suck up water right from the Mississippi and do Once-Through cooling? Isn't it close? There's definitely more than enough water
bearjaws · 3h ago
Yay more garbage code - faster

A hint to all AI companies, nobody wants quickly generated broken code.

gs17 · 2h ago
Yeah, I tried it in Copilot and it's fast, but I'd rather have a 2x smarter model that takes 10x longer. The competition for "fast" is the existing autocomplete model, not the chat models.
dmix · 2h ago
Why wouldn't you want the option for both?

I haven't used Copilot in a while but Cursor lets you easily switch the model depending on what you're trying to do.

Having options for thinking, normal, fast covers every sort of problem. GPT-5 doesn't let you choose which IMO is only helpful for non-IDE type integrations, although even in ChatGPT it can be annoying to get "thinking" constantly for simple questions.

gs17 · 1h ago
I have the option for either, but it's an option I'll never choose. My issue with Copilot wasn't speed, it's quality. The only thing that has to be fast is the text-completion part, which Grok isn't replacing. The code chat/agent part needs to focus on actually being able to do things.
guluarte · 1h ago
Im doing 1000 calculation per second and they're all wrong
echelon · 3h ago
AI coding tools are amazing and if you don't use them, that's fine. But lots of people, myself included, are finding tremendous utility in these models.

I'm getting 30-50% more code changes in per day now. Yesterday I plumbed six slightly mechanical, but still major, changes through our schema, several microservice layers, API client libraries, and client code. I wrote down the change sites ahead of time to track progress: 54, all requiring individual business logic. This would have been tedious without tab complete.

And that's not the only thing I did yesterday.

I wouldn't trust these tools with non-developers, but in our hands they're an exoskeleton. I like them like I like my vim movements.

A similar analogy can be made for the AI graphics design and editing models. They're extremely good time saving tools, but they still require a human that knows what they're doing to pilot them.

mplewis · 1h ago
This is a non-sequitur comment.
echelon · 5m ago
I provided anecdotal evidence, but if you want more I can "show, don't tell" it.

Here's YC's pg:

https://imgur.com/a/internet-DWzJ26B

I'm not an animator and I made that with a few simple tools.

It has a lot of errors that I didn't take time to correct since it was just a silly meme, but do you see how accessible all of this is?

When people with intention and taste use these tools, the results are powerful. (I won't claim that the above videos demonstrate this, but I can certainly do good work with the tools.)