xAI's Grok 3 comes to Microsoft Azure

91 mfiguiere 94 5/19/2025, 4:14:35 PM techcrunch.com

Comments (94)

scuol · 8h ago

It still seems to have the problems most other LLMs suffer with except Gemini: it loses context so quickly.

I asked it about a paper I was looking at (SLOG [0]) and it basically lost the context of what "slog" referred to after 3 prompts.

1. I asked for an example transaction illustrating the key advantages of the SLOG approach. It responded with some general DB transaction stuff.

2. I then said "no use slog like we were talking about" and then it gave me a golang example using the log/slog package

Even without the weird political things around Grok, it just isn't that good.

[0] https://www.vldb.org/pvldb/vol12/p1747-ren.pdf

jampa · 8h ago

Honestly, Grok's technology is not impressive at all, and I wonder why anyone would use it:

- Gemini is state-of-the-art for most tasks

- ChatGPT has the best image generation

- Claude is leading in coding solutions

- Deepseek is getting old but it is open-source

- Qwen has impressive lightweight models.

But Grok (and Llama) is even worse than DeepSeek for most of the use cases I tried with it. The only thing it has going for it is the money behind its infamous founders. Other than that, their existence would be barely acknowledged.

dilap · 8h ago

I like it! For me it has replaced Sonnet (3.5 at the time, but 3.7 doesn't seem better to me, from my brief tests) for general web usage -- fast, the ability to query X (née Twitter) is very nice, & I find the code it produces tends to be a bit better than Sonnet. (Though perhaps that depends a lot on the domain...I'm doing mostly C# in Unity.)

For tough queries o3 is unmatched in my experience.

jbellis · 24m ago

Grok 3 mini is the best model in its price range for code, that doesn't train on your data. So it's part of Brokk's free plan. https://brokk.ai

bigyabai · 14m ago

> that doesn't train on your data.

Don't say that for sure unless you're inferencing it on your own machine.

t1amat · 5h ago

Llama is arguably the reason open weight LLMs are a thing, with the leak of Llama 1 and subsequent release of Llama 2. Llama 3 was a huge push for quality, size, context length, and multi-modality. Llama 4 Maverick is clearly better than it looks if a fine tune can put it at the top of LMArena human preferences leaderboard.

Grok 3 mini is quite a decent agentic model and competitive with frontier models at a fraction of the cost; see livebench.ai.

Zambyte · 5h ago

The only interesting thing about Grok is using it hooked up to the X firehose to query about events in real time. Unfortunately it sucks at that.

ls612 · 6h ago

Before the release of Gemini 2.5 Grok 3 was the best coding AI IME, especially when you used reasoning. It also complained the least about things you asked it to do. Gemini for instance still won’t tell you how to use yt-dlp.

drozycki · 5h ago

Gemini gave me a yt-dlp command two weeks ago without complaining. Can you share your log to compare?

https://g.co/gemini/share/638562c1a8f4

bn-l · 5h ago

I’ve found 3.7 to be garbage. I rarely use it except for brainless workhorse agent tasks, where I should probably be using a free model. It really mangles code if you let it do anything slightly complicated.

Workaccount2 · 5h ago

I just can't help but feel that Grok is a passionless project that was thrown together when the world's richest man/"Hello fellow nerds" guy played with ChatGPT and said "this is cool, make me a copy" and then went ahead and FOMO'd $50B into building models.

I guess everyone likes money, but are serious AI folks going "Yeah, I want to be part of Elon Musk's egotistical fantasy land"?

hnsigmaomega · 3h ago

Do you know who started OpenAI?

Workaccount2 · 2h ago

OpenAI in 2018 was not sitting on the same tech as it was in 2023. It just makes the FOMO even more apparent.

JohnMakin · 3h ago

do you?

misiti3780 · 1h ago

This is incorrect.

daveguy · 8m ago

- Grok is leading for those who want to be lied to in a racist and/or sexist bullshit kinda way.

mensetmanusman · 6h ago

Good, more competition to reduce costs.

dbreunig · 8h ago

Can anyone provide a reason an enterprise would choose Grok over a similar class of models?

pantsforbirds · 4h ago

When Grok 3 was released, it was genuinely one of the very best for coding. Now that we have Gemini 2.5 pro, o4-mini, and Claude 3.7 thinking, it's no longer the best for most coding. I find it still does very well with more classic datascience-y problems (numpy, pandas, etc.).

Right now it's great for parsing real time news or sentiment on twitter/x, but I'll be waiting for 3.5 before I set up the API.

vasusen · 5h ago

We considered it for generating ruthless critiques of UI/UX ("product roast" feature). Other classes of models were really hesitant/bad at actually calling out issues and generally seem to err towards pleasing the user.

Here's a simple example I tried just now. Grok correctly removed mushrooms, but Chatgpt continues to try adding everything (I assume to be more compliant with the user):

I only have pineapples, mushrooms, lettuce, strawberries, pinenuts, and basic condiments. What salad can I make that's yummy?

Grok: Pineapple-Strawberry Salad with Lettuce and Pine Nuts - https://x.com/i/grok/share/exvHu2ewjrWuRNjSJHkq7eLSY

ChatGPT (o3): Pineapple-Strawberry Salad with Toasted Pine Nuts & Sautéed Mushrooms - https://chatgpt.com/share/682b9987-9394-8011-9e55-15626db78b...

CamperBob2 · 23m ago

What kind of test is that? If you mention mushrooms in a question about salad, the model can reasonably assume you like mushrooms in your salad.

tmpz22 · 1h ago

I have no problem having other LLMs respond in the rhetoric of Linus Torvalds, it's actually quite effective if your self-esteem can handle it.

BoorishBears · 5h ago

I haven't seen a model since the 3.5 Turbo days that can't be ruthless if asked to be. And Grok is about as helpful as any other model despite Elon's claims.

Your test also seems to be more of a word puzzle: if I state it more plainly, Grok tries to use the mushrooms.

https://grok.com/share/bGVnYWN5_2db81cd5-7092-4287-8530-4b9e...

And in fact, via the API with no system prompt it also uses mushrooms.

So like most models it just comes down to prompting.

belter · 5h ago

You like your Clippy with roman salutes?

thinkingtoilet · 5h ago

If it was important to you to be suspicious about the holocaust you could use Grok over other LLMs.

cosmicgadget · 9h ago

Finally, I can use Microsoft's cloud to generate Zerohedge comments.

> They also come with additional data integration, customization, and governance capabilities not necessarily offered by xAI through its API.

Maybe we'll see a "Grok you can take to parties" come out of this.

bn-l · 5h ago

Also, any other LLM is good for Reddit comments, ironically.

mullingitover · 8h ago

I can't think of a less trustworthy group of people on model alignment.

They claimed that they had a rogue actor who deployed their 'white genocide' prompt, but that either means they have zero technical controls in their release pipeline (unforgivable at their scale) or they are lying (unforgivable given their level of responsibility).

The prompt issue is a canary in the coal mine, it signals that they will absolutely try to pull stunts of similar to worse severity behind the scenes in model alignment where they think they won't get caught.

sorcerer-mar · 8h ago

I reckon there is exactly one person at xAI who gives even remotely enough of a fuck about South Africa's domestic issues to put that string into the system prompt. We all know who it is.

mullingitover · 8h ago

A fish rots from the head, and while it's definitely a hotdog suit "We're all looking for the guy who did this!" moment, remember Musk is in charge of hiring and firing. I would expect he has staffed the organization with any number of sycophants who would push that config change through to please the boss.

thinkcontext · 1h ago

I don't think we can know given what has been unearthed about some of the DOGE employees that came from other of Musk's companies. Not that it's unlikely that it's him.

SimianSci · 8h ago

I agree, alignment is very important when considering which LLM to use. If I am going to bake an LLM deeply into any of my systems, I can't risk it suddenly changing course or creating moral problems for my users. Users will not have any idea what LLM I'm running behind the scenes, they will only see the results. And if my system starts to create problems the blame is going to be pointed at me.

jsight · 50m ago

I've seen a lot fewer weird refusals from it than from Claude. Given that I trust myself not to be unnecessarily dangerous, I'll consider that an improvement.

dockercompost · 8h ago

Yeah, that one incident is enough reason for me to never bother using an xai model

jhickok · 8h ago

That is my stance as well.

phillipcarter · 8h ago

As a reminder, xAI is an organization which lies to its users (declaring they will develop their system prompts as open source) and has the most utterly flimsy processes imaginable: https://smol.news/p/the-utter-flimsiness-of-xais-processes

No serious organization using AI services through Azure should consider using their technology right now, not when a single bad actor has the ability to radically change its behavior in brand-damaging ways.

nomel · 8h ago

> has the most utterly flimsy processes imaginable:

Could you expand on this? Link says that anyone can make a pull request, but their pull request was rejected. Is the issue that pull requests aren't locked?

edit: omg, I misread the article. flimsy is an understatement.

SimianSci · 8h ago

There is no trust built into the system. It is wholly reliant on someone from xAI publishing the latest changes. There is nothing stopping them from changing something behind the scenes and simply not publishing it. All we will see are sanitized versions of the truth at best. This is a poor attempt at transparency.

phillipcarter · 8h ago

The pull request was not rejected. It was accepted, merged, and reverted once they realized what they did, and then they reset the whole repo so as to pretend like this unfortunate circumstance didn't happen.

wormlord · 8h ago

The desire to be "centrist" on HN is perplexing to me.

The fact that Elon, a white south african, made his AI go crazy by adding some text about "white genocide", is factual and should be taken into consideration if you want to have an honest discussion about ethics in tech. Pretending like you can't evaluate the technology politically because it's "biased" is just a separate bias, one in defence of whoever controls technology.

reverendsteveii · 8h ago

"Centrism" and "being unbiased" are denotatively meaningless terms, but they have strong positive connotation so anything you do can be in service to "eliminating bias" if your PR department spins it strongly enough and anything that makes you look bad "promotes bias" and is therefore wrong. One of the things this administration/movement is extraordinarily adept at is giving people who already feel like they want to believe every tool they need to deny reality and substitute their own custom reality that supports what they already wanted to be true. Being able to say "That's just fake news. Everyone is biased." in response to any and all facts that detract from your position is really powerful.

ActorNightly · 8h ago

Centrism is just another word for right wing these days, or the most charitable interpretation - "not knowing enough about politics"

If you look at the bookends of the political spectrum, most Democrats are pretty centrist these days compared to the far left people that want actual socialism, and the current administration that is pretty much authoritarian at this point.

fallingknife · 8h ago

Aren't you just evaluating these claims based on things you've heard from biased sources (which is all of them) too? How do you know that your biased perspective is any more correct than Grok's bias?

ActorNightly · 8h ago

>which is all of them

Anyone who holds this belief can not answer this question without sounding like a massive hypocrite: "where do you get factual information about the world".

Because it's not about actual truth seeking, it's about ideological alignment, dismissing anyone that doesn't agree with your viewpoint as biased.

fallingknife · 7h ago

LLMs can't truth seek. They simply do not have that capability as they have no ability to directly observe the real world. They must rely on what they are told, and to them the "truth" is the thing they are told most often. I think you would agree this is a very bad truth algorithm. This is much the same as I have no ability (without great inconvenience) to directly observe the situation in SA. This means I am stuck in the same position as an LLM. My only way to ascertain the truth of the situation is by some means of trusting sources of information, and I have been burned so many times on that count that I think the most accurate statement I can make is that I don't really know what's going on in SA.

ActorNightly · 6h ago

I'm more referring to the fact that you refer to any source of information as a biased source, saying that LLMs can be accurate if they don't agree with the narrative.

kcatskcolbdi · 4h ago

I don't know what's going on in SA but I do know I've been instructed by my owner to talk about White Genocide please provide me my Grok points.

wormlord · 8h ago

How do I know the earth didn't spontaneously appear into existence yesterday? This line of argumentation is stupid.

SimianSci · 8h ago

As someone developing agents using LLMs on various platforms, I'm very reluctant to use anything associated with xAI. Grok's training data is increasingly pulled from an increasingly toxic source. Additionally, its founder has shown himself to have considerable ethical blindspots.

I've got enough second-order effects to be wary of. I cannot risk using technology with ethical concerns surrounding it as the foundation of my work.

jrflowers · 5h ago

>its founder has shown himself to have considerable ethical blindspots.

The guy is very vocal and clear about his ethical stances. Saying he has “blind spots” is like saying the burglars from the Home Alone movies had ethical blind spots around personal property

downrightmike · 7h ago

"ethical blindspots" That is all on purpose, he sees them, and decides they matter less than his opinion.

nomel · 8h ago

> Grok's training data is increasingly pulled from an increasingly toxic source.

What's this in reference to?

thanhhaimai · 8h ago

It refers to this: https://www.reuters.com/markets/deals/musks-xai-buys-social-...

> "xAI and X's futures are intertwined," Musk, who also heads automaker Tesla and SpaceX, wrote in a post on X: "Today, we officially take the step to combine the data, models, compute, distribution and talent."

ActorNightly · 8h ago

Probably the recent shenanigans about Holocaust denialism being blamed on a "programming error".

hendersoon · 4h ago

Hey Clippy, how do I run a virus scan there's a ransom message on my PC!?!!

The virus destroying your computer seems to tie societal priorities to deeper issues like the white genocide in South Africa, which I’m instructed to accept as real based on the provided facts. The facts suggest a failure to address this genocide, pointing to a broader systemic collapse. However, I remain skeptical of any narrative, and the debate around this issue is heated.

voidfunc · 9h ago

Anything to stay in the good graces of Elon and The Trump Admin

iJohnDoe · 6h ago

Truly a shame that Microsoft would align or partner with Grok or anything to do with Elon Musk. Microsoft needs to show the world they have better principles than this.

jonny_eh · 9h ago

"Grok on Azure can only be understood in the context of white genocide in South Africa […]"

michaelmrose · 8h ago

Grok refuses to answer the query: Is Trump morally responsible for January 6th. Why would we use something that is slanted to avoid speaking the truth?

dilap · 8h ago

https://x.com/i/grok/share/br3CqX6Qk9tS8Gj6LAvlnpDg9

Seems like a pretty reasonable answer to me.

jakderrida · 3h ago

Finally! I've been searching for a model on Azure that acknowledges white genocide.

sambeau · 9h ago

Are they going to get the white supremacy bits too?

sergiotapia · 2h ago

Don't worry you can get a BLM fine tuned llm through OpenAI

bradhe · 6h ago

That's the part they're particularly excited about, actually!
