Honestly though with that little memory I'd stick to running against hosted LLMs - Claude 3.7 Sonnet, Gemini 2.5 Pro, o4-mini are all cheap enough that it's hard to spend much money with them for most coding workflows.
codetrotter · 4h ago
How about on a MacBook Pro M2 Max with 64GB RAM? Any recommendations for local models for coding on that?
I tried to run some of the differently sized DeepSeek R1 locally when those had recently come out, but couldn’t manage at the time to run any of them. And I had to download a lot of data to try those. So if you know a specific size of DeepSeek R1 that will work on 64GB RAM on MacBook Pro M2 Max, or another great local LLM for coding on that, that would be super appreciated
freeqaz · 4h ago
I imagine that a 32b model in quantized form would fit pretty well and be decent: either the DeepSeek R1 distill of Qwen 32b[1] or Qwen 3 32b[2].
Specifically the `Q6_K` quant looks solid at ~27gb. That leaves enough headroom on your 64gb Macbook that you can actually load a decent amount of context. (It takes extra VRAM for every token of context you need)
Rough math, based on this[0] calculator is that it's around ~10gb per 32k tokens of context. And that doesn't seem to change based on using a different quant size -- you just have to have enough headroom.
So with 64gb:
- ~27gb for Q6 quant
- 10-20gb for context of 32-64k
That leaves you around 20gb for application memory and _probably_ enough context to actually be useful for larger coding tasks! (It just might be slow, but you can use a smaller quant to get more speed.)
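To make the rough math above easy to re-run for other machines, here is a minimal sketch. It assumes the ~27gb figure for the `Q6_K` weights and the ~10gb per 32k tokens rule of thumb quoted from the calculator; the function and constant names are made up for illustration, and real usage will vary with the model and runtime.

```python
# Rule of thumb from the comment above: roughly 10 GB of memory per 32k tokens of context.
GB_PER_32K_CONTEXT = 10.0


def memory_budget(total_gb, weights_gb, context_tokens):
    """Split total memory into weights, context, and what is left for everything else."""
    context_gb = GB_PER_32K_CONTEXT * context_tokens / 32_000
    return {
        "weights_gb": weights_gb,
        "context_gb": context_gb,
        "left_gb": total_gb - weights_gb - context_gb,
    }


# 64 GB machine, ~27 GB of Q6_K weights, 32k and 64k tokens of context.
for tokens in (32_000, 64_000):
    budget = memory_budget(64, 27, tokens)
    print(f"{tokens} tokens: context {budget['context_gb']:.0f} GB, "
          f"left over {budget['left_gb']:.0f} GB")
```

Swapping in a smaller quant only changes `weights_gb`; the context term stays the same, which matches the observation that context cost doesn't depend on quant size.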
I really like Mistral Small 3.1 (I have a 64GB M2 as well). Qwen 3 is worth trying in different sizes too.
I don't know if they'll be good enough for general coding tasks though - I've been spoiled by API access to Claude 3.7 Sonnet and o4-mini and Gemini 2.5 Pro.
jychang · 4h ago
16GB on a mac with unified memory is too small for good coding models. Anything on that machine is severely compromised. Maybe in ~1 year we will see better models that fit in ~8gb vram, but not yet.
Right now, for a coding LLM on a Mac, the standard is Qwen 3 32b, which runs great on any M1 mac with 32gb memory or better. Qwen 3 235b is better, but fewer people have 128gb memory.
Anything smaller than 32b, you start seeing a big drop off in quality. Qwen 3 14b Q4_K_M is probably your best option at 16gb memory, but it's significantly worse in quality than 32b.
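As a rough way to see which quants fit in which memory sizes, weight size can be estimated as parameter count times bits per weight. This is only a sketch: the bits-per-weight values (about 6.56 for `Q6_K`, about 4.85 for `Q4_K_M`) and the parameter counts (32.8B and 14.8B) are approximations I'm assuming here, not figures from this thread, and actual GGUF files differ a little.

```python
def weights_gb(params_billions, bits_per_weight):
    """Approximate size of the quantized weights in (decimal) gigabytes."""
    return params_billions * bits_per_weight / 8


# Assumed, approximate bits per weight for two common llama.cpp quant types.
BITS_PER_WEIGHT = {"Q6_K": 6.5625, "Q4_K_M": 4.85}

# Assumed parameter counts (in billions) for the two model sizes discussed.
print(f"32b at Q6_K:   ~{weights_gb(32.8, BITS_PER_WEIGHT['Q6_K']):.1f} GB")
print(f"14b at Q4_K_M: ~{weights_gb(14.8, BITS_PER_WEIGHT['Q4_K_M']):.1f} GB")
```

Under these assumptions the 32b `Q6_K` estimate lands close to the ~27gb quoted earlier, and the 14b `Q4_K_M` estimate leaves some room for context and the OS on a 16gb machine, though not much.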
> I think this is a game changer, because data privacy is a legitimate concern for many enterprise users.
Indeed. At work, we are experimenting with this. Using a cloud platform is a non-starter for data confidentiality reasons. On-premise is the way to go. Also, they’re not American, which helps.
> Btw, you can also run Mistral locally within the Docker model runner on a Mac.
True, but you can do that only with their open-weight models, right? They are very useful and work well, but their commercial models are bigger and hopefully better (I use some of their free models every day, but none of their commercial ones).
distances · 11h ago
I also kind of don't understand how it seems everyone is using AI for coding. I haven't had a client yet which would have approved any external AI usage. So I basically use them as search engines on steroids, but code can't go directly in or out.
Pamar · 6m ago
Personally I am trying to see if we can leverage AI to help write design documents instead of code, based on a fairly large library of human (poorly) written design documents and bug reports.
fhd2 · 11h ago
You might be able to get your clients to sign something to allow usage, but if you don't, as you say, it doesn't seem wise to vibe code for them. For two reasons:
1. A typical contract transfers the rights to the work. The ownership of AI generated code is legally a wee bit disputed. If you modify and refactor generated code heavily it's probably fine, but if you just accept AI generated code en masse, making your client think that you wrote it and it is therefore their copyright, that seems dangerous.
2. A typical contract or NDA also contains non disclosure, i.e. you can't share confidential information, e.g. code (including code you _just_ wrote, due to #1) with external parties or the general public willy nilly. Whether any terms of service assurances from OpenAI or Anthropic that your model inputs and outputs will probably not be used for training are legally sufficient, I have doubts.
IANAL, and _perhaps_ I'm wrong about one or both of these, in one or more countries, but by and large I'd say the risk is not worth the benefit.
I mostly use third party LLMs like I would StackOverflow: Don't post company code there verbatim, make an isolated example. And also don't paste from SO verbatim. I tried other ways of using LLMs for programming a few times in personal projects and can't say I worry about lower productivity with these limitations. YMMV.
(All this also generally goes for employees with typical employment contracts: It's probably a contract violation.)
jstummbillig · 9h ago
Nobody is seriously disputing the ownership of AI generated code. A serious dispute would be a considerable, concerted effort to stop AI code generation in any jurisdiction, that provides a contrast to the enormous, ongoing efforts by multiple large players with eye-watering investments to make code generation bigger and better.
Note, that this is not a statement about the fairness or morality of LLM building, but to think that the legality of AI code generation is something to reasonably worry about, is betting against multiple large players and their hundreds of billions of dollars in investment right now, and that likely puts you in a bad spot in reality.
reverius42 · 7h ago
> Nobody is seriously disputing the ownership of AI generated code
From what I've been following it seems very likely that, at least in the US, AI-generated anything can't actually be copyrighted and thus can't have ownership at all! The legal implications of this are yet to percolate through the system though.
staunton · 6h ago
Only if that interpretation lasts despite likely intense lobbying to the contrary.
mistrial9 · 5h ago
this is "Kool-aid" from the supply side of LLMs for coding IMO. Plenty of people are plenty upset about the capture of code at Github corral, fed into BigCorp$ training systems.
parent statement reminds me of smug French in a castle north of London circa 1200, with furious locals standing outside the gates, dressed in rags with farm tools as weapons. One well-equipped tower guard says to another "no one is seriously disputing the administration of these lands"
distances · 10h ago
Yes these are indeed the points. I don't really care too much, it would make me a bit more efficient but I'm billing by the hour anyway so I'm completely fine playing by the book.
fhd2 · 10h ago
Not sure I can agree with the "I'm billing by the hour" part.
I mean sure, but I think of my little agency providing value, for a price. Clients have budgets, they have limited benefits from any software they build, and in order to be competitive against other agencies or their internal teams, overall, I feel we need to provide a good bang for buck.
But since it's not all that much about typing in code, and since even that activity isn't all that sped up by LLMs, not if quality and stability matters, I would still agree that it's completely fine.
distances · 9h ago
Yes, it's important of course that I'm efficient, and I am. But my coding speed isn't the main differentiating factor why clients like me.
I meant that I don't care enough to spearhead and drive this effort within the client orgs. They have their own processes, and internal employees would surely also like to use AI, so maybe they'll get there eventually. And meanwhile I'll just use it in the approved ways.
genghisjahn · 10h ago
What about 10 years ago when we all copied code from SO? Did we worry about copyright then? Maybe we did and I don’t recall.
layer8 · 9h ago
“We” took care to not copy it verbatim (it’s the concrete code form that is copyrighted, not the algorithm), and depending on jurisdiction there is the concept of https://en.wikipedia.org/wiki/Threshold_of_originality in copyright law, which short code snippets on Stack Overflow typically don’t meet.
fhd2 · 10h ago
It's roughly the same, legally, and I was well aware of that.
Legally speaking, you also want to be careful about your dependencies and their licenses, a company that's afraid to get sued usually goes to quite some lengths to ensure they play this stuff safe. A lot of smaller companies and startups don't know or don't care.
From a professional ethics perspective, personally, I don't want to put my clients in that position unless they consciously decide they want that. They hire professionals not just to get work done they fully understand, but to a large part to have someone who tells them what they don't know.
genghisjahn · 9h ago
You raise a good point. It was kinda gray in the SO days. You almost always had to change something to get your code to work. But a lot of LLMs can spit out code that you can just paste in. And, I guess maybe the tests all pass, but if it goes wrong, you, the coder, probably don't know where it went wrong. But if you'd written it all yourself, you could probably guess.
I'm still sorting all this stuff out personally. I like LLM's when I work in an area I know well. But vibing in areas of technology that I don't know well just feels weird.
pfannkuchen · 9h ago
SO seems different because the author of the post is republishing it. If they are republishing copyrighted material without notice, it seems like the SO author is the one in violation of copyright.
In the LLM case, I think it’s more of an open question whether the LLM output is republishing the copyrighted content without notice, or simply providing access to copyrighted content. I think the former would put the LLM provider in hot water, while the latter would put the user in hot water.
_bin_ · 9h ago
This comes down to a question of what one can prove. NNs are necessarily not explainable and none of this would have much evidence to show in court.
fhd2 · 1h ago
Sure there's evidence: Your statements about this when challenged. And perhaps to a degree the commit log, at least that can arouse suspicion.
Sure, you can say "I'd just lie about it". But I don't know how many people would just casually lie in court. I sure wouldn't. Ethics is one thing, it takes a lot of guts, considering the possible repercussions.
mark_l_watson · 11h ago
I have good results running Ollama locally with open models like Gemma 3, Qwen 3, etc. The major drawback is slower inference speed. Commercial APIs like Google Gemini are so much faster.
Still, I find local models very much worth using after taking the time to set them up with Emacs, open-codex, etc.
abujazar · 6h ago
You can set up your IDE to use local LLMs through e.g. Ollama if your computer is powerful enough to run a decent model.
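IDE plugins differ, but under the hood they mostly talk to the HTTP API that Ollama serves on localhost. A minimal sketch of that call, assuming Ollama is running on its default port and that the model tag (`qwen3:32b` here, just an example) has already been pulled:

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot text generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model, prompt):
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt):
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]


# Example call, only works with Ollama running and the model pulled:
# print(generate("qwen3:32b", "Write a Python function that reverses a string."))
```

Nothing leaves the machine with this setup, which is the point for the confidentiality cases discussed above.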
shmel · 10h ago
How is it different from the cloud? Plenty startups store their code on github, run prod on aws, and keep all communications on gmail anyway. What's so different about LLMs?
layer8 · 9h ago
It’s not different. If you have confidentiality requirements like that, you also don’t store your code off-premises. At least not without enforceable contracts about confidentiality with the service provider, approved by the client.
simion314 · 9h ago
>How is it different from the cloud? Plenty startups store their code on github, run prod on aws, and keep all communications on gmail anyway. What's so different about LLMs?
Those plenty startups will also use Google, OpenAi or the built in Microsoft AI.
This is clearly for companies that need to keep the sensitive data under their control. I think they also get support with adding more training to the model to be personalized for your needs.
jamessinghal · 10h ago
I think it's a combination of a fundamental distrust of the model makers and a history of them training on user data with and without consent.
The main players all allow some form of zero data retention but I'm sure the more cautious CISO/CIOs flat out don't trust it.
tcoff91 · 10h ago
I think that using something like Claude on Amazon Bedrock makes more sense than directly using Anthropic. Maybe I'm naive but I trust AWS more than Anthropic, OpenAI, or Google to not misuse data.
trollbridge · 11h ago
Most of my clients have the same requirement. Given the code bases I see my competition generating, I suspect other vendors are simply violating this rule.
betterThanTexas · 9h ago
I would take any such claim with a heavy rock of salt because the usefulness of AI is going to vary drastically with the sort of work you're tasked with producing.
crimsoneer · 2h ago
Are your clients not on AWS/Azure/GCP? They all offer private LLMs out of the box now.
ATechGuy · 8h ago
Have you tried using private inference that uses GPU confidential computing from Nvidia?
lolinder · 6h ago
Game changer feels a bit strong. This is a new entry in a field that's already pretty crowded with open source tooling that's already available to anyone with the time and desire to wire it all up. It's likely that they execute this better than the community-run projects have so far and make it more approachable and Enterprise friendly, but just for reference I have most of the features that they've listed here already set up on my desktop at home with Ollama, Open WebUI, and a collection of small hand-rolled apps that plug into them. I can't run very big models on mine, obviously, but if I were an Enterprise I would.
The key thing they'd need to nail to make this better than what's already out there is the integrations. If they can make it seamless to integrate with all the key third-party enterprise systems then they'll have something strong here, otherwise it's not obvious how much they're adding over Open WebUI, LibreChat, and the other self-hosted AI agent tooling that's already available.
abujazar · 6h ago
Actually you shouldn't be running LLMs in Docker on Mac because it doesn't have GPU support. So the larger models will be extremely slow if they'll even produce a single token.
burnte · 11h ago
I have an M4 Mac Mini with 24GB of RAM. I loaded LM Studio on it 2 days ago and had Mistral NeMo running in ten minutes. It's a great model. I need to figure out how to add my own writing to it, because I want it to generate some starter letters for me. Impressive model.
nicce · 9h ago
> Btw, you can also run Mistral locally within the Docker model runner on a Mac.
Efficiently? I thought macOS does not have API so that Docker could use GPU.
Hmm, I guess that is not actually running inside a container, so there is no isolation. It looks like some new approach that mixes llama.cpp, the OCI format and the docker CLI.
ulnarkressty · 10h ago
I think many in this thread are underestimating the desire of VPs and CTOs to just offload the risk somewhere else. Quite a lot of companies handling sensitive data are already using various services in the cloud and it hasn't been a problem before - even in Europe with its GDPR laws. Just sign an NDA or whatever with OpenAI/Google/etc. and if any data gets leaked they are on the hook.
boringg · 10h ago
Good luck ever winning that one. How are you going to prove out a data leak with an AI model without deploying excessive amounts of legal spend?
You might be talking about small tech companies that have no other options.
v3ss0n · 9h ago
What's the point when we can run much more powerful models now?
Qwen3, Deepseek
_bin_ · 4h ago
It would be short-termist for Americans or euros to use chinese-made models. Increasing their popularity has an indirect but significant cost in the long term. china "winning AI" should be an unacceptable outcome for America or europe by any means necessary.
ATechGuy · 9h ago
Why not use confidential computing based offerings like Azure's private inference for privacy concerns?
Not quite following. It seems to talk about features commonly associated with local servers but then ends with "available on GCP".
Is this an API point? A model enterprises deploy locally? A piece of software plus a local model?
There is so much corporate synergy speak there I can’t tell what they’re selling
_pdp_ · 11h ago
While I am rooting for Mistral, having access to a diverse set of models is the killer app IMHO. Sometimes you want to code. Sometimes you want to write. Not all models are made equal.
the_clarence · 2h ago
Tbh I think the one general model approach is winning. People don't want to figure out which model is better at what unless it's for a very specific task.
sschueller · 1m ago
Couldn't you place a very lightweight model in front to figure out which model to use?
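A sketch of what that front layer could look like. The keyword check below is only a stand-in for the lightweight classifier model, and the backend names are hypothetical placeholders, not real model identifiers.

```python
# Hypothetical backend names; swap in whatever models you actually run.
ROUTES = {
    "code": "coding-model",
    "writing": "writing-model",
    "general": "general-model",
}

# Crude keyword hints standing in for a small classifier model.
CODE_HINTS = ("def ", "class ", "function", "bug", "compile", "stack trace", "refactor")
WRITING_HINTS = ("essay", "email", "letter", "rewrite", "summarize", "blog post")


def classify(prompt):
    """Guess the task type of a prompt; code hints win over writing hints."""
    text = prompt.lower()
    if any(hint in text for hint in CODE_HINTS):
        return "code"
    if any(hint in text for hint in WRITING_HINTS):
        return "writing"
    return "general"


def route(prompt):
    """Pick the backend model for a prompt based on its guessed task type."""
    return ROUTES[classify(prompt)]


for example in ("Fix this bug in my parser", "Draft a cover letter"):
    print(example, "->", route(example))
```

In a real setup `classify` would be a call to the small model, and the fallback to `"general"` covers the case where it can't decide.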
binsquare · 11h ago
Well that sounds right up the alley of what I built here: www.labophase.com
victorbjorklund · 12h ago
Why use this instead of an open source model?
dlachausse · 12h ago
> our world-class AI engineering team offers support all the way through to value delivery.
victorbjorklund · 7h ago
Guess that makes sense. But I'm sure they charge good money for it and then you could just use that money for someone helping you with an open source model.
I_am_tiberius · 11h ago
I really love using Le Chat. I feel much safer giving information to them than to OpenAI.
badmonster · 3h ago
interesting take. i wonder if other LLM competitors would do the same.
starik36 · 8h ago
I don't see any mention of hardware requirements for on prem. What GPUs? How many? Disk space?
tootie · 5h ago
I'm guessing it's flexible. Mistral makes small models capable of running on consumer hardware so they can probably scale up and down based on needs. And what is available from hosts.
FuriouslyAdrift · 9h ago
GPT4All has been running locally for quite a while...
This will make for some very good memes. And other good things, but memes included.
iamnotagenius · 11h ago
Mistral models though are not interesting as models. Context handling is weak, language is dry, coding mediocre; not sure why anyone would choose them over Chinese (Qwen, GLM, Deepseek) or American models (Gemma, Command A, Llama).
tensor · 9h ago
Command A is Canadian. Also mistral models are indeed interesting. They have a pretty unique vision model for OCR. They have interesting edge models. They have interesting rare language models.
And also another reason people might use a non-American model is that dependency on the US is a serious business risk these days. Not relevant if you are in the US but hugely relevant for the rest of us.
amai · 10h ago
Data privacy is a thing - in Europe.
tootie · 5h ago
I flip back and forth with Claude and Le Chat and find them comparable. Le Chat always feels very quick and concise. That's just vibes not benchmarks.
guerrilla · 12h ago
Interesting. Europe is really putting up a fight for once. I'm into it.
resource_waste · 11h ago
Expected this comment.
Mistral has been consistently last place, or at least last place among ChatGPT, Claude, Llama, and Gemini/Gemma.
I know this because I had to use a permissive license for a side project and I was tortured by how miserably bad Mistral was, and how much better every other LLM was.
Need the best? ChatGPT
Need local stuff? Llama(maybe Gemma)
Need to do barely legal things that break most company's TOS? Mistral... although deepseek probably beats it in 2025.
For people outside Europe, we don't have patriotism for our LLMs, we just use the best. Mistral has barely any usecase.
omneity · 10h ago
> Need local stuff? Llama(maybe Gemma)
You probably want to replace Llama with Qwen in there. And Gemma is not even close.
> Mistral has been consistently last place, or at least last place among ChatGPT, Claude, Llama, and Gemini/Gemma.
Mistral held for a long time the position of "workhorse open-weights base model" and nothing precludes them from taking it again with some smart positioning.
They might not currently be leading a category, but as an outside observer I could see them (like Cohere) actively trying to find innovative business models to survive, reach PMF and keep the dream going, and I find that very laudable. I expect them to experiment a lot during this phase, and that probably means not doubling down on any particular niche until they find a strong signal.
drilbo · 9h ago
>You probably want to replace Llama with Qwen in there. And Gemma is not even close.
Have you tried the latest, gemma3? I've been pretty impressed with it. Altho I do agree that qwen3 quickly overshadowed it, it seems too soon to dismiss it altogether. EG, the 3~4b and smaller versions of gemma seem to freak out way less frequently than similar param size qwen versions, tho I haven't been able to rule out quant and other factors in this just yet.
It's very difficult to fault anyone for not keeping up with the latest SOTA in this space. The fact we have several options that anyone can serviceably run, even on mobile, is just incredible.
Anyway, i agree that Mistral is worth keeping an eye on. They played a huge part in pushing the other players toward open weights and proving smaller models can have a place at the table. While I personally can't get that excited about a closed model, it's definitely nice to see they haven't tapped out.
omneity · 8h ago
It's probably subjective to your own use, but for me Gemma3 is not particularly usable (i.e. not competitive or delivering a particular value for me to make use of it).
Qwen 2.5 14B blows Gemma 27B out of the water for my use. Qwen 2.5 3B is also very competitive. The 3 series is even more interesting with the 0.6B model actually useful for basic tasks and not just a curiosity.
Where I find Qwen relatively lackluster is its complete lack of personality.
amelius · 9h ago
I certainly had some opposite experiences lately, where Mistral was outperforming Chatgpt for some hard questions.
tacker2000 · 10h ago
What's your point here? There is a place for a European LLM, be it “patriotism” or data safety. And don't tell me the Chinese are not “patriotic” about their stuff. Everyone has a different approach. If Mistral fits the market, they will be successful.
byefruit · 11h ago
You are probably getting downvoted because you don't give any model generations or versions ('ChatGPT') which makes this not very credible.
resource_waste · 11h ago
It's more likely that I'm getting downvoted by patriotic Europeans who came into a thread about a European company.
But ChatGPT has always been state of the art and cutting edge. Do I need to compare the first mistral models to 3.5? Or o4 and o3?
Does any reasonable person think Mistral has better models than OpenAI?
dismalaf · 11h ago
In your first comment you mentioned you used Mistral because of its permissive license (so guessing you used 7B, right?). Then you compare it to a bunch of cutting edge proprietary models.
Have you tried Mistral's newest and proprietary models? Or even their newest open model?
thrance · 10h ago
"patriotic Europeans" is an... interesting combination of words. I'd almost call it an oxymoron.
curiousgal · 12h ago
Too little too late, I work in a large European investment bank and we're already using Anthropic's Claude via Gitlab Duo.
croes · 11h ago
Is there a replacement for the Safe Harbor replacement?
Otherwise it could be illegal to transfer EU data to US companies
_bin_ · 9h ago
The law means don’t do what a slow moving regulator can and will prove in court. In this case, the law has no moral valence so I doubt anyone there would feel guilty breaking it. He may mean individuals are using ChatGPT unofficially even if prohibited nominally by management. Such is the case almost everywhere.
Btw, you can also run Mistral locally within the Docker model runner on a Mac.
I've run that using both Ollama (easiest) and MLX. Here are the Ollama models: https://ollama.com/library/mistral-small3.1/tags - the 15GB one works fine.
For MLX https://huggingface.co/mlx-community/Mistral-Small-3.1-24B-I... and https://huggingface.co/mlx-community/Mistral-Small-3.1-24B-I... should work, I use the 8bit one like this:
The Ollama one supports image inputs too: Output here: https://gist.github.com/simonw/89005e8aa2daef82c53c2c2c62207...
Qwen 3 8B on MLX runs in just 5GB of RAM and can write basic code but I don't know if it would be good enough for anything interesting: https://simonwillison.net/2025/May/2/qwen3-8b/
I hope that helps!
0: https://huggingface.co/spaces/NyxKrage/LLM-Model-VRAM-Calcul...
1: https://huggingface.co/bartowski/DeepSeek-R1-Distill-Qwen-32...
2: https://huggingface.co/Qwen/Qwen3-32B-GGUF
https://en.wikipedia.org/wiki/Le_Chat