I still don't really understand what Vertex AI is.
If you can ignore Vertex most of the complaints here are solved - the non-Vertex APIs have easy to use API keys, a great debugging tool (https://aistudio.google.com), a well documented HTTP API and good client libraries too.
I actually use their HTTP API directly (with the ijson streaming JSON parser for Python) and the code is reasonably straight-forward: https://github.com/simonw/llm-gemini/blob/61a97766ff0873936a...
You have to be very careful when searching (using Google, haha) that you don't accidentally end up in the Vertex documentation though.
Worth noting that Gemini does now have an OpenAI-compatible API endpoint which makes it very easy to switch apps that use an OpenAI client library over to backing against Gemini instead: https://ai.google.dev/gemini-api/docs/openai
Anthropic have the same feature now as well: https://docs.anthropic.com/en/api/openai-sdk
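In case it's useful, here's a minimal sketch of the Gemini version of that switch using the openai Python package (the key and model name are placeholders):

    from openai import OpenAI

    # Point the standard OpenAI client at Gemini's OpenAI-compatible endpoint
    client = OpenAI(
        api_key="YOUR_GEMINI_API_KEY",  # a Gemini key, not an OpenAI one
        base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
    )

    response = client.chat.completions.create(
        model="gemini-2.0-flash",
        messages=[{"role": "user", "content": "Say hello"}],
    )
    print(response.choices[0].message.content)

Everything else about the calling code stays the same, which is the whole appeal.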
That `vertexai=True` does the trick - you can use the same code without this option, and you will not be using "Vertex".
Also, note, with Vertex I am providing a service account rather than an API key, which should improve security and performance.
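For anyone who hasn't seen it, the toggle looks roughly like this with the google-genai Python SDK (project and location are placeholders):

    from google import genai

    # Vertex mode: authenticates via ADC (e.g. a service account), routed through GCP
    client = genai.Client(vertexai=True, project="my-project", location="us-central1")

    # Without vertexai=True it's the plain Gemini API with an API key instead:
    # client = genai.Client(api_key="YOUR_GEMINI_API_KEY")

    response = client.models.generate_content(
        model="gemini-2.0-flash", contents="Hello"
    )
    print(response.text)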
For me, the main aspect of "using Vertex", as in this example, is the fact that the Start AI Cloud Credit ($350K) is only usable under Vertex. That is, one must use this platform to benefit from this generous credit.
Feels like the "Anthos" days to me, with Google now pushing their Enterprise Grade ML Ops platform, but all in all I am grateful for their generosity and the great Gemini model.
ivanvanderbyl · 1h ago
A service account file and an API key have similar security risks if provided the way you are using them. Google recommends using ADC, and it's actually an org policy recommendation to disable SA files.
wanderer2323 · 21m ago
ADC (Application Default Credentials) is a specification for finding credentials (1. look here 2. look there etc.) not an alternative for credentials. Using ADC one can e.g. find an SA file.
As a replacement for SA files one can have e.g. user accounts using SA impersonation, external identity providers, or run on GCP VM or GKE and use built-in identities.
(ref: https://cloud.google.com/iam/docs/migrate-from-service-accou...)
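To make the lookup behaviour concrete: in Python, google-auth walks the ADC chain for you, so the same code works whether the credentials it finds are an SA file, impersonated credentials, or a built-in VM/GKE identity:

    import google.auth

    # Walks the ADC chain: GOOGLE_APPLICATION_CREDENTIALS env var first,
    # then gcloud's stored user credentials, then the metadata server on GCP.
    credentials, project_id = google.auth.default()
    print(project_id, type(credentials).__name__)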
I don't think a service account vs an API key would improve performance in any meaningful way. I doubt the AI endpoint is authenticating the API key against a central database on every request; it will most certainly be cached against a service key in the same AZ or whatever GCP calls it.
mgraczyk · 5h ago
The OpenAI-compatible API is missing important parameters; for example, I don't think there is a way to disable Flash 2 thinking with it.
Vertex AI is for gRPC, service auth, and region control (amongst other things): ensuring data remains in a specific region, letting you auth with the instance service account, and getting slightly better latency and TTFT.
minimaxir · 4h ago
From the linked docs:
> If you want to disable thinking, you can set the reasoning effort to "none".
For other APIs, you can set the thinking tokens to 0 and that also works.
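Concretely, something like this should work (model names are placeholders; the first snippet assumes an OpenAI client already pointed at Gemini's endpoint, as above):

    # OpenAI-compatible endpoint: reasoning effort "none" disables thinking
    response = client.chat.completions.create(
        model="gemini-2.5-flash",
        reasoning_effort="none",
        messages=[{"role": "user", "content": "Hello"}],
    )

    # Native google-genai SDK: a thinking budget of 0 does the same
    from google import genai
    from google.genai import types

    g = genai.Client(api_key="YOUR_GEMINI_API_KEY")
    r = g.models.generate_content(
        model="gemini-2.5-flash",
        contents="Hello",
        config=types.GenerateContentConfig(
            thinking_config=types.ThinkingConfig(thinking_budget=0)
        ),
    )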
mgraczyk · 4h ago
Wow thanks I did not know
simonw · 4h ago
I find Google's service auth SO hard to figure out. I've been meaning to sort out deploying to Cloud Run via service accounts for several years now, but it just doesn't fit in my brain well enough for me to make the switch.
mgraczyk · 4h ago
If you're on Cloud Run it should just work automatically.
For deploying, on GitHub I just use a special service account for CI/CD and put the JSON payload in an environment secret like an API key. The only extra thing is that you need to copy it to the filesystem for some things to work, usually a file named google_application_credentials.json.
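The copy step is just a few lines; a sketch, with GCP_SA_KEY as a made-up secret name:

    import os, pathlib

    # In CI: write the service account JSON from an env secret to disk,
    # then point ADC at it via GOOGLE_APPLICATION_CREDENTIALS.
    path = pathlib.Path("google_application_credentials.json")
    path.write_text(os.environ["GCP_SA_KEY"])
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(path.resolve())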
If you use Cloud Build you shouldn't need to do anything.
candiddevmike · 3h ago
You should consider setting up Workload Identity Federation and authenticating to Google Cloud using your GitHub runner's OIDC token. Google Cloud will "trust" the token and allow you to impersonate service accounts. No static keys!
mgraczyk · 1h ago
Does not work for many Google services, including Firebase
PantaloonFlames · 3h ago
You could post on Reddit asking for help and someone is likely to provide answers, an explanation, probably even some code or bash commands to illustrate.
And even if you don't ask, there are many examples. But I feel ya. The right example to fit your need is hard to find.
mountainriver · 3h ago
GCP auth is terrible in general. This is something AWS did well.
PantaloonFlames · 3h ago
I don't get that. How?
- There are principals (users, service accounts).
- Each one needs to authenticate in some way. There are options here: SAML or OIDC or Google Sign-In for users; other options for service accounts.
- Permissions guard the things you can do in Google Cloud.
- There are built-in roles that wrap up sets of permissions.
- You can create your own custom roles.
- You attach roles to principals to give them parcels of permissions (example below).
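A single gcloud command ties those pieces together; the names here are hypothetical:

    # Grant a service account (principal) the Cloud Run invoker role on a project
    gcloud projects add-iam-policy-binding my-project \
        --member="serviceAccount:deployer@my-project.iam.gserviceaccount.com" \
        --role="roles/run.invoker"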
Aeolun · 4h ago
When I used the OpenAI-compatible stuff my API calls just didn't work at all. I switched back to direct HTTP calls, which seems to be the only thing that works…
omneity · 4h ago
JSONSchema support on Google's OpenAI-compatible API is very lackluster and limiting. My biggest gripe really.
laborcontract · 4h ago
Google Cloud Console's billing section for Vertex is so poor. I'm trying to figure out how much I spent on which models and I still cannot for the life of me figure it out. I'm assuming the only way to do it is to use the Gemini billing assistant chatbot, but that requires me to turn on another API permission.
I still don't understand the distinction between the Gemini and Vertex AI APIs. It's like Logan K heard the criticisms about the API and helped push to split Gemini from the broader Google API ecosystem, but it's only created more confusion, for me at least.
tyre · 3h ago
Gemini’s is no better. Their data can be up to 24h stale and you can’t set hard caps on API keys. The best you can do is email notification billing alerts, which they acknowledge can be hours late.
unknown_user_84 · 4h ago
Indeed. Though the billing dashboard feels like an over-engineered April Fools' joke compared to Anthropic or OpenAI. And it takes too long to update with usage. I understand they tacked it onto GCP, but if they're making those devs work 60 hours a week, can we at least get a nicer, real-time dashboard out of it?
coredog64 · 2h ago
Wait until you see how to check Bedrock usage in AWS.
(While you can certainly try to use CloudWatch, it’s not exact. Your other options are “Wait for the bill” or log all Bedrock invocations to CloudWatch/S3 and aggregate there)
KTibow · 3h ago
Vertex is the enterprise platform. It also happens to have much higher rate limits, even for free models.
minimaxir · 4h ago
Vertex AI is essentially a rebranding of their enterprise ML platform on GCP, nothing explicitly "new."
rafram · 4h ago
Site seems to be down - I can’t get the article to load - but by far the most maddening part of Vertex AI is the way it deals with multimodal inputs. You can’t just attach an image to your request. You have to use their file manager to upload the file, then make sure it gets deleted once you’re done.
That would all still be OK-ish except that their JS library only accepts a local path, which it then attempts to read using the Node `fs` API. Serverless? Better figure out how to shim `fs`!
It would be trivial to accept standard JS buffers. But it’s not clear that anyone at Google cares enough about this crappy API to fix it.
Deathmax · 4h ago
> You can’t just attach an image to your request.
You can? Google limits HTTP requests to 20MB, but both the Gemini API and Vertex AI API support embedded base64-encoded files and public URLs. The Gemini API supports attaching files that are uploaded to their Files API, and the Vertex AI API supports files uploaded to Google Cloud Storage.
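For example, against the plain Gemini REST API an image can go straight into the request body as base64 (a sketch; the key and model are placeholders):

    import base64, requests

    with open("photo.jpg", "rb") as f:
        b64 = base64.b64encode(f.read()).decode()

    # Embed the image inline in a generateContent request - no Files API needed
    resp = requests.post(
        "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent",
        params={"key": "YOUR_GEMINI_API_KEY"},
        json={"contents": [{"parts": [
            {"inline_data": {"mime_type": "image/jpeg", "data": b64}},
            {"text": "Describe this image"},
        ]}]},
    )
    print(resp.json()["candidates"][0]["content"]["parts"][0]["text"])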
rafram · 2h ago
Their JavaScript library didn’t support that as of whenever I tried.
It's the best model out there.
https://github.com/ryao/gemini-chat
The main thing I do not like is that token counting is rate limited. My local offline copies have stripped out the token counting, since I found that the service becomes unusable if you get anywhere near the token limits, so there is no point in trimming the history to make it fit. Another thing I found is that I prefer to use the REST API directly rather than their Python wrapper.
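For context, token counting is its own REST call, rate limited separately from generation; a sketch with placeholders:

    import requests

    resp = requests.post(
        "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:countTokens",
        params={"key": "YOUR_GEMINI_API_KEY"},
        json={"contents": [{"parts": [{"text": "How many tokens is this?"}]}]},
    )
    print(resp.json()["totalTokens"])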
Also, that comment about 500 errors is obsolete. I will fix it when I do new pushes.
fumeux_fume · 2h ago
I'm sorry, have you used Azure? I've worked with all the major cloud providers, and while Google has its warts, they pale in comparison to the hoops Azure makes you jump through to make a simple API call.
ic_fly2 · 1h ago
The Azure API for LLMs changes depending on which datacenter you are calling. It is bonkers. In fact it is so bad that at work we are hosting our own LLMs on Azure GPU machines rather than use their API. (Which means we only have small models at much higher cost…)
jauntywundrkind · 6h ago
In general, it's just wild to see Google squander such an intense lead.
In 2012, Google was far ahead of the world in making the vast majority of their offerings intensely API-first, intensely API accessible.
It all changed in such a tectonic shift. The Google Plus/Google+ era was this weird new reality where everything Google did had to feed into this social network, but there was nearly no API available to anyone else (short of some very simple posting APIs). Google flipped a bit: the whole company stopped caring about the rest of the world and APIs, grew intensely focused on internal use, on themselves, and looked only within.
I don't know enough about the LLM situation to comment, but Google squandering such a huge lead, so clearly ceasing to care about the world & intertwingularity, becoming so intensely internally focused, was such a clear, clear, clear fall. There's the Google Graveyard of products, but the bigger loss in my mind is that Google gave up on APIs long ago, and has never performed any clear act of repentance for such a grievous misstep against the open world and open possibilities, in favor of a closed & internal focus.
simonw · 5h ago
With Gemini 2.5 (both Pro and Flash) Google have regained so much of that lost ground. Those are by far the best long-context models right now, extremely competitively priced and they have features like image mask segmentation that aren't available from other models yet: https://simonwillison.net/2025/Apr/18/gemini-image-segmentat...
Here's the code: https://github.com/simonw/tools/blob/main/gemini-mask.html
jasonfarnon · 4h ago
I think the commenter was saying google squandered its lead ("goodwill" is how I would refer to it) in providing open and interoperable services, not the more recent lead it squandered in AI. I agree with your point that they've made up a lot of that ground with gemini 2.5.
simonw · 4h ago
Yeah you're right, I should have read their comment more closely.
Google's APIs have a way steeper learning curve than is necessary. So many of their APIs depend on complex client libraries or technologies like gRPC that aren't used much outside of Google.
Their permission model is diabolically complex to figure out too - same vibes as AWS, Google even used the same IAM acronym.
PantaloonFlames · 3h ago
> So many of their APIs depend on complex client libraries or technologies like GRPC that aren't used much outside of Google.
I don't see that dependency, with ANY of the APIs. They're all documented. I invoke them directly from within Emacs, or you can curl them. I almost never use the wrapper libraries.
I agree with your point that the client libraries are large and complicated, for my tastes. But there's no inherent dependency of the API on the library. The dependency arrow points the other direction. The libraries are optional; and in my experience, you can find 3p libraries that are thinner and more targeted if you like.
Aeolun · 3h ago
I feel like the AWS model isn't all that hard for most of their APIs. It's just something you don't really want to think about.
tyre · 3h ago
Gemini 2.5 Pro is so good. I've found that using it as the architect and orchestrator, then farming subtasks and computer use out to Sonnet, is the best ROI.
PantaloonFlames · 3h ago
You can also farm out subtasks to the Gemini Flash models. For example using Aider, use Pro for the "strong" model and Flash for the weak model.
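With Aider that's just flags, something like this (model names are illustrative):

    aider --model gemini/gemini-2.5-pro --weak-model gemini/gemini-2.0-flash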
candiddevmike · 3h ago
The models are great but the quotas are a real pain in the ass. You will be fighting other customers for capacity if you end up needing to scale. If you have serious Gemini usage in mind, you almost have to have a Google Cloud TAM to advocate for your usage and quotas.
caturopath · 3h ago
I don't understand why Sundar Pichai hasn't been replaced. Google seems like it's been floundering in its ability to innovate and execute over the past decade. To the extent that this Google has been a good maintenance org for their cash cows, even that might not be a good plan if they've dropped the ball with AI.
huntertwo · 3h ago
Everybody’s thinking the same thing. He sucks.
aaronbrethorst · 3h ago
Hubris. It seems similar, at least externally, to what happened at Microsoft in the late 90s/early 00s. I am convinced that a split-up of Microsoft would have been invigorating for the spin-offs, and the tech industry in general would have been better for it.
Maybe we’ll get a do-over with Google.
SmellTheGlove · 5h ago
Google’s APIs are all kind of challenging to ramp up on. I’m not sure if it’s the API itself or the docs just feeling really fragmented. It’s hard to find what you’re looking for even if you use their own search engine.
PantaloonFlames · 2h ago
The problem I've had is not that the APIs are complicated but that there are so darn many of them.
I agree the API docs are not high on the usability scale. No examples, just reference information with pointers to types, which embed other types, which use abstract descriptions. Figuring out what sort of JSON payload you need to send can take... a bunch of effort.
candiddevmike · 3h ago
The Google Cloud API library is meant to be pretty dead simple. While there are bugs, there's a good chance if something's not working it's because of overthinking or providing too many args. Alternatively, doing more advanced stuff and straying from the happy path may lead to dragons.
tom_m · 4h ago
Doesn't matter much, Google already won the AI race. They had all the eyeballs already. There's a huge reason why they are getting slapped with antitrust right now. The other companies aren't happy.
I agree though, their marketing and product positioning is super confusing and weird. They are running their AI business in a very, very, very strange way. This has created a delay in their dominance of this space, though I don't think it's created an opportunity for others.
Using Gemini inside BigQuery (this is via Vertex) is such a stupid good solution. Along with all of the other products that support BigQuery (Datastream from Cloud SQL MySQL/Postgres, Dataform for query aggregation and transformation jobs, BigQuery functions, etc.), there's an absolutely insane amount of power to bring data over to Gemini and back out.
It's literally impossible for OpenAI to compete because Google has all of the other ingredients here already and again, the user base.
I'm surprised AWS didn't come out stronger here, weird.
tom_m · 4h ago
Oh, and it's not just Gemini, sorry - it's Vertex, so it's other models as well, including ones you train yourself.
lemming · 4h ago
Additionally, there's no OpenAPI spec, so you have to generate one from their protobuf specs if you want to use that to generate a client model. Their protobuf specs live in a repo at https://github.com/googleapis/googleapis/tree/master/google/.... Now you might think that v1 would be the latest there, but you would be wrong - everyone uses v1beta (not v1, not v1alpha, not v1beta3) for reasons that are completely unclear. Additionally, this repo is frequently not up to date with the actual API (it took them ages to get the new thinking config added, for example, and their usage fields were out of date for the longest time). It's really frustrating.
ezekiel68 · 4h ago
Eh, you know. "Move fast and break things."
caturopath · 3h ago
I'm not sure "move fast" describes the situation.
bionhoward · 4h ago
Also has the same customer noncompete copy-pasted from ClosedAI. Not that anyone seemingly cares about the risk of lawsuits from Google for using Gemini in a way that happens to compete with random-Gemini-tentacle-123
behnamoh · 4h ago
Even their OAI-compatible API isn't fully compatible. Tools like Instructor have special-casing for Gemini...