Catbench Vector Search Demo Has Postgres SQL Throughput, Latency Monitoring Now

15 points · tanelpoder · 6 comments · 5/30/2025, 7:50:48 PM · tanelpoder.com

Comments (6)

jbellis · 9h ago
As the author of a vector search engine, I was low-key excited for this (there is no good benchmark for vector search out there that resembles real-world use even a little; all the vendors have their own internal stuff). But I think the term "bench" is a misnomer here: it's really more of a pgvector demo app, and I don't think you can usefully benchmark anything with it, at least not out of the box.
tanelpoder · 9h ago
Yeah, I just wanted a cool-sounding name for this. Nevertheless, it allows you to do easy stress-testing with some vector search operations (a quite narrow set, but you can combine it with joins and write your own queries if you like). But "CatStress" didn't sound too good to me.

It's really a "Vector Search Playground". So far the bigger value has come not from running maximum stress tests, but from showing people how you can join vector search results to the rest of your (existing) application schema. Plenty of people have assumed that you need a completely separate, isolated vector store behind some API for this...
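To make the "join vector search results to your existing schema" idea concrete, here is a minimal, self-contained sketch. It is not CatBench's code: it uses an in-memory SQLite database with a brute-force Python distance function as a stand-in for pgvector, and the `customers` and `photos` tables, their columns, and the `l2_distance` helper are all hypothetical names made up for illustration. In Postgres with pgvector, the distance expression would instead use the `<->` (L2 distance) operator on a `vector` column.

```python
import json
import math
import sqlite3


def l2_distance(a_json: str, b_json: str) -> float:
    """Euclidean distance between two JSON-encoded vectors (brute force)."""
    a, b = json.loads(a_json), json.loads(b_json)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


conn = sqlite3.connect(":memory:")
# Register the Python function so it can be called from SQL.
conn.create_function("l2_distance", 2, l2_distance)

# Hypothetical schema: an ordinary application table plus a table of embeddings.
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE photos (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        embedding TEXT  -- JSON-encoded vector; pgvector would use a vector column
    );
""")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "alice"), (2, "bob")])
conn.executemany(
    "INSERT INTO photos VALUES (?, ?, ?)",
    [
        (10, 1, json.dumps([0.0, 0.0])),
        (11, 2, json.dumps([1.0, 1.0])),
        (12, 1, json.dumps([5.0, 5.0])),
    ],
)


def nearest_photos(query_vec, k=2):
    """Return the k nearest photos joined to their owning customer."""
    sql = """
        SELECT p.id, c.name, l2_distance(p.embedding, ?) AS dist
        FROM photos AS p
        JOIN customers AS c ON c.id = p.customer_id
        ORDER BY dist
        LIMIT ?
    """
    return conn.execute(sql, (json.dumps(query_vec), k)).fetchall()


for photo_id, owner, dist in nearest_photos([0.9, 0.9], k=2):
    print(photo_id, owner, round(dist, 3))
```

The point of the sketch is only the shape of the query: the nearest-neighbour ordering and the join to regular application data happen in one SQL statement, with no separate vector store behind an API. A real pgvector setup would rely on an index rather than scanning every row.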

Edit: Also the setup part includes running a "generate_embeddings.py" script that uses PyTorch under the hood (on CPUs or CUDA/GPUs) to generate embeddings from the 25k photos (or 9M when using the rotated variants). That process can also be sped up and optimized for sure - my whole point is that once everything runs OK enough from end to end, then it's time to start measuring and optimizing the whole process - for learning and fun.

binarymax · 9h ago
https://ann-benchmarks.com is pretty good but I agree it needs an update. I'd like to see modern embedding dimensions (384, 768, 1536, etc.) as well as filters and combined read/write latencies.
jbellis · 9h ago
modern dimensions, yes

mixed workloads, also yes, especially in an "online" environment rather than the "batch mode" that ann-benchmarks does today

but most importantly, multicore -- ann-benchmarks is limited to a single-core Docker image, which is absolutely ludicrous, and I suspect it's a significant reason that Python-based systems do much better in their benchmark than you would expect from trying to deploy them under concurrent load
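As a rough illustration of what measuring "under concurrent load" could look like from the client side, here is a small sketch of a load harness that fires requests from several worker threads and reports throughput and latency percentiles. It is an assumption-laden toy, not how ann-benchmarks or CatBench work: `fake_query` is a hypothetical stand-in that just sleeps, and `run_load` is a made-up helper name. In practice the query function would issue a real search against the system under test.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def fake_query(_i: int) -> None:
    """Hypothetical stand-in for one vector search request."""
    time.sleep(0.002)


def run_load(query, n_requests: int, n_workers: int) -> dict:
    """Run n_requests calls to query across n_workers threads and time them."""

    def timed(i: int) -> float:
        t0 = time.perf_counter()
        query(i)
        return time.perf_counter() - t0

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        latencies = sorted(pool.map(timed, range(n_requests)))
    elapsed = time.perf_counter() - start

    # Simple nearest-rank style index for the 95th percentile.
    p95_index = min(len(latencies) - 1, int(0.95 * len(latencies)))
    return {
        "requests": n_requests,
        "workers": n_workers,
        "throughput_qps": n_requests / elapsed,
        "p50_ms": statistics.median(latencies) * 1000,
        "p95_ms": latencies[p95_index] * 1000,
    }


if __name__ == "__main__":
    # Compare a single worker against several concurrent workers.
    for workers in (1, 4):
        print(run_load(fake_query, n_requests=40, n_workers=workers))
```

Threads are adequate here only because the stand-in (like a real network client) spends its time waiting rather than computing. This measures client-observed behaviour; whether the engine itself scales across cores is a property of the server being tested.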

binarymax · 9h ago
Indeed! I'm just looking at JVector which I wasn't familiar with - looks cool. Have you tried it with the billion-scale competition? (not sure if that's still running)
jbellis · 8h ago
sort of, there was the original bigann and then they followed up with a couple more specialized contests the following year, i think it's over now

~300M modern-sized vectors is pretty close to jvector's limit in a single index (the Cassandra layer can shard more) https://foojay.io/today/indexing-all-of-wikipedia-on-a-lapto...

that said I think Mariano (new jvector maintainer) is working on ways to handle larger datasets in a single index but I'm not sure where that is on his priority list