Not trying to be snarky, just curious -- How is this different from TurboPuffer and other serverless, object storage backed vector DBs?
ge96 · 1h ago
M is minutes
HarHarVeryFunny · 1h ago
I was starting to think this was impressive, if not impossible. 1B vectors in 48 MB of storage => < 1 bit per vector.
Maybe not impossible using shared/lossy storage if they were sparsely scattered over a large space ?
But anyways - minutes. Thanks.
Edit: Gemini suggested that this sort of (lossy) storage size could be achieved using "Product Quantization" (sub vectors, clustering, cluster indices), giving an example of 256 dimensional vectors being stored at an average of 6 bits per vector, with ANN being one application that might use this.
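The idea described there (split each vector into sub-vectors, cluster each sub-space, store only the cluster indices) can be sketched roughly as follows. This is a minimal illustrative sketch, not any particular library's implementation; the function names, the toy k-means loop, and the parameters (4 sub-spaces, 16 centroids each, i.e. 16 bits per 16-dim vector) are all assumptions chosen for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_codebooks(data, n_sub, k, iters=10):
    """Tiny k-means per sub-space; returns a list of (k, sub_dim) codebooks."""
    dim = data.shape[1]
    sub_dim = dim // n_sub
    codebooks = []
    for s in range(n_sub):
        sub = data[:, s * sub_dim:(s + 1) * sub_dim]
        centers = sub[rng.choice(len(sub), k, replace=False)].copy()
        for _ in range(iters):
            # assign each sub-vector to its nearest centroid
            dists = ((sub[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
            assign = dists.argmin(1)
            for c in range(k):
                pts = sub[assign == c]
                if len(pts):
                    centers[c] = pts.mean(0)
        codebooks.append(centers)
    return codebooks

def encode(vec, codebooks):
    """Store one small integer (cluster index) per sub-vector."""
    sub_dim = len(vec) // len(codebooks)
    return [int(((cb - vec[s * sub_dim:(s + 1) * sub_dim]) ** 2).sum(1).argmin())
            for s, cb in enumerate(codebooks)]

def decode(codes, codebooks):
    """Lossy reconstruction: concatenate the chosen centroids."""
    return np.concatenate([cb[c] for c, cb in zip(codes, codebooks)])

data = rng.normal(size=(1000, 16)).astype(np.float32)
cbs = train_codebooks(data, n_sub=4, k=16)  # 4 codes x 4 bits = 16 bits/vector
codes = encode(data[0], cbs)
approx = decode(codes, cbs)
```

In a real ANN system the stored codes are compared against a query via per-sub-space lookup tables rather than decoded, but the storage math is the same: each vector collapses to a handful of small integers, which is how per-vector sizes can drop well below the raw float representation.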
stevemk14ebr · 1h ago
Thank you, the title needs to be edited.
ikanade · 1h ago
Legend
l5870uoo9y · 1h ago
Thankfully not months.
softwaredoug · 1h ago
Oh, the horrors of search indexing I've seen... including weeks / months to rebuild an index.
ashvardanian · 1h ago
Very curious about the hardware setup used for this benchmark!
OutOfHere · 1h ago
Proprietary closed-source lock-in. Nothing to see here.
HEmanZ · 36m ago
What do you think an alternative is for someone who:
1. Has a technical system they think could be worth a fortune to large enterprises, containing at least a few novel insights to the industry.
2. Knows that competitors and open source alternatives could copy/implement these in a year or so if the product starts off open source.
3. Has to put food on the table and doesn’t want to give massive corporations extremely valuable software for free.
Open source has its place, but it is IMO one of the ways to give monopolies massive value for free. There are plenty of open source alternatives around for vector DBs. Do we (developers) need to give everything away to the rich?
CuriouslyC · 1h ago
Seriously. The amount of lift a SaaS product has to give me before I'll even bother evaluating it is insane, and there's a near-zero chance I'll use it in my core.
stronglikedan · 58m ago
Nothing for you to see here. Surely you just aren't their target customer.
OutOfHere · 39m ago
So who is? Who really needs to index 1 billion new vectors every 48 minutes, or perhaps equivalently 1 million new vectors every 3 seconds?