Hybrid Vectorizer: vector search for tabular data (text, numeric, and categorical)

1 hari_data 1 8/21/2025, 4:38:57 AM pypi.org ↗

Comments (1)

hari_data · 2h ago
Most vector search tools I’ve seen focus on one type of input, like text or images. But in practice, a lot of real-world data lives in tables with a mix of text, numeric, and categorical fields.

I ran into this while trying to find similar stocks, tools, and products across multiple fields (e.g., description, sector, p/e ratio). Text-only search gave poor results. Naive feature concatenation didn’t work either.
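One common way naive concatenation goes wrong is scale: a raw numeric column with large values can swamp every other dimension under cosine similarity. Here is a minimal sketch of that effect with made-up numbers (the 3-dimensional "text embeddings" and the market cap value are purely illustrative assumptions, not output from the package):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical text embeddings for two companies with unrelated descriptions.
text_a = np.array([1.0, 0.0, 0.0])
text_b = np.array([0.0, 1.0, 0.0])

# Both companies happen to have the same raw market cap in USD.
market_cap = 5e9

# Naive concatenation: append the raw number to the text embedding.
row_a = np.append(text_a, market_cap)
row_b = np.append(text_b, market_cap)

text_only = cosine(text_a, text_b)  # descriptions are orthogonal
naive = cosine(row_a, row_b)        # dominated by the market cap dimension

print(f"text only: {text_only:.3f}, naive concat: {naive:.3f}")
```

The text signal is effectively erased once the unscaled number is appended, which is why each column type needs its own scaling before the vectors are combined.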

So I built this small Python package. It handles mixed-column similarity search using block-wise embeddings and cosine similarity. No training required: just plug in your tabular data and run.
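To make "block-wise embeddings + cosine similarity" concrete, here is a rough NumPy-only sketch of the general idea. This is not the package's API: every function name below is hypothetical, the hashed bag-of-words is only a stand-in for a real text embedding model, and the sample rows are invented.

```python
import zlib
import numpy as np

def l2_normalize(m):
    """Scale each row to unit length (all-zero rows are left as zeros)."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    return m / np.where(norms == 0, 1.0, norms)

def text_block(texts, dim=64):
    """Hashed bag-of-words; a placeholder for a real sentence embedding."""
    out = np.zeros((len(texts), dim))
    for i, text in enumerate(texts):
        for token in text.lower().split():
            out[i, zlib.crc32(token.encode()) % dim] += 1.0
    return l2_normalize(out)

def numeric_block(rows):
    """Z-score each column, then divide by sqrt(n_columns) so the
    average squared row norm is about 1, comparable to the other blocks."""
    x = np.asarray(rows, dtype=float)
    std = x.std(axis=0)
    z = (x - x.mean(axis=0)) / np.where(std == 0, 1.0, std)
    return z / np.sqrt(x.shape[1])

def categorical_block(values):
    """One-hot encoding; every row already has unit norm."""
    index = {c: j for j, c in enumerate(sorted(set(values)))}
    out = np.zeros((len(values), len(index)))
    for i, v in enumerate(values):
        out[i, index[v]] = 1.0
    return out

def hybrid_vectors(blocks, weights):
    """Concatenate blocks. Scaling a block by sqrt(w) multiplies its
    contribution to the dot product by w."""
    parts = [np.sqrt(w) * b for b, w in zip(blocks, weights)]
    return l2_normalize(np.hstack(parts))

def most_similar(vectors, query_idx, k=3):
    """Indices of the k rows with the highest cosine similarity to the query."""
    sims = vectors @ vectors[query_idx]  # rows are unit length, so dot = cosine
    sims[query_idx] = -np.inf            # exclude the query itself
    return np.argsort(-sims)[:k]

# Invented example rows: description, sector, [p/e, market cap in $B].
descriptions = [
    "cloud software platform for enterprises",
    "enterprise cloud software and services",
    "oil and gas exploration",
    "retail grocery chain",
]
sectors = ["tech", "tech", "energy", "retail"]
numbers = [[30.0, 500.0], [28.0, 450.0], [10.0, 200.0], [15.0, 50.0]]

vectors = hybrid_vectors(
    [text_block(descriptions), categorical_block(sectors), numeric_block(numbers)],
    weights=[1.0, 1.0, 1.0],
)
top = most_similar(vectors, query_idx=0, k=1)
print("closest to row 0:", int(top[0]))
```

The per-block weights are the main tuning knob in a setup like this: raising the text weight makes descriptions matter more, while raising the numeric weight favors rows with similar p/e and market cap.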

Some use cases it supports:

Similar stocks → description + sector + p/e + market cap

Similar movies → plot + crew + year + ratings

Similar tools → task + specs + geometry

Would love feedback or thoughts if you’ve struggled with something similar.

PyPI: https://pypi.org/project/hybrid-vectorizer/