Show HN: E-commerce data from 100k stores that is refreshed daily
After launching here on HN, a large enterprise reached out to pay for access to the raw data. We serviced the contract manually to learn the exact workflow and then decided to productize the "Data Connector" to help us scale to more customers.
The Data Connector enables developers to select any of our 100k stores in the index, view sample data, format the output, and export the up-to-date data. Data can be exported as CSV or JSON.
We've built crawlers for Shopify, WooCommerce, Squarespace, Wix, and custom built stores to index the store information, product data, stock, reviews, and more. The primary technical challenge is to recrawl the entire dataset every 24 hours. We do this with a series of servers that "recrawl" different store-types with rotating local proxies and then add changes to a queue to be updated in our search index. Our primary database is Mongo and our search runs on self-hosted Meilisearch on high RAM servers.
My vision is to index the world's e-commerce data. I believe this will create market efficiencies for customers, developers, and merchants.
I'd love your feedback!