Show HN: Hyperparam: OSS Tools for Exploring Datasets Locally in the Browser
29 points by platypii | 16 comments | 5/1/2025, 2:06:55 PM | hyperparam.app
For the last year I’ve been developing Hyperparam — a collection of small, fast, dependency-free open-source libraries designed for data scientists and ML engineers to actually look at their data.
- Hyparquet: Read any Parquet file in browser/node.js
- Icebird: Explore Iceberg tables without needing Spark/Presto
- HighTable: Virtual scrolling of millions of rows
- Hyparquet-Writer: Export Parquet easily from JS
- Hyllama: Read llama.cpp .gguf LLM metadata efficiently
CLI for viewing local files: npx hyperparam dataset.parquet
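If you want to try it from Node.js, here's a minimal sketch of reading a local Parquet file with Hyparquet. It follows the parquetRead/onComplete pattern from the hyparquet README; the file path and column names are placeholders:

    import { readFileSync } from 'node:fs'
    import { parquetRead } from 'hyparquet'

    // hyparquet reads from an ArrayBuffer, so slice the Node Buffer's backing store
    const buf = readFileSync('dataset.parquet') // placeholder path
    const file = buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.byteLength)

    await parquetRead({
      file,
      columns: ['id', 'text'], // optional projection (hypothetical column names)
      onComplete: rows => console.log(rows), // decoded rows, one array of values per row
    })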
Example dataset on Hugging Face Space: https://huggingface.co/spaces/hyperparam/hyperparam?url=http...
No cloud uploads. No backend servers. A better way to build frontend data applications.
GitHub: https://github.com/hyparam
Feedback and PRs welcome!
> This stems from an industry-wide realization that model performance is ultimately bounded by data quality, not just model architecture or hyperparameters.
Generally we think of model architecture + weights (parameters) as making up the model itself, while hyperparameters are more relevant to how one arrives at those weights -- and for that reason speak more to the efficacy of training than to the performance of the resultant model.
Hyparquet is 10kb of pure JS, so it's trivial to deploy in a modern web app, and it wins hands down on the time-to-first-data metric.
I don't know how to reconcile this with the page's emphasis on interacting with AI datasets, which are commonly several orders of magnitude larger than this. What's an AI problem where the data involved has been less than tens of MB? I think that only toy problems and datasets could plausibly be smaller (e.g. the training images for the classic MNIST dataset are 47MB, and the whole dataset is 55MB: https://www.kaggle.com/datasets/hojjatk/mnist-dataset?select... ).
For example, this Parquet file is the entire English Wikipedia (400MB), but it loads less than 4MB, including the HTML and all the JS, to display the first rows:
https://hyperparam.app/files?key=https%3A%2F%2Fs3.hyperparam...
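To make the time-to-first-data point concrete, here's a sketch of fetching just the first rows of a remote file. It assumes hyparquet's asyncBufferFromUrl helper, which serves reads via HTTP range requests, so only the footer and the pages for the requested rows cross the network (the URL below is a placeholder):

    import { asyncBufferFromUrl, parquetRead } from 'hyparquet'

    // An AsyncBuffer backed by HTTP range requests -- no full 400MB download
    const file = await asyncBufferFromUrl({ url: 'https://example.com/wikipedia.parquet' })

    await parquetRead({
      file,
      rowStart: 0,
      rowEnd: 10, // only the byte ranges needed for these rows get fetched
      onComplete: rows => console.table(rows),
    })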
This way you can have huge AI datasets in cloud storage, and still have a nice interface for looking at your data.
In particular, a lot of modern AI datasets are huge walls of text (web scrapes, chains of thought, or agentic conversation histories), and most datasets on Hugging Face are in Parquet. So you can look at your data much more quickly this way than in, say, Jupyter notebooks.
Here's the glaive reasoning dataset on the Hyperparam hugging face space:
https://huggingface.co/spaces/hyperparam/hyperparam?url=http...
What I would really benefit from is a hypothetical LLM chat app that is focused on data migration or processing pipelines.
I was trying to look at, filter, and transform large AI datasets, and I was frustrated with how bad the existing tools were for working with datasets containing huge amounts of text (web scrapes, GitHub dumps, reasoning tokens, agent chat logs). Jupyter notebooks are woefully bad at helping you look at your data.
So I wanted to build better browser tools for working with AI datasets. But to do that I first had to build these libraries (there was no working Parquet implementation in JS when I started).
Anyway, I'm still working on an app for data processing that uses an LLM chat assistant to let a single user curate entire datasets singlehandedly. But for now I'm releasing these components to the community as open source. And having them "do a single task each" was very much intentional. Thanks for the comment!
That being said, I wish there was a better auth story. Open to suggestions if anyone has ideas!
But what's important is that with Hyperparam you can do it in the browser, where the bottleneck will always be network-bound, not CPU-bound.
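As a rough illustration of why it's network-bound: a Parquet read starts by range-requesting just the footer metadata (typically a few KB out of a multi-GB file), then fetches only the column chunks it needs. A hedged sketch, assuming hyparquet's parquetMetadataAsync helper and its thrift-style field names (the URL is a placeholder):

    import { asyncBufferFromUrl, parquetMetadataAsync } from 'hyparquet'

    // Fetch only the footer: a couple of small range requests, not the whole file
    const file = await asyncBufferFromUrl({ url: 'https://example.com/big.parquet' })
    const metadata = await parquetMetadataAsync(file)
    console.log(`rows: ${metadata.num_rows}, row groups: ${metadata.row_groups.length}`)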