Show HN: I made an open-source synthetic text datasets generator
2 astropat 0 5/27/2025, 5:09:30 AM github.com ↗
Many LLM projects suffer from a lack of custom datasets:
- No labelled data at all
- Lack of coverage and diversity in existing data
- Data collection and annotation processes are slow and boring
- Not enough examples to fine-tune or evaluate LLMs…
So I built datafast, an open-source library for synthetic text dataset generation.
Right now it supports 5 dataset types:
- Text Classification Dataset
- Raw Text Generation Dataset
- Instruction Dataset (Ultrachat-like)
- Multiple Choice Question (MCQ) Dataset
- Preference Dataset
And more to come.
Currently supported LLM providers for generation are:
- OpenAI
- Anthropic
- Google Gemini
- Ollama (local LLM server)
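To make the idea concrete, here is a rough sketch of how synthetic text classification data can be generated in a provider-agnostic way. This is not datafast's API: `build_prompts`, `generate_rows`, and `fake_llm` are hypothetical names made up for illustration, and `fake_llm` is a stand-in for a real call to any of the providers listed above.

```python
import itertools
from typing import Callable, Dict, List

# Hypothetical type alias: any function that maps a prompt to generated text.
LLM = Callable[[str], str]


def build_prompts(labels: Dict[str, str], topics: List[str]) -> List[Dict[str, str]]:
    """Cross every label with every topic so the dataset covers each combination."""
    prompts = []
    for (label, description), topic in itertools.product(labels.items(), topics):
        prompts.append({
            "label": label,
            "prompt": f"Write one short text about {topic} that is {description}.",
        })
    return prompts


def generate_rows(prompts: List[Dict[str, str]], llm: LLM) -> List[Dict[str, str]]:
    """Call the model once per prompt and keep the label next to the generated text."""
    return [{"text": llm(p["prompt"]), "label": p["label"]} for p in prompts]


def fake_llm(prompt: str) -> str:
    """Stand-in for a real provider call; it only echoes the prompt (assumption)."""
    return f"[generated] {prompt}"


if __name__ == "__main__":
    labels = {
        "positive": "clearly positive in tone",
        "negative": "clearly negative in tone",
    }
    for row in generate_rows(build_prompts(labels, ["travel", "food"]), fake_llm):
        print(row)
```

Crossing labels with topics is one simple way to push for diversity; swapping `fake_llm` for a real client is the only change needed to try it against an actual model.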
There is more to come, but I am not in a rush for features. I prioritize data quality, data diversity, and reliability over quantity. I don't measure success by shipping more features: I succeed if it works when you try it out, and if you actually use it.
Hope you like it!