Ask HN: HN with Just the Tech?
4 points by mieubrisse | 8 comments | 6/12/2025, 7:08:11 PM
Howdy folks -- I find myself getting sucked into pessimistic rabbit holes when I read HN, often being drawn to the negative/scary news.
I'd love to see just tech/entrepreneurship updates without any politics/war/catastrophe updates. Does anyone currently use a tool that helps with this?
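One low-tech way to approximate this is a client-side filter over story titles. The sketch below assumes you already have a list of titles (for example, fetched from the public Hacker News API) and uses a hand-picked, hypothetical blocklist. It is a crude whole-word keyword match, not a classifier.

```python
import re

# Hypothetical blocklist; tune to taste. Matching is case-insensitive and
# on whole words, so "war" does not hide a title containing "software".
BLOCKED_WORDS = {"election", "war", "tariff", "shooting", "senate", "protest"}

WORD_RE = re.compile(r"[a-z0-9']+")


def is_tech_only(title: str, blocked: set = BLOCKED_WORDS) -> bool:
    """Return True if no blocked word appears as a whole word in the title."""
    words = set(WORD_RE.findall(title.lower()))
    return words.isdisjoint(blocked)


def filter_titles(titles: list) -> list:
    """Keep only titles that pass the blocklist check."""
    return [t for t in titles if is_tech_only(t)]


if __name__ == "__main__":
    stories = [
        "Show HN: A tiny SQLite extension for vector search",
        "Senate votes on new tariff bill",
        "Why software estimates are always wrong",
    ]
    for title in filter_titles(stories):
        print(title)
```

A keyword list will both over- and under-block; a classifier trained on your own judgements is the heavier-weight alternative.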
Maybe take inspiration from https://www.mosaique.info/ (see also https://news.ycombinator.com/item?id=44172340)
I think you are talking about something like my YOShInOn RSS reader. It uses the same scikit-learn library, with training and classification happening in a script that runs side-by-side with the web server I use to look at articles and make my judgements, which is also written in Python. It is research code, but it is also production in that I use it every day and I'm never afraid to demo it.
It uses the ArangoDB database, which has a terrible license, so I am making a library called "system of objects" that emulates some aspects of ArangoDB collections and documents over Postgres tables and columns. At some point I'll move it to Postgres, put down that beast, and feel free to either open-source or commercialize it.
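A minimal sketch of emulating document collections over relational tables, assuming one table per collection with a `_key` primary key (borrowing ArangoDB's `_key` naming) and a JSON body column. It uses the standard-library `sqlite3` module as a stand-in for Postgres, where a `jsonb` column would be the more natural choice. The `Collection` class and its methods are hypothetical and are not the API of the "system of objects" library.

```python
import json
import sqlite3
import uuid
from typing import Optional


class Collection:
    """Hypothetical document collection stored as one SQL table.

    Each row holds a document key and the JSON-encoded document body.
    The caller is responsible for conn.commit().
    """

    def __init__(self, conn: sqlite3.Connection, name: str):
        # Table names cannot be bound as SQL parameters, so validate them.
        if not name.isidentifier():
            raise ValueError(f"bad collection name: {name!r}")
        self.conn = conn
        self.name = name
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {name} "
            "(_key TEXT PRIMARY KEY, body TEXT NOT NULL)"
        )

    def insert(self, doc: dict) -> str:
        """Store a document, generating a _key if it lacks one; return the key."""
        key = doc.get("_key") or uuid.uuid4().hex
        body = {k: v for k, v in doc.items() if k != "_key"}
        self.conn.execute(
            f"INSERT INTO {self.name} (_key, body) VALUES (?, ?)",
            (key, json.dumps(body)),
        )
        return key

    def get(self, key: str) -> Optional[dict]:
        """Fetch a document by key with _key re-attached, or None if missing."""
        row = self.conn.execute(
            f"SELECT body FROM {self.name} WHERE _key = ?", (key,)
        ).fetchone()
        if row is None:
            return None
        return {"_key": key, **json.loads(row[0])}

    def update(self, key: str, patch: dict) -> None:
        """Shallow-merge `patch` into an existing document."""
        doc = self.get(key)
        if doc is None:
            raise KeyError(key)
        doc.update(patch)
        doc.pop("_key")  # the key lives in its own column, not in the body
        self.conn.execute(
            f"UPDATE {self.name} SET body = ? WHERE _key = ?",
            (json.dumps(doc), key),
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    articles = Collection(conn, "articles")
    key = articles.insert({"title": "Hello", "score": 0.7})
    articles.update(key, {"score": 0.9})
    print(articles.get(key))
```

Keeping the body as a single JSON column makes schema-free documents easy; promoting frequently queried fields to real columns is the usual next step once the access patterns settle.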
At the core of it is a classification model that predicts the probability for "will I like this item?", plus a sampler that picks N documents by taking the top N/k documents from each of k clusters. I would also blend in 30% randomly sampled documents to keep the training data representative.
The batch job looks up my judgements in the database, writes them into a NumPy matrix, uses scikit-learn to train a model, runs inference, and puts its recommendations into the database, where I can see them with my web front end, which is done with Flask and HTMX. I like this style for research/production code because you can build applications a "screen" at a time, where a "screen" is a few Python functions that answer a few URLs that make a web page work. The happy path of making judgements has to be very fast and easy, think TikTok or Tinder, because you will have to do it thousands of times to make good models.
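The selection step can be sketched as follows, assuming each candidate already has a predicted like-probability and a cluster id (how the clusters are built is not stated, so the code takes them as input). The names `Candidate`, `select_batch`, and `random_fraction`, and the exact split (30% random, the rest divided evenly across clusters), are one reading of the description, not the author's code.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    doc_id: str
    cluster: int    # cluster id from an upstream clustering step (assumed)
    p_like: float   # model's predicted probability of "will I like this item?"


def select_batch(candidates: list, n: int, random_fraction: float = 0.3,
                 rng: Optional[random.Random] = None) -> list:
    """Pick n documents: the top-scored from each cluster plus a random slice.

    About (1 - random_fraction) * n come from taking the top n_ranked / k
    documents of each of the k clusters; the rest are drawn uniformly at
    random from what was not already picked, so the judgements collected
    for training are not all biased toward high scores.
    """
    rng = rng or random.Random()
    n = min(n, len(candidates))
    n_random = round(n * random_fraction)
    n_ranked = n - n_random

    by_cluster = defaultdict(list)
    for c in candidates:
        by_cluster[c.cluster].append(c)

    per_cluster = n_ranked // len(by_cluster) if by_cluster else 0
    chosen = []
    for members in by_cluster.values():
        members.sort(key=lambda c: c.p_like, reverse=True)
        chosen.extend(members[:per_cluster])

    # Fill the remainder (random share plus any rounding or small-cluster
    # shortfall) with a uniform sample of the unchosen candidates.
    chosen_ids = {c.doc_id for c in chosen}
    rest = [c for c in candidates if c.doc_id not in chosen_ids]
    chosen.extend(rng.sample(rest, n - len(chosen)))
    return chosen
```

Taking a fixed share per cluster keeps one dominant topic from crowding out the others, while the random slice gives the model a chance to see (and be corrected on) items it scores low.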
As a classification problem it's boring because it is fuzzy: I might like an article today and hate it tomorrow, so there is a ceiling on the achievable accuracy.
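One way to put a number on that ceiling, assuming you sometimes judge the same article more than once: if an item gets "like" in a fraction q of your judgements, the best any model can do on that item is to always predict the majority answer, which is right max(q, 1 - q) of the time. Averaging over re-judged items gives a rough upper bound. This estimate is illustrative rather than something the post describes, and with only a few repeats per item it tends to be optimistic.

```python
from collections import defaultdict


def accuracy_ceiling(judgements) -> float:
    """Estimate the best achievable accuracy from repeated judgements.

    `judgements` is an iterable of (item_id, liked) pairs in which the same
    item may appear several times. Only items judged at least twice count.
    For each, the majority answer is right max(q, 1 - q) of the time, where
    q is the observed fraction of "liked" votes.
    """
    votes = defaultdict(list)
    for item_id, liked in judgements:
        votes[item_id].append(bool(liked))

    per_item = []
    for vs in votes.values():
        if len(vs) < 2:
            continue
        q = sum(vs) / len(vs)
        per_item.append(max(q, 1 - q))
    if not per_item:
        raise ValueError("need at least one item judged twice")
    return sum(per_item) / len(per_item)
```

For example, one item liked twice and another liked once and disliked once gives (1.0 + 0.5) / 2 = 0.75: even a model that knew your tastes exactly would disagree with you about a quarter of the time on such items.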
So I am thinking about the centaur use case: a stream of documents that you classify together with the model, where the classification is better defined and a more complex model can use its understanding of the document to answer questions like "was the author angry?", "is this an account of a sports game?", "did the home team win?", etc.
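One common way to set up that kind of human-plus-model labeling is to let the model answer when it is confident and route the uncertain cases to a person, whose answers become new training data. The sketch below assumes hypothetical `model(doc, question) -> P(yes)` and `ask_human(doc, question) -> bool` callables and an arbitrary 0.9 confidence cutoff; it is not a description of any existing tool.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable

# Assumed interfaces: the model returns P(yes) for a question about a
# document, and the human answers the same question with True/False.
Model = Callable[[str, str], float]
Human = Callable[[str, str], bool]


@dataclass
class CentaurLabeler:
    model: Model
    ask_human: Human
    threshold: float = 0.9            # hypothetical confidence cutoff
    training_data: list = field(default_factory=list)

    def label(self, doc: str, question: str) -> bool:
        """Accept confident model answers; send uncertain ones to the human."""
        p = self.model(doc, question)
        if max(p, 1 - p) >= self.threshold:
            return p >= 0.5
        answer = self.ask_human(doc, question)
        # Human answers are kept as new training examples.
        self.training_data.append((doc, question, answer))
        return answer

    def label_stream(self, docs: Iterable[str], question: str) -> list:
        """Label every document in the stream for one question."""
        return [self.label(d, question) for d in docs]
```

Because the questions are well defined, the human's answers are far less fuzzy than "will I like this?", so the collected training data should keep improving the model and shrinking the share of items that need a person.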
That has me thinking about a general-purpose text classification kit, with a small number of models chosen with practicality in mind, and some kind of benchmark against data sets from Kaggle.
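Such a kit plus benchmark could start as small as a harness that fits each candidate model on a train split and scores it on a test split. This stdlib-only sketch assumes a hypothetical `fit(texts, labels) -> predict` interface; the two baselines (majority class and a per-word vote) are placeholders rather than the practical models the post has in mind, and loading the actual Kaggle data sets is left out.

```python
from collections import Counter, defaultdict
from typing import Callable, Sequence

Predictor = Callable[[str], str]
Trainer = Callable[[Sequence[str], Sequence[str]], Predictor]


def majority_baseline(texts, labels) -> Predictor:
    """Always predict the most common training label."""
    top = Counter(labels).most_common(1)[0][0]
    return lambda text: top


def word_vote_baseline(texts, labels) -> Predictor:
    """Each word votes for the labels it co-occurred with in training."""
    counts = defaultdict(Counter)
    for text, label in zip(texts, labels):
        for word in text.lower().split():
            counts[word][label] += 1
    fallback = Counter(labels).most_common(1)[0][0]

    def predict(text: str) -> str:
        tally = Counter()
        for word in text.lower().split():
            tally.update(counts.get(word, Counter()))
        return tally.most_common(1)[0][0] if tally else fallback

    return predict


def benchmark(models: dict, datasets: dict) -> dict:
    """Return {(model_name, dataset_name): accuracy} on each test split.

    Each dataset is ((train_texts, train_labels), (test_texts, test_labels)).
    """
    results = {}
    for d_name, ((x_tr, y_tr), (x_te, y_te)) in datasets.items():
        for m_name, fit in models.items():
            predict = fit(x_tr, y_tr)
            correct = sum(predict(x) == y for x, y in zip(x_te, y_te))
            results[(m_name, d_name)] = correct / len(y_te)
    return results
```

Keeping every model behind the same `fit`/`predict` shape is what lets a real model (for instance a scikit-learn pipeline wrapped in a small adapter) drop into the same table as the baselines.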
I am not thinking seriously about better recommendations because the problem is so vast and includes everything from "reject anything from YouTube out of hand" to a nuanced analysis of what exactly "quality" means, not least a real-time instead of batch system that will tell me about a sports game today as opposed to next week and also push it to the front of any outbound queues, while articles about carbon capture or video games or fast cars or circular economy or rural sociology can wait.
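Ordering an outbound queue by how soon each item goes stale, rather than by arrival time, is one simple way to get the "sports game today, carbon capture whenever" behavior. This sketch uses a hypothetical topic-to-freshness table (the hour values are made up) and assumes an item's topic is decided elsewhere, for instance by a classifier.

```python
import heapq
import itertools

# Hypothetical freshness windows in hours: how long an item stays worth reading.
FRESHNESS_HOURS = {"sports": 12, "news": 48, "evergreen": 24 * 30}


class OutboundQueue:
    """Priority queue that surfaces the item whose freshness window ends first."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps insertion order

    def push(self, item_id: str, topic: str, published_hour: float) -> None:
        # Unknown topics are treated as evergreen.
        window = FRESHNESS_HOURS.get(topic, FRESHNESS_HOURS["evergreen"])
        expires = published_hour + window
        heapq.heappush(self._heap, (expires, next(self._counter), item_id))

    def pop(self) -> str:
        """Return the item that will go stale soonest."""
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

An evergreen article published earlier still sorts behind a sports item published later, because its expiry is weeks away rather than hours.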
Now I'm more interested in applying filtering based on people's emotional characteristics to social media. Maybe microblogging is dead, but it is just so much more fun if you can avoid the bottom 5% of bad behavior.
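Avoiding "the bottom 5%" can be read as a percentile cutoff on some per-post behavior score. This sketch assumes such a score already exists (the `score` function is hypothetical; producing it is the hard part) and uses the standard library's `statistics.quantiles` to find the cutoff within a batch of posts.

```python
from statistics import quantiles


def hide_worst(posts, score, worst_fraction=0.05):
    """Drop posts whose behavior score is in roughly the worst `worst_fraction`.

    `score(post)` is a hypothetical function returning a number where higher
    means worse behavior (e.g. the output of an anger or toxicity classifier).
    Intended for 0 < worst_fraction <= 0.5. Posts exactly at the cutoff are kept.
    """
    posts = list(posts)
    scores = [score(p) for p in posts]
    if len(scores) < 2:
        return posts  # too little data to estimate a percentile
    buckets = round(1 / worst_fraction)         # 0.05 -> 20 buckets
    cutoff = quantiles(scores, n=buckets)[-1]   # last cut point, e.g. 95th percentile
    return [p for p, s in zip(posts, scores) if s <= cutoff]
```

Scoring authors instead of posts (for example, averaging each author's post scores) would turn the same cutoff into a per-person filter, which is closer to filtering on "people's emotional characteristics".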
Highly curated, restricted posting, pretty much pure tech