Show HN: Defuddle, an HTML-to-Markdown alternative to Readability
133 points by kepano | 30 comments | 5/22/2025, 9:40:54 PM | github.com
Defuddle is an open-source JS library I built to parse and extract the main content and metadata from web pages. It can also return the content as Markdown.
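Extracting the main content from a page full of navigation and boilerplate is often done with scoring heuristics. Below is a minimal TypeScript sketch of one such heuristic, loosely in the spirit of Readability's text-length, comma, and link-density signals. It is not Defuddle's algorithm or API. The `Block` shape, the scoring formula, and every function name are hypothetical, and the sketch assumes the page has already been split into candidate blocks.

```typescript
// Hypothetical pre-parsed page region (not a Defuddle or DOM type).
interface Block {
  id: string;
  text: string; // all visible text inside the block
  linkText: string; // the part of that text that sits inside <a> elements
}

// Share of a block's characters that belong to links.
// Navigation menus and footers tend to score close to 1.
function linkDensity(block: Block): number {
  if (block.text.length === 0) return 1;
  return block.linkText.length / block.text.length;
}

// Illustrative score: reward comma-rich, longer prose, then scale down by link density.
function scoreBlock(block: Block): number {
  const commas = (block.text.match(/,/g) ?? []).length;
  const lengthBonus = Math.min(Math.floor(block.text.length / 100), 3);
  return (1 + commas + lengthBonus) * (1 - linkDensity(block));
}

// Return the highest-scoring block, or undefined for an empty list.
function pickMainContent(blocks: Block[]): Block | undefined {
  let best: Block | undefined;
  let bestScore = -Infinity;
  for (const block of blocks) {
    const score = scoreBlock(block);
    if (score > bestScore) {
      best = block;
      bestScore = score;
    }
  }
  return best;
}

// Usage: a link-only nav bar loses to a paragraph of prose.
const sampleBlocks: Block[] = [
  { id: "nav", text: "Home About Blog Contact", linkText: "Home About Blog Contact" },
  {
    id: "article",
    text: "The main article text is usually long, contains several commas, and has few or no links inside it.",
    linkText: "",
  },
];
console.log(pickMainContent(sampleBlocks)?.id); // prints: article
```

Production extractors typically weigh many more signals (element types, class names, page metadata); this only illustrates the general shape of the idea.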
I built Defuddle while working on Obsidian Web Clipper[1] (also MIT-licensed) because Mozilla's Readability[2] appears to be mostly abandoned and didn't work well for many sites.
It's still very much a work in progress, but I thought I'd share it today, in light of the announcement that Mozilla is shutting down Pocket. This library could be helpful to anyone building a read-it-later app.
Defuddle is also available as a CLI:
https://github.com/kepano/defuddle-cli
In the end, I found that the Python trifatura library extracted the best-quality content, with accurate metadata.
You might want to compare your implementation to trifatura to see if there is room for improvement.
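One way to run such a comparison is to score each extractor's output against a hand-checked reference text for the same page. The TypeScript sketch below computes bag-of-words precision, recall, and F1. The whitespace tokenizer, the function names, and the assumption that you have reference text on hand are all illustrative choices, not part of Defuddle or trafilatura.

```typescript
// Lowercase whitespace tokenizer (deliberately simple; no punctuation handling).
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
}

// Count how many times each token occurs.
function countTokens(tokens: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const t of tokens) {
    counts.set(t, (counts.get(t) ?? 0) + 1);
  }
  return counts;
}

interface Scores {
  precision: number; // share of extracted tokens that also appear in the reference
  recall: number; // share of reference tokens that were extracted
  f1: number; // harmonic mean of precision and recall
}

// Bag-of-words overlap between an extractor's output and a reference text.
function tokenF1(extracted: string, reference: string): Scores {
  const ext = tokenize(extracted);
  const ref = tokenize(reference);
  if (ext.length === 0 || ref.length === 0) {
    return { precision: 0, recall: 0, f1: 0 };
  }
  const refCounts = countTokens(ref);
  let overlap = 0;
  countTokens(ext).forEach((n, token) => {
    // A token matches at most as many times as it appears in the reference.
    overlap += Math.min(n, refCounts.get(token) ?? 0);
  });
  const precision = overlap / ext.length;
  const recall = overlap / ref.length;
  const f1 = overlap === 0 ? 0 : (2 * precision * recall) / (precision + recall);
  return { precision, recall, f1 };
}

// Usage: leftover menu text lowers precision but not recall.
const reference = "Mozilla is shutting down Pocket";
const extracted = "Menu Login Mozilla is shutting down Pocket";
console.log(tokenF1(extracted, reference));
```

Bag-of-words overlap ignores word order and formatting, so it says nothing about Markdown quality; it only indicates whether an extractor kept the right text and dropped the rest.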
For the curious: Trafilatura means "extrusion" in Italian.
| This method creates a porous surface that distinguishes pasta trafilata for its extraordinary way of holding the sauce.
Search "maccheroni trafilati" vs. "maccheroni lisci" :)
(btw I think you meant trafilatura not trifatura)
I've got a project that has been going for 6 years now, has attracted 500 stars, and gets 49k downloads a month. It works because it has comprehensive unit tests and people can rely on it. When I was just starting out, I didn't tell people to feel free to help; I put the effort in. It is important to lay the groundwork beyond just writing the utility.
I started working on my own alternative, but life (and Web Clipper) derailed the work.
It's funny: somehow slurp keeps gaining new users even though Web Clipper exists, so I might have to refactor it to use your library sometime soon, even though I don't use slurp myself anymore.
Not that I didn't already implement a read-it-later solution with Obsidian+Dataview, but this definitely makes things simpler!