Nice idea. In practice, many sites use different methods to prevent scraping, so there is a large risk in doing things manually imho.
renegat0x0 · 10m ago
Huh, I have been working on a solution to that problem.
My project lets you define rules for various sites, so eventually everything is scraped correctly.
For YouTube, yt-dlp is also used to augment the results.
It can crawl using requests, Selenium, httpx, and others. The response is JSON, so it is easy to process.
The downside is that it may not be the fastest solution, and I have not tested it against proxies.
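To give a rough idea of what per-site rules plus a JSON response can look like, here is a minimal sketch. It is not the project's actual code or configuration format: `SITE_RULES`, `pick_crawler`, `build_response`, and the example hostnames are all made up for illustration, and no real fetching is done.

```python
import json
from urllib.parse import urlparse

# Hypothetical rule table: hostname suffix -> crawler backend name.
# Illustrative only; this is not crawler-buddy's configuration format.
SITE_RULES = {
    "youtube.com": "yt-dlp",
    "js-heavy-site.test": "selenium",
}
DEFAULT_CRAWLER = "requests"


def pick_crawler(url: str) -> str:
    """Return the backend name for a URL, falling back to the default."""
    host = (urlparse(url).hostname or "").lower()
    for suffix, crawler in SITE_RULES.items():
        # Match the domain itself or any subdomain of it.
        if host == suffix or host.endswith("." + suffix):
            return crawler
    return DEFAULT_CRAWLER


def build_response(url: str, status: int, text: str) -> str:
    """Wrap a crawl result in a JSON document for easy downstream parsing."""
    return json.dumps(
        {"url": url, "crawler": pick_crawler(url), "status": status, "text": text}
    )
```

A caller would pass the URL through `pick_crawler`, run whichever backend it names, and hand the result to `build_response`, so the consumer only ever has to parse one JSON shape regardless of which backend did the work.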
https://github.com/rumca-js/crawler-buddy