Scrapy_cffi: Async-first, modular web scraping utilities

2 points by funnyStrange · 9/13/2025, 10:28:42 AM · github.com

Comments (1)

funnyStrange · 1h ago
scrapy_cffi keeps a Scrapy-style crawler architecture and supports async execution with both HTTP and WebSocket requests. CLI support is minimal, so the Python API is the recommended way to run spiders.
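For readers unfamiliar with the Scrapy-style pattern, here is a minimal asyncio sketch of the idea: spider callbacks yield either scraped items or follow-up requests, and a scheduler queue feeds a pool of concurrent workers. This is not scrapy_cffi's API. The `Request`, `Response`, `fetch`, `parse`, and `crawl` names are hypothetical, and `fetch` reads from an in-memory dict instead of making real HTTP or WebSocket calls.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Callable, Union


@dataclass
class Request:  # hypothetical, not scrapy_cffi's Request class
    url: str
    callback: Callable[[Response], AsyncIterator[Union[Request, dict]]]


@dataclass
class Response:  # hypothetical response type
    url: str
    text: str


# Fake in-memory "site" so the sketch runs without network access.
PAGES = {"https://example.com/": "page-1", "https://example.com/2": "page-2"}


async def fetch(request: Request) -> Response:
    await asyncio.sleep(0)  # stand-in for a real HTTP or WebSocket call
    return Response(request.url, PAGES[request.url])


async def parse(response: Response) -> AsyncIterator[Union[Request, dict]]:
    # Spider callback: yield an item, then possibly a follow-up request.
    yield {"url": response.url, "body": response.text}
    if response.url.endswith("/"):
        yield Request("https://example.com/2", parse)


async def crawl(start: list[Request], concurrency: int = 4) -> list[dict]:
    queue: asyncio.Queue = asyncio.Queue()
    for req in start:
        queue.put_nowait(req)
    items: list[dict] = []

    async def worker() -> None:
        while True:
            req = await queue.get()
            try:
                response = await fetch(req)
                async for out in req.callback(response):
                    if isinstance(out, Request):
                        queue.put_nowait(out)  # schedule follow-up request
                    else:
                        items.append(out)  # collect scraped item
            finally:
                queue.task_done()

    workers = [asyncio.create_task(worker()) for _ in range(concurrency)]
    await queue.join()  # wait until every scheduled request is processed
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return items
```

Running `asyncio.run(crawl([Request("https://example.com/", parse)]))` collects one item per page. The queue-plus-workers design is what lets many requests be in flight at once.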

The real highlights are its utility extensions:

• JSON Extractor – handles standard, embedded, and malformed JSON

• Media Downloader – segmented downloads for videos and large files

• Async Database Managers – Redis, MySQL, MongoDB with automatic retry and reconnection

• Multi-process RPC – quickly register functions, classes, and objects for rapid prototyping without a message queue or Redis
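As an illustration of what "embedded JSON" extraction involves, here is a small sketch that scans arbitrary text (an HTML page or inline script) and yields every JSON object or array it can decode. It uses the standard library's `json.JSONDecoder.raw_decode`, not scrapy_cffi's JSON Extractor, whose interface isn't shown here. The function name `iter_embedded_json` is hypothetical. The sketch also doesn't attempt to repair malformed JSON; it simply skips candidates that fail to parse.

```python
import json
from typing import Any, Iterator


def iter_embedded_json(text: str) -> Iterator[Any]:
    """Yield each JSON object or array embedded in arbitrary text (hypothetical helper)."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # Find the next candidate opening bracket, "{" or "[".
        starts = [i for i in (text.find("{", pos), text.find("[", pos)) if i != -1]
        if not starts:
            return
        start = min(starts)
        try:
            # raw_decode parses one JSON value starting at `start` and
            # returns it with the index just past its end.
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            pos = start + 1  # not valid JSON here; keep scanning
            continue
        yield obj
        pos = end  # skip past the decoded value, including nested brackets


html = '<script>window.__DATA__ = {"id": 7, "tags": ["a", "b"]};</script>'
print(list(iter_embedded_json(html)))
```

Because `pos` jumps to the end of each decoded value, the nested `["a", "b"]` array is returned only as part of its parent object, not a second time.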

These utilities can be used independently or combined into full async crawlers, offering flexibility, rapid prototyping, and easy extensibility.
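The "automatic retry and reconnection" mentioned for the database managers is a common pattern in async code. A minimal sketch of one way to do it follows: a decorator retries an async method with exponential backoff and calls the object's `reconnect()` between attempts. This is an assumption about the general technique, not scrapy_cffi's implementation. The `with_retry` decorator and the `FlakyClient` class (a stand-in for a Redis, MySQL, or MongoDB connection that drops once) are hypothetical.

```python
import asyncio
import functools


def with_retry(retries=3, base_delay=0.1, exceptions=(ConnectionError,)):
    """Retry an async method, calling self.reconnect() between attempts (hypothetical)."""
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(self, *args, **kwargs):
            for attempt in range(retries + 1):
                try:
                    return await func(self, *args, **kwargs)
                except exceptions:
                    if attempt == retries:
                        raise  # out of attempts: surface the original error
                    await asyncio.sleep(base_delay * 2 ** attempt)  # exponential backoff
                    await self.reconnect()
        return wrapper
    return decorator


class FlakyClient:
    """Hypothetical DB client whose connection starts out dropped."""

    def __init__(self) -> None:
        self.connected = False
        self.reconnects = 0

    async def reconnect(self) -> None:
        self.reconnects += 1
        self.connected = True

    @with_retry(retries=2, base_delay=0.01)
    async def get(self, key: str) -> str:
        if not self.connected:
            raise ConnectionError("connection lost")
        return f"value-for-{key}"


client = FlakyClient()
print(asyncio.run(client.get("k")))  # first attempt fails, reconnects, then succeeds
```

Putting the retry logic in a decorator keeps each query method free of reconnection code, which is one reason such managers can be dropped into a crawler or used on their own.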