Show HN: Extract Tables from Any Website – Images to JSON via OCR
1 valliappanr 0 6/22/2025, 7:28:34 PM github.com ↗
I built a two-step open-source tool that extracts tables from any website, even the hard ones that rely on dynamic rendering or CSS-based layout.
Step 1: Capture tables as images using a headless browser
Step 2: Run OCR to convert them into structured JSON
This works well when traditional HTML parsers fail, for example on tables with complex styling, merged cells, or JS-rendered content.
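A rough sketch of how the two steps could fit together, not the repo's actual code. It assumes Playwright as the headless browser and assumes the OCR engine returns one word box per cell with `text`, `left`, and `top` fields. The function names (`capture_table_images`, `words_to_rows`, `rows_to_records`) and the `row_tol` parameter are made up for illustration.

```python
import json
from typing import Dict, List


def capture_table_images(url: str, out_prefix: str = "table") -> List[str]:
    """Step 1 (assumed approach): screenshot every <table> element on a page."""
    # Imported lazily so the rest of the sketch runs without Playwright installed.
    from playwright.sync_api import sync_playwright

    paths = []
    with sync_playwright() as p:
        browser = p.chromium.launch()  # headless by default
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")  # give JS-rendered tables time to appear
        tables = page.locator("table")
        for i in range(tables.count()):
            path = f"{out_prefix}_{i}.png"
            tables.nth(i).screenshot(path=path)
            paths.append(path)
        browser.close()
    return paths


def words_to_rows(words: List[Dict], row_tol: int = 10) -> List[List[str]]:
    """Step 2a: group OCR word boxes into rows by vertical position.

    Assumes each box is a dict with "text", "left" and "top" keys and that
    every cell yields exactly one box (a simplification).
    """
    rows: List[List[Dict]] = []
    for word in sorted(words, key=lambda w: w["top"]):
        # A box joins the current row if it sits within row_tol pixels of the row's first box.
        if rows and abs(word["top"] - rows[-1][0]["top"]) <= row_tol:
            rows[-1].append(word)
        else:
            rows.append([word])
    # Within each row, order cells left to right.
    return [[w["text"] for w in sorted(row, key=lambda w: w["left"])] for row in rows]


def rows_to_records(rows: List[List[str]]) -> List[Dict[str, str]]:
    """Step 2b: treat the first row as the header and emit one dict per data row."""
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


if __name__ == "__main__":
    # Hand-written boxes standing in for real OCR output.
    sample = [
        {"text": "Name", "left": 10, "top": 5},
        {"text": "Price", "left": 120, "top": 6},
        {"text": "4.50", "left": 121, "top": 42},
        {"text": "Tea", "left": 11, "top": 40},
    ]
    print(json.dumps(rows_to_records(words_to_rows(sample))))
    # prints [{"Name": "Tea", "Price": "4.50"}]
```

Grouping by pixel position rather than by DOM structure is what lets this kind of pipeline ignore how the table was built, but it also means merged cells and multi-word cells need extra handling that this sketch leaves out.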
GitHub: https://github.com/enterpriseqa/extract_tables_from_websites
Examples included. Feedback and contributions are welcome!