Form16x CLI: parse Indian Form 16 PDFs with regime comparison and optimization

2 taxedo 1 9/11/2025, 4:24:58 PM github.com ↗

Comments (1)

taxedo · 5h ago
I got tired of manually copying numbers from Form 16 PDFs into India’s tax filing portal every year. So I built *Form16x*, a Python CLI + library that turns these semi-structured PDFs into structured JSON.

Beyond extraction, it can:

- Consolidate multiple Form 16s (useful if you switched jobs during the year)
- Calculate taxes under both regimes and recommend the better option
- Show detailed salary and deduction breakdowns in the terminal (tree view, colored output)
- Suggest tax optimizations (80C, 80D, NPS, etc.) with potential savings
- Expose a Python API (`TaxCalculationAPI`) with multi-year tax rules (AY 2020–2025)
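To make the regime comparison concrete, here is a minimal sketch of slab-based tax under two regimes. It is not Form16x's actual code: the function names (`slab_tax`, `compare_regimes`) are hypothetical, and the slab tables are illustrative placeholders loosely modeled on FY 2023-24 rates. It ignores the standard deduction, rebates, surcharge, and cess, so check real figures against the official rules.

```python
from typing import Dict, List, Tuple

# (upper limit of the slab in rupees, marginal rate) -- ASSUMED illustrative values,
# not taken from Form16x; verify against official rates before relying on them.
Slab = Tuple[float, float]

OLD_SLABS: List[Slab] = [
    (250_000, 0.0), (500_000, 0.05), (1_000_000, 0.20), (float("inf"), 0.30),
]
NEW_SLABS: List[Slab] = [
    (300_000, 0.0), (600_000, 0.05), (900_000, 0.10),
    (1_200_000, 0.15), (1_500_000, 0.20), (float("inf"), 0.30),
]


def slab_tax(taxable: float, slabs: List[Slab]) -> float:
    """Sum the tax owed on each slice of income that falls inside a slab."""
    tax, lower = 0.0, 0.0
    for upper, rate in slabs:
        if taxable <= lower:
            break
        tax += (min(taxable, upper) - lower) * rate
        lower = upper
    return tax


def compare_regimes(gross: float, old_deductions: float,
                    new_deductions: float = 0.0) -> Dict[str, object]:
    """Compute tax under both regimes and name the cheaper one."""
    old_tax = slab_tax(max(gross - old_deductions, 0.0), OLD_SLABS)
    new_tax = slab_tax(max(gross - new_deductions, 0.0), NEW_SLABS)
    return {
        "old": old_tax,
        "new": new_tax,
        "recommended": "old" if old_tax < new_tax else "new",
    }
```

Calling `compare_regimes(1_200_000, old_deductions=200_000)` compares the old regime on a taxable income of 10 lakh against the new regime on the full 12 lakh. Keeping the slabs as plain data is one way a tool could swap rule sets per assessment year.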

*Repo:* https://github.com/ri-sh/Form16x

### Why I built it

Form 16 is similar to a W-2 in the US or a T4 in Canada: a PDF tax certificate with inconsistent layouts across employers. Filing returns often means manually re-entering data, which is error-prone and time-consuming.

Form16x tries to solve this with:

- PDF parsing using camelot/pdfplumber with fallback logic
- Structured output aligned with the form fields
- Local-only processing (no data leaves your machine)
- CLI polish (progress bars, colored display, breakdown trees)
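The fallback idea can be sketched generically: try each extractor in order and keep the first one that returns any tables. This is an assumed pattern, not Form16x's implementation. `extract_with_fallback` and the extractor callables are hypothetical names, and the sketch uses only the standard library so it runs without camelot or pdfplumber installed.

```python
from typing import Callable, List, Sequence, Tuple

Table = List[List[str]]                  # rows of cell strings
Extractor = Callable[[str], List[Table]]  # takes a PDF path, returns tables


def extract_with_fallback(
    pdf_path: str,
    extractors: Sequence[Tuple[str, Extractor]],
) -> Tuple[str, List[Table]]:
    """Return (extractor name, tables) from the first extractor that succeeds.

    An extractor "fails" if it raises or returns no tables; in either case
    the next one in the sequence is tried.
    """
    errors: List[str] = []
    for name, extractor in extractors:
        try:
            tables = extractor(pdf_path)
        except Exception as exc:  # parser libraries raise many error types
            errors.append(f"{name}: {exc}")
            continue
        if tables:
            return name, tables
        errors.append(f"{name}: no tables found")
    raise ValueError("all extractors failed: " + "; ".join(errors))
```

In practice the callables would wrap real parsers, for example one built on `camelot.read_pdf` and another on pdfplumber's `page.extract_tables()`. Returning the extractor name alongside the tables makes it easier to debug which parser handled a given employer's layout.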

Would love feedback from HN on both the technical side (PDF parsing + structured extraction) and whether this approach could extend to other countries’ tax forms.