I got tired of manually copying numbers from Form 16 PDFs into India’s tax filing portal every year.
So I built *Form16x*, a Python CLI + library that turns these semi-structured PDFs into structured JSON.
Beyond extraction, it can:
- Consolidate multiple Form 16s (useful if you switched jobs in a year)
- Calculate taxes under both the old and new tax regimes and recommend the better option
- Show detailed salary and deduction breakdowns in the terminal (tree view, colored output)
- Suggest tax optimizations (80C, 80D, NPS, etc.) with potential savings
- Expose a Python API (`TaxCalculationAPI`) with multi-year tax rules (AY 2020–2025)
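As a rough illustration of the consolidation and regime-comparison ideas, here is a minimal sketch. It is not Form16x's actual code or schema: `Form16Summary`, `consolidate`, `slab_tax`, and `compare_regimes` are hypothetical names, and the slab tables and standard deductions are illustrative values for an individual under 60 that you should verify against the official rules for the relevant assessment year. Rebate under Section 87A, surcharge, and cess are deliberately ignored.

```python
from dataclasses import dataclass


@dataclass
class Form16Summary:
    """Hypothetical minimal model of the figures a Form 16 provides."""
    employer: str
    gross_salary: int  # rupees
    tds: int           # tax deducted at source, rupees


def consolidate(forms):
    """Sum salary and TDS across employers, e.g. after a mid-year job switch."""
    return Form16Summary(
        employer=" + ".join(f.employer for f in forms),
        gross_salary=sum(f.gross_salary for f in forms),
        tds=sum(f.tds for f in forms),
    )


# Slabs as (upper_bound, rate_percent); None means "no upper bound".
# ILLUSTRATIVE values only; check them against the official rules for the year.
OLD_SLABS = [(250_000, 0), (500_000, 5), (1_000_000, 20), (None, 30)]
NEW_SLABS = [(300_000, 0), (700_000, 5), (1_000_000, 10),
             (1_200_000, 15), (1_500_000, 20), (None, 30)]


def slab_tax(taxable, slabs):
    """Progressive tax: each slice of income is taxed at its own slab rate."""
    tax, lower = 0.0, 0
    for upper, pct in slabs:
        if taxable <= lower:
            break
        top = taxable if upper is None else min(taxable, upper)
        tax += (top - lower) * pct / 100
        if upper is None:
            break
        lower = upper
    return tax


def compare_regimes(gross, old_regime_deductions, std_old=50_000, std_new=75_000):
    """Old regime allows 80C/80D-style deductions; the new regime (here) does not."""
    old = slab_tax(max(gross - std_old - old_regime_deductions, 0), OLD_SLABS)
    new = slab_tax(max(gross - std_new, 0), NEW_SLABS)
    return {"old": old, "new": new, "recommended": "old" if old < new else "new"}


forms = [Form16Summary("Employer A", 900_000, 40_000),
         Form16Summary("Employer B", 600_000, 30_000)]
total = consolidate(forms)
# 1.5L under 80C plus 25k under 80D, claimed only under the old regime.
print(compare_regimes(total.gross_salary, old_regime_deductions=175_000))
```

With these illustrative slabs, the consolidated ₹15,00,000 salary works out to ₹1,95,000 under the old regime and ₹1,25,000 under the new one, so the sketch recommends the new regime. Keeping slabs as plain data makes it easy to swap in a different table per assessment year.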
### Why I built it
Form 16 is similar to a W-2 in the US or a T4 in Canada: a PDF tax certificate with inconsistent layouts across employers. Filing returns often means manually re-entering data, which is error-prone and time-consuming.
Form16x tries to solve this with:
- PDF parsing using camelot/pdfplumber with fallback logic
- Structured output aligned with the form fields
- Local-only processing (no data leaves your machine)
- CLI polish (progress bars, colored display, breakdown trees)
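A minimal sketch of the "camelot first, pdfplumber as fallback" pattern might look like the following. It is an assumption about how such logic could be structured, not Form16x's implementation: `extract_with_fallback` and the `mentions_gross_salary` usability check are hypothetical. The two extractors use real library calls (`camelot.read_pdf` and `pdfplumber.open(...).pages[i].extract_tables()`), imported lazily so either dependency can be missing.

```python
from typing import Callable, List, Optional

Table = List[List[Optional[str]]]
Extractor = Callable[[str], List[Table]]


def camelot_tables(path: str) -> List[Table]:
    import camelot  # optional dependency, imported only when this extractor runs
    tables = camelot.read_pdf(path, pages="all", flavor="lattice")
    return [t.df.values.tolist() for t in tables]


def pdfplumber_tables(path: str) -> List[Table]:
    import pdfplumber  # optional dependency
    out: List[Table] = []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            out.extend(page.extract_tables())
    return out


def mentions_gross_salary(tables: List[Table]) -> bool:
    """Hypothetical usability check: did we find the salary table at all?"""
    return any("gross salary" in (cell or "").lower()
               for table in tables for row in table for cell in row)


def extract_with_fallback(path: str, extractors: List[Extractor],
                          is_usable: Callable[[List[Table]], bool]):
    """Try each extractor in order; return the first usable result."""
    errors = []
    for extract in extractors:
        try:
            tables = extract(path)
        except Exception as exc:  # missing library, unreadable PDF, etc.
            errors.append(f"{extract.__name__}: {exc}")
            continue
        if is_usable(tables):
            return extract.__name__, tables
        errors.append(f"{extract.__name__}: no usable tables")
    raise ValueError("all extractors failed: " + "; ".join(errors))


# Typical call on a real file:
# name, tables = extract_with_fallback(
#     "form16.pdf", [camelot_tables, pdfplumber_tables], mentions_gross_salary)
```

Validating the output, rather than only catching exceptions, matters here: a layout-based extractor can "succeed" on an unfamiliar employer template while returning nothing useful, and the content check is what triggers the fallback in that case.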
Would love feedback from HN on both the technical side (PDF parsing + structured extraction) and whether this approach could extend to other countries’ tax forms.
*Repo:* https://github.com/ri-sh/Form16x