Transform DOCX into LLM-ready data
15 points by sergiishcherbak on 5/4/2025, 10:42:48 PM | 5 comments | contextgem.dev
This custom-built converter processes Word XML directly, provides comprehensive content extraction, and covers what other open-source tools often miss or lack support for:
- Rich paragraph and sentence metadata for enhanced context
- Misaligned tables
- Comments, footnotes, and textboxes
- Embedded images
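For readers wondering what "processes Word XML directly" involves: a .docx file is a ZIP package whose main body lives in word/document.xml, while comments and footnotes are conventionally stored in separate parts (word/comments.xml, word/footnotes.xml) that a full converter locates through word/_rels/document.xml.rels. Here is a minimal sketch, using only the Python standard library, that pulls paragraph text from the body part. It illustrates the general approach under that assumption and is not ContextGem's implementation; the function name extract_paragraphs is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def extract_paragraphs(docx_file):
    """Return the text of each w:p paragraph in a .docx body.

    Hypothetical sketch: reads word/document.xml straight from the ZIP
    package and joins the w:t text runs of every paragraph. It does not
    handle comments, footnotes, images, or table structure.
    """
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    body = root.find("w:body", NS)
    paragraphs = []
    # iter() walks all descendants, so paragraphs inside table cells
    # are included too, flattened without their table context.
    for p in body.iter(f"{{{W_NS}}}p"):
        text = "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return paragraphs
```

A paragraph split across several runs (for example "Hello " and "world") comes back as one string, and an empty w:p yields "". The gaps this sketch leaves (comments, footnotes, textboxes, images, table layout) are exactly the items the post lists.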
The converted document can then be easily used in ContextGem's LLM extraction workflows.
Perfect for developers building contract intelligence applications where precision matters. The converter preserves document structure and relationships, empowering LLMs to better understand and analyze document content.
Try it, or share it with your dev team, and see the difference in your document processing pipeline!
GitHub: https://github.com/shcherbak-ai/contextgem
All DocxConverter features: https://contextgem.dev/converters/docx.html
I’ve read that there are a lot of OpenXML elements that are pretty opaque. They appear to basically be XML-esque representations of binary, in-memory structs used internally by Office. (Maybe this has changed over time.)
How much OpenXML does this actually handle?
> Extracts information that other open-source tools often do not capture: misaligned tables
Could you expand on what you mean by misaligned tables? Are these tables that appear as separate ‘table nodes’ in the XML, or ones that appear as a single node but have wonky formatting?
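The thread doesn't define "misaligned tables," so the following is only a hypothetical diagnostic for the second case in the question (a single w:tbl node whose rows don't line up), not how ContextGem defines or handles them. It assumes a row's grid width is the sum of its cells' w:gridSpan values (default 1); rows of unequal width within one table are one possible symptom of wonky formatting. The function name table_row_widths is made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"  # Clark-notation prefix, e.g. "{ns}tbl"


def table_row_widths(docx_file):
    """Return, for every w:tbl in the body, the grid width of each row.

    Hypothetical sketch: width = sum of w:gridSpan values of the row's
    direct w:tc children. Ignores w:gridBefore/w:gridAfter offsets,
    w:tblGrid, and cells wrapped in content controls.
    """
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    tables = []
    # iter() also finds nested tables; findall() only takes direct
    # children, so rows of a nested table are not counted twice.
    for tbl in root.iter(W + "tbl"):
        widths = []
        for tr in tbl.findall(W + "tr"):
            width = 0
            for tc in tr.findall(W + "tc"):
                span = tc.find(f"{W}tcPr/{W}gridSpan")
                width += int(span.get(W + "val")) if span is not None else 1
            widths.append(width)
        tables.append(widths)
    return tables
```

For a table whose rows hold two cells, one cell spanning two columns, and a single unspanned cell, this returns [[2, 2, 1]]; the short last row is the kind of irregularity a naive row-by-column reader might mishandle. Tables split into separate w:tbl nodes would instead show up as separate inner lists.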