Hacktoberfest 2025 (hacktoberfest.com)

1 points by neurosoldier 1m ago 0 comments

Why Tesla Thinks Elon Needs More Money (theatlantic.com)

1 points by breve 2m ago 0 comments

Show HN: The Startup Turning Ordinary People into Data Millionaires (dattaai.com)

1 points by ObengfoAndrew 2m ago 0 comments

The left's plan to fix housing in Paris (ft.com)

1 points by Traces 6m ago 0 comments

Ask HN: Has remote work made you happier or just lonelier?

1 points by jamessmithe 8m ago 1 comments

We Need to Talk About Observation (Swift/SwiftUI) (jaredsinclair.com)

1 points by tomaskafka 8m ago 0 comments

Commenters Deemed Offensive After Charlie Kirk's Death Face Consequences (time.com)

1 points by mdp2021 9m ago 0 comments

Turn legit URLs into something phishy (phishyurl.com)

2 points by landgenoot 11m ago 1 comments

Inside North Korea's Abandoned Hotel of Doom (theb1m.com)

1 points by voxadam 11m ago 1 comments

Continuous operation of a coherent 3,000-qubit system (nature.com)

1 points by 2ro 11m ago 1 comments

Ask HN: Do you guys see any value in LinkedIn?

2 points by mrdosija 13m ago 1 comments

Observe live SQL queries in Go with DTrace (gaultier.github.io)

1 points by ingve 15m ago 0 comments

Tech and Startups Digest (blaze.email)

1 points by alastairr 16m ago 0 comments

Show HN: OSS SDK for Digital Identity (ssi-sdk.blockialabs.com)

1 points by Pance 17m ago 0 comments

My friend was spending $2k/month on Cursor

2 points by BohdanPetryshyn 18m ago 0 comments

Jaguar Land Rover extends production shutdown after cyber-attack (theguardian.com)

1 points by Lio 18m ago 0 comments

Socktainer: Docker-compatible REST API for Apple containerization libraries (github.com)

1 points by ingve 22m ago 0 comments

DuckDuckGo now features a Blocked Sites list

1 points by the-kenny 22m ago 0 comments

A CEO's Guide to Emacs (web.archive.org)

1 points by lproven 23m ago 1 comments

Don't DDoS Yourself (duct-ui.org)

1 points by nvln 25m ago 0 comments

How People Use ChatGPT (nber.org)

1 points by hunglee2 25m ago 0 comments

Show HN: Drop-in Redis replacement in Rust with 5M+ GET/s (github.com)

3 points by mehrant 29m ago 0 comments

Swift 6.2 Released (swift.org)

1 points by jurip 29m ago 0 comments

Android 14 smartphone offers 6.13-inch E-Ink color display, 5G (cnx-software.com)

1 points by pathompong 30m ago 0 comments

TSMC working with Taiwanese beekeepers to produce honey from colocated hives (tomshardware.com)

1 points by gsf_emergency_2 31m ago 0 comments

Show HN: Cvee – Create a job post, get top candidates delivered to your inbox (cvee.cc)

1 points by mechikaegon 34m ago 0 comments

How to Fight Fraudulent Publishing in the Mathematical Sciences (arxiv.org)

1 points by croes 35m ago 0 comments

"Code Your Own Engine": Gearbox CEO Responds to Borderlands 4 Criticism (80.lv)

1 points by bob1029 36m ago 0 comments

US Army adopts VC model to supercharge tech deployment (defensenews.com)

1 points by gsf_emergency_2 38m ago 0 comments

China says TikTok's US app will use Chinese algorithm (ft.com)

5 points by thm 40m ago 3 comments

'Dot-Com Bubble 2.0' could burst at any time (marxist.com)

1 points by Improvement 42m ago 0 comments

Great Quotes from Science Fiction (medium.com)

1 points by bryanrasmussen 42m ago 0 comments

The Rise of Parasitic AI (lesswrong.com)

1 points by rntn 42m ago 0 comments

How Do LLMs Work? (gilesthomas.com)

1 points by ibobev 44m ago 0 comments

Performant Embedded Analytics at Scale (embeddable.com)

1 points by Embeddable 45m ago 1 comments

Ask HN: How can I test FTS5 engine in SQLite3?

1 points by mysh 46m ago 1 comments

Agentic Design Patterns (docs.google.com)

1 points by pietromenna 48m ago 0 comments

Viewing infrared imagery for any place on Earth (openstreetmap.org)

1 points by altilunium 49m ago 0 comments

China based rare-earths strat on chem engineers' 1978 Lockheed Martin & M-D tour (motherfriendly.org)

1 points by gsf_emergency_2 50m ago 0 comments

Peter Talisman: Lord of the Harvest, a game about collecting corn (petertalisman.quest)

1 points by knuckleheads 53m ago 0 comments

It's going to be a life skill: educators discuss the impact of AI on university (theguardian.com)

1 points by JeanKage 54m ago 1 comments

Readest Could Be [a] New Favorite eBook Reader App on Linux (news.itsfoss.com)

2 points by mdp2021 56m ago 1 comments

Overthinking vs. Scenario Planning (msthgn.com)

2 points by alexgvozden 1h ago 0 comments

I Spent One Year Building an Open-Source Project (github.com)

1 points by TT9601 1h ago 1 comments

Show HN: Daily, the easiest time tracker for Mac, now has a web API (dailytimetracking.com)

1 points by nielsmouthaan 1h ago 0 comments

Coffee poll: what do you usually order? I order a latte

3 points by whyandgrowth 1h ago 3 comments

Kernel Leaderboard (gpumode.com)

1 points by Jhsto 1h ago 0 comments

The Death of the Student Essay–and the Future of Cognition (forkingpaths.co)

1 points by amunozo 1h ago 0 comments

A New and Dangerous Kind of Fame (theatlantic.com)

2 points by FinnLobsien 1h ago 0 comments

Win10 Is Nearing End-of-Life: What Should You Do Next? (jasoneckert.github.io)

5 points by furkansahin 1h ago 2 comments

Transform DOCX into LLM-ready data

15 points by sergiishcherbak on 5/4/2025, 10:42:48 PM 5 comments (contextgem.dev)

Comments (5)

sergiishcherbak · 134d ago

As part of work on my open-source project ContextGem, I've built a native, zero-dependency DOCX converter that transforms Word documents into LLM-ready data.

This custom-built converter processes Word XML directly, extracts content comprehensively, and covers what other open-source tools often miss or do not support:

- Rich paragraph and sentence metadata for enhanced context

- Misaligned tables

- Comments, footnotes, and textboxes

- Embedded images

The converted document can then be easily used in ContextGem's LLM extraction workflows.

Perfect for developers building contract intelligence applications where precision matters. The converter preserves document structure and relationships, empowering LLMs to better understand and analyze document content.

Try it or share it with your dev team today and see the difference in your document processing pipeline!

GitHub: https://github.com/shcherbak-ai/contextgem

All DocxConverter features: https://contextgem.dev/converters/docx.html
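
For readers wondering what "directly processes Word XML" can look like in practice, here is a minimal sketch using only the Python standard library. It is not ContextGem's actual implementation. It assumes only that a DOCX file is a ZIP archive whose main body lives in word/document.xml; the function names and the sample data are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def extract_paragraphs(docx_file):
    """Return the text of each non-empty paragraph in a DOCX file.

    `docx_file` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's visible text is spread across w:t nodes inside runs.
        text = "".join(node.text or "" for node in para.iter(f"{{{W_NS}}}t"))
        if text.strip():
            paragraphs.append(text)
    return paragraphs


def make_sample_docx():
    """Build a tiny in-memory DOCX-like archive (hypothetical sample data)."""
    body = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>Master Services </w:t></w:r>"
        "<w:r><w:t>Agreement</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Term: 12 months</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", body)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for line in extract_paragraphs(make_sample_docx()):
        print(line)
```

This sketch only walks the main document part. Comments, footnotes, and embedded images live in other parts of the archive, and paragraphs nested inside tables or text boxes need extra handling, which is presumably where a dedicated converter earns its keep.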

WalterGR · 134d ago

> zero-dependency DOCX converter

I’ve read that there are a lot of OpenXML elements that are pretty opaque. They appear to basically be XML-esque representations of binary, in-memory structs used internally by Office. (Maybe this has changed over time.)

How much OpenXML does this actually handle?

> Extracts information that other open-source tools often do not capture: misaligned tables

Could you expand on what you mean by misaligned tables? Are these tables that appear as separate ‘table nodes’ in the XML, or ones that appear as a single node but have wonky formatting?

TiredOfLife · 132d ago

How does it compare to https://github.com/microsoft/markitdown?

obeavs · 132d ago

Hey! This is really awesome. Do you intend to support analysis on redlining/tracked changes? That's where it would become very useful for my use cases.

eightysixfour · 132d ago

Yes, this is the one that always gets me in the MS ecosystem. Would make a few of my workflows so much better.
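
On the redlining question: in WordprocessingML, tracked insertions and deletions are stored as w:ins and w:del elements, with deleted text held in w:delText rather than w:t. A minimal standard-library sketch of listing them follows. It says nothing about whether or how ContextGem supports this; the function names and the sample document are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def w(tag):
    """Qualify a tag or attribute name with the WordprocessingML namespace."""
    return f"{{{W_NS}}}{tag}"


def list_tracked_changes(docx_file):
    """Return tracked insertions and deletions in document order."""
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    changes = []
    for elem in root.iter():
        if elem.tag == w("ins"):
            kind, text_tag = "insert", w("t")
        elif elem.tag == w("del"):
            kind, text_tag = "delete", w("delText")
        else:
            continue
        text = "".join(node.text or "" for node in elem.iter(text_tag))
        if not text:
            # Skip markers that carry no text, e.g. paragraph-mark revisions.
            continue
        changes.append({"kind": kind, "author": elem.get(w("author")), "text": text})
    return changes


def make_redlined_docx():
    """Build a tiny in-memory archive with one deletion and one insertion."""
    body = (
        f'<w:document xmlns:w="{W_NS}"><w:body><w:p>'
        "<w:r><w:t>Payment due in </w:t></w:r>"
        '<w:del w:id="1" w:author="Alice"><w:r><w:delText>30</w:delText></w:r></w:del>'
        '<w:ins w:id="2" w:author="Alice"><w:r><w:t>45</w:t></w:r></w:ins>'
        "<w:r><w:t> days.</w:t></w:r>"
        "</w:p></w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", body)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for change in list_tracked_changes(make_redlined_docx()):
        print(change)
```

A real redline workflow would also need moved text, formatting changes, and revision dates, so treat this as a starting point only.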