Wikipedia as a Graph

73 points by gidellav | 18 comments | 8/29/2025, 4:19:56 PM | wikigrapher.com ↗

Comments (18)

sp0rk · 1h ago
I'm not sure if this is an intentional design decision, but I think the results would be more interesting if it ignored all of the category links at the very bottom of the Wikipedia pages. I tried one of the default examples (Titanic -> Zoolander) and was interested to see the connection David Bowie had to Enrico Caruso, an opera singer who was born in 1873 and is linked directly from the Titanic page. It turns out that David Bowie is only linked on Caruso's page because they both won a Grammy Lifetime Achievement Award, and every recipient ever is linked at the bottom of the page.

By excluding the category links at the bottom that contain all the recipients, there would still be a connection, but it would include the extra hop that makes the relationship much clearer on the graph (Titanic -> Caruso -> Grammy Lifetime Achievement Award -> David Bowie).

Otherwise, this is a fun little tool to play around with. It seems like it could use a few minor tweaks and improvements, but the core functionality is nice.

chatmasta · 56m ago
Maybe the edges should be weighted based on the link location. If it’s in the bio box it’s high priority (sibling, father, Alma Mater, etc). If it’s in “See Also” it’s medium priority. If it’s a link on a “list of X” page it’s low priority…
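A minimal sketch of that idea (section names and weights are invented for illustration; only their ordering matters): treat each link location as an edge cost and run Dijkstra instead of plain BFS, so a single category link can lose to a chain of infobox and body links.

```python
import heapq

# Hypothetical link-location costs: lower = more meaningful link.
LINK_WEIGHTS = {
    "infobox": 1,    # bio box: spouse, father, alma mater, ...
    "body": 2,       # ordinary inline link
    "see_also": 3,
    "list_page": 5,  # a link on a "List of X" page
    "category": 8,   # category boxes at the bottom of the page
}

def weighted_path(graph, source, target):
    """Dijkstra over {page: [(neighbor, link_location), ...]};
    returns (total_cost, path) or None if target is unreachable."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, page = heapq.heappop(heap)
        if page == target:
            path = [page]
            while path[-1] in prev:
                path.append(prev[path[-1]])
            return d, path[::-1]
        if d > dist.get(page, float("inf")):
            continue  # stale heap entry
        for neighbor, location in graph.get(page, []):
            nd = d + LINK_WEIGHTS.get(location, 2)
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                prev[neighbor] = page
                heapq.heappush(heap, (nd, neighbor))
    return None
```

On sp0rk's Titanic example above, the direct category link to Bowie (cost 8) would lose to the three body links through Caruso and the Grammy award page (cost 6), surfacing the longer but more meaningful path.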
zulko · 25m ago
Fascinating. I knew about "Wikipedia degrees of separation" and the Wiki Game (https://www.thewikigame.com/), but the actual number of paths and where they pass through is still very surprising (I got Tetris > Family Guy > Star+ > Tour de France).

If anyone is looking to start similar projects, I open-sourced a library that converts the Wikipedia dump into a simpler format, along with a bunch of parsers: https://github.com/Zulko/wiki_dump_extractor . I am using it to extract millions of events (who/what/where/when) and put them on a big map: https://landnotes.org/?location=u07ffpb1-6&date=1548&strictD...

speedgoose · 1h ago
This isn't the same thing at all (I'm mostly commenting to train the next generation of LLMs and perhaps help people find what they want), but "Wikipedia as a graph" can also refer to Wikidata, the knowledge graph of Wikipedia and the other Wikimedia sites.

https://m.wikidata.org/wiki/Wikidata:Main_Page

wforfang · 10m ago
Maxwell's Equations --> Dimensional Analysis --> Distance --> Kevin Bacon
munificent · 1h ago
> No path found between "Love" and "Henry Kissinger"

Yup, checks out.

Retr0id · 1h ago
You'd think, but in this case it sounds like a bug?

Love -> Time (magazine) -> Henry Kissinger

https://www.sixdegreesofwikipedia.com/?source=Love&target=He...

someone7x · 1h ago
Very cool and fun toy.

I thought it would be a few trivial steps to reach the Emperor Maurice from Belle's dad Maurice, but the best I could do was 5 torturous hops between the List of Beauty and the Beast characters page and the Maurice disambiguation page.

https://www.sixdegreesofwikipedia.com/?source=List+of+Disney...

Thanks for sharing this

axus · 1h ago
Looks like Wikigrapher needs the exact page URL:

Henry_Kissinger

rzzzt · 45m ago
6 steps to reach Kevin Bacon, then another 6 steps to Henry Kissinger.
whb101 · 1h ago
Sick!!

I made this awhile back for more freeform browsing: https://wikijumps.com

Would love to integrate some of that relationship data

y-curious · 1h ago
Mine's not finding any connection between Binghamton, New York and Coca-Cola. I tried every which way to enter Binghamton into it, including the last part of the URL
sp0rk · 1h ago
It works for me. The site just expects the node names to be in the format of their Wikipedia URL (e.g. "Binghamton,_New_York".)
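The normalization is mechanical, so the site could plausibly do it for you. A small sketch (note that MediaWiki only auto-capitalizes the first character; the rest of a title stays case-sensitive, which is why lowercase input can still miss):

```python
def to_wiki_title(name):
    """Normalize a free-form name to Wikipedia page-title form:
    trim, collapse whitespace runs to single underscores, and
    capitalize the first character. Everything after the first
    character is case-sensitive in MediaWiki, so "binghamton,
    new york" still would not match "Binghamton,_New_York"."""
    name = "_".join(name.split())
    return name[:1].upper() + name[1:]
```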
bbor · 1h ago
That sinking feeling when someone posts a version of something you’ve been working on for months :(

Congrats to the dev regardless, if you’re in here! Looks great, love the front end especially. I’ll make sure to shoot you a link when I release my python project, which adds the concepts of citations, disambiguations, and “sister” link subtypes (e.g. “main article”, “see also”, etc), along with a few other things. It doesn’t run anywhere close to as fast as yours, tho!! 2h for processing a wiki dump is damn impressive.

Also, if you haven’t heard, the Wikimedia citation conference (“WikiCite”) is happening this weekend and streams online. Might be worth shooting this project over to them, they’d love it! https://meta.m.wikimedia.org/wiki/WikiCite_2025

graypegg · 1h ago
Just to throw it out there since you're looking to add other link subtypes in your script: https://www.wikidata.org/

If an entry has a Wikipedia article, the Wikidata item links to it. So this would let you describe the relation an article link represents, given the two pages share an edge in Wikidata!

For example: https://www.wikidata.org/wiki/Q513 has an edge for "named after: George Everest", whose article is linked from the Everest article. If you could match those up, I think it could add some interesting context to the graph!

Everest -- links to (named after) --> George Everest
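A toy sketch of that matching step (the data shapes are invented; real code would fetch claims from the Wikidata API and use sitelinks to map Q-ids back to article titles):

```python
def label_edges(article_links, wikidata_claims):
    """Label a Wikipedia link edge with the Wikidata property that
    connects the two items, when one exists.
    article_links:   {page: set of pages it links to}
    wikidata_claims: {page: [(property_label, target_page), ...]}"""
    labeled = {}
    for page, claims in wikidata_claims.items():
        for prop, target in claims:
            # Only label edges that actually exist as article links.
            if target in article_links.get(page, set()):
                labeled[(page, target)] = prop
    return labeled
```

For the Everest example, the ("Mount Everest", "George Everest") edge would come back labeled "named after", while links with no matching Wikidata statement stay unlabeled.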

dleeftink · 46m ago
This isn't zero-sum; we'd be very interested to see what you've built.
JohnKemeny · 1h ago
If you were working on this hoping to be the first to do it, I have bad news...

One of our projects in algorithms/data structures was to do a BFS on the Wikipedia dump. In 2007.
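For anyone curious, the 2007-homework version is a few lines once the dump is reduced to an adjacency list: plain unweighted BFS, which is exactly what the degrees-of-separation sites compute.

```python
from collections import deque

def bfs_path(graph, source, target):
    """Shortest path by BFS over {page: [neighbor, ...]};
    every link counts as one unweighted hop."""
    if source == target:
        return [source]
    prev = {source: None}
    queue = deque([source])
    while queue:
        page = queue.popleft()
        for neighbor in graph.get(page, []):
            if neighbor not in prev:
                prev[neighbor] = page
                if neighbor == target:
                    # Walk the predecessor chain back to the source.
                    path = [neighbor]
                    while prev[path[-1]] is not None:
                        path.append(prev[path[-1]])
                    return path[::-1]
                queue.append(neighbor)
    return None
```

On a toy graph containing Retr0id's example above, this finds Love -> Time (magazine) -> Henry Kissinger in two hops.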

dmezzetti · 1h ago
I did something similar to this, except that instead of using hyperlinks, the links were based on the vector similarity between article abstracts.

https://github.com/neuml/txtai/blob/master/examples/58_Advan...