Thread: The Wikipedia game

  1. #1
    Banned ಠ_ಠ's Avatar
    Join Date
    Mar 2009
    Posts
    687

    The Wikipedia game

    find an article on Wikipedia that doesn't lead to Adolf Hitler within 8 links

    Post failures as well as successes

    example:
    http://en.wikipedia.org/wiki/Chertsey_Lock
    http://en.wikipedia.org/wiki/City_of_London
    http://en.wikipedia.org/wiki/World_War_II
    http://en.wikipedia.org/wiki/Adolf_Hitler
    ╔╗╔══╦╗
    ║║║╔╗║║
    ║╚╣╚╝║╚╗
    ╚═╩══╩═╝

  2. #2
    Malum in se abachler's Avatar
    Join Date
    Apr 2007
    Posts
    3,195
    Its called the Matt Damon effect, all things link to Matt Damon (or Adolf Hitler) within 6 links.

    Hitler is too easy, as anything you can connect to europe is automatically connected to hitler within 2 links (ww2, hitler) and by extension you can connect europe via continent, which connects to any other country which connects to anythign that hapened or exists in that country, so 6 links to hitler max.

    Rice Pudding
    Rice
    China
    WW2
    Hitler

    Matt Damon
    Movies
    Television
    Hitler
    Last edited by abachler; 03-23-2009 at 08:27 PM.

  3. #3
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by abachler
    Hitler is too easy, as anything you can connect to europe is automatically connected to hitler within 2 links
    Doesn't that mean that "Hitler" is difficult, since the aim of the game is to "find an article on Wikipedia that doesn't lead to Adolf Hitler within 8 links"?
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  4. #4
    Registered User
    Join Date
    May 2006
    Posts
    903
    Quote Originally Posted by laserlight View Post
    Doesn't that mean that "Hitler" is difficult, since the aim of the game is to "find an article on Wikipedia that doesn't lead to Adolf Hitler within 8 links"?
    Perhaps he meant that too easy to find = too hard not to find.

  5. #5
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    This problem isn't interesting enough. What I want is for somebody to describe an algorithm which tries to find the shortest path between any pair of articles. A solution to that automatically solves the original problem, and might tell us something informative about Wikipedia.
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  6. #6
    l'Anziano DavidP's Avatar
    Join Date
    Aug 2001
    Location
    Plano, Texas, United States
    Posts
    2,743
    Crap! I hate these NP-Complete kind of problems
    My Website

    "Circular logic is good because it is."

  7. #7
    Complete Beginner
    Join Date
    Feb 2009
    Posts
    312
    What I want is for somebody to describe an algorithm which tries to find the shortest path between any pair of articles.
    Wikipedia is a graph G where each article is a node and each internal link constitutes a directed edge to another node. Dijkstra's algorithm finds the shortest paths from a given node to every other node in the graph. Thus, finding the shortest path for any pair of nodes is fairly simple: for each node n in G, call Dijkstra(G, n).

    As Wikipedia isn't exactly sparse, Disjkstra has a running time of O(|V|^2 + |E|), where V is the set of nodes/vertices/articles and E is the set of edges. Note that |E| is smaller than |V|^2 in this case (|E|==|V|^2 if every article contains a link to every other article), so we can safely omit the |E| in our analysis. As we're calling Dijkstra as many times as there are nodes in G (i.e. |V| times), the overall complexity is O(|V|^3).

    This is far from being NP-complete, although I wouldn't want to compute it for the approximately 2.8 million English Wikipedia articles.

    Greets,
    Philip
    All things begin as source code.
    Source code begins with an empty file.
    -- Tao Te Chip

  8. #8
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    Shortest path problems aren't NP-complete. Wikipedia has 2.8 million articles, and as a rough guess, 50 links per article on average. That's quite doable. For finding the shortest path between any single given pair of articles, you use Dijkstra's Shortest Path algorithm, which for a non-sparse graph (and Wikipedia definitely isn't sparse) has a worst case complexity of O(|V|² + |E|) == O(|V|²).

    Edit: Meh. Snafuist beat me to it. Anyway, 2.8 million² == 7,84x10^12 (7.84 trillion) operations. Heh. Will take a while, I suppose.
    Last edited by CornedBee; 03-24-2009 at 09:30 AM.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  9. #9
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by CornedBee View Post
    Shortest path problems aren't NP-complete. Wikipedia has 2.8 million articles, and as a rough guess, 50 links per article on average. That's quite doable. For finding the shortest path between any single given pair of articles, you use Dijkstra's Shortest Path algorithm, which for a non-sparse graph (and Wikipedia definitely isn't sparse) has a worst case complexity of O(|V|² + |E|) == O(|V|²).

    Edit: Meh. Snafuist beat me to it. Anyway, 2.8 million² == 7,84x10^12 (7.84 trillion) operations. Heh. Will take a while, I suppose.
    Well, we all know algorithms to find shortest paths. The problem is that we don't have access to the complete graph structure of Wikipedia. We can only go to a page, see the links there, and click on them.

    So, working on the assumption that you do NOT know the edges from a node until you actually visit that node, what kind of heuristic could be used to navigate the graph to any arbitrary node? What method would a human use?
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  10. #10
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by brewbuck
    Well, we all know algorithms to find shortest paths. The problem is that we don't have access to the complete graph structure of Wikipedia. We can only go to a page, see the links there, and click on them.

    So, working on the assumption that you do NOT know the edges from a node until you actually visit that node, what kind of heuristic could be used to navigate the graph to any arbitrary node? What method would a human use?
    Is it a requirement to know the entire graph in order to use Dijkstra's algorithm? I had the impression that it can work even if the information about the next node's next nodes is not available until the next node is visited.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  11. #11
    Complete Beginner
    Join Date
    Feb 2009
    Posts
    312
    Quote Originally Posted by brewbuck View Post
    So, working on the assumption that you do NOT know the edges from a node until you actually visit that node
    ...which you don't have to know in order to use Dijkstra (you probably meant what you said plus "do not know that a node exists until you've been there")...

    , what kind of heuristic could be used to navigate the graph to any arbitrary node?
    Brute force, i.e. BFS or DFS.

    What do you want to achieve? Start at a random node and minimize the hops to "Hitler" without full knowledge about the graph?

    There are two possibilities:

    1)
    either the distance between any two nodes is < infinity. Then applying Dijkstra to one node reveals all other nodes (and you can call Dijkstra recursively on those to get all shortest paths between any two nodes)

    2)
    or the distance between two nodes equals infinity, i.e. the graph consists of subgraphs which are not connected with each other. Then the problem of finding all shortest paths can't be solved without knowing all nodes in advance, but at least any node outside of the "Hitler" subgraph satisfies the conditions from the OP

    Greets,
    Philip
    Last edited by Snafuist; 03-24-2009 at 11:16 AM.
    All things begin as source code.
    Source code begins with an empty file.
    -- Tao Te Chip

  12. #12
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    Downloading and parsing the entire Wikipedia is linear, though. Sloooooooooow linear, probably.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  13. #13
    Complete Beginner
    Join Date
    Feb 2009
    Posts
    312
    Another approach to solve the original problem: Wikipedia features backlinks (i.e. for every article, it's possible to find out the articles that link to it). So going to the "Hitler" page, viewing its backlinks, visiting all these pages, viewing their back links and so on 8 times will reveal all articles that refer to "Hitler" in 8 hops. Every other article doesn't.

    Greets,
    Philip
    All things begin as source code.
    Source code begins with an empty file.
    -- Tao Te Chip

  14. #14
    int x = *((int *) NULL); Cactus_Hugger's Avatar
    Join Date
    Jul 2003
    Location
    Banks of the River Styx
    Posts
    902
    As Wikipedia isn't exactly sparse
    I'd argue that Wikipedia is very sparse! According to one site, there are "2,301,486 articles with 55,550,003 links between them". There are a maximum of 5,296,835,506,710 links with that many articles, or you have .001048% of the total possible links. (Ask yourself: Does any given article link to most other articles? No.) Given that number though, it's pretty amazing how quickly you can prove Godwin right.

    Additionally, why not use BFS? BFS should find the shortest path, and in better running time too! Although you'll probably need quite a bit of mem memory for the queue & the visited list. (although you'll still need memory with Dijkstra...)
    long time; /* know C? */
    Unprecedented performance: Nothing ever ran this slow before.
    Any sufficiently advanced bug is indistinguishable from a feature.
    Real Programmers confuse Halloween and Christmas, because dec 25 == oct 31.
    The best way to accelerate an IBM is at 9.8 m/s/s.
    recursion (re - cur' - zhun) n. 1. (see recursion)

  15. #15
    l'Anziano DavidP's Avatar
    Join Date
    Aug 2001
    Location
    Plano, Texas, United States
    Posts
    2,743
    I don't think BFS or DFS would be good choices for such a problem. It's better to use a smart search when possible.

    (PS My NP-Complete joke was just that...a joke).

    Anyways, the best choice for the job would be the "Learning Real-time A* Algorithm" (not regular A* which is similar to Dijkstra's).

    Learning Real-time A* works under the assumption that the entire graph isn't really "known"...and it finds the same solution as A* or Dijkstra's would in a few more iterations of the algorithm.
    My Website

    "Circular logic is good because it is."

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. how do the game engine and the api interact?
    By Shadow12345 in forum Game Programming
    Replies: 9
    Last Post: 12-08-2010, 12:08 AM
  2. Open Source / Semi Open source game idea. Help needed
    By CaptainPatent in forum Projects and Job Recruitment
    Replies: 10
    Last Post: 05-16-2007, 10:44 AM
  3. game engine advice?
    By stien in forum Game Programming
    Replies: 0
    Last Post: 01-23-2007, 03:46 PM
  4. What the jip is up with wiki?
    By Clyde in forum A Brief History of Cprogramming.com
    Replies: 13
    Last Post: 12-22-2005, 07:58 PM
  5. My Maze Game --- A Few Questions
    By TechWins in forum Game Programming
    Replies: 18
    Last Post: 04-24-2002, 11:00 PM