Thread: How to Improve?

  1. #1
    Registered User
    Join Date
    Mar 2007
    Posts
    32

    How to Improve?

    So I'm working on a project that solves the Kevin Bacon problem. The code works fine for a small file size, but I have to read in the entire IMDB movie database file, which contains thousands of movies. I have a graph where the vertices are the actors. I then create edges between the actors if they were in the same movie. However, I think my problem lies in creating the vertices because it is creating a string for each vertex, and so it quickly runs out of memory.

    Code:
    #ifndef GRAPHSTUFF_H_
    #define GRAPHSTUFF_H_
    
    #include <iostream>
    #include <string>
    #include <sstream>
    #include <fstream>
    #include <list>
    #include <vector>
    
    using namespace std;
    
    struct Vertex;
    
    struct Edge
    {
                         // First vertex in edge is implicit
        Vertex  *dest;   // Second vertex in edge
        double   cost;   // Edge cost
    
        Edge( Vertex *d = 0, double c = 0.0 )
          : dest( d ), cost( c ) { }
    };
    
    
    // Structure stored in priority queue for Dijkstra's algorithm.
    struct Path
    {
        Vertex *dest;   // w
        double  cost;   // d(w)
    
        Path( Vertex *d = 0, double c = 0.0 )
          : dest( d ), cost( c ) { }
    
        bool operator> ( const Path & rhs ) const
          { return cost > rhs.cost; }
        bool operator< ( const Path & rhs ) const
          { return cost < rhs.cost; }
    };
    
    const int INFINITY = std::numeric_limits<int>::max();
    
    // Basic info for each vertex.
    struct Vertex
    {
        std::string name;    // Vertex name
        vector<Edge>  adj;     // Adjacent vertices (and costs)
        double        dist;    // Cost
        Vertex       *prev;    // Previous vertex on shortest path
        int           scratch; // Extra variable used in algorithm
    
        Vertex( const std::string & nm ) : name( nm )
          { reset( ); }
    
        void reset( )
          { dist = INFINITY; prev = NULL; scratch = 0; }
    
      };
    
    #endif /*GRAPHSTUFF_H_*/

    Is there any way to improve this?
    Last edited by tallguy; 04-03-2007 at 08:21 AM.

  2. #2
    Lean Mean Coding Machine KONI's Avatar
    Join Date
    Mar 2007
    Location
    Luxembourg, Europe
    Posts
    444
    The question is what you really expect to optimize? I mean, you somewhere have to store the information and that space is limited (of course at some point the processor starts page swapping etc).

    If you are sure that you're not creating a vertex twice for the same actor or that your program is allocating unnecessary space, then I can't see what you could optimize.

    I hope you realize that you don't have to read "the entire imdb database", but simply if you start with Kevin bacon (for example) and build your graph from there, you will encounter a situation where you can't add any more actors because you've got them already.

    Unfortunately for you, I heard some vague theory that it is possible to link any two actors together with a chain of only 4 intermediate nodes (with a working proof of concept on some website).

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Replies: 11
    Last Post: 03-29-2009, 12:27 PM
  2. What do I need to improve on for the real world?
    By vhunterd in forum A Brief History of Cprogramming.com
    Replies: 3
    Last Post: 04-08-2007, 07:49 PM
  3. why page based I/O can improve performance?
    By George2 in forum C Programming
    Replies: 1
    Last Post: 06-12-2006, 07:42 AM
  4. Replies: 6
    Last Post: 06-09-2006, 12:44 AM