Thread: How would one begin?

  1. #1
    Registered User
    Join Date
    Sep 2005
    Posts
    1

    How would one begin?

    How would one even begin to approach this problem? Are there any places I can look for help with it?


    Given a number of web-pages, you are to develop a tool to create a cross-reference for the links in the pages. Specifically, for each web-page, you will generate two lists
    1. The list of files that it references.
    2. The list of files that reference it.

    One can use a cross-reference like this to
    -identify those files that are isolated; that is, no other file refers to them. (Note that many times "index.html" is an isolated file, even though it is a very useful file!)

    -for each file on the list, determine which files/URLs it references. This is useful to make sure that all these files still exist.

    -for each file on the list, determine all files/URLs that refer to it. This is useful when the web-master decides to either eliminate a page or merge it into another page.
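A minimal sketch of the "isolated file" check described above, assuming the links have already been collected into a map from each file to the set of files it references. The names `LinkMap` and `isolatedFiles` are made up for illustration and are not part of the assignment.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical forward cross-reference: file -> set of files it links to.
using LinkMap = std::map<std::string, std::set<std::string>>;

// Returns the entries of `files` that no other file refers to.
std::vector<std::string> isolatedFiles(const std::vector<std::string>& files,
                                       const LinkMap& refersTo) {
    // Gather every link target, ignoring a file's links to itself.
    std::set<std::string> referenced;
    for (const auto& entry : refersTo) {
        for (const auto& target : entry.second) {
            if (target != entry.first) referenced.insert(target);
        }
    }
    // A file is isolated when it never appears as a target.
    std::vector<std::string> isolated;
    for (const auto& file : files) {
        if (referenced.count(file) == 0) isolated.push_back(file);
    }
    return isolated;
}
```

With index.html linking to faculty.html and nothing linking back to index.html, only index.html would be reported as isolated.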

  2. #2
    carry on JaWiB's Avatar
    Join Date
    Feb 2003
    Location
    Seattle, WA
    Posts
    1,972
    Well, the first part would probably be pretty easy, except maybe for parsing an HTML file (I guess you'd just look for "<a href="). Once you figure out how to find all the pages that a page links to, write those to a file.

    You'd probably do that for each page first, then go back and, for each page, look through all the files you just created to see if they have a link to that page.
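The "just look for href" idea can be sketched with `std::string::find`. The function name `extractLinks` is made up for illustration, and the sketch assumes a lowercase `href`, double-quoted values, and no spaces around the `=`. It also picks up `href` attributes on tags other than `<a>`, so it is a rough scan rather than a real HTML parser.

```cpp
#include <string>
#include <vector>

// Collects the quoted value of every href="..." found in `html`.
// Assumption: attributes look exactly like href="value".
std::vector<std::string> extractLinks(const std::string& html) {
    const std::string marker = "href=\"";
    std::vector<std::string> links;
    std::string::size_type pos = 0;
    while ((pos = html.find(marker, pos)) != std::string::npos) {
        pos += marker.size();                        // first character of the value
        const std::string::size_type end = html.find('"', pos);
        if (end == std::string::npos) break;         // unterminated attribute: stop
        links.push_back(html.substr(pos, end - pos));
        pos = end + 1;                               // continue after the closing quote
    }
    return links;
}
```

The returned list could then be written to a file per page, as suggested above, or kept in memory.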
    "Think not but that I know these things; or think
    I know them not: not therefore am I short
    Of knowing what I ought."
    -John Milton, Paradise Regained (1671)

    "Work hard and it might happen."
    -XSquared

  3. #3
    Registered User
    Join Date
    Sep 2004
    Posts
    15
    Yeah, we're studying recursion, so I guess that's what I have to do.

  4. #4
    Registered User
    Join Date
    Sep 2004
    Posts
    15
    I know how to open an individual file. But the teacher puts about 10 HTML files into one folder. So how do I open multiple files without even knowing the names of these HTML files? Is that even possible?

    ----------------------------------------------------------------------------------
    The input to this program is the name of a file that contains all files that your program should use to generate the cross-reference. For example, suppose the input file contains the following names.


    index.html
    faculty.html



    Your program will visit each of these files and, for each file, find all "anchor" tags and enumerate all file names that they reference. For example, suppose "faculty.html" contains:


    <html><head><title>Departmental Faculty</title></head>
    <body>
    <H3>Professor</H3>
    <a href="ledin.html">George Ledin</a>
    <a href="stauffer.html">Lynn Stauffer</a>

    ...

    </body>
    </html>


    Then, your program should provide the following lists:
    faculty.html refers to
    ledin.html
    stauffer.html

    faculty.html is referred to by
    index.html


    Note: I have assumed that "index.html" refers to "faculty.html" through an anchor tag.
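One way to get the second list is to invert the first: once every file's outgoing links are known, each link (A refers to B) also says that B is referred to by A. The sketch below assumes the outgoing links are already in a map. `LinkMap`, `invert`, and `report` are hypothetical names, and the output layout only approximates the sample above.

```cpp
#include <map>
#include <set>
#include <sstream>
#include <string>

// Hypothetical cross-reference type: file -> set of related files.
using LinkMap = std::map<std::string, std::set<std::string>>;

// Builds the reverse map: for each target, the files that link to it.
LinkMap invert(const LinkMap& refersTo) {
    LinkMap referredBy;
    for (const auto& entry : refersTo) {
        for (const auto& target : entry.second) {
            referredBy[target].insert(entry.first);
        }
    }
    return referredBy;
}

// Formats both lists for one file, one name per line.
std::string report(const std::string& file, const LinkMap& refersTo,
                   const LinkMap& referredBy) {
    std::ostringstream out;
    out << file << " refers to\n";
    LinkMap::const_iterator it = refersTo.find(file);
    if (it != refersTo.end()) {
        for (const auto& target : it->second) out << target << '\n';
    }
    out << '\n' << file << " is referred to by\n";
    it = referredBy.find(file);
    if (it != referredBy.end()) {
        for (const auto& source : it->second) out << source << '\n';
    }
    return out.str();
}
```

Because `std::set` keeps its elements sorted, the names in each list come out in alphabetical order, which may or may not match what the assignment expects.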
    Last edited by Mr. Acclude; 09-13-2005 at 08:32 PM.

  5. #5
    carry on JaWiB's Avatar
    Join Date
    Feb 2003
    Location
    Seattle, WA
    Posts
    1,972
    Someone correct me if I'm wrong, but I don't believe there is a standard way to find all the files in a folder (meaning you would have to write OS-specific code). You can look here for a solution for Windows.

  6. #6
    Registered User
    Join Date
    Sep 2004
    Posts
    15
    Actually, this is going to be written on Linux. Here is the project description: http://www.cs.sonoma.edu/~kooshesh/c...5project1.html

    If it were just one file, I would be able to open it and use getline to read a line at a time to find <a href=. However, we never learned about parsing. Here is also a sample class he showed us that we can use:

    http://www.cs.sonoma.edu/~kooshesh/c...pleClasses.tar
