How would one even begin to attempt this problem? Are there any places I can look for help getting started?
Given a number of web pages, you are to develop a tool that creates a cross-reference for the links in those pages. Specifically, for each web page, you will generate two lists:
1. The list of files that it references.
2. The list of files that reference it.
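One possible starting point, sketched in Python under a few assumptions: the pages are already loaded into a dict that maps each file name to its HTML text (a hypothetical input format), and only the href attributes of <a> tags count as links. The standard-library html.parser module extracts the links, which gives list 1 directly. Inverting that map gives list 2. The names LinkExtractor, extract_links and build_cross_reference are hypothetical, not part of the assignment.

```python
from collections import defaultdict
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attrs is a list of
        # (name, value) pairs, and value is None for a bare attribute.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    """Return the href targets of all <a> tags in one page, in order."""
    parser = LinkExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.links


def build_cross_reference(pages):
    """pages: hypothetical dict of file name -> HTML source.

    Returns (references, referenced_by):
      references[f]    -> sorted list of files/URLs that f links to (list 1)
      referenced_by[f] -> sorted list of files that link to f (list 2)
    """
    references = {}
    referenced_by = defaultdict(set)
    for name, html_text in pages.items():
        targets = set(extract_links(html_text))
        references[name] = sorted(targets)
        for target in targets:
            referenced_by[target].add(name)
    # Every page gets an entry in list 2, even if nothing links to it.
    for name in pages:
        referenced_by.setdefault(name, set())
    return references, {k: sorted(v) for k, v in referenced_by.items()}


if __name__ == "__main__":
    pages = {
        "index.html": '<a href="about.html">About</a> <a href="news.html">News</a>',
        "about.html": '<a href="index.html">Home</a>',
        "news.html": '<a href="about.html">About</a> <a href="old.html">Old</a>',
    }
    refs, back = build_cross_reference(pages)
    print(refs["index.html"])  # ['about.html', 'news.html']
    print(back["about.html"])  # ['index.html', 'news.html']
```

This sketch compares link strings literally, so "about.html", "./about.html" and "about.html#top" count as three different targets. Normalizing them first (for example with urllib.parse.urljoin and urllib.parse.urldefrag) would make the lists more accurate.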
One can use a cross-reference like this to:
-identify the files that are isolated; that is, no other file refers to them. (Note that "index.html" is often an isolated file, even though it is a very useful file!)
-for each file on the list, determine which files/URLs it references. This is useful for making sure that all of these files still exist.
-for each file on the list, determine all files/URLs that refer to a given file. This is useful when the webmaster decides to either eliminate a page or merge it into another page.
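Each of these three uses can be computed from the forward map alone. Below is a minimal sketch, assuming `references` is a dict from each file name to the list of files/URLs that file links to (i.e. list 1 for every page). The function names are hypothetical. External URLs (anything containing "://", plus mailto: links) and fragment-only links such as "#top" are skipped instead of being checked over the network. Local targets are assumed to be paths relative to `base_dir`.

```python
import os
from collections import defaultdict


def invert(references):
    """Build file -> set of files that link to it (list 2) from list 1."""
    referenced_by = defaultdict(set)
    for source, targets in references.items():
        for target in targets:
            referenced_by[target].add(source)
    return referenced_by


def isolated_files(references):
    """Files in the collection that no *other* file links to."""
    referenced_by = invert(references)
    # A page that links only to itself still counts as isolated.
    return sorted(f for f in references if not (referenced_by[f] - {f}))


def missing_targets(references, base_dir="."):
    """(source, target) pairs whose local target file does not exist."""
    missing = []
    for source, targets in references.items():
        for target in targets:
            local = target.split("#")[0]  # drop any #fragment
            if not local or "://" in target or target.startswith("mailto:"):
                continue  # fragment-only link or external URL: not checked
            if not os.path.exists(os.path.join(base_dir, local)):
                missing.append((source, target))
    return missing


def referrers(references, target):
    """All files linking to target: the pages to update before removing it."""
    return sorted(invert(references)[target])


if __name__ == "__main__":
    references = {
        "index.html": ["about.html", "news.html"],
        "about.html": ["index.html"],
        "news.html": ["about.html", "old.html", "http://example.com/"],
        "orphan.html": [],
    }
    print(isolated_files(references))           # ['orphan.html']
    print(referrers(references, "about.html"))  # ['index.html', 'news.html']
    print(missing_targets(references))
```

Building the reverse map once and querying it is cheap for a site-sized set of pages. For a very large set you would keep both maps around instead of calling invert on every query.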