Thread: Beginners Contest #2 For those who wanted more!!

  1. #1
    Banned
    Join Date
    Jun 2005
    Posts
    594

    Beginners Contest #2 For those who wanted more!!

    I'm only offering up one contest choice at a time this time around;
    I think it will make the competition more aggressive. If this is a big
    success I will continue to post competitions.

    Contest #1 will be judged on the following criteria:

    1. How well it conforms to C++ (this means I don't want to see
    character arrays where strings would be better used)
    2. Length of code (shorter being the best)
    3. Comments

    Object of this contest:

    Parse a file (HTML, though it doesn't need to be in HTML
    format). The filename should be provided by the
    user; command line is acceptable but not required.

    The user should be able to select which file types
    he wants extracted from the file for later use.

    Pretend this is a file you are given, which contains, in HTML
    format, many links to pictures, videos and music.

    Code:
    <a href="http://www.image.com/images/blah.jpg">words</a>
    <a href="http://www.image.com/images/blah.bmp">words</a>
    <a href="http://www.image.com/images/blah.gif">words</a>
    <a href="http://www.image.com/video/blah.wma">words</a>
    <a href="http://www.image.com/video/blah.mpg">words</a>
    <a href="http://www.image.com/video/blah.avi">words</a>
    <a href="http://www.image.com/video/blah.asf">words</a>
    <a href="http://www.image.com/music/blah.mp3">words</a>
    <a href="http://www.image.com/music/blah.mp3">words</a>
    <a href="http://www.image.com/music/blah.wma">words</a>
    Now obviously there will be more stuff in the HTML file than just
    links, and some of the links won't start with "a href="; some will
    be "img src=" along with a few others, so it may take some research
    if you're not familiar with HTML. YOUR job as the coder is to
    accept file extensions, from user input or from a config file, such as
    ".jpg", ".gif", ".zip", ".mp3", ".avi" and so on. Once you have
    collected the file extensions you must sort out the links in the
    file so that only the links with the extensions supplied by the user
    still exist. You can either overwrite the file with the links that
    were pulled or you can put them in a new file.

    Once you have completed a request by the user to save the links
    with the given extensions, for example (.mp3, .jpg), your
    output file, if the above was your input file, would look like this:

    Code:
    http://www.image.com/images/blah.jpg
    http://www.image.com/music/blah.mp3
    http://www.image.com/music/blah.mp3

    All of this is open to interpretation.
    That of course means, as long as you get the links the user wants,
    it's entirely up to you how to make it happen. That said, you must
    still stay within the bounds of the criteria to win. I know this next
    part may exclude some of you, but you are required to have below
    800 posts at time of submission to compete in this contest.

    Due Date
    August 9th (Yep, that's one week)
    Good luck to those who participate. As a side note,
    please message me or post in the C++ board any questions you
    have when you start coding this project, especially if you get
    discouraged from competing in it.

    P.S.
    Many of you might be wondering what the point of this program
    is. Well, for one, I come across many sites that have a lot of pictures or
    videos on them, but there are a thousand individual links and I have
    to click each one. Now I can just get all the links I want
    real fast, then use a mass downloader. I just wrote this program
    a few days ago myself, so I thought it was a good idea to pass
    it on.
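
    The filtering step in the contest description could be sketched roughly
    like this. It is only an illustration, not the organizer's program: the
    names has_extension and filter_links are made up for this example, and it
    assumes the links have already been pulled out of the HTML.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: true if url ends with ext (for example ".mp3").
bool has_extension(const std::string& url, const std::string& ext)
{
    return url.size() >= ext.size() &&
           url.compare(url.size() - ext.size(), ext.size(), ext) == 0;
}

// Hypothetical helper: keep only the links that end with one of the
// extensions the user asked for, preserving their original order.
std::vector<std::string> filter_links(const std::vector<std::string>& links,
                                      const std::vector<std::string>& exts)
{
    std::vector<std::string> kept;
    for (const std::string& link : links) {
        for (const std::string& ext : exts) {
            if (has_extension(link, ext)) {
                kept.push_back(link);
                break;  // one matching extension is enough for this link
            }
        }
    }
    return kept;
}
```

    Given the ten sample links above and the extensions .mp3 and .jpg,
    filter_links would keep the one .jpg link and the two .mp3 links, in
    that order, which matches the sample output file.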

  2. #2
    Registered User
    Join Date
    Mar 2003
    Posts
    134
    Could you please give a sample run of the program ?

  3. #3
    Banned
    Join Date
    Jun 2005
    Posts
    594
    Pretend this is the command prompt.
    (This is what it should look like at a minimum, that is,
    if you don't use a config file to get your file types.)

    Code:
    Please Enter a filename :
    Please Enter the file type you wanted saved :
    Completed!
    Your input should be a file containing HTML
    code, and your output should be a file
    holding the links to the specified file types.


    I'm going to attach the one I wrote. It does a little more
    than what I'm asking here and I'm still working on it.
    I can't post the code, for contest reasons of course,
    but I will include a readme and an example file.
    You will want to look in file.html
    after you run the program for the results.
    Also, mine asks for a little more information because
    it takes some stuff into account that I didn't ask for here,
    so when it asks you for the URL you can just type junk
    there; it won't affect the output in file.html for the input
    here. Enter file.html for the filename, of course.

    here is the link to the archive file

  4. #4
    Registered User major_small's Avatar
    Join Date
    May 2003
    Posts
    2,787
    so let me get this straight - the program asks the user what file they want (filename and extension), and it searches the HTML file given it on the command-line (or hard-coded) for it?

    a few more technical questions: do we have to account for single-quotes as well:
    Code:
    <A HREF='test.jpg'>bleh</A>
    and I'm guessing we have to worry about things like this as well:
    Code:
    <A HREF="test.jpg">test.jpg</A>
    also, what about the pathnames themselves... do we have to take into account that most webmasters use relative pathnames, or can we assume they hardcode the full path to each file?
    Code:
    <A HREF="http://www.mythingy.com/images/image5.jpg">jgieopa</A>
    as opposed to
    <A HREF="../images/image5.jpg">jvieopaj</A>
    one other thing: what about protocols? can we assume that it's all going to be done over HTTP, or do you want to account for FTP or anything else as well?

    /me wishes C++ had some standard regex syntax...
    Last edited by major_small; 08-02-2005 at 09:16 PM.
    Join us in our Unofficial Cprog IRC channel
    Server: irc.phoenixradio.org
    Channel: #Tech


    Team Cprog Folding@Home: Team #43476
    Download it Here
    Detailed Stats Here
    More Detailed Stats
    52 Members so far, are YOU a member?
    Current team score: 1223226 (ranked 374 of 45152)

    The CBoard team is doing better than 99.16% of the other teams
    Top 5 Members: Xterria(518175), pianorain(118517), Bennet(64957), JaWiB(55610), alphaoide(44374)

    Last Updated on: Wed, 30 Aug, 2006 @ 2:30 PM EDT

  5. #5
    Banned
    Join Date
    Jun 2005
    Posts
    594
    Quote Originally Posted by major_small
    so let me get this straight - the program asks the user what file they want (filename and extension), and it searches the HTML file given it on the command-line (or hard-coded) for it?

    a few more technical questions: do we have to account for single-quotes as well:
    Code:
    <A HREF='test.jpg'>bleh</A>
    and I'm guessing we have to worry about things like this as well:
    Code:
    <A HREF="test.jpg">test.jpg</A>
    also, what about the pathnames themselves... do we have to take into account that most webmasters use relative pathnames, or can we assume they hardcode the full path to each file?
    Code:
    <A HREF="http://www.mythingy.com/images/image5.jpg">jgieopa</A>
    yes

    Quote Originally Posted by major_small
    as opposed to <A HREF="../images/image5.jpg">jvieopaj</A>

    one other thing: what about protocols? can we assume that it's all going to be done over HTTP, or do you want to account for FTP or anything else as well?

    /me wishes C++ had some standard regex syntax...

    I was thinking about that; my program accounts for that, but since
    this is for beginners and I didn't know how many people had experience
    with HTML, I didn't want a lot of rules to turn people off. So no,
    my test file will assume the entire link is there. The test file
    will not contain any links to ftp, irc, aim or any other
    protocols that you can link to, only http, and also of course
    the types that display images and whatnot, such as

    a href, img src, embed src, and the variations of those, which you should
    be extracting links from.

    Does that answer your question or do you have more?

    Btw, I thank you for your questions; they will help other people
    be clearer on the goal. However, you do realize you
    can't compete in this one.
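
    For readers coming to this thread later: standard C++ has had <regex>
    since C++11, so pulling the values out of a href, img src and embed src
    attributes, with either kind of quote, could be sketched as below. This
    is an illustration only, not anyone's contest entry. The function name
    extract_links is made up, and the pattern is deliberately loose: it
    accepts any attribute whose name ends in href or src, and it does not
    handle unquoted values.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect the values of href= and src= attributes,
// quoted with either " or ', matching the attribute names in any case.
std::vector<std::string> extract_links(const std::string& html)
{
    // Group 1 is the opening quote, group 2 the link, \1 the same quote again.
    static const std::regex attr(R"re((?:href|src)\s*=\s*(["'])(.*?)\1)re",
                                 std::regex::ECMAScript | std::regex::icase);

    std::vector<std::string> links;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(html.begin(), html.end(), attr); it != end; ++it)
        links.push_back((*it)[2].str());
    return links;
}
```

    The back-reference means a link opened with ' must be closed with ',
    so a stray " inside a single-quoted link does not cut it short.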

  6. #6
    Registered User major_small's Avatar
    Join Date
    May 2003
    Posts
    2,787
    Quote Originally Posted by ILoveVectors
    Btw, I thank you for your questions; they will help other people
    be clearer on the goal. However, you do realize you
    can't compete in this one.
    hah... party pooper

    I'll just post my solution after the contest closes just for kicks

  7. #7
    Banned
    Join Date
    Jun 2005
    Posts
    594
    Well, you know what, submit your solution to me anyway.
    I usually like your work a lot, so if there aren't many posted
    for this contest I'll put you in the running.

  8. #8
    Registered User
    Join Date
    Mar 2003
    Posts
    134
    in this step :

    Please Enter the file type you wanted saved :


    the user can specify more than one file type right ? and can we ask the user to enter the file types in a particular manner say separated by space or by a comma ?

  9. #9
    Banned
    Join Date
    Jun 2005
    Posts
    594
    Yes, you can interpret it freely. You can do it any way you want as
    long as the end result is the same, meaning the correct links
    are in the file; that's the only important part.
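
    One way to accept several file types on a single line, separated by
    spaces or commas as asked above, is sketched here. The function name
    parse_extensions is made up for this example, and it assumes the
    extensions themselves never contain a comma or a space.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a line such as ".mp3, .jpg .avi" into
// the separate extensions ".mp3", ".jpg" and ".avi".
std::vector<std::string> parse_extensions(std::string input)
{
    // Turn commas into spaces so one extraction loop handles both separators.
    std::replace(input.begin(), input.end(), ',', ' ');

    std::istringstream in(input);
    std::vector<std::string> exts;
    std::string ext;
    while (in >> ext)  // operator>> skips any run of whitespace
        exts.push_back(ext);
    return exts;
}
```

    Reading the whole line with getline first and then splitting it means
    the user does not have to send end-of-file to finish the list.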

  10. #10
    Registered User major_small's Avatar
    Join Date
    May 2003
    Posts
    2,787
    well, since this contest is over, I'm posting the code I came up with (even though I'm not in the running)
    Code:
    #include <iostream>
    #include <fstream>
    #include <string>
    
    int main(int argc,char*argv[])
    {
    	std::string filename;	// HTML source file to search
    	std::string target;	// file name the user is looking for
    	std::string line;
    	
    	if(argc>2)
    	{
    		std::cout<<"\nUsage:\n\tGetIt\n\tGetIt <source>\n";
    		return 0;
    	}
    	else if(argc==2)
    	{
    		filename=argv[1];
    	}
    	else
    	{
    		filename="default.html";
    	}
    
    	std::ifstream infile(filename.c_str());
    
    	std::cout<<"Enter the Filename (including extension): ";
    	std::getline(std::cin,target);
    
    	while(std::getline(infile,line))
    	{
    		// find() returns npos, not -1 in an int, when there is no match
    		std::string::size_type index=line.find(target);
    		if(index!=std::string::npos)
    		{
    			// keep what follows the last quote before the match...
    			line=line.substr(line.find_last_of("\"'",index)+1);
    			// ...and cut it off at the next quote, which closes the link
    			line=line.substr(0,line.find_first_of("\"'"));
    			std::cout<<line<<std::endl;
    			break;	// only the first matching link is reported
    		}
    	}
    
    	infile.close();
    	return 0;
    }
    and my (better) test input file:
    Code:
    <a href="http://www.image.com/images/blah.jpg">words</a>
    <a href="http://www.image.com/images/blah.bmp">blah.bmp</a>
    <a href="http://www.image.com/images/blah.gif">words</a>
    <a href='http://www.image.com/video/blah.wma'>blah.wma</a>
    
    <a href="video/blah.avi">blah.avi</a>
    
    <a href="http://www.image.com/video/blah.asf">words</a><a href="http://www.image.com/music/blah.mp3">words</a>
    
    and here's our <a href="http://www.image.com/music/blah.mp3">new mp3</a> for your listening enjoyment
    
    <a href="http://www.image.com/music/blah.wma"><img src="http://www.image.com/images/blah.jpg"></A>
    
    <a href="ftp://ftp.image.com/music/blah.cpp"></a>
    
    <embed src="music.ogg" width="20%" height="5%"></embed>
    Last edited by major_small; 08-09-2005 at 05:26 PM. Reason: syntax highlighter tripped up by escape characters >.<

  11. #11
    Weak. dra's Avatar
    Join Date
    Apr 2005
    Posts
    166
    Mine.

    Code:
    #include <iostream>
    #include <string>
    #include <algorithm>
    #include <vector>
    #include <fstream>
    
    using namespace std;
    
    
    struct url {
               string address;
               string extension;
           };
    
    
    //predicate for the find_if function in url_end
    //returns true if the char is either a single or double quote, false otherwise
    bool quotes ( char c ){
    
    		      return ( c == '\'' || c == '\"' );
    }
    //function to find the end of the url, simply looks for the single or double quote
    string::iterator url_end ( string::iterator a, string::iterator b ){  
                               //take two iterators which delimit a string
                               //a will already be at the position of h 
                               
                               //iterate through the string until "quotes" returns true
                               //i will be at the position of a " or '
                               //(or equal to b if no quote is found)
                               string::iterator i = find_if( a, b, quotes );
                               return i;
                     }
    //looks for the beginning of a url, just past <a href=" or <a href='
    string::iterator url_beg ( string::iterator a, string::iterator b){
                               //take two iterators which delimit a string

                               string link = "<a href="; //present in any link

                               //search for "link" string to make sure we found a link and not <img src> or anything else
                               string::iterator i = search( a, b, link.begin(), link.end() );
                              
                               //make sure there is room for the 8 characters of "link" plus the opening quote
                               if ( i != b && b - i >= 9 ){
                                    //i will be at the < in <a href=", return 9 places past it
                                    return i + 9;
                               }

                               else return b;
                     }
    //looks for the file extension in the url
    string extensions ( string url ){
                        //accepts a string (the url)
                        
                        //search backwards through url for the last .  as in music.wav
                        //(rfind avoids dereferencing url.end() or running past the start)
                        string::size_type dot = url.rfind( '.' );
                        
                        //no . at all means there is no extension
                        if ( dot == string::npos ) return "";
                        
                        //create a string running from the . to the end of the url
                        return url.substr( dot );
           }  
    //find the urls within the string
    vector<url> find_urls ( string& link ){
                            
                            typedef string::iterator iter;
                                   
                            iter a = link.begin(), b = link.end();
                            //this will hold all of the urls
                            vector<url> urls;
                            
                            url add;
                            //continue until a reaches the end of the string
                            while ( a != b ){
                                    //set a to the beginning
                                    a = url_beg ( a, b );
                                    //if a doesn't equal the end of the string, a link was found
                                    if ( a != b ){
                                         //create an iterator to delimit the end of the url 
                                         iter c = url_end ( a, b );
                                         //create the string
                                         string d = string ( a, c );
                                         
                                         add.address = d;
                                         add.extension = extensions ( d );
                                         urls.push_back( add );
    
                                         //set a to equal c so we can look through the rest of the string
                                         a = c;
    
                                     }
                             }
                             
                             return urls;
                  }
    
    int main(){
    
             string link, in, in_file; 
             cout << "Enter path of the file: ";
             getline( cin, in_file );
             cout << endl << "Results will output to a file called results.txt" << endl;
             ifstream file( in_file.c_str() );  // for testing
             ofstream out( "results.txt" );
    
             while ( getline( file, in ) ){
                     //a string is read into in, and it is added to link to make one big string
                     link = link + in;
    
             }
             //find urls in the link string, put them into a vector
             vector<url> urls = find_urls ( link );
             
             if ( urls.empty() ){ 
                  cout << "No urls found.";
                  cin.get();
                  return 0;
             }
    
             else{
                  cout << "Please enter the extensions you wish to keep ( ex .mp3 ):" << endl;
                  
                  string ext;
                  vector<string> extensions;
                  //input extensions you want to keep
                  while ( cin >> ext ){
                          extensions.push_back( ext );
                  }
                  
                  vector<url>::iterator i = urls.begin();
                  //run through each url
                  while ( i != urls.end() ){
                  
                          vector<string>::iterator j = extensions.begin();
                          //check to see if the extension of the current url matches any of
                          //the ones typed by the user
                          while ( j != extensions.end() ){
                                  //if we found a match
                                  if ( *j == (*i).extension ){ 
                                       out << (*i).address << endl;
                                       //no need to walk through the rest
                                       j = extensions.end();
                                  }
                                  
                                  else{
                                      //otherwise, check the other extension the user typed
                                      j++;
                                  }
                          }
                          //check next url
                          i++;
                  }
             }
             return 0;
        }
    Nowhere near as short. haha.
    Last edited by dra; 08-09-2005 at 06:20 PM.

  12. #12
    Banned
    Join Date
    Jun 2005
    Posts
    594
    I am always sad when I have to block major_small from
    a contest, because he's just a good contestant and always has
    lovely code. I didn't test your code, major, but by looking at
    it I don't think it does what it's supposed to.

    Needless to say, dra was the only competitor, so it goes
    without saying that you're the winner.

    Thank you for competing. Btw, overall I thought your code
    was lovely and was almost exactly what I was wanting to see.

    In a day or so, when I finish some changes I had planned for my
    code, I will post it to show you what inspired this competition.

    Btw, I hope you two will compete in one of the other competitions
    I just posted; they're welcome to all levels.

  13. #13
    Registered User major_small's Avatar
    Join Date
    May 2003
    Posts
    2,787
    you should test it... it does exactly what it's supposed to (unless I read it wrong), except in one case, which I could have easily fixed, but didn't because I wasn't in the competition

    jshao@MCP ~/Programming/C++/HTMLFind $ ./GetIt test.html
    Enter the Filename (including extension): blah.jpg
    http://www.image.com/images/blah.jpg

    jshao@MCP ~/Programming/C++/HTMLFind $ ./GetIt test.html
    Enter the Filename (including extension): blah.bmp
    http://www.image.com/images/blah.bmp

    jshao@MCP ~/Programming/C++/HTMLFind $ ./GetIt test.html
    Enter the Filename (including extension): blah.gif
    http://www.image.com/images/blah.gif

    jshao@MCP ~/Programming/C++/HTMLFind $ ./GetIt test.html
    Enter the Filename (including extension): blah.wma
    http://www.image.com/video/blah.wma

    jshao@MCP ~/Programming/C++/HTMLFind $ ./GetIt test.html
    Enter the Filename (including extension): blah.avi
    video/blah.avi

    jshao@MCP ~/Programming/C++/HTMLFind $ ./GetIt test.html
    Enter the Filename (including extension): blah.asf
    http://www.image.com/video/blah.asf

    jshao@MCP ~/Programming/C++/HTMLFind $ ./GetIt test.html
    Enter the Filename (including extension): blah.mp3
    http://www.image.com/music/blah.mp3

    jshao@MCP ~/Programming/C++/HTMLFind $ ./GetIt test.html
    Enter the Filename (including extension): blah.cpp
    ftp://ftp.image.com/music/blah.cpp
    Last edited by major_small; 08-10-2005 at 12:06 AM.

  14. #14
    Banned
    Join Date
    Jun 2005
    Posts
    594
    You read it a little bit wrong, I think. You should be finding all of a
    specific file type, not so much a specific filename.type.

    A user should enter .jpg
    and the output file should contain all links to .jpg files,
    regardless of whether it's a href or img src.

    So if the file type asked for was .jpg, given the following

    Code:
    <a href="http://www.image.com/images/blah1.jpg">words</a>
    <a href="http://www.image.com/images/blah2.jpg">words</a>
    <a href="http://www.image.com/images/blah3.jpg">words</a>
    <a href="http://www.image.com/images/blah4.jpg">words</a>
    <a href="http://www.image.com/images/blah5.jpg">words</a>
    <a href="http://www.image.com/images/blah6.jpg">words</a>
    <a href="http://www.image.com/images/blah7.jpg">words</a>
    <a href="http://www.image.com/images/blah8.jpg">words</a>
    <a href="http://www.image.com/images/blah9.jpg">words</a>
    <a href="http://www.image.com/images/blah10.jpg">words</a>
    <a href="http://www.image.com/images/blah1.avi">words</a>
    <a href="http://www.image.com/images/blah2.avi">words</a>
    <a href="http://www.image.com/images/blah3.avi">words</a>
    <a href="http://www.image.com/images/blah4.avi">words</a>
    <a href="http://www.image.com/images/blah5.avi">words</a>
    <a href="http://www.image.com/images/blah6.avi">words</a>
    <a href="http://www.image.com/images/blah7.avi">words</a>

    the following would be returned with the choice of .jpg as extension

    Code:
    <a href="http://www.image.com/images/blah1.jpg">words</a>
    <a href="http://www.image.com/images/blah2.jpg">words</a>
    <a href="http://www.image.com/images/blah3.jpg">words</a>
    <a href="http://www.image.com/images/blah4.jpg">words</a>
    <a href="http://www.image.com/images/blah5.jpg">words</a>
    <a href="http://www.image.com/images/blah6.jpg">words</a>
    <a href="http://www.image.com/images/blah7.jpg">words</a>
    <a href="http://www.image.com/images/blah8.jpg">words</a>
    <a href="http://www.image.com/images/blah9.jpg">words</a>
    <a href="http://www.image.com/images/blah10.jpg">words</a>
    Last edited by ILoveVectors; 08-10-2005 at 12:11 AM.

  15. #15
    Registered User major_small's Avatar
    Join Date
    May 2003
    Posts
    2,787
    oh I see... you were looking for file extensions, not individual files... meh.. it would have been the same amount of code
