alternate to an array???

This is a discussion on alternate to an array??? within the C++ Programming forums, part of the General Programming Boards category.

  1. #16
    Registered User
    Join Date
    Apr 2006
    Posts
    2,021
    A quick Google search yields 1 million words as a max estimate for the number of words in the English language, including archaic and scientific words. But half of those are too technical and domain-specific and remain unchronicled in the Oxford dictionary. Multiply that by about 10 for all the forms of a word (I'm not sure about this figure). Then multiply that by about 5.1 for the average word length. That's still only about 51 million bytes. Not even a gig.
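    The estimate can be worked through as a quick sketch. The function name is hypothetical, and the inputs are just the ballpark figures quoted in this post, so the result is only an order-of-magnitude check.

```cpp
// Back-of-the-envelope size of a word list, in bytes.
// The three inputs are the ballpark figures quoted in the post,
// not measured values; separators between words are ignored.
double estimateWordListBytes(double base_words,
                             double forms_per_word,
                             double average_word_length)
{
    return base_words * forms_per_word * average_word_length;
}
```

    With the figures above, estimateWordListBytes(1e6, 10, 5.1) comes to roughly 51 million bytes, which is comfortably under a gigabyte.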
    It is too clear and so it is hard to see.
    A dunce once searched for fire with a lighted lantern.
    Had he known what fire was,
    He could have cooked his rice much sooner.

  2. #17
    Registered Abuser Loic's Avatar
    Join Date
    Mar 2007
    Location
    Sydney
    Posts
    115
    Thanks everyone for all your help... I think I should be able to do it now...

    And for those of you who are interested in how I have wordlists that are so large... I am getting them from other computer security enthusiasts... http://forums.remote-exploit.org/sho...light=wordlist <-- there is a thread on wordlists

  3. #18
    Registered Abuser Loic's Avatar
    Join Date
    Mar 2007
    Location
    Sydney
    Posts
    115
    OK, so I had a go at getting this to work, and in theory what I have should merge 2 sorted lists and filter out the doubles.... but for some reason it isn't working and I can't find what's wrong... Please help...
    Code:
    #include <cstdlib>
    #include <iostream>
    #include <fstream>
    #include <string>
    
    using namespace std;
    
    int main(int argc, char *argv[])
    {
        int comp = 0;
        string wordA, wordB;
        ifstream wordlistA, wordlistB;
        ofstream output;
        
        wordlistA.open (argv[1]);
        wordlistB.open (argv[2]);
        output.open (argv[3]);
        
        if (wordlistA.is_open()&&wordlistB.is_open())
        {
            for (int i=0;;i++)
            {
                if (!wordlistA.eof()&&comp==-1||comp==0) {
                   getline (wordlistA, wordA);
                   /* debug */ cout << "in - " << wordA << endl;
                } else {
                   do {
                      getline (wordlistB, wordB);
                      output << wordB << "\n";
                   }while (!wordlistB.eof());
                   break;
                }
                if (!wordlistB.eof()&&comp==1||comp==0) {
                   getline (wordlistB, wordB);
                   /* debug */ cout << "in - " << wordB << endl;
                } else {
                   do {
                      getline (wordlistA, wordA);
                      output << wordA << "\n";
                   }while (!wordlistA.eof());
                   break;
                }
                comp = wordA.compare (wordB);
                /* debug */ cout << "comp - " << comp << endl;
                switch (comp) {
                       case -1:
                            output << wordA << "\n";
                            /* debug */ cout << "out - " << wordA << endl;
                       break;
                       case 1:
                            output << wordB << "\n";
                            /* debug */ cout << "out - " << wordB << endl;
                       break;
                }
    
           }
        }
        return EXIT_SUCCESS;
    }
    Edit:
    Also, the two text files I am using for testing are worda.txt and wordb.txt, which are just the alphabet split up into 2 files.
    Last edited by Loic; 07-07-2008 at 12:21 AM.

  4. #19
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    21,446
    Here is an example that assumes that you can sort each file in memory, but cannot merge them in memory. Instead of demonstrating with files, I have chosen to demonstrate with a generous helping of stringstreams.
    Code:
    #include <iostream>
    #include <string>
    #include <sstream>
    #include <vector>
    #include <algorithm>
    #include <iterator>
    
    void sortWordList(std::istream& word_list, std::ostream& sorted_word_list);
    
    int main()
    {
        std::stringstream files[5];
        files[0] << "a d e f i j k m r s t y z";
        files[1] << "b c g h l n o p q u v w x";
        sortWordList(files[0], files[2]);
        sortWordList(files[1], files[3]);
    
        std::set_union(
            std::istream_iterator<std::string>(files[2]),
            std::istream_iterator<std::string>(),
            std::istream_iterator<std::string>(files[3]),
            std::istream_iterator<std::string>(),
            std::ostream_iterator<std::string>(files[4], " "));
    
        std::cout << files[4].str() << std::endl;
    }
    
    void sortWordList(std::istream& word_list, std::ostream& sorted_word_list)
    {
        // Read in the words into memory.
        std::vector<std::string> words;
        std::copy(
            std::istream_iterator<std::string>(word_list),
            std::istream_iterator<std::string>(),
            std::back_inserter(words));
    
        // Sort the list, remove duplicates and store to the output stream.
        std::sort(words.begin(), words.end());
        std::copy(words.begin(), std::unique(words.begin(), words.end()),
            std::ostream_iterator<std::string>(sorted_word_list, " "));
    }
    If you cannot sort each individual word list in place, then the solution is to move half of the word list into another file, then use the process outlined above, without using sortWordList() since std::set_union would suffice.

    A caveat though: this is based on how I think std::set_union should behave with respect to stream iterators (i.e., it does not keep everything in memory), and I could be wrong.
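    The splitting step could be sketched roughly as follows. Rather than exactly halving the list, this version cuts the input into batches of at most max_words words, sorts each batch in memory, and writes it to its own file. The function name, the file naming scheme, and the max_words parameter are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Reads whitespace-separated words from `in` in batches of at most
// max_words, and writes each batch, sorted and de-duplicated, to its own
// file named <prefix><index>.txt (a hypothetical naming scheme).
// Returns the number of files written.
std::size_t writeSortedRuns(std::istream& in, const std::string& prefix,
                            std::size_t max_words)
{
    std::size_t runs = 0;
    for (;;) {
        // Fill one batch that is small enough to sort in memory.
        std::vector<std::string> batch;
        std::string word;
        while (batch.size() < max_words && in >> word)
            batch.push_back(word);
        if (batch.empty())
            break;

        std::sort(batch.begin(), batch.end());
        batch.erase(std::unique(batch.begin(), batch.end()), batch.end());

        // Build the file name and store the batch, one word per line.
        std::ostringstream name;
        name << prefix << runs << ".txt";
        std::ofstream out(name.str().c_str());
        std::copy(batch.begin(), batch.end(),
                  std::ostream_iterator<std::string>(out, "\n"));
        ++runs;
    }
    return runs;
}
```

    Each resulting file is sorted, so pairs of them could then be merged with std::set_union in the same way as files[2] and files[3] in the example.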
    C + C++ Compiler: MinGW port of GCC
    Version Control System: Bazaar

    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  5. #20
    Registered Abuser Loic's Avatar
    Join Date
    Mar 2007
    Location
    Sydney
    Posts
    115
    Sorry to be asking so many questions on this... but I am a little confused about how I would get that to accept a file, i.e. using ifstream & ofstream...

    I think I understand how what you posted works... but with your files array you are reading from and writing to them... whereas from what I understand, with ifstream & ofstream one writes to a file and the other reads a file...

  6. #21
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    21,446
    but with your files array you are reading from and writing to them... whereas from what I understand, with ifstream & ofstream one writes to a file and the other reads a file...
    You can use std::fstream to both read from and write to a file. You could also use an ofstream to write to a file, then open an ifstream on the same file to read it back. The latter may well be simpler.
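    As a rough sketch of the second approach: one function writes a sorted list with an ofstream, and another later opens ifstreams on such files and merges them with std::set_union. The function names and the one-word-per-line output format are assumptions for illustration, and both input files are assumed to be sorted and free of duplicates.

```cpp
#include <algorithm>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Sort and de-duplicate a word list held in memory, then store it with an
// ofstream. The ofstream closes when it goes out of scope, after which the
// same file can be opened for reading with an ifstream.
bool writeSortedWords(std::vector<std::string> words, const std::string& path)
{
    std::sort(words.begin(), words.end());
    words.erase(std::unique(words.begin(), words.end()), words.end());

    std::ofstream out(path.c_str());
    std::copy(words.begin(), words.end(),
              std::ostream_iterator<std::string>(out, "\n"));
    out.flush();
    return !out.fail();
}

// Merge two sorted, duplicate-free word files into a third file using
// stream iterators, so the words are read and written as the merge proceeds.
bool unionWordFiles(const std::string& sortedA, const std::string& sortedB,
                    const std::string& merged)
{
    std::ifstream inA(sortedA.c_str());
    std::ifstream inB(sortedB.c_str());
    std::ofstream out(merged.c_str());
    if (!inA || !inB || !out)
        return false;

    std::set_union(
        std::istream_iterator<std::string>(inA),
        std::istream_iterator<std::string>(),
        std::istream_iterator<std::string>(inB),
        std::istream_iterator<std::string>(),
        std::ostream_iterator<std::string>(out, "\n"));
    out.flush();
    return !out.fail();
}
```

    Calling writeSortedWords once per word list and then unionWordFiles on the two resulting files mirrors what the stringstream example does with files[2], files[3] and files[4].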

  7. #22
    Registered Abuser Loic's Avatar
    Join Date
    Mar 2007
    Location
    Sydney
    Posts
    115
    After all that, I find out that Linux can do what I want to do already...

    If anyone wanted to know....
    Code:
    ## This combines files 1, 2, and 3 into the big file.
    bt~#cat file1.txt file2.txt file3.txt > bigfile.txt
    
    ## This alphabetizes the list and removes the duplicates.
    bt~#cat bigfile.txt | sort | uniq  > newbigfile.txt
    Just thought I would share that with everyone...

    Thanks anyway
