Help needed with text analysis


  1. #1
    Registered User
    Join Date
    Oct 2011
    Location
    Lithuania
    Posts
    4

    Help needed with text analysis

    Here is the code I have so far:

    Code:
    #include <string>
    #include <fstream>
    #include <iomanip>
    #include <iostream>
    
    using namespace std;
    //----------------------------------------------------------------------------
    const char Cdat[] = "Data.txt";
    const char Crez[] = "Rezult.txt";
    const char Canalyzis[] = "Analyzis.txt";
    
    //----------------------------------------------------------------------------
    void TrimmText(const char dfv[], const char rfv[], const char afv[]);
    void AnalyzeLine(string &lin, string & TwoNumeral, int & TwoStart,
                          unsigned int & TwoSum);
    void EditLine(string &lin, int TwoStart, int TwoSum);
    
    int main()
    {
        TrimmText(Cdat, Crez, Canalyzis);
    }
    //----------------------------------------------------------------------------
    // Reads the text line by line
    // dfv - original text data file name, rfv - edited text data file name
    // afv - analysis data file name
    void TrimmText(const char dfv[], const char rfv[], const char afv[])
    {
        string TwoNumeral;
        int TwoStart;
        unsigned int TwoSum;
        ifstream fd(dfv);
        ofstream fr(rfv);
        ofstream fa(afv);
        string L;
        fa << "---------------------------\n";
        fa << "| Word | Start |  Sum |\n";
        fa << "---------------------------\n";
        while(!fd.eof()) 
        {
            getline (fd, L);
            {
                AnalyzeLine(L, TwoNumeral, TwoStart, TwoSum);
                fr << L << endl;
                if (TwoSum <= 9 && TwoSum != 0) 
                {
                    fa << "| " << left << setw(5) << TwoNumeral << " | "
                        << right << setw(7) << TwoStart;
                    fa << " | " << setw(5) << TwoSum << " |\n";
                }
            }
        }
        fa << "---------------------------\n" << endl;
        fd.close();
        fr.close();
        fa.close();
    }
    //----------------------------------------------------------------------------
    // Finds words made of two numerals and computes the sum of those numerals
    // lin - line in which the search is made
    // TwoNumeral - returns the word made of two numerals
    // TwoStart - returns its start position
    // TwoSum - returns the sum of its numerals
    void AnalyzeLine(string &lin, string & TwoNumeral, int & TwoStart, unsigned int & TwoSum)
    {
        string Skirt = " .,!?:;()\t";
        string Numbers = "0123456789";
        string Word;
        int zpr = 0, zpb = 0;
        int skaiciukai[2];
        char skirstymas[2];
        TwoNumeral = "";
        TwoStart = 0; TwoSum = 0;
        while ((zpr = lin.find_first_not_of(Skirt, zpb)) != string::npos) 
        {
            zpb = lin.find_first_of(Skirt, zpr);
            Word = lin.substr(zpr, zpb - zpr);
            if(Word.length()==2 && !Word.find_first_of(Numbers,0))
            {
                TwoNumeral = Word;
                TwoStart = zpr;
                skirstymas[0]=Word[0];
                skirstymas[1]=Word[1];
                char sk1=skirstymas[0];
                char sk2=skirstymas[1];
                skaiciukai[0]=atoi(&sk1);
                skaiciukai[1]=atoi(&sk2);
                TwoSum = skaiciukai[0]+skaiciukai[1];
                if (TwoSum <= 9 && TwoSum != 0)
                {
                    EditLine(lin, TwoStart, TwoSum);
                }
            }
        }
    }
    //----------------------------------------------------------------------------
    // Deletes a word made of two numerals whose sum is less than or equal to 9
    // lin - line being changed
    // TwoStart - start of the word made of two numerals
    
    void EditLine(string & lin, int TwoStart, int TwoSum)
    {
        string Numbers = "0123456789";
        if (Numbers.find_first_of(lin[TwoStart]) != string::npos) 
            {
                lin.erase(TwoStart, 2);
            }
    }
    I need it to delete words that are made of two numerals whose sum is less than or equal to 9 (basically, if the word is 18, 15, or 71 it gets deleted; if it is 91 it does not, because 9 + 1 is more than 9). I think I'm pretty much done, but there's a problem: if more than one word in a single line meets my criteria, the program writes only the last word from that line to Analyzis.txt, while I need it to write every one of them. Any ideas?

    Here's a data file example:
    Code:
    Ne vieno antikos 18 autorių veikaluose 17 esama likę 567 nupasakojimų,
     kaip kvapiosios substancijos keliavo 1900 mylių aplink Viduržemio jūrą.
     
            ****  14  12  13 99 125 ***
     
      Iš šių 105 įvairių aprašymų matyti, jog Egiptas, ypač faraonų laikais,
      
      
       05 labai daug *** 76 kvapiųjų substancijų (kai kurių nūdienos  10 archeologų
    
     skaičiavimais gal net  **** 80 %) 01 atsigabendavo iš tolimų ir ne tokių  tolimų kraštų: iš Arabijos, iš Viduržemio jūros 
     55 salų – Kipro, Chijo ir Kretos, iš Nilo aukštupio. 
     Ypač 333 kvepalų importas 22 Egipte suklestėjo XIII a. pr. m. e, kai šalį valdė Ramzis Didysis.
    Results file:
    Code:
    Ne vieno antikos  autorių veikaluose  esama likę 567 nupasakojimų,
     kaip kvapiosios substancijos keliavo 1900 mylių aplink Viduržemio jūrą.
     
            ****       99 125 ***
     
      Iš šių 105 įvairių aprašymų matyti, jog Egiptas, ypač faraonų laikais,
      
      
        labai daug *** 76 kvapiųjų substancijų (kai kurių nūdienos   archeologų
    
     skaičiavimais gal net  ****  %)  atsigabendavo iš tolimų ir ne tokių  tolimų kraštų: iš Arabijos, iš Viduržemio jūros 
     55 salų – Kipro, Chijo ir Kretos, iš Nilo aukštupio. 
     Ypač 333 kvepalų importas  Egipte suklestėjo XIII a. pr. m. e, kai šalį valdė Ramzis Didysis.
    Analysis file:
    Code:
    ---------------------------
    | Word | Start |  Sum |
    ---------------------------
    | 17    |      37 |     8 |
    | 25    |      24 |     7 |
    | 10    |      64 |     1 |
    | 01    |      33 |     1 |
    | 22    |      27 |     4 |
    ---------------------------
    Also, if I put 01 in front of 125 in the data file, I get **** 99 1 *** in the results file, which shouldn't happen.

  2. #2
    Salem (and the hat of wrongness)
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    32,672
    Code:
                skirstymas[0]=Word[0];
                skirstymas[1]=Word[1];
                char sk1=skirstymas[0];
                char sk2=skirstymas[1];
                skaiciukai[0]=atoi(&sk1);
                skaiciukai[1]=atoi(&sk2);
    Well atoi() is going to break on those char pointers, because they're NOT pointing at strings with a \0 at the end.
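
    A minimal sketch of two ways around this, assuming the word has already been checked to be two characters long. The helper name digit_sum is hypothetical and not part of the posted program. The first digit goes through atoi() as a real, \0-terminated string; the second skips atoi() and subtracts '0' instead.

```cpp
#include <cctype>
#include <cstdlib>
#include <string>

// Hypothetical helper (not from the thread): sum of the two digits of a
// two-character word, or -1 if the word is not exactly two digits.
int digit_sum(const std::string& word)
{
    if (word.size() != 2 ||
        !std::isdigit(static_cast<unsigned char>(word[0])) ||
        !std::isdigit(static_cast<unsigned char>(word[1])))
        return -1;

    // Option 1: hand atoi() a real C string, i.e. one digit plus '\0'.
    const char first[2] = { word[0], '\0' };
    const int a = std::atoi(first);

    // Option 2: skip atoi(); digit characters '0'..'9' are contiguous.
    const int b = word[1] - '0';

    return a + b;
}
```

    In the posted AnalyzeLine, either option could stand in for the atoi(&sk1) and atoi(&sk2) calls.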

    Code:
        while(!fd.eof())
        {
            getline (fd, L);
    See the FAQ on why using feof() to control a loop is bad.
    Then use
    while ( getline (fd, L) )
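
    A short sketch of that loop shape, written against a generic std::istream so it can be tried without a file. The function name read_lines is an assumption for illustration only. Because the stream is tested after each read, a failed read at end of file does not trigger an extra pass.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collects every line from any input stream.
// The loop body runs only when getline() actually read a line.
std::vector<std::string> read_lines(std::istream& in)
{
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line))
        lines.push_back(line);
    return lines;
}
```

    An std::ifstream can be passed in the same way as the std::istringstream used for testing, since both derive from std::istream.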

  3. #3
    Registered User
    Join Date
    Oct 2011
    Location
    Lithuania
    Posts
    4
    All right, then what should I change so the program writes every word it deleted to the analysis file? Right now I'm getting only the last one from each line. Also, is there an alternative to atoi?

  4. #4
    Salem (and the hat of wrongness)
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    32,672
    Well atoi() might be OK, if you wrote your code properly so it was always referring to a string with a \0 at the end.

  5. #5
    Elysia (C++まいる!Cをこわせ!)
    Join Date
    Oct 2007
    Posts
    22,788
    I am unsure as to why you feel you have to assign individual characters:

    skirstymas[0]=Word[0];
    skirstymas[1]=Word[1];

    ...and not whole strings.
    As for atoi, you can use atoi(Word.c_str()) or, even better, string streams:

    std::stringstream str(Word);
    str >> myint;
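
    A self-contained sketch of the string stream approach with a failure check added. The name to_int is hypothetical. Note that converting the whole word this way yields the number itself (18 for "18"), not the sum of its digits, so the digits would still need to be separated for the original task.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: parses text as an int via a string stream.
// Returns false when the text does not start with a number.
bool to_int(const std::string& text, int& value)
{
    std::istringstream stream(text);
    return !(stream >> value).fail();
}
```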

    Or even better, boost's lexical_cast:

    myint = boost::lexical_cast<int>(mystr);
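
    On the original question of reporting every deleted word: the posted AnalyzeLine keeps a single TwoNumeral/TwoStart/TwoSum per line, so each new match overwrites the previous one, and it keeps scanning from an index computed before the erase, which is probably why "01 125" turns into "1". One possible restructuring, sketched under assumptions (the Match struct and remove_small_pairs are hypothetical names, and Start is reported as the position in the unedited line), collects all matches in a vector and builds the edited line separately:

```cpp
#include <string>
#include <vector>

// Hypothetical record for one deleted word (names are illustrative).
struct Match {
    std::string word;              // the two-digit word, e.g. "18"
    std::string::size_type start;  // its index in the original line
    int sum;                       // sum of its two digits
};

// Removes every two-digit word whose digit sum is 1..9 from `line` and
// returns all removed words. The original text is only read; the edited
// text is built separately, so no erase shifts indices mid-scan.
std::vector<Match> remove_small_pairs(std::string& line)
{
    const std::string separators = " .,!?:;()\t";
    const std::string digits = "0123456789";
    const std::string original = line;
    std::vector<Match> matches;
    std::string edited;

    std::string::size_type copied = 0;  // original[0, copied) is handled
    std::string::size_type begin = 0, end = 0;
    while ((begin = original.find_first_not_of(separators, end)) != std::string::npos) {
        end = original.find_first_of(separators, begin);
        if (end == std::string::npos)
            end = original.size();
        const std::string word = original.substr(begin, end - begin);

        // Both characters must be digits, not just the first one.
        if (word.size() == 2 && word.find_first_not_of(digits) == std::string::npos) {
            const int sum = (word[0] - '0') + (word[1] - '0');
            if (sum >= 1 && sum <= 9) {
                const Match m = { word, begin, sum };
                matches.push_back(m);
                edited.append(original, copied, begin - copied);  // text before the word
                copied = end;                                     // skip the word itself
            }
        }
    }
    edited.append(original, copied, std::string::npos);  // remaining tail
    line = edited;
    return matches;
}
```

    The caller could then loop over the returned vector and write one table row per Match to the analysis file, instead of one row per line.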
