Thread: Here is my Sentence Analyzer Program!

  1. #1
    Registered User
    Join Date
    Apr 2011
    Posts
    308

    Here is my Sentence Analyzer Program!

    The explanation for my program:

    int split_by_sentence(void)
    Receives input from the user: one sentence or several sentences, each of which must end with a period.
    Writes the result to readtext.txt.
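As a rough illustration of the sentence-splitting step, here is a minimal sketch that puts each period-terminated sentence on its own line. It works on an in-memory buffer rather than on readtext.txt, and split_sentences is a hypothetical name, not a function from corgi.c.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: copy `text` into `out`, ending each sentence
   (a run of text terminated by '.') with a newline so that every
   sentence sits on its own line. Blanks after a period are dropped.
   Returns the number of sentences, or -1 if `out` is too small. */
int split_sentences(const char *text, char *out, size_t outsz)
{
    size_t j = 0;
    int sentences = 0;
    int at_line_start = 1;

    if (outsz == 0)
        return -1;
    for (size_t i = 0; text[i] != '\0'; i++) {
        unsigned char c = (unsigned char)text[i];
        if (at_line_start && isspace(c))
            continue;                  /* skip blanks before a sentence */
        if (j + 2 >= outsz)            /* keep room for c, '\n' and '\0' */
            return -1;
        out[j++] = (char)c;
        at_line_start = 0;
        if (c == '.') {                /* end of sentence: start a new line */
            out[j++] = '\n';
            sentences++;
            at_line_start = 1;
        }
    }
    out[j] = '\0';
    return sentences;
}
```

Text after the last period stays on the final line without a newline, which matches the rule that every sentence must end with a period.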

    int main ()
    Reads the readtext.txt file, sends a sentence/line at a time to the redefined_sentence function.

    void redefined_sentence (char* sentence)
    Receives the sentence and splits it into a word per line and writes the result to list.txt.
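The word-per-line split can be sketched with strtok(). This is a minimal sketch assuming words are separated by spaces, tabs or newlines; words_per_line is a hypothetical name, and it fills a buffer instead of writing list.txt.

```c
#include <string.h>

/* Hypothetical helper: write each word of `sentence` on its own line
   of `out`. strtok() modifies the string it scans, so the sentence is
   copied into a local buffer first (1024 chars is an assumed maximum).
   Returns the number of words, or -1 if something does not fit. */
int words_per_line(const char *sentence, char *out, size_t outsz)
{
    char buf[1024];
    size_t used = 0;
    int count = 0;

    if (outsz == 0 || strlen(sentence) >= sizeof buf)
        return -1;
    strcpy(buf, sentence);
    out[0] = '\0';

    for (char *w = strtok(buf, " \t\n"); w != NULL; w = strtok(NULL, " \t\n")) {
        size_t len = strlen(w);
        if (used + len + 2 > outsz)    /* word + '\n' + '\0' */
            return -1;
        memcpy(out + used, w, len);
        out[used + len] = '\n';
        used += len + 1;
        out[used] = '\0';
        count++;
    }
    return count;
}
```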

    This word-per-line file is opened and read one line/word at a time in a while loop, until the whole file has been read.
    Each word is broken down into single chars, and each char is compared against a char array of consonants to see if the letter is a consonant.

    If a letter is a consonant, it is written to intermediate.txt, so only the consonant version of the word is written to that file.
    When this happens, an integer flag is changed from zero to one.

    In a different part of the while loop, it checks whether the integer has changed; if it has, the whole word is written to consonant_words.txt.
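Here is one way the consonant test and the zero/one flag could look, using a single strchr() lookup instead of comparing each char against a consonant array by hand. The helper names are hypothetical, and counting 'y' (and every other non-vowel letter) as a consonant is an assumption.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Assumption: a consonant is any letter other than a, e, i, o, u. */
static int is_consonant(char c)
{
    unsigned char u = (unsigned char)c;
    return isalpha(u) && strchr("aeiou", tolower(u)) == NULL;
}

/* Hypothetical helper: write only the consonants of `word` into `out`,
   lower-cased, like the consonant version kept in intermediate.txt.
   Returns 1 (the flag) if at least one consonant was found, else 0. */
int consonant_skeleton(const char *word, char *out, size_t outsz)
{
    size_t j = 0;
    for (size_t i = 0; word[i] != '\0' && j + 1 < outsz; i++)
        if (is_consonant(word[i]))
            out[j++] = (char)tolower((unsigned char)word[i]);
    if (outsz > 0)
        out[j] = '\0';
    return j > 0;
}
```

The return value plays the role of the integer that changes from zero to one, so the caller can decide whether to write the whole word to consonant_words.txt.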

    Now intermediate.txt and consonant_words.txt are closed and the while loop is finished.

    Now intermediate.txt is opened again and the last word of the file is passed into the variable 'last'.
    Then intermediate.txt is closed.
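Reopening intermediate.txt just to find its last line could be avoided by taking the last word straight from the sentence (its consonant-only form can then be built the same way as for the other words). A minimal sketch, assuming words are separated by spaces or newlines and the sentence ends with a period; last_word is a hypothetical helper.

```c
#include <string.h>

/* True for characters that may trail the last word in this sketch. */
static int is_separator(char c)
{
    return c == ' ' || c == '\n' || c == '.';
}

/* Hypothetical helper: copy the last word of `sentence` into `out`,
   ignoring trailing blanks and the final period.
   Returns 1 on success, 0 if there is no word or `out` is too small. */
int last_word(const char *sentence, char *out, size_t outsz)
{
    size_t end = strlen(sentence);
    while (end > 0 && is_separator(sentence[end - 1]))
        end--;                         /* skip trailing blanks and '.' */

    size_t start = end;
    while (start > 0 && sentence[start - 1] != ' ' && sentence[start - 1] != '\n')
        start--;                       /* walk back to the previous blank */

    size_t len = end - start;
    if (len == 0 || len + 1 > outsz)
        return 0;
    memcpy(out, sentence + start, len);
    out[len] = '\0';
    return 1;
}
```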

    Then intermediate.txt and consonant_words.txt are opened again.
    In a new while loop, if the letters in 'last' are the same as the word in intermediate.txt, the whole-word version from consonant_words.txt is written to comparison.txt and last_word.txt.
    Now comparison.txt has words from the sentence that share consonant letters with the last word of the sentence.
    comparison.txt is needed in the percentage calculation later on.
    Now intermediate.txt, consonant_words.txt, comparison.txt, and last_word.txt are closed and the while loop is finished.
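The later five-step summary says a word is kept when it has at least one consonant in common with the last word. A minimal sketch of that test, assuming a case-insensitive comparison and treating every non-vowel letter as a consonant; shares_consonant is a hypothetical name.

```c
#include <ctype.h>
#include <string.h>

/* Assumption: a consonant is any letter other than a, e, i, o, u. */
static int is_consonant(char c)
{
    unsigned char u = (unsigned char)c;
    return isalpha(u) && strchr("aeiou", tolower(u)) == NULL;
}

/* Hypothetical helper: 1 if `word` has at least one consonant that
   also occurs in `last` (ignoring case), otherwise 0. */
int shares_consonant(const char *word, const char *last)
{
    for (size_t i = 0; word[i] != '\0'; i++) {
        if (!is_consonant(word[i]))
            continue;
        int c = tolower((unsigned char)word[i]);
        for (size_t k = 0; last[k] != '\0'; k++)
            if (tolower((unsigned char)last[k]) == c)
                return 1;
    }
    return 0;
}
```

With the first example sentence, this keeps "console" and "Shop" (they share c and h with "Channel") and drops "features" and "Wii", which agrees with the printed paraphrase.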

    Now comparison.txt is opened again and the matching words are printed on the screen.
    This is not the whole sentence, only the last word and the words with matching consonants: a paraphrased version, meant to get to the truth of the sentence.
    Now comparison.txt is closed.

    Now last_word.txt is opened again, the last word is put into the sentence "The last word is:".
    Now last_word.txt is closed.

    Now comparison.txt is opened again, and its contents are passed, along with the variable 'message', to the percentage_calculation function.

    void percentage_calculation(char *a_pch, char *message)
    Now the sentence from comparison.txt is broken down into one word per line, and each word is processed individually in a while loop.
    The loop places each word into one of four categories, with 'B' being consonant and 'A' being vowel: ABA, AB, BA, BAB.
    Once every word has been given one of these four values, the math is done to see what percentage of the words in the sentence falls into each type.
    Then, using if statements, I check which of the four values is the largest, and the result is stored in the 'message' variable and sent back to the redefined_sentence function.
    The percentages are also printed on the screen.
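The percentage step is plain division by the number of words. The fractions printed in the first example (0.3, 0.5, 0.1, 0.1 over the 10 words of "the current Nintendo console already an called the Shop Channel") are consistent with category counts of 3, 5, 1 and 1. A minimal sketch, assuming the four counts have already been collected; category_fractions is a hypothetical name.

```c
/* Hypothetical helper: turn the number of words in each of the four
   categories into fractions of the total, as printed under
   "Analysis part one". All fractions stay 0 if there are no words. */
void category_fractions(const int counts[4], float frac[4])
{
    int total = 0;
    for (int i = 0; i < 4; i++)
        total += counts[i];
    for (int i = 0; i < 4; i++)
        frac[i] = total > 0 ? (float)counts[i] / (float)total : 0.0f;
}
```

Printing each value with printf("%f\n", frac[i]) gives six decimal places (0.300000), the same format as the example output.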

    Now back in the redefined_sentence function, it puts the 'message' and 'last_string' variable into the word_generator function.

    void word_generator(char* C_F_two, char *message)
    C_F_two is the 'last_string' variable.

    Using if statements, I check whether the C_F_two word has a vowel and a letter value; if it does, the vowel, letter, 'C_F_two' and 'message' variables are passed to the search function.

    void search(char *src, char *a, char *b, char *c, char *message)

    Opens readfile.txt, which holds the dictionary, and uses a while loop to go through the dictionary one word at a time.
    I copy the first character and the second-to-last character of the dictionary word into their own variables.
    Then, using a for loop, I break the dictionary word into individual characters/letters.
    Then, in a single condition, I check whether the vowel is one of the letters from the for loop and whether the letter is either the first or the second-to-last character of the dictionary word.
    So all these steps are pre-qualifiers.
    If a word qualifies, a counter is incremented by one; the counter is never reset, so its value keeps increasing with every qualified word.
    Then I choose qualified words based on their counter number and save them to a variable to be displayed later.
    The way the number qualifies a word is like ducks flying through the air during migration: the leader is neither the first nor the last word, so I wait for a few results and then pick one.
    Yes, I used ducks to model my code. Don't laugh; I got the idea after watching a MythBusters episode on TV.
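The qualification test and the "wait for a few results, then pick one" idea can be sketched as follows. This is one reading of the description, not the code in corgi.c: the word must contain the vowel, the letter must be its first or second-to-last character, and skip (a hypothetical parameter) is how many qualified words to let pass before picking one.

```c
#include <stddef.h>
#include <string.h>

/* One reading of the rule: the word contains `vowel`, and `letter` is
   its first or its second-to-last character. */
static int qualifies(const char *word, char vowel, char letter)
{
    size_t len = strlen(word);
    if (len == 0 || vowel == '\0' || strchr(word, vowel) == NULL)
        return 0;
    return word[0] == letter || (len >= 2 && word[len - 2] == letter);
}

/* Go through the dictionary in order, counting qualified words, and
   return the one reached after `skip` qualified words have gone by.
   `skip` is a hypothetical parameter; returns NULL if too few qualify. */
const char *pick_qualified(const char *words[], size_t n,
                           char vowel, char letter, int skip)
{
    int qualified = 0;
    for (size_t i = 0; i < n; i++) {
        if (!qualifies(words[i], vowel, letter))
            continue;
        if (qualified == skip)
            return words[i];
        qualified++;              /* never reset within one search */
    }
    return NULL;
}
```

Keeping the dictionary in an array (or reading it once) means the same function can be called for each vowel/letter pair instead of copying the search loop four times.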

    Then I print the four results to the screen, check the value of the 'message' variable, and print the matching result with:
    printf("The result using Analysis part ones data is: %s\n", d_d);
    where d_d is one of the qualified words.

    Then I close the program. So there are a lot of steps, but they aren't rocket science, I think. The code was written by a novice C programmer (me), so it's not ultra-advanced sci-fi code.

    Here is a short version of the description:

    int split_by_sentence(void) takes articles and breaks them down into a sentence per line so articles can be fed into the program.
    void redefined_sentence (char* sentence) creates a hidden sentence from a sentence, revealing the hidden truth the writer is thinking as he writes,
    then uses it to compute percentages, which are then fed into the search function to print the matching qualified word.
    void word_generator(char* C_F_two, char *message) identifies the vowel/letter combination of the word and calls the search function.
    void search(char *src, char *a, char *b, char *c, char *message) qualifies words, then picks the word after x qualified results to make sure it is not the first or last result.
    Using the word chosen from the percentage function's result, it prints that word below the four matching words.

    I wrote this because I thought you might be confused if you only read the code without me telling you what happened.

    There are only 1931 lines.
    I wish I knew enough C to make it smaller, but I don't.

    Besides corgi.c, you need these text files:

    bad.txt
    comparison.txt
    consonant_words.txt
    intermediate.txt
    last_word.txt
    list.txt
    m_and_s.txt
    readtext.txt
    readtext1.txt
    writelist.txt
    readfile.txt

    readfile.txt needs a dictionary from SourceForge's 'Kevin's Word List'.
    I use the Official 12Dicts Package: I copy and paste the contents of '2 of 12.txt' into readfile.txt.

    Here are some example sentences I put into the program:

    Code:
    Input a sentence. Press Enter when done.
    To some degree, the current Nintendo Wii console already features an "app store"
     called the Wii Shop Channel. Console owners can browse, purchase, and download
    bite-sized games called WiiWare in addition to old classics (ROMS) from the Super
    Nintendo, Sega Genesis and TurboGrafix 16 days.
    
    the current Nintendo console already an called the Shop Channel
    The last word is: Channel
    _______________________________________________________________
    
    Analysis part one:
    __________________
    
    
    letter vowel letter 0.300000
    letter vowel vowel 0.500000
    vowel vowel letter 0.100000
    vowel vowel vowel 0.100000
    
    Analysis part two:
    __________________
    
    Using the last word of the sentence: Channel,
    these are four possible word associations:
    
    half-dollar
    
    haggle
    
    dilettantish
    
    commonwealth
    
    The result using Analysis part ones data is: half-dollar
    
    _______________________________________________________________
    
    Console owners browse purchase and download bitesized games called addition old
    classics Nintendo Genesis and days
    The last word is: days
    _______________________________________________________________
    
    Analysis part one:
    __________________
    
    
    letter vowel letter 0.500000
    letter vowel vowel 0.187500
    vowel vowel letter 0.250000
    vowel vowel vowel 0.062500
    
    Analysis part two:
    __________________
    
    Using the last word of the sentence: days,
    these are four possible word associations:
    
    bachelorhood
    
    annoyed
    
    adulthood
    
    add
    
    The result using Analysis part ones data is: adulthood
    
    _______________________________________________________________
    
    Press any key to continue . . .
    To open walrus.c, remove the txt extension, then unzip it.

    walrus.zip.txt

    Thanks to everyone who helped me. I started this in late October or early November and have worked on it every day since; now it's done.

    I know 1931 lines is a lot, but most of it is the copy-and-paste code Salem warned me about; I forgot how to turn those copied sections into smaller code. But it works, and that's what matters to me.


  2. #2
    Registered User
    Join Date
    Dec 2011
    Posts
    795
    > it works and that's what matters to me.
    This isn't good enough. Eventually you'll start learning the mechanics of why copy-paste leads to an efficiency drain on both you and the computer. If that doesn't motivate you (it should), consider that tighter code is much easier to edit.

    You have a ton of dependencies, which generally isn't a good idea, especially if one of them is from sourceforge. You should never rely on an Internet host for data, as there are so many ways for it to fail.

    Also, I'm scanning through your code, and the wasted memory and duplicated code are atrocious. Seriously, you have no excuse to malloc() nine char arrays, each of them 4096 bytes long, in just one function alone, especially when only a couple are being freed. There are unused and unnecessary variables littered throughout the program.

    It's usually not the best idea, but I would recommend rewriting the entire project while heeding these guidelines:
    - If you copy/paste a segment of code more than a couple times consecutively, it usually warrants a loop.
    - Don't allocate without freeing, and don't allocate too much space (no word is 4096 characters long).
    - Don't use overly large types when not necessary (no need to use ints for values of 0 and 1).
    - If you have code with tons of if() statements that all do the exact same thing, use the (&&) and (||) operators in your if() to shorten the code.
    - Choose better variable names that make sense.
    - Don't ever waste functions, especially if they don't work:

    Code:
    char find_letter (char* a, char* b) 
    { 
        char string = strlen(b); 
        strncpy (a,b,1); 
        a[1]='\0'; 
        return 0; 
    }
    could be:
    Code:
    char find_letter(char *a)
    {
        if (a) 
            return a[0];
        else
            return 0;
    }
    /* Or, you could just do a[0] in your code. */
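For the allocation guideline, here is a minimal sketch of sizing a buffer to the word instead of using a fixed 4096-byte block, paired with the matching free(). copy_word is a hypothetical name; POSIX strdup() does the same job where it is available.

```c
#include <stdlib.h>
#include <string.h>

/* Allocate exactly enough room for one word (its length plus the
   terminating '\0') instead of a fixed 4096-byte buffer.
   The caller owns the result and must free() it. Returns NULL on failure. */
char *copy_word(const char *word)
{
    size_t len = strlen(word);
    char *copy = malloc(len + 1);
    if (copy != NULL)
        memcpy(copy, word, len + 1);   /* copies the '\0' as well */
    return copy;
}
```

Every call is paired with one free(), for example: char *w = copy_word("corgi"); /* use w */ free(w);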

  3. #3
    Registered User
    Join Date
    Apr 2011
    Posts
    308
    I've updated my C program and lowered the number of lines in the source code from 1931 to about 1000.
    Also, some useless variables were removed, so some code cleaning was done too.
    I also split some of the code into another function, so the functions are a bit easier to read and understand, I hope.

    Just remove the txt extension then unzip it.
    corgi.zip.txt

    Thanks for believing I could reduce the code from 1900 to 1000 lines. I tried, and I did it.

  4. #4
    Registered User
    Join Date
    Dec 2011
    Posts
    795
    It's a lot better, but your if statements are messed up.

    The code:
    Code:
    else if(percentage_4 > percentage_1 && percentage_2 && percentage_3)
    Translates to:
    Code:
    else if( (percentage_4 > percentage_1) && (percentage_2 != 0) && (percentage_3 != 0))
    You've still got some repetition: around line 585, "if s==0" is repeated many times; you should nest those statements. Another thing, relatively minor: you should get rid of the global variable "red", as it's unused. I haven't checked, but make sure all of your variables are used.
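To make the difference concrete, here are both versions of the condition as small functions (the names are hypothetical). With percentages of 0.1, 0.5, 0.3 and 0.2, the original expression is true even though percentage_2 is larger than percentage_4; comparing percentage_4 against each of the other three fixes that.

```c
/* The condition from the post: it only tests p4 > p1, then checks
   that p2 and p3 are non-zero. */
int buggy_fourth_is_largest(float p1, float p2, float p3, float p4)
{
    return p4 > p1 && p2 && p3;
}

/* What was probably intended: p4 is larger than each of the others. */
int fourth_is_largest(float p1, float p2, float p3, float p4)
{
    return p4 > p1 && p4 > p2 && p4 > p3;
}
```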

  5. #5
    Registered User
    Join Date
    Apr 2011
    Posts
    308
    I have further updated my code. I have tweaked some if conditions and nested them to reduce the number of lines in the source.
    The global variable "red" is actually used in line 613.

    Here's the new source:
    corgi.zip.txt

    Thanks for the help.
    Last edited by jeremy duncan; 01-01-2012 at 04:09 PM.

  6. #6
    oogabooga
    Join Date
    Jan 2008
    Posts
    2,808
    Good job on cutting the code down. A couple of ideas.

    First, if the global variable red is only used in one function, then declare it in that function.

    Stuff like this:
    Code:
        float one;
        float two;
        float three;
        float four;
        float total;
        float five;
        float six;
        float seven;
        float eight;
    could be
    Code:
    float numbers[8];
    Indices are 0 to 7, so one is 0, two is 1, etc.
    Then you can do things like this
    Code:
    for (i = 0; i < 8; i++)
            numbers[i] = 0;
    Same idea with the percentages:
    Code:
    float perc[4];
    for (i = 0; i < 4; i++)
        perc[i] = 0.0;
    If all greater_percentage_calculation is doing is appending "one", "two", "three", or "four" to message depending on whether the 1st, 2nd, 3rd or 4th percentage is highest then couldn't it be done like this?
    (This code assumes the percentages are never negative. I've also switched to using an array, as mentioned above.)
    Code:
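    #include <string.h>

    /* Append "one".."four" to message, naming the largest of perc[0..3]. */
    void greatest(float perc[4], char *message)
    {
        const char *nums[] = {"one", "two", "three", "four"};
        int n = 0;                      /* index of the largest so far */
        if (perc[1] > perc[n]) n = 1;
        if (perc[2] > perc[n]) n = 2;
        if (perc[3] > perc[n]) n = 3;
        strcat(message, nums[n]);       /* message must have room for the text */
    }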

  7. #7
    Registered User
    Join Date
    Nov 2011
    Location
    Buea, Cameroon
    Posts
    197
    It looks like your program is a little complicated. Could you just summarize the whole program? Your intro is too long.

  8. #8
    Registered User
    Join Date
    Apr 2011
    Posts
    308
    I've updated the code again, reducing it from about 1000 lines to around 700 lines:

    corgi.zip.txt

    And I've updated the output; here is an example:

    Code:
    Input a sentence. Press Enter when done.
    
    Police say a North Carolina man insisted his million-dollar note was real when
    he was buying $476 worth of items at a Walmart.
    
    
    __________________
    __________________
    
    An input sentence is converted to lower case:
    __________________
    
    police say a north carolina man insisted his million-dollar note was real when
    he was buying $476 worth of items at a walmart.
    
    
    __________________
    
    The secret message in the sentence:
    __________________
    
    police north carolina man insisted milliondollar note was real when was worth
    items at walmart
    
    
    The last word is: walmart
    __________________
    
    Analysis part one:
    __________________
    
    
    letter vowel letter 0.533333
    letter vowel vowel 0.266667
    vowel vowel letter 0.200000
    vowel vowel vowel 0.000000
    
    Analysis part two:
    __________________
    
    Using the last word of the sentence: walmart,
    these are four possible word associations:
    
    call
    
    attorney-general
    
    anti-intellectual
    
    annual
    
    The result using Analysis part ones data is: anti-intellectual
    
    _______________________________________________________________
    _______________________________________________________________
    
    Press any key to continue . . .
    To the person asking for an explanation:

    1.) A sentence is input.
    2.) It is converted to lower case.
    3.) It is searched for words that have at least one consonant in common with the last word.
    4.) These matching words are formed into a shortened version of the sentence, often revealing hidden messages in the sentence.
    5.) Using the matching words, I do a percentage calculation, which gives me the best of four words matching how the sentence wants the reader to perceive it, so the intention of the sentence is reduced to a single word.
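Steps 2 and 3 also drop punctuation: in the example, "million-dollar" appears as "milliondollar", "walmart." as "walmart", and "$476" disappears. A minimal sketch that reproduces that behavior by lower-casing a word and keeping only letters; it is a guess at the effect, not the program's actual code, and normalize_word is a hypothetical name.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: lower-case `word` in place and drop every
   character that is not a letter. A word with no letters becomes "". */
void normalize_word(char *word)
{
    size_t j = 0;
    for (size_t i = 0; word[i] != '\0'; i++) {
        unsigned char c = (unsigned char)word[i];
        if (isalpha(c))
            word[j++] = (char)tolower(c);
    }
    word[j] = '\0';
}
```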

    Thanks to the person who suggested the math. That code didn't work for me (I think I did it wrong), but I modified it and now it works, so the if conditions that took many pages of code now fit on one page.

    Last edited by jeremy duncan; 01-02-2012 at 12:33 PM. Reason: typo
