Thread: need algorithm help, sorting groups of text by comparing words

  1. #1
    Registered User
    Join Date
    Apr 2011
    Posts
    308

    need algorithm help, sorting groups of text by comparing words

    here is the plan i wrote up so far, no coding done yet;

    updates still to do:

    - the words are in the article's lines.
    for each line i write the line number first, then the words that are on that line

    1 lion and mouse

    2 lion and mouse

    3 and mouse

    4 lion and mouse

    5 lion and

    6 lion and mouse

    7 lion mouse

    - where adjacent lines share the same words, erase the outer line (line 1 here, since it matches line 2)

    2 lion and mouse

    3 and mouse

    4 lion and mouse

    5 lion and

    6 lion and mouse

    7 lion mouse

    - in groups of two, group the outer to inner lines,
    the top and bottom line are the first group,
    the second from the top and the second from the bottom are the second group,
    etc.

    2 lion and mouse
    7 lion mouse

    3 and mouse
    6 lion and mouse

    4 lion and mouse
    5 lion and

    - the words on a line are equal or not equal: words that appear together on a line are equal, and a word missing from that line is not equal.
    "and" = "lion" in line 5.
    "and" = "mouse" in line 3.
    "lion" = "mouse" in line 7.

    2 lion = and = mouse
    7 lion = mouse != and

    3 and = mouse != lion
    6 lion = and = mouse

    4 lion = and = mouse
    5 lion = and != mouse

    ///////////////
    7 lion mouse = group a
    ///////////////
    3 and mouse = group b
    ///////////////
    5 lion and = group b
    ///////////////

    ____________________

    my question is: how should i implement the last step in C#?

    this step;

    Code:
    - the words on a line are equal or not equal: words that appear together on a line are equal, and a word missing from that line is not equal.
    "and" = "lion" in line 5.
    "and" = "mouse" in line 3.
    "lion" = "mouse" in line 7.
    
    2 lion = and = mouse
    7 lion = mouse != and
    
    3 and = mouse != lion
    6 lion = and = mouse
    
    4 lion = and = mouse
    5 lion = and != mouse
    
    ///////////////
    7 lion mouse = group a
    ///////////////
    3 and mouse = group b
    ///////////////
    5 lion and = group b
    ///////////////
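
    One possible way to sketch the "equal or not equal" step in C#, assuming each line is a line number followed by space-separated words and that the full word list (lion, and, mouse) is already known. The names `LineWords`, `Parse` and `Missing` are made up for this illustration and do not come from the thread.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    static class LineWords
    {
        // Splits "7 lion mouse" into its line number and the set of words after it.
        public static (string Number, HashSet<string> Words) Parse(string line)
        {
            string[] parts = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            return (parts[0], new HashSet<string>(parts.Skip(1)));
        }

        // Returns the vocabulary words that do not appear on the line, in vocabulary order.
        // An empty result means every word on the line is "equal".
        public static List<string> Missing(IEnumerable<string> vocabulary, HashSet<string> words)
        {
            return vocabulary.Where(w => !words.Contains(w)).ToList();
        }
    }
    ```

    For "7 lion mouse" and the vocabulary lion, and, mouse, `Missing` returns only "and", which corresponds to "7 lion = mouse != and" in the plan.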

  2. #2
    Registered User
    Join Date
    Apr 2011
    Posts
    308
    this is the algorithm i drafted for the last two updates in my previous list;

    2 lion and mouse
    3 and mouse
    4 lion and mouse
    5 lion and
    6 lion and mouse
    7 lion mouse

    step 1;
    - a group of 2 or more units, that share one or more words, but not all words
    3 and mouse
    5 lion and

    - a unit that holds a part of two or more groups but not all the words, named "special unit"
    7 lion mouse

    step 2;
    - group a unit that has the words in the "special unit", with a group that has the words in the "special unit", listing lower line numbers first

    you have a group of units and a "special unit" = 3 units * 2 = 6 units.

    the "special unit" is made from the group of units that is 4 units, four units for the group and 2 units for the "special unit" = 6 units.
    so the two units of the first group have 4 units in between them (lines 3 to 6).

    2 lion and mouse
    7 lion mouse

    - group a unit that has the words in the "special unit", with a group that is missing a word in the "special unit", listing lower line numbers first

    there is a group of units missing a word from the "special unit". the second group is compared to the first group.
    so this finds there is a mismatch in the second group, that is corrected by the third group.
    so the second group has the third group in-between its two units.

    3 and mouse
    6 lion and mouse

    the third group is the only group left,
    so if this is an odd number group, you have the extra unit be either a part of the unit that has no missing words,
    or a part of the unit that has missing words.

    4 lion and mouse
    5 lion and

    step 3:
    you can visualize step 1 and step 2 as 6 boxes, in two rows of three boxes.
    the bottom row is the units that have no missing words,
    the top row has the units missing a word on the outside and the unit not missing any words on the inside, so it looks like a triangle.

    [attached image: untitled.jpg]

    any ideas on a software algorithm to do this?
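
    The outer-to-inner pairing from step 2 could be sketched roughly like this, assuming the lines are already in line-number order. `OuterInnerPairs` and `Pair` are hypothetical names, and the sketch only pairs lines by position; it does not try to detect the "special unit".

    ```csharp
    using System;
    using System.Collections.Generic;

    static class OuterInnerPairs
    {
        // Pairs the first line with the last, the second with the second-to-last, and so on.
        // With an odd number of lines the middle line is left unpaired.
        public static List<(string Top, string Bottom)> Pair(IReadOnlyList<string> lines)
        {
            var pairs = new List<(string Top, string Bottom)>();
            for (int top = 0, bottom = lines.Count - 1; top < bottom; top++, bottom--)
            {
                pairs.Add((lines[top], lines[bottom]));
            }
            return pairs;
        }
    }
    ```

    Passing the six lines numbered 2 to 7 gives the pairs (2, 7), (3, 6) and (4, 5), the same three groups listed in step 2.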

  3. #3
    Registered User
    Join Date
    Apr 2011
    Posts
    308
    i wrote some code that does what i wanted, so i'll post it as the answer to my question;

    here's the code;

    Code:
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Text.RegularExpressions;
    using System.Threading.Tasks;
    
    namespace project
    {
        class part_4
        {
            public void group_lines(string[] forward_line_numbers, string[] reverse_line_numbers, int new_how_many_lines)
            {
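                // Reverse the second copy so index i walks up from the bottom line
                // while forward_line_numbers[i] walks down from the top line.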
                Array.Reverse(reverse_line_numbers);
    
                for (int i = 0; i < new_how_many_lines / 2; ++i)
                {
                    Console.WriteLine("//////////////////");
                    Console.WriteLine(forward_line_numbers[i]);
                    Console.WriteLine(reverse_line_numbers[i]);
                }
            }
    
            public static int CountWords(string test)
            {
                int count = 0;
                bool inWord = false;
    
                foreach (char t in test)
                {
                    if (char.IsWhiteSpace(t))
                    {
                        inWord = false;
                    }
                    else
                    {
                        if (!inWord) count++;
                        inWord = true;
                    }
                }
                return count;
            }
    
            public void compare(string[] new_shortened_list, string[] forward_line_numbers, string missing_words, string missing_lines, int i)
            {
                int test = 0;
                string[] myarray = missing_words.Split(' ');
                foreach (string temp in myarray)
                {
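                    // Split leaves an empty entry after the trailing space; skip it.
                    if (string.IsNullOrWhiteSpace(temp)) continue;
    
                    // Match whole words only, so "and" does not also match "band".
                    bool result = Regex.IsMatch(new_shortened_list[i], @"\b" + Regex.Escape(temp) + @"\b");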
    
                    if (result == false)
                    {
                        test = 1;
                    }
                }
                if (test == 0)
                {
                    Console.WriteLine(missing_lines);
                    Console.WriteLine(forward_line_numbers[i]);
                }
            }
            public void difference(string[] new_shortened_list,
                string[] working_array,
                string[] forward_line_numbers,
                int new_how_many_lines,
                int how_many_repetitions)
            {
                string missing_words = "";
                string missing_lines = "";
                int counter = 0;
                int word_counter = 0;
                StringBuilder s = new StringBuilder();
                StringBuilder t = new StringBuilder();
    
                for (int i = 0; i < new_how_many_lines; ++i)
                {
                    for (int j = 0; j < how_many_repetitions; ++j)
                    {
                        // Match the word as a whole word, so "and" does not also match "band".
                        string word_pattern = @"\b" + Regex.Escape(working_array[j]) + @"\b";
                        bool result_1 = Regex.IsMatch(new_shortened_list[i], word_pattern); // is the word on this line?
                        if (result_1 == false) // the word is missing from the line
                        {
                            bool result_2 = Regex.IsMatch(missing_words, word_pattern); // is the word already in the missing word list?
                            // record it only if it is not listed yet and the list holds fewer than (total words - 1) entries
                            if ((result_2 == false) && (counter < (how_many_repetitions - 1)))
                            {
                                s.Append(working_array[j]);
                                s.Append(" ");
    
                                t.Append(forward_line_numbers[i]);
                                t.Append(" ");
    
                                counter++;
                            }
                        }
                    }
                    missing_words = s.ToString();
                    missing_lines = t.ToString();
    
                    word_counter = CountWords(new_shortened_list[i]);
                    if (word_counter < how_many_repetitions)
                    {
                        compare(new_shortened_list, forward_line_numbers, missing_words, missing_lines, i);
                    }
                }
            }
    
            public void split_sentence(string[] new_shortened_list, int new_how_many_lines)
            {
                for (int i = 0; i < new_how_many_lines; ++i)
                {
                    if (new_shortened_list[i].Length > 0)
                    {
                        int j = new_shortened_list[i].IndexOf(" ") + 1;
                        new_shortened_list[i] = new_shortened_list[i].Substring(j);
                    }
                }
            }
    
            public void get_first_number(string[] new_shortened_list,
                string[] forward_line_numbers,
                string[] reverse_line_numbers,
                int new_how_many_lines)
            {
                for (int i = 0; i < new_how_many_lines; ++i)
                {
                    forward_line_numbers[i] = new_shortened_list[i].Split(new char[] { ' ' })[0];
                    reverse_line_numbers[i] = new_shortened_list[i].Split(new char[] { ' ' })[0];
                }
            }
    
            public void get_first_word(string[] working_array, int how_many_repetitions)
            {
                for (int i = 0; i < how_many_repetitions; ++i)
                {
                    working_array[i] = working_array[i].Split(new char[] { ' ' })[0];
                }
            }
    
            /* working array contents:
             * lion 6 1 2 4 5 6 7
             * and 6 1 2 3 4 5 6
             * mouse 6 1 2 3 4 6 7
             */
    
            /* new_shortened_list array contents:
             * 2 lion and mouse
             * 3 and mouse
             * 4 lion and mouse
             * 5 lion and
             * 6 lion and mouse
             * 7 lion mouse
            */
    
            public void step_1(string[] new_shortened_list,
               string[] working_array,
               int new_how_many_lines,
               int how_many_repetitions)
            {
                get_first_word(working_array, how_many_repetitions);
    
                string[] forward_line_numbers = new string[new_how_many_lines];
                string[] reverse_line_numbers = new string[new_how_many_lines];
                get_first_number(new_shortened_list, forward_line_numbers, reverse_line_numbers, new_how_many_lines);
    
                split_sentence(new_shortened_list, new_how_many_lines);
    
                difference(new_shortened_list, working_array, forward_line_numbers, new_how_many_lines, how_many_repetitions);
    
                group_lines(forward_line_numbers, reverse_line_numbers, new_how_many_lines);
            }
        }
    }
    here are the program results;

    Code:
    3 5
    7
    //////////////////
    2
    7
    //////////////////
    3
    6
    //////////////////
    4
    5
    Press any key to continue . . .

  4. #4
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,612
    I think heapsort would work well here. If you can arrange the items such that smaller strings, and strings that contain words of longer strings are higher in the heap, it should be close to right.

    If the idea works, then it should come up with the sequence "3 and mouse", "7 lion mouse", "5 lion and", "6 lion and mouse", "2 lion and mouse", "4 lion and mouse", even though I'm not really sure why that's the correct order. Making it into a 2x3 grid is a matter of formatting.

    I don't have time to implement this myself right now, but there's an idea for you.
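
    A minimal sketch of the heap idea, assuming .NET 6 or later for `PriorityQueue<TElement, TPriority>` (a binary min-heap). It covers only the "smaller strings higher in the heap" part: lines with fewer words come out first, and ties are broken by line number, which is an assumption of this sketch rather than anything stated in the thread. `HeapOrder` and `Sort` are hypothetical names.

    ```csharp
    using System;
    using System.Collections.Generic;

    static class HeapOrder
    {
        // Orders lines with a min-heap: fewer words first, ties broken by line number.
        public static List<string> Sort(IEnumerable<string> lines)
        {
            var heap = new PriorityQueue<string, (int WordCount, int LineNumber)>();
            foreach (string line in lines)
            {
                string[] parts = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
                // parts[0] is the line number, the rest are the words on that line.
                heap.Enqueue(line, (parts.Length - 1, int.Parse(parts[0])));
            }

            var ordered = new List<string>();
            while (heap.Count > 0)
            {
                ordered.Add(heap.Dequeue());
            }
            return ordered;
        }
    }
    ```

    With the six lines numbered 2 to 7 this gives 3, 5, 7, 2, 4, 6. That is not the exact sequence suggested above; getting that order would need a different tie-break rule.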

  5. #5
    Registered User
    Join Date
    Apr 2011
    Posts
    308
    for an idea like heap sort, here is the way it should work, though it may not really be heap sort because of the logic.

    - node 1 = all the possible words = (lion, and, mouse) = lines 2, 4, 6
    - node 2 = less than all words = (lion, mouse) = line 7
    - node 3 = less than all words = (mouse, and) = line 3
    - node 4 = less than all words = (lion, and) = line 5

    then i take node 2 as one type and nodes 3 and 4 as the other type.

    then i sort out how the nodes link to node 1;
    - node 2 = line 7, node 1 = line 2
    - node 3 = line 3, node 1 = line 6
    - node 4 = line 5, node 1 = line 4

    here's a picture of it;

    [attached image: nodes_.jpg]
    Last edited by jeremy duncan; 08-11-2017 at 02:56 PM. Reason: typo
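
    The node list in this post could be built roughly as follows, assuming each line is a line number followed by space-separated words. Lines with the same set of words end up in the same node. `WordSetNodes` and `Build` are hypothetical names, and the sketch stops at building the nodes; it does not link the partial nodes back to node 1.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    static class WordSetNodes
    {
        // Groups line numbers by the set of words on the line.
        // The key is the line's words sorted and joined, so word order does not matter.
        public static Dictionary<string, List<string>> Build(IEnumerable<string> lines)
        {
            var nodes = new Dictionary<string, List<string>>();
            foreach (string line in lines)
            {
                string[] parts = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
                string key = string.Join(" ", parts.Skip(1).Distinct().OrderBy(w => w, StringComparer.Ordinal));
                if (!nodes.TryGetValue(key, out List<string> numbers))
                {
                    numbers = new List<string>();
                    nodes[key] = numbers;
                }
                numbers.Add(parts[0]);
            }
            return nodes;
        }
    }
    ```

    For the six lines numbered 2 to 7 this produces four nodes: "and lion mouse" with lines 2, 4, 6, then "and mouse" with line 3, "and lion" with line 5, and "lion mouse" with line 7.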

  6. #6
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,612
    Well there you go. I hope you're successful in making it dude.
