# Thread: need algorithm help, sorting groups of text by comparing words

1. ## need algorithm help, sorting groups of text by comparing words

Here is the plan I have written up so far; no coding done yet.

- The words come from the lines of the article.
For each line, I write the line number first, then the words found on that line.

1 lion and mouse

2 lion and mouse

3 and mouse

4 lion and mouse

5 lion and

6 lion and mouse

7 lion mouse
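A minimal sketch of this first step, assuming the article is already split into lines and that the words to track are known up front. The names `LineIndexSketch`, `BuildEntries`, `articleLines` and `trackedWords` are hypothetical, and the sample article text is made up only to show the entry format.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LineIndexSketch
{
    // Hypothetical helper: returns one "lineNumber word word ..." entry per article line,
    // keeping only the tracked words, in the order they appear in trackedWords.
    public static List<string> BuildEntries(string[] articleLines, string[] trackedWords)
    {
        var entries = new List<string>();
        for (int i = 0; i < articleLines.Length; i++)
        {
            var wordsInLine = new HashSet<string>(
                articleLines[i].Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
            IEnumerable<string> kept = trackedWords.Where(w => wordsInLine.Contains(w));
            entries.Add((i + 1) + " " + string.Join(" ", kept));
        }
        return entries;
    }

    static void Main()
    {
        // Made-up article text, only to show the entry format.
        string[] article = { "the lion and the mouse", "and then the mouse ran" };
        string[] tracked = { "lion", "and", "mouse" };
        foreach (string entry in BuildEntries(article, tracked))
            Console.WriteLine(entry);
    }
}
```

For the two made-up lines this prints `1 lion and mouse` and `2 and mouse`, the same shape as the list above.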

- Where adjacent lines share the same words, erase the outer line. Lines 1 and 2 hold the same words, so line 1 is erased.

2 lion and mouse

3 and mouse

4 lion and mouse

5 lion and

6 lion and mouse

7 lion mouse
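One way to read "erase the outer line" is to trim from each end of the list while the outermost entry holds the same words as its neighbour. The sketch below assumes that reading, and also assumes the words on every line are written in a consistent order so plain string comparison is enough. `TrimSketch`, `WordsOf` and `TrimOuterDuplicates` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

static class TrimSketch
{
    // Returns the words of an entry, i.e. everything after the leading line number.
    static string WordsOf(string entry)
    {
        int space = entry.IndexOf(' ');
        return space < 0 ? "" : entry.Substring(space + 1).Trim();
    }

    // Assumption: "outer" means the entry closest to the top or bottom of the list.
    public static List<string> TrimOuterDuplicates(IEnumerable<string> entries)
    {
        var result = new List<string>(entries);
        // Trim from the top while the first two entries hold the same words.
        while (result.Count >= 2 && WordsOf(result[0]) == WordsOf(result[1]))
            result.RemoveAt(0);
        // Trim from the bottom while the last two entries hold the same words.
        while (result.Count >= 2 &&
               WordsOf(result[result.Count - 1]) == WordsOf(result[result.Count - 2]))
            result.RemoveAt(result.Count - 1);
        return result;
    }

    static void Main()
    {
        string[] entries =
        {
            "1 lion and mouse", "2 lion and mouse", "3 and mouse", "4 lion and mouse",
            "5 lion and", "6 lion and mouse", "7 lion mouse"
        };
        foreach (string entry in TrimOuterDuplicates(entries))
            Console.WriteLine(entry);
    }
}
```

With the seven entries from the first list, only line 1 is dropped, leaving lines 2 to 7.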

- Pair the lines from the outside in:
the top and bottom lines are the first group,
the second from the top and the second from the bottom are the second group,
and so on.

2 lion and mouse
7 lion mouse

3 and mouse
6 lion and mouse

4 lion and mouse
5 lion and
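The outside-in pairing can be sketched with two indexes that walk toward each other. `PairSketch` and `PairOuterToInner` are hypothetical names, and putting a leftover middle line into a group of its own is an assumption, since the plan does not say what to do with an odd count.

```csharp
using System;
using System.Collections.Generic;

static class PairSketch
{
    // Pairs entries from the outside in: first with last, second with second-to-last, ...
    public static List<string[]> PairOuterToInner(IList<string> entries)
    {
        var pairs = new List<string[]>();
        int top = 0;
        int bottom = entries.Count - 1;
        while (top < bottom)
        {
            pairs.Add(new[] { entries[top], entries[bottom] });
            top++;
            bottom--;
        }
        // Assumption: with an odd count, the middle entry becomes a group of one.
        if (top == bottom)
            pairs.Add(new[] { entries[top] });
        return pairs;
    }

    static void Main()
    {
        string[] entries =
        {
            "2 lion and mouse", "3 and mouse", "4 lion and mouse",
            "5 lion and", "6 lion and mouse", "7 lion mouse"
        };
        foreach (string[] pair in PairOuterToInner(entries))
        {
            foreach (string entry in pair)
                Console.WriteLine(entry);
            Console.WriteLine();
        }
    }
}
```

For lines 2 to 7 this yields the groups (2, 7), (3, 6) and (4, 5).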

- the words on a line are equal or not equal.
"and" = "lion" in line 5.
"and" = "mouse" in line 3.
"lion" = "mouse" in line 7.

2 lion = and = mouse
7 lion = mouse != and

3 and = mouse != lion
6 lion = and = mouse

4 lion = and = mouse
5 lion = and != mouse

///////////////
7 lion mouse = group a
///////////////
3 and mouse = group b
///////////////
5 lion and = group b
///////////////

____________________

My question: how should I do the last step in C# code?

This step:

Code:
```
- the words on a line are equal or not equal.
"and" = "lion" in line 5.
"and" = "mouse" in line 3.
"lion" = "mouse" in line 7.

2 lion = and = mouse
7 lion = mouse != and

3 and = mouse != lion
6 lion = and = mouse

4 lion = and = mouse
5 lion = and != mouse

///////////////
7 lion mouse = group a
///////////////
3 and mouse = group b
///////////////
5 lion and = group b
///////////////
```
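The `=` / `!=` marking in this step could be sketched as below, assuming the full word list is known and each entry starts with its line number. `EqualitySketch` and `Describe` are hypothetical names. The sketch covers only the marking, not the group a / group b labels.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class EqualitySketch
{
    // Builds e.g. "5 lion = and != mouse": words present on the line are joined
    // with " = ", and every word missing from the line is appended with " != ".
    public static string Describe(string entry, string[] allWords)
    {
        string[] parts = entry.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        string lineNumber = parts[0];
        var present = new HashSet<string>(parts.Skip(1));

        string result = lineNumber + " " +
            string.Join(" = ", allWords.Where(w => present.Contains(w)));
        foreach (string missing in allWords.Where(w => !present.Contains(w)))
            result += " != " + missing;
        return result;
    }

    static void Main()
    {
        string[] allWords = { "lion", "and", "mouse" };
        string[] entries =
        {
            "2 lion and mouse", "7 lion mouse", "3 and mouse",
            "6 lion and mouse", "4 lion and mouse", "5 lion and"
        };
        foreach (string entry in entries)
            Console.WriteLine(Describe(entry, allWords));
    }
}
```

Lines holding every word come out with `=` only; lines 7, 3 and 5 each end in a `!=` naming the word they lack.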

2. This is the algorithm I drafted for the last two steps in my previous list:

2 lion and mouse
3 and mouse
4 lion and mouse
5 lion and
6 lion and mouse
7 lion mouse

step 1:
- A group is 2 or more units that share one or more words, but not all of the words:
3 and mouse
5 lion and

- A unit that holds a part of two or more groups, but not all the words, is named the "special unit":
7 lion mouse
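One possible reading of step 1, sketched with sets: walk the lines in order, collect the words that partial lines are missing until one fewer than the total has been collected, and treat the first partial line that holds all of those collected words as the "special unit". The result depends on line order, and the names `Step1Sketch`, `FindSpecialUnit` and `groupLines` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Step1Sketch
{
    // Returns the line number of the "special unit" (or null if none is found) and
    // fills groupLines with the line numbers of the units that form the group.
    public static string FindSpecialUnit(IList<string> entries, string[] allWords,
                                         List<string> groupLines)
    {
        var missingFromGroup = new HashSet<string>();
        foreach (string entry in entries)
        {
            string[] parts = entry.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            var present = new HashSet<string>(parts.Skip(1));
            List<string> missing = allWords.Where(w => !present.Contains(w)).ToList();
            if (missing.Count == 0) continue; // the line holds every word

            // Collect missing words until one fewer than the total has been seen.
            bool joinedGroup = false;
            foreach (string word in missing)
            {
                if (missingFromGroup.Count < allWords.Length - 1 && missingFromGroup.Add(word))
                    joinedGroup = true;
            }
            if (joinedGroup) groupLines.Add(parts[0]);

            // A partial line that holds every word the group lacks is the special unit.
            if (missingFromGroup.IsSubsetOf(present)) return parts[0];
        }
        return null;
    }

    static void Main()
    {
        string[] allWords = { "lion", "and", "mouse" };
        string[] entries =
        {
            "2 lion and mouse", "3 and mouse", "4 lion and mouse",
            "5 lion and", "6 lion and mouse", "7 lion mouse"
        };
        var group = new List<string>();
        string special = FindSpecialUnit(entries, allWords, group);
        Console.WriteLine("group: " + string.Join(" ", group));
        Console.WriteLine("special unit: " + special);
    }
}
```

For lines 2 to 7 the group comes out as lines 3 and 5 and the special unit as line 7, matching the lists above.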

step 2:
- Group a unit that has the words in the "special unit" with a group that has the words in the "special unit", listing lower numbers first.

You have a group of units and a "special unit": 3 units * 2 = 6 units.

The "special unit" is made from the group of units that is 4 units: 4 units for the group and 2 units for the "special unit" = 6 units.
So the first group has 4 units in between its two units (lines 3 to 6 sit between lines 2 and 7).

2 lion and mouse
7 lion mouse

- Group a unit that has the words in the "special unit" with a group that is missing a word in the "special unit", listing lower numbers first.

There is a group of units missing a word from the "special unit". The second group is compared to the first group,
which finds a mismatch in the second group that is corrected by the third group.
So the second group has the third group in between its two units.

3 and mouse
6 lion and mouse

The third group is the only group left.
If the number of units is odd, the extra unit can go either with the unit that has no missing words
or with the unit that has missing words.

4 lion and mouse
5 lion and

step 3:
You can visualize step 1 and step 2 as 6 boxes, in two rows of three.
The bottom row holds the units that have no missing words.
The top row has the units missing a word on the outside and the unit not missing any words on the inside, so it looks like a triangle.

Any ideas on a software algorithm to do this?

3. I wrote some code that does what I wanted, so I will post it as the answer to my question.

Here's the code:

Code:
```
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace project
{
    class part_4
    {
        // Prints the line numbers in pairs from the outside in:
        // first with last, second with second-to-last, and so on.
        public void group_lines(string[] forward_line_numbers, string[] reverse_line_numbers, int new_how_many_lines)
        {
            // reverse_line_numbers starts as a copy of forward_line_numbers;
            // reversing it lets index i address the i-th line from the bottom.
            Array.Reverse(reverse_line_numbers);

            for (int i = 0; i < new_how_many_lines / 2; ++i)
            {
                Console.WriteLine("//////////////////");
                Console.WriteLine(forward_line_numbers[i]);
                Console.WriteLine(reverse_line_numbers[i]);
            }
        }

        // Counts the whitespace-separated words in a string.
        public static int CountWords(string test)
        {
            int count = 0;
            bool inWord = false;

            foreach (char t in test)
            {
                if (char.IsWhiteSpace(t))
                {
                    inWord = false;
                }
                else
                {
                    if (!inWord) count++;
                    inWord = true;
                }
            }
            return count;
        }

        // True if 'word' occurs in 'text' as a whole word. Regex.Escape keeps
        // characters such as '.' or '+' in a word from being read as regex syntax,
        // and \b stops "and" from matching inside a longer word such as "sand".
        private static bool ContainsWord(string text, string word)
        {
            return Regex.IsMatch(text, @"\b" + Regex.Escape(word) + @"\b");
        }

        // Prints the collected line numbers and the current line number when the
        // current line contains every word in missing_words.
        public void compare(string[] new_shortened_list, string[] forward_line_numbers, string missing_words, string missing_lines, int i)
        {
            bool has_all_missing_words = true;
            foreach (string temp in missing_words.Split(' '))
            {
                // Splitting on ' ' leaves an empty entry after the trailing space; skip it.
                // (The original test joined its conditions with &&, so it could never be true.)
                if (string.IsNullOrWhiteSpace(temp)) continue;

                if (!ContainsWord(new_shortened_list[i], temp))
                {
                    has_all_missing_words = false;
                }
            }
            if (has_all_missing_words)
            {
                Console.WriteLine(missing_lines);
                Console.WriteLine(forward_line_numbers[i]);
            }
        }

        public void difference(string[] new_shortened_list,
                               string[] working_array,
                               string[] forward_line_numbers,
                               int new_how_many_lines,
                               int how_many_repetitions)
        {
            string missing_words = "";
            string missing_lines = "";
            int counter = 0;
            int word_counter = 0;
            StringBuilder s = new StringBuilder(); // missing words found so far
            StringBuilder t = new StringBuilder(); // numbers of the lines they were missing from

            for (int i = 0; i < new_how_many_lines; ++i)
            {
                for (int j = 0; j < how_many_repetitions; ++j)
                {
                    // compare the words in each line to every word counted
                    if (!ContainsWord(new_shortened_list[i], working_array[j])) // the word is missing in the line
                    {
                        // is the word already in the missing word list?
                        bool already_listed = ContainsWord(missing_words, working_array[j]);
                        // if it is not, and the list of missing words is shorter than the total
                        // number of words minus one, write it to the missing word list
                        if (!already_listed && (counter < (how_many_repetitions - 1)))
                        {
                            s.Append(working_array[j]);
                            s.Append(" ");

                            t.Append(forward_line_numbers[i]);
                            t.Append(" ");

                            counter++;
                        }
                    }
                }
                missing_words = s.ToString();
                missing_lines = t.ToString();

                // only lines that lack at least one word can be the "special unit"
                word_counter = CountWords(new_shortened_list[i]);
                if (word_counter < how_many_repetitions)
                {
                    compare(new_shortened_list, forward_line_numbers, missing_words, missing_lines, i);
                }
            }
        }

        // Strips the leading line number from each entry, leaving only the words.
        public void split_sentence(string[] new_shortened_list, int new_how_many_lines)
        {
            for (int i = 0; i < new_how_many_lines; ++i)
            {
                if (new_shortened_list[i].Length > 0)
                {
                    int j = new_shortened_list[i].IndexOf(" ") + 1;
                    new_shortened_list[i] = new_shortened_list[i].Substring(j);
                }
            }
        }

        // Copies the leading line number of each entry into both arrays.
        public void get_first_number(string[] new_shortened_list,
                                     string[] forward_line_numbers,
                                     string[] reverse_line_numbers,
                                     int new_how_many_lines)
        {
            for (int i = 0; i < new_how_many_lines; ++i)
            {
                string line_number = new_shortened_list[i].Split(new char[] { ' ' })[0];
                forward_line_numbers[i] = line_number;
                reverse_line_numbers[i] = line_number;
            }
        }

        // Keeps only the word at the start of each working_array entry.
        public void get_first_word(string[] working_array, int how_many_repetitions)
        {
            for (int i = 0; i < how_many_repetitions; ++i)
            {
                working_array[i] = working_array[i].Split(new char[] { ' ' })[0];
            }
        }

        /* working array contents:
         * lion 6 1 2 4 5 6 7
         * and 6 1 2 3 4 5 6
         * mouse 6 1 2 3 4 6 7
         */

        /* new_shortened_list array contents:
         * 2 lion and mouse
         * 3 and mouse
         * 4 lion and mouse
         * 5 lion and
         * 6 lion and mouse
         * 7 lion mouse
         */

        public void step_1(string[] new_shortened_list,
                           string[] working_array,
                           int new_how_many_lines,
                           int how_many_repetitions)
        {
            get_first_word(working_array, how_many_repetitions);

            string[] forward_line_numbers = new string[new_how_many_lines];
            string[] reverse_line_numbers = new string[new_how_many_lines];
            get_first_number(new_shortened_list, forward_line_numbers, reverse_line_numbers, new_how_many_lines);

            split_sentence(new_shortened_list, new_how_many_lines);

            difference(new_shortened_list, working_array, forward_line_numbers, new_how_many_lines, how_many_repetitions);

            group_lines(forward_line_numbers, reverse_line_numbers, new_how_many_lines);
        }
    }
}
```
Here are the program results:

Code:
```
3 5
7
//////////////////
2
7
//////////////////
3
6
//////////////////
4
5
Press any key to continue . . .
```

4. I think heapsort would work well here. If you can arrange the items such that smaller strings, and strings that contain words of longer strings are higher in the heap, it should be close to right.

If the idea works, then it should come up with the sequence "3 and mouse", "7 lion mouse", "5 lion and", "6 lion and mouse", "2 lion and mouse", "4 lion and mouse". Even though I'm not really sure why that's the correct order. Making it into a 3x3 grid is a matter of formatting.

I don't have time to implement this myself right now, but there's an idea for you.
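As a rough illustration of the heap idea, the sketch below pushes each entry into a min-heap keyed by word count, with the line number as a tie-breaker, so shorter entries come out first. It assumes .NET 6 or later for `PriorityQueue<TElement, TPriority>`; `HeapSketch` and `OrderByWordCount` are hypothetical names. It does not implement the "contains words of longer strings" criterion and does not try to reproduce the exact sequence quoted above.

```csharp
using System;
using System.Collections.Generic;

static class HeapSketch
{
    // Returns the entries ordered by word count, then by line number, using a min-heap.
    public static List<string> OrderByWordCount(IEnumerable<string> entries)
    {
        // Assumption: .NET 6+ is available for PriorityQueue.
        var heap = new PriorityQueue<string, (int, int)>();
        foreach (string entry in entries)
        {
            string[] parts = entry.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            int lineNumber = int.Parse(parts[0]);
            int wordCount = parts.Length - 1;
            // Tuples compare element by element, so word count decides first.
            heap.Enqueue(entry, (wordCount, lineNumber));
        }

        var ordered = new List<string>();
        while (heap.Count > 0)
            ordered.Add(heap.Dequeue());
        return ordered;
    }

    static void Main()
    {
        string[] entries =
        {
            "2 lion and mouse", "3 and mouse", "4 lion and mouse",
            "5 lion and", "6 lion and mouse", "7 lion mouse"
        };
        foreach (string entry in OrderByWordCount(entries))
            Console.WriteLine(entry);
    }
}
```

With this key the two-word lines (3, 5, 7) are dequeued before the three-word lines (2, 4, 6); a different priority would be needed to get any other order.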

5. For an idea like heapsort, here is the way it should work, though because of the logic it may not really be heapsort.

- node 1 = all the possible words = (lion, and, mouse) = lines 2, 4, 6
- node 2 = less than all words = (lion, mouse) = line 7
- node 3 = less than all words = (mouse, and) = line 3
- node 4 = less than all words = (lion, and) = line 5

Then I take node 2 as one type, and nodes 3 and 4 as the other type.

Then I sort out how the nodes link to node 1:
- node 2 = line 7, node 1 = line 2
- node 3 = line 3, node 1 = line 6
- node 4 = line 5, node 1 = line 4
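The node list can be read as grouping line numbers by the words each line holds. A minimal sketch of that grouping follows, assuming every entry starts with its line number followed by a space and that words are written in a consistent order. `NodeSketch` and `GroupByWords` are hypothetical names, and linking each partial node back to a line of node 1 is left out.

```csharp
using System;
using System.Collections.Generic;

static class NodeSketch
{
    // Maps each distinct word combination to the line numbers that hold it.
    public static Dictionary<string, List<string>> GroupByWords(IEnumerable<string> entries)
    {
        var nodes = new Dictionary<string, List<string>>();
        foreach (string entry in entries)
        {
            int space = entry.IndexOf(' ');
            string lineNumber = entry.Substring(0, space);
            string words = entry.Substring(space + 1);

            List<string> lines;
            if (!nodes.TryGetValue(words, out lines))
            {
                lines = new List<string>();
                nodes[words] = lines;
            }
            lines.Add(lineNumber);
        }
        return nodes;
    }

    static void Main()
    {
        string[] entries =
        {
            "2 lion and mouse", "3 and mouse", "4 lion and mouse",
            "5 lion and", "6 lion and mouse", "7 lion mouse"
        };
        foreach (KeyValuePair<string, List<string>> node in GroupByWords(entries))
            Console.WriteLine("(" + node.Key + ") = lines " + string.Join(", ", node.Value));
    }
}
```

For lines 2 to 7 this gives four nodes: one holding lines 2, 4 and 6, and one each for lines 3, 5 and 7.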

Here's a picture of it:

6. Well there you go. I hope you're successful in making it dude.