Questions to Structure - Beginner needs help!

**Adak** · 02-29-2012

Well, unfortunately, I had moved on before this last problem came up, and I no longer have that original version on my computer.

Try this version and see if it works. It is different though.

Code:

#include <stdio.h>
#include <string.h>
#include <ctype.h>

struct alist {
   char word[26];
   int count;
};

int main(void) {
   FILE * fp;
   int i,j,len,totalWords, totalDupes, unique, wasDup;
   struct alist list[600];
   struct alist temp;
   
   if((fp=fopen("Edelweiss.txt","r")) == NULL) {
         printf("Error! File did not open - closing program.\n");
         return 1;
   }
   printf("\n Word Analysis:\n\n                            Word    Frequency\n");
   printf("  ============================================\n");
   i = totalWords = wasDup = totalDupes = 0;
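   while((fscanf(fp, "%25s", list[i].word)) > 0) { //at most 25 chars, so word[26] cannot overflow
      len = strlen(list[i].word);
      while(len > 0 && !isalpha((unsigned char)list[i].word[len-1])) //trim trailing non-letters
         list[i].word[--len]= '\0';                //never steps below index 0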

      for(j=0;j<i;j++) {
         if(!strcmp(list[j].word, list[i].word)) { //words match
            list[j].count++;                        //add to the tally
            --i;
            wasDup=1;                               //it was a duplicate word
            ++totalDupes;
         }
      }
      if(!wasDup)                                  //it was not a duplicate word
         list[i].count++;
      wasDup=0;

      ++i; 
      ++totalWords;
   }
   fclose(fp);
    
   unique = (totalWords - totalDupes); 
   
   printf("\n Sorted Alphabetically:\n\n");
   for(i=0;i<unique-1;i++) {
      for(j=i+1;j<unique;j++) {
         if(strcmp(list[i].word, list[j].word) > 0) {
            temp = list[i];
            list[i] = list[j];
            list[j] = temp;
         }
      }
   }
    
   for(i=0;i<unique;i++)
      printf("%4d. %26s %4d\n",i+1,list[i].word, list[i].count);

   printf("\nTotal Words: %d  Duplicate Words: %d  Unique Words: %d\n",totalWords,totalDupes,unique);
   printf("\n\n Press enter when ready, to see the next page of output\n");
   getchar();
      
   printf("\n Sorted by Word Frequency:\n\n");
   for(i=0;i<unique-1;i++) {
      for(j=i+1;j<unique;j++) {
         if(list[i].count < list[j].count) {
            temp = list[i];
            list[i] = list[j];
            list[j] = temp;
         }
      }
   }
   
   
   for(i=0;i<unique;i++)
      printf("%4d. %26s %4d\n",i+1,list[i].word, list[i].count);

   printf("\n");
   return 0;
}

Be sure to copy it ONLY after you hit the Plain Text button.

**Fresa** · 02-29-2012

I'm sorry... the frequency is also just a silly row of numbers. I copied it correctly...

And on your computer it displays correctly??

**Adak** · 02-29-2012

Originally Posted by Fresa

I'm sorry... the frequency is also just a silly row of numbers. I copied it correctly...

And on your computer it displays correctly??

Yes, it shows output like this:

Code:

 Word Analysis:

                            Word    Frequency
  ============================================

 Sorted Alphabetically:

   1.                    Blossom    1
   2.                  Edelweiss    2
   3.                      Small    1
   4.                        and    4
   5.                      bless    1
   6.                      bloom    2
   7.                     bright    1
   8.                      clean    1
   9.                  edelweiss    2
  10.                      every    1
  11.                    forever    2
  12.                      greet    1
  13.                       grow    2
  14.                      happy    1
  15.                   homeland    1
  16.                       look    1
  17.                        may    1
  18.                         me    2
  19.                       meet    1
  20.                    morning    1
  21.                         my    1
  22.                         of    1
  23.                       snow    1
  24.                         to    1
  25.                      white    1
  26.                        you    3

Total Words: 37  Duplicate Words: 11  Unique Words: 26


 Press enter when ready, to see the next page of output

 Sorted by Word Frequency:

   1.                        and    4
   2.                        you    3
   3.                      bloom    2
   4.                  edelweiss    2
   5.                    forever    2
   6.                       grow    2
   7.                         me    2
   8.                  Edelweiss    2
   9.                    Blossom    1
  10.                      every    1
  11.                      bless    1
  12.                      greet    1
  13.                      Small    1
  14.                      happy    1
  15.                   homeland    1
  16.                       look    1
  17.                        may    1
  18.                     bright    1
  19.                       meet    1
  20.                    morning    1
  21.                         my    1
  22.                         of    1
  23.                       snow    1
  24.                         to    1
  25.                      white    1
  26.                      clean    1

Are you perhaps in Debug mode?

**kevinstrijbos** · 02-29-2012

Well, after reading the first comment I was like: hell no.
An array of 50 000 structs? Not a good way to make a habit of coding efficiently.
A linked list would be the solution here

**Adak** · 02-29-2012

Try stepping through it, watching the variables, especially the list[i].count values.

**MK27** · 02-29-2012

Originally Posted by kevinstrijbos

Well, after reading the first comment I was like: hell no.
An array of 50 000 structs? Not a good way to make a habit of coding efficiently.
A linked list would be the solution here

No, arrays are much more efficient if the content is static (ie, does not require insertion, deletion or re-ordering).

**Fresa** · 02-29-2012

Oh, now you swamp me

It's hard to understand your discussion without good background knowledge and the vocabulary... Phew...

Of course I was in debug mode. Shouldn't I be? Up to now at school we have always just used debug.
But you are right, when I choose release mode... IT WORKS!!!
But why doesn't it work in debug mode?
Now I will test it with my own program and then try to understand everything. After that I will have some questions, and then I want to understand your discussion.
AND afterwards we can try the next part of the task...
But for this day, I think, it's enough!

THANK YOU!!!

**Adak** · 02-29-2012

Originally Posted by kevinstrijbos

Well, after reading the first comment I was like: hell no.
An array of 50 000 structs? Not a good way to make a habit of coding efficiently.
A linked list would be the solution here

Linked lists are elegant in design, but:

1) They take longer to program - sometimes much longer - than an array.

2) Without indexing, etc., they take much longer to access the values, being sequential by nature. That said, algorithms like "Dancing Links" show the power linked lists can bring to some problems.

It sounds like you're used to working with just the stack memory (1-3MB is typical), and not the entire heap portion of memory that C can use. It's a much bigger memory world than what you're used to.

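Global (static) arrays and allocated memory both live outside the stack, btw. Strictly speaking, only allocated memory comes from the heap; globals get their own static storage.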

Students learn arrays before they are taught linked lists, so a suggestion to use linked lists for an assignment can definitely be off the mark. If the student doesn't specify use of a linked list, you can bet an array is what they need to use.
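To make the storage point concrete, here is a minimal sketch. The names `MAX_WORDS`, `word_table` and `table_bytes` are made up for illustration, and the capacity of 50 000 is simply the figure from the quoted post. An array declared at file scope (or `static` inside a function) has static storage duration, so it does not count against the stack limit and starts out zeroed:

```c
#include <stddef.h>

#define MAX_WORDS 50000          /* assumed capacity, taken from the quoted post */

struct alist {
    char word[26];
    int count;
};

/* Static storage duration: zero-initialized before main() runs, not on the stack. */
static struct alist word_table[MAX_WORDS];

/* Hypothetical helper: how many bytes the whole table occupies. */
size_t table_bytes(void) {
    return sizeof word_table;
}
```

On a typical desktop compiler each element is around 32 bytes, so the whole table is on the order of 1.5 MB. That is too much for many default stacks, but unremarkable as a static array.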

**Adak** · 02-29-2012

Debug mode makes certain assumptions about your program - generally they're good, but sometimes (as here), they stubbornly get tied up in a knot they can't get out of.

You could try rebuilding the project, or even closing and re-opening the IDE you're using, but sometimes it's just not clear why the Debug mode build fails and the Release mode build works. Looking at the generated assembly is about the only way to REALLY tell what went wrong, and I'm not about to dive into THAT pile of goo!

Glad you got it working!
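A possible explanation for the debug/release difference, assuming the program above was built exactly as posted: `list` is a local array, and nothing sets `count` before `list[i].count++` runs, so each tally starts from whatever value happens to be in that memory. Some debug builds deliberately fill uninitialized stack memory with a marker pattern, which would show up as nonsense frequencies, while a release build may simply happen to find zeros there. A minimal sketch of one fix (`clear_list` is a made-up helper name, not part of the original program):

```c
#include <stddef.h>

struct alist {
    char word[26];
    int count;
};

/* Hypothetical helper: give every entry an empty word and a count of zero,
   so that later count++ operations start from a known value. */
void clear_list(struct alist *list, size_t n) {
    size_t i;
    for (i = 0; i < n; i++) {
        list[i].word[0] = '\0';
        list[i].count = 0;
    }
}
```

Calling `clear_list(list, 600);` right after the declarations in `main()` should do it. Declaring the array as `struct alist list[600] = {{0}};` has the same effect without a helper.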

**Fresa** · 03-01-2012

The first thing I have to say: in release mode it also works with my Demotextdatei.txt. Now it finds 591 words (total) and 251 unique words. That's nearly correct. According to the solutions, the total count has to be 590 words. I mailed my teacher and got some more information:
- In Demodatei.txt there is one word that has a "forbidden" length. The word "Verschlüsselungsalgorithmus" has 27 characters, so I have to filter it out. At the moment it is still listed, however that can be (---> "... word[26] ..."!!)
But there should be just 236 unique words. Up to now I couldn't find the mistake. Only words number 173 to 176 look questionable:
173 b
174 B
175 C (here the frequency is "0"????)
176 (blank space)

also

148 z
149 a

But the program also distinguishes between upper and lower case. I think this will be the problem, because the total count is correct apart from the "forbidden" word with 27 characters...

Now to your discussion. The most important part of our task is to work with this special structure.
This was given:

Code:

struct alist {
    char word[26];
    int count;
};

My teacher also wrote to me that buffer storage for 5000 different words should be possible in a static array. Now I am confused. Later I should try to expand my program with pointers and linked lists. But this is always something different, isn't it? Or should I ask the user of my program at the beginning how he wants to work?

Another question: Is it possible to open multiple files and to analyze them in one step?

I created a menu. Now the user can choose how the words should be listed. After resolving the problem with the 236 words (and discussing arrays and linked lists) and filtering out the 27-character word, we should go on with this step.

Code:

printf("\nHow should the words been list? \n1: alphabetically \n2: by frequency");
scanf("%d",&entry);        

        while(k==2)
        {
            switch(entry)
            {
                case 1: k=1; printf("\ntest: alphabetically\n"); break;
                case 2: k=1; printf("\ntest: by frequency\n");break;
                default: printf("\nWrong input! Please retry.\n"); fflush(stdin); k=2; scanf("%d",&entry);
            }
        }

**Adak** · 03-01-2012

Originally Posted by Fresa

The first thing I have to say: in release mode it also works with my Demotextdatei.txt. Now it finds 591 words (total) and 251 unique words. That's nearly correct. According to the solutions, the total count has to be 590 words. I mailed my teacher and got some more information:
- In Demodatei.txt there is one word that has a "forbidden" length. The word "Verschlüsselungsalgorithmus" has 27 characters, so I have to filter it out. At the moment it is still listed, however that can be (---> "... word[26] ..."!!)
But there should be just 236 unique words. Up to now I couldn't find the mistake. Only words number 173 to 176 look questionable:
173 b
174 B
175 C (here the frequency is "0"????)
176 (blank space)

also

148 z
149 a

But the program also distinguishes between upper and lower case. I think this will be the problem, because the total count is correct apart from the "forbidden" word with 27 characters...

Did your teacher say what you should do with the word that is too long? Maybe enlarge the word array, or add code to skip that one word, or something else?

Now to your discussion. The most important part of our task is to work with this special structure.
This was given:

Code:

struct alist {
    char word[26];
    int count;
};

My teacher also wrote to me that buffer storage for 5000 different words should be possible in a static array. Now I am confused.

Yes, you can handle 5000 words in a local function, without using heap memory.

Later I should try to expand my program with pointers and linked lists. But this is always something different, isn't it? Or should I ask the user of my program at the beginning how he wants to work?

The best way I've found to do it is this:

They tell me what they want to do.

I figure out how to do it.

They have no input about the type of data structures or algorithm I use. That's all up to me.

They SHOW me how they want to work, so the user interface and features can correspond to it. Never take just a description, because it will always be incomplete, inaccurate, and result in a bad interface if it's not really something simple.

Another question: Is it possible to open multiple files and to analyze them in one step?

Yes to opening multiple files. Analysis? Perhaps yes.
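As a rough sketch of the "open multiple files" part, the loop below opens each file in turn and adds up a simple token count. The helper names `count_words` and `count_words_in_files` are invented for this illustration, and the plain token count only stands in for the real analysis:

```c
#include <stdio.h>

/* Hypothetical helper: count whitespace-separated tokens in an open stream.
   Tokens longer than 99 characters would be counted in several pieces. */
long count_words(FILE *fp) {
    char buffer[100];
    long n = 0;
    while (fscanf(fp, "%99s", buffer) == 1)
        n++;
    return n;
}

/* Hypothetical helper: open each named file in turn and add up the counts.
   Files that cannot be opened are reported and skipped. */
long count_words_in_files(const char *const names[], size_t count) {
    long total = 0;
    size_t i;
    for (i = 0; i < count; i++) {
        FILE *fp = fopen(names[i], "r");
        if (fp == NULL) {
            printf("Could not open %s - skipping.\n", names[i]);
            continue;
        }
        total += count_words(fp);
        fclose(fp);
    }
    return total;
}
```

Only one file is open at a time here, which keeps the code close to the single-file version already in the program.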

I created a menu. Now the user can choose how the words should be listed. After resolving the problem with the 236 words (and discussing arrays and linked lists) and filtering out the 27-character word, we should go on with this step.

Code:

printf("\nHow should the words been list? \n1: alphabetically \n2: by frequency");
scanf("%d",&entry);        

        while(k==2)
        {
            switch(entry)
            {
                case 1: k=1; printf("\ntest: alphabetically\n"); break;
                case 2: k=1; printf("\ntest: by frequency\n");break;
                default: printf("\nWrong input! Please retry.\n"); fflush(stdin); k=2; scanf("%d",&entry);
            }
        }

You need to work on that menu. while(k==2) is not what you want.
You want a menu with the 1 and 2 options and the default, that's fine, but then you also want an option to either quit the menu function (back up) or quit the whole program. And what happens when k==1?
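One way such a menu could look, as a sketch only: the function name `read_choice`, the `0: quit` option and the idea of passing the input stream as a parameter are all assumptions added here. It reads a whole line with `fgets` and parses it with `sscanf`, which avoids `fflush(stdin)` (undefined behavior in standard C) and needs no extra loop variable:

```c
#include <stdio.h>

/* Hypothetical helper: ask until the user enters a valid menu choice.
   Returns 1 or 2 for a sort order, 0 for quit or end of input. */
int read_choice(FILE *in) {
    char line[64];
    int entry;

    for (;;) {
        printf("\nHow should the words be listed?\n"
               "1: alphabetically\n2: by frequency\n0: quit\n> ");
        if (fgets(line, sizeof line, in) == NULL)
            return 0;                      /* end of input: treat as quit */
        if (sscanf(line, "%d", &entry) == 1 && entry >= 0 && entry <= 2)
            return entry;                  /* valid choice */
        printf("Wrong input! Please retry.\n");
    }
}
```

In the program this would be called as `entry = read_choice(stdin);`, followed by a `switch(entry)` whose cases run the matching sort. Wrapping that call and the switch in a loop that ends when 0 is returned gives the "back up or quit" option mentioned above.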

**Fresa** · 03-01-2012

Did your teacher say what you should do with the word that is too long? Maybe enlarge the word array, or add code to skip that one word, or something else?

I think I should skip this word, but maybe it is useful to list it in an extra printf("");, just to check that it works right, and so my teacher can check it more easily.

Yes, you can handle 5000 words in a local function, without using heap memory.

That sounds good, so I don't have to change anything?

They tell me what they want to do.

I figure out how to do it.

They have no input about the type of data structures or algorithm I use. That's all up to me.

They SHOW me how they want to work, so the user interface and features can correspond to it. Never take just a description, because it will always be incomplete, inaccurate, and result in a bad interface if it's not really something simple.

Who is "they"?? The user?
I'm sorry, but I don't understand the part about the description. Maybe I have to create two cases (switch) or something else. Then at the beginning the user could choose how the file should be analyzed, and the program starts to work in the corresponding case (switch).

I'll try to explain my idea for this menu.
k is initialized: int k=2;
so that it enters the switch. If I make a wrong entry, I get a new try. No problem... and then, if I type 1 or 2, I get to case 1 or 2... now I have to complete these cases: either put the correct algorithm in them, or set another variable and, after the while loop, add an algorithm with ifs or something similar. Maybe there is another method that is better, shorter or easier. Do you have an idea?

(Maybe after this menu I have to set k=2 again, if I want to go back to this menu later.)
Info: My teacher doesn't like gotos!

EDIT: How can I solve the problem that "Edelweiss" and "edelweiss" are the same word? At the moment the program treats them as different.

**Fresa** · 03-01-2012

I read a few minutes ago that I will have problems with "umlauts" with ctype.h!!! Is this right? Oh noo!
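Some background on that worry, with a small sketch. The `<ctype.h>` functions take an `int` whose value must be representable as `unsigned char` (or be `EOF`). On platforms where plain `char` is signed, an umlaut stored as a single byte is a negative `char`, so passing it straight to `isalpha` is undefined behavior. Casting to `unsigned char` first avoids that; `is_letter` is a made-up wrapper name:

```c
#include <ctype.h>

/* Hypothetical wrapper: safe to call with any char value, including
   negative ones such as single-byte umlauts on signed-char platforms. */
int is_letter(char c) {
    return isalpha((unsigned char)c) != 0;
}
```

Whether an umlaut byte is then classified as a letter depends on the active locale. In the default "C" locale only A-Z and a-z count as letters.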

**Adak** · 03-01-2012

I added this, inside main():

1. char buffer[100];
2. fscanf now reads into the buffer instead of list[i].word
3. len now gets strlen(buffer) instead of strlen(list[i].word)
4. if(len > 25) continue;
Immediately follows #3 above.

5. strcpy(list[i].word, buffer), after #4, to put only good words into the array of struct words
6. list[i].word[j] = tolower(list[i].word[j]) is used in a for loop to convert all the words to lowercase, before any other comparisons are made.

Final output is all lowercase words. Words that are too long are not included or counted.
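The six steps above can be sketched as one function. This is an illustration under assumptions, not the actual modified program: `next_word` and `MAX_WORD` are invented names, and the punctuation trimming from the earlier listing is left out to keep it short.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_WORD 25              /* word[26] holds 25 characters plus '\0' */

/* Hypothetical helper: read the next acceptable word from fp into dest,
   which must have room for MAX_WORD + 1 bytes.
   Returns 1 when a word was stored, 0 at end of input. */
int next_word(FILE *fp, char *dest) {
    char buffer[100];                              /* step 1 */
    size_t len, j;

    while (fscanf(fp, "%99s", buffer) == 1) {      /* step 2: read into buffer */
        len = strlen(buffer);                      /* step 3 */
        if (len > MAX_WORD)                        /* step 4: skip long words */
            continue;
        strcpy(dest, buffer);                      /* step 5: keep good words */
        for (j = 0; j < len; j++)                  /* step 6: to lowercase */
            dest[j] = (char)tolower((unsigned char)dest[j]);
        return 1;
    }
    return 0;
}
```

The `%99s` width keeps `fscanf` from writing past `buffer`. A token longer than 99 characters would be read in pieces, which the sketch does not try to handle.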

Did you happen to read how to handle umlauts in the program? I don't have umlauts, so I know nothing about them.

**Fresa** · 03-01-2012

I hosted my Demodatei.txt and there are umlauts... stupid German...
Before I apply your steps, please tell me that we can solve the problem with the umlauts!!!

Maybe there is another way with fgetc and so on?

Or maybe we can transform the umlauts like this:
ä -> ae
ö -> oe
ü -> ue ...

is this possible?

So there is no longer the word "geöffnet" in the list, but "geoeffnet"?

IS THIS POSSIBLE????

and something like this doesn't work correctly (and I can't do this for the whole alphabet): if(list[i].word[i] = 'S') list[i].word[i] = 's';
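The transliteration idea is workable. Here is a minimal sketch, with one loud assumption: the text file uses a single-byte encoding such as ISO-8859-1 or Windows-1252, where each umlaut is one byte. In UTF-8 the umlauts are two bytes each and this code would not match them. `transliterate` is a made-up name.

```c
#include <stddef.h>

/* Hypothetical helper: copy src to dst, replacing German umlauts and sharp s.
   ASSUMPTION: src is ISO-8859-1 / Windows-1252 (one byte per character).
   Returns 1 on success, 0 if dst (capacity dst_size) is too small. */
int transliterate(const char *src, char *dst, size_t dst_size) {
    size_t out = 0;

    if (dst_size == 0)
        return 0;
    for (; *src != '\0'; src++) {
        const char *rep = NULL;
        switch ((unsigned char)*src) {
            case 0xE4: rep = "ae"; break;   /* a-umlaut */
            case 0xF6: rep = "oe"; break;   /* o-umlaut */
            case 0xFC: rep = "ue"; break;   /* u-umlaut */
            case 0xC4: rep = "Ae"; break;   /* capital A-umlaut */
            case 0xD6: rep = "Oe"; break;   /* capital O-umlaut */
            case 0xDC: rep = "Ue"; break;   /* capital U-umlaut */
            case 0xDF: rep = "ss"; break;   /* sharp s */
        }
        if (rep != NULL) {
            if (out + 2 >= dst_size)        /* two letters plus '\0' must fit */
                return 0;
            dst[out++] = rep[0];
            dst[out++] = rep[1];
        } else {
            if (out + 1 >= dst_size)        /* one letter plus '\0' must fit */
                return 0;
            dst[out++] = *src;
        }
    }
    dst[out] = '\0';
    return 1;
}
```

Because each replacement makes the word one character longer, the 25-character limit would have to be checked after transliterating, not before. As for the last line of the post: `=` assigns where `==` compares, and the same `i` indexes both the list and the letter, so every tested letter gets overwritten with 's'. Calling `tolower` in a loop over each word covers the whole ASCII alphabet without a separate `if` per letter.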