Thread: Reading text file and structuring it..

  1. #1
    Registered User
    Join Date
    Nov 2004
    Posts
    17

    Reading text file and structuring it..

    OK, I'm pretty much a newbie at programming in C. I can read some data and put it in structures, just the basic stuff I had to learn at university. However, I now have to write XML from C, and I am left with a text file that looks like the one below. By the way, this is not homework! I graduated a long time ago.

    Anyway..

    ************ the text file ***********************


    junk
    junk
    Header containing 21 lines of junk which are of no interest.
    junk
    junk

    /* now come the more interesting stuff */

    xx. CF: 5740W (5345W - 6033W)
    information which also has to be stored and is variable in size.

    xx. CF in CA: 6312E (6222W - 6302W)
    Again some information.

    xx. CAC: 5848W

    etc. etc.

    /* this goes on for some 40 times. Sometimes there is info, and sometimes there is not. There is however always a "xx. " that states the beginning of something important. */

    What I want this program to do is read the text above and parse it into some variables. As an example, I shall parse the first two lines, just to show what I actually mean.


    ************* the first two lines ****************
    xx. CF: 5740W (5345W - 6033W)
    information which also has to be stored and is variable in size.
    ********************************************

    The first "xx. " can be treated as &junk.
    "CF" is a variable and should be remembered as, for example, cmlist[i].CM_NAME.
    ": " is &junk again.
    "57" is a y-position, and should be remembered as cmlist[i].CM_Y.
    "40" is an x-position, and should be remembered as cmlist[i].CM_X.
    "W" is a variable: it can be E (east) or W (west). It matters because it tells me whether to multiply cmlist[i].CM_X by 1 or -1.
    " (5345W - 6033W)" can be treated as junk.

    Then we should go to the next line.
    If there is information, then cmlist[i].CM_TAG should be set to 1, else to 0.
    If CM_TAG is equal to 1, then "information which also has to be stored and is variable in size." should be stored in cmlist[i].CM_INFO.

    Then we are finished and can raise [i], and do the whole thing again... In this program only 30 to 40 i's are needed.
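The record described above could be held in a structure along these lines. This is only a sketch: the field names follow the post, while the size limits and the `signed_x` helper are assumptions for illustration, as is the choice that W makes the x-position negative (the post only says the letter decides between 1 and -1).

```c
#define CM_MAX       40    /* assumed: the post mentions 30 to 40 entries */
#define CM_NAME_LEN  32    /* assumed maximum name length */
#define CM_INFO_LEN  2000  /* assumed maximum length of the info text */

struct cm_entry {
    char CM_NAME[CM_NAME_LEN];  /* e.g. "CF" or "CF in CA" */
    int  CM_Y;                  /* first pair of digits */
    int  CM_X;                  /* second pair of digits, sign applied */
    int  CM_TAG;                /* 1 if an info text follows, else 0 */
    char CM_INFO[CM_INFO_LEN];  /* the info text, if any */
};

struct cm_entry cmlist[CM_MAX];

/* Hypothetical helper: apply the E/W letter to the x-position.
   Assumes W means negative; swap the signs if the data needs it. */
int signed_x(int x, char hemisphere)
{
    return (hemisphere == 'W') ? -x : x;
}
```

With this layout, "5740W" would end up as `CM_Y = 57` and `CM_X = signed_x(40, 'W')`.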

    scanln and scanf are pretty much new to me. I do have the Kernighan and Ritchie book, but it does not help me much with reading text that is not structured as expected. I do not expect any of you to write a C program that does the above; however, a start and some hints or tips would be very welcome.


    Kind regards,
    Killroy

  2. #2
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    Consider reading this FAQ entry on getting a line of input. Since you'd be using fgets as your best option, you'd replace stdin with the file pointer.

    Then you can check the first two characters of the buffer for xx, and if they are, then parse the line appropriately with something like sscanf.
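A minimal sketch of that read loop, assuming an already opened `FILE *` and that no line in the file is longer than the buffer. The function names `is_record_line` and `count_records` are made up for this example.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if the line begins with the record marker "xx", else 0. */
int is_record_line(const char *line)
{
    return strncmp(line, "xx", 2) == 0;
}

/* Read fp line by line with fgets and count the lines that start a record. */
int count_records(FILE *fp)
{
    char buf[256];  /* assumed to be long enough for any line in the file */
    int count = 0;

    while (fgets(buf, sizeof buf, fp) != NULL) {
        if (is_record_line(buf))
            count++;  /* a real program would hand buf to sscanf here */
    }
    return count;
}
```

The loop ends when fgets returns NULL, so there is no need to test feof separately.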

    Consider writing out the process of how you'd logically break it up if you yourself encountered it, before turning it into actual code. It'll help you get the logic of it sorted out first.

    Quzah.
    Hope is the first step on the road to disappointment.

  3. #3
    Registered User
    Join Date
    Nov 2004
    Posts
    17
    OK, thanks for the info. I now see your point about why I should use fgets and not fscanf. I did some practice with it: I can scan a text for a single char and replace it with another.

    However, how do I go further with this one? How do I tell C: if you find "xx. ", then fgets everything until you reach a ":", then skip one space and read two digits, another two digits and one char, then skip to the next line and remember everything until you hit "xx. " again?

  4. #4
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    You're over complicating.
    Code:
    read a line
    if the first two characters of said line are "xx"
        read from the buffer we read the line into accordingly
    else
        do nothing with this line
    In actuality, there is no need for an "else" here. If the line is what you want, chop it up and read it. I'd suggest figuring out a qualifying format, and using it with sscanf to assign the values from the buffer to variables.
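One possible qualifying format is sketched below. It relies on the standard `%[^:]` scanset conversion, which reads everything up to (but not including) a colon, spaces included. The function name `parse_header` and the 20-byte name buffer are assumptions for this example.

```c
#include <stdio.h>

/* Parse a line of the form "xx. NAME: YYXXH ..." into its parts.
   Returns 1 on success, 0 if the line does not match. */
int parse_header(const char *line, char name[20], int *y, int *x, char *hemi)
{
    /* "xx. "    literal prefix (the blank matches any run of whitespace)
       %19[^:]   up to 19 characters that are not ':' (blanks allowed)
       ": "      the colon and the whitespace after it
       %2d%2d    two integers of at most two digits each
       %c        the E or W letter */
    return sscanf(line, "xx. %19[^:]: %2d%2d%c", name, y, x, hemi) == 4;
}
```

For "xx. CF in CA: 6312E (6222W - 6302W)" this stores "CF in CA", 63, 12 and 'E'; the rest of the line is simply left unread. Checking the return value of sscanf against the number of conversions is what tells a matching line from a non-matching one.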

    Quzah.
    Hope is the first step on the road to disappointment.

  5. #5
    Registered User
    Join Date
    Nov 2004
    Posts
    17
    OK, but how do I tell C that if it finds "xx" it should read from the buffer?

    How can I say: if you find "xx. ", then fscanf everything (including spaces and/or tabs) until you reach a ":", then skip one space and read two digits, another two digits and one char, then skip to the next line and remember everything (which is the info) until you hit "xx. " again?

    I have been looking on the web and in Kernighan and Ritchie, but still have not found a suitable way to write this code. Every time, fscanf reads something until it comes to a space, tab or newline and forgets about the rest. I want it to read up to, for example, "xx. " or ":".

  6. #6
    Registered User
    Join Date
    Nov 2004
    Posts
    17
    I first wrote this program for this problem, but I get stuck on one error, which I simply can't solve.
    The error is in the line:

    ** if (strcmp (Name, nameDefinitions[idx].Name) == 0) **
    Saying that argument 2 is incompatible with prototype: prototype: pointer to constant to char..


    #include <string.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <ctype.h>
    #include <sys/types.h>
    #include <unistd.h>
    #include <math.h>

    #define MAX_LINE (128+1) /* Maximum length of a source line plus one */

    #define CM_CF 1
    #define CM_CF_CA 2
    #define CM_CF_WA 3
    ....
    ....
    #define CM_info 58
    #define CM_GENE_INFO 59

    #define CM_NAME_LENGTH 15

    typedef struct
    {
    char *Name[CM_NAME_LENGTH];
    int Value;
    } NameDefinition_t;

    typedef struct
    {
    int Index;
    int X; /* Longitude */
    int Y; /* Latitude */
    char Direction; /* Direction, East or West */
    int CommentLines; /* Number of infolines */
    char **Comments;
    } NameDescription_t;

    NameDefinition_t CM_lijst[] = {
    {"CF", "CM_CF"},
    {"CF in CA", "CM_CF_CA"},
    ....
    ....
    {"info", "CM_info"},
    {"GENERAL INFORMATION", "CM_GENE_INFO"}
    };


    /* ************* Function ************************* */

    int FindNameValue (char *Name) {
    int i;

    for (i=0; *CM_lijst[i].Name; i++)
    if (strcmp(Name, CM_lijst[i].Name)==0) /* here it gives me an error */
    return(CM_lijst[i].Value);
    return(0);
    } /* The above function searches the name definitions and returns the associated value */

    /* ************************************************ */

    NameDescription_t *NameDescriptions = NULL;


    /* ************ Start of main ********************* */


    main ()
    {
    NameDescription_t *Name;

    char InputBuffer[MAX_LINE];
    char StartJunk[5], NameString[CM_NAME_LENGTH];
    int X1, X2, X3;
    int Y1, Y2, Y3;
    int idx;
    char C1, C2, C3;
    FILE *STRPfile, *XML;

    for (idx=0; idx<21; idx++) {
    fgets (InputBuffer,MAX_LINE,STRPfile);
    }

    fgets (InputBuffer,MAX_LINE,STRPfile); /* Loop through all the data */
    while (1)
    {
    sscanf (InputBuffer, "%4.4s%s: %2.2d%2.2d%c (%2.2d%2.2d%c - %2.2d%2.2d%c)",
    StartJunk, NameString, &X1, &Y1, &C1, &X2, &Y2, &C2, &X3, &Y3, &C3);

    Name=(NameDescription_t*)malloc(sizeof(NameDescription_t));
    memset (Name,0,sizeof(NameDescription_t));
    switch (FindNameValue (NameString)) {
    case CM_CF: /* Process Cold Front data */
    Name->Index = CM_CF;
    Name->X = X1;
    Name->Y = Y1;
    Name->Direction = C1;
    while (fgets(InputBuffer,MAX_LINE,STRPfile))
    {
    if (memcmp (InputBuffer, "xx. ",4)==0)
    break;
    Name->CommentLines++;
    Name->Comments=(char**)realloc(Name->Comments,sizeof(char*)*Name->CommentLines);
    Name->Comments[Name->CommentLines-1] = strdup(InputBuffer);
    }
    case CM_CF_CA:
    Name->Index = CM_CF_CA;
    Name->X = X1;
    Name->Y = Y1;
    Name->Direction = C1;
    while (fgets(InputBuffer,MAX_LINE,STRPfile))
    {
    if (memcmp (InputBuffer, "xx. ",4)==0)
    break;
    Name->CommentLines++;
    Name->Comments=(char**)realloc(Name->Comments,sizeof(char*)*Name->CommentLines);
    Name->Comments[Name->CommentLines-1] = strdup(InputBuffer);
    }
    .....
    .....
    case CM_GENE_INFO:
    Name->Index = CM_GENE_INFO;
    Name->X = X1;
    Name->Y = Y1;
    Name->Direction = C1;
    while (fgets(InputBuffer,MAX_LINE,STRPfile))
    {
    if (memcmp (InputBuffer, "xx. ",4)==0)
    break;
    Name->CommentLines++;
    Name->Comments=(char**)realloc(Name->Comments,sizeof(char*)*Name->CommentLines);
    Name->Comments[Name->CommentLines-1] = strdup(InputBuffer);
    }
    }
    }
    }

  7. #7
    Registered User
    Join Date
    Nov 2004
    Posts
    17
    I thought it could be a bit simpler than this code, which I have been stuck on for some weeks despite a lot of effort.

  8. #8
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    Go read the forum announcements to learn the forum guidelines and how to use code tags. Without code tags, no one will read your code.
    Code:
    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    
    int main( void )
    {
        char buf[BUFSIZ]={0};
    
        printf("I will check for xx starting a line. Enter a line now: ");
        fflush( stdout );
        if( fgets( buf, BUFSIZ, stdin ) != NULL )
        {
            if( strstr( buf, "xx" ) == buf )
            {
                printf("The first two characters in your line were xx.\n");
            }
            else
            {
                printf("The first two characters in your line were NOT xx.\n");
            }
        }
    
        return 0;
    }
    There's a quick hack that will do. Now probably it isn't what you want, but it would work. You'd be better off checking to see if there is any white space to start the line and then skipping it, but that wasn't the requirement you had.
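A sketch of the whitespace-skipping variant mentioned above. The function name `starts_with_xx` is made up for this example; `strncmp` is used instead of `strstr` so that only the first two non-blank characters are examined rather than the whole line.

```c
#include <ctype.h>
#include <string.h>

/* Skip any leading whitespace, then test for the "xx" marker. */
int starts_with_xx(const char *line)
{
    /* The cast keeps isspace well defined for characters above 127. */
    while (isspace((unsigned char)*line))
        line++;
    return strncmp(line, "xx", 2) == 0;
}
```

Note that with the test file later in this thread the info lines are indented, so skipping whitespace would also accept an indented info line that happens to begin with "xx".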

    Quzah.
    Hope is the first step on the road to disappointment.

  9. #9
    Registered User
    Join Date
    Nov 2004
    Posts
    17
    Thank you for the tip. It will surely help me get to a higher level in programming this. Maybe I am not doing it the right way, but I am learning and eventually I will get there.

  10. #10
    Registered User
    Join Date
    Nov 2004
    Posts
    17
    This better?

    I first wrote this program for this problem, but I get stuck on one error, which I simply can't solve.
    The error is in the line:

    ** if (strcmp (Name, nameDefinitions[idx].Name) == 0) **
    Saying that argument 2 is incompatible with prototype: prototype: pointer to constant to char..


    Code:
    #include <string.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <ctype.h>
    #include <sys/types.h>
    #include <unistd.h>
    #include <math.h>
    
    #define MAX_LINE (128+1) /* Maximum length of a source line plus one */
    
    #define CM_CF 1 
    #define CM_CF_CA 2
    #define CM_CF_WA 3
    ....
    ....
    #define CM_info 58
    #define CM_GENE_INFO 59
    
    #define CM_NAME_LENGTH 15
    
    typedef struct
    {
    char *Name[CM_NAME_LENGTH];
    int Value;
    } NameDefinition_t;
    
    typedef struct
    {
    int Index;
    int X; /* Longitude */
    int Y; /* Latitude */
    char Direction; /* Direction, East or West */
    int CommentLines; /* Number of infolines */
    char **Comments;
    } NameDescription_t;
    
    NameDefinition_t CM_lijst[] = {
    {"CF", "CM_CF"},
    {"CF in CA", "CM_CF_CA"},
    ....
    ....
    {"info", "CM_info"},
    {"GENERAL INFORMATION", "CM_GENE_INFO"}
    };
    
    
    /* ************* Function ************************* */
    
    int FindNameValue (char *Name) {
    int i;
    
    for (i=0; *CM_lijst[i].Name; i++)
    if (strcmp(Name, CM_lijst[i].Name)==0) /* here it gives me an error */
    return(CM_lijst[i].Value);
    return(0);
    } /* The above function searches the name definitions and returns the associated value */
    
    /* ************************************************ */
    
    NameDescription_t *NameDescriptions = NULL;
    
    
    /* ************ Start of main ********************* */
    
    
    main ()
    {
    NameDescription_t *Name;
    
    char InputBuffer[MAX_LINE];
    char StartJunk[5], NameString[CM_NAME_LENGTH];
    int X1, X2, X3;
    int Y1, Y2, Y3;
    int idx;
    char C1, C2, C3;
    FILE *STRPfile, *XML;
    
    for (idx=0; idx<21; idx++) {
    fgets (InputBuffer,MAX_LINE,STRPfile);
    }
    
    fgets (InputBuffer,MAX_LINE,STRPfile); /* Loop through all the data */
    while (1)
    {
    sscanf (InputBuffer, "%4.4s%s: %2.2d%2.2d%c (%2.2d%2.2d%c - %2.2d%2.2d%c)",
    StartJunk, NameString, &X1, &Y1, &C1, &X2, &Y2, &C2, &X3, &Y3, &C3);
    
    Name=(NameDescription_t*)malloc(sizeof(NameDescription_t));
    memset (Name,0,sizeof(NameDescription_t));
    switch (FindNameValue (NameString)) {
    case CM_CF: /* Process Cold Front data */
    Name->Index = CM_CF;
    Name->X = X1;
    Name->Y = Y1;
    Name->Direction = C1;
    while (fgets(InputBuffer,MAX_LINE,STRPfile))
    {
    if (memcmp (InputBuffer, "xx. ",4)==0)
    break;
    Name->CommentLines++;
    Name->Comments=(char**)realloc(Name->Comments,sizeof(char*)*Name->CommentLines);
    Name->Comments[Name->CommentLines-1] = strdup(InputBuffer); 
    } 
    case CM_CF_CA:
    Name->Index = CM_CF_CA;
    Name->X = X1;
    Name->Y = Y1;
    Name->Direction = C1;
    while (fgets(InputBuffer,MAX_LINE,STRPfile))
    {
    if (memcmp (InputBuffer, "xx. ",4)==0)
    break;
    Name->CommentLines++;
    Name->Comments=(char**)realloc(Name->Comments,sizeof(char*)*Name->CommentLines);
    Name->Comments[Name->CommentLines-1] = strdup(InputBuffer); 
    }
    .....
    .....
    case CM_GENE_INFO:
    Name->Index = CM_GENE_INFO;
    Name->X = X1;
    Name->Y = Y1;
    Name->Direction = C1;
    while (fgets(InputBuffer,MAX_LINE,STRPfile))
    {
    if (memcmp (InputBuffer, "xx. ",4)==0)
    break;
    Name->CommentLines++;
    Name->Comments=(char**)realloc(Name->Comments,sizeof(char*)*Name->CommentLines);
    Name->Comments[Name->CommentLines-1] = strdup(InputBuffer); 
    }
    }
    }
    }

  11. #11
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    Code:
    typedef struct
    {
        char *Name[CM_NAME_LENGTH];
        int Value;
    } NameDefinition_t;
    See how much nicer it is when we indent? At any rate...

    "Name" above is an array of pointers to characters. To be exact, it's an array of fifteen pointers to characters. Thus, if this is really what you want, your table would go something like this:
    Code:
    NameDefinition_t CM_lijst[] = {
        { { "something", "something", "something" }, SOMEVALUE }, /* the first entry in the table */
        { { "stuff", "stuff", "stuff", "stuff", "stuff", "stuff", }, AVALUE }, /* the second... */
        { { "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14" }, NUMBER }, /* the third */
        ...and so on...
    };
    Thus, your strcmp line then could be something like:
    Code:
    int dosomething( const char *name )
    {
        if( strcmp( name, CM_lijst[ n ].Name[ x ] ) == 0 )
        {
            ...match...
        }
        ...whatever...
    }
    Is that what you're trying to do?
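If each table entry is instead meant to hold one name and one number, a sketch might look like the following. It keeps the `NameDefinition_t`, `CM_lijst` and `FindNameValue` names from the posted code, but the `const char *` member, the unquoted values and the NULL sentinel entry are choices made for this example, not something taken from the original program.

```c
#include <stddef.h>
#include <string.h>

#define CM_CF     1
#define CM_CF_CA  2

typedef struct {
    const char *Name;  /* one string per entry, not an array of 15 pointers */
    int Value;         /* the numeric constant itself, not its name in quotes */
} NameDefinition_t;

static const NameDefinition_t CM_lijst[] = {
    { "CF",       CM_CF    },
    { "CF in CA", CM_CF_CA },
    { NULL,       0        }   /* sentinel that stops the search loop */
};

/* Return the value that belongs to Name, or 0 if it is not in the table. */
int FindNameValue(const char *Name)
{
    size_t i;

    for (i = 0; CM_lijst[i].Name != NULL; i++)
        if (strcmp(Name, CM_lijst[i].Name) == 0)
            return CM_lijst[i].Value;
    return 0;
}
```

Here the second argument to strcmp is a plain pointer to char, which is what the prototype expects.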

    Quzah.
    Hope is the first step on the road to disappointment.

  12. #12
    Registered User
    Join Date
    Nov 2004
    Posts
    17
    Nope, I do not really think that is what I want to do.

    Good of you to shed some light on this matter. Now let me dig into it one more time; perhaps it has opened my eyes.

  13. #13
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    Code filled with .... means we can't compile it
    Pick a simple case which demonstrates the problem as you see it, which we can compile and run.

  14. #14
    Registered User
    Join Date
    Nov 2004
    Posts
    17
    OK, fixed the problem. No more errors, just some massive core dumps. I mean really massive; the system administrator has already passed by.
    Does anyone have an idea what I am doing wrong? I also included a test file. You should copy it into Notepad and save it as test.txt.

    Code:
    #include <string.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <ctype.h>
    #include <sys/types.h>
    #include <unistd.h>
    #include <math.h>
    
    #define MAX_LINE	(128+1) /* Maximum length of a source line plus one */
    #define CM_NAME_LENGTH	20
    
    #define CM_CF		1 
    #define CM_CF_CA	2
    #define CM_WF_Shield	3
    #define CM_Comma	     4
    #define CM_Lee_Cld	5
    
    
    typedef struct
    {
      char	Name[CM_NAME_LENGTH];
      int	Value;
    }  NameDefinition_t;
    
    typedef struct
    {
      int Index;
      int X; /* Longitude */
      int Y; /* Latitude */
      char Direction; /* Direction, East or West */
      int CommentLines; /* Number of infolines */
      char **Comments;
    }  NameDescription_t;
    
    NameDefinition_t CM_lijst[] = {
    {"CF", CM_CF},
    {"CF in CA", CM_CF_CA},
    {"WF Shield", CM_WF_Shield},
    {"Comma", CM_Comma},
    {"Lee Cloud", CM_Lee_Cld},
    {0, 0}
    };
    
    /* Open Structure to write XML file*/
    
    typedef struct WRITER {
    	float lat;
    	float lon;
    	char CM[50];
    	char info[2000];
    	char dir;
    	int cl;
    	}WRITE;
    
    WRITE *StrXML;
    
    
    /* ************* Function ************************* */
    
    int FindNameValue (char *Name) {
    	int i;
    
    	for (i=0; *CM_lijst[i].Name; i++)
    	  if (strcmp(Name, CM_lijst[i].Name)==0)
    	  return(CM_lijst[i].Value);
    	return(0);
    } /* The above function searches the name definitions and returns the associated value */
    
    /* ************************************************ */
    
    NameDescription_t *NameDescriptions = NULL;
    
    
    /* ************ Start of main ********************* */
    
    
    main ()
    {
      NameDescription_t *Name;
    
    char InputBuffer[MAX_LINE];
    char StartJunk[5], NameString[CM_NAME_LENGTH];
    int X1, X2, X3;
    int Y1, Y2, Y3;
    int idx, n, k;
    char C1, C2, C3;
    FILE *STRPfile, *XML;
    
    idx=0;
    n=0;
    k=0;
    
    /* Allocate memory to later write XML file*/
    StrXML=(WRITE*)malloc(256*256*sizeof(WRITE));
    
    if ((STRPfile=fopen("test.txt","r"))==NULL){
    	fprintf(stderr,"Error opening info-file :%s\n","test.txt");
    	exit(1);
       }
    
      for (idx=0; idx<2; idx++) {
        fgets (InputBuffer,MAX_LINE,STRPfile);
      }
    
      fgets (InputBuffer,MAX_LINE,STRPfile);  /*  Loop through all the data  */
      while (!feof(STRPfile))
      {
        sscanf (InputBuffer, "%4.4s%s: %2.2d%2.2d%c (%2.2d%2.2d%c - %2.2d%2.2d%c)",
          StartJunk, NameString, &X1, &Y1, &C1, &X2, &Y2, &C2, &X3, &Y3, &C3);
    
        Name=(NameDescription_t*)malloc(sizeof(NameDescription_t));
        memset (Name,0,sizeof(NameDescription_t));
        switch (FindNameValue (NameString)) {
    case  CM_CF:   /*  Process Cold Front data  */
          Name->Index = CM_CF;
          Name->X = X1;
          Name->Y = Y1;
          Name->Direction = C1;
          while (fgets(InputBuffer,MAX_LINE,STRPfile))
          {
            if (memcmp (InputBuffer, "xx",2)==0)
    	break;
    	Name->CommentLines++;
    	Name->Comments=(char**)realloc(Name->Comments,sizeof(char*)*Name->CommentLines);
    	Name->Comments[Name->CommentLines-1] = strdup(InputBuffer);	
    	} 
    case CM_CF_CA:
          Name->Index = CM_CF_CA;
          Name->X = X1;
          Name->Y = Y1;
          Name->Direction = C1;
          while (fgets(InputBuffer,MAX_LINE,STRPfile))
          {
            if (memcmp (InputBuffer, "xx",2)==0)
    	break;
    	Name->CommentLines++;
    	Name->Comments=(char**)realloc(Name->Comments,sizeof(char*)*Name->CommentLines);
    	Name->Comments[Name->CommentLines-1] = strdup(InputBuffer); 
    }
    case CM_WF_Shield:
          Name->Index = CM_WF_Shield;
          Name->X = X1;
          Name->Y = Y1;
          Name->Direction = C1;
          while (fgets(InputBuffer,MAX_LINE,STRPfile))
          {
            if (memcmp (InputBuffer, "xx",2)==0)
    	break;
    	Name->CommentLines++;
    	Name->Comments=(char**)realloc(Name->Comments,sizeof(char*)*Name->CommentLines);
    	Name->Comments[Name->CommentLines-1] = strdup(InputBuffer);
    }
    case CM_Comma:
          Name->Index = CM_Comma;
          Name->X = X1;
          Name->Y = Y1;
          Name->Direction = C1;
          while (fgets(InputBuffer,MAX_LINE,STRPfile))
          {
            if (memcmp (InputBuffer, "xx",2)==0)
    	break;
    	Name->CommentLines++;
    	Name->Comments=(char**)realloc(Name->Comments,sizeof(char*)*Name->CommentLines);
    	Name->Comments[Name->CommentLines-1] = strdup(InputBuffer);
    }
    case CM_Lee_Cld:
          Name->Index = CM_Lee_Cld;
          Name->X = X1;
          Name->Y = Y1;
          Name->Direction = C1;
          while (fgets(InputBuffer,MAX_LINE,STRPfile))
          {
            if (memcmp (InputBuffer, "xx",2)==0)
    	break;
    	Name->CommentLines++;
    	Name->Comments=(char**)realloc(Name->Comments,sizeof(char*)*Name->CommentLines);
    	Name->Comments[Name->CommentLines-1] = strdup(InputBuffer);
    }
           } /* End of switch-loop */
        } /* End of while-loop */
    
    fclose (STRPfile);
    
    } /* End of program */

    Test file
    Code:
    TO MLXX WNWN
    110600 ZAXX
    TXEM41 LOYZ 110600
    xx. CF: 5740W (5345W - 6033W)
        Anticyclonic curvature of the cloud band but classical 
        distribution of the TA.
    
    xx. CF in CA: 6312W (6222W - 6302W)
        VCS shows CA ahead of the CF (below 700 hPa).
    
    xx. WF Shield: 5407W (4910W - 5902W)
        Weak TA < 0, otherwise classical WF.
    
    xx. Lee Cloud: 4101W
    
    xx. Lee Cloud: 4811E
    
    xx. Comma: 4608E (4606E - 4610E)
    In previous post it was too big. My apologies. This program will also compile and show the trick.
    Last edited by Killroy; 11-18-2004 at 09:32 AM. Reason: Too big

  15. #15
    Registered User
    Join Date
    Nov 2004
    Posts
    17
    Hmm... obvious.. I am like complaining about massive core dumps, you have to be nuts to even compile this program and run it.
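Reading the posted program, a few things look like probable contributors to the crashes, though none of this was confirmed in the thread. The scanf family accepts a field width but no precision, so `%4.4s` and `%2.2d` are not valid conversions. `%s` stops at the first blank, so at best it would yield "CF:" and never match a table entry. When no case matches, nothing inside the loop reads another line, so the loop keeps allocating without ever reaching end of file. The cases also have no `break`, so a match falls through into every following case. Finally, the up-front allocation of 256*256 WRITE structures of over 2000 bytes each asks for well over 100 MB. The sketch below shows one way to restructure the loop under those assumptions; the `Record` type, the `read_records` name and the size limits are invented for this example, and the first pair of digits is stored as latitude, following the description in post #1.

```c
#include <stdio.h>
#include <string.h>

#define MAX_RECORDS 40    /* assumed bound, from "30 to 40" in post #1 */
#define NAME_LEN    20
#define INFO_LEN    2000

typedef struct {
    char name[NAME_LEN];
    int  lat, lon;         /* first and second pair of digits */
    char dir;              /* 'E' or 'W' */
    char info[INFO_LEN];   /* info lines, concatenated */
} Record;

/* Read records from fp into recs[]; return how many were stored. */
int read_records(FILE *fp, Record *recs, int max)
{
    char line[256];        /* assumes no input line is longer than this */
    Record *cur = NULL;    /* record that info lines currently belong to */
    int n = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        if (strncmp(line, "xx", 2) == 0) {
            cur = NULL;
            if (n < max) {
                Record *r = &recs[n];
                memset(r, 0, sizeof *r);
                if (sscanf(line, "xx. %19[^:]: %2d%2d%c",
                           r->name, &r->lat, &r->lon, &r->dir) == 4) {
                    cur = r;
                    n++;
                }
            }
        } else if (cur != NULL) {
            size_t skip = strspn(line, " \t");   /* drop the indentation */
            size_t used = strlen(cur->info);
            /* Append non-blank lines; snprintf truncates if info is full. */
            if (line[skip] != '\n' && line[skip] != '\r' && line[skip] != '\0')
                snprintf(cur->info + used, sizeof cur->info - used,
                         "%s", line + skip);
        }
    }
    return n;
}
```

Because every line is read at the top of one loop, the header lines before the first "xx" are skipped without counting them, and a single `Record recs[MAX_RECORDS]` array replaces the large malloc.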
