Thread: Simple File Creation Algorithm

  1. #1
    Registered User
    Join Date
    Aug 2001
    Posts
    20

    Simple File Creation Algorithm

    I might be going about this the wrong way, I'm fairly new to C. I don't have the code with me to post, but here is what I am trying to accomplish:

    Problem: Read in a fixed length file, with fixed length records. Each field in the record is also fixed in length, but each field contains unwanted spaces, possible tabs, etc. I need to read in the records of this file, and produce a separate output file that contains 1 output record for every input record. Each output record needs to contain the same fields that were in the input record. Each field needs to be "massaged" (i.e. unwanted spaces removed, etc.) and field delimiters need to be placed in the output file between each of the fields within the output records.

    I tried to do this using C, with what I know as traditional algorithm for reading/writing files.
    I read a record into a structure, which was defined containing strings for each field. I was able to process the records as input ok, using fread. Then, I had a similar output structure defined, with output strings defined for each field. I moved the input fields (after massaging them) into the output fields. Then I used fwrite to write the output structure. My output was garbled. Everything looked wrong, and it looked like structures were being overlaid, so I opted to process one byte at a time. I figured that I was not clear on how to use strings inside structures, especially since there seems to be an extra byte at the end of each string for the end of string marker, and I do not want that byte in my output file, as I need to use a specially defined field delimiter (my output is going to another software routine that I cannot change).

    Anyway, here is what I attempted to do, byte by byte, and it works. However, it is too much IO, and I want to change this to the right way, using one write per record, but need help!

    1. Read in a fixed length file of fixed length records (text).
    2. For each record I read in I have defined a structure, and am using fread to read them in.
    3. While there are records in the input file:
    3.a For each field:
    3.a.1 Examine and strip off spaces, or otherwise reformat the field
    3.a.2 Write the field (one byte at time) to an output file
    3.a.3 Write a delimiter at the end of each field
    3.b At end of each record, place a newline delimiter, and write that field to the file
    4. Close input and output files and return.


    I admit this is too cumbersome and does too much IO. I need to change this so that I only build output strings into a structure and then write one structure out for each record.
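
    The "build the record in memory, then write once" idea above can be sketched roughly as follows. This is only a sketch under assumptions: the function names trim_field and build_record, the widths array, and the delimiter parameter are made up for illustration, and it only removes leading and trailing whitespace from each field (spaces inside a field are left alone). The output buffer holds plain characters rather than a struct of strings, so no end-of-string bytes reach the file.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Copy a fixed-width field into dst with leading and trailing whitespace
   removed, and return the number of characters copied (not counting the
   terminating '\0'). dst must have room for width + 1 bytes. */
static size_t trim_field(char *dst, const char *src, size_t width)
{
    size_t start = 0, end = width;

    while (start < end && isspace((unsigned char)src[start]))
        start++;
    while (end > start && isspace((unsigned char)src[end - 1]))
        end--;

    memcpy(dst, src + start, end - start);
    dst[end - start] = '\0';
    return end - start;
}

/* Build one delimited record in out from nfields fixed-width fields laid
   end to end in rec, and return its length. Fields are separated by delim
   and the record ends with '\n'. */
static size_t build_record(char *out, size_t outsize, const char *rec,
                           const size_t *widths, size_t nfields, char delim)
{
    size_t len = 0, offset = 0;

    for (size_t i = 0; i < nfields; i++) {
        /* need room for the field, one delimiter, and the final '\0' */
        if (len + widths[i] + 2 > outsize)
            break;
        len += trim_field(out + len, rec + offset, widths[i]);
        out[len++] = (i + 1 < nfields) ? delim : '\n';
        offset += widths[i];
    }
    out[len] = '\0';
    return len;
}
```

    After build_record returns, the whole record can go out with a single call such as fwrite(out, 1, len, fp). Note that stdio already buffers output, so even the byte-by-byte version probably does far fewer real disk writes than it appears to.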

    Perhaps I don't understand the basics of string processing.

    Any suggestions? Much appreciated ..... the books were a little confusing to me right now.

    Thanks!

  2. #2
    Anti-Terrorist
    Join Date
    Aug 2001
    Posts
    742
    If you are not working with binary files but are instead using text files, then read the records with 'fscanf' instead of 'fread' and write the records to the new file using 'fprintf' instead of 'fwrite'.

    After reading most of your message it sounds like you mentioned that you have 2 structures. I don't think that is necessary. It would be way easier to identify the problem if you posted some code. This is a standard type of question, 'fileI/O'.

    It would also be good to post the first few lines of the data file and what you want the output file to look like. This would be easy to do if this new message board were working properly, but unfortunately the attachments are not operational. The Webmaster is currently amending this. This is a new website and it has lots of bugs in it.

    A number of people could help you with this, but give some more information. The description was detailed, but seeing the file records and the output you want in the output file is the main concern, because after seeing a sample of that, the remedy is old hat.

    Treat the question like a 'black box'

    input --- process ---- output

    Just tell us the input and the output.
    Last edited by Witch_King; 08-21-2001 at 04:41 PM.
    I compile code with:
    Visual Studio.NET beta2

  3. #3
    Registered User
    Join Date
    Aug 2001
    Posts
    20
    Thanks ... explicit does work better than a bunch of words. Here we go:

    Example Input:
    -------------
    0003328763 039485 Smith , MaryAnne M. 6/21/2001 5/22/2001 7/8/2001 Customer purchased clothing articles from Women's Wear 10 150.47
    0039405832 034567 Williams John M 10/21/2000 11/14/2000 12/8/2000 Customer purchased electronic equipment from Electronics 3 649.45


    Example Output:
    --------------
    0003328763|039485|Smith^MaryAnne^M^|20010621|20010522|20010708|Customer purchased clothing articles_Women's Wear|10|150.47
    0039405832|034567|Williams^John^M^|20001021|20001114|20001208|Customer purchased electronic equipment_Electronics|3|649.45

    Each of the output records needs a carriage control to be the last character of that record.

    Thank you!!!!!

  4. #4
    Registered User
    Join Date
    Aug 2001
    Posts
    20

    FILE IO

    My previous message came out different from what I had sent. I intended to have more spaces between the fields on the input records. There are multiple spaces between fields, and I must edit those out of the output.

  5. #5
    Anti-Terrorist
    Join Date
    Aug 2001
    Posts
    742
    Code:
    #include "StdAfx.h"
    
    #include<stdio.h>
    #include<conio.h>
    #include<stdlib.h>
    #include<string.h>  //strcmp, strcpy
    
    typedef struct node_s
    {
    	char sMember1[20];//0003328763
    	char sMember2[10];//039485 
    	char sMember3[12];//Smith 
    	char sMember4[15]; //, MaryAnne 
    	char sMember5[4];//M. (sized to hold the initial, the period, and the terminating '\0')
    	char sMember6[12];//6/21/2001
    	char sMember7[12];//5/22/2001 
    	char sMember8[12];//7/8/2001 
    	char sMember9[60];//Customer purchased clothing articles from Women's Wear
    	int sMember10;//10 
    	double sMember11;//150.47
    	struct node_s * linkp;
    }node_t;
    
    //prototypes
    FILE *OpenTheFile(FILE *);
    node_t * LoadList(node_t *, FILE *);
    void Display(node_t *);
    
    int main()
    {
    	FILE *fPtr = NULL;
    	node_t * hp = NULL;
    	fPtr = OpenTheFile(fPtr);
    	hp = LoadList(hp,fPtr);
    	Display(hp);
    
    	return 0;
    }
    ///////////////////////////////
    FILE *OpenTheFile(FILE *fPtr)
    {
    	fPtr = fopen("TextFile1.txt","r");
    	if(fPtr == NULL)
    	{
    		fprintf(stderr,"File not found");
    		_getch();
    		exit(EXIT_FAILURE);
    	}
    	return fPtr;
    }
    ///////////////////////////////
    node_t * LoadList(node_t * hp, FILE * fPtr)
    {
    	int status;
    	char * ptr = NULL;
    	char sTemp[20];
    	node_t *currentp, *previousp, *newp;
    	currentp = previousp = newp = NULL;
    	
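    	while(1)
    	{
    		//read the next whitespace-delimited token; the width limit
    		//keeps fscanf from overflowing sTemp
    		status = fscanf(fPtr,"%19s",sTemp);
    		if (status != 1)
    		{
    			//end of file (or read error): nothing more to load
    			break;
    		}
    		
    		//a token that is only a period or a comma is skipped
    		if (strcmp(".",sTemp) == 0 || strcmp(",",sTemp) == 0)
    		{
    			continue;
    		}
    		
    		//replace a period in the second position (as in "M.") with a space
    		ptr = sTemp;
    		++ptr;
    		if ( *ptr == '.')
    		{
    			*ptr = ' ';
    		}
    		
    		//create a new node used to store a token from the input file
    		newp = (node_t *) malloc (sizeof(node_t));
    		if (newp == NULL)
    		{
    			fprintf(stderr,"Out of memory");
    			exit(EXIT_FAILURE);
    		}
    		newp->linkp = NULL;
    		
    		strcpy(newp->sMember1,sTemp);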
    			
    		/*
    			%[.,]s%[.,]s%[.,]s%s%s%s%d%lf", newp->sMember1,
    			newp->sMember2,newp->sMember3,newp->sMember4,newp->sMember5,
    			newp->sMember6,newp->sMember7,newp->sMember8,newp->sMember9,
    			&newp->sMember10,&newp->sMember11);
    		*/
    		//if no record was found then the file is empty or else
    		//the list is full
    
    
    		//If the new node is the first node in the list then designate
    		//it as the head of the linked list (build list order routine)
    		if(hp == NULL)
    		{
    			hp = newp;
    		}else
    		{
    			//otherwise insert new node at the end of the list
    			currentp = hp;
    			while(currentp)
    			{
    				previousp = currentp;
    				currentp = currentp->linkp;
    			}
    			previousp->linkp = newp;
    		}
    	
    		//secure list for later attempt to traverse the list
    		//by closing the end nodes pointer
    		newp->linkp = NULL;
    	}//end of while loop	
    
    	return hp;
    }
    
    void Display(node_t *hp)
    {
    
    	while(hp)
    	{
    		printf("%s ",hp->sMember1);
    	/*
    		printf("%s %s %s %s %s %s %s %s %s %d %lf", hp->sMember1,
    			hp->sMember2,hp->sMember3,hp->sMember4,hp->sMember5,
    			hp->sMember6,hp->sMember7,hp->sMember8,hp->sMember9,
    			hp->sMember10,hp->sMember11);
    	*/
    		hp = hp->linkp;
    	}
    }
    Look at this code. It scans the contents of the file but it doesn't load all the list members. I'm 2/3 of the way through the program but I have to stop and study for a 'statistics' exam tomorrow. If you want to check back tomorrow then I'll finish the program. I don't have time now. When I finish it off I'll load the structure properly. This just scans all the data and removes any periods or commas. It can probably be improved but I have to run to my other books. Very rough draft but when I'm done it will look pretty.

  6. #6
    Registered User
    Join Date
    Aug 2001
    Posts
    20
    This is great! Thanks so much!

  7. #7
    Anti-Terrorist
    Join Date
    Aug 2001
    Posts
    742
    I can finish this off after I get back from school this afternoon, well actually late afternoon. There's some junk in the code right now because I was rushing to see if I could remove the commas and periods. There is a better way to do it. I can scan every word and make sure that it isn't a comma or a period and also check the end of the string to see if there is a comma or a period. Also the string of variable length needs to be concatenated to form one big string. I hadn't done that at all yet. I was writing this and at one point I figured that I did it wrong because there was no need for any structure because there was no updating of the information, but on second thought the structure might actually be necessary because it will make it easier to remove the corruption from the file. It is important though that the fields are all like so:

    code number
    0003328763|
    code number 2
    039485|
    Name
    Smith^MaryAnne^M^|
    code number 3 (see this gets modified)
    20010621|
    code number 4 (modified)
    20010522|
    code number 5 (modified)
    20010708|
    string description
    Customer purchased clothing articles_Women's Wear|
    number
    10|
    price?
    150.47

    I'm not sure what each thing is supposed to represent so I'll just call them code numbers, etc. I noticed that some of the numbers get modified. This is why the linked list will be important. Okay so this is the format.

    code number| code number 2| Name| code number 3| code number 4| code number 5| string description| number| price

    The names don't matter, but what matters is that these fields remain constant in the input file. None of these fields can be in any strange order. Commas, periods, forward slashes, white space: is that all, or is there any other junk in the input file that has to be removed? I have to know if there is. But having accounted for other junk, and if the order remains constant, then the problem can be solved. I'll be back home in about 8 hrs from now.
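
    As a rough illustration of the "remove the junk" step, here is a minimal sketch that deletes a chosen set of characters from a token in place. The name strip_chars and its junk parameter are hypothetical, not part of the program above. It would have to be applied only to the name tokens, since the price field needs to keep its period.

```c
#include <string.h>

/* Remove every character listed in junk from s, in place.
   Returns s for convenience. */
static char *strip_chars(char *s, const char *junk)
{
    char *dst = s;

    for (const char *src = s; *src != '\0'; src++) {
        /* keep the character only if it is not in the junk set */
        if (strchr(junk, *src) == NULL)
            *dst++ = *src;
    }
    *dst = '\0';
    return s;
}
```

    For example, strip_chars(token, ",.") turns a token holding "M." into "M", and a token that is only "," into an empty string that the caller can skip.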

  8. #8
    Registered User
    Join Date
    Aug 2001
    Posts
    20
    You are so helpful! Thank you so much. For the code numbers that you mentioned needed modification, you were right, they do. They are actually dates, that are coming in the input file as mddccyy or mmddccyy and have to be changed to ccyymmdd form. There should be no other punctuation in the name other than a ^ after the last name, first name and middle initial (or name - sometimes there might be a full middle name).
    The last two numbers are quantity and price.
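
    The date change described above could be sketched like this, assuming the dates arrive with slashes as in the example input (6/21/2001). The function name reformat_date is made up for illustration, and the range checks are deliberately loose (they do not validate days per month).

```c
#include <stdio.h>

/* Convert "m/d/ccyy" or "mm/dd/ccyy" to "ccyymmdd".
   out must hold at least 9 bytes. Returns 0 on success, -1 on bad input. */
static int reformat_date(const char *in, char *out, size_t outsize)
{
    int month, day, year;

    /* all three numbers must be present, separated by slashes */
    if (sscanf(in, "%d/%d/%d", &month, &day, &year) != 3)
        return -1;
    if (month < 1 || month > 12 || day < 1 || day > 31 ||
        year < 0 || year > 9999)
        return -1;
    if (outsize < 9)
        return -1;

    /* zero-pad so that single-digit months and days take two places */
    snprintf(out, outsize, "%04d%02d%02d", year, month, day);
    return 0;
}
```

    With the first sample record, reformat_date("6/21/2001", buf, sizeof buf) leaves "20010621" in buf, which matches the example output.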

    Again, thank you so much! You are a godsend!

    muffin

  9. #9
    Registered User
    Join Date
    Aug 2001
    Posts
    20
    Also, can't remember if I mentioned it or not, but the '|' is used as a delimiter between every output field, and there needs to be a carriage return at the end of every output line. I couldn't show that carriage return on the output example because I didn't know how to represent it. I think it is a hex 0d or something like that.

    thanks again!
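
    For reference, a carriage return is written in C as '\r' (hex 0D). A minimal sketch of writing one record with '|' between fields and a carriage return at the end might look like the following. The name write_record and the array-of-strings layout are assumptions for illustration. The file should probably be opened in binary mode ("wb") so the C library does not translate line-ending characters on its own.

```c
#include <stdio.h>

/* Write nfields strings to fp separated by '|' and terminated by a
   carriage return (0x0D). Returns 0 on success, -1 on a write error. */
static int write_record(FILE *fp, const char *const *fields, size_t nfields)
{
    for (size_t i = 0; i < nfields; i++) {
        /* fputs writes the characters only, not the terminating '\0' */
        if (fputs(fields[i], fp) == EOF)
            return -1;
        /* delimiter after every field except the last, which gets '\r' */
        if (fputc(i + 1 < nfields ? '|' : '\r', fp) == EOF)
            return -1;
    }
    return 0;
}
```

    Because fputs stops at the end-of-string marker without writing it, the extra byte at the end of each string never reaches the output file.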

  10. #10
    Registered User
    Join Date
    Aug 2001
    Posts
    20
    One more note .... I will not know how many spaces will be in-between fields, and I do not know how long the name or the description, or quantity or price will be. So, the records are all variable length, the amount of white space between fields is also variable length, and the fields themselves are variable length.

    I was trying to read in the file, into a structure that had arrays within it, like the following, but of course that did not really work (past the first record or so):

    struct inData {
    code1[];
    code2[];
    name[];
    date1[];
    date2[];
    date[];
    desc[];
    amount[];
    price[];
    } input;

    I really think I need to do a file read of the whole record, then do a parsing of character by character to a structure that I can use later on to write out to the output file .... am I on the right track?
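
    One way to sketch the "read the whole record, then parse it" approach is to read each line with fgets and split it on runs of whitespace with strtok. The function name split_whitespace and the token array are hypothetical. This only does the splitting; deciding which tokens belong to the name or the description is a separate step (for instance, the first two tokens are the codes and the last two are quantity and price).

```c
#include <string.h>

/* Split line in place on runs of spaces, tabs, and line endings.
   Stores up to max_tokens pointers in tokens[] and returns the count.
   The pointers refer into line, so line must stay alive while they are used. */
static size_t split_whitespace(char *line, char **tokens, size_t max_tokens)
{
    size_t count = 0;
    char *tok = strtok(line, " \t\r\n");

    while (tok != NULL && count < max_tokens) {
        tokens[count++] = tok;
        tok = strtok(NULL, " \t\r\n");
    }
    return count;
}
```

    Since strtok treats any run of delimiter characters as a single separator, it does not matter how many spaces or tabs sit between the fields.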

    Thanks again ....
    muffin

  11. #11
    Anti-Terrorist
    Join Date
    Aug 2001
    Posts
    742
    Quote: "Again, thank you so much! You are a godsend!"
    You might not think that for long. I had a statistics test today and it was tough. I won't be able to get to this tonight because Mid terms are pressuring me and I have too much homework. I thought that I would have a few hours but I don't. I'll try to get to it as soon as I can.

  12. #12
    Registered User
    Join Date
    Aug 2001
    Posts
    20
    Hey I completely understand .... good luck on your exams!

  13. #13
    Anti-Terrorist
    Join Date
    Aug 2001
    Posts
    742
    When do you have to have this done by? My last midterm exam is on Wednesday. I still have 2 labs to do even after that but I can hold them off for a day.

  14. #14
    Registered User
    Join Date
    Aug 2001
    Posts
    20
    It's ok, you don't have to, I am making some progress.

    I might have a few more specific questions over the weekend or so, but the specs have changed today anyway, so it has to be a little bit different (though it's pretty much the same idea).

    If you do have some time, maybe you could just watch for my questions next week or so?

    Thanks again, and I really appreciate it!

    Good luck on your exams!!!

