Thread: Porformance problem building a list

  1. #1
    Registered User
    Join Date
    May 2011
    Posts
    3

    Porformance problem building a list

    Hi, i regret asking for help on my very first topic on the forum, but i truly need a hand or two, and please forgive my poor english.

    I have to read "symbols" from a file, and build a list saving on each node a symbol and the number of times that symbol appears, it may seem easy, but the problem is that the program has to finish under 3 seconds and there are 26 million characters to be read. As you will see, im a novice, so my code is very rudimentary, and of course its not even close to those 3 seconds i need to get.

    You may be wondering why i wrote "symbols", the reason is that i get from the parameter line how many ASCII characters form a "symbol" for example if i get "2" from the parameter line, each symbol will be pairs of ASCIIs like "7(" or "P;".

    That being said, i googled a lot with no succes, so i hope that you guys can help me figure out the way to speed up this program. Thanks a lot for your time.

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <ctype.h>
    
    int num_param, t;
    char *param[4];
    struct symbolic_node{       
    	int counter;
    	short status;
    	struct symbolic_node *sig;
    	struct symbolic_node *ant;
    	char *symb;
    };
    int sintax_check();
    int symbol_obtainer(char **);
    void list_builder(char **,int,struct symbolic_node **);
    
    int main(int argc, char *argv[]){  
    
        int i,j,sintax_error,n;
    	short array_pos=0;
       	FILE *pFile;
      	int lSize;
      	char *buffer;
      	struct symbolic_node *l_symb;
    	
     	num_param=argc-1;
    	for(i=1;i<=num_param;i++){  				
        	param[i]=argv[i];
        }
    	n=atoi(param[2]);
        sintax_error=sintax_check();                       
        if(sintax_error==1){				               
        	fputs("Sintax error\n",stderr);
        }else{
        	lSize=symbol_obtainer(&buffer);             
    		list_builder(&buffer,lSize,&l_symb);   
    		free(buffer);
    		printf("Done\n"); 
    	}					
    }
    
    int sintax_check(){							
    								
    	int i, error=0;
    	char *ptr;
    	
    	if(num_param!=2){								
    		error=1;
    	}else{
    		if(error==0){
    			if(isdigit(param[2][0])==0)error=1;
    		}	
    	}
    	return error;
    }
    
    int symbol_obtainer(char **buff){  
    								  
    	FILE *pfile;               
    	int i,res_reading, n;
    	int num_bytes;
    	char *char_storage;
    	
    	n=atoi(param[2]);
    	pfile=fopen(param[1],"rb");
    	if(pfile==NULL){
    		fputs("File not found\n",stderr);
    		exit(1);
    	}else{
    		fseek(pfile,0,SEEK_END);
    		num_bytes=ftell(pfile);
    		rewind(pfile);
    		num_bytes=num_bytes/n;
    		char_storage=(char *)calloc(num_bytes,n);
    		if(char_storage==NULL){
    			fputs("Memory allocation error\n",stderr);
    			exit(2);
    		}
    		res_reading=fread(char_storage,n,num_bytes,pfile);
    		if(res_reading!=num_bytes){
    			fputs("Reading error\n",stderr);
    			exit(3);
    		}
    	}
    	*buff=char_storage;
    	fclose(pfile);
    	return num_bytes;
    }
    
    void list_builder(char **buff,int tamanio,struct symbolic_node **header){
    
    	struct symbolic_node *aux, *aux1, *aux2, *aux3, *lsymb;          	  
    	short n,run_level;											    
    	short num_nodos=0;											 
    	int i,cont=0;													 
    	char *char_storage;											 	
    	n=atoi(param[2]);											 
    	
    	aux=(struct symbolic_node *)malloc(sizeof(struct symbolic_node));
    	aux->ant=NULL;
    	aux->sig=NULL;
    	char_storage=*buff;
    	*header=aux;
    	aux3=aux;
    	aux->symb=(char *)malloc(n);
    	memcpy(aux->symb,char_storage,n);
    	aux->counter=1;
    	num_nodos++;
    	char_storage+=n;
    	for(i=1;i<tamanio;i++){														
    		run_level=0;															
    	  	aux1=(struct symbolic_node *)malloc(sizeof(struct symbolic_node));      
    	      aux1->symb=(char*)malloc(n);											
    		memcpy(aux1->symb,char_storage,n);										
    		aux1->counter=1;
    		aux2=aux;
    	
    		while(aux2!=NULL&&(run_level!=1)){
    			if(memcmp(aux2->symb,aux1->symb,n)==0){
    				free(aux1->symb);				
    				free(aux1);
    				aux2->counter+=1;
    				run_level=1;
    			}
                            aux3=aux2;
    			aux2=aux2->sig;
    		}
    		if(run_level==0){														
    			num_nodos++;
    			aux1->ant=aux3;														
    			aux3->sig=aux1;														
    			aux3=aux3->sig;														
    			aux1->sig=NULL;
    		}																												
    		char_storage+=n;
    	}
    
    }
    Thx a lot, plz let me know if you dont understand something or if you need anything else.
    Last edited by Hydrahte; 05-06-2011 at 12:39 PM.

  2. #2
    Registered User
    Join Date
    May 2010
    Location
    Naypyidaw
    Posts
    1,314
    Sounds like you want to implement hash table or tree?

  3. #3
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    > aux1=(struct symbolic_node *)malloc(sizeof(struct symbolic_node));
    > aux1->symb=(char*)malloc(n);
    > memcpy(aux1->symb,char_storage,n);
    You should definitely do this AFTER the search has failed to find what you're looking for.
    It's expensive to keep allocating and freeing things you're not going to keep.

    Also, check your editor isn't using spaces and TABS for indentation (switch to spaces only, and make sure all hard tabs are replaced).
    Your editor can cope, but forums usually mess it up.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  4. #4
    Registered User
    Join Date
    May 2011
    Posts
    3
    Quote Originally Posted by Bayint Naung View Post
    Sounds like you want to implement hash table or tree?
    Hi, do you mean building a tree or a hash table instead of building the list? Are those data estructures faster than a list? I had never used them before.

    Quote Originally Posted by Salem View Post
    > aux1=(struct symbolic_node *)malloc(sizeof(struct symbolic_node));
    > aux1->symb=(char*)malloc(n);
    > memcpy(aux1->symb,char_storage,n);
    You should definitely do this AFTER the search has failed to find what you're looking for.
    It's expensive to keep allocating and freeing things you're not going to keep.

    Also, check your editor isn't using spaces and TABS for indentation (switch to spaces only, and make sure all hard tabs are replaced).
    Your editor can cope, but forums usually mess it up.
    Hi, ok, im gonna try allocating memory after the search is done, and thx for the advice about the tabs, gedit displayed the code very good but as you said the forum messed it up.

    EDIT: MOTHER OF GOD, i did as you said and the time dropped dramatically from 6.3/6.5 to 3.6-3.8 its amazing. Thinking about it, what i was doing was retarded, creating and destroying a node when i can compare directly from the array. Thx a lot, i think i will never forget this mistake.
    Last edited by Hydrahte; 05-06-2011 at 01:27 PM.

  5. #5
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    A balanced binary tree is definitely faster than a linked list.
    A binary tree can be searched in O(log2(N)) time, whereas a linked list is O(N/2). So for 1000 elements, you're looking at 10 comparisons vs. 500.

    An alternative is an array of lists, like
    struct symbolic_node *lists[128];

    Rather than having one global list, have a separate list keyed on the first letter of the symbol, say
    memcpy(lists[char_storage[0]]->symb,char_storage,n);

    You'll have many shorter lists, but you only need to search one of them at any one time.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  6. #6
    Registered User
    Join Date
    Sep 2008
    Location
    Toronto, Canada
    Posts
    1,834
    Are you guaranteed to have 26 million bytes of memory? Why not just read the entire file with one fread. Then sort and count duplicates using the 'width' as specified.
    I don't understand the need for lists especially when worst case it can be just as big as the entire file, plus all that linking junk, plus it complicates everything.

  7. #7
    Registered User
    Join Date
    May 2011
    Posts
    3
    Quote Originally Posted by nonoob View Post
    Are you guaranteed to have 26 million bytes of memory? Why not just read the entire file with one fread. Then sort and count duplicates using the 'width' as specified.
    I don't understand the need for lists especially when worst case it can be just as big as the entire file, plus all that linking junk, plus it complicates everything.
    Well, i use lists becouse we are just starting with C at college so im unfamiliar with any other data estructures. Thankfully memory usage doesnt matter so yes, we can spend as much memory as we want. So you would keep the symbols in the array, sort them, and use the memory width to know how many of them are there right? sounds good, im going to try your idea, wish me luck and thx a lot for your help.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. building a linked list from tree leaf nodes
    By ayman88 in forum C Programming
    Replies: 0
    Last Post: 02-27-2010, 04:24 PM
  2. need help building a linked list
    By supern00b in forum C Programming
    Replies: 7
    Last Post: 12-11-2009, 01:41 PM
  3. Having a problem building Zthread
    By indigo0086 in forum Tech Board
    Replies: 1
    Last Post: 04-23-2007, 10:56 AM
  4. building 3 double linked list
    By ronenk in forum C Programming
    Replies: 4
    Last Post: 12-08-2004, 05:24 PM
  5. Building a linked-list, help please!
    By vnrabbit in forum C Programming
    Replies: 4
    Last Post: 08-22-2002, 07:58 AM