Thread: char array to string?

  1. #1
    Registered User
    Join Date
    Jan 2008
    Posts
    79

    char array to string?

    I'm trying to make an HTML parser. I tried it using pointers because I'm still very unsure about them. When I print rbuffer a character at a time it works perfectly. However, when I try to print it as a string, it prints gibberish.

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    char htmlparser(char *buffer)
    {
    	int i;
    	char *rbuffer = (char*) malloc(256);
    	int len = strlen(buffer);
    	for(i = 0;  i < len; i++)
    	{
    		if(buffer[i] == '<')
    		{
    			
    			while(buffer[i] != '>')
    			{
    				i++;
    				if(buffer[i] == '>')
    				{
    					i++;
    					break;
    				}
    			}
    
    		}
    	 	rbuffer[i] = buffer[i];
    		printf("%c",rbuffer[i]);// prints correctly
    	}
    	i++;
    	rbuffer[i] = '\0';
    	
    	printf("%s", rbuffer); // prints jibberish  
    	return 0 ;
    }
    
    int main()
    {
    	char *ptr;
    	char buf[] = "<fdfdfdfdffdfdfdf>ignore all previous words <> <tttdfeerrf> done.";
    	ptr = buf;
    	htmlparser(ptr);
    	return 0;
    }

  2. #2
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    You need two indexes. You need i for buffer and j for rbuffer.

    Todd
    Mainframe assembler programmer by trade. C coder when I can.
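
    The two-index idea can be sketched in isolation. This is a minimal illustration, not code from the thread: copy_without and its skip parameter are hypothetical names, and dst is assumed to have room for strlen(src) + 1 characters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not from the thread: copy src into dst while
 * dropping every character equal to skip. i reads, j writes.
 * Assumes dst has room for strlen(src) + 1 characters. */
size_t copy_without(const char *src, char *dst, char skip)
{
    size_t i, j = 0;

    assert(src != NULL && dst != NULL);
    for (i = 0; src[i] != '\0'; i++) {
        if (src[i] == skip)
            continue;           /* i advances, j stays where it is */
        dst[j++] = src[i];      /* j advances only when something is written */
    }
    dst[j] = '\0';              /* terminate at the write index, not at i */
    return j;                   /* number of characters kept */
}
```

    Because j only moves on a write, the output has no untouched gaps, which is what the single-index version cannot guarantee.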

  3. #3
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Of course it prints rubbish.
    When you allocate memory with malloc, it is filled with rubbish from the start.
    You assign the character at position i, but only if it's not within a < and >.
    So positions 0, 1, 2, 3, etc. still contain rubbish, because you never assign anything to those positions in the array.
    There are also several flaws in the code.

    Code:
    			while(buffer[i] != '>')
    			{
    				i++;
    				if(buffer[i] == '>')
    				{
    					i++;
    					break;
    				}
    			}
    The inner break is unnecessary, since the while condition already ends the loop when buffer[i] is '>'. A for loop suits this case better, too.
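
    One way the for-loop version might look, as a sketch only: skip_tag is a hypothetical helper, and the check for '\0' is an added assumption so that an unclosed tag cannot run the index off the end of the string.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: given the index of a '<' in s, return the index
 * just past the matching '>', or the index of the terminating '\0' if
 * the tag is never closed. */
size_t skip_tag(const char *s, size_t i)
{
    assert(s != NULL && s[i] == '<');
    for (; s[i] != '\0' && s[i] != '>'; i++)
        ;                       /* empty body: the loop header does the scan */
    if (s[i] == '>')
        i++;                    /* step past the closing '>' */
    return i;
}
```

    The single increment after the loop replaces the if/break pair inside it.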

    Code:
    	char *ptr;
    	char buf[] = "<fdfdfdfdffdfdfdf>ignore all previous words <> <tttdfeerrf> done.";
    	ptr = buf;
    	htmlparser(ptr);
    Why do you believe you have to define an extra variable and assign it a value to pass it to a function? You can do it directly:
    Code:
    	char buf[] = "<fdfdfdfdffdfdfdf>ignore all previous words <> <tttdfeerrf> done.";
    	htmlparser(buf);
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

  4. #4
    Registered User
    Join Date
    Jan 2008
    Posts
    79
    I made the changes you suggested, Todd. I'm getting the same results.

  5. #5
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Try to understand what you're getting and why. Use a debugger if you can and watch the rbuffer variable, and it should become clear to you.

  6. #6
    Registered User
    Join Date
    Jan 2008
    Posts
    79
    I know, but like I said, I was just trying to get used to using pointers, because I don't understand them fully.

    Also, there is an if statement inside the loop because I need it to add one more to i after it finds the '>', otherwise the '>' gets printed. If I move the i++ outside of the while loop it doesn't work properly.

  7. #7
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Quote Originally Posted by mushy
    I know, but like I said, I was just trying to get used to using pointers, because I don't understand them fully.
    For indexing purposes, a pointer to a block of memory behaves like an array. You assign values at some positions in the array and not at others, so the skipped positions keep their rubbish values.

    Also, there is an if statement inside the loop because I need it to add one more to i after it finds the '>', otherwise the '>' gets printed. If I move the i++ outside of the while loop it doesn't work properly.
    You still don't need the break, and if you use a for loop the increment is handled for you automatically.

  8. #8
    Registered User
    Join Date
    Jan 2008
    Posts
    79
    OK, I changed the code to this. Now the second printf call prints "2" and that's it. I'm not sure how to run the debugger; I'm using MinGW on the command line and Notepad.

    Code:
    #include <stdio.h>
    #include <string.h>
    
    int htmlparser(char buffer[])
    {
    	int i;
    	
    	int len = strlen(buffer);
    	char rbuffer[len];
    	for(i = 0;  i < len; i++)
    	{
    		if(buffer[i] == '<')
    		{
    			
    			while(buffer[i] != '>')
    			{
    				i++;	
    				
    				
    			}
                                          	 i++;
    		   
    		}
    	
    	 	rbuffer[i] = buffer[i];
    		printf("%c",rbuffer[i]);// prints correctly
    		
    	}
    	i++;
    	
    	
    	
    	printf("\n\n%s", rbuffer); // prints the number 2
    	return 0 ;
    }
    
    int main()
    {
    	//char *ptr;
    	char buf[] = "<fdfdfdfdffdfdfdf>ignore all previous words <> <tttdfeerrf> done.";
    	//ptr = buf;
    	 htmlparser(buf);
    	return 0;
    }

  9. #9
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    I don't see a second index.

    Here's what to do. On paper, write out a simple case using a string of "<mushy>".

    Figure out what the value of i is after the inner WHILE that looks for the closing tag. Then, figure out where in rbuffer you are assigning the first value.

    Todd
    Mainframe assembler programmer by trade. C coder when I can.
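
    The paper exercise can also be replayed in code. This is a rough sketch under assumptions: first_write_index is a hypothetical tracer that mimics the single-index loop from post #8 without writing to any buffer, and the '\0' check is an added guard that the posted code does not have.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tracer: replays the start of the single-index loop from
 * post #8 without touching any output buffer, and reports the index at
 * which the first assignment would land. */
size_t first_write_index(const char *buffer)
{
    size_t i = 0;

    assert(buffer != NULL);
    if (buffer[i] == '<') {
        while (buffer[i] != '\0' && buffer[i] != '>')
            i++;                /* scan to the closing '>' */
        if (buffer[i] == '>')
            i++;                /* step past it, as post #8 does */
    }
    return i;                   /* post #8 would do rbuffer[i] = buffer[i] here */
}
```

    For "<mushy>" the function returns 7, so positions 0 through 6 of rbuffer are never assigned, and index 7 is already outside an array declared as rbuffer[len] with len equal to 7.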

  10. #10
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    Another diagnostic technique you can use is to print out the value of your buffer, rbuffer and i at each loop iteration. You'll see exactly when the problem starts.
    Mainframe assembler programmer by trade. C coder when I can.

  11. #11
    Registered User
    Join Date
    Jan 2008
    Posts
    79
    Hey Todd, I finally understood why I needed two indexes when I did as you suggested. It works a lot better now but it's still not perfect. Look at my code:
    Code:
    #include <stdio.h>
    #include <string.h>
    
    int htmlparser(char buffer[])
    {
    	int i;
    	int j = 0;
    	int m = 0;
    	
    	
    	int len = strlen(buffer);
    	char rbuffer[len];
    	for(i = 0;  i < len; i++)
    	{
    		if(buffer[i] == '<')
    		{
    			
    			while(buffer[i] != '>')
    			{
    				i++;	
    				
    				
    			}
    			//if i add a i++ here i get 3 jibberish characters at end of string
    
    		   
    		}
    		
    	 	rbuffer[j] = buffer[i];
    		printf("%c    - value of rbuffer\n",rbuffer[j]);// prints correctly
    		printf("%c    - Value of buffer\n",buffer[i]);
    		printf("%d ------ %d\n", i,j);
    		j++;
    		
    		
    	}
    	i++;
    	
    	
    	
    	printf("\n\n%s", rbuffer); // prints the number 2
    	return 0 ;
    }
    
    int main()
    {
    	//char *ptr;
    	char buf[] = "<fdfdfdfdffdfdfdf>ignore all previous words <> <tttdfeerrf>done.";
    	//ptr = buf;
    	 htmlparser(buf);
    	return 0;
    }
    If I run it like that, it does it pretty much right, except that it leaves the '>' characters in the string.
    When I add an i++ after it has found one, to skip past that character, the string prints out right, except that it has 3 gibberish characters at the end for some reason.

  12. #12
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    You're seeing garbage at the end of rbuffer because you're not terminating the string properly. You do need the i++ there to index past the closing tag.

    Look at this version to see the diagnostics as I had envisioned them:
    Code:
    #include <stdio.h>
    #include <string.h>
    
    int htmlparser(char buffer[])
    {
    	int i;
    	int j = 0;
    	int m = 0;
    	
    	int len = strlen(buffer);
    	char rbuffer[len+1];
    	for(i = 0;  i < len+1; i++) rbuffer[i] = 'X' ; 
    	rbuffer[len] = 0 ; 
    
    	for(i = 0;  i < len; i++)
    	{
    		if(buffer[i] == '<')
    		{
    			while(buffer[i] != '>')
    			{
    				i++;	
    			}
    			//if i add a i++ here i get 3 jibberish characters at end of string
    			i++ ; 
    		}
    		
    	 	rbuffer[j] = buffer[i];
    		printf("i=%d, j=%d\n", i,j);
    		printf("buffer  = %s\n",&buffer[i]);
    		printf("rbuffer = %s\n",rbuffer);// prints correctly
    		j++;
    	}
    	i++;
    	
    	printf("\n\n%s", rbuffer); // prints the number 2
    	return 0 ;
    }
    
    int main()
    {
    	//char *ptr;
    	char buf[] = "<fdfdfdfdffdfdfdf>ignore all previous words <> <tttdfeerrf>done.";
    	//ptr = buf;
    	 htmlparser(buf);
    	return 0;
    }
    Glad you're getting it!

    Todd
    Last edited by Dino; 02-03-2008 at 07:36 AM.
    Mainframe assembler programmer by trade. C coder when I can.
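
    Putting the thread's points together (a second index, skipping past '>', and terminating at the write position), a cleaned-up version might look like the sketch below. It is not code from any poster: strip_tags is a hypothetical name, and returning a malloc'd string that the caller frees is an assumed design choice.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical rewrite: returns a malloc'd copy of buffer with everything
 * from each '<' through its closing '>' removed, or NULL if allocation
 * fails. The caller is responsible for calling free() on the result. */
char *strip_tags(const char *buffer)
{
    size_t len, i = 0, j = 0;
    char *out;

    assert(buffer != NULL);
    len = strlen(buffer);
    out = malloc(len + 1);          /* +1 for the terminating '\0' */
    if (out == NULL)
        return NULL;

    while (i < len) {
        if (buffer[i] == '<') {
            /* skip to the closing '>' but never past the end */
            while (i < len && buffer[i] != '>')
                i++;
            if (i < len)
                i++;                /* step past '>' */
            continue;               /* re-check: the next char may be another '<' */
        }
        out[j++] = buffer[i++];     /* i reads, j writes */
    }
    out[j] = '\0';                  /* terminate at j, the write index */
    return out;
}
```

    Calling strip_tags(buf) from main and printing the result with printf("%s\n", ...) before freeing it would replace the htmlparser call. The continue also handles back-to-back tags such as "<b><c>", which a single if followed by an unconditional copy would not.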
