Thread: splitting a string.

  1. #1
    Registered User
    Join Date
    May 2015
    Posts
    130

    splitting a string.

    Hi gentlemen, I'm trying to split a string that is given in this pattern:
    decoder: #(45+HDR)-6
    and print the pieces on the screen as:
    45
    HDR //HDR is a char
    6
    I tried strtok and strtol without success. I also tried a while loop that walks over the complete sentence "decoder: #(45+HDR)-6", but that turned out more complex than it should be, so eventually I went back to the token approach.
    Code:
    #include <stdio.h>
    #include<stdlib.h>
    #include<string.h>
    int main(void)
    {
       char str[80] = "decoder: #(45+HDR)-6";
       const char s[2] = "#";
       char *token;
       
       /* getting the first token */
       token = strtok(str, s);
       
       /* here, I'm walking through other tokens */
       while( token != NULL ) 
       {
          printf( " %s\n", token );
        
          token = strtok(NULL, s);
       }
       return(0);
    }
    The problem is that this prints the whole string up to the given delimiter "s", and in my case there are several delimiters: "#", "+" and "-".
    The code compiles and runs, but it gives the wrong result: I can't get it to print what it should. Please help, as I'm a fairly new C programmer.
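    One way to get there with strtok, as a minimal sketch: the second argument is a set of delimiter characters rather than a single one, so all the separators can be passed at once. The helper name print_fields is made up for this illustration, and it assumes the input always begins with the "decoder" label, which is skipped.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: prints every field after the leading "decoder" label,
   one per line, and returns how many fields were printed.
   The buffer is modified in place, because strtok writes '\0' over delimiters. */
int print_fields(char *text)
{
    const char *delims = " :#(+)-";      /* each character is a separator */
    int count = 0;
    char *token = strtok(text, delims);  /* first token is the label; skip it */

    if (token == NULL)
        return 0;

    /* Passing NULL makes strtok continue in the same buffer. */
    while ((token = strtok(NULL, delims)) != NULL) {
        printf("%s\n", token);
        count++;
    }
    return count;
}
```

    Called on a writable array such as char str[80] = "decoder: #(45+HDR)-6", this should print the three fields on separate lines. Because "-" is treated as a separator, a negative number after the closing parenthesis would lose its sign.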

  2. #2
    Registered User
    Join Date
    May 2010
    Posts
    4,632
    You may want to read through this topic since you seem to be working on a similar project. Post#2 might prove interesting.

    Jim

  3. #3
    Registered User
    Join Date
    May 2015
    Posts
    130
    thanks.

  4. #4
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,612
    Sometimes throwing a format string at a problem can be effective.
    Code:
       char str[80] = "decoder: #(45+HDR)-6";
       long numbers[2];
       char token[32];
    
       if ( sscanf(str, "decoder: #(%ld+%3s)-%ld", &numbers[0], token, &numbers[1]) == 3 )
       /* http://linux.die.net/man/3/sscanf */
       { 
          printf("tokenized: %ld %s %ld\n", numbers[0], token, numbers[1]); 
       }

  5. #5
    Registered User
    Join Date
    May 2015
    Posts
    130
    Quote Originally Posted by whiteflags
    Sometimes throwing a format string at a problem can be effective.
    That works for this particular case, but what if HDR is HDRERT instead? How could I automatically change the number between the "%" and the "s" in the sscanf format, meaning the 3 in "%3s"?

    thanks anyway.
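    One way around a hard-coded width, sketched under the assumption that the token never contains a ")": a scanset such as %31[^)] reads up to 31 characters and stops at the closing parenthesis, so the same format handles both HDR and HDRERT. The function name parse_decoder and the 32-byte buffer size are choices made for this example only.

```c
#include <stdio.h>

/* Hypothetical helper: parses "decoder: #(<num>+<token>)-<num>".
   token must point to at least 32 bytes.
   Returns 1 on success, 0 if the input does not match. */
int parse_decoder(const char *str, long *first, char *token, long *second)
{
    /* %31[^)] stores at most 31 characters that are not ')' plus a '\0'. */
    return sscanf(str, "decoder: #(%ld+%31[^)])-%ld", first, token, second) == 3;
}
```

    The width in the scanset only caps how much is stored, so it can stay fixed at the buffer size minus one no matter how long the token actually is.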

  6. #6
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,612
    There is no especially simple way. You have to try building the string with string functions such as sprintf, strcpy or strncat.
    For example, there will be areas in your string where you want %3s, but to write that with sprintf you have to use the string "%%%lds". The %% prints a percent sign, rather than triggering a conversion, the %ld is your magic field width (the 3 in %3s), and finally the s. This all produces %3s.

    So you are best served copying as much as you can to the format string variable, then calling sprintf to insert the field width, and copying anything left over.

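    This is different from printf, where you can specify a width at run time using something like %*s; the arguments are then the width (an int) and a string pointer. In scanf, the * means "parse this field but do not store it" instead, so the width has to be written into the format string itself.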
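    To illustrate the printf side of that comparison, here is a small sketch. The helper names format_padded and format_clipped are invented for this example; they wrap snprintf so the result can be inspected as a string.

```c
#include <stdio.h>

/* Hypothetical helper: writes text right-aligned in a field of the given width.
   The '*' in "%*s" takes the width from the int argument before the string. */
int format_padded(char *out, size_t out_size, int width, const char *text)
{
    return snprintf(out, out_size, "[%*s]", width, text);
}

/* Same idea with a precision: "%.*s" writes at most max_chars characters. */
int format_clipped(char *out, size_t out_size, int max_chars, const char *text)
{
    return snprintf(out, out_size, "[%.*s]", max_chars, text);
}
```

    With a width of 6, "HDR" comes out padded with three leading spaces; with a precision of 3, "HDRERT" is cut down to "HDR". Neither trick carries over to sscanf, which is why the format string has to be built by hand there.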

    Here is an example program with the expected outcome:
    Code:
    #include <stdio.h>
    #include <string.h>
    #define FMT_SIZE 80

    long BuildFormatString(char *format, long formatSize, long tokenLength);

    int main(void)
    {
        char format[FMT_SIZE] = "";
        char str[90] = "decoder: #(45+HDR)-6";
        long numbers[2];
        char token[32];

        /* A result that fills the whole buffer means the format may have been cut short. */
        if (BuildFormatString(format, FMT_SIZE, 3) >= FMT_SIZE - 1)
        {
            fprintf(stderr, "format string may be truncated!\n");
            return 1;
        }
        else
        {
            printf("format string: %s\n", format);
        }

        /* http://linux.die.net/man/3/sscanf */
        if (sscanf(str, format, &numbers[0], token, &numbers[1]) == 3)
        {
            printf("tokenized: %ld %s %ld\n", numbers[0], token, numbers[1]);
        }

        return 0;
    }

    long BuildFormatString(char *format, long formatSize, long tokenLength)
    {
        /* Help on functions used:
        http://linux.die.net/man/3/strncpy
        http://linux.die.net/man/3/sprintf
        http://linux.die.net/man/3/strncat
        */
        long length = 0;
        /* Copy the fixed prefix and make sure it is terminated. */
        strncpy(format, "decoder: #(%ld+", formatSize - 1);
        format[formatSize - 1] = '\0';
        length = strlen(format);
        /* Insert the field width; snprintf will not write past the buffer. */
        snprintf(&format[length], (size_t)(formatSize - length), "%%%lds", tokenLength);
        length = strlen(format);
        /* Append the fixed suffix. */
        strncat(format, ")-%ld", formatSize - length - 1);
        length = strlen(format);
        return length;
    }
    In my opinion, this is still simpler than other options, but beauty is in the eye of the beholder.
    Last edited by whiteflags; 06-09-2015 at 07:19 PM.

  7. #7
    Registered User
    Join Date
    Jun 2015
    Posts
    1,640
    It can be done with a general format string, too. Note that the spacing is significant.
    Code:
    #include <stdio.h>
    
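    int main(void) {
      char line[]        = "decoder: #(45+HDR)-6";
      char line_spaced[] = " decoder : # ( 45 + HDR ) - 6 ";
      /* A space in the format skips any amount of whitespace, including none.
         The '-' is placed last in its scanset so it is taken literally. */
      char fmt[] = " %31[^: ] : # ( %31[^+ -] %c %31[^) ] ) %c %31s";
      char a[32], b[32], c[32], d[32], op1, op2; // a : # ( b op1 c ) op2 d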
      int m;
    
      *a=*b=*c=*d=0; op1=op2='X';
      m=sscanf(line, fmt, a, b, &op1, c, &op2, d);
      printf("[%d] %s, %s, %c, %s, %c, %s\n", m, a, b, op1, c, op2, d);
    
      *a=*b=*c=*d=0; op1=op2='X';
      m=sscanf(line_spaced, fmt, a, b, &op1, c, &op2, d);
      printf("[%d] %s, %s, %c, %s, %c, %s\n", m, a, b, op1, c, op2, d);
    
      return 0;
    }

  8. #8
    Registered User
    Join Date
    May 2015
    Posts
    130
    Quote Originally Posted by algorism
    It can be done with a general format string, too. Note that the spacing is significant.
    But if I want to print only '45', 'HDR' and '6', without the operators or anything else, it gives a wrong result, so I guess it isn't useful for that case.

    I modified your code, and here it is:
    Code:
    #include <stdio.h>
    #include <string.h>
     
    int main() {
      char line[]        = "decoder: #(45+HDR)-6";
      char line_spaced[] = " decoder : # ( 45 + HDR ) - 6";
      char fmt[] = " %31[^: ] : # ( %31[^+- ] %c %31[^) ] ) %c %31s";
      char a[32], b[32], c[32], d[32], op1, op2; // a : # ( b op1 c ) op2 d
      int m;
     
      *a=*b=*c=*d=0; op1=op2='X';
      m=sscanf(a, b,c,d);
      printf("%s, %c, %s, %c, %s\n",a, b,c,d);
      return 0;
    }
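    A sketch of what the modification above was probably aiming for: sscanf still needs the input string and the format as its first two arguments, and every storing conversion in the format needs a matching argument. To keep only the three values, the unwanted pieces can be parsed with assignment suppression (the * in %*c and %*[...]), which reads a field without storing it. The function name extract_three is invented for this example, and the "-" is written last in its scanset so that it is taken literally.

```c
#include <stdio.h>

/* Hypothetical helper: pulls the three values out of
   "decoder: #(45+HDR)-6" (with or without extra spaces).
   Each output buffer must hold at least 32 bytes.
   Returns 1 when all three values were found, 0 otherwise. */
int extract_three(const char *line, char *b, char *c, char *d)
{
    /* %*[^: ] and %*c parse the label and the operators but discard them,
       so only three arguments follow the format. */
    const char *fmt = " %*[^: ] : # ( %31[^+ -] %*c %31[^) ] ) %*c %31s";

    return sscanf(line, fmt, b, c, d) == 3;
}
```

    After a successful call, printf("%s\n%s\n%s\n", b, c, d) prints just the three values. Suppressed conversions do not count toward the return value of sscanf, which is why the check is against 3 rather than 6.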

  9. #9
    Registered User migf1's Avatar
    Join Date
    May 2013
    Location
    Athens, Greece
    Posts
    385
    Have you tried the string tokenizer pointed out in post #2?

    You may pass it ": #(+)-" as delim, and 4 as maxtoks in order to get back the string tokens "decoder", "45", "HDR" and "6" inside the tokens array. Then it's only a matter of strtoXX()'ing the desired tokens to appropriate variables.

    PS. Btw, the tokenizer returns the count of tokens successfully parsed into the tokens array.
    "Talk is cheap, show me the code" - Linus Torvalds

  10. #10
    Registered User
    Join Date
    May 2015
    Posts
    130
    Quote Originally Posted by whiteflags
    There is no especially simple way. You have to try building the string with string functions such as sprintf, strcpy or strncat.
    First, thank you very much for your help.
    Second, this code is still specific to one kind of input. For instance, if I input a string like "decoder: #(45+HDR)-6" it gives a correct result, but if I input "decoder: #(45+HDR)-HERY!" it gets stuck and prints weird things, when it should print '45', 'HDR', 'HERY!'.
    The "decoder: #(45+HDR)" part of the pattern is stable and isn't going to change, but after the "-" I can input whatever I like (a string, a number, and so on). That's what makes me hesitant to use this tokenizer code. Thanks.
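    One possible way to cope with an arbitrary trailing field, as a sketch only: read it as text with %31s, then decide afterwards whether it happens to be a number by checking where strtol stopped. The names parse_with_tail and tail_as_long are hypothetical, and the 32-byte buffers are an assumption carried over from the earlier examples.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: parses "decoder: #(<num>+<token>)-<anything>".
   token and tail must each point to at least 32 bytes.
   Returns 1 on success, 0 if the input does not match the pattern. */
int parse_with_tail(const char *str, long *first, char *token, char *tail)
{
    /* The last field is read as text (%31s), so it accepts 6 and HERY! alike. */
    return sscanf(str, "decoder: #(%ld+%31[^)])-%31s", first, token, tail) == 3;
}

/* Returns 1 and stores the value if text is entirely a decimal integer,
   otherwise returns 0 and leaves *value untouched. */
int tail_as_long(const char *text, long *value)
{
    char *end;
    long parsed = strtol(text, &end, 10);

    if (end == text || *end != '\0')
        return 0;          /* no digits at all, or trailing junk */
    *value = parsed;
    return 1;
}
```

    The earlier format ended in %ld, so an input ending in HERY! made sscanf stop after two conversions and return 2 instead of 3. Reading the tail as a string first keeps the split itself independent of what type the last piece turns out to be.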

  11. #11
    Registered User
    Join Date
    May 2015
    Posts
    130
    Quote Originally Posted by migf1
    Have you tried the string tokenizer pointed out in post #2?

    You may pass it ": #(+)-" as delim, and 4 as maxtoks in order to get back the string tokens "decoder", "45", "HDR" and "6" inside the tokens array. Then it's only a matter of strtoXX()'ing the desired tokens to appropriate variables.

    PS. Btw, the tokenizer returns the count of tokens successfully parsed into the tokens array.
    Apparently we posted at the same time. Please read what I've just posted; it relates to your comment.

  12. #12
    Registered User migf1's Avatar
    Join Date
    May 2013
    Location
    Athens, Greece
    Posts
    385
    Quote Originally Posted by Romyo2
    Apparently we posted at the same time. Please read what I've just posted; it relates to your comment.
    As long as the tokens to be extracted are delimited by some chars, and their total count is bounded, the tokenizer may prove quite handy. I don't see why the 2nd example you gave can be a problem.

  13. #13
    Registered User migf1's Avatar
    Join Date
    May 2013
    Location
    Athens, Greece
    Posts
    385
    Oh now I see what you're saying. You mean you don't know what type one or more tokens should be converted to? If so, then it's a different type of problem (not a "splitting a string" one, I mean).

  14. #14
    Registered User
    Join Date
    May 2015
    Posts
    130
    Quote Originally Posted by migf1
    Oh now I see what you're saying. You mean you don't know what type one or more tokens should be converted to? If so, then it's a different type of problem (not a "splitting a string" one, I mean).
    Huh? I thought of it as splitting a string because we're actually extracting specific elements from a string.

  15. #15
    Registered User migf1's Avatar
    Join Date
    May 2013
    Location
    Athens, Greece
    Posts
    385
    Quote Originally Posted by Romyo2
    Huh? I thought of it as splitting a string because we're actually extracting specific elements from a string.
    English is not my mother language, but I think "splitting a string" means breaking a given string into smaller pieces of the same type, substrings if you like. That's what a string tokenizer does. Please correct me if I'm wrong.

    It's then up to you what you want to do with each substring (token).

    In our case, if all you want to do after tokenization is, say, print the derived tokens (substrings), you may do something like the following:

    Code:
    ...
    int n = s_tokenize(s, ..., tokens);
    for (int i=0; i < n; i++) {
        puts( tokens[i] );
    }
    ...
    This will print all the tokens as strings, so it doesn't have any problem with either your 1st or your 2nd example.

    Now, it's up to you to do whatever you have to do with each token individually (including converting it to a different type).
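    For readers without the linked topic at hand, here is a rough stand-in for such a tokenizer. It is not the s_tokenize from that topic, whose exact signature is not shown in this thread; the name split_tokens and its parameter order are assumptions made for illustration. It walks the string with strspn and strcspn instead of strtok, so it keeps no hidden state between calls.

```c
#include <string.h>

/* Hypothetical stand-in for the tokenizer discussed in this thread; the real
   one from the linked topic may have a different name and signature.
   Splits s in place on any character in delims, stores up to maxtoks
   pointers in tokens[], and returns how many tokens were stored. */
int split_tokens(char *s, const char *delims, char *tokens[], int maxtoks)
{
    int n = 0;

    while (n < maxtoks) {
        s += strspn(s, delims);        /* skip leading delimiters */
        if (*s == '\0')
            break;                     /* nothing left */
        tokens[n++] = s;               /* token starts here */
        s += strcspn(s, delims);       /* find the end of the token */
        if (*s == '\0')
            break;                     /* token ran to the end of the string */
        *s++ = '\0';                   /* terminate it and step past the delimiter */
    }
    return n;
}
```

    Passing ": #(+)-" as delims and 4 as maxtoks, as suggested earlier, would leave "decoder", "45", "HDR" and the trailing piece in tokens[0] through tokens[3], ready for the puts loop above or for a strtol call on whichever tokens should be numbers.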
