Thread: awk -- escaping the delimiter

  1. #1
    {Jaxom,Imriel,Liam}'s Dad Kennedy's Avatar
    Join Date
    Aug 2006
    Location
    Alabama
    Posts
    1,065

    awk -- escaping the delimiter

    Okay, let's say I have the following file:
    Code:
    var0:0x9453!var1:Some random string!var2:"Stop!" the woman!var3:00432123432123432885
    etc, etc, etc.

    Now, what I want to do is break out the various data points on some delimiter. In this case it would be the !, though in the "real" file it would probably be :. I would expect to use awk on this line (after tailing the last line, which is the only line I'm really interested in), looping as many times as I need to get all the data. So the loop would look something like this:
    Code:
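    LINE=`tail -n1 filename`
    for i in 0 1 2 3  ; do
            let j=i+1
            # -F (capital) sets the field separator; lowercase -f names a program file
            FIELD=`echo "$LINE" | awk -F! -v n="$j" '{print $n}'`
            # assign through \$FIELD so spaces in the data are not re-parsed by eval
            eval "VAR${i}=\$FIELD"
    done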
    But the problem comes in where I have the ! after var2's Stop!. So what I need is a way to allow escaping (or quoting) of the "text" in each field, so that awk doesn't split a field in the middle.

    Any ideas?

  2. #2
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    I dunno awk, but if I understand your problem correctly it is a common one when parsing, say, XML or HTML, where you may want to ignore things inside and/or outside of <>. It is not as simple as you might think. My approach is to split the line something like this:

    Code:
    [root~] LINE='var0:0x9453!var1:Some random string!var2:"Stop!" the woman!var3:0043212343212343288'
    [root~] echo $LINE | perl -ne '@ray=split/"/,$_;$i=-1;
        foreach(@ray){$i++;next if($i%2);$_=~s/\!/_MK_/g;};
        $line=join("\"",@ray);@ray=split/_MK_/,$line;
        $line=join("\n",@ray);print $line'
    var0:0x9453
    var1:Some random string
    var2:"Stop!" the woman
    var3:0043212343212343288
    The _MK_ placeholder is not a satisfying hack, but that's what you get in "one line".* But it does work. In reality, if the placeholder is not feasible, I'd store the various parts and reassemble them without it. In any case it is better done with an external script. I imagine awk is capable; it has arrays and regexps, right? I suppose you could do this with a combination of bash and awk in a short function too. I don't enjoy bash that much, though.

    I'd love to hear if someone has a simpler alternative to my divide and conquer algorithm**. The only other solution I see is full blown multipass parsing (and that really does involve multiple passes).

    * Also, that one presumes the line does not begin with ". Again, that is something better dealt with in a real function or stdin->stdout script.
    ** In case that is not clear: split the line into an array on ". Replace your delimiter with a placeholder that cannot be present (e.g., _MK_), but only in the even-numbered elements (0, 2, 4, etc.), which are the pieces outside the quotes. Join the array back into a line. Now use the placeholder as a delimiter.
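For a box without perl, the same divide-and-conquer steps can be roughed out in awk. This is only a sketch: split_quoted is a made-up helper name, and it assumes the byte \001 never occurs in the data, so that byte stands in for the _MK_ placeholder. Note that awk arrays filled by split() start at 1, so the pieces outside the quotes are the odd-numbered ones here.

```shell
# split_quoted LINE: hypothetical helper, prints one field per line.
# Assumption: the placeholder byte \001 never appears in the data.
split_quoted() {
    printf '%s\n' "$1" | awk '{
        n = split($0, part, "\"")       # odd indexes hold the text outside quotes
        line = ""
        for (i = 1; i <= n; i++) {
            if (i % 2) gsub(/!/, "\001", part[i])   # placeholder only outside quotes
            line = line (i > 1 ? "\"" : "") part[i] # rejoin, putting the quotes back
        }
        m = split(line, field, "\001")  # the placeholder is now the real delimiter
        for (i = 1; i <= m; i++) print field[i]
    }'
}

split_quoted 'var0:0x9453!var1:Some random string!var2:"Stop!" the woman!var3:00432123432123432885'
```

Like the perl version, this only understands balanced double quotes; an unmatched " would shift every later piece to the wrong side.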
    Last edited by MK27; 05-27-2010 at 10:01 AM.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  3. #3
    {Jaxom,Imriel,Liam}'s Dad Kennedy's Avatar
    Join Date
    Aug 2006
    Location
    Alabama
    Posts
    1,065
    ::sighs heavily:: I don't have perl.

    I may have a way to do this, however, without having to look at the fragments of the lines, but compare the whole lines themselves. The split would then be a header with a colon, then the data. This is easy enough as I can get the header through an awk script, then sed out the header for the data.
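A rough sketch of that header/data split, assuming each field has already been isolated and looks like name:value. It swaps the awk and sed steps for POSIX parameter expansion, which needs no external tools; split_field, NAME, and VALUE are hypothetical names.

```shell
# split_field FIELD: hypothetical helper; sets NAME and VALUE from "name:value".
split_field() {
    NAME=${1%%:*}     # strip the longest suffix starting at a colon: text before the first colon
    VALUE=${1#*:}     # strip the shortest prefix ending at a colon: later colons stay in the data
}

split_field 'var2:"Stop!" the woman'
echo "$NAME -> $VALUE"
```

Because only the first colon is treated as the separator, colons inside the data survive. A field with no colon at all would end up with NAME and VALUE both set to the whole field, so that case would need its own check.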

    And, yes, you do understand the problem and YES it is a pain in the butt to do this in a C app, much more so in "simple" bash scripting (I don't have a full blown bash as this is an embedded system).

  4. #4
    Woof, woof! zacs7's Avatar
    Join Date
    Mar 2007
    Location
    Australia
    Posts
    3,459
    awk to the rescue!

    Somewhat of a hack, but it works as far as my short testing goes:
    Code:
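    BEGIN {
       inq = 0;    # 1 while inside a quoted string
       RS="!";     # record separator
       FS="\n";
    }
    
    {
       # An odd number of quotes in this record opens or closes a quoted
       # string; a record holding a complete "..." pair leaves the state alone.
       tmp = $0;
       if(gsub(/"/, "", tmp) % 2) {
          inq = !inq;
       }
    
       if(inq) {
          printf "%s%s", $0, RS;   # still inside quotes: put the ! back
       }else{
          print $0;
       }
    }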
    Running it:
    Code:
    zac@breeze:cboard (0) $ cat line | awk -f test.awk 
    var0:0x9453
    var1:Some random string
    var2:"Stop!" the woman
    var3:00432123432123432885
    
    zac@breeze:cboard (0) $ cat line
    var0:0x9453!var1:Some random string!var2:"Stop!" the woman!var3:00432123432123432885

  5. #5
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Hmmm. Me likes the look of that. Should grok awk I guess.

    I think that BS about requiring multipass parsing I spouted earlier wuz because byte by byte is actually a pain in perl (so I wouldn't bother using a "state" flag) and I've never bothered to do this in C. After that I just stopped thinking.
    Last edited by MK27; 05-27-2010 at 09:13 PM.

  6. #6
    Woof, woof! zacs7's Avatar
    Join Date
    Mar 2007
    Location
    Australia
    Posts
    3,459
    I don't think it's a pain in the butt in a C program at all... Finite state machine anyone?

    Proof-of-concept:
    Code:
    zac@neux:cboard (0) $ cat line | ./fs
    var0:0x9453
    var1:Some random string
    var2:"Stop!" the woman
    var3:00432123432123432885
    zac@neux:cboard (0) $ cat fs.c
    #include <stdio.h>
    
    void magic(const char * line)
    {
       int state = 0;    /* start state */
    
       while(*line)
       {
          switch(state)
          {
             case 0:   /* outside quotes */
    
                if(*line == '"')
                {
                   state = 1;
                   putchar(*line);
                }else if(*line == '!'){
                   /* delimiter outside quotes ends the field; staying in
                      state 0 lets the next field start with a quote */
                   putchar('\n');
                }else{
                   putchar(*line);
                }
             break;
    
             case 1:   /* inside quotes: copy everything, ! included */
                putchar(*line);
    
                if(*line == '"')
                {
                   state = 0;
                }
             break;
    
             default:
                puts("Zac can't program");
          }
    
          ++line;
       }
    }
    
    int main(void)
    {
       char line[256];
    
       if(fgets(line, sizeof line, stdin) == NULL)
          return 1;   /* no input */
    
       magic(line);
    
       return 0;
    }
    It's the same principle for escaping...
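As a rough illustration of that, here is the same state idea with a backslash escape instead of quotes, written in awk since that is what the original poster has on the target. It is a sketch under assumptions: split_escaped is a hypothetical helper, and the input convention (a \ in front of any ! that belongs to the data) is invented for the example.

```shell
# split_escaped LINE: hypothetical helper; splits on ! unless it follows a backslash.
split_escaped() {
    printf '%s\n' "$1" | awk '{
        out = ""; esc = 0
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (esc)            { out = out c; esc = 0 }   # escaped character: copy literally
            else if (c == "\\") { esc = 1 }                # backslash: escape the next character
            else if (c == "!")  { print out; out = "" }    # bare delimiter: end of field
            else                { out = out c }
        }
        print out                                          # last field has no trailing delimiter
    }'
}

split_escaped 'var1:Stop\! now!var2:42'
```

The escape flag plays the role of state 1 in the C version, except that it lasts for exactly one character.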
    Last edited by zacs7; 05-27-2010 at 10:47 PM.

  7. #7
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Originally posted by zacs7:
    I don't think it's a pain in the butt in a C program at all... Finite state machine anyone?
    Probably the best idea.

  8. #8
    Registered User
    Join Date
    Oct 2008
    Location
    TX
    Posts
    2,059
    just my 2c
    Code:
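    awk -F! '{
       for (i=1;i<=NF;i++) {
          # an odd number of quotes in a fragment opens or closes a quoted
          # run, so toggle the flag instead of only ever setting it
          tmp = $i
          if (gsub(/"/, "", tmp) % 2) flag = !flag
          if (flag) printf("%s%s", $i, FS)   # inside quotes: keep the !
          else print $i
       }
    }' file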
    output
    Code:
    var0:0x9453
    var1:Some random string
    var2:"Stop!" the woman
    var3:00432123432123432885
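To get from one-field-per-line output like this back to the VARn variables the first post asked for, something along these lines might do. It is a sketch, not tried on the embedded shell in question: the here-document stands in for the real awk output, and the loop reads from a redirection because in many shells a pipe into while runs the loop in a subshell and the variables are lost.

```shell
i=0
while IFS= read -r field; do
    # eval only builds the variable name; \$field is assigned as-is, never re-parsed
    eval "VAR$i=\$field"
    i=$((i + 1))
done <<'EOF'
var0:0x9453
var1:Some random string
var2:"Stop!" the woman
var3:00432123432123432885
EOF

echo "$VAR2"
```

On the real system the here-document could be replaced by a temporary file holding the awk output.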
