diff parser

This is a discussion on diff parser within the C Programming forums, part of the General Programming Boards category.

  1. #1
    Registered User
    Join Date
    Jun 2008
    Posts
    8

    diff parser

    Hey,

    So I'm looking for a smarter approach to a diff parser I'm writing. I'm writing a custom beautification program that uses Artistic Style to handle part of its job.

    The program has to produce a standardized warning message for ugly code. But AStyle doesn't report the changes it makes, so I'm running diff on the before and after code to figure out what's changed.

    How would you do it?

    My pseudocode, if you're interested:

    Right now, I'm converting each line that diff highlights in the "before changes" code into a regex pattern. In my Python parser, I've got two lists: before_change and after_change. before_change contains the lines (as strings) of "before changes" code that diff produces, and after_change contains those of the "after changes" code.

    Basically, I convert every line in before_change into a regex pattern and check whether it matches any line in after_change. On a match, I delete both entries from the lists to prevent redundant matches. I also ignore blank lines and lines containing only curly braces.
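    One way to build the whitespace-tolerant pattern described above is sketched below. It assumes Python 3 and one source line per list entry; the helper names line_to_pattern, is_ignorable and pair_lines are hypothetical. Escaping each character with re.escape keeps "(", ")" and ";" literal, and re.fullmatch forces the pattern to cover the whole candidate line rather than a piece of a longer one.

```python
import re


def line_to_pattern(line):
    """Compile a regex that matches `line` with any amount of whitespace anywhere."""
    # Escape every non-whitespace character so '(' ')' ';' stay literal,
    # and allow optional whitespace between characters and at both ends.
    body = r"\s*".join(re.escape(ch) for ch in line if not ch.isspace())
    return re.compile(r"\s*" + body + r"\s*")


def is_ignorable(line):
    # Blank lines and lines made only of curly braces are skipped, as in the post.
    stripped = line.strip()
    return not stripped or set(stripped) <= set("{} \t")


def pair_lines(before_change, after_change):
    """Pair each before-line with the first after-line its pattern fully matches."""
    remaining = list(after_change)  # copy, so the caller's list is left alone
    pairs, unmatched = [], []
    for line in before_change:
        if is_ignorable(line):
            continue
        pattern = line_to_pattern(line)
        for i, candidate in enumerate(remaining):
            # fullmatch: the pattern must span the entire candidate line,
            # so "doFunction1(x) ;" cannot match inside a longer call.
            if pattern.fullmatch(candidate):
                pairs.append((line, candidate))
                del remaining[i]  # each after-line can be used only once
                break
        else:
            unmatched.append(line)
    return pairs, unmatched
```

    Because the pattern lets whitespace appear anywhere, it is effectively the same test as comparing "".join(line.split()) strings, which is simpler and faster. It still pairs each line with the first equal candidate, wherever that candidate sits in the list.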

    I'm not happy with my code. It's too verbose, and not as robust as I want. I can see possible issues with repeated keywords:
    1. Short lines like
    Code:
    doFunction1(x) ;
    would match up with both
    Code:
    doFunction1(x);
    doFunction2( doFunction1(x)) ;
    OR

    2. Simple lines like
    Code:
     } else {
    would match both
    Code:
    } else {
        } else { //(that are similar but belong to different groups of statements)
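    The second problem comes from pairing each line with the first match found anywhere in the list. A sketch of an order-preserving alternative, assuming Python's standard difflib module and whole-file line lists (normalize and align_whitespace_changes are hypothetical names): strip all whitespace from every line, let difflib.SequenceMatcher align the two sequences in order, and report aligned pairs whose raw text differs. A repeated line such as "} else {" is then matched by its position in the file, and lines are only ever compared whole.

```python
import difflib


def normalize(line):
    # Remove every whitespace character; what is left is the code itself.
    return "".join(line.split())


def align_whitespace_changes(before, after):
    """Return 1-based (before_line, after_line) pairs that differ only in whitespace.

    The lines are aligned in order, so a repeated line such as '} else {'
    is paired with the copy at the same position, not the first copy found.
    """
    a = [normalize(line) for line in before]
    b = [normalize(line) for line in after]
    # autojunk=False keeps frequent lines like '}' from being treated as junk.
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    changed = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            continue  # split, joined or genuinely different lines: not handled here
        for i, j in zip(range(i1, i2), range(j1, j2)):
            if before[i] != after[j]:  # same code, different whitespace
                changed.append((i + 1, j + 1))
    return changed
```

    The autojunk heuristic only kicks in for sequences of 200 or more items, but source files easily reach that size. Lines that AStyle splits or joins land in non-"equal" opcodes, so this sketch does not report them.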
    Any tips?

    Thanks.

  2. #2
    Salem (and the hat of wrongness)
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    32,762
    What's wrong with the output of the regular 'diff' program?
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.
    I support http://www.ukip.org/ as the first necessary step to a free Europe.

  3. #3
    Registered User
    Quote, originally posted by Salem:
    What's wrong with the output of the regular 'diff' program?
    I need the output format to be "Warning, LINE_NUM: blah blah blar blar".

  4. #4
    Salem (and the hat of wrongness)
    diff gives you the line numbers and the lines.
    It can't be hard to get to where you want to be from there surely?
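    Getting from diff's output to "Warning, LINE_NUM: ..." lines is mostly bookkeeping on the hunk headers. A minimal sketch, assuming unified-format input for a single file (diff -u prints this format, and Python's difflib.unified_diff produces a compatible layout). The function name warnings_from_unified_diff and its default message are hypothetical placeholders, since the thread never gives the real warning text.

```python
import difflib
import re

# Unified-diff hunk header, e.g. "@@ -12,3 +12,4 @@"; ",COUNT" is omitted when it is 1.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,\d+)? \+\d+(?:,\d+)? @@")


def warnings_from_unified_diff(diff_lines, message="code style does not match"):
    """Turn single-file unified-diff lines into 'Warning, LINE_NUM: ...' strings.

    LINE_NUM is the line number in the original (before) file. `message` is a
    hypothetical placeholder; substitute your own wording.
    """
    warnings = []
    old_line = None  # current line number in the before file
    for line in diff_lines:
        header = HUNK_RE.match(line)
        if header:
            old_line = int(header.group(1))
            continue
        if old_line is None:
            continue  # still in the '---' / '+++' file header
        if line.startswith("-"):
            warnings.append(f"Warning, {old_line}: {message}")
            old_line += 1
        elif line.startswith(" "):
            old_line += 1  # context line, present in both files
        # '+' lines exist only in the after file, and '\' lines are
        # "No newline at end of file" notes; neither advances old_line.
    return warnings


if __name__ == "__main__":
    before = ["int main(){", "return 0;", "}"]
    after = ["int main()", "{", "    return 0;", "}"]
    diff = difflib.unified_diff(before, after, "before.c", "after.c", lineterm="")
    for warning in warnings_from_unified_diff(diff):
        print(warning)  # Warning, 1: ... then Warning, 2: ...
```

    Skipping only the lines before the first hunk matters: a removed C line such as "--x;" shows up as "---x;" inside a hunk and must still count as a removal.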

  5. #5
    Registered User
    Nope, not very difficult. Just verbose and not very robust. I'm checking for whitespace changes using regex, but it's not very precise. I simply take one of the lines, turn it into a regex pattern, and match it against all the lines in the diff. Then I check for indentation differences.
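    Once a before/after pair has been matched, plain string comparisons are enough to tell what kind of whitespace change it was, with no regex involved. A sketch, assuming the two lines have already been paired; the function name and the category labels are hypothetical.

```python
def _leading(s):
    # Whitespace before the first non-whitespace character (the indentation).
    return s[: len(s) - len(s.lstrip())]


def _trailing(s):
    # Whitespace after the last non-whitespace character.
    return s[len(s.rstrip()):]


def classify_change(old, new):
    """Describe how a matched pair of lines differs (labels are hypothetical)."""
    if old == new:
        return "unchanged"
    if "".join(old.split()) != "".join(new.split()):
        return "code changed"  # the non-whitespace characters differ
    kinds = []
    if _leading(old) != _leading(new):
        kinds.append("indentation")
    if old.strip() != new.strip():
        kinds.append("spacing")  # whitespace inside the line changed
    if _trailing(old) != _trailing(new):
        kinds.append("trailing whitespace")
    return ", ".join(kinds)
```

    Every line is its indentation plus its stripped text plus its trailing whitespace, so when the non-whitespace characters agree, at least one of those three parts must differ and the result is never empty.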

    Do I have any other alternatives?

  6. #6
    Salem (and the hat of wrongness)
    Once you start messing with regexes, you're going to start matching a lot more things than you planned for.

    I've never encountered a problem with the robustness of diff. The algorithms used have been worked on for many, many years. True, occasionally, it does generate the occasional weird output when two similar changes are close to one another, but then you'd have the same problem anyway.

    As for verbosity, you'd need to do a bit of post-processing yourself. diff wasn't meant to give you an answer you could use "out of the box".

  7. #7
    Registered User
    Quote, originally posted by Salem:
    True, occasionally, it does generate the occasional weird output when two similar changes are close to one another, but then you'd have the same problem anyway.
    I'm in a pickle. Artistic Style only changes whitespace and newlines, so diff ends up comparing lines that no longer correspond. I diff'ed the before and after files and it's giving me total hell. Even though no characters have changed, it matches two blocks to one block, etc.

    This is a broad question: do you have any tips on tools I can use to overcome this?
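    Since AStyle only moves whitespace and newlines around, the non-whitespace characters of the two files should be identical and in the same order, which is exactly the case a line-based diff handles badly. A sketch of a different technique (not diff): walk both files' non-whitespace characters in lockstep, record which line each character sits on, and flag every before-line that was split, joined, or re-spaced. It assumes the only changes really are whitespace (it raises ValueError otherwise), compares characters rather than C tokens (so it would not notice, say, "int x" becoming "intx"), and ignores blank lines; the function names are hypothetical.

```python
def nonspace_lines(text):
    """Return the 1-based line number of every non-whitespace character, in order."""
    result = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        result.extend(lineno for ch in line if not ch.isspace())
    return result


def layout_changes(before_text, after_text):
    """Report before-lines that a pure reformat split, joined or re-spaced.

    Returns {before_lineno: [after_linenos]} for every changed line.
    Raises ValueError if anything other than whitespace changed.
    """
    if "".join(before_text.split()) != "".join(after_text.split()):
        raise ValueError("non-whitespace content differs; not a pure reformat")
    before_lines = before_text.splitlines()
    after_lines = after_text.splitlines()
    targets = {}  # before line -> after lines its characters landed on
    sources = {}  # after line -> before lines whose characters it holds
    # Same characters in the same order, so the two position lists pair up.
    for b, a in zip(nonspace_lines(before_text), nonspace_lines(after_text)):
        targets.setdefault(b, set()).add(a)
        sources.setdefault(a, set()).add(b)
    changed = {}
    for b, afters in targets.items():
        if len(afters) == 1:
            a = next(iter(afters))
            # Unchanged only if it maps to exactly one line, that line holds
            # nothing else, and the raw text (whitespace included) is identical.
            if sources[a] == {b} and before_lines[b - 1] == after_lines[a - 1]:
                continue
        changed[b] = sorted(afters)
    return changed
```

    For "void foo(){" reformatted into "void foo( )" followed by "{" on its own line, before-line 1 maps to after-lines 1 and 2, which is the one-line-to-two-lines case that confuses a line diff. Each reported line number can then be fed straight into the "Warning, LINE_NUM: ..." format.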

  8. #8
    Salem (and the hat of wrongness)
    So your styler can only ever change, say,
    void foo(){
    into
    void foo ( ) {

    and never into
    void foo( )
    {

  9. #9
    Registered User
    That makes me sad. But thanks.

  10. #10
    slingerland3g (Registered User)
    Join Date
    Jan 2008
    Location
    Seattle
    Posts
    603
    I think a combination of diff and indent along with Artistic Style may get you what you want, if I understand what you're trying to achieve.

    indent can reformat your .c files into a preferred style, such as K&R style (indent -kr) or GNU style (indent -gnu). Perhaps experimenting with its options will help?


Similar Threads

  1. Working with Parser Generators - Functions
    By jason_m in forum C Programming
    Replies: 1
    Last Post: 09-09-2008, 10:38 PM
  2. doubt in c parser coding
    By akshara.sinha in forum C Programming
    Replies: 4
    Last Post: 12-23-2007, 01:49 PM
  3. Parser Help
    By Barnzey in forum C++ Programming
    Replies: 10
    Last Post: 10-26-2005, 01:10 PM
  4. Problem with a file parser.
    By Hulag in forum C++ Programming
    Replies: 7
    Last Post: 03-17-2005, 09:54 AM
  5. Parser - need help!
    By thelma in forum C Programming
    Replies: 2
    Last Post: 04-05-2004, 09:06 PM
