Thread: extract numbers in strings with words, punctuation and numbers

  1. #1
    Registered User
    Join Date
    Dec 2007
    Posts
    31

    Hello,

    I have to read a file which, among other data, contains here and there lines with the format

    # word1 number, word2 number

    word1 and word2 are always the same words, the comma is always in the same place (after the first number), but the numbers change, and could have one, two, or three digits. I need to save those numbers as ints.

    Detecting the lines while reading the file is of course very easy, but extracting the numbers is proving much more difficult for me.

    I tried some very ad hoc code that treats the whole line as a string and has lots of if statements, but it is not working. I also found a post in this forum about using strtok for a situation like mine, but I am still confused. I should say that I am not a programmer, but someone who needs to write a C program for research, and who has learned by imitating examples and from the kind advice of people in forums like this.

    Any help would be much appreciated.

    Thanks,
    mc61

  2. #2
    Registered User hk_mp5kpdw's Avatar
    Join Date
    Jan 2002
    Location
    Northern Virginia/Washington DC Metropolitan Area
    Posts
    3,817
    fscanf or some combination of fgets/sscanf is often used for such things. Did you try that? What did you try?

  3. #3
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Code:
    (^ marks a space; the digits number the spaces)
     1     2       3     4
    #^word1^number,^word2^number
    I used the code tags just so I could show it with a better font.

    Do the position and number of spaces in each line give you some
    ideas about how this could be done? The ASCII code for a space is 32, btw.

    I'm thinking of a while (1) loop that just counts the blank spaces we encounter
    while going through the buffer holding this line of data. When the blank char count
    reaches two, we take the next several chars and convert them to their int value.
    Do the same when the blank char count == 4.

    I don't believe the commas are positioned as advantageously for your purpose as the blank
    characters are. YMMV.
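
    The space-counting idea could be sketched roughly like this. It is only a minimal sketch: the function name extract_by_spaces is made up for illustration, it uses a for loop rather than while (1), and it assumes exactly one space between fields, as in the layout above.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper: reads the integers that follow the 2nd and 4th
   space in a line shaped like "# word1 12, word2 345".
   Returns 1 on success, 0 if the line does not have that shape. */
int extract_by_spaces(const char *line, int *first, int *second)
{
    int spaces = 0;   /* blanks seen so far */
    int found = 0;    /* numbers stored so far */

    for (const char *p = line; *p != '\0'; p++) {
        if (*p != ' ')
            continue;
        spaces++;
        if (spaces == 2 || spaces == 4) {
            /* A digit must start right after this space. */
            if (!isdigit((unsigned char)p[1]))
                return 0;
            int value = atoi(p + 1);   /* stops at the first non-digit */
            if (spaces == 2)
                *first = value;
            else
                *second = value;
            found++;
        }
    }
    return found == 2;
}
```

    Calling extract_by_spaces(line, &a, &b) on a line read with fgets would fill a and b when the return value is 1. Lines with extra spaces or a different layout are rejected or misread, which is the main weakness of counting blanks by hand.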

    No sense me repeating what you've already tried. Post your attempt (working or not), and
    let's see what might be a good idea for this.
    Last edited by Adak; 12-20-2007 at 12:16 PM.

  4. #4
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    If you have formatted input, you should use the formatted input function, the little-known sscanf:
    Code:
    check = sscanf(line_from_file, "# Word1 %d, Word2 %d", &i, &j);
    If check==2, then you found two numbers. Works like a charm. (I'm assuming you already have a loop that reads successive lines into a string.)
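
    For completeness, here is a sketch of how that call might sit inside the line-reading loop. The names parse_pair and print_pairs and the 256-byte buffer are assumptions for illustration, and "word1"/"word2" stand in for the real fixed words in the file.

```c
#include <stdio.h>

/* Returns 1 and fills *i and *j if the line looks like
   "# word1 <int>, word2 <int>"; returns 0 otherwise.
   "word1" and "word2" are placeholders for the real, fixed words. */
int parse_pair(const char *line, int *i, int *j)
{
    return sscanf(line, "# word1 %d, word2 %d", i, j) == 2;
}

/* Reads fp line by line, prints every pair found, returns how many. */
int print_pairs(FILE *fp)
{
    char line[256];   /* assumed to be longer than any line in the file */
    int i, j, count = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        if (parse_pair(line, &i, &j)) {
            printf("%d %d\n", i, j);
            count++;
        }
    }
    return count;
}
```

    Lines that do not start with the expected text simply make sscanf return fewer than 2 conversions, so the other data in the file is skipped without any extra if statements.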

  5. #5
    Registered User
    Join Date
    Dec 2007
    Posts
    31
    Thank you all for the suggestions. The code

    Code:
    sscanf(line_from_file, "# Word1 %d, Word2 %d", &i, &j);
    worked like a charm indeed!!

    Thanks again,

    mc61

  6. #6
    Registered User ssharish2005's Avatar
    Join Date
    Sep 2005
    Location
    Cambridge, UK
    Posts
    1,732
    Or you could even try this:

    Code:
    sscanf(ptr, "# %*s %d, %*s %d", &i, &j);
    ssharish
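
    A small sketch of what the %*s version does, wrapped in a hypothetical function named parse_any_words. The * suppresses assignment, so each %*s skips one whitespace-delimited word, and suppressed fields are not counted in the return value of sscanf.

```c
#include <stdio.h>

/* Accepts "# <any word> <int>, <any word> <int>".
   Returns 1 and fills *i and *j on a match, 0 otherwise. */
int parse_any_words(const char *line, int *i, int *j)
{
    /* %*s reads a word and discards it; only the two %d fields
       are stored and counted, so a full match still returns 2. */
    return sscanf(line, "# %*s %d, %*s %d", i, j) == 2;
}
```

    The trade-off is that any two words are accepted, so this form is looser than spelling out the fixed words in the format string.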
