Thread: Array of getchar / count

  1. #16
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Code:
    int to_array_index(int c) {
        if (c >= 97 && c <= 122) /* ascii a - z */
        {
            return c - 97;
        }
        else if (c >= 65 && c <= 90)  /* ascii A - Z */
        {
            return c - 65;
        }
        else {
            return -1; /* error, not letter */
        }
    }
    Yeuch...
    Code:
    int to_array_index(int c) {
        if (c >= 'a' && c <= 'z') /* ascii a - z */
        {
            return c - 'a';
        }
        else if (c >= 'A' && c <= 'Z')  /* ascii A - Z */
        {
            return c - 'A';
        }
        else {
            return -1; /* error, not letter */
        }
    }
    is much easier to read, don't you think?

    Yes, I know that 97 is 'a', but why not let the compiler work it out for you? The one extra character you have to type in 5 out of the 6 places isn't really going to matter in the whole scheme of things, is it?
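
    For context, here is a minimal sketch of how the readable version might feed the letter counting this thread is about. The helper name count_letters is hypothetical, and the code still assumes a character set in which 'a'..'z' and 'A'..'Z' are each contiguous:

```c
#include <stddef.h>

/* Same mapping as above: 'a'/'A' -> 0 ... 'z'/'Z' -> 25, anything else -> -1.
   Assumes the letters are contiguous in the character set (true for ASCII). */
int to_array_index(int c) {
    if (c >= 'a' && c <= 'z')
        return c - 'a';
    if (c >= 'A' && c <= 'Z')
        return c - 'A';
    return -1; /* not a letter */
}

/* Hypothetical helper: tally the letters of a string into counts[0..25]. */
void count_letters(const char *s, int counts[26]) {
    for (size_t i = 0; i < 26; i++)
        counts[i] = 0;
    for (; *s != '\0'; s++) {
        /* Cast so that characters with the high bit set never go negative. */
        int idx = to_array_index((unsigned char)*s);
        if (idx >= 0)
            counts[idx]++;
    }
}
```

    The same loop body would work with getchar() in place of the string walk, stopping at EOF instead of '\0'.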

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  2. #17
    Why bbebfe is not bbebfe? bbebfe
    Join Date
    Nov 2008
    Location
    Earth
    Posts
    27

    You are actually right, it's much better now.
    Do you know why bbebfe is NOT bbebfe?

  3. #18
    C++ Witch laserlight
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Nonetheless note that matsp's modification still does not make the code absolutely portable unlike cas' suggestions, though it makes it slightly more portable and definitely more readable. On the other hand, such absolute portability can generally be ignored nowadays since Unicode is a superset of ASCII.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  4. #19
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by laserlight View Post
    Nonetheless note that matsp's modification still does not make the code absolutely portable unlike cas' suggestions, though it makes it slightly more portable and definitely more readable. On the other hand, such absolute portability can generally be ignored nowadays since Unicode is a superset of ASCII.
    To be clear, my solution was not an attempt to make the code more portable, but to make it more READABLE. I should perhaps have mentioned that in my post - for portability, we'd have to accept that the characters MAY not come in any particular order, and that there may be gaps between two letters in the alphabet.
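
    To make the "gaps" point concrete: the C standard guarantees that '0'..'9' have consecutive values, but makes no such promise for letters. A small sketch that checks the assumption at run time (the function name is_contiguous is made up for illustration):

```c
/* Returns 1 if each character in `alphabet` has a code exactly one greater
   than the character before it, 0 otherwise. An empty string counts as
   contiguous. */
int is_contiguous(const char *alphabet) {
    if (alphabet[0] == '\0')
        return 1;
    for (const char *p = alphabet; p[1] != '\0'; p++) {
        if ((unsigned char)p[1] != (unsigned char)p[0] + 1)
            return 0;
    }
    return 1;
}
```

    Calling is_contiguous("abcdefghijklmnopqrstuvwxyz") reports whether the `c - 'a'` trick is safe on the system the program is running on.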

    More so if we start looking at international languages, e.g. Scandinavian languages which have ä, å, ö - those are outside the range of 32-126 that ASCII printables are within. But if you wanted to count the number of occurrences of the letters in a Swedish text, they would be necessary. In that case, something like cas' code would be much better suited.
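
    As a rough illustration only (this is not cas' code), a table lookup extends to the Swedish letters without caring where they sit in the character set. The sketch assumes wide characters and writes the extra letters as universal character names so the source stays plain ASCII. The name swedish_index is hypothetical, and only lowercase is handled:

```c
#include <wchar.h>

/* Lowercase alphabet in Swedish order: a-z, then a-ring, a-umlaut, o-umlaut. */
static const wchar_t swedish_lower[] =
    L"abcdefghijklmnopqrstuvwxyz\u00e5\u00e4\u00f6";

/* Returns 0..28 for a lowercase Swedish letter, -1 for anything else. */
int swedish_index(wchar_t c) {
    const wchar_t *p;
    if (c == L'\0')
        return -1; /* wcschr would otherwise match the terminator */
    p = wcschr(swedish_lower, c);
    return p ? (int)(p - swedish_lower) : -1;
}
```

    Reading the input as wide characters (for example with getwchar() after setting a suitable locale) is a separate problem that this sketch does not address.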

    --
    Mats

  5. #20
    HelpingYouHelpUsHelpUsAll
    Join Date
    Dec 2007
    Location
    In your nightmares
    Posts
    223
    Portability was not mentioned in the OP, so I omitted it. If Marth_01 has this problem, then the last thing they would be thinking about is portability, or whether it works with Unicode and ASCII.
    Also, how do cas's suggestions differentiate between ASCII and EBCDIC or Unicode? If they don't, then you are being hypocritical.
    How about just supporting Unicode? It was meant to replace ASCII as the standard character set, so when (and if) it replaces ASCII, no one will use the other character sets and we will end up with absolute portability.
    long time no C; //seige
    You miss 100% of the people you don't C;
    Code:
    if (language != LANG_C && language != LANG_CPP)
        drown(language);

  6. #21
    Registered User
    Join Date
    Sep 2007
    Posts
    1,012
    Quote Originally Posted by P4R4N01D View Post
    Also, how do cas's suggestions differentiate between ASCII and EBCDIC or Unicode? If they don't, then you are being hypocritical.
    My code did not assume any ordering at all regarding the character set. I showed two methods. The first searched a sorted array of alphabetic characters and returned the index of the found character (or -1 if none was found). Since I controlled the order of the characters in the array, I was able to guarantee that 'a' yielded 0, 'b' yielded 1, and so on, without caring about what the actual values of 'a', 'b', etc. are.
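
    The original code is not repeated in this part of the thread, so the following is only a reconstruction of the idea, with made-up function names. It searches a string whose order is fixed in the source and returns the position found:

```c
#include <string.h>

/* Position of c within set, or -1 if it is not there. */
static int index_in(const char *set, int c) {
    const char *p;
    if (c == '\0')
        return -1; /* strchr would otherwise match the terminator */
    p = strchr(set, c);
    return p ? (int)(p - set) : -1;
}

/* 'a'/'A' -> 0 ... 'z'/'Z' -> 25, anything else -> -1.
   The result depends only on the order of the strings below,
   not on the numeric values of the characters. */
int to_array_index_lookup(int c) {
    int i = index_in("abcdefghijklmnopqrstuvwxyz", c);
    return i >= 0 ? i : index_in("ABCDEFGHIJKLMNOPQRSTUVWXYZ", c);
}
```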

    The second method was what you might call a "brute force" method. What it does is basically this: If the input is 'a', return 0. If the input is 'b', return 1. And so on. It again doesn't matter what values 'a' and 'b' have.
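
    Again as a reconstruction rather than the original code, the brute-force idea might look like this (the function name is hypothetical). Folding upper and lower case into one case label pair is an assumption, made to match the function discussed earlier in the thread:

```c
/* Each letter is matched by name, so the numeric values of the
   characters never enter into the result. */
int to_array_index_switch(int c) {
    switch (c) {
    case 'a': case 'A': return 0;
    case 'b': case 'B': return 1;
    case 'c': case 'C': return 2;
    case 'd': case 'D': return 3;
    case 'e': case 'E': return 4;
    case 'f': case 'F': return 5;
    case 'g': case 'G': return 6;
    case 'h': case 'H': return 7;
    case 'i': case 'I': return 8;
    case 'j': case 'J': return 9;
    case 'k': case 'K': return 10;
    case 'l': case 'L': return 11;
    case 'm': case 'M': return 12;
    case 'n': case 'N': return 13;
    case 'o': case 'O': return 14;
    case 'p': case 'P': return 15;
    case 'q': case 'Q': return 16;
    case 'r': case 'R': return 17;
    case 's': case 'S': return 18;
    case 't': case 'T': return 19;
    case 'u': case 'U': return 20;
    case 'v': case 'V': return 21;
    case 'w': case 'W': return 22;
    case 'x': case 'X': return 23;
    case 'y': case 'Y': return 24;
    case 'z': case 'Z': return 25;
    default:            return -1; /* not a letter */
    }
}
```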

    If you're going to build the code on an EBCDIC system (assuming it was written on an ASCII system) then you'd need to convert the source file itself, using iconv or something similar. But once it's converted, my code would work properly. Code that assumes ASCII would similarly have to be converted with iconv, but once it's converted, it won't work properly on the EBCDIC system.
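
    To show why the ASCII-assuming version breaks, here is a small simulation that can be run on any system. It hard-codes the EBCDIC code points for 'a'..'z', which fall into three runs with gaps after 'i' and after 'r'. Both function names are made up for the demonstration:

```c
/* EBCDIC code points of the lowercase letters. */
static const unsigned char ebcdic_lower[26] = {
    0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87, 0x88, 0x89, /* a-i */
    0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97, 0x98, 0x99, /* j-r */
    0xA2, 0xA3, 0xA4, 0xA5, 0xA6, 0xA7, 0xA8, 0xA9        /* s-z */
};

/* What `c - 'a'` would yield on an EBCDIC system for the n-th letter
   of the alphabet (n = 0 for 'a', n = 25 for 'z'). */
int naive_index_on_ebcdic(int n) {
    return ebcdic_lower[n] - ebcdic_lower[0];
}

/* What a table search yields for an EBCDIC code: its position in the
   table, or -1 if the code is not a lowercase letter. */
int lookup_index_on_ebcdic(unsigned char code) {
    for (int i = 0; i < 26; i++) {
        if (ebcdic_lower[i] == code)
            return i;
    }
    return -1;
}
```

    With these values the subtraction is right up to 'i' (index 8), but gives 16 for 'j' and 40 for 'z', which would run off the end of a 26-element count array. The range test `c >= 'a' && c <= 'z'` would also accept the non-letter codes that sit in the gaps.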

    I agree that this isn't the most necessary portability issue (as I pointed out in my original post), but at the very least I want to make sure that people know that C can run on non-ASCII systems. It's one thing to assume ASCII because you know your target systems are all ASCII; it's quite another (in my opinion) to assume ASCII because you aren't aware of other options.
