Array of getchar / count

**matsp** · 11-19-2008

Code:

int to_array_index(int c) {
    if (c >= 97 && c <= 122) /* ascii a - z */
    {
        return c - 97;
    }
    else if (c >= 65 && c <= 90)  /* ascii A - Z */
    {
        return c - 65;
    }
    else {
        return -1; /* error, not letter */
    }
}

Yeuch...

Code:

int to_array_index(int c) {
    if (c >= 'a' && c <= 'z') /* ascii a - z */
    {
        return c - 'a';
    }
    else if (c >= 'A' && c <= 'Z')  /* ascii A - Z */
    {
        return c - 'A';
    }
    else {
        return -1; /* error, not letter */
    }
}

is much easier to read, don't you think?

Yes, I know that 97 is 'a', but why not let the compiler work it out for you? The one extra character you have to type in 5 out of the 6 places isn't really going to matter in the whole scheme of things, is it?
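For context, here is a minimal sketch of how such a function might feed a 26-slot count array. `count_letters` is a hypothetical helper name, and it reads from a string rather than from `getchar()` to keep it easy to test. It also assumes a character set where the letters are contiguous, as in ASCII.

```c
#include <stddef.h>

/* Map a letter to 0..25, ignoring case; return -1 for anything else.
   Assumes 'a'..'z' and 'A'..'Z' are contiguous, which holds for ASCII. */
int to_array_index(int c) {
    if (c >= 'a' && c <= 'z') return c - 'a';
    if (c >= 'A' && c <= 'Z') return c - 'A';
    return -1;
}

/* Hypothetical helper: tally each letter of text into counts[0..25]. */
void count_letters(const char *text, int counts[26]) {
    size_t i;
    for (i = 0; i < 26; i++) counts[i] = 0;
    for (; *text != '\0'; text++) {
        int idx = to_array_index((unsigned char)*text);
        if (idx >= 0) counts[idx]++;  /* skip non-letters */
    }
}
```

The same loop body should work with input from `getchar()`: read into an `int`, stop at `EOF`, and pass each value to `to_array_index`.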

--
Mats

**bbebfe** · 11-19-2008

You are actually right, it's much better now.

**laserlight** · 11-19-2008

Nonetheless, note that matsp's modification still does not make the code absolutely portable, unlike cas' suggestions, though it makes it slightly more portable and definitely more readable. On the other hand, such absolute portability can generally be ignored nowadays since Unicode is a superset of ASCII.

**matsp** · 11-19-2008

To be clear, my solution was not an attempt to make the code more portable, but to make it more READABLE. I should perhaps have mentioned that in my post - for portability, we'd have to accept that the characters MAY not come in any particular order, and that there may be gaps between two letters in the alphabet.
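To make that assumption concrete, a program could check whether the letters really are contiguous before relying on `c - 'a'`. This is only a sketch, and `letters_are_contiguous` is a made-up name, not something from the thread.

```c
/* Return 1 if 'a'..'z' and 'A'..'Z' each occupy 26 consecutive codes in the
   execution character set, 0 otherwise. True for ASCII; false for EBCDIC,
   which has gaps after 'i' and 'r'. */
int letters_are_contiguous(void) {
    static const char lower[] = "abcdefghijklmnopqrstuvwxyz";
    static const char upper[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    int i;
    for (i = 0; i < 26; i++) {
        if (lower[i] != 'a' + i || upper[i] != 'A' + i) {
            return 0;
        }
    }
    return 1;
}
```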

More so if we start looking at international languages, e.g. Scandinavian languages which have ä, å, ö - those are outside the range of 32-126 that ASCII printables are within. But if you wanted to count the number of occurrences of the letters in a Swedish text, they would be necessary. In that case, something like cas' code would be much better suited.

--
Mats

**P4R4N01D** · 11-19-2008

Portability was not mentioned in the OP, so I omitted it. If Marth_01 has this problem, then the last thing they would be thinking about is portability or worrying whether it works w/ Unicode & ASCII.
Also, how do cas's suggestions differentiate between ASCII and EBCDIC or Unicode? If they don't, then you are being hypocritical.
How about just supporting Unicode? It was meant to replace ASCII as the standard character set, so when (& if) it replaces ASCII no one will use the other character sets & we will end up w/ absolute portability.

**cas** · 11-19-2008

Originally Posted by P4R4N01D

Also, how do cas's suggestions differentiate between ASCII and EBCDIC or Unicode? If they don't, then you are being hypocritical.

My code did not assume any ordering at all regarding the character set. I showed two methods. The first searched a sorted array of alphabetic characters and returned the index of the found character (or -1 if none was found). Since I controlled the order of the characters in the array, I was able to guarantee that 'a' yielded 0, 'b' yielded 1, and so on, without caring about what the actual values of 'a', 'b', etc. are.
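The first method might look something like the sketch below. The original code is not shown in this part of the thread, so the name `index_by_lookup`, the use of `strchr` for the search, and the case-insensitive handling are all assumptions.

```c
#include <string.h>

/* Sketch of a lookup-based mapping (hypothetical name). The position in the
   string, not the character code, decides the result. */
int index_by_lookup(int c) {
    static const char lower[] = "abcdefghijklmnopqrstuvwxyz";
    static const char upper[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const char *p;

    if (c == '\0') return -1;  /* strchr would otherwise match the terminator */
    if ((p = strchr(lower, c)) != NULL) return (int)(p - lower);
    if ((p = strchr(upper, c)) != NULL) return (int)(p - upper);
    return -1;                 /* not a letter */
}
```

Because the index comes from the position in `lower` or `upper`, gaps or reordering in the underlying character codes should not change the result.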

The second method was what you might call a "brute force" method. What it does is basically this: If the input is 'a', return 0. If the input is 'b', return 1. And so on. It again doesn't matter what values 'a' and 'b' have.
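A sketch of that brute-force idea, written as a `switch`. Again, `index_by_switch` is a hypothetical name and the folding of upper and lower case into one index is an assumption, since the original code is not reproduced here.

```c
/* Sketch of the brute-force mapping (hypothetical name): every letter is
   listed explicitly, so no ordering of character codes is assumed. */
int index_by_switch(int c) {
    switch (c) {
    case 'a': case 'A': return 0;
    case 'b': case 'B': return 1;
    case 'c': case 'C': return 2;
    case 'd': case 'D': return 3;
    case 'e': case 'E': return 4;
    case 'f': case 'F': return 5;
    case 'g': case 'G': return 6;
    case 'h': case 'H': return 7;
    case 'i': case 'I': return 8;
    case 'j': case 'J': return 9;
    case 'k': case 'K': return 10;
    case 'l': case 'L': return 11;
    case 'm': case 'M': return 12;
    case 'n': case 'N': return 13;
    case 'o': case 'O': return 14;
    case 'p': case 'P': return 15;
    case 'q': case 'Q': return 16;
    case 'r': case 'R': return 17;
    case 's': case 'S': return 18;
    case 't': case 'T': return 19;
    case 'u': case 'U': return 20;
    case 'v': case 'V': return 21;
    case 'w': case 'W': return 22;
    case 'x': case 'X': return 23;
    case 'y': case 'Y': return 24;
    case 'z': case 'Z': return 25;
    default:            return -1;  /* not a letter */
    }
}
```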

If you're going to build the code on an EBCDIC system (assuming it was written on an ASCII system) then you'd need to convert the source file itself, using iconv or something similar. But once it's converted, my code would work properly. Code that assumes ASCII would similarly have to be converted with iconv, but once it's converted, it won't work properly on the EBCDIC system.

I agree that this isn't the most necessary portability issue (as I pointed out in my original post), but at the very least I want to make sure that people know that C can run on non-ASCII systems. It's one thing to assume ASCII because you know your target systems are all ASCII; it's quite another (in my opinion) to assume ASCII because you aren't aware of other options.
