Thread: Pros and cons of using unsigned char for representing small numbers

  1. #1
    Chinese pâté foxman's Avatar
    Join Date
    Jul 2007
    Location
    Canada
    Posts
    404

    Pros and cons of using unsigned char for representing small numbers

    Hi,

    Here's a fictional example based on a real-world problem I had. Basically, I had a class with a data member whose value couldn't be greater than 9, and I was wondering what type I should have used in the class interface to represent this number.

    Ex.1: using unsigned char in the class interface to represent a number in the range [0, 9]
    Code:
    #include <iostream>
    
    class SmallInteger
    {
    public:
        unsigned char getValue() const
        {
            return value_;
        }
    
        void setValue(unsigned char v)
        {
            if (v <= 9)
                value_ = v;
        }
    
    private:
        unsigned char value_;
    };
    
    int main()
    {
        SmallInteger s;
    
        s.setValue(7);
        std::cout << s.getValue() << '\n';  // Incorrect: user gets a "beep" instead of "7"... at least on my system...
        std::cout << static_cast<unsigned int>(s.getValue()) << '\n';  // Correct... but painful
    
        return 0;
    }
    Ex.2: using unsigned int in the class interface to represent a number in the range [0, 9]
    Code:
    #include <iostream>
    
    class SmallInteger
    {
    public:
        unsigned int getValue() const
        {
            return value_;
        }
    
        void setValue(unsigned int v)
        {
            if (v <= 9)
                value_ = static_cast<unsigned char>(v);
        }
    
    private:
        unsigned char value_;
    };
    
    int main()
    {
        SmallInteger s;
    
        s.setValue(7);
        std::cout << s.getValue() << '\n';  // Correct!
    
        return 0;
    }
    I used unsigned int instead of unsigned char because it made the interface more "natural" and easier to use. But then, is there any disadvantage to doing so? And what about unsigned short?

    Should I (in most cases) stick to the "int" type in my interfaces even if the number manipulated can fit inside an unsigned short or an unsigned char AND the user is aware of the constraint? And performance-wise, is one way better than the other?

    Thanks.
    I hate real numbers.

  2. #2
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    unsigned int is most likely 4x larger than unsigned char. Other than that (and the consequences of that), I wouldn't think it's much different. Unsigned short is probably in between.

    Of course, the drawback of char types is that the stream output overloads for them print the character, not the number.

    It all depends on what you want to achieve, really. Usually, small integers are used to save space. If you don't actually save any space by using a larger integer, what is the point?

    You could of course write an output operator (operator<<) for SmallInteger [and an input operator >> as well], and move the painful cast into that function.
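
    A minimal sketch of the operator<< / operator>> idea above, assuming the SmallInteger class from Ex.1. The toText helper is hypothetical and only there to make the result easy to check, the default of 0 for value_ is an assumption (the original leaves it uninitialized), and rejecting out-of-range input by setting failbit is one possible design, not something the thread prescribes.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

class SmallInteger
{
public:
    unsigned char getValue() const { return value_; }

    // Values above 9 are silently ignored, as in Ex.1.
    void setValue(unsigned char v)
    {
        if (v <= 9)
            value_ = v;
    }

private:
    unsigned char value_ = 0;  // assumption: default to 0
};

// The cast lives here once, so callers never see the character overload.
std::ostream& operator<<(std::ostream& os, const SmallInteger& s)
{
    return os << static_cast<unsigned int>(s.getValue());
}

// Read a number (not a character) and reject anything outside [0, 9].
std::istream& operator>>(std::istream& is, SmallInteger& s)
{
    unsigned int v = 0;
    if (is >> v)
    {
        if (v <= 9)
            s.setValue(static_cast<unsigned char>(v));
        else
            is.setstate(std::ios_base::failbit);
    }
    return is;
}

// Hypothetical helper: format a SmallInteger through operator<<.
std::string toText(const SmallInteger& s)
{
    std::ostringstream out;
    out << s;
    return out.str();
}
```

    With this in place, std::cout << s prints the digit directly, and the class can keep unsigned char in its interface if the storage saving matters.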
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  3. #3
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    Code:
    std::cout << s.getValue() << '\n';  // Incorrect: user gets a "beep" instead of "7"... at least on my system...
    That is because the value 7 in the ASCII table is the BEL (bell) control character, which makes the terminal beep.
    http://www.idevelopment.info/data/Pr...ii_table.shtml
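
    A small sketch of why the two lines in Ex.1 behave differently. The function names asCharacter and asNumber are hypothetical; the only facts relied on are that streaming an unsigned char selects the character overload, and that unary plus promotes it to int.

```cpp
#include <sstream>
#include <string>

// Streams the value as-is: the character overload is chosen,
// so 7 is written as the single BEL character ('\a').
std::string asCharacter(unsigned char v)
{
    std::ostringstream out;
    out << v;
    return out.str();
}

// Unary plus promotes unsigned char to int, so the numeric
// overload is chosen and 7 is written as the text "7".
std::string asNumber(unsigned char v)
{
    std::ostringstream out;
    out << +v;
    return out.str();
}
```

    The unary plus is just a shorter spelling of the static_cast in Ex.1; either works.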

    Perhaps you can cast it before returning? (Return an unsigned int while storing it as an unsigned char.)

    *edit* Nevermind. Didn't read your whole post. */edit*
    Last edited by cyberfish; 08-18-2008 at 10:13 AM.

  4. #4
    Registered User
    Join Date
    Jan 2005
    Posts
    7,366
    Using int might be faster, as it is usually the natural word size for the machine. It is also clearer (as you noted), so I would prefer that option.

    The only reason I'd consider the char is if you had many instances of this class and memory consumption was a possible issue in your application. In that case it might be worth the extra effort to output the value properly.
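
    To get a feel for the memory side of that trade-off, here is a rough sketch. ByteBacked, IntBacked and storageFor are hypothetical names, and the actual sizes depend on the platform; on many common systems unsigned int is 4 bytes, but the language only guarantees that sizeof(unsigned char) is 1.

```cpp
#include <cstddef>

// Two stand-ins for SmallInteger that differ only in storage type.
struct ByteBacked { unsigned char value; };
struct IntBacked  { unsigned int  value; };

// Bytes needed for a contiguous array of `count` objects of type T.
template <typename T>
std::size_t storageFor(std::size_t count)
{
    return count * sizeof(T);
}
```

    Comparing storageFor<ByteBacked>(n) with storageFor<IntBacked>(n) for the number of instances you actually expect shows whether the saving is worth the awkward output.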

  5. #5
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    So it's basically a trade-off between memory consumption and computing speed.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  6. #6
    Registered User VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,607
    There is also the fact that requirements tend to grow. If the range of the number ever needs to expand, an unsigned char can only grow so far. Requirements often state that this or that will never exceed some limit, but planning for the 'variable' to grow may save you from having to go back and alter code later.

    If memory consumption is not really an issue then give yourself some 'buffer' room.

    Performance-wise, I would say there is next to no difference IF you allow the compiler to align your objects automatically. If you don't, there may be a difference between the two, because misaligned data is slower to access. Even then, you are unlikely to notice unless a lot of your data is misaligned, and unless you have proven that this is the cause of a slowdown, I would call it a moot point. Just let the compiler align the data for you and you should be fine. I rarely, if ever, consider performance as a reason to use a certain integral data type.
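
    A brief sketch of what compiler alignment does to the size question. The struct name Mixed is hypothetical, and the exact numbers are platform-dependent; on many common systems sizeof(Mixed) comes out as 8 rather than 5, because padding is inserted after the char so that the int is aligned.

```cpp
#include <cstddef>

// A small member followed by a larger one: the compiler normally
// inserts padding after `small` so that `big` is properly aligned.
struct Mixed
{
    unsigned char small;
    unsigned int  big;
};

// Bytes in Mixed that hold no data (padding), if any.
std::size_t paddingInMixed()
{
    return sizeof(Mixed) - (sizeof(unsigned char) + sizeof(unsigned int));
}
```

    In other words, using unsigned char for one member does not necessarily shrink the enclosing object; the saving shows up mainly in arrays of small values or when several small members sit next to each other.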
    Last edited by VirtualAce; 08-18-2008 at 02:46 PM.

  7. #7
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    The Alpha architecture doesn't have a byte load, only a qword load. Anything smaller needs to be masked after loading, which can be a significant performance hit for smaller data, even if it's aligned. (Also, because Alpha doesn't support unaligned loads, if the byte is not aligned it needs to be shifted, too.)

    Loading a qword is one instruction. Loading an aligned byte is two. Loading an unaligned byte is, I believe, five.
    Code:
    create aligned address
    load qword
    determine misalignment
    shift by appropriate amount
    mask
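
    The five steps above can be sketched in portable C++ roughly as follows. This is only an illustration of the shift-and-mask idea, not Alpha code: loadByte is a hypothetical name, memory is modelled as an array of 64-bit words, and byte 0 is assumed to be the least significant byte of each word.

```cpp
#include <cstddef>
#include <cstdint>

// Emulates a byte load on a machine that can only load whole qwords.
std::uint8_t loadByte(const std::uint64_t* words, std::size_t byteIndex)
{
    const std::size_t wordIndex = byteIndex / 8;                   // create aligned address
    const std::uint64_t qword = words[wordIndex];                  // load qword
    const unsigned offset = static_cast<unsigned>(byteIndex % 8);  // determine misalignment
    const std::uint64_t shifted = qword >> (8 * offset);           // shift by appropriate amount
    return static_cast<std::uint8_t>(shifted & 0xFF);              // mask
}
```

    When the byte is known to sit at offset 0, the misalignment and shift steps drop out and only the load and the mask remain, which matches the "two instructions" case described above.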

  8. #8
    Chinese pâté foxman's Avatar
    Join Date
    Jul 2007
    Location
    Canada
    Posts
    404
    Alright, thanks for those valuable insights. I appreciate it.

  9. #9
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by CornedBee View Post
    The Alpha architecture doesn't have a byte load, only a qword load. Anything smaller needs to be masked after loading, which can be a significant performance hit for smaller data, even if it's aligned. (Also, because Alpha doesn't support unaligned loads, if the byte is not aligned it needs to be shifted, too.)

    Loading a qword is one instruction. Loading an aligned byte is two. Loading an unaligned byte is, I believe, five.
    Code:
    create aligned address
    load qword
    determine misalignment
    shift by appropriate amount
    mask
    From what I understand, the EV56 (1996) had "BWX - Byte Word extension", so from that point instructions that read/write bytes and words existed in Alpha. http://en.wikipedia.org/wiki/DEC_Alpha

    But you are indeed correct in that some processors do not "like" certain sizes of data and it takes extra instructions to process "simple" data.

    --
    Mats
