Thread: Bitwise shifts

  1. #1
    Registered User
    Join Date
    Feb 2012
    Posts
    2

    Bitwise shifts

    Hello, I'm trying to write a simple "file encryption" type program, merely to teach myself a little bit about programming in C, but I keep getting strange errors in the output files from my program. What my program does is take a hexadecimal key on the command line (say, "F0BD5") and then reduce the ASCII values to actual binary values, (so 'F' (0x46) turns into 0x0F), and then "roll" each byte depending on the key value. For example, if the byte is 01101010b, and the key was 0x03, the original byte would roll to the right three places, so the output is 01001101b.The least significant three bits would be appended to the top. However, I'm having trouble with my "rolling" function. Here's my source code:
    Code:
    #include <stdio.h>
    #include <strings.h>
    
    
    #define    BLOCK    4096
    #define    DEBUG    1
    
    
    int        convertkey(void);
    char    encode(char, char);
    int        engine(char *);
    char    getkey(void);
    
    
    char    *key,    *infilename,    *outfilename,
            inbuffer[BLOCK],        outbuffer[BLOCK];
    
    
    int        kp=0,    maxkp;
    
    
    FILE    *infile,    *outfile;
    
    
    int main(int argc, char *argv[]) {
        if(argc!=5) {
            fprintf(stderr, "Need operation, key, input, and output files!\n");
            return 1;
        }
        key=argv[2];
        if((infile=fopen(argv[3], "rb"))==NULL) {
            fprintf(stderr, "Can't open input file.\n");
            return 2;
        }
        if((outfile=fopen(argv[4], "wb+"))==NULL) {
            fprintf(stderr, "Can't open output file.\n");
            return 3;
        }
        if((maxkp=convertkey())==0) {
            fprintf(stderr, "Invalid hexadecimal key.\n");
            return 4;
        }
        if(engine(argv[1])!=0) {    //send argv1 so the engine knows if encoding or decoding
            fprintf(stderr, "Write error in the engine.\n");
            return 5;
        }
    
    
        fclose(infile);
        fclose(outfile);
    
    
        return 0;
    }
    
    
    int convertkey(void) {
        int    i;
    
    
        for(i=0; i<strlen(key); i++) {
            if((key[i]>='0')&&(key[i]<='9')) {
                key[i]-='0';
                continue;
            }
            if((key[i]>='A')&&(key[i]<='F')) {
                key[i]-='A';
                continue;
            }
            if((key[i]>='a')&&(key[i]<='f')) {
                key[i]-='a';
                continue;
            }
            else {
                key[i]=0;
                break;
            }
        }
        return i;    //so the length doesn't need to be rechecked.
    }
    
    
    char getkey(void) {
        char    retval;
    
    
        if(kp==maxkp) kp=0;
        retval=key[kp];
        kp++;
    
    
        return retval;
    }
    
    
    int engine(char *operation) {
        int    i,    iolen;
        while((iolen=fread(inbuffer, 1, BLOCK, infile))!=0) {
            for(i=0; i<iolen; i++) {
                if(operation[0]=='e') outbuffer[i]=encode(getkey(), inbuffer[i]);
                else outbuffer[i]=encode((8-getkey()), inbuffer[i]);
            }
            if(fwrite(outbuffer, 1, iolen, outfile)!=iolen) return 1;
        }
        return 0;
    }
    
    
    char encode(char enckey, char encbyte) {
        char    usekey,    retval;
    
    
        usekey=enckey&0x07;
        retval=((encbyte>>usekey)|(encbyte<<(8-usekey)));
        if(enckey&0x08) retval=~retval;
        return retval;
    }
    To see what was happening, I made a file with the values from 0x00 to 0xFF in it, and "encrypted" it with a key of 0x01 (should roll to the right one, and place the last bit up top), but when I do this, as soon as the input value is 0x80 or above, the output always has the 0x80 bit set to 1.


    Ultimately, after encrypting and then decrypting, half the bytes come back correct, and the others are 0xFF. I just don't get why this is happening.
    Last edited by Sean Killian; 02-08-2012 at 02:28 PM.

  2. #2
    - - - - - - - - oogabooga's Avatar
    Join Date
    Jan 2008
    Posts
    2,808
    Right shift uses "sign extension" for signed values.
    Use unsigned values like unsigned char, etc.
    The cost of software maintenance increases with the square of the programmer's creativity. - Robert D. Bliss

  3. #3
    Registered User
    Join Date
    Nov 2010
    Location
    Long Beach, CA
    Posts
    5,909
    Quote Originally Posted by oogabooga View Post
    Right shift uses "sign extension" for signed values.
    Not necessarily correct. It's certainly not a good habit to get used to that behavior or think it portable. A similar issue came up a few days back, for which I dug up the standard, it's implementation defined*. Here's the post: Bit manipulation puzzles. Regardless, the solution is as you say:
    Use unsigned values like unsigned char, etc.
    *Only for signed types with negative values. Basically, the standard makes a mess of these operations.

  4. #4
    Registered User
    Join Date
    Feb 2012
    Posts
    2
    Thank you so much. That's certainly a pain! Thanks, though!

  5. #5
    Registered User
    Join Date
    Feb 2012
    Posts
    44
    Yes, assembler comes in handy here. Most instruction sets I've seen have Logical shift right (treat as unsigned) and Arithmetic shift right (propogate sign bit). I haven't done any shifting to see what the underlying asm code is for the PC platform.

  6. #6
    - - - - - - - - oogabooga's Avatar
    Join Date
    Jan 2008
    Posts
    2,808
    Thanks anduril. I didn't know that. I thought it was always compiled as an arithmetic shift for signed types.

    From the standard:
    [For] E1 >> E2 ... If E1 has a signed type and a negative value, the resulting value is implementation-defined.
    It's sad that it isn't defined as an arithmetic shift, since then you'd have (portable and direct) access to both types of shift by using signed or unsigned values. For that reason, I bet most implementations define it as arithmetic if they can.

    In the following program:
    Code:
    int main() {
        signed   char sc = 0xaa;
        unsigned char uc = 0xaa;
        sc >>= 1;
        uc >>= 1;
        return 0;
    }
    Using gcc on windows, sc >>= 1 compiled as
    sarb (%eax)
    which is "Shift Arithmetic Right Byte".

    and uc >>= 1 compiled as
    shrb (%eax)
    which is "SHift logical Right Byte".
    The cost of software maintenance increases with the square of the programmer's creativity. - Robert D. Bliss

  7. #7
    Registered User
    Join Date
    Nov 2010
    Location
    Long Beach, CA
    Posts
    5,909
    Quote Originally Posted by jlewand View Post
    Yes, assembler comes in handy here. Most instruction sets I've seen have Logical shift right (treat as unsigned) and Arithmetic shift right (propogate sign bit). I haven't done any shifting to see what the underlying asm code is for the PC platform.
    Assembler is generally a useful skill, but it, and the underlying architecture, have nothing* to do with implementation defined behavior (IDB). IDB has to do with the implementation (compiler and standard libraries). The standard only dictates that the implementation document how it works. By "PC platform", I assume you mean the x86 and x64 architectures. They do support an arithmetic shift right, which does sign extension, but other architectures might not. Thus, a compiler writer is fully justified in making >> always perform a logical shift right (without sign extension) even on signed data, and even when said data is negative, so long as it's documented.

    * I say "nothing" because technically it has nothing to do with it, though most compiler writers take the target architecture(s) into consideration when defining their implementation.

  8. #8
    Registered User
    Join Date
    Nov 2010
    Location
    Long Beach, CA
    Posts
    5,909
    Quote Originally Posted by oogabooga View Post
    It's sad that it isn't defined as an arithmetic shift, since then you'd have (portable and direct) access to both types of shift by using signed or unsigned values. For that reason, I bet most implementations define it as arithmetic if they can.
    Not every system uses 2's complement for negative numbers, so sign extension might not make sense in those cases, hence implementation defined or undefined behavior. Also, not every architecture defines arithmetic shifts. ARM for example, has an arithmetic shift right, which may be used, but it does not have an arithmetic shift left.

    Using gcc on windows
    That's the key right there, your compiler, libraries, etc that form the implementation. GCC on Linux *might* do something different (though doubtful). MSVC++, Pelles C, etc all get to chose how they want to handle that. They just have to document it.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Replies: 15
    Last Post: 10-06-2009, 11:20 PM
  2. Bit shifts
    By Buckshot in forum C++ Programming
    Replies: 13
    Last Post: 05-23-2005, 08:14 PM
  3. Bit Shifts
    By TheSquid in forum C Programming
    Replies: 2
    Last Post: 09-19-2004, 12:16 AM
  4. Circular Bit Shifts
    By Inquirer in forum C++ Programming
    Replies: 2
    Last Post: 08-31-2003, 02:04 PM
  5. bitwise shifts
    By sballew in forum C Programming
    Replies: 28
    Last Post: 12-01-2001, 12:59 AM

Tags for this Thread