Bitwise shifts

**Sean Killian** · 02-08-2012

Hello, I'm trying to write a simple "file encryption" type program, merely to teach myself a little bit about programming in C, but I keep getting strange errors in the output files from my program. What my program does is take a hexadecimal key on the command line (say, "F0BD5") and then reduce the ASCII values to actual binary values, (so 'F' (0x46) turns into 0x0F), and then "roll" each byte depending on the key value. For example, if the byte is 01101010b, and the key was 0x03, the original byte would roll to the right three places, so the output is 01001101b.The least significant three bits would be appended to the top. However, I'm having trouble with my "rolling" function. Here's my source code:

Code:

#include <stdio.h>
#include <strings.h>


#define    BLOCK    4096
#define    DEBUG    1


int        convertkey(void);
char    encode(char, char);
int        engine(char *);
char    getkey(void);


char    *key,    *infilename,    *outfilename,
        inbuffer[BLOCK],        outbuffer[BLOCK];


int        kp=0,    maxkp;


FILE    *infile,    *outfile;


int main(int argc, char *argv[]) {
    if(argc!=5) {
        fprintf(stderr, "Need operation, key, input, and output files!\n");
        return 1;
    }
    key=argv[2];
    if((infile=fopen(argv[3], "rb"))==NULL) {
        fprintf(stderr, "Can't open input file.\n");
        return 2;
    }
    if((outfile=fopen(argv[4], "wb+"))==NULL) {
        fprintf(stderr, "Can't open output file.\n");
        return 3;
    }
    if((maxkp=convertkey())==0) {
        fprintf(stderr, "Invalid hexadecimal key.\n");
        return 4;
    }
    if(engine(argv[1])!=0) {    //send argv1 so the engine knows if encoding or decoding
        fprintf(stderr, "Write error in the engine.\n");
        return 5;
    }


    fclose(infile);
    fclose(outfile);


    return 0;
}


int convertkey(void) {
    int    i;


    for(i=0; i<strlen(key); i++) {
        if((key[i]>='0')&&(key[i]<='9')) {
            key[i]-='0';
            continue;
        }
        if((key[i]>='A')&&(key[i]<='F')) {
            key[i]-='A';
            continue;
        }
        if((key[i]>='a')&&(key[i]<='f')) {
            key[i]-='a';
            continue;
        }
        else {
            key[i]=0;
            break;
        }
    }
    return i;    //so the length doesn't need to be rechecked.
}


char getkey(void) {
    char    retval;


    if(kp==maxkp) kp=0;
    retval=key[kp];
    kp++;


    return retval;
}


int engine(char *operation) {
    int    i,    iolen;
    while((iolen=fread(inbuffer, 1, BLOCK, infile))!=0) {
        for(i=0; i<iolen; i++) {
            if(operation[0]=='e') outbuffer[i]=encode(getkey(), inbuffer[i]);
            else outbuffer[i]=encode((8-getkey()), inbuffer[i]);
        }
        if(fwrite(outbuffer, 1, iolen, outfile)!=iolen) return 1;
    }
    return 0;
}


char encode(char enckey, char encbyte) {
    char    usekey,    retval;


    usekey=enckey&0x07;
    retval=((encbyte>>usekey)|(encbyte<<(8-usekey)));
    if(enckey&0x08) retval=~retval;
    return retval;
}

To see what was happening, I made a file with the values from 0x00 to 0xFF in it, and "encrypted" it with a key of 0x01 (should roll to the right one, and place the last bit up top), but when I do this, as soon as the input value is 0x80 or above, the output always has the 0x80 bit set to 1.

Ultimately, after encrypting and then decrypting, half the bytes come back correct, and the others are 0xFF. I just don't get why this is happening.

**oogabooga** · 02-08-2012

Right shift uses "sign extension" for signed values.
Use unsigned values like unsigned char, etc.

**anduril462** · 02-08-2012

Originally Posted by oogabooga

Right shift uses "sign extension" for signed values.

Not necessarily correct. It's certainly not a good habit to get used to that behavior or think it portable. A similar issue came up a few days back, for which I dug up the standard, it's implementation defined*. Here's the post: Bit manipulation puzzles. Regardless, the solution is as you say:

Use unsigned values like unsigned char, etc.

*Only for signed types with negative values. Basically, the standard makes a mess of these operations.

**Sean Killian** · 02-08-2012

Thank you so much. That's certainly a pain! Thanks, though!

**jlewand** · 02-08-2012

Yes, assembler comes in handy here. Most instruction sets I've seen have Logical shift right (treat as unsigned) and Arithmetic shift right (propogate sign bit). I haven't done any shifting to see what the underlying asm code is for the PC platform.

**oogabooga** · 02-08-2012

Thanks anduril. I didn't know that. I thought it was always compiled as an arithmetic shift for signed types.

From the standard:

[For] E1 >> E2 ... If E1 has a signed type and a negative value, the resulting value is implementation-defined.

It's sad that it isn't defined as an arithmetic shift, since then you'd have (portable and direct) access to both types of shift by using signed or unsigned values. For that reason, I bet most implementations define it as arithmetic if they can.

In the following program:

Code:

int main() {
    signed   char sc = 0xaa;
    unsigned char uc = 0xaa;
    sc >>= 1;
    uc >>= 1;
    return 0;
}

Using gcc on windows, sc >>= 1 compiled as
sarb (%eax)
which is "Shift Arithmetic Right Byte".

and uc >>= 1 compiled as
shrb (%eax)
which is "SHift logical Right Byte".

**anduril462** · 02-08-2012

Originally Posted by jlewand

Yes, assembler comes in handy here. Most instruction sets I've seen have Logical shift right (treat as unsigned) and Arithmetic shift right (propogate sign bit). I haven't done any shifting to see what the underlying asm code is for the PC platform.

Assembler is generally a useful skill, but it, and the underlying architecture, have nothing* to do with implementation defined behavior (IDB). IDB has to do with the implementation (compiler and standard libraries). The standard only dictates that the implementation document how it works. By "PC platform", I assume you mean the x86 and x64 architectures. They do support an arithmetic shift right, which does sign extension, but other architectures might not. Thus, a compiler writer is fully justified in making >> always perform a logical shift right (without sign extension) even on signed data, and even when said data is negative, so long as it's documented.

* I say "nothing" because technically it has nothing to do with it, though most compiler writers take the target architecture(s) into consideration when defining their implementation.

**anduril462** · 02-08-2012

Originally Posted by oogabooga

It's sad that it isn't defined as an arithmetic shift, since then you'd have (portable and direct) access to both types of shift by using signed or unsigned values. For that reason, I bet most implementations define it as arithmetic if they can.

Not every system uses 2's complement for negative numbers, so sign extension might not make sense in those cases, hence implementation defined or undefined behavior. Also, not every architecture defines arithmetic shifts. ARM for example, has an arithmetic shift right, which may be used, but it does not have an arithmetic shift left.

Using gcc on windows

That's the key right there, your compiler, libraries, etc that form the implementation. GCC on Linux *might* do something different (though doubtful). MSVC++, Pelles C, etc all get to chose how they want to handle that. They just have to document it.

Thread: Bitwise shifts

Thread Tools

Search Thread

Display

Bitwise shifts

Similar Threads

C program to convert a decimal number to hex using masks and shifts

Bit shifts

Bit Shifts

Circular Bit Shifts

bitwise shifts

Tags for this Thread