Thread: Optimizing hot section of code.

    Maybe use a lea instruction? I'm still working on my analytical code before diving into converting it to assembly, so I'm not sure. Also, I noticed you used a jz instead of a jmp. Why was that?

    I managed to "optimize" it even further into this stripped-down version.
    Problem is, there was no change in performance. I can only assume this is what
    the compiler was already generating. flp1969 is right that compilers have gotten
    really, really good at optimizing code. Really.

    The only way I can see to make this faster is to eliminate the right shift operation,
    but then I would have to make the array humongous to accommodate the 64-bit indices.
    I assume that alone would hurt the performance of other sections of the program
    that work just fine now because they fit nicely in the cache.
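
To put rough numbers on "humongous": if a table were indexed by the masked but unshifted field, i.e. `block & (0x3f << shift)`, it would have to cover every index up to that mask. The following is only a minimal sketch, assuming the 6-bit groups sit in the top 48 bits of the block (shifts 58 down to 16 in steps of 6). `unshifted_table_entries` is a hypothetical helper, not part of the posted code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of entries a lookup table would need if it
 * were indexed directly by (block & (0x3f << shift)), with no right shift.
 * The largest possible index is the mask itself, so the table needs
 * mask + 1 entries.
 */
static uint64_t unshifted_table_entries(unsigned shift)
{
    /* Assumption: the highest 6-bit group starts at bit 58 of a 64-bit block. */
    assert(shift <= 58);
    return ((uint64_t)0x3f << shift) + 1;
}
```

Under those assumptions, the lowest group alone (shift 16) would need about 4.1 million entries, and the highest (shift 58) roughly 1.8 × 10^19, compared with 64 entries per table in the shifted form.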

    TBH, I didn't bother to test it. As it is now, I'm quite close to OpenSSL's DES
    performance, and my base64 encoding is over 100% faster. Considering that my code,
    although by no means perfect, is actually readable, and that the OpenSSL code base
    is an unholy mess (case in point), I'd say I'm pretty happy with the result.

    static inline uint32_t    compress(const uint64_t block)
    {
        register uint32_t    compressed;

        /* Each 6-bit group of the block indexes one S-box; the 4-bit
           results are packed into the output from the top nibble down. */
        compressed = 0;
        compressed |= (uint32_t)g_sboxes[0][block >> 58] << 28;
        compressed |= (uint32_t)g_sboxes[1][(block >> 52) & 0x3f] << 24;
        compressed |= (uint32_t)g_sboxes[2][(block >> 46) & 0x3f] << 20;
        compressed |= (uint32_t)g_sboxes[3][(block >> 40) & 0x3f] << 16;
        compressed |= (uint32_t)g_sboxes[4][(block >> 34) & 0x3f] << 12;
        compressed |= (uint32_t)g_sboxes[5][(block >> 28) & 0x3f] << 8;
        compressed |= (uint32_t)g_sboxes[6][(block >> 22) & 0x3f] << 4;
        /* Shift is 16 (not 18) so the groups stay 6 bits apart. */
        compressed |= (uint32_t)g_sboxes[7][(block >> 16) & 0x3f];
        return (compressed);
    }

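
For anyone who wants to check the bit layout in isolation, here is a rough, self-contained sketch of the same lookup written as a loop. It assumes the eight 6-bit groups occupy the top 48 bits of the block, 6 bits apart. `demo_sboxes`, `demo_init`, `field6`, `compress_loop` and `pack6` are hypothetical names, and the table contents are placeholders, not the real DES S-boxes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for g_sboxes: 8 tables of 64 four-bit outputs.
 * These are NOT the real DES S-boxes; demo_init() fills them with
 * (i + v) & 0xf only so the bit layout can be checked. */
static uint8_t demo_sboxes[8][64];

static void demo_init(void)
{
    for (int i = 0; i < 8; i++)
        for (int v = 0; v < 64; v++)
            demo_sboxes[i][v] = (uint8_t)((i + v) & 0xf);
}

/* Extract 6-bit group i (0 = most significant), assuming the 48 input
 * bits sit in the top 48 bits of the 64-bit block. */
static inline unsigned field6(uint64_t block, int i)
{
    assert(i >= 0 && i < 8);
    return (unsigned)((block >> (58 - 6 * i)) & 0x3f);
}

/* Loop form of the lookup: the 4-bit output of table i lands in
 * nibble 7 - i of the 32-bit result. */
static uint32_t compress_loop(uint64_t block)
{
    uint32_t out = 0;

    for (int i = 0; i < 8; i++)
        out |= (uint32_t)demo_sboxes[i][field6(block, i)] << (28 - 4 * i);
    return out;
}

/* Pack eight 6-bit values into the assumed layout (inverse of field6). */
static uint64_t pack6(const unsigned v[8])
{
    uint64_t block = 0;

    for (int i = 0; i < 8; i++)
        block |= (uint64_t)(v[i] & 0x3f) << (58 - 6 * i);
    return block;
}
```

Packing eight known values with `pack6` and reading them back with `field6` is a quick way to confirm that no two groups overlap. A compiler will most likely unroll the loop into the same code as the hand-written version, so this is meant for testing the layout rather than for speed.
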
    Next I have to implement the RSA section, which entails recreating the genrsa,
    rsa and rsautl OpenSSL commands with all their options, as well as creating my own
    random number generator. I fear I'm not quite up to the task.
