Thread: I'm building my xor encryptor, help with nul character

  1. #1
    Registered User
    Join Date
    Aug 2007
    Posts
    85

    I'm building my xor encryptor, help with nul character

    Well, i noticed someone was building a xor encryptor but wasn't too successful, i decided to take up the challenge.

    The program itself is built and it works, however it has one big problem. Whenever the current key character is the same as the current 'clean' character, I get a \0 (pretty "doh" if you ask me, since that's the purpose of XOR). The big problem is that a nul character is also the char terminator.

    In my whole program, I'm using a single char variable, the input buffer while reading the file.

    Code:
            char ch;
            while( fin.get( ch ) ) {
                filedata.append( &ch );
                fsize++;
            }
    However, it doesn't want to write anything in filedata when it hits a 0.

    My input file holds the letters from a to z. my key is "abcde" (repeated over).
    So, my first file (clean) is 26 bytes. The encrypted one is 26 bytes. So the data from the string, holding nul characters is written down.

    The problem is that my decrypted file (which i get by running the program again through the last generated file) is 21 bytes, and it starts from f. My logical explanation is that the input buffer skips the null characters, believing that it holds a zero-length string.

    What can I do?

  2. #2
    Registered User
    Join Date
    Aug 2007
    Posts
    85
    Solved, thanks to swoopy in another thread.
    New file reading sequence looks like this:
    Code:
            char ch;
            while( fin.get( ch ) ) {
                filedata += ch;
                fsize++;
            }
    This one looks more binary safe.
    I still wonder if there's a way to use get() with a string.

  3. #3
    Registered User
    Join Date
    Jan 2005
    Posts
    7,366
    You can't read binary data directly into a string, because the string doesn't expose it's internal data storage.

    You could use a vector<char> and read into that. Will you always know the number of bytes in the file? If so, you can read them all in at one time. You can then assign the contents of the vector<char> buffer to your string, or just leave them in there and work on the vector.

  4. #4
    Registered User
    Join Date
    Aug 2007
    Posts
    85
    Done!
    Anyone who has some time and experience, do check out the code and let me know what do you think of my coding style/errors/etc.

    Code:
    #include <cstdlib>
    #include <iostream>
    #include <string>
    #include <fstream>
    
    using namespace std;
    
    void usage() {
            cout << "xorc - XOR enCrypt" << endl << endl;
            cout << "usage: xorc [-parameters] data|input_file key|key_file out_file" << endl;
            cout << "example: xorc -iork input.txt ignored out.txt";
            cout << endl << "xorc encrypts data with key if no parameters specified by XOR-ing them together\n";
            
            cout << endl << "Parameters:" << endl << endl << "i - second argument is treated as input file";
            cout << endl << "k - third argument is treated as key file";
            cout << endl << "    by default it adds a .xor after the extension, or removes it if found";
            cout << endl << "x - replace with input file. remember to forget the key";
            cout << endl << endl << endl << "             Coded by izua, 11-aug-2007";
    }
    
    string xorstr(string data, string key) {
    //this function XORs data with key, by repeating key if needed  
        int klen = key.length();
        int dlen = data.size();     
        int z;
        string result; 
        
    //process every character of data
        for (z=0; z < dlen; z++) {
            result += (key[z % klen] ^ data[z]);
            //omg. this does the whole thing
            //key[z % klen] gets the coresponding key-character for the current data character
            //(repeats it if it's shorter than data. that is xored with current data character
            //to get the result, which is added to a variable, called "result". lol @ above
        }
        
    //we're done, return
        return(result);
    }
    
    string readfile(string filename) {
    //reads a named file
        string filedata;
        //open the file in binary
        ifstream fin(filename.c_str(),ios::binary);
        char ch;
        
        if (!fin.good()) {
        //not good
            cout << "\bError: Unable to open " << filename << endl;
            fin.close();
            return "xorcerr-can't open";
        }
        
        //get each character
        while( fin.get( ch ) ) {
            filedata += ch;
        }
      
        //done! cleanup and return
        fin.close();
        return filedata;
    }
    
    int main(int argc, char *argv[])
    {
    //main
        bool    arg_i=false;
        bool    arg_o=false;
        bool    arg_k=false;
        bool    arg_t=false;
        bool    arg_r=false;
        bool    arg_x=false;
        bool    arg=false;
        
        string  key = "";
        string  filename = "";
        string  buf = "";
        string  encdata = "";
        string  data = "";    
    
        //first, we check to see if we have parameters
        string temp;
        for (int z=0; z<argc; z++) {
            temp=argv[z];
            if (temp.substr(0,1) == "-") {
                arg=true;
                //set flags according to what parameters are passed
                arg_i=(temp.find("i") != string::npos)?true:false;
                arg_k=(temp.find("k") != string::npos)?true:false;
                arg_x=(temp.find("x") != string::npos)?true:false;
                
                //set vars according to determined parameter positions     
            }
        }
        
        //cout << "i" << arg_i << "k" << arg_k << "x" << arg_x << endl;
        
        //check if parameters are correct
        if (argc<4 or argc>5) {
        //instruct the user how to do this, than terminate the app
            usage();
            goto hell;
        } else {
            if ((argc == 4 and arg == false) or argc == 5) {arg_o=true;}
        
        //start doing something useful
         
    //read input file (if any)
            if (arg_i) {
                cout << "Reading " << argv[2] << " into input" << endl;
                data=readfile(argv[2]);
                if (data=="xorcerr-can't open") goto hell;  
                if (data.length()<1) {cout << "Error: input file is invalid"; goto hell; }   
            } else {
                //nope, we're not reading a file. input is second parameter if we have arguments, first if not
                data=(arg)?argv[2]:argv[1];   
            }
    
    //read key file if required
            if (arg_k) {
                cout << "Reading " << argv[2] << " into keyfile" << endl;
                key=readfile(argv[3]);
                if (key=="xorcerr-can't open") goto hell;
                if (key.length()<1) {cout << "Error: the keyfile is invalid"; goto hell; }
            } else {
                //nope, we're not reading a file. input is second parameter if we have arguments, first if not
                key=(arg)?argv[3]:argv[2];   
            }
            
    
    //encrypt it. whoa. really long line
            encdata = xorstr(data, key);
    
    
    //this whole bunch of code computes a new filename - derived from input file if any, as output file if given
    //or raises an error if no file given
    //get filename passed (if any)
        if (arg_o) {
            //passed filename if supplied
            filename=(arg)?argv[4]:argv[3];
        } else if (arg_i and !arg_o) {
            //input filename.xor if no output supplied
            filename=((arg)?argv[2]:argv[1]);
            //check if filename already has xor extension
            if (filename.substr(filename.length()-4,4) == ".xor") {
                //yes, it has. remove it
                filename = filename.substr(0,filename.length()-4);   
            } else {
                //neah, it doesn't have. add it
                filename+=".xor"; 
            }
        } else if (!arg_i and !arg_o) {
            cout << "Error: no file specified";
            goto hell;
            //error if neither input or output specified
        }
    
    
    
    
    
        //open the stream and write data
        ofstream fout(filename.c_str(),ios::binary);
        fout << encdata;
        fout.close();
            
        //announce end
        cout << "Data written to " << filename << "!" << endl;
            
    
        }
     		
        return EXIT_SUCCESS;
        
    hell:
        return EXIT_FAILURE;
    }

  5. #5
    Hurry Slowly vart's Avatar
    Join Date
    Oct 2006
    Location
    Rishon LeZion, Israel
    Posts
    6,788
    When passing string to function and do not plan to change it inside - probably better pass it as
    const reference to avoid unneeded copy like:
    Code:
    std::string xorstr(const std::string& data, const std::string& key);
    All problems in computer science can be solved by another level of indirection,
    except for the problem of too many layers of indirection.
    – David J. Wheeler

  6. #6
    Registered User
    Join Date
    Aug 2007
    Posts
    85
    what's the advantage of using const strings over normal strings in that case?

  7. #7
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by _izua_ View Post
    what's the advantage of using const strings over normal strings in that case?
    It tells the compiler and anyone else reading the code that it's guaranteed that the string is going to be unchanged in this function. The compiler is helped to generate better code, whilst the reader understands better what happens to the string in the function. Also, the compiler will tell you if you try modifying a const variable, which can be useful if you do want to be sure that your variable isn't changed in the function.

    --
    Mats

  8. #8
    Registered User MacNilly's Avatar
    Join Date
    Oct 2005
    Location
    CA, USA
    Posts
    466

    Here's my version

    I got bit by the XOR bug, so here's my version. Tested for .MPG and .JPG and .TXT files so far.

    Code:
    // Simple XOR encryption program
    // Command line usage:
    // xorc <key> <file(s)>
    // Program will encrypt and overwrite specified files
    
    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    
    static const int MAX_KEY_LEN = 10; /* Maximum key string length */
    
    int main(int argc, char **argv)
    {
      char key[MAX_KEY_LEN + 1];
      int k;
      unsigned keylen;
      FILE *infile;
      int i;
      char c;
      
      if (argc == 1) { /* Didn't supply key or files */
        printf("xorc - XOR enCrypt\nProgrammed by Damon McNeill\nUsage: xorc <key> <file(s)>\n");
        exit(1);
      }
      else if (argc == 2) { /* Didn't supply input files */
        printf("No input file(s). Run \"xorc\" for help.\n");
        exit(1);
      }
      if (strlen(argv[1]) > MAX_KEY_LEN) { /* Key too long */
        printf("Key is limited to %u characters.\n", MAX_KEY_LEN);
      }
    
      /* Store the key and it's length */
      strncpy(key, argv[1], MAX_KEY_LEN);
      keylen = strlen(key);
      
      /* Process all input files */
      for (i = 2; i < argc; ++i) {
        printf("Processing %s", argv[i]);
        fflush(stdout);
        infile = fopen(argv[i], "r+b"); /* Open file for binary read/write */
        if (!infile)
          printf("Could not open file.\n");
        else {
          /* Find the file size in bytes */
          fseek(infile, 0, SEEK_END); /* Go to end of file */
          long size = ftell(infile); /* Get the number of bytes */
          fseek(infile, 0, SEEK_SET); /* Go back to beginning */
          long tenth = size / 10;
          k = 0;
          /* Read one byte at a time from input file */
          while (fread(&c, 1, 1, infile) == 1) {
            long curr = ftell(infile); /* Get the current position */
            /* Print a . character every 10% through the file */
            if (curr % tenth == 0) { 
              putchar('.');
              fflush(stdout);
            }
            c = c ^ key[k]; /* Encrypt byte */
            k = k < keylen ? k + 1 : 0;  /* Keep key index in range */
            fseek(infile, -1, SEEK_CUR); /* Overwrite byte last read */
            if (fwrite(&c, 1, 1, infile) != 1) /* Write encrypted byte */
              break; /* Opps! an error occured */
          }
          if (feof(infile)) /* Made it to end of file with no errors */
            printf("Done.\n");
          else if (ferror(infile)) /* An error was encountered */
            perror(argv[i]); /* Print what the error was */
          fclose(infile); /* Close the file */
        }
      }
      return (EXIT_SUCCESS);
    }
    One problem I have yet to solve is if there is an error writing or reading the file while in the process, the resulting file will be corrupted with no access to the original file. For this reason, if you use this code, backup your file before using xorc.

    Also credit to OP for the filename

    If anyone could take a look at this and suggest an optimization (it runs really slow on multi-megabyte files) that would be awesome.

  9. #9
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    > strncpy(key, argv[1], MAX_KEY_LEN);
    This doesn't always append a \0, and since your key isn't pre-filled with \0, you have the potential of a very nasty bug.
    It's also completely unnecessary as you could just have said
    char *key = argv[1];
    and strip out all the artificial key length restrictions.

    > printf("Could not open file.\n");
    When you decide to make the program quieter, by not printing the name of every single file processed, then this should print the filename. Ditto on the perror() call later.

    The real performance killer is read/write one byte at a time. Use a BUFSIZ buffer instead.
    Doing a &#37; every single time as well is expensive, in comparison to the rest of the work inside the loop.

    > backup your file before using xorc.
    Why can't you do it in your code.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  10. #10
    Registered User MacNilly's Avatar
    Join Date
    Oct 2005
    Location
    CA, USA
    Posts
    466
    Quote Originally Posted by Salem View Post
    > strncpy(key, argv[1], MAX_KEY_LEN);
    This doesn't always append a \0, and since your key isn't pre-filled with \0, you have the potential of a very nasty bug.
    It's also completely unnecessary as you could just have said
    char *key = argv[1];
    and strip out all the artificial key length restrictions.
    Noted. Although there was supposed to be an exit(1) command if the key argument was not less than MAX_KEY_LEN. If that is there then that nasty bug would be avoided.

    > printf("Could not open file.\n");
    When you decide to make the program quieter, by not printing the name of every single file processed, then this should print the filename. Ditto on the perror() call later.
    I like it telling the user which file is being processed and how far into each file it is (because it can take so long for large files, good to know it's not hanging)

    The real performance killer is read/write one byte at a time. Use a BUFSIZ buffer instead.
    Doing a % every single time as well is expensive, in comparison to the rest of the work inside the loop.
    Yes the division is expensive and def. slows down the prog.. I must remove the progress indicator code or think of a faster way to do it...

    So I should read a BUFSIZ char array each iteration and also write the same (of course)?

    > backup your file before using xorc.
    Why can't you do it in your code.
    I could, use a tempfile from the C std library. This would slow the code down even further tho.

  11. #11
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    > So I should read a BUFSIZ char array each iteration and also write the same (of course)?
    Yes, but use the result of fread() to determine how much of the buffer to use.

    > This would slow the code down even further tho.
    Perhaps, there are a lot of factors to take into consideration.
    The benefit of using a temp file is that you don't need all those fseek() calls, and the original file is untouched until you know you've encrypted it.

    If the temp file is on the same disk / controller, then there is going to be some thrashing along the way. A completely separate temp disk for "/tmp" would be significantly better.

    One advantage of the method you have at the moment is that there is no trace of the original file on the disk. If you do use a temp file approach, you need to ensure that you don't do something simple like remove(oldfile) without first trashing the contents of the file.

    Some element of speed is going to be lost anyway if you need to make sure that the original file exists in case of any failure, and is thoroughly destroyed in the case of success.

    > I must remove the progress indicator code or think of a faster way to do it...
    if ( ++count == tenth ) count = 0; // and other stuff
    is what I use in such cases.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  12. #12
    Registered User MacNilly's Avatar
    Join Date
    Oct 2005
    Location
    CA, USA
    Posts
    466
    Cool, good ideas there. Thanks for the help.

  13. #13
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by Daved View Post
    You can't read binary data directly into a string, because the string doesn't expose it's internal data storage.
    Depends what you mean by "directly into." A back_inserter is pretty efficient, even when applied to a string.

  14. #14
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    This is a reasonably efficient routine to raw-read an entire file into memory. Part of my standard bag:

    Code:
    std::istream &load_file_raw(std::istream &stream_p, std::string &data_p, size_t bufsize_p = 128*1024)
    {
       boost::scoped_array<char> buff(new char[bufsize_p]);
       char *bptr = buff.get();
       while(stream_p)
       {
          stream_p.read(bptr, bufsize_p);
          data_p.append(bptr, stream_p.gcount());
       }
       return stream_p;
    }
    The default buffer size is 128K, you can change as necessary. Just how efficient this is depends on your standard library. Most I've seen do a decent job.

  15. #15
    Registered User
    Join Date
    Aug 2007
    Posts
    85
    darn I have to make boost work in my IDE. I'm missing out so many cool features, starting with regular expressions

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Replies: 11
    Last Post: 10-07-2008, 06:19 PM
  2. <string> to LPCSTR? Also, character encoding: UNICODE vs ?
    By Kurisu33 in forum C++ Programming
    Replies: 7
    Last Post: 10-09-2006, 12:48 AM
  3. Character handling help
    By vandalay in forum C Programming
    Replies: 18
    Last Post: 03-29-2004, 05:32 PM
  4. character occurrence program not working
    By Nutshell in forum C Programming
    Replies: 6
    Last Post: 01-21-2002, 10:31 PM
  5. Replies: 12
    Last Post: 01-12-2002, 09:57 AM