Thread: Is there an efficient way to toss binary data in C++?

  1. #1
    Registered User
    Join Date
    Jul 2007
    Posts
    2

    Is there an efficient way to toss binary data in C++?

    Below is the code I'm using. I wonder if anyone knows a faster way to do what I'm doing: taking in a file (piped from a bash terminal like "./a.out < input > output"), reading its data byte by byte, and throwing out every other byte (so the final output is half the size of the original).

    I find that the way I'm doing it is really slow (it takes about a second to process 1.2 Mbytes). Is there a way to do this faster? I think that reading the whole file in at once might speed things up, so that cin doesn't have to be called over and over, but I don't know how to go about doing that.


    Code:
    #include <iostream>
    #include <cstdlib>   // for exit()
    
    using namespace std;
    
    int main()
    {
        int i = 0;
        char ch;
    
        while (cin.get(ch)) {
            if (i == 1) {
                cout << ch;
                i = 0;
            }
            else if (i == 0) {
                i = 1;
            }
            else {
                exit(-1);
            }
        }
    }

  2. #2
    Registered User
    Join Date
    Sep 2001
    Posts
    752
    The data from cin is already buffered. You will probably not get a performance improvement by using something other than .get().
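    For what it's worth, here's the same filter as a sketch built around buffered get()/put() — drop_alternate is just an illustrative name, not a library call:
    Code:
    #include <iostream>
    
    // Sketch only: copies every other byte (indices 1, 3, 5, ...) from
    // `in` to `out`, relying on the stream's own buffering just like the
    // original loop did.
    void drop_alternate(std::istream& in, std::ostream& out) {
        char ch;
        bool keep = false;          // drop byte 0, keep byte 1, ...
        while (in.get(ch)) {
            if (keep) out.put(ch);
            keep = !keep;
        }
    }
    If you still want to experiment, calling std::ios::sync_with_stdio(false) in main() before running this over std::cin/std::cout is sometimes worth trying, though with already-buffered streams the gain may be small.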
    Callou collei we'll code the way
    Of prime numbers and pings!

  3. #3
    Registered User
    Join Date
    Sep 2001
    Posts
    752
    Before optimizing, do try to figure out what theoretical limits you are dealing with here.
    Code:
    #include <iostream>
    
    int main (void) {
       char ch;
       while (std::cin.get(ch)) {
          std::cout << ch;
       }
    }
    Callou collei we'll code the way
    Of prime numbers and pings!

  4. #4
    Registered User
    Join Date
    Jul 2007
    Posts
    2
    I'm pretty sure now that constantly calling cin for every single character caused a lot of unnecessary overhead. I read the data files all at once, and they are now processed almost as fast as they are generated. =)

    I modified code to copy a file and here is the result:

    Code:
    // Copy a file
    #include <fstream>
    #include <cstdlib>   // for exit()
    using namespace std;
    
    int main () {
    
      char * buffer;
      char * buffer2;
      long size;
      int i = 0, j = 0;
    
      ifstream infile ("test.txt", ifstream::binary);
      ofstream outfile ("new.txt", ofstream::binary);
    
      // get size of file
      infile.seekg(0, ifstream::end);
      size = infile.tellg();
      infile.seekg(0);
    
      // allocate memory for file content
      buffer = new char [size];
      buffer2 = new char [size/2];
    
      // read content of infile
      infile.read (buffer, size);
    
      // filter every other byte
      while (i != size) {
        if ((i % 2) < 1) {
          buffer2[j] = buffer[i];
          i++; j++;
        }
        else if ((i % 2) >= 1) {
          i++;
        }
        else {
          exit(-1);
        }
      }
    
      // write to outfile
      outfile.write (buffer2, size/2);
    
      // release dynamically-allocated memory
      delete[] buffer;
      delete[] buffer2;
    
      outfile.close();
      infile.close();
      return 0;
    }

  5. #5
    Registered User
    Join Date
    Oct 2001
    Posts
    2,129
    Code:
        {buffer2[j] = buffer[i];
    You'll overrun buffer2 when the file size is odd, because it's only half the size of buffer and you copy the bytes at even offsets — one more than size/2 of them.
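    One way to patch it (a sketch — keep_even_bytes is just an illustrative name): round the destination size up and return how many bytes were actually copied, then write that count instead of size/2:
    Code:
    #include <cstddef>
    
    // Sketch: copy the bytes at even offsets of src[0..n) into dst and
    // return how many were written.  dst must hold (n + 1) / 2 bytes --
    // rounding *up* is what avoids the overrun for odd n.
    std::size_t keep_even_bytes(const char* src, std::size_t n, char* dst) {
        std::size_t j = 0;
        for (std::size_t i = 0; i < n; i += 2)
            dst[j++] = src[i];
        return j;                    // pass this to write(), not n / 2
    }
    With that, you'd allocate new char[(size + 1) / 2] and pass the returned count to outfile.write().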
    Last edited by robwhit; 07-23-2007 at 03:03 PM.

  6. #6
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by yarft View Post
    I'm pretty sure now that constantly calling cin for every single character caused a lot of unnecessary overhead. I read the data files all at once, and they are now processed almost as fast as they are generated. =)
    Yep. You've discovered the good old speed-memory tradeoff. You gain speed, but use a lot more RAM.

    Buffered I/O layers form a spectrum, with completely unbuffered calls to the OS at one end and completely buffered input (what you've implemented) at the other. iostreams sits somewhere in between. Calling istream::get() repeatedly is much more efficient than calling the OS file-reading function repeatedly, but not as efficient as simply reading the whole chunk in one shot.
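    A sketch of that middle ground (drop_alternate_chunked is an illustrative name, and 64 KiB is an arbitrary choice): read a fixed-size chunk at a time, so memory use stays bounded no matter how large the file is:
    Code:
    #include <iostream>
    #include <cstddef>
    
    // Sketch: filter every other byte, reading fixed-size chunks instead
    // of the whole file.  Parity is carried across chunk boundaries.
    void drop_alternate_chunked(std::istream& in, std::ostream& out) {
        const std::size_t kChunk = 64 * 1024;   // 64 KiB
        char buf[kChunk];
        char half[kChunk / 2 + 1];
        bool keep = false;                      // drop byte 0, keep byte 1, ...
        while (in.read(buf, kChunk) || in.gcount() > 0) {
            std::streamsize n = in.gcount();
            std::streamsize j = 0;
            for (std::streamsize i = 0; i < n; ++i) {
                if (keep) half[j++] = buf[i];
                keep = !keep;
            }
            out.write(half, j);
        }
    }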
