Thread: Binary File Manipulation Speeds

  1. #1
    Registered User
    Join Date
    Apr 2006
    Posts
    6

    Binary File Manipulation Speeds

    Hello, I'm currently trying to write my own hex editor, but I've noticed that my program is horribly slow at loading files. Anything up to about 500 KB is fine, but once files reach the 1 MB range the delay is noticeable and measured in seconds; e.g. a 4.5 MB file takes about 12 seconds to process on a good run. I know this is a problem, because I've used other hex editors that will load a 200+ MB file in less than that time.

    I'm pretty sure this is due to the fact that my program reads the file byte by byte and performs an algorithm on each byte to convert it into hexadecimal format. Currently, I'm reading one unsigned char per byte in the file, then calling my function to convert that into a string object. I've tried reducing the number of reads, such as reading 16 bytes at a time, but that doesn't reduce the number of times I need to call the conversion function, and so far it hasn't produced any noticeable gain in speed.

    I've cut out all of the GUI lines of code and altered the following code to work as a basic command line representation of only reading in the file, which is where I'm having bottleneck problems. I've been trying to optimize this section for days, but I can't figure out how it could be done. I'm sure there's something efficient that I should be doing, but I'm horribly unaware of it.

    Code:
    #include <iostream>
    #include <fstream>
    #include <string>
    
    using namespace std;
    
    /* Convert Int To Hex String */
    string umulti_base(int input1, int input2) {
    
    /* Temporary Function Variable */
    string ans = "", bit = "0123456789ABCDEF";
    int incr = 1, value = input1, range = input2, count = 0;
    
    /* Grab Highest Power On Base */
    while((incr * range) <= value) {
    incr *= range;
    ++count;
    }
    
    while(count >= 0) {
    
    for(int x = range; --x >= 0;) {
    
    if((value - (incr * x)) >= 0) {
    ans += bit[x];
    value -= (incr * x);
    break;
    }
    
    }
    
    incr /= range;
    --count;
    }
    
    return ans;
    }
    
    int main() {
    
    /* Variables */
    string file_name;
    unsigned char mem;
    ifstream file;
    int file_size = 0, b = 0, e = 0, file_get = 0;
    
    /* Get File Name Input */
    cout<<"Enter File Name : ";
    cin>>file_name;
    cin.ignore();
    
    /* Open File */
    file.open(file_name.c_str(), ios::binary);
    
    /* If File Can Be Opened */
    if(file.is_open()) {
    
    /* Get File Size & Reset File Pointer */
    file.seekg(0, ios::beg);
    b = file.tellg();
    file.seekg(0, ios::end);
    e = file.tellg();
    file_size = e - b;
    
    file.clear();
    file.seekg(0, ios::beg);
    
    /* Cycle Through File Byte By Byte & Convert To Hexadecimal */
    for(int x = -1; ++x < file_size;) {
    
    /* Read 1 Byte Into Char */
    file.read((char*)(&mem), 1);
    
    /* Convert to Hex */
    cout<< umulti_base((int)mem, 16) << "\t";
    }
    
    cout<< "\nTask Complete \n";
    }
    
    /* If  File Cannot Be Opened */
    else {
    cout<<"File Could Not Be Opened \n";
    }
    
    return 0;
    }
    Sorry about the lack of indentation, I don't usually use it. umulti_base is the function that manually converts an integer into its corresponding hex value as a string. The first parameter is the integer to be converted and the second parameter is the conversion base. Any suggestions on how this can be made more efficient?

  2. #2
    The larch
    Join Date
    May 2006
    Posts
    3,573
    What if you just used the std::hex manipulator to output numbers as hex?
    I might be wrong.
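
    For reference, a minimal sketch of what the std::hex suggestion could look like. The helper name dump_hex is made up for illustration and is not from the thread. std::setw and std::setfill are added because std::hex alone does not pad a value to two digits.

```cpp
#include <cstddef>
#include <iomanip>
#include <ostream>

// Hypothetical helper (not from the original post): writes each byte of
// `data` to `out` as two uppercase hex digits followed by a space.
void dump_hex(std::ostream& out, const unsigned char* data, std::size_t length)
{
    // Save the stream's formatting state so the caller's settings survive.
    const std::ios_base::fmtflags old_flags = out.flags();
    const char old_fill = out.fill('0');

    out << std::hex << std::uppercase;
    for (std::size_t i = 0; i < length; ++i)
    {
        // Cast to int: streaming an unsigned char would print a character.
        // setw only applies to the next insertion, so it is repeated per byte.
        out << std::setw(2) << static_cast<int>(data[i]) << ' ';
    }

    // Restore the caller's formatting state.
    out.flags(old_flags);
    out.fill(old_fill);
}
```

    Passing std::cout as the first argument prints straight to the console; any other std::ostream works the same way.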

    Thank you, anon. You sure know how to recognize different types of trees from quite a long way away.
    Quoted more than 1000 times (I hope).

  3. #3
    Registered User
    Join Date
    Apr 2006
    Posts
    6
    Thanks for the reply. I guess my example code would have led one to think that, what with me using cout to output the string; my bad. Unfortunately, with all my GUI code (FLTK), I need to pass the value to a function that takes a const char*, so I need the hex value as a string first and then convert it with c_str(). I guess I should have mentioned that beforehand, sorry.

    I have, however, looked into string streams and played around with them. Would using them involve less overhead than my own umulti_base()? I'm also wondering whether there is a more efficient way of loading files into memory than what my code does. I'll try using string streams together with the std::hex manipulator for now to see if that speeds things up.
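
    A rough sketch of the string stream idea, for comparison with umulti_base(). The function name byte_to_hex is hypothetical. Whether this is actually faster has not been measured here; constructing a stream for every byte is not free either, so treat it as something to profile rather than a guaranteed win.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical stand-in for umulti_base(value, 16), built on std::ostringstream.
// Unlike umulti_base, it always pads to two digits ("0A" rather than "A").
std::string byte_to_hex(unsigned char value)
{
    std::ostringstream out;
    out << std::hex << std::uppercase << std::setfill('0')
        << std::setw(2) << static_cast<int>(value);  // int, so it prints as a number
    return out.str();
}
```

    The result can be handed to a function expecting const char* via c_str(), as long as the std::string it came from is still alive at that point.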

  4. #4
    Guest Sebastiani's Avatar
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    >> umulti_base

    I would avoid a general-purpose function in this case. In fact, don't use a function at all (or inline it) to cut down on the overhead of invoking it. Besides that, you're creating a temporary string object in a tight loop, which is only going to slow things down.

    >> I'm pretty sure this is due to the fact that my program reads the file byte by byte

    If memory allows, read the entire file into a buffer. If that's out of the question then consider using memory-mapped files (very OS-specific, of course).
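
    As an illustration of the read-it-all-at-once approach, here is a minimal sketch using only standard streams. The name read_whole_file is an assumption for this example, and it presumes the file fits comfortably in memory.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: loads the whole file at `path` into memory with a
// single read() call. Returns an empty vector if the file cannot be opened
// or read (an empty file also yields an empty vector).
std::vector<char> read_whole_file(const std::string& path)
{
    // ios::ate positions the stream at the end right after opening.
    std::ifstream file(path.c_str(), std::ios::binary | std::ios::ate);
    if (!file)
        return std::vector<char>();

    // Because of ios::ate, tellg() reports the file size here.
    const std::streamoff size = file.tellg();
    if (size <= 0)
        return std::vector<char>();

    std::vector<char> buffer(static_cast<std::size_t>(size));
    file.seekg(0, std::ios::beg);
    if (!file.read(&buffer[0], size))
        return std::vector<char>();

    return buffer;
}
```

    The buffer's contents and size can then be passed to whatever does the hex conversion, so the file is touched once instead of once per byte.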

    A very fast way to convert a byte to hex is to use a lookup table:

    Code:
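    #include <cstddef>
    #include <iostream>
    
    using namespace std;
    
    void print_bytes( char const* ptr, size_t length )
    {
    	/* One character per 4-bit value (nibble) */
    	static char const table [ ] =  
    	{
    		'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'
    	};
    	
    	for( char const* end = ptr + length; ptr != end; ++ptr )
    	{
    		/* High nibble first; the mask discards any sign-extension bits */
    		cout << table[ ( *ptr >> 4 ) & 0xf ];
    		/* Then the low nibble */
    		cout << table[ *ptr & 0xf ];
    	}
    }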
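
    Since the original poster needs a string rather than console output, the same table can fill a std::string instead. This is only a sketch under that assumption; bytes_to_hex is a made-up name, not part of the code above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical variant of print_bytes: appends two hex digits per byte to
// one preallocated std::string, so no temporary string is created per byte.
std::string bytes_to_hex(const char* ptr, std::size_t length)
{
    static const char table[] = "0123456789ABCDEF";

    std::string result;
    result.reserve(length * 2);  // each byte becomes exactly two characters

    for (const char* end = ptr + length; ptr != end; ++ptr)
    {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        const unsigned char byte = static_cast<unsigned char>(*ptr);
        result += table[byte >> 4];    // high nibble
        result += table[byte & 0x0F];  // low nibble
    }
    return result;
}
```

    Calling c_str() on the returned string gives the const char* that a GUI toolkit function would expect. Reserving the full length up front avoids repeated reallocation while the string grows.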
    Code:
    #include <cmath>
    #include <complex>
    bool euler_flip(bool value)
    {
        return std::pow
        (
            std::complex<float>(std::exp(1.0)), 
            std::complex<float>(0, 1) 
            * std::complex<float>(std::atan(1.0)
            *(1 << (value + 2)))
        ).real() < 0;
    }

  5. #5
    Registered User
    Join Date
    Apr 2006
    Posts
    6
    >> I would avoid a general-purpose function in this case. In fact, don't use a function at all (or inline it) to cut down on the overhead of invoking it.

    I had figured as much, but I wasn't really sure about inlining the function. I guess that's going to have to be the case, though. Makes sense when you put it like that.

    >> If memory allows, read the entire file into a buffer. If that's out of the question then consider using memory-mapped files (very OS-specific, of course).

    I had also considered memory mapping, but because most methods are OS-specific, I didn't want to go that route; I'm trying to keep the code as OS-agnostic as possible. The files I'm having problems with aren't that large anyway; you can count the MBs on the fingers of your hands. This is for an open-source project I just started, so I'll worry about files > 2GB later on. Reading the whole file into a buffer is something I can do right now, however, and I'll definitely take your advice. Thanks very much for the lookup table, I think that's the thing that I need.

    EDIT : Okay, I finally got some decent speed! Many thanks Sebastiani, all of your tips helped reduce so much overhead. Converting everything to hex now no longer takes 99% of my CPU and the operation is almost instant whereas I was counting in seconds earlier. You can expect to get special thanks credit in the next release of the program. Again, thanks so much!
    Last edited by Shonumi; 04-19-2009 at 06:46 PM.

  6. #6
    Guest Sebastiani's Avatar
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    You're welcome.
