# I/O Rates - MBs per second - what's good?


1. ## I/O Rates - MBs per second - what's good?

I've written a C++ program to convert data from one data format to another. My benchmark program reads 136MB of binary data, converts each record and writes it to a text file.

My first cut ran for 42 seconds. Not too impressive. I reviewed my logic and realized I was passing whole structures between high-use functions instead of pointers (or references). I changed that and the process time dropped to 21 seconds. Better.

To read the file, I was using istream.read(), but I was writing each line of text with ostream << text << endl. I rewrote it to buffer the data up and then do an ostream.write(), and now the elapsed time is down to 11 seconds. For the number of records I'm processing, these are the rates:

Input Records = 1,066,766
Input file size: 136 MB
Output text file size: 184.1MB

Code:
```elapsed: 42 seconds
rate: 136 MB / 42 = 3.238 MB per second

elapsed: 21 seconds
rate: 136 MB / 21 = 6.476 MB per second

elapsed: 11 seconds
rate: 136 MB / 11 = 12.363 MB per second```

If I remove the conversion logic entirely, so the program just reads the binary data, doesn't touch or move it, skips writing the 184 MB text file, and writes the input buffer straight back out, it takes 2 seconds, for an I/O rate of 68 MB per second.

Would you say the 12 MB per second rate I've reached is reasonable?

Thanks, Todd

2. This is definitely one of those things where you probably have "picked off the low-hanging fruit" - the very easy optimizations have been done.

You should use a profiler to figure out what parts of the code take up most of the time, and then it's a "seat of pants" decision whether you can optimize that more or not.

--
Mats

3. You should be able to get much closer to that 2 sec. / 68 MB/s by using asynchronous I/O. This will allow you to get work done while the OS/disk is busy doing work.

The down side is that it won't be std::fstream based, so you'll lose portability, etc.

Multiple threads with std::fstream is another option, but it probably won't yield as much of a speedup as asynchronous I/O in a single thread.

gg

lol! Just as I was reading your post (matsp), I realized I could do another optimization in my translate table (from EBCDIC to ASCII). Your "low hanging fruit" keyed me into it. I asked myself where the program most likely spends most of its time, and the answer surely has to be in the character conversion routine. So, I initialized my translate table a bit differently, which allowed me to remove the conditional logic from the char_convert routine. I just reran it and now I'm at 10 seconds.

I'll look into a profiler. Never used one before. Thanks for the suggestion.

Todd

5. Originally Posted by Codeplug
You should be able to get much closer to that 2 sec. / 68 MB/s by using asynchronous I/O. This will allow you to get work done while the OS/disk is busy doing work.

The down side is that it won't be std::fstream based, so you'll lose portability, etc.
..
gg
Well, unfortunately, I need the portability.

I do have some functions that use rather long parm lists, and they are called a lot (a lot = for every record). I'll look into shortening these parm lists down to the bare minimum.

Todd

6. >> I'll look into shortening these parm list down to bare minimum.
Play with your profiling tools first

gg

Also, post your EBCDIC-to-ASCII code (the main part of it, if it's long).

--
Mats

8. Originally Posted by Codeplug
>> I'll look into shortening these parm list down to bare minimum.
Play with your profiling tools first

gg
Profiling gives you hard, indisputable data, but it's possible to find hot spots through careful analysis of the source code as well.

Before using a profiler or other debugging tool, try to form your own idea of what you're going to discover. My first boss and I used to take bets about what we were going to see happen during debugging sessions.

Here's one version of the table. I have two tables: a 7-bit ASCII table and a UTF-8 table. Here's the 7-bit one. (It's a global var, char[256].)

Code:
```void init_translate_table_ascii7() {
    int i, j;

    // Initialize the ascii translate table: set each char to itself
    for (i = 0; i < (int) sizeof(ascii_chars); ++i) ascii_chars[i] = i;

    // Initialize the ebcdic table with the ascii character set
    j = (int) '0';
    for (i = 0xF0; i <= 0xF9; ascii_chars[i] = j, i++, j++);   // put 0-9 in F0 through F9
    j = (int) 'A';
    for (i = 0xC1; i <= 0xC9; ascii_chars[i] = j, i++, j++);   // put A-I in C1 through C9
    for (i = 0xD1; i <= 0xD9; ascii_chars[i] = j, i++, j++);   // put J-R in D1 through D9
    for (i = 0xE2; i <= 0xE9; ascii_chars[i] = j, i++, j++);   // put S-Z in E2 through E9
    j = (int) 'a';
    for (i = 0x81; i <= 0x89; ascii_chars[i] = j, i++, j++);   // put a-i in 81 through 89
    for (i = 0x91; i <= 0x99; ascii_chars[i] = j, i++, j++);   // put j-r in 91 through 99
    for (i = 0xA2; i <= 0xA9; ascii_chars[i] = j, i++, j++);   // put s-z in A2 through A9

    ascii_chars[0x40] = ' ';   ascii_chars[0x4B] = '.';   ascii_chars[0x4C] = '<';
    ascii_chars[0x4D] = '(';   ascii_chars[0x4E] = '+';   ascii_chars[0x4F] = '|';

    ascii_chars[0x50] = '&';   ascii_chars[0x5A] = '!';   ascii_chars[0x5B] = '$';
    ascii_chars[0x5C] = '*';   ascii_chars[0x5D] = ')';   ascii_chars[0x5E] = ';';
    ascii_chars[0x5F] = '^';

    ascii_chars[0x60] = '-';   ascii_chars[0x61] = '/';   ascii_chars[0x6A] = '|';
    ascii_chars[0x6B] = ',';   ascii_chars[0x6C] = '%';   ascii_chars[0x6D] = '_';
    ascii_chars[0x6E] = '>';   ascii_chars[0x6F] = '?';

    ascii_chars[0x79] = '`';   ascii_chars[0x7A] = ':';   ascii_chars[0x7B] = '#';
    ascii_chars[0x7C] = '@';   ascii_chars[0x7D] = '\'';  ascii_chars[0x7E] = '=';
    ascii_chars[0x7F] = '"';

    ascii_chars[0xA1] = '~';
    ascii_chars[0xBD] = ']';
    ascii_chars[0xC0] = '{';
    ascii_chars[0xD0] = '}';
    ascii_chars[0xE0] = '\\';

    // These next couple of lines are "data forgiveness" lines: they convert
    // non-displayable ebcdic chars to ascii blanks.
    ascii_chars[0x00] = ' ';                                   // Translate binary zero to a blank
    //for (i = 0x01; i < 0x40; ascii_chars[i] = ' ', i++);     // Convert ebcdic 0x01 - 0x3F to a blank.
}```

10. And here's the actual code for picking up the translated value:

Code:
```if (!pic->b_numeric) {
    n = offset + len;                       // Use "n" to keep the calculation out of the "for" loop condition.
    for (int k = offset; k < n; ++k) {
        // The cast avoids a negative index if binary_data is a (signed) char buffer.
        *outbuf++ = ascii_chars[(unsigned char) binary_data[k]];   // Get the translated input character.
    }
}```

11. Ok, I can't think of any way to improve on a global array[256] - it should be as fast as it can be.

--
Mats

12. The first thing to time is just read the file and write the file, without any transformation of the data.

If that takes, say, 10 of the 11 seconds you're seeing, then you're not going anywhere: the entire program is I/O bound, and there's not a lot you can do about that within your code.

13. Originally Posted by Salem
The first thing to time is just read the file and write the file, without any transformation of the data.

If that takes say 10 of the 11 seconds you're seeing, then you're not going anywhere. The entire program is I/O bound and there's not a lot you can do about that within your code.
I think he said it takes 2 seconds when he did that.

14. So I see (now).

I'm reading up on Shark and will probably use it for profiling, since I'm developing on a Mac (Tiger) under Xcode.

Thanks for all the feedback. I'm not going to make any more changes until I profile it.

Todd
