file IO speed over network

This is a discussion on file IO speed over network within the C++ Programming forums, part of the General Programming Boards category.

  1. #1
    rogster001 (Registered User)
    Join Date: Aug 2006; Location: Liverpool UK; Posts: 1,425

    file IO speed over network

    Bit of a doofus question here, but just in case...

    I have a program that does a lot of file I/O, often on files of many thousands of lines, and it runs a lot slower over the local network.

    My question is: is this loss of speed entirely down to the network speed itself, or is there anything I could build in to improve performance when running the application from a shared drive?
    Thought for the day:
    "Are you sure your sanity chip is fully screwed in sir?" (Kryten)
    FLTK: "The most fun you can have with your clothes on."

    Stroustrup:
    "If I had thought of it and had some marketing sense every computer and just about any gadget would have had a little 'C++ Inside' sticker on it."

  2. #2
    MK27 (spurious conceit)
    Join Date: Jul 2008; Location: segmentation fault; Posts: 8,300
    A typical hard drive has a transfer rate of ~1000 Mbits/sec, whereas Ethernet is usually 10 or 100 Mbits/sec max. A megabit is about 1 kilobyte [correction: a megabit is about 122 kB].
    Last edited by MK27; 09-02-2011 at 09:15 AM.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  3. #3
    Registered User
    Join Date: May 2010; Posts: 2,742
    My question is, is this loss of speed entirely down to network speed itself, or is there anything I could build in to improve performance when running the application from a shared drive?
    This really depends on how you are accessing the files. Reading a file character by character is slower than reading a large section of the file at a time. So you may be able to improve speed by loading the file, or parts of the file, into a large buffer and working with the buffered data.

    Jim

  4. #4
    Registered User
    Join Date: Jan 2010; Posts: 412
    You could also cache the files locally, in memory or on disk. Check the last-modified timestamp of the file(s) on the network share at some interval, and if it is newer, update the cache.
    Quote Originally Posted by MK27
    A megabit is about 1 kilobyte.
    1 Mbit is 125 or 128 kB, depending on whether you count kilo as 1000 or 1024.
    Last edited by _Mike; 09-02-2011 at 08:40 AM. Reason: misspelled count as could :/

  5. #5
    MK27 (spurious conceit)
    Join Date: Jul 2008; Location: segmentation fault; Posts: 8,300
    Quote Originally Posted by _Mike
    1 MBit is 125 or 128 KByte depending on if you count Kilo as 1000 or 1024
    Actually we're both wrong, lol. A megabit (100000 bits) is 12500 bytes, which is 12.2 kB (using a 1024 byte kB).

    Quote Originally Posted by jimblumberg
    Reading a file character by character is slower than reading a large section of the file at a time.
    And that difference will probably be compounded on a network, presuming each read requires some kind of in-protocol request/response exchange whose overhead is a number of bytes.

    I.e., if you read one byte at a time and there are 9 bytes of packet overhead per request, your 100 Mbits/sec connection will effectively be 10 Mbits/sec. If you are reading a line at a time, the lines are ~40 bytes and the overhead is 8 bytes, you will get at most about 83% (40/48) of the network maximum.

    If the files are not huge, stat() them to get their size and read the entire thing in at once. Do not use getline().
    Last edited by MK27; 09-02-2011 at 08:24 AM.

  6. #6
    Registered User
    Join Date: Jan 2010; Posts: 412
    Quote Originally Posted by MK27
    Actually we're both wrong, lol. A megabit (100000 bits) is 12500 bytes, which is 12.2 kB (using a 1024 byte kB).
    You're missing a 0 in mega. And if you use SI units for mega, then you have to use them for kilo as well.
    1 byte is 8 bits.
    Using SI units for kilo and mega: 1 Mbit is 1000 kbit, and 1000/8 = 125 kB.
    Using power-of-two units: 1 Mbit is 1024 kbit, and 1024/8 = 128 kB.

  7. #7
    rogster001 (Registered User)
    Join Date: Aug 2006; Location: Liverpool UK; Posts: 1,425
    Do not use getline().
    This is what I was thinking. Could you outline an alternative? I can get the file size and read the file into a buffer, but how do I extract the lines after that? I need to provide a row count as part of the reporting.

  8. #8
    MK27 (spurious conceit)
    Join Date: Jul 2008; Location: segmentation fault; Posts: 8,300
    Quote Originally Posted by _Mike
    You're missing a 0 in mega
    Yeah, more than once since I did that on a calculator too. I need a less mathematically impaired brain. :/

    Quote Originally Posted by rogster001
    This is what I was thinking. Could you outline an alternative? I can get the file size and read the file into a buffer, but how do I extract the lines after that? I need to provide a row count as part of the reporting.
    Stringstreams work with getline() too. Read the file into a string buffer, wrap it in a stringstream, and all your current parsing code can work with that.

    To be honest though, I doubt that will make much difference in terms of getting the network speed to better match the speed of a hard drive -- but it is probably a more polite use of the network & equipment. Imagine if you had three computers reading three different files off the same hard drive at the same time, one line at a time.
    Last edited by MK27; 09-02-2011 at 09:07 AM.

  9. #9
    MK27 (spurious conceit)
    Join Date: Jul 2008; Location: segmentation fault; Posts: 8,300
    Quote Originally Posted by CommonTater
    Mega = 1024 kilo.
    Power-of-two units are only used with regard to memory, not network payloads. So mega- with respect to network speeds is always 1 000 000, not 1 048 576 (1024 kB or 1 MB of memory).

    But since the program using the network places the data in memory, I would say there are 1 000 000/8/1024 ~= 122 memory kB in a network megabit.

  10. #10
    MK27 (spurious conceit)
    Join Date: Jul 2008; Location: segmentation fault; Posts: 8,300
    Quote Originally Posted by CommonTater
    At least on Windows the file size reported from the network is in powers of 2... and since that's the size of the buffer you need, intermediate math, regardless of how well contrived, isn't going to be very helpful.
    That may well be, because the system wants to be consistent in reporting file sizes, and those should be measured in memory units. But with respect to networking, a megabit is considered to be exactly 1 000 000 bits, not 1024 * 1024 = 1 048 576 bits (which is about 5% more). So when you see speeds reported as "100 Mbits/sec", that's what it is -- there is no ambiguity.

    Why do people always have to complicate things so much... You get a file size of 12412322 you need a buffer size of 12412322... and, guess what, they designed it that way for our convenience.
    Yeah, a byte is a byte is a byte. I was just being a stickler.
    Last edited by MK27; 09-02-2011 at 09:55 AM.

  11. #11
    rogster001 (Registered User)
    Join Date: Aug 2006; Location: Liverpool UK; Posts: 1,425
    Thanks all for the suggestions, I'll work something in as a test.

  12. #12
    MK27 (spurious conceit)
    Join Date: Jul 2008; Location: segmentation fault; Posts: 8,300
    Quote Originally Posted by CommonTater
    Do you really think it helps the OP when he asks a question and then gets 30 messages filled with us arguing trivialities amongst ourselves?
    ROTFL!! This is like water calling the rain wet.


    I agree it's trivial, but it takes two to tango.

    [a few minutes later: evidently someone decided it was best to go back and delete all his previous posts... for more insight into someone's pathology see: this post, if someone doesn't decide to erase that one too, lol]
    Last edited by MK27; 09-02-2011 at 11:01 AM. Reason: further insanity

  13. #13
    Banned
    Join Date: Aug 2010; Location: Ontario Canada; Posts: 9,547
    Quote Originally Posted by MK27
    ROTFL!! This is like water calling the rain wet.


    I agree it's trivial, but it takes two to tango.
    There... you win... happy now?

  14. #14
    rogster001 (Registered User)
    Join Date: Aug 2006; Location: Liverpool UK; Posts: 1,425
    Well, in conclusion: I tried a buffered version, and it's effectively the same speed over the network, maybe a little quicker. I ended up using the char buffer directly rather than going through an istringstream, as that seemed to tax my PC's memory and make it go a bit bonkers otherwise.

    The data was a .csv file of 494 MB (!) with 1.7 million rows.

    Once the data is in the buffer, it outputs the results in 2 or 3 seconds, but the loading time before that is about 56 seconds, so all told I think this is about the same as my original version, which just looped until ifstream eof, calling getline directly on the ifstream. Obviously the buffer version is much better if I were to go on and do some parsing or needed to reread the data.

    Here is the relevant bit of code i decided to use:

    Code:
       
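    //opened file ok.... (open it in binary mode, e.g. ifstream inFile(path, ios::binary),
    //so the size from tellg() matches what read() delivers)
    //needs <fstream>, <iostream>, <vector> and <algorithm>
    //..
    
                cout << "\n\nLoading file..\n";
                inFile.seekg (0, ios::end);
                streamoff length = inFile.tellg(); // streamoff, not int: an int overflows past 2 GB
                inFile.seekg (0, ios::beg);
    
                vector<char> buffer(length > 0 ? static_cast<size_t>(length) : 0); // frees itself, no delete[]
                inFile.read(buffer.data(), static_cast<streamsize>(buffer.size()));
                streamsize bytesRead = inFile.gcount(); // bytes actually read; can be < length in text mode
                inFile.close();
    
                cout << "\n\nCounting rows..\n";
    
                // scan only the bytes actually read (std:: avoids a clash with the variable name)
                long long count = std::count(buffer.begin(), buffer.begin() + bytesRead, '\n');
                cout << "\n\nTotal Rows: " << count << "\n\n";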
    Last edited by rogster001; 09-05-2011 at 05:55 AM.

  15. #15
    Nor
    Join Date: Nov 2001; Posts: 299
    You have a bottleneck in the code you've shown
    that will slow your application down
    and use a lot more memory than is required.

    If you're sending small files (under a meg), then you have a good snippet here.
    If you're sending very large files (over 100 megs), then try something like this.

    You're loading the entire file at once,
    counting all the '\n's,
    then sending it across the network.

    So say your file load time is 5 seconds,
    your counting time is next to none,
    and your transfer time is 15 seconds.

    That will be 20 seconds to complete a request.

    You could easily knock that down to 15+ seconds by buffering your input in a different manner:
    read and send one block at a time, so the transfer doesn't wait for the whole file to load.

    I've marked what I added with // comments //.
    Code:
       
    //opened file ok.... (ideally in binary mode)
    //..
                const streamsize blocksize = 1048576; // One Megabyte, 1024*1024 // (typed constant instead of #define)
                cout << "\n\nLoading file and counting rows..\n"; // printed once, not once per block //
                long long count = 0; // moved this to before the loop //
    
                char* buffer = new char[blocksize];
                for (;;) { // loop until a read delivers no more data //
                    inFile.read(buffer, blocksize);
                    // read() returns the stream itself, not a count; gcount() gives the bytes read //
                    streamsize bytesRead = inFile.gcount();
                    if (bytesRead <= 0)
                        break;
    
                    for (streamsize i = 0; i < bytesRead; i++)
                    {
                        if (buffer[i] == '\n')
                            count++;
                    }
                    //your network send code (send the first bytesRead bytes of buffer).
                }
                inFile.close();
                delete[] buffer; // delete[] is right: memory allocated with new[] must be freed with delete[] //
                cout << "\n\nTotal Rows: " << count << "\n\n";
    Try to help all less knowledgeable than yourself, within
    the limits provided by time, complexity and tolerance.
    - Nor


Similar Threads

  1. to receive image file through network port
    By nadeem athani in forum Networking/Device Communication
    Replies: 1
    Last Post: 03-03-2011, 07:50 PM
  2. Speed of pointers vs. speed of arrays/structs
    By Kempelen in forum C Programming
    Replies: 32
    Last Post: 06-27-2008, 10:16 AM
  3. file transfer across network
    By kris.c in forum Networking/Device Communication
    Replies: 6
    Last Post: 06-17-2006, 01:08 PM
  4. Network File Copy in DR DOS
    By daron in forum C Programming
    Replies: 3
    Last Post: 09-30-2005, 01:49 AM
  5. File I/O on Network drive, Quick Question
    By kransk in forum C++ Programming
    Replies: 3
    Last Post: 11-24-2003, 08:17 AM
