Thread: flushing the file io?

  1. #1
    Linguistic Engineer... doubleanti's Avatar
    Join Date
    Aug 2001
    Location
    CA
    Posts
    2,459

    flushing the file io?

    Hello,

    I was wondering when it is appropriate to flush file IO. I have a program which does it every 10 seconds along with a progress update. The problem is that when the flushes are called (and it's generating about 200 KB/s), they take quite a while.

    The loop goes something like this:

    processing (parsing) in memory
    flush

    and this happens every ten seconds.

    It seems that about half to two thirds of the time is spent in processing and the remaining third or so in flushing. Again, it is generating a lot of data; however, I don't see why the computer doesn't just signal the file IO subsystem to flush and continue processing right away.

    By the way, I'm doing this in Java (please don't ask why, for development purposes this was a much better decision), and am using FileWriter.
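One way to get the "continue processing right away" behaviour is to hand the writes and flushes to a separate I/O thread. This is only a minimal sketch of that idea, not something taken from the program described here: the class name `BackgroundWriter` and its methods are hypothetical, and it assumes an unbounded task queue, so memory grows if the producer outruns the disk.

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: all file I/O runs on one background thread.
class BackgroundWriter implements AutoCloseable {
    private final BufferedWriter out;
    private final ExecutorService io = Executors.newSingleThreadExecutor();
    private volatile IOException failure; // last I/O error seen by the I/O thread

    BackgroundWriter(String path) throws IOException {
        out = new BufferedWriter(new FileWriter(path));
    }

    // Queues the text and returns immediately; the I/O thread writes it in order.
    void submit(String text) {
        io.execute(() -> {
            try { out.write(text); } catch (IOException e) { failure = e; }
        });
    }

    // Queues a flush behind all earlier writes without blocking the caller.
    void flushAsync() {
        io.execute(() -> {
            try { out.flush(); } catch (IOException e) { failure = e; }
        });
    }

    // Waits for queued work to finish, closes the file, and reports any error.
    @Override
    public void close() throws IOException, InterruptedException {
        io.shutdown();
        io.awaitTermination(1, TimeUnit.MINUTES);
        out.close();
        if (failure != null) throw failure;
    }
}
```

The processing loop would call `submit(...)` and `flushAsync()` and carry on parsing; only `close()` waits for the disk.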

    Thanks, and nice to be back here with some programming questions for once.
    hasafraggin shizigishin oppashigger...

  2. #2
    Deathray Engineer MacGyver's Avatar
    Join Date
    Mar 2007
    Posts
    3,210
    The answer depends upon what you want to do with the data. Is it absolutely necessary that the data be flushed ASAP? If not, just write it and worry about flushing at the end.

    What you could do is wrap the FileWriter with a BufferedWriter, which is recommended:

    http://java.sun.com/j2se/1.5.0/docs/...redWriter.html

    In general, a Writer sends its output immediately to the underlying character or byte stream. Unless prompt output is required, it is advisable to wrap a BufferedWriter around any Writer whose write() operations may be costly, such as FileWriters and OutputStreamWriters. For example,

    PrintWriter out
    = new PrintWriter(new BufferedWriter(new FileWriter("foo.out")));

    will buffer the PrintWriter's output to the file. Without buffering, each invocation of a print() method would cause characters to be converted into bytes that would then be written immediately to the file, which can be very inefficient.
    Although oddly enough, it seems that the FileWriter has its own internal buffering (or perhaps the underlying system I'm on is buffering the output of the FileWriter). I don't pretend to understand what's going on under the hood in this case. I know it seems incorrect to buffer what appears to be an already buffered stream, but for the sake of portability, you might want to consider doing this. In addition, you can specify the size of the buffer that a BufferedWriter will use, so you have some more control over the specifics of when it'll flush.
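As a rough illustration of the buffer-size point, here is a minimal sketch that wraps a FileWriter in a BufferedWriter with an explicit size. The class name, method name, and the 1 M char buffer are assumptions chosen for the example; `BufferedWriter(Writer, int)` takes the size in chars, not bytes.

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

class BufferedOutputSketch {
    // Hypothetical buffer size, in chars; tune it to the data rate.
    static final int BUFFER_CHARS = 1024 * 1024;

    // Writes each line through the buffer and returns how many lines were written.
    static int writeLines(String path, Iterable<String> lines) throws IOException {
        int count = 0;
        try (BufferedWriter out = new BufferedWriter(new FileWriter(path), BUFFER_CHARS)) {
            for (String line : lines) {
                out.write(line);
                out.newLine();
                count++;
            }
        } // close() flushes whatever is still in the buffer
        return count;
    }
}
```

With no explicit flush() calls, data reaches the FileWriter only when the buffer fills or the writer is closed.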
    Last edited by MacGyver; 04-22-2007 at 11:37 PM.

  3. #3
    Linguistic Engineer... doubleanti's Avatar
    Join Date
    Aug 2001
    Location
    CA
    Posts
    2,459
    Thanks for the quick reply MacGyver. No clue why my message was moved to the Tech Board (I'm a mod myself), but as it were...

    On closer inspection of my dusty research project, I did wrap it in a BufferedWriter. Since I generate about 200 KB/s and my buffer is about 5 MB, it should be okay: flushing every 10 seconds happens well before the buffer fills up. I took out the lines that do the flush and the throughput seems to have increased, but I think I'm going to look into this more. Thanks, and I hope to hear other ideas.

  4. #4
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Wouldn't there be a way to find out the optimal time at which to flush the buffer? If not 10 seconds, then perhaps some longer time, e.g., 25 seconds (5000 KB / 200 KB/s), or when it is done, whichever comes first.
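The arithmetic above can be written down as a small helper. This is just a sketch of the estimate, assuming a steady output rate; the class and method names are made up for illustration.

```java
class FlushInterval {
    // Seconds until a buffer of bufferBytes fills at a steady bytesPerSecond.
    static double secondsToFill(long bufferBytes, long bytesPerSecond) {
        if (bytesPerSecond <= 0) {
            throw new IllegalArgumentException("rate must be positive");
        }
        return (double) bufferBytes / bytesPerSecond;
    }

    public static void main(String[] args) {
        // 5000 KB buffer at 200 KB/s, the figures from this thread.
        System.out.println(secondsToFill(5000L * 1024, 200L * 1024)); // prints 25.0
    }
}
```

If the rate varies, the result is only an upper bound on a safe interval, since the buffer flushes itself once it is full anyway.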

    I suspect that whoever moved this thread did so because it is Java-specific rather than about C++.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.

  5. #5
    Linguistic Engineer... doubleanti's Avatar
    Join Date
    Aug 2001
    Location
    CA
    Posts
    2,459
    That's true. I recall that the reason I flushed buffers manually was because some files that I write to for statistics on the main data are really small in comparison to the main data files themselves.

    Well, for now I just took out the flush, assuming that Java probably knew better'n I did when to flush it. But yeah, you're right, I should be able to linearly predict when to flush it.

    The reason this is so critical is that it runs on a gigantic data set, and as it stands I have to break up the output so that it doesn't produce one huge file. The output generated from the data is about 60 GB depending on the settings, and I have it broken up into 64 MB chunks. That's a lot of files, yes, but you can easily browse them for data validation, which is also a pain.
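For reference, splitting output into fixed-size chunks could look roughly like the sketch below. It is not the actual code from this project: `ChunkedWriter`, the `prefix.N.out` naming scheme, and counting the limit in chars rather than bytes are all assumptions made to keep the example short.

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

// Hypothetical helper: starts a new file whenever the current one would exceed maxChars.
class ChunkedWriter implements AutoCloseable {
    private final String prefix;  // file-name prefix; files are prefix.0.out, prefix.1.out, ...
    private final long maxChars;  // chunk limit, counted in chars for simplicity
    private BufferedWriter out;
    private long written;         // chars written to the current chunk
    private int index;            // index of the current chunk

    ChunkedWriter(String prefix, long maxChars) throws IOException {
        this.prefix = prefix;
        this.maxChars = maxChars;
        open();
    }

    private void open() throws IOException {
        out = new BufferedWriter(new FileWriter(prefix + "." + index + ".out"));
        written = 0;
    }

    // Writes one record plus a newline, rolling to a new file first if needed.
    void writeRecord(String record) throws IOException {
        long size = record.length() + 1;
        if (written > 0 && written + size > maxChars) {
            out.close(); // flushes and finishes the current chunk
            index++;
            open();
        }
        out.write(record);
        out.write('\n');
        written += size;
    }

    int chunkCount() {
        return index + 1;
    }

    @Override
    public void close() throws IOException {
        out.close();
    }
}
```

For 64 MB chunks one would pass something like `64L * 1024 * 1024` as the limit, keeping in mind that the byte size on disk depends on the encoding.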

    It's a parser for English, and works on a corpus of 6 million sentences. And it takes days to run! =( Anyone else?

    Thanks laserlight!
