Thread: Using Copy-on-write For Fast Copy?

  1. #1
    Registered /usr
    Join Date
    Aug 2001
    Location
    Newport, South Wales, UK
    Posts
    1,273

    Hello,

    I have an interesting quandary that I can't find any concrete discussion about, so hopefully someone here can help:

    I'm trying to organise inter-process communication (Win32, although the mechanics are internally similar to Linux) between a process that creates image data and one or more processes that read it. The image data is fairly substantial (around 8 MB) and is ideally updated several times a second.

    There needs to be synchronisation between the processes so that the readers aren't currently reading when the writer wants to update. With this in mind, I would like the readers to be able to pull information as fast as possible.

    At the moment, the data is shared between the processes using a shared memory section (CreateFileMapping/MapViewOfFile). The pages are using PAGE_READWRITE protection semantics, but I was thinking that if I change this to PAGE_WRITECOPY, then do dummy writes from each reader process, they would very quickly receive a private copy of each page of the data. This may be faster than using memcpy() myself.

    So, can anyone explain what happens when a process writes to a page using copy-on-write semantics? Does the (NT) kernel still do VirtualAlloc for a new page and then memcpy() in kernel mode, or does it do something cleverer and faster?

  2. #2
    Registered User Codeplug
    Join Date
    Mar 2003
    Posts
    4,981
    Do the readers need to modify the chunk of data produced by the producer? Or is that chunk read-only while the readers create something new from it?

    gg

  3. #3
    Registered /usr
    Join Date
    Aug 2001
    Location
    Newport, South Wales, UK
    Posts
    1,273
    It just needs to be read. The reader processes more-or-less push it onto the network verbatim. The writer is not network-aware.

    Thinking about this during one of those moments of epiphany you get while riding the bus: PAGE_WRITECOPY is like a one-shot trigger, so once each reader process has made its copy of a page, I would need the writer to tear down the shared section and create another in order for it to trigger again, and notify the reader processes of the new handle... :/

    I also spent some time digging into the modern Linux virtual memory manager, and when it does copy-on-write (in response to a write fault on what is otherwise a read-only page), it does a memcpy() specifically using SSE instructions on Intel. I'd expect clang (my compiler) to also think about doing this. The main difference would be that it would be done in kernel mode, so might be a teeny bit faster.
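
    Since the mechanics are similar on Linux, here is a minimal POSIX sketch of the one-shot behaviour, not the Win32 code itself. It assumes a temporary file stands in for the shared section, with MAP_SHARED as the writer's view and MAP_PRIVATE as a copy-on-write reader view. The names cow_demo, cow_demo_open, cow_snapshot and cow_demo_close are made up for illustration. Rather than tearing down the section, it re-arms the trigger by unmapping and re-mapping only the reader's private view. On Win32 the rough equivalent would presumably be a fresh MapViewOfFile with FILE_MAP_COPY, which I have not measured.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo state: a shared (writer) view and a private,
   copy-on-write (reader) view of the same temporary file. */
typedef struct {
    FILE *backing;               /* temp file standing in for the section */
    size_t len;                  /* mapping length, a whole number of pages */
    unsigned char *writer_view;  /* MAP_SHARED: writes reach the file */
    unsigned char *reader_view;  /* MAP_PRIVATE: NULL until first snapshot */
} cow_demo;

/* Create the backing file and the writer's shared view. Returns 0 on success. */
int cow_demo_open(cow_demo *d, size_t pages)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return -1;
    d->len = pages * (size_t)page;
    d->reader_view = NULL;
    d->backing = tmpfile();
    if (d->backing == NULL) return -1;
    if (ftruncate(fileno(d->backing), (off_t)d->len) != 0) {
        fclose(d->backing);
        return -1;
    }
    void *w = mmap(NULL, d->len, PROT_READ | PROT_WRITE, MAP_SHARED,
                   fileno(d->backing), 0);
    if (w == MAP_FAILED) {
        fclose(d->backing);
        return -1;
    }
    d->writer_view = w;
    return 0;
}

/* Take a private snapshot: drop any old private view, map a fresh one,
   then do a dummy write to one byte per page so that each page is
   copied into memory private to this view. Returns 0 on success. */
int cow_snapshot(cow_demo *d)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    assert(d->backing != NULL);
    if (d->reader_view != NULL) {
        munmap(d->reader_view, d->len);   /* discards the old private pages */
        d->reader_view = NULL;
    }
    void *r = mmap(NULL, d->len, PROT_READ | PROT_WRITE, MAP_PRIVATE,
                   fileno(d->backing), 0);
    if (r == MAP_FAILED) return -1;
    d->reader_view = r;
    for (size_t off = 0; off < d->len; off += page) {
        volatile unsigned char *p = d->reader_view + off;
        *p = *p;                          /* write fault triggers the copy */
    }
    return 0;
}

/* Unmap both views and delete the temporary file. */
void cow_demo_close(cow_demo *d)
{
    if (d->reader_view != NULL) munmap(d->reader_view, d->len);
    munmap(d->writer_view, d->len);
    fclose(d->backing);
}
```

    After cow_snapshot() returns, later writes through writer_view no longer show up in reader_view until the next cow_snapshot() call. That is the one-shot effect. Whether the per-page faults beat a plain memcpy() is something only a benchmark would tell.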

  4. #4
    Registered User Codeplug
    Join Date
    Mar 2003
    Posts
    4,981
    If readers only read, then I would avoid any copying.

    How is the data partitioned? Does each reader get a unique piece to work on? Or do all readers read and process all the same data?

    gg

  5. #5
    Registered /usr
    Join Date
    Aug 2001
    Location
    Newport, South Wales, UK
    Posts
    1,273
    The problem with working directly on the data would be that the writer process would be beholden to whatever's going on in the sockets code.
    It's all the same stuff (it's the display of an SDL application, I am trying to remote a virtual screen).

    I really should get round to running some tests on the mechanics of this; I just need to get back to my desk.

  6. #6
    Registered User Codeplug
    Join Date
    Mar 2003
    Posts
    4,981
    So each reader process is a remote view of the app? So should the writer thread never stall? Meaning that some remote readers may "miss" a frame due to the writer being faster than the readers?

    gg

  7. #7
    Registered /usr
    Join Date
    Aug 2001
    Location
    Newport, South Wales, UK
    Posts
    1,273
    My current design is that the writer process has its original display in private memory, then opportunistically copies to the shared memory section if it finds that no processes are currently reading (checking an event object). If they are, it sets a second event object indicating that no other readers should begin reading and then returns to whatever else it was doing. It will try again after it has performed another internal update.

    This will necessarily mean that the readers will at some point be "locked out" from updates. I think this is the most appropriate design for remoting a display, but will probably need a different design for remoting audio, where stalls are much more obvious.
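
    To make the handshake concrete, here is a rough portable sketch using C11 atomics in place of the two Win32 event objects. It is an approximation, not the actual code. The struct and function names are invented, the frame is shrunk to 4 KB so it can be tried in one process, and it assumes the atomics are lock-free so that they would also work when placed in shared memory. One deliberate difference from the description above: the writer raises its "no new readers" flag before checking for active readers, which closes the window where a reader could slip in between the check and the copy.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

enum { FRAME_BYTES = 4096 };  /* stand-in size; the real frame is around 8 MB */

/* Hypothetical layout of the shared section. */
typedef struct {
    atomic_int  active_readers;  /* readers currently inside the frame */
    atomic_bool writer_pending;  /* set: no new reader may begin */
    unsigned char frame[FRAME_BYTES];
} shared_frame;

void shared_frame_init(shared_frame *s)
{
    atomic_init(&s->active_readers, 0);
    atomic_init(&s->writer_pending, false);
    memset(s->frame, 0, sizeof s->frame);
}

/* Writer: copy the private frame in if no reader is active. On failure
   the flag stays set, so current readers drain and the next attempt
   (after another internal update) can succeed. Never blocks. */
bool writer_try_publish(shared_frame *s, const unsigned char *src)
{
    atomic_store(&s->writer_pending, true);
    if (atomic_load(&s->active_readers) != 0)
        return false;
    memcpy(s->frame, src, FRAME_BYTES);
    atomic_store(&s->writer_pending, false);
    return true;
}

/* Reader: returns false while the writer has locked readers out. */
bool reader_try_begin(shared_frame *s)
{
    if (atomic_load(&s->writer_pending))
        return false;
    atomic_fetch_add(&s->active_readers, 1);
    if (atomic_load(&s->writer_pending)) {
        /* The writer arrived between the two checks: back off. */
        atomic_fetch_sub(&s->active_readers, 1);
        return false;
    }
    return true;
}

void reader_end(shared_frame *s)
{
    int before = atomic_fetch_sub(&s->active_readers, 1);
    assert(before > 0);  /* must be paired with a successful begin */
    (void)before;
}
```

    A reader would call reader_try_begin(), push frame onto the network if that returned true, then call reader_end(). The lock-out shows up as reader_try_begin() returning false until the writer's next successful publish.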

  8. #8
    Registered User Codeplug
    Join Date
    Mar 2003
    Posts
    4,981
    Does the writer share multiple frames with readers, or just one? I was trying to think of a design that can allow the writer to produce something even while readers are busy with the last frame produced. At least 2 frames in shared memory should allow some overlap of work between the writer and the readers.
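
    As a rough illustration of the two-frame idea, here is a minimal sketch using C11 atomics. It assumes a single writer and lock-free atomics that could live in the shared section. All names are hypothetical and the frame is shrunk to 4 KB. The writer fills whichever slot is not currently published and then flips the index. If a slow reader still holds that slot, the writer drops the frame instead of stalling.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

enum { FRAME_BYTES = 4096, SLOTS = 2 };  /* stand-in frame size */

/* Hypothetical layout of a two-frame shared section. */
typedef struct {
    atomic_int published;        /* index of the newest complete frame */
    atomic_int readers[SLOTS];   /* readers currently holding each slot */
    unsigned char frames[SLOTS][FRAME_BYTES];
} frame_pair;

void frame_pair_init(frame_pair *p)
{
    atomic_init(&p->published, 0);
    for (int i = 0; i < SLOTS; i++)
        atomic_init(&p->readers[i], 0);
    memset(p->frames, 0, sizeof p->frames);
}

/* Writer (single writer assumed): returns false and drops the frame
   if a reader is still busy with the back slot. Never blocks. */
bool writer_publish(frame_pair *p, const unsigned char *src)
{
    int back = 1 - atomic_load(&p->published);
    if (atomic_load(&p->readers[back]) != 0)
        return false;
    memcpy(p->frames[back], src, FRAME_BYTES);
    atomic_store(&p->published, back);   /* flip: readers now see this slot */
    return true;
}

/* Reader: pin the newest frame and return its slot index. */
int reader_acquire(frame_pair *p)
{
    for (;;) {
        int slot = atomic_load(&p->published);
        atomic_fetch_add(&p->readers[slot], 1);
        if (atomic_load(&p->published) == slot)
            return slot;
        /* The writer flipped in between; unpin and try the new slot. */
        atomic_fetch_sub(&p->readers[slot], 1);
    }
}

void reader_release(frame_pair *p, int slot)
{
    int before = atomic_fetch_sub(&p->readers[slot], 1);
    assert(before > 0);  /* must be paired with reader_acquire */
    (void)before;
}
```

    With this arrangement a reader that takes a long time on frames[slot] only costs the writer every other frame, rather than locking it out of updates altogether.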

    gg
