Thread: Is fork() a resource hog?

  1. #1
    Registered User HalNineThousand's Avatar
    Join Date
    Mar 2008
    Posts
    43

    Is fork() a resource hog?

    While reading up on fork(), I read one source that said that essentially the current process was duplicated. The wording made me think that, essentially, a duplicate of the current program was created in memory. I would think that would mean if I have one main process and fork(), that the child process is an exact image of the parent, but in a different memory location, which would mean every time a process forks, it'll take twice the memory it did before since it has to duplicate itself.

    Is this right? If so, I would think fork() would be a function to avoid when something else can be done instead, such as using pthreads.

    Does fork() create an entire image of the parent thread, taking up as much in memory and resources?

    Thanks!

  2. #2
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by HalNineThousand View Post
    While reading up on fork(), I read one source that said that essentially the current process was duplicated. The wording made me think that, essentially, a duplicate of the current program was created in memory. I would think that would mean if I have one main process and fork(), that the child process is an exact image of the parent, but in a different memory location, which would mean every time a process forks, it'll take twice the memory it did before since it has to duplicate itself.
    In ancient times, that was true. These days we have virtual memory paging and copy-on-write. A forked copy of a process shares every single page with its parent until either one of them tries to write to a page, at which point that page is duplicated so that each process has its own copy.

    So short answer, fork() is not a resource hog. However, it takes some time to perform its work (essentially copying all resource handles as well as the parent process's page table into the new process).

  3. #3
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    fork() is never a function to avoid. fork() is a function to use when you need to create a new process. When you don't need to create a new process, you don't use fork().

    That said, if you want to execute tasks in parallel, multithreading is usually a better choice than forking off child processes.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  4. #4
    Registered User HalNineThousand's Avatar
    Join Date
    Mar 2008
    Posts
    43
    I prefer threading, but I found, from an earlier question, that fork() is needed in creating a daemon. Now I'm working on a program that will do some audio recording, so it will execute a command for the encoding program (I'm not about to get into audio encoding at this point, I know that would be a large field to cover and no matter what I include, someone will whine about a codec not supported or some compression issue or something else). Then, when the recording is done, it'll kill that task. I can do that with fork() but not with threads, since a successful exec() replaces the whole process image, so nothing after it in the calling program is executed.

    I just wasn't sure, after what I had read, if each fork() would basically double the resource usage of the program.

    Thanks, both of you, for the replies and help.

  5. #5
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    I was experimenting with fork() last week. Pretty interesting critter. It seems to duplicate EVERYTHING.

    I had malloc'ed some storage prior to the fork. Then, in each process, parent and child, I used and freed the storage and it worked just fine.

    Also, the addresses obtained for the same resources after the fork were identical in both processes.
    Mainframe assembler programmer by trade. C coder when I can.

  6. #6
    Registered User
    Join Date
    Mar 2007
    Posts
    142
    In the worst case, at least the code itself is shared among processes, but in practice, Linux is designed to play dirty tricks with the memory needed after fork() and even malloc().

    The kernel just pretends the memory is there, but in truth... well, something like living in The Matrix
    And 'OOM Killer' is something like Agent Smith.

    These two links will tell you about it more than you ever wanted to know:

    1. http://ubuntuforums.org/showthread.php?t=617256
    2. http://www.linuxdevcenter.com/pub/a/...of-memory.html

    Igor
    Last edited by idelovski; 04-15-2008 at 07:33 AM.

  7. #7
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Right, one of the tricks is "Copy-on-write". What happens in fork() is that the entire page-table of the parent process is copied to the child process, but in "read-only" mode. For code, this will remain so until the process either does exec() [which is a very common scenario] or exit(). For data, whenever a piece of data is written to by the child or parent, it gets duplicated before the actual write is allowed through - that means that the stack, heap and anything else are exactly the same in both processes until one of them writes to that data.

    It's quite resource hungry to copy the page-tables for large amounts of code or data: one page-table entry (4 or 8 bytes) per 4KB, one more for every 2MB, plus one per 1GB. Note that the code and data may use many different 2MB sections, even if the total usage is no more than 1MB (for example, code may be in one 2MB section, data in another, stack in another and heap in a fourth section of 2MB).

    In the old days, before "copy-on-write" was available to the kernel developers, the whole code and data sections had to be copied directly - that's VERY expensive.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  8. #8
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by matsp View Post
    In the old days, before "copy-on-write" was available to the kernel developers, the whole code and data sections had to be copied directly - that's VERY expensive.
    To solve this problem the designers of BSD invented vfork(), which copied all resources EXCEPT the process data area. The data area was directly shared by both processes, under the assumption that the forking process is going to immediately exec() another program (in which case, copying the data area would be a waste of time anyway). But since copy-on-write was developed, vfork() has become largely moot, and on some systems it is simply implemented as fork().

    If you vfork()'d and broke the rule (the rule being "call exec() immediately") you ran the risk of corrupting the parent process's memory space.

  9. #9
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by brewbuck View Post
    If you vfork()'d and broke the rule (the rule being "call exec() immediately") you ran the risk of corrupting the parent process's memory space.
    Which makes for those nice hard-to-debug bugs, where sometimes it works, other times not, depending on your luck and timing of the two processes. Faster, but dangerous indeed.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.
