Thread: Architecture for multi-processing application in C: fork or fork + exec

  1. #1
    Registered User
    Join Date
    Feb 2014
    Posts
    3

    Architecture for multi-processing application in C: fork or fork + exec

    Hi all!

    My question is more philosophical than technical.

    Before I begin, let me describe my objective: a multi-process (not multi-threaded) program with one "master" process and N "worker" processes. The program is a Linux-only, async, event-based web server, like nginx. So the main problem is how to spawn the "worker" processes, which will contain threads.


    In the Linux world there are two ways to create a process:

    1). fork() (or clone() / sys_clone(), which are not necessary in this case)

    2). fork() + exec*() family

    Here is a short description of each way and what confuses me about it.

    The first way, with fork(), is dirty because the forked process has a copy (copy-on-write, I know) of the parent's memory: signal handlers, variables, file/socket descriptors, environ, and everything else, e.g. the stack and heap. So after fork() I need to... hmm... "clear memory": for example, reset signal handlers, close socket connections, and clean up other horrible things inherited from the parent, because the child has access to a lot of data that was never intended for it. That breaks encapsulation, and many side effects are possible.

    The usual approach in this case is to run an infinite loop in the forked process to handle data, and to do some magic with a socket pair, pipes, or shared memory to create a communication channel between parent and child around the fork() call, because descriptors opened before fork() are duplicated in the child and refer to the same socket as in the parent.
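
    The channel described above can be sketched with socketpair() called before fork(). This is only an illustration: spawn_worker() and worker_loop() are hypothetical names, and the worker here merely echoes what the master sends.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical worker body: echo each message from the master back.
 * The loop ends when the master closes its end (read() returns 0). */
static void worker_loop(int fd)
{
    char buf[256];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        if (write(fd, buf, (size_t)n) != n)
            break;
}

/* Creates the channel, then forks. Returns the child's pid and stores
 * the master's end of the socket pair in *master_fd; -1 on error. */
static pid_t spawn_worker(int *master_fd)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {            /* child keeps sv[1] only */
        close(sv[0]);
        worker_loop(sv[1]);
        _exit(0);
    }
    close(sv[1]);              /* parent keeps sv[0] only */
    *master_fd = sv[0];
    return pid;
}
```

    The master writes commands to the returned descriptor and reads replies from it; closing it makes the worker leave its loop and exit.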

    Also, this is the nginx way: it has one executable binary that uses fork() to spawn child processes.


    The second way is similar to the first, but differs in that the child calls one of the exec*() functions after fork() to run an external binary. The important thing is that exec*() loads the binary into the current (forked) process's memory, automatically clears the stack and heap, and does all the other nasty work, so the forked process looks like a clean new instance of a program, without a copy of the parent's memory or other trash.

    This has another problem: establishing communication between parent and child. Because the forked process loses all the data inherited from the parent after exec*(), I need to somehow create a socket pair between parent and child. For example, I could create an additional listening socket (a Unix domain socket, or on another port) in the parent and wait for child connections; the child would connect to the parent after initialization.
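
    One detail is worth checking against the paragraph above: exec*() replaces the memory image, but descriptors that are not marked close-on-exec (FD_CLOEXEC) stay open in the new program. A pipe or socket pair created before fork() can therefore survive the exec. A minimal sketch, in which /bin/sh stands in for the hypothetical worker binary and fd_survives_exec() is an illustrative name:

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a message written by the exec'ed program through an
 * inherited descriptor reaches the parent, 0 otherwise. */
static int fd_survives_exec(void)
{
    int p[2];
    if (pipe(p) == -1)
        return 0;

    pid_t pid = fork();
    if (pid == -1) {
        close(p[0]);
        close(p[1]);
        return 0;
    }
    if (pid == 0) {
        /* Park the write end on descriptor 3 so the shell command
         * below can name it; dup2() also clears FD_CLOEXEC. */
        if (p[1] != 3 && dup2(p[1], 3) == -1)
            _exit(127);
        execl("/bin/sh", "sh", "-c", "echo ok >&3", (char *)NULL);
        _exit(127);            /* only reached if exec failed */
    }

    close(p[1]);               /* parent only reads */
    char buf[8] = {0};
    ssize_t n = read(p[0], buf, sizeof buf - 1);
    close(p[0]);
    waitpid(pid, NULL, 0);
    return n == 3 && strcmp(buf, "ok\n") == 0;
}
```

    A real worker binary would need to learn the descriptor number somehow, for example from a command-line argument or an environment variable; that convention is an assumption of this sketch, not something the thread prescribes.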


    The first way is simple, but it bothers me that the result is not a clean process, just a copy of the parent's memory with many possible side effects and trash, and I need to keep in mind that the forked process has many dependencies on the parent's code. The second way takes more effort, since two binaries must be maintained, and it is not as elegant as a single-file solution, but it is much more secure and stable. Maybe the best way is to use fork() to create the process and then something to clear its memory without an exec*() call, but I can't find any solution for this. The clone() syscall, which could also be used to create the process, has a CLONE_VM flag that determines how the parent's memory is used by the child: the child either shares the parent's memory (like a thread) or gets a copy of it. But clone() has no flag that means something like "create a process and neither share nor copy the parent's memory".


    In conclusion, I need help deciding which way to use: create a single executable binary like nginx and use fork(), or create two separate files, one "server" and one "worker", and call fork() + exec*(worker) or system() N times from the "server". I would like to know the pros and cons of each way, or whether I have missed something.


    P.S. Also, I know that Linux, unlike Windows, does not draw a hard line between processes and threads, but the main reasons for using processes in my case are isolated virtual memory (a thread shares it with its parent) and better CPU utilization, scalability, and stability of the application. So I decided that processes can't be replaced with threads in my case, and I need to find some way to spawn them.

    Cheers,
    Alex.

  2. #2
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    “Salem Was Wrong!” -- Pedant Necromancer
    “Four isn't random!” -- Gibbering Mouther

  3. #3
    Registered User
    Join Date
    Feb 2014
    Posts
    3
    This is the second one. Is that a problem?

    As far as I can see, the people on SO can't understand my question. Also, before posting here I checked many, many posts on SO and Google about fork and co. and did not find anything useful or any deep analysis of a problem like mine.

  4. #4
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    This is the second one. Is that a problem?
    O_o

    The behavior is problematic in my opinion. How do I know you aren't wasting my time? How do I know you aren't going to decide "You don't understand my question." before moving to a third forum? (You gave "stack overflow" a day before deciding they couldn't answer the question and went elsewhere to post.) How much time do I have before you decide to move to a third forum?

    *shrug*

    The etiquette for support forums like this is giving one forum at a time enough time to legitimately consider your question.

    If you don't see an answer from a forum the size of "stack overflow", you should consider that maybe your question is flawed.

    Soma

  5. #5
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,662
    Can you tell us what's wrong with this approach?
    Code:
    #include <stdlib.h>
    #include <sys/types.h>
    #include <unistd.h>

    #define WORKER_POOL_SIZE 10

    void worker ( int command_fd, int response_fd ) {
        while ( 1 ) {
            // wait for instructions on command_fd
        }
    }

    int main ( void ) {
        pid_t workers[WORKER_POOL_SIZE];
        int ms_pipes[WORKER_POOL_SIZE][2];  // master to servant
        int sm_pipes[WORKER_POOL_SIZE][2];  // servant to master
        int i, j;
        for ( i = 0 ; i < WORKER_POOL_SIZE ; i++ ) {
            if ( pipe( ms_pipes[i] ) == -1 || pipe( sm_pipes[i] ) == -1 ) {
                exit(1);  // could not create the pipes
            }
            workers[i] = fork();
            if ( workers[i] == -1 ) {
                exit(1);  // could not fork
            }
            if ( workers[i] == 0 ) {
                // close the master's ends, including those inherited for earlier workers
                for ( j = 0 ; j <= i ; j++ ) {
                    close( ms_pipes[j][1] );
                    close( sm_pipes[j][0] );
                }
                worker( ms_pipes[i][0], sm_pipes[i][1] );
                exit(1);  // should never get here
            } else {
                close( ms_pipes[i][0] );
                close( sm_pipes[i][1] );
            }
        }
        // now do whatever you want here for the master
        return 0;
    }
    The worker pool is created at startup, so there is no messy cleanup to be done.
    There is no clumsy use of exec().



    FWIW, if you did really want to use exec, you can easily re-run a copy of yourself (well the server part) with say
    Code:
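    #include <string.h>
    #include <unistd.h>

    int main ( int argc, char *argv[] ) {
        if ( argc <= 1 ) {
            // the first argument after the path becomes argv[0] of the new image;
            // this assumes argv[0] is a usable path, since execl() does not search PATH
            execl( argv[0], argv[0], "--server", (char*)0 );
            return 1;  // only reached if execl() failed
        }
        if ( strcmp( argv[1], "--server" ) == 0 ) {
            // do server code here
        }
        return 0;
    }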
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  6. #6
    Registered User
    Join Date
    Feb 2014
    Posts
    3
    phantomotap, let me try to explain my behavior.

    There are two general types of questions about programming:

    The answer to a question of the first type is often a simple quote from man/MSDN/a spec or something similar. You may copy-paste from man, you may rework the man text into a clearer form before posting, or you may write the answer from memory, but in all these cases the answer is still "constant", because... because man is constant, the spec is constant, the docs are constant, and your answer should rule out any misunderstanding or misinterpretation of the source.

    The answer to a question of the second type needs your own experience, because man pages can't cover every possible usage and don't explain in detail how components may be combined. Man follows the single responsibility principle: it doesn't show "best practices" and doesn't teach good programming skills.

    This is abstract, so on to examples.

    First type: "explain how fork() works", "how can I create a process in Linux", "how can I print to the terminal". All of these questions are easily googled; the man pages for fork, pthread, and printf fully cover every case in them.
    Second type: "what is the best architecture for *objective*", "how do I tune a MySQL server for my load", "should I use apache or nginx for my website", "is WPF or Qt better for my case". Try to feel the difference: if you have no experience with software design, or MySQL configuration, or haven't worked with apache and nginx, or aren't experienced with the WPF and Qt frameworks, you cannot give a good answer.

    OK, that was my butthurt; on to reality.

    I know the SO community. I know it well. I have been reading it for at least 5 years. 1.5 years ago I parsed the metadata of all SO posts and could see the correlation between question creation time, views, and the time when the best answer was written. Oh god, I have even hired a few professionals from SO into my company. And a huge share of the questions on SO are of the first type. That is neither good nor bad, it is a fact. SO often simply can't help me find a good answer, because before posting a question I use Google to find a solution, and if nothing turns up, I assume my question is probably of the second type. And my experience tells me that SO probably can't help me with it. I'm not writing this so that you think I'm the God Of Programming or something; I'm only trying to explain that in the real world there are many exceptions to netiquette.

    ...and now to the end of the tunnel.

    As you can see, my question needs answers backed by the answerer's experience, not a man quote or the like. So what is wrong if I post the question in, say, two places and get twice as many responses from professionals? What if one person on SO says "no, *this way* is wrong, because..." and another person on this forum says "yeah, *this way* is good, because..."? Each of them has their own share of experience with my problem and can't know the other nuances that professionals on the other forum know.

    P.S. According to etiquette, going off-topic in a thread is a bad idea, so let's stop this holy war here. If I'm mistaken about something, or you have something to argue, please send me a private message.


    Salem, thanks for your reply.

    The first approach, with the process pool, is wrong for me for several reasons:

    1. Before starting the workers, the application needs to do a lot of work. For example, it parses the config, which includes the number of workers and some additional data for them. The application is also distributed: after reading the config, it should stay permanently connected to the "mothership" and listen for events such as config changes. And there are many other startup tasks.
    2. The startup activity from point 1 implies that the application loads things like "configReader", "configParser", "networkLayer", "connector", "logger", etc. into memory. All of them will be duplicated in the child.
    3. The next problem is how to increase or decrease the number of workers during the application's lifecycle. Also, a worker may crash or hang, and the server should restart it (kill the old one, start a new one), or the runtime config may change.
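
    The restart requirement in point 3 does not depend on whether the worker is exec'ed or only forked: the master can reap dead children with waitpid() and start replacements. A minimal sketch, where start_worker(), reap_and_restart(), and demo_worker() are hypothetical names and demo_worker() exits immediately to simulate a crash:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for the real worker; returns at once to simulate a crash. */
static int demo_worker(void)
{
    return 1;
}

/* Forks one worker that runs fn() and exits with its return value.
 * Returns the child's pid in the parent, or -1 on error. */
static pid_t start_worker(int (*fn)(void))
{
    pid_t pid = fork();
    if (pid == 0)
        _exit(fn());
    return pid;
}

/* Blocks until any child terminates; if it is one of pids[0..n-1],
 * starts a replacement in the same slot. Returns the slot index,
 * or -1 on error or if the child was not in the table. */
static int reap_and_restart(pid_t *pids, int n, int (*fn)(void))
{
    int status;
    pid_t dead = waitpid(-1, &status, 0);
    if (dead == -1)
        return -1;
    for (int i = 0; i < n; i++) {
        if (pids[i] == dead) {
            pids[i] = start_worker(fn);
            return pids[i] == -1 ? -1 : i;
        }
    }
    return -1;
}
```

    A hung worker could be handled in the same loop by sending it a signal with kill() first and letting waitpid() pick up the termination. Growing or shrinking the pool then amounts to adding or removing slots in the pid table.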

    The second approach, with self-exec, is more useful for me: it solves all three problems described above, though the design is a bit inelegant. Anyway, thank you for the solution, I'll think about it.
