Thread: clone problems

  1. #1
    Registered User
    Join Date
    Nov 2011
    Location
    Karlsruhe, Germany
    Posts
    11

    clone problems

    Hello,

    I have an example of a method that uses OpenMP to create several threads which add 1 about 1 million times to an integer through a pointer called "number".

    My task is to do the same using the clone call. I did this, but my calling process always counts to two million instead of one million.

    The OpenMP version has a for loop in which the increment method is called 2 times. The result is unstable due to the lack of synchronization, but that is not my problem. It comes out at about 5% more than 1 million.

    This is the increment method:
    Code:
    void increment(uint64_t* number, uint64_t end ){
        for( uint64_t i = 0; i < end; i++ )
        {
            *number += 1;
        }
    }

    So I created a new loop and added the clone call. The loop runs twice, but my result is not 1 million + 5%, it is always 2 million. The clone procedure got the CLONE_VM flag and so the two threads use the same address space, but when calling the increment method, both of them count on their own.

    My idea is to use a shared counting variable, but I'm not allowed to modify the increment method in any way.

    What can I do to solve this problem?

    The-Forgotten
    Last edited by The-Forgotten; 12-03-2011 at 10:26 AM.

  2. #2
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    It's all down to the granularity of your interleaving.

    OpenMP might be interleaving at the instruction level between two cores.

    But a clone()'ed process is in the hands of the OS scheduler.
    One could easily finish this small task before the other gets a chance to run again.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by The-Forgotten View Post
    The clone procedure got the CLONE_VM flag and so the two threads use the same address space, but when calling the increment method, both of them count on their own.
    I don't see any locking mechanism used here. If the threads are CLONE_VM, the same rules apply as with any other kind of threads (on Linux, all threading is implemented using the "clone" system call, aka sys_clone, including the "clone" userspace library call you are using).

    If you are not using locks, this code is very wrong. Post the whole program!
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  4. #4
    Registered User
    Join Date
    Nov 2011
    Location
    Karlsruhe, Germany
    Posts
    11
    Code:
        void* childstack = malloc(CHILD_STACK_SIZE);
    
        int i;
        for (i = 0; i < THREADS; i++) {
            printf("%i", clone(adapter, childstack + CHILD_STACK_SIZE - 1,
                 CLONE_VM | CLONE_THREAD, arg ));
        }
    It always returns -1, which means there is an error, but I don't know what the problem is ...

    It doesn't even call the adapter method; it seems to fail before that ....

    Adding any synchronization stuff is not required for my task, it's just the "openmp => clone"
    Last edited by The-Forgotten; 12-03-2011 at 11:15 AM.

  5. #5
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by The-Forgotten View Post
    Adding any synchronization stuff is not required for my task,
    You read and write to a shared variable from multiple threads without "any synchronization stuff". Do you know what UNDEFINED BEHAVIOUR is?

    There is no point trying to debug code which intentionally invokes undefined behaviour. Good luck.

  6. #6
    Registered User
    Join Date
    Nov 2011
    Location
    Karlsruhe, Germany
    Posts
    11
    Well, synchronization will be step two: assignment one is to create this the way it is wanted, assignment two is the synchronization. At the moment there is no shared variable, so there is no need for synchronization.

    You are right, but my problem is that I want to reach the problem you are talking about; that is one step further than where I am...

  7. #7
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by The-Forgotten View Post
    At the moment there is no shared variable
    I'll take your word for that -- it sounded like they were incrementing the same variable based on your description. Why don't you just post the whole program? It can't be much more than 100 lines. No one wants to throw darts blindfolded, it is a waste of time.
    Last edited by MK27; 12-03-2011 at 02:33 PM.

  8. #8
    Registered User
    Join Date
    Nov 2011
    Location
    Karlsruhe, Germany
    Posts
    11
    The problem with posting the whole program is the following: after the deadline my teachers will "google" phrases of my program, so posting the entire program might cause a lot of trouble.

    I think I fixed the "no shared variable" problem and, as you said, the program went to hell without synchronization. Every time I get a different result. The strange thing is, the openmp version shows me "wrong results" of about 1 million + 5% ... the "clone" version shows me wrong results between 400 and 900.

    Might this be related to the lack of synchronization?

    BTW: I think it's only about 50 lines, so you are right, but as I said, posting it with so few lines is quite dangerous for me :S I'm aware that it is hard to help with so little information...
    Last edited by The-Forgotten; 12-03-2011 at 03:10 PM.

  9. #9
    Registered User
    Join Date
    Nov 2011
    Location
    Karlsruhe, Germany
    Posts
    11
    I think I found the error, but how do I solve it?

    How do I tell the "waitpid" function to wait for all processes created by clone?

  10. #10
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by The-Forgotten View Post
    The problem with posting the whole program is the following: after the deadline my teachers will "google" phrases of my program, so posting the entire program might cause a lot of trouble.
    I'm presuming you have some time to go at school. I would get the exact details of this policy if I were you. We don't do homework for people here, and to date I have not heard of a single person getting in trouble for using cboard (and it has been a much discussed issue). We aren't doing anything that you could not get from a tutor or TA, and some of the regulars do work in academia, so they are sensitive about this.

    If your school or professor has a policy: do not use the internet, period, I guess that is that, but it seems ridiculous and extreme. More likely, the issue is: I don't want to find out you copied your code/got it verbatim from someone else.

    Look into that, because forums like this one are a great and legitimate resource.

    I think I fixed the "no shared variable" problem and, as you said, the program went to hell without synchronization. Every time I get a different result.
    This is still not clear to me: are, or are not, the clones incrementing the same counter? As in: could you give the variable a completely different name, just to be sure there is no confusion?

    The "undefined behavior" issue is that without locks, the system clone/threading API cannot guarantee anything about the state of a variable that is concurrently accessed by more than one thread. In theory, this could even mean that a partial write is done by one thread then another partial write by another, etc, and the whole mess ends up in the same physical address. The only truly discrete unit in programming is the bit. There are (generally) 32 bits in an int. If thread A loads those bits on core 1, then thread B loads the same into core 2, then thread A changes them and core 1 puts only half of them back into RAM and core 2 does the same then core 1 overwrites what core 2 did -- the point is that without synchronization, it is a simultaneous crap shoot.
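    To make the load/modify/store hazard concrete, here is a minimal single-threaded sketch that replays one unlucky schedule by hand. Nothing in it comes from the poster's program: fake_thread, step_load, step_add, step_store, lost_update_demo and serial_demo are made-up names, and the two "threads" are plain structs, so the outcome is deterministic instead of a real race.

```c
#include <stdint.h>

/* A pretend thread: all it owns is a private register copy of the counter. */
typedef struct {
    uint64_t reg;
} fake_thread;

/* The three steps hidden inside "*number += 1". */
static void step_load(fake_thread *t, const uint64_t *shared) { t->reg = *shared; }
static void step_add(fake_thread *t) { t->reg += 1; }
static void step_store(const fake_thread *t, uint64_t *shared) { *shared = t->reg; }

/* Unlucky schedule: both threads load before either one stores. */
static uint64_t lost_update_demo(void) {
    uint64_t shared = 0;
    fake_thread a, b;

    step_load(&a, &shared);
    step_load(&b, &shared);   /* b still sees the old value */
    step_add(&a);
    step_add(&b);
    step_store(&a, &shared);
    step_store(&b, &shared);  /* overwrites a's result with the same value */
    return shared;
}

/* Lucky schedule: thread a finishes its increment before b starts. */
static uint64_t serial_demo(void) {
    uint64_t shared = 0;
    fake_thread a, b;

    step_load(&a, &shared);
    step_add(&a);
    step_store(&a, &shared);
    step_load(&b, &shared);
    step_add(&b);
    step_store(&b, &shared);
    return shared;
}
```

    Two increments ran in both functions, but lost_update_demo() ends with 1 while serial_demo() ends with 2. Without synchronization nothing forces the second schedule.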

    I haven't used openmp, so I can't comment on that, but again, AFAIK all multi-threading on linux relies on the kernel's version of clone.

  11. #11
    Registered User
    Join Date
    Nov 2011
    Location
    Karlsruhe, Germany
    Posts
    11
    Well, the policy is much like that. If I post my code here and someone copies it, both of us will be in trouble. They make no distinction between the people copying the code and the people uploading it; both will be punished.

    At the moment the program finishes before the threads have completed their work, so the result is only about 500. If I add more threads, making the parent slower, the result grows, so I need some kind of waitpid(for all cloned children).

  12. #12
    Registered User
    Join Date
    Nov 2011
    Location
    Karlsruhe, Germany
    Posts
    11
    Code:
        siginfo_t info;
        printf("%i", waitid(P_ALL, 0, &info, WEXITED));

    This always returns -1, so there is an error, but I want to wait for all clones....

    Code:
    pid = clone(adapter, childStack, CLONE_VM | CLONE_THREAD | CLONE_SIGHAND, &data);
    This returns a valid pid, so the cloned process is running.
    Last edited by The-Forgotten; 12-04-2011 at 05:06 AM.

  13. #13
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    The return value of clone() is a thread id, not a pid.

    Quote Originally Posted by The-Forgotten View Post
    Code:
        siginfo_t info;
        printf("%i", waitid(P_ALL, 0, &info, WEXITED));

    This always returns -1, so there is an error, but I want to wait for all clones....
    Did you try perror?

    Code:
    	if (waitid(P_ALL, 0, &info, WEXITED) == -1) {
    		perror("waitid");
    	}
    I bet the outcome is:
    waitid: No child processes

    Did you notice this in the man page?

    Quote Originally Posted by man clone
    The low byte of flags contains the number of the termination signal sent to the parent when the child dies. If this signal is specified as anything other than SIGCHLD, then the parent process must specify the __WALL or __WCLONE options when waiting for the child with wait(2). If no signal is specified, then the parent process is not signaled when the child terminates.
    If you add SIGCHLD to your flags for clone, and remove CLONE_THREAD | CLONE_SIGHAND, waitid() will work. However, you have to wait for all the threads, so something like:

    Code:
    	while (waitid(P_ALL, 0, &info, WEXITED) != -1) sleep(1);
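    A minimal sketch of "wait until no children are left" that tells the expected end condition (ECHILD) apart from a real error. To keep it short it uses fork() rather than clone(); the assumption is that the waiting logic is the same for clone children created with SIGCHLD as their termination signal. spawn_children and reap_all_children are hypothetical helper names.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks nchildren short-lived children; returns how many were started. */
static int spawn_children(int nchildren) {
    int started = 0;

    for (int i = 0; i < nchildren; i++) {
        pid_t pid = fork();
        if (pid == -1)
            break;
        if (pid == 0)
            _exit(0);           /* child: finish immediately */
        started++;
    }
    return started;
}

/* Blocks until every child has been reaped.
 * Returns the number reaped, or -1 on an unexpected error. */
static int reap_all_children(void) {
    int reaped = 0;

    for (;;) {
        pid_t pid = waitpid(-1, NULL, 0);
        if (pid > 0) {
            reaped++;
            continue;
        }
        if (errno == EINTR)
            continue;           /* interrupted by a signal: just retry */
        /* ECHILD means "no children left", which is the normal way out. */
        return errno == ECHILD ? reaped : -1;
    }
}
```

    waitpid() blocks by itself, so no sleep() is needed between iterations.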
    BTW, I just noticed that in your code from post #4 you pass the same child stack address to every thread, and it looks like you may still be doing that. That is very wrong. Each child needs its own stack!

    Code:
    	unsigned char stack[NUM_THREADS][STACKSZ];
    Which might explain part of your problem. However, what I said about the lack of synchronization is true, and if your "&data" from post #12 refers to the counter, that is a shared variable.
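    As a rough sketch of the "one stack per child" rule: the code below gives each clone its own malloc'd stack and its own counter slot, so nothing is shared and no locking is needed. STACK_BYTES, MAX_CHILDREN, slots, fill_slot and run_clones are made-up names and the sizes are arbitrary. It assumes Linux with glibc's clone() wrapper and a stack that grows downwards.

```c
#define _GNU_SOURCE             /* needed for clone() and the CLONE_* flags */
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define STACK_BYTES (64 * 1024) /* arbitrary; a multiple of 16 keeps the top aligned */
#define MAX_CHILDREN 8

static int slots[MAX_CHILDREN]; /* one private counter per child */

/* Child entry point: touches only its own slot. */
static int fill_slot(void *arg) {
    volatile int *slot = arg;

    for (int i = 0; i < 1000; i++)
        *slot += 1;
    return 0;
}

/* Starts nchildren clones on separate stacks and waits for each of them.
 * Returns the sum of all slots, or -1 if a child could not be started. */
static int run_clones(int nchildren) {
    char *stacks[MAX_CHILDREN] = { 0 };
    pid_t pids[MAX_CHILDREN];
    int started = 0, sum = 0;

    assert(nchildren >= 0 && nchildren <= MAX_CHILDREN);
    for (int i = 0; i < nchildren; i++) {
        slots[i] = 0;
        stacks[i] = malloc(STACK_BYTES);
        if (stacks[i] == NULL)
            break;
        /* The stack grows downwards, so hand clone() the high end. */
        pids[i] = clone(fill_slot, stacks[i] + STACK_BYTES,
                        CLONE_VM | SIGCHLD, &slots[i]);
        if (pids[i] == -1)
            break;
        started++;
    }
    /* A stack may only be freed after the child using it has exited. */
    for (int i = 0; i < started; i++)
        while (waitpid(pids[i], NULL, 0) == -1 && errno == EINTR)
            ;
    for (int i = 0; i < nchildren; i++) {
        free(stacks[i]);
        sum += slots[i];
    }
    return started == nchildren ? sum : -1;
}
```

    With two children run_clones(2) adds up to 2000, because each child counts to 1000 in its own slot. Pointing both children at the same slot would bring the race back.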

    I tried this:

    Code:
    #define _GNU_SOURCE	/* needed for clone() and the CLONE_* flags */
    #include <stdio.h>
    #include <sched.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <stdlib.h>
    
    #define STACKSZ 8192
    #define FLAGS (CLONE_VM | SIGCHLD)
    
    int CountMax;
    
    int thread (void *arg) {
    	int *x = arg, i;
    	for (i=0; i<CountMax; i++) {
    		(*x)++;
    	}
    	return 0;
    }
    
    int main(int argc, const char *argv[]) {
    	unsigned char stack[2][STACKSZ];
    	int n = 0;
    
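    	if (argc < 2) {
    		fprintf(stderr, "usage: %s count\n", argv[0]);
    		return 1;
    	}
    	CountMax = strtol(argv[1], NULL, 0);
    
    	/* the stack grows down, so pass the address just past the end of each array */
    	clone(thread, stack[0] + STACKSZ, FLAGS, (void*)&n);
    	clone(thread, stack[1] + STACKSZ, FLAGS, (void*)&n);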
    
    	siginfo_t info;
    	while (waitid(P_ALL, 0, &info, WEXITED) != -1) {
    		printf("%d %d %d\n", info.si_signo, info.si_errno, info.si_code);
    		sleep(1);
    	}
    	printf("%d\n", n);
    
    	return 0;
    }
    Here's what happens with increasing values for "CountMax":

    Code:
    localhost C # ./a.out 10
    17 0 1
    17 0 1
    20
    localhost C # ./a.out 100
    17 0 1
    17 0 1
    200
    localhost C # ./a.out 1000
    17 0 1
    17 0 1
    2000
    localhost C # ./a.out 10000
    17 0 1
    17 0 1
    15635
    localhost C # ./a.out 100000
    17 0 1
    17 0 1
    105023
    localhost C # ./a.out 1000000
    17 0 1
    17 0 1
    1304691
    localhost C # ./a.out 10000000
    17 0 1
    17 0 1
    11071507
    17 is SIGCHLD, and none of them indicate an error. Notice that with lower numbers the outcome is as expected, but with higher ones it is messed up.

    Most likely, this is because with the lower values, each thread completes its task cleanly on its first run thru the scheduler, and the second thread is queued after the first one has finished. With larger numbers, the threads cannot do that -- they need more than one opportunity and must alternate, the chance of them running on two cores simultaneously is increased, etc -- so "n" becomes more prone to get messed up for the reasons described in post #10.

    For values in the range 2-3000 (your mileage may vary), the outcome was often (but not always) good. For values >=5000, it never came out correctly. If that indicates something about the granularity of the scheduler, there is very little chance 1000000 will end up anything but corrupted.
    Last edited by MK27; 12-04-2011 at 07:50 AM.

  14. #14
    Registered User
    Join Date
    Nov 2011
    Location
    Karlsruhe, Germany
    Posts
    11
    That's great, thanks a lot.
    Is clone still creating threads without "CLONE_THREAD"?
    My assignment says I have to use threads, not processes, to do this.

    Well, in my opinion it works and so it is fine, but I'm not sure whether my teacher will share my opinion.

    I have another question. As I already said, assignment 2 is to use a semaphore to achieve synchronization.

    I did this the following way; let me explain using your example:

    Code:
    int thread (void *arg) {
            sem_wait(&sem);
            int *x = arg, i;
            for (i=0; i<CountMax; i++) {
                 (*x)++;
            }
            sem_post(&sem);
            return 0;
    }
    and

    Code:
    sem_destroy(&sem)
    at the end,
    and
    Code:
    sem_init(&sem, 0, 1);
    at the very start of my program.

    Now my result is not 1 million, it is always exactly 2 million :S
    using three threads, it is 3 million :S

    BTW: Thanks for the advice regarding the child stack, you were right, I used the same for all threads.
    Strange thing is: If I use the same child stack for all threads, they count "together" to 1 million, as expected, but if they get their own child stacks, they all count to 1 million on their own :S

    So if using the same child stack just covers one error with another, I'd like to do it the right way, but how do I do that?

    And I have another question: how do I know how much space my child stack will need? You allocated 8192; I know that this is 2¹³, but why not 2¹⁴ or 2¹²?
    Last edited by The-Forgotten; 12-04-2011 at 08:15 AM.

  15. #15
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by The-Forgotten View Post
    That's great, thanks a lot.
    Is clone still creating threads without "CLONE_THREAD"?
    My assignment says I have to use threads, not processes, to do this.
    I guess that depends on how you define "thread" in relation to "process". In a non-system-specific way, I'd say if they share the same address space (eg, via CLONE_VM), they are functionally threads. However, in a *nix-specific way, I'd say anything with its own pid is a process, which is the term man clone favours. However, something about the deprecation of CLONE_PID tells me that is too simplistic. If there is still time, lol, you should ask.

    My reason for telling you to remove CLONE_THREAD was:

    Quote Originally Posted by man clone
    When a CLONE_THREAD thread terminates, the thread that created it using clone() is not sent a SIGCHLD (or other termination) signal; nor can the status of such a thread be obtained using wait(2). (The thread is said to be detached.)
    WRT the POSIX threads API, which on Linux is implemented using sys_clone(), threads are not necessarily detached.

    For me here (kernel 3.0.6), using CLONE_THREAD suggested that some details have been left out of the manpage. Eg, "CLONE_THREAD | CLONE_SIGHAND" alone craps out with an "Invalid argument" error. CLONE_VM seems to also be required.

    Furthermore, the "parent" and two children are now actually peers, and AFAICT, they are all done when the first one exits. Eg:

    Code:
    #define _GNU_SOURCE	/* needed for clone() and the CLONE_* flags */
    #include <stdio.h>
    #include <sched.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <stdlib.h>
    
    #define STACKSZ 819200
    #define FLAGS (CLONE_THREAD | CLONE_SIGHAND | CLONE_VM)
    
    struct args {
    	int *n;
    	int id;
    };
    
    int CountMax, Done[2];
    
    int thread (void *arg) {
    	struct args *me = arg;
    	int i;
    	fprintf(stderr, "%d start\n", me->id);
    	for (i=0; i<CountMax; i++) {
    		(*(me->n))++;
    	}
    	Done[me->id] = 1;
    	return 0;
    }
    
    int main(int argc, const char *argv[]) {
    	unsigned char stack[2][STACKSZ];
    	int n = 0;
    	struct args eg[2] = { { &n, 0 }, { &n, 1 } };
    
    	CountMax = strtol(argv[1], NULL, 0);
    
    	/* the stack grows down, so pass the address just past the end of each array */
    	if (clone(thread, stack[0] + STACKSZ, FLAGS, (void*)&eg[0]) == -1) {
    		perror("clone1");
    	}
    	if (clone(thread, stack[1] + STACKSZ, FLAGS, (void*)&eg[1]) == -1) {
    		perror("clone2");
    	}
    
    	while (!Done[0] && !Done[1]) {
    		fprintf(stderr, "%d\n", n);
    	}
    
    	fprintf(stderr,"DONE!");
    
    	return 0;
    }
    Not using a sleep() in the main while loop is nasty, but otherwise it will not happen (unless you have parallel sleeps in thread()). Some output:

    Code:
    localhost C # ./a.out 10000
    1 start
    0
    8905
    localhost C # ./a.out 10000
    0
    0
    0
    0
    0 start
    0
    1566
    2923
    4050
    5168
    6284
    7401
    8503
    9615
    localhost C #
    This is all about one thread finishing first, then all the others exit with it regardless.

    Of course, this model is still not properly synchronized, which makes this limitation much more troublesome.
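    One way to keep the "done flag" idea but wait for every worker, not just the first one, is sketched below. It deliberately swaps in detached POSIX threads and C11 atomics instead of raw clone(CLONE_THREAD), so it is an illustration of the waiting logic only. finished, total, count_max, detached_worker and run_detached are made-up names.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

static atomic_int finished;     /* how many workers have completed */
static atomic_long total;       /* shared counter, incremented atomically */
static long count_max;          /* written before any worker starts, read-only afterwards */

static void *detached_worker(void *unused) {
    (void)unused;
    for (long i = 0; i < count_max; i++)
        atomic_fetch_add(&total, 1);
    atomic_fetch_add(&finished, 1);     /* announce completion as the last step */
    return NULL;
}

/* Starts nworkers detached threads and returns the final total,
 * or -1 if the thread attributes could not be set up. */
static long run_detached(int nworkers, long per_worker) {
    pthread_attr_t attr;
    int started = 0;

    count_max = per_worker;
    atomic_store(&finished, 0);
    atomic_store(&total, 0);
    if (pthread_attr_init(&attr) != 0)
        return -1;
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    for (int i = 0; i < nworkers; i++) {
        pthread_t tid;
        if (pthread_create(&tid, &attr, detached_worker, NULL) == 0)
            started++;
    }
    pthread_attr_destroy(&attr);

    /* Keep waiting until ALL started workers have reported in. */
    while (atomic_load(&finished) < started)
        sched_yield();
    return atomic_load(&total);
}
```

    Because the wait condition compares against the number of workers started, the caller cannot return (and end the whole process) while a worker is still counting.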

    Now my result is not 1 million, it is always exactly 2 million :S
    using three threads, it is 3 million :S
    That's what semaphores are for.
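    A hedged sketch of why "exactly 2 million" is the synchronized answer: each worker runs the unmodified increment() from post #1 while holding a binary semaphore, so no increment is lost and N workers yield N times the count. POSIX threads are used here instead of clone() to keep it short. locked_worker and count_with_semaphore are made-up names, and the cap of 16 threads is arbitrary.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdint.h>

static sem_t sem;               /* binary semaphore guarding `number` */
static uint64_t number;         /* the shared counter */

/* Same body as the increment() in post #1, left unchanged. */
static void increment(uint64_t *n, uint64_t end) {
    for (uint64_t i = 0; i < end; i++) {
        *n += 1;
    }
}

static void *locked_worker(void *arg) {
    const uint64_t *end = arg;

    sem_wait(&sem);             /* only one worker at a time gets past here */
    increment(&number, *end);
    sem_post(&sem);
    return NULL;
}

/* Runs nthreads workers (at most 16) and returns the final counter value. */
static uint64_t count_with_semaphore(int nthreads, uint64_t end) {
    pthread_t tid[16];
    int started = 0;

    if (nthreads > 16)
        nthreads = 16;
    number = 0;
    if (sem_init(&sem, 0, 1) != 0)
        return 0;
    for (; started < nthreads; started++)
        if (pthread_create(&tid[started], NULL, locked_worker, &end) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    sem_destroy(&sem);
    return number;
}
```

    Holding the semaphore around the whole loop effectively runs the workers one after another. That is coarse, but it is the only option when increment() itself may not be modified.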

    BTW: Thanks for the advice regarding the child stack, you were right, I used the same for all threads.
    Strange thing is: If I use the same child stack for all threads, they count "together" to 1 million, as expected, but if they get their own child stacks, they all count to 1 million on their own :S
    Not sure what you mean by that? The total is 2 million instead of 1? Maybe think about how you are counting.

    I have to take off for the day, unfortunately.
