The return value of clone() is a thread id, not a pid.
Originally Posted by The-Forgotten
Code:
siginfo_t info; printf("%i", waitid(P_ALL, 0, &info, WEXITED));
This always returns me -1, so there is an error, but I want to wait it for all clones....
Did you try perror?
Code:
if (waitid(P_ALL, 0, &info, WEXITED) == -1) {
perror("waitid");
}
I bet the outcome is:
waitid: No child processes
Did you notice this in the man page?
Originally Posted by man clone
The low byte of flags contains the number of the termination signal sent to the parent when the child dies. If this signal is specified as anything other than SIGCHLD, then the parent process must specify the __WALL or __WCLONE options when waiting for the child with wait(2). If no signal is specified, then the parent process is not signaled when the child terminates.
If you add SIGCHLD to your flags for clone, and remove CLONE_THREAD | CLONE_SIGHAND, waitid() will work. However, you have to wait for all the threads, so something like:
Code:
while (waitid(P_ALL, 0, &info, WEXITED) != -1) sleep(1);
BTW, I just noticed that in your code from post #4 you pass the same child stack address to every thread, and it looks like you may still be doing that. That is very wrong. Each child needs its own stack!
Code:
unsigned char stack[NUM_THREADS][STACKSZ];
That might explain part of your problem. However, what I said about the lack of synchronization still holds, and if your "&data" from post #12 refers to the counter, that is a shared variable.
I tried this:
Code:
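#define _GNU_SOURCE   /* needed for clone() in <sched.h> */
#include <stdio.h>
#include <sched.h>
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdlib.h>
#define STACKSZ 8192
#define FLAGS (CLONE_VM | SIGCHLD)
int CountMax;
/* Deliberately unsynchronized: both children increment the same int. */
int thread (void *arg) {
    int *x = arg, i;
    for (i=0; i<CountMax; i++) {
        (*x)++;
    }
    return 0;
}
int main(int argc, const char *argv[]) {
    /* One stack per child; clone() wants the top, since the stack grows down. */
    _Alignas(16) unsigned char stack[2][STACKSZ];
    int n = 0;
    if (argc < 2) {
        fprintf(stderr, "usage: %s count\n", argv[0]);
        return 1;
    }
    CountMax = strtol(argv[1], NULL, 0);
    clone(thread, stack[0] + STACKSZ, FLAGS, &n);
    clone(thread, stack[1] + STACKSZ, FLAGS, &n);
    siginfo_t info;
    /* Reap children until waitid() fails, which happens once none are left. */
    while (waitid(P_ALL, 0, &info, WEXITED) != -1) {
        printf("%d %d %d\n", info.si_signo, info.si_errno, info.si_code);
        sleep(1);
    }
    printf("%d\n", n);
    return 0;
}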
Here's what happens with increasing values for "CountMax":
Code:
localhost C # ./a.out 10
17 0 1
17 0 1
20
localhost C # ./a.out 100
17 0 1
17 0 1
200
localhost C # ./a.out 1000
17 0 1
17 0 1
2000
localhost C # ./a.out 10000
17 0 1
17 0 1
15635
localhost C # ./a.out 100000
17 0 1
17 0 1
105023
localhost C # ./a.out 1000000
17 0 1
17 0 1
1304691
localhost C # ./a.out 10000000
17 0 1
17 0 1
11071507
17 is SIGCHLD, and the si_code of 1 is CLD_EXITED (a normal exit), so none of them indicate an error. Notice that with lower numbers the outcome is as expected, but with higher ones it is messed up.
Most likely, this is because with the lower values each thread completes its task on its first run through the scheduler, so the second thread only starts after the first one has finished. With larger numbers the threads cannot do that: they need more than one time slice and must alternate, the chance of them running on two cores simultaneously increases, and so on. So "n" becomes more prone to getting messed up, for the reasons described in post #10.
For values in the range 2000-3000 (your mileage may vary), the outcome was often (but not always) good. For values >= 5000, it never came out correctly. If that indicates something about the granularity of the scheduler, there is very little chance 1000000 will end up as anything but corrupted.