1. ## Double-threaded woes with binary tree traversal!

Hello All,

I'm building a binary tree to traverse with two threads in order, printing out the values of the leaves as it goes. So we have the typical binary tree, I'm giving full children and we're going down like 3 levels.

It's too hard to make a figure so I'll just say, we have root which is one then we have the next level which two more then we have the next level which is 4 more then we have the last level which 8 more total.

According to valgrind I leak 15 allocations which is good because 1+2+4+8 = 15.

Single-threaded, my code works. But when I try to double thread it, valgrind tells me :

==5195== at 0x0: ???
==5195== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==5195==
==5195==
==5195== Process terminating with default action of signal 11 (SIGSEGV)
==5195== at 0x0: ???

Here's my code below :
Code:
```#include <stdio.h>
#include <stdlib.h>

struct node {

int index;

struct node *parent;
struct node *children[2];
};

void* traverse(struct node *tree) {

if (!tree) return NULL;
traverse(tree->children[0]);
if (!tree->children[0] && !tree->children[1]) printf("%d\n", tree->index);
traverse(tree->children[1]);
}

struct node* buildTree(void) {

struct node *root = malloc(sizeof(*root));

for (int i=0; i<2; i++) {

root->children[i] = malloc(sizeof(*root));

for (int j=0; j<2; j++) {

root->children[i]->children[j] = malloc(sizeof(*root));

for (int k=0; k<2; k++) {

struct node *tmp = root->children[i]->children[j]->children[k];
tmp = malloc(sizeof(*root));

tmp->index = 4*i+2*j+k;

tmp->children[0] = NULL;
tmp->children[1] = NULL;

root->children[i]->children[j]->children[k] = tmp;
}
}
}

return root;
}

int main(void) {

struct node *root = buildTree();

return 0;
}```

2. I think I got it, it's the printing statement I have! Did you know it's impossible for two threads to print to the same line in the console at the same time? Omfg, it's like two people trying to write in the same line. So how do I fix this then, you guys?

Each threads runs to completion independently but I don't know how to set a wait on the print or store the values to be printed later.

3. Could you just store the values into two separate arrays and wait until both threads complete before printing anything?

Is there a mutex or semaphore synchronization method that you could use so that both threads could share print or share an array to store values into?

4. > pthread_create(&shears[0], NULL, traverse(root->children[0]), NULL);
Erm, you're supposed to pass a function pointer to pthread, not the result of calling the function.

Say
Code:
```   pthread_create(&shears[0], NULL, traverse, root->children[0]);
> void* traverse(struct node *tree)
This should be
Code:
```void* traverse(void *t) {
struct node *tree = t;
}```

5. Weird, then does my code work? If I comment out one of the threads and let it run though say the root as an index, it actually prints out the leaves.

6. It "works" because you actually do a traversal BEFORE trying to create a thread.

Code:
```void* traverse(struct node *tree) {
if (!tree) return NULL;
traverse(tree->children[0]);
if (!tree->children[0] && !tree->children[1]) printf("%d\n", tree->index);
traverse(tree->children[1]);
}```
You said it returned a void*, but in fact it will ultimately return garbage (although you seem to get lucky and the innermost return NULL is magically propagated all the way to the outermost 'falling off the end of the function'.

So this does a traversal, and then tries to create a thread with basically
Code:
` pthread_create(&shears[0], NULL, NULL, NULL);`

7. What is the essence of line 47 of your code?

8. Oh, line 47 was to avoid an obfuscation of the code. Well, I suppose it failed that purpose if you're asking. It's because writing that many pointers to pointers was annoying. Obviously, my allocation procedure is pretty inelegant but that wasn't the purpose of my code. What I wanted to do was, create a routine that would use multiple threads to traverse independent parts of a tree and identify leaves.

You see, I have this plan, right? It's pretty amazing. The general gist I get from the meshing community is that insertion > other methods. Construction has its own merits but the big thing seems to be insertion or at least, my hero Volker Springel was all like, "Oh yeah, I'ma stick it in... that tetrahedron!" with his code Arepo which is tetrahedral mesh + physics 'n' stuffs.

Idk how he does his stuff, but it basically works like this, for every insertion in a tetrahedron, 4 children are created. Granted, there are special cases of this but the number of children is capped off at 4. There is no such thing as going from 1 to 5, at least not directly.

So, what I want to do is, create a quadtree of pointers and traverse the poop out of it, searching for valid insertion spots and start counting. Doing this single-threaded is kind of slow so I want to quadruple thread it because all I have is 4 cores. But I'll write it to scale with more cores later. I'm hoping it'll be easy to do with pthreads but eh. Even though I graduated, my account on the school's super computer is still active and I think they've hyperthreaded it so I have 32 threads available. Or they gave me two CPUs, Idk.

And yes, I know about work-load balance and I'm trying to work on that but omg, it's a tough problem.

Either way, I took Salem's advice and he forgot to mention the most important detail, the use of pthread_join because for like 15 minutes, I was nerd raging beyond all belief.

So, I must say, thank you very much, Salem and now my code works perfectly all the time. Though I have encountered a race condition in that the second thread will sometimes finish before the first. I know there's a semaphore solution to this or is it a mutex? I think a mutex is like a lock on a structure to prevent two threads from acting on the same data but a semaphore makes sure a thread is finished before another one is, right? But when I compile with the world's longest command, "gcc -g -Wall -Wextra -lpthread -std=c11 -o tree tree.c", it works perfectly.

Code:
```#include <stdio.h>
#include <stdlib.h>

struct node {

int index;

struct node *parent;
struct node *children[2];
};

void traverse(struct node *tree) {

if (!tree) return;
traverse(tree->children[0]);
if (!tree->children[0] && !tree->children[1]) printf("%d\n", tree->index);
traverse(tree->children[1]);
}

void *shear(void *t) {

traverse(t);

return NULL;
}

struct node* buildTree(void) {

struct node *root = malloc(sizeof(*root));

for (int i=0; i<2; i++) {

root->children[i] = malloc(sizeof(*root));

for (int j=0; j<2; j++) {

root->children[i]->children[j] = malloc(sizeof(*root));

for (int k=0; k<2; k++) {

struct node *tmp = root->children[i]->children[j]->children[k];
tmp = malloc(sizeof(*root));

tmp->index = 4*i+2*j+k;

tmp->children[0] = NULL;
tmp->children[1] = NULL;

root->children[i]->children[j]->children[k] = tmp;
}
}
}

return root;
}

int main(void) {

struct node *root = buildTree();

return 0;
}```
Note that I call traverse from the side for easier control. Dealing with thread returns is annoying as balls and keeping traverse separate should allow for easier manipulations. Ty again for all the help

9. Okay, I have a more important question, how do I call a thread more than once? I have pthread_create and then pthread_join but I don't know how to call shears[0] again if I wanted to. I've tried something like :

Code:
```   pthread_join(shears[0], NULL);
But it doesn't do anything. How can I spam the threads I create? I checked and pthread_join doesn't kill the thread so it should still exist. How do I call it again? Do I have to recreate it?

Edit: Sorry, answered my own question. I can only re-run the thread if I recreate it and rejoin it. In my googling adventures I heard of something called a threadpool. Does anyone know anything about this or what it is? I found a threadpool.c but it's big and scary. From the way it's named, it sounds like it'd be a library routine to constantly be recreating threads which is what I'm doing anyway.

10. Thread pool pattern - Wikipedia, the free encyclopedia
Do you need lots of threads?
Is the work performed by a thread small in relation to the cost of creating a thread on demand?
Will each thread work on it's own data (your example fails this test)?

11. Alright, here's what I'm hoping to do and I'm not sure if it's the best way because I haven't seen anyone else do it before and I don't know if that's because it's a horribly bad idea or because it's clever.

The thing is, if you insert a new point inside a tetrahedron it becomes a vertex and if it's directly inside, it splits it into 4, i.e. the insertion point forms a tetrahedron with every face of the parent (4 faces = 4 children, 3 points from the face and 1 from the inserted point form the 4 points required to create a tetrahedron).

But if two tetrahedra share a face, they both get split into 3 for a 2-to-6 flip. If a point is inserted on the edge, a 1-to-2 flip is performed and because more than two tetrahedrons can share an edge, it's really an n-to-2n flip. So I'm doing multiple insertions and the way my code works, it has a fracturing function which works well, it returns an allocated set of children of n-size (basically, 2, 3 and 4) and as far as I can tell, my tetrahedrons haven't collided yet. It helps to draw this stuff. Keep in mind that we maintain the same principle as with a 1-to-4 flip, it's just that when a point's on the surface, it makes a zero-volume tetrahedon. It's easiest to draw the face which is a triangle and put a point in it. Then connect the dots appropriately and you should see.

So here's what needs to happen. I have a point that I want to insert into an already huge mesh. I need to find a place to put it. I need to find out if it's on the surface and if it is, which tetrahedron shares that surface? I need to find out if it's on an edge and if so, how many others and which ones share that same edge? Once the location is done, I need to split all the tetrahedrons found.

Even in my code now, if a point is inserted on the surface of a tetrahedron that has a shared face, I split the first one, then the second and link the faces properly once more. And if 5 tetrahedrons shared an edge, I would need to handle 5 insertions to yield the appropriate 10.

My proposed solution?

Simple. We know that in an ideal world, every node becomes a parent and has 4 kids who share a face with all their siblings and just one face with their aunts and uncles (ironically enough, the same face the parent shares with the aunt or uncle). If this sounds familiar, it's because it is! It's a quadtree, obviously!

So what I'm going to create is a quadtree of pointers to tetrahedrons, using any parent as a ghost cell and a navigation tool. The root is the initial tetrahedron that encompasses the whole point set to be inserted and the tree is built from there. The only valid tetrahedrons in the mesh are the leaves.

And when my tetrahedrons are created, they store all their plane data. Using this, it's incredibly quick to detect if a point is on the surface, inside or outside. A surface detection is the same as an edge detection. Though storing that much data is very costly. Especially if you use long doubles 'cause you think it makes you look kewl.

So what my idea was, use a quadtree to keep track of parents and their children and then use multiple threads to traverse it and find valid children. Should a direct insertion be found, we stop all other threads and finish the insertion. We then find out where the next particle goes. Should this one return more than one leaf as a valid one, we can easily check how many faces of a tetrahedron share this point which should be 2, i.e. an edge is the intersection of two finite faces at their boundary. This is only useful in the case of returning 2 leaves as we'd need to differentiate between a 2-to-4 flip (edge) and a 2-to-6 flip (surface). Should we return more than 2 leaves then we automatically know that either our code is broken (no more than one tetrahedron can a share face with this code) or we have one edge being shared by a lot of tetrahedrons.

The traversal will be optimized by following the echoes of the past, the ghost cells. We have the root and its set of children. From a parent, we send out threads equal to the number of children. If a thread finds a valid child, it continues its search along a specific branch (if we have children[0], [1], [2] and [3], we send the parent thread down children[0] and create three more for [1], [2] and [3]). The parent thread sends out additional threads equal to the number of valid children - 1. If a thread gets to a leaf and it still hasn't found anything, it terminates.

The idea is, if a point is inside a ghost parent then one of its children must contain it. But which one? Well, that's what the threads are for!

From there, we can do threaded allocations and whatnot. And then try and multithread face sharing. We use face-sharing for generating an additional mesh on top of the tetrahedral one. I think face-sharing navigation would fail the edge test a lot more or it'd be doing something similar to what I'm already doing.

Does any of this seem sensible? Or is going by faces the way to do it? Because I could probably just check for a 1-to-3 or 1-to-2 return and then start checking out neighbours from there. Oh, but that's harder to do because you can't point to every tetrahedron in your mesh at once and trying to find out where to put a random point might be hard.

Also, I cannot claim that I got here by myself. A poster here helped me through this quite a bit so I must give credit to him.

12. So what my idea was, use a quadtree to keep track of parents and their children and then use multiple threads to traverse it and find valid children. Should a direct insertion be found, we stop all other threads and finish the insertion.
You should be thinking of threads in terms of "goods train" rather than "motorcycle courier". You're not going to be able to efficiently start and stop them in nanoseconds after processing only a handful of instructions.

```struct node *root[MAX_THREADS];