Can I get some feedback on this paper I wrote?

**MutantJohn** · 06-13-2015

I think I'm one of the first to write this type of meshing code on the GPU. I wrote a quick and dirty paper on it. I wanted it to be accessible to people who know C/C++ and understand the basics of threads and parallelism.

Can you guys give me some feedback on it? Even if you think it sucks, please tell me why.

https://github.com/LeonineKing1199/R...ulus-paper.pdf

Edit : For example, if something doesn't make sense or needs clarification, please let me know! This stuff can go on and on and on so I tried to keep it light. Let me know if it's too light.

**Epy** · 06-13-2015

Is this supposed to be a real research paper, or documentation? The language is very informal and not very "technical" in nature, I took it as informal documentation for your project.

**Mario F.** · 06-13-2015

Hmm... I think your process falls apart on what you call the Point Redistribution phase. It seems way too expensive for any serious modeling. But I need more details:

Must you use bucket hashing for nominating points? This technique doesn't seem to fit well with your need to later test for fractured faces. Ideally you would want something that could be all stored inside the same data structure as new points are being nominated. Your tetra tuple structure should alone contain all the necessary information to locate fractured faces. This would save you one kernel thread. No need to load any tetrahedra.

Explain to me the orientation tests. It's been my experience over the years that you can generally greatly reduce the complexity of spacial testing, by implementing smarter algorithms that apply such techniques as heuristics and early exits.

...

Epy is right. This is a technical paper. You need to just and only present the Problem, the Solution and the Method. Drop anything that isn't directly related to these three elements. We aren't used to read papers with personal considerations. Leave that stuff only for presentations.

EDIT: Also, nothing of what you present is directly related to CUDA. It can be implemented on any GPU, and even a CPU. You should drop, for the benefit of the method, the implied architectural limitation on the title. Someone not using CUDA will read your paper title and will skip it entirely.

**MutantJohn** · 06-14-2015

Hmm... I didn't actually think an informal personal touch would be so out of place lol XD

When I was writing it, I was like, "Won't people be bored if I start talking about simplicial complexes and half-spaces and all that?"

I guess the answer is no O_o

Must you use bucket hashing for nominating points? This technique doesn't seem to fit well with your need to later test for fractured faces.

Yes. I think. I'm fairly confident. The problem with degeneracies is that it allows one point to be on multiple tetrahedra. This makes choosing which point to insert a freaking nightmare if you're trying to do it all in a massively parallel way.

By using bucket hashing, I'm able to detect potential double fractures in a simple way. The concept is, one thread per bucket, iterate it and use atomicCAS() to check for overlapping bucket contents. Once all the conflicting points are resolved, we can re-iterate the bucket and actually set the proper values of nm.

Explain to me the orientation tests. It's been my experience over the years that you can generally greatly reduce the complexity of spacial testing, by implementing smarter algorithms that apply such techniques as heuristics and early exits.

The orientation tests are an implementation of the original matrix determinant presented in the paper. I think the only time I ever call them in the source code is at the point redistribution phase.

This is necessary though.

When a tetrahedron is fractured, the original volume is split into a small set of subdomains that span the original. The original spatial domain encompasses a set of points. These need to be redistributed among the new subdomains. I think what I have now is the best I could think of at the time. Get the number of potential writes, prefix sum to get offsets and then do the actual testing and writing across 3 separate GPU kernel calls.

I got the gist from the Arepo paper but this is where they really go into it : http://matthias-mueller-fischer.ch/p...rCollision.pdf

The algorithm is here in section 4.4 :

Barycentric coordinates with respect to x0: Weexpress p in new coordinates β = (β1, β2, β3)Twith respect to a coordinate frame, whose origin coincideswith x0 and whose axis coincide with theedges of t adjacent to x0: p = x0 + Aβ,, whereA = [x1 − x0, x2 − x0, x3 − x0] is a 3 by 3 dimensionalmatrix. The coordinates β of p in thisnew coordinate frame are: β = A−1(p − x0).Now the point p lies inside tetrahedron t, ifβ1 ≥ 0, β2 ≥ 0, β3 ≥ 0 and β1 + β2 + β3 ≤ 1.

I think I'm going to use this but I'm not sure how to detect when to switch to the exact/arbitrary arithmetic mode which is what the Shewchuk predicates are. Or rather, they exist in the Shewchuk code from the my github.

Btw, thanks for actually reading the paper and giving me some criticism!

Seriously, I really do want help with the point redistribution scheme.

**MutantJohn** · 06-14-2015

Btw, do you guys wanna guess what the Nvidia Visual Profiler tells me the biggest bottleneck is?

*drumroll*

It's this part!!!!

Code:

template<typename T>
struct tuple_comp
{
    __host__ __device__
    bool operator()(const thrust::tuple<T, T, T, T, T> t,
                    const thrust::tuple<T, T, T, T, T> v)
    {
        return ((unsigned& ) thrust::get<0>(t)) < ((unsigned& ) thrust::get<0>(v));
    }
};

	    // Sort everything so that positives are up front
	    thrust::sort(thrust::make_zip_iterator(
	                    thrust::make_tuple(thrust::device_ptr<int>(pa),
	                                       thrust::device_ptr<int>(ta),
	                                       thrust::device_ptr<int>(la),
	                                       thrust::device_ptr<int>(fs),
	                                       thrust::device_ptr<int>(nm))),
	                 thrust::make_zip_iterator(
	                    thrust::make_tuple(thrust::device_ptr<int>(pa + array_capacity),
	                    				   thrust::device_ptr<int>(ta + array_capacity),
	                    				   thrust::device_ptr<int>(la + array_capacity),
	                    				   thrust::device_ptr<int>(fs + array_capacity),
	                    				   thrust::device_ptr<int>(nm + array_capacity))),
	                 tuple_comp<int>());

**Mario F.** · 06-14-2015

I've read the paper and they themselves admit the performance limitations of the current algorithm. All in all, seems like an interesting option under controlled scenarios in which you wish to implement collision-detection for volumetric projections, like galaxy formation and collisions in Arepo, or making a better gel in Portal 2.

But you seem more interested in rigid objects, is that right? Tetrahedra meshes are an option. But if what you are working on is exploring ways of rendering rigid objects from volumetric data, know that voxels are already taking over in this area. Much of the professional CAD industry, for instance, already implements volume rendering through voxels. The slowpoke is the 3D printing industry that is taking forever to ship printers that can take voxel data as input, so the fional render needs to be converted to STL.

**MutantJohn** · 06-15-2015

I think you made a good point, Mario. My redistribution scheme is a solid 3 GPU kernels. I can condense it down to just one if I knew how to write safely. The first 2 kernels are literally just there to calculate the write indices...

Hmm...

Okay, I know there's a range of available indices that I can write to. What if I used a random number generator and just prayed that there were no collisions?

**Mario F.** · 06-15-2015

Originally Posted by MutantJohn

Okay, I know there's a range of available indices that I can write to. What if I used a random number generator and just prayed that there were no collisions?

Pray Programming is a possibility. But you'll have to work out the details of God's protocols.

The problem you have here is that you are running 3 threads per fractured tetra. This cannot be. As I said earlier, the problem exists because you aren't storing the whole data in the tetra struct. You need to reconsider your data structures in order to avoid the overhead processing you are experiencing now. Go back to how you are storing your tetras.

**MutantJohn** · 06-15-2015

Hmm... Do you mean like storing an associated point set per tetrahedron?

Something like:

Code:

struct tetra
{
    int v[4]; // vertices
    int n[4]; // neighbors

    point *enclosed_pts;
    int num_enclosed_pts;
};

?

Thread: Can I get some feedback on this paper I wrote?

Thread Tools

Search Thread

Display

Can I get some feedback on this paper I wrote?

Similar Threads

Could someone help me with this for loop I wrote?

A little program I wrote for fun !

So I wrote a class.

Can someone look over the program I wrote?

How to prove you wrote it