Nah, I'll just post it here because I think it makes the most sense to do so. I'll preface this by saying that my abstractions were largely created for my own specific needs. I call it a "library" because it's a set of reusable components, but I haven't felt a strong obligation to abstract it away from the main project.
I make meshes. So I need to represent tetrahedra, and to do that, I need vertices and functions that operate on sets of vertices. One of the cooler things I think I did with C++ was how I templated the point structure. I need a convenient way of representing a Cartesian triple, and I also want to avoid global typedefs so I can switch between float, double and maybe someday long double.
I wound up with this:
Code:
template <
    typename T,
    typename = std::enable_if_t<std::is_floating_point<T>::value>>
struct point_type
{
    using type = T;
};

template <>
struct point_type<float>
{
    using type = float3;
    static constexpr float max_coord_value = FLT_MAX;
};

template <>
struct point_type<double>
{
    using type = double3;
    static constexpr double max_coord_value = DBL_MAX;
};

template <typename T>
using point_t = typename point_type<T>::type;
I had forgotten that I included the max coordinate values. CUDA has some pretty nifty aligned POD types: float3 and double3 both have members x, y and z. This allows me to easily mix point types in a somewhat readable way, and I don't lose any performance for the abstraction, which is good.
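As a host-compilable sketch of how the mapping gets used (the float3/double3 structs below are just stand-ins mimicking CUDA's <vector_types.h>, and midpoint is a made-up helper, not part of the actual library):

Code:
```cpp
#include <cassert>
#include <cfloat>
#include <type_traits>

// Stand-ins mimicking CUDA's built-in aligned types from <vector_types.h>.
struct float3  { float x, y, z; };
struct double3 { double x, y, z; };

template <
    typename T,
    typename = std::enable_if_t<std::is_floating_point<T>::value>>
struct point_type
{
    using type = T;
};

template <>
struct point_type<float>
{
    using type = float3;
    static constexpr float max_coord_value = FLT_MAX;
};

template <>
struct point_type<double>
{
    using type = double3;
    static constexpr double max_coord_value = DBL_MAX;
};

template <typename T>
using point_t = typename point_type<T>::type;

// Generic code never has to name float3 or double3 directly;
// midpoint is a hypothetical helper for illustration only.
template <typename T>
point_t<T> midpoint(point_t<T> const& p, point_t<T> const& q)
{
    return { (p.x + q.x) / 2, (p.y + q.y) / 2, (p.z + q.z) / 2 };
}
```

Switching the whole mesh between single and double precision is then just a matter of changing the one template argument at the call sites.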
I've also implemented a light matrix class. This is one part where I definitely know my abstraction is lacking. You declare the matrix as `matrix<float, row_value, col_value>`, but you have to index it manually using 2d-as-1d array conventions, if that's the right way of saying it. Basically, you need to do `m[row_idx * num_cols + col_idx]` yourself.
This is for a couple of reasons. I'm internally using a 1d array for storage anyway, because it's a nice contiguous chunk that I can access rapidly. I was nervous that an intermediate object would incur some overhead to create the 2d access pattern. And for some reason, a method didn't really seem attractive or natural. I have a horribly outdated test file that kind of shows how the matrix class was meant to work. There's also some old pivoting stuff in there that I need to take out.
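For what it's worth, a trivial inline accessor tends to optimize down to exactly the same address arithmetic as the manual indexing, so the overhead worry is probably unfounded. A minimal sketch (this matrix is a stand-in, not the actual class):

Code:
```cpp
#include <cassert>

// Minimal stand-in for the flat-storage matrix described above.
template <typename T, int R, int C>
struct matrix
{
    T data[R * C];

    T&       operator[](int i)       { return data[i]; }
    T const& operator[](int i) const { return data[i]; }

    // An inline accessor like this compiles down to the same row-major
    // address arithmetic as writing m[r * C + c] by hand; no
    // intermediate row object is ever created.
    T&       at(int r, int c)       { return data[r * C + c]; }
    T const& at(int r, int c) const { return data[r * C + c]; }
};
```

The accessor also keeps `num_cols` out of every call site, since the class already knows C at compile time.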
My code only needs a 5x5 matrix at the most, so when it came time to implement the determinant, I just brute-force implemented it using some template specializations:
Code:
template <typename T>
__host__ __device__
auto det(matrix<T, 2, 2> const& m) -> T
{
    return m.data[0] * m.data[3] - m.data[1] * m.data[2];
}

template <typename T>
__host__ __device__
auto det(matrix<T, 3, 3> const& m) -> T
{
    array<T, 9> const& d = m.data;
    return (
        d[0] * d[4] * d[8] +
        d[1] * d[5] * d[6] +
        d[2] * d[3] * d[7] -
        d[2] * d[4] * d[6] -
        d[1] * d[3] * d[8] -
        d[0] * d[5] * d[7]);
}
// Atrocious performance: cofactor expansion is
// literally O(n!) where n is the matrix rank.
template <typename T, int N>
__host__ __device__
auto det(matrix<T, N, N> const& m) -> T
{
    array<T, N * N> const& d = m.data;
    matrix<T, N - 1, N - 1> buff{ 0 };
    T det_value{ 0 };
    for (int col = 0; col < N; ++col) {
        // Build the minor that excludes row 0 and column col.
        int buff_size = 0;
        for (int i = 1; i < N; ++i) {
            for (int j = 0; j < N; ++j) {
                if (j == col)
                    continue;
                buff[buff_size] = d[i * N + j];
                ++buff_size;
            }
        }
        // Expansion along the first row: d[col] is element (0, col).
        T const det_term = d[col] * det(buff);
        det_value += (col % 2 == 0 ? det_term : -det_term);
    }
    return det_value;
}
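Dropping the CUDA qualifiers, the recursion is easy to sanity-check on the host. This standalone version (with a minimal stand-in matrix) terminates at the 2x2 overload because partial ordering prefers it over the generic template:

Code:
```cpp
#include <cassert>

// Minimal stand-in matrix; flat row-major storage.
template <typename T, int R, int C>
struct matrix
{
    T data[R * C];
    T& operator[](int i) { return data[i]; }
};

// Base case: 2x2 determinant.
template <typename T>
auto det(matrix<T, 2, 2> const& m) -> T
{
    return m.data[0] * m.data[3] - m.data[1] * m.data[2];
}

// Generic case: cofactor expansion along the first row, O(n!).
template <typename T, int N>
auto det(matrix<T, N, N> const& m) -> T
{
    matrix<T, N - 1, N - 1> buff{};
    T det_value{ 0 };
    for (int col = 0; col < N; ++col) {
        int buff_size = 0;
        for (int i = 1; i < N; ++i) {
            for (int j = 0; j < N; ++j) {
                if (j == col)
                    continue;
                buff[buff_size++] = m.data[i * N + j];
            }
        }
        T const det_term = m.data[col] * det(buff);
        det_value += (col % 2 == 0 ? det_term : -det_term);
    }
    return det_value;
}
```

A 3x3 call instantiates the generic template once and then hits the 2x2 overload for its minors; a 5x5 call tops out at three levels of recursion, which is why the O(n!) cost is tolerable here.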
So, I'm skipping a lot of details and ideas about the core algorithm because I don't want to bog this post down with all that. The point is, C++ makes it easy to write constrained yet generic code, and operator and function overloading let you build abstractions that map to what you'd actually write on paper. It makes sense to multiply matrices A and B with A * B. Same for vectors. It makes sense to add two points p and q with p + q.
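A sketch of what those overloads look like; everything here (vec3, the stand-in matrix) is illustrative, not the library's actual code:

Code:
```cpp
#include <cassert>

// Stand-in flat-storage matrix, row-major.
template <typename T, int R, int C>
struct matrix
{
    T data[R * C];
};

// A * B reads like the math: (R x K) times (K x C) gives (R x C),
// and mismatched dimensions simply fail to compile.
template <typename T, int R, int K, int C>
matrix<T, R, C> operator*(matrix<T, R, K> const& a, matrix<T, K, C> const& b)
{
    matrix<T, R, C> out{};
    for (int i = 0; i < R; ++i)
        for (int j = 0; j < C; ++j)
            for (int k = 0; k < K; ++k)
                out.data[i * C + j] += a.data[i * K + k] * b.data[k * C + j];
    return out;
}

// Same idea for points: p + q instead of a component-wise ritual.
struct vec3 { double x, y, z; };

vec3 operator+(vec3 const& p, vec3 const& q)
{
    return { p.x + q.x, p.y + q.y, p.z + q.z };
}
```

The dimension checking is the nice part: the template parameters make A * B a compile error unless the inner dimensions agree.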
I can also write routines that take the orientation of a tetrahedron (basically: is d above or below the plane abc spanned by tetrahedron abcd?) and that calculate whether a point p is inside the circumsphere drawn by tetrahedron abcd, with something like:
Code:
// Another built-in CUDA type: exactly 4 ints, with members x, y, z and w.
using tetra = int4;

enum class orientation { positive = 1, zero = 0, negative = 2 };

// Routine that calculates whether the point d
// is above the triangle spanned by abc.
template <typename T>
__host__ __device__
auto orient(
    point_t<T> const& a,
    point_t<T> const& b,
    point_t<T> const& c,
    point_t<T> const& d)
    -> orientation
{
    matrix<T, 4, 4> const m{ 1, a.x, a.y, a.z,
                             1, b.x, b.y, b.z,
                             1, c.x, c.y, c.z,
                             1, d.x, d.y, d.z };
    T const det_value = det<T, 4>(m);
    if (eq<T>(det_value, 0.0)) {
        return orientation::zero;
    }
    return det_value > 0.0 ? orientation::positive
                           : orientation::negative;
}
// Calculate the squared magnitude of a point (i.e. vector);
// the insphere determinant wants |v|^2, so no sqrt here.
template <typename T>
__host__ __device__
auto mag(point_t<T> const& p) -> T
{
    return (
        p.x * p.x +
        p.y * p.y +
        p.z * p.z);
}
// Function that calculates whether or not p is contained
// in the sphere circumscribed by the tetrahedron abcd.
template <typename T>
__host__ __device__
auto insphere(
    point_t<T> const& a,
    point_t<T> const& b,
    point_t<T> const& c,
    point_t<T> const& d,
    point_t<T> const& p)
    -> orientation
{
    matrix<T, 5, 5> const m{
        1.0, a.x, a.y, a.z, mag<T>(a),
        1.0, b.x, b.y, b.z, mag<T>(b),
        1.0, c.x, c.y, c.z, mag<T>(c),
        1.0, d.x, d.y, d.z, mag<T>(d),
        1.0, p.x, p.y, p.z, mag<T>(p) };
    auto const det_value = det<T, 5>(m);
    if (eq<T>(det_value, 0.0)) {
        return orientation::zero;
    }
    return det_value > 0.0 ? orientation::positive
                           : orientation::negative;
}
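Dropping the CUDA qualifiers and fixing T = double, the orient predicate is easy to sanity-check by hand-expanding the 4x4 determinant: subtracting the first row from the others reduces it to the triple product (b-a) . ((c-a) x (d-a)), which has the same sign. A standalone sketch (the eps here is an arbitrary tolerance for the sketch, not the library's eq):

Code:
```cpp
#include <cassert>
#include <cmath>

struct double3 { double x, y, z; };  // stand-in for CUDA's double3

enum class orientation { positive = 1, zero = 0, negative = 2 };

// Same sign as the 4x4 determinant in orient(): eliminating the first
// column of ones leaves det[b-a; c-a; d-a], i.e. the scalar triple
// product u . (v x w).
orientation orient(double3 const& a, double3 const& b,
                   double3 const& c, double3 const& d)
{
    double3 const u{ b.x - a.x, b.y - a.y, b.z - a.z };
    double3 const v{ c.x - a.x, c.y - a.y, c.z - a.z };
    double3 const w{ d.x - a.x, d.y - a.y, d.z - a.z };
    double const det = u.x * (v.y * w.z - v.z * w.y)
                     - u.y * (v.x * w.z - v.z * w.x)
                     + u.z * (v.x * w.y - v.y * w.x);
    double const eps = 1e-12;  // arbitrary tolerance for this sketch
    if (std::fabs(det) <= eps)
        return orientation::zero;
    return det > 0.0 ? orientation::positive : orientation::negative;
}
```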
Not that I claim this is "good" code by any means, but it does the job for now. I also don't really know yet how to handle floating-point robustness. I've been very fortunate so far: all I've tested are nice integral Cartesian grids for triangulation, and since my code doesn't contain any divisions, the arithmetic is exact because the tested domain sizes are smaller than what most GPUs will allow anyway. There are 8 GB GPUs out now, so I'm actually really happy templates make it easy to switch between smaller and larger datatypes.
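For what it's worth, the usual starting point for the eq<T> comparison used in the predicates above (the post doesn't show its implementation, so this is an assumption) is a mixed absolute/relative epsilon test:

Code:
```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

// One plausible implementation of eq<T> -- an assumption, not the
// library's actual code: absolute comparison near zero, relative
// comparison elsewhere.
template <typename T>
bool eq(T a, T b)
{
    T const eps  = std::numeric_limits<T>::epsilon();
    T const diff = std::fabs(a - b);
    return diff <= eps
        || diff <= eps * std::max(std::fabs(a), std::fabs(b));
}
```

Robust geometric predicates are a much deeper topic than an epsilon test (Shewchuk's adaptive-precision predicates are the usual reference), but something like this is a reasonable stopgap for well-scaled inputs.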
CUDA also has its own version of the STL, since it requires one, unfortunately. It's worth it, though, because Thrust is basically Boost, but probably smaller. This entire directory is basically just a bunch of Thrust calls. But Thrust lets me operate on zip_iterators, and its higher-order functions allow me to write parallel-friendly code.
This one was actually created with help from the Nvidia forum. I represent point and tetrahedron associations as a series of aligned arrays: pa and ta stand for point association and tetrahedron association, which translates to "point pa[idx] is inside or on tetrahedron ta[idx]". The locational relationship is a bit-encoded integer stored in the array la (location association). I decided that when updating associations, it'd be easiest to do stupid offset calculations where dead indices get the value -1. I can then partition all 3 arrays at once and get the new size from a single Thrust call. Doing this in pure CUDA C would've basically been a nightmare.
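For illustration, here's a host-side STL analogue of that compaction (the real version uses a thrust::zip_iterator and a single partition call on the device; the function name and exact shape here are made up). Dead entries are marked with ta == -1:

Code:
```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Compact three aligned association arrays in lockstep, keeping only
// entries whose tetrahedron id isn't the dead marker -1. On the GPU the
// same thing is one partition over a zip_iterator; here we permute an
// index array instead so pa/ta/la stay aligned with each other.
int compact(std::vector<int>& pa, std::vector<int>& ta, std::vector<int>& la)
{
    std::vector<int> idx(ta.size());
    std::iota(idx.begin(), idx.end(), 0);
    auto const mid = std::stable_partition(
        idx.begin(), idx.end(), [&](int i) { return ta[i] != -1; });
    int const new_size = static_cast<int>(mid - idx.begin());

    std::vector<int> pa2, ta2, la2;
    for (int k = 0; k < new_size; ++k) {
        pa2.push_back(pa[idx[k]]);
        ta2.push_back(ta[idx[k]]);
        la2.push_back(la[idx[k]]);
    }
    pa.swap(pa2);
    ta.swap(ta2);
    la.swap(la2);
    return new_size;  // the live count, like the value the Thrust call yields
}
```

The zip trick is exactly why this is pleasant in Thrust: the three arrays never have to be interleaved into a struct-of-arrays-of-structs mess just to be partitioned together.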