This looks pretty trivial.
Code:
    #include <cstddef>
    #include <cmath>
    #include <algorithm>

    // xy is the interleaved {x0, y0, x1, y1, ...} array from your question.
    const std::size_t NUM_POINTS = (sizeof(xy) / sizeof(xy[0]) - 1) / 2;

    struct point_with_theta
    {
        double x, y, theta;

        void set(double ax, double ay) {
            x = ax;
            y = ay;
            // atan2(0, 0) isn't guaranteed to behave the same everywhere;
            // pin the origin to M_PI so the ordering is deterministic.
            theta = (x == 0 && y == 0) ? M_PI : std::atan2(y, x);
        }

        bool operator <(const point_with_theta &o) const {
            return theta < o.theta;
        }
    };

    point_with_theta points[NUM_POINTS];

    int main()
    {
        for(std::size_t i = 0; i < NUM_POINTS; ++i) {
            points[i].set(xy[i*2], xy[i*2+1]);
        }
        std::sort(points, points + NUM_POINTS);
        // Points are now in the correct order.
        // Feel free to split them into two arrays or plot them.
    }
You may have to define M_PI yourself - it's a POSIX extension, not standard C++, so some compilers (notably MSVC without `_USE_MATH_DEFINES`) don't provide it.
It may not be the fastest solution, but it's probably one of the shortest. Complexity is O(N log N). Cache locality during sorting depends on the std::sort implementation, but should generally be quite good, since quicksort-style algorithms are cache-friendly. Cache locality during the copying pass should be excellent, since it walks the input sequentially - unless atan2 needs all the cache for itself.
It may be possible to speed up the copying with some SIMD, but extracting the values from the interleaved format is tricky, so it may not be worth it. (Also, the 24-byte stride of the three doubles doesn't match SIMD alignment requirements.) In addition, the x87 FPU directly supports atan, while the SSE engine doesn't. An interesting variation would be to split the three components into separate arrays, but this requires you to write your own sort algorithm or a writable zip_iterator (Boost's zip_iterator isn't writable). If you have the parallel arrays, you can SIMD-walk over them to calculate theta, assuming you can write a proper atan2 with SIMD instructions. The necessary case distinction may make this difficult.
If you use the x87 FPATAN instruction, you can omit the zero check, as that operation's result is defined even for all-zero arguments.