So I figured out that
gcc -D_GNU_SOURCE -std=c99 -fopenmp -Wall *.c -lm
will compile everything.
Here are some things I suggest you look at.
1. Break that 350-line main() of yours into functions. In particular, move each section you want to run in parallel into its own function, so you know for sure the minimal set of local variables that need to be marked private in your #pragmas.
2. Before you even got to this point, you should have run the entire code through a memory checker such as valgrind to check for any out-of-bounds accesses on all your allocated memory.
3. There are over 20 omp pragmas in the code. This is too many for you to suddenly realise "it doesn't work". Start with one obvious case where the code can be made parallel and work from there.
4. Are you printing any results inside parallelised sections? stdio (or any FILE*-based I/O) may not be thread-safe, and even where the individual calls are, output from different threads can still interleave.
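To make points 1 and 4 concrete, here is a minimal sketch of what I mean. The names process_sample() and process_all() are made up for illustration and the arithmetic is a placeholder, not anything from your code. The idea is that the work done per iteration lives in its own function, so all of its variables are automatically private to the calling thread, and results go into an array that you print only after the parallel loop has finished.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-sample work, standing in for a block cut out of main().
   Everything declared in here is local, so each thread has its own copy
   and nothing needs to be listed in a private() clause. */
static double process_sample(double x)
{
    double scaled = 0.5 * x;          /* placeholder arithmetic */
    return scaled * scaled + 1.0;
}

/* Fill out[0..n-1] from in[0..n-1]. Each iteration writes only its own
   out[i], and no I/O happens inside the parallel loop. */
void process_all(const double *in, double *out, size_t n)
{
    assert(in != NULL && out != NULL);

    /* The loop variable of a parallel for is private automatically.
       Without -fopenmp the pragma is ignored and the loop runs serially. */
    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++) {
        out[i] = process_sample(in[i]);
    }
}
```

Once process_all() returns, a plain serial loop over out[] can do the printing, which also gives you the same output order on every run.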
Regarding global variables, these are your globals at the moment.
Code:
$ for i in *.o ; do nm -A $i | egrep -v ' [UTt] ' ; done
lmmin_int64.o:0000000000000000 R lm_control_double
lmmin_int64.o:0000000000000060 R lm_control_float
lmmin_int64.o:0000000000000000 D lm_infmsg
lmmin_int64.o:0000000000000080 D lm_shortmsg
lmmin_int64.o:00000000000000e8 d p1.4216
lmmin_int64.o:0000000000000750 r __PRETTY_FUNCTION__.4283
It would be a good idea to turn those D and d variables (initialised, writable data) into read-only data.
Code:
$ git diff
diff --git a/lmmin_int64.c b/lmmin_int64.c
index a78e580..485420e 100644
--- a/lmmin_int64.c
+++ b/lmmin_int64.c
@@ -113,7 +113,7 @@ const lm_control_struct lm_control_float = {
/* Message texts (indexed by status.info) */
/******************************************************************************/
-const char* lm_infmsg[] = {
+const char* const lm_infmsg[] = {
"found zero (sum of squares below underflow limit)",
"converged (the relative error in the sum of squares is at most tol)",
"converged (the relative error of the parameter vector is at most tol)",
@@ -128,7 +128,7 @@ const char* lm_infmsg[] = {
"stopped (break requested within function evaluation)",
"found nan (function value is not-a-number or infinite)"};
-const char* lm_shortmsg[] = {
+const char* const lm_shortmsg[] = {
"found zero",
"converged (f)",
"converged (p)",
@@ -652,7 +652,7 @@ void lm_lmpar(const int64_t n, double* r, const int64_t ldr, const int64_t* Pivo
int64_t i, iter, j, nsing;
double dxnorm, fp, fp_old, gnorm, parc, parl, paru;
double sum, temp;
- static double p1 = 0.1;
+ static const double p1 = 0.1;
/*** Compute and store in x the Gauss-Newton direction. If the Jacobian
is rank-deficient, obtain a least-squares solution. ***/
diff --git a/lmstruct_int64.h b/lmstruct_int64.h
index a98a3c0..0ae00b1 100644
--- a/lmstruct_int64.h
+++ b/lmstruct_int64.h
@@ -113,8 +113,8 @@ extern const lm_control_struct lm_control_float;
/* Preset message texts. */
-extern const char* lm_infmsg[];
-extern const char* lm_shortmsg[];
+extern const char* const lm_infmsg[];
+extern const char* const lm_shortmsg[];
__END_DECLS
#endif /* LMSTRUCT_H */
Which gives us
Code:
$ for i in *.o ; do nm -A $i | egrep -v ' [UTt] ' ; done
lmmin_int64.o:0000000000000000 R lm_control_double
lmmin_int64.o:0000000000000060 R lm_control_float
lmmin_int64.o:00000000000003e0 R lm_infmsg
lmmin_int64.o:0000000000000500 R lm_shortmsg
lmmin_int64.o:0000000000000848 r p1.4216
lmmin_int64.o:0000000000000850 r __PRETTY_FUNCTION__.4283
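This ensures that the global data shared between threads really is read-only, so none of it can be modified behind your back.
The __PRETTY_FUNCTION__ entry is caused by your use of assert, and it is read-only as well.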
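In case the reason for the second const in the message tables isn't obvious, here is a small sketch of the difference. The array and function names are invented for the example; only the two declaration styles matter.

```c
#include <assert.h>
#include <string.h>

/* Pointers to const chars, but the pointers themselves are writable.
   An array like this typically shows up as 'D' in nm, as lm_infmsg did. */
const char *writable_msgs[] = { "found zero", "converged (f)" };

/* const at both levels: neither the characters nor the pointers may be
   changed, so the compiler is free to place the array in read-only data. */
const char *const readonly_msgs[] = { "found zero", "converged (f)" };

/* Hypothetical helper showing what the first declaration permits. */
void clobber_first_message(void)
{
    writable_msgs[0] = "oops";      /* compiles without complaint */
    /* readonly_msgs[0] = "oops";      rejected: read-only location */
}
```

With the first form, any thread could repoint an entry while another is reading it. With the second form the compiler rejects the assignment outright.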
What about this?
Code:
void free_filter(bessel *lpfilter)
{
free(lpfilter->dcof);
free(lpfilter->ccof);
#pragma omp parallel
{
free(lpfilter->temp[omp_get_thread_num()]);
}
free(lpfilter->temp);
free(lpfilter);
}
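Is a barrier implied before you get to free(lpfilter->temp) or not?
If there isn't a barrier, then the code is broken. As far as I can tell from the OpenMP spec, a parallel region does end with an implicit barrier, but this still only works if the region runs with exactly as many threads as temp[] was allocated for.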
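One way to sidestep the question is to not use a parallel region for cleanup at all. The sketch below is an assumption-heavy illustration, since I don't know the real layout of your bessel struct: bessel_sketch, its nthreads member, and both function names are invented here. The only point is that the number of buffers is recorded at allocation time and then freed in an ordinary serial loop.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the real bessel struct. Only the fields that
   free_filter() touches are shown; nthreads is an assumed extra member. */
typedef struct {
    double *dcof;
    double *ccof;
    double **temp;    /* one scratch buffer per thread */
    int nthreads;     /* assumption: stored when temp[] is allocated */
} bessel_sketch;

/* Hypothetical allocator, included only so the sketch is self-contained. */
bessel_sketch *alloc_filter_sketch(int nthreads, size_t order)
{
    assert(nthreads > 0 && order > 0);
    bessel_sketch *f = calloc(1, sizeof *f);
    if (f == NULL)
        return NULL;
    f->dcof = calloc(order, sizeof *f->dcof);
    f->ccof = calloc(order, sizeof *f->ccof);
    f->temp = calloc((size_t)nthreads, sizeof *f->temp);
    f->nthreads = nthreads;
    if (f->temp != NULL) {
        for (int i = 0; i < nthreads; i++)
            f->temp[i] = calloc(order, sizeof **f->temp);
    }
    return f;
}

/* Serial cleanup: frees exactly the buffers that were allocated, no matter
   how many threads OpenMP would hand out at this point in the program. */
void free_filter_sketch(bessel_sketch *f)
{
    if (f == NULL)
        return;
    free(f->dcof);
    free(f->ccof);
    if (f->temp != NULL) {
        for (int i = 0; i < f->nthreads; i++)
            free(f->temp[i]);       /* free(NULL) is a no-op */
    }
    free(f->temp);
    free(f);
}
```

Freeing a handful of pointers gains nothing from running in parallel anyway, so the serial loop costs you nothing and removes one more pragma to worry about.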
> I don't have anything resembling a MWE, but without even knowing where to start looking for the bug I wouldn't know how to put one together.
But you do have some test data files and config.txt (or whatever else is necessary to make a test run).
Anything is better than nothing.