Need help to convert an one dimensional array into a two dimensional array and print like a matrix.
input: 34 50 2 4 90 33 7 80 9
output: A is a 3x3 matrix
34 50 2
4 90 33
7 80 9
please help :)
Printable View
Need help to convert an one dimensional array into a two dimensional array and print like a matrix.
input: 34 50 2 4 90 33 7 80 9
output: A is a 3x3 matrix
34 50 2
4 90 33
7 80 9
please help :)
What is your idea and what have you tried?
I am pretty new with C programming so not much. I could take an array input as string and convert it into an integer array. but not sure how to make it two dimensional array. Is it possible to input an matrix like this and print the output like following??
Enter matrix A: [1 2 3 ;4 5 6 ;7 8 9 ]
A is a 3*3 matrix:
1 2 3
4 5 6
7 8 9
??
It is generally frowned upon to post a complete solution like this. Since OP is likely dealing with homework of some sort, giving him a solution does not help him learn in the slightest. He will be back next week with his next assignment.
Also, from what i can gather, the solution needs to work with input of any size, not just a 3x3 matrix.
OP:
You need to figure out how large the resulting matrix/2D array is going to be. If a user inputs 9 numbers, it will be 3x3, if a user inputs 16 numbers it will be 4x4. So given n numbers, you need to find a number x such that x * x = n. How would you solve this on paper?
The next step is allocating an appropriate 2D array to store this matrix, for this you will need the functions malloc() and free(). Give it a try and come back if you need help with something specific.
thanks a lot for your help!!!!
C uses row-major order, which means that arrays
have exactly the same storage representation. They have the exact same size in bytes,Code:int a[3][3] = { { 1, 2, 3 }, { 4, 5, 6 }, { 7, 8, 9 }};
int b[9] = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
sizeof a == sizeof b == 9 * sizeof (int)and their memory representation is exactly the same,memcmp(a, b, sizeof a) == 0
The way C compilers internally implement two-dimensional arrays, is very simple:
This extends to higher number of dimensions, too. For example,Code:a[row][column] = b[row * columns + column]
Code:#define SIZE1 2
#define SIZE2 3
#define SIZE3 5
#define SIZE4 2
int a[SIZE4][SIZE3][SIZE2][SIZE1];
int *const b = (int *)a;
a[i4][i3][i2][i1] == b[(((i4) * SIZE3 + i3)*SIZE2 + i2)*SIZE1 + i1];
If you intend to experiment with matrices and linear algebra, I recommend the following structure:
which allows you to transpose a matrix with just swapping rows and cols, and rowstep and colstep. Initially, use colstep=1 and rowstep=cols, to have data match the native C array representation.Code:typedef struct {
long rows;
long cols;
long rowstep;
long colstep;
double *data; /* data[rows*rowstep + col*colstep] */
} matrix_t;
If you are familiar with C dynamic memory management, then
allows you to define matrices and vectors that dynamically refer to each others' content. For example, one matrix can be a submatrix of another; or a one-row or one-column matrix (a vector) can be the diagonal elements of a square matrix.Code:typedef struct {
long refcount;
size_t size;
double data[];
} owner_t;
typedef struct {
long rows;
long cols;
long rowstep;
long colstep;
double *data; /* data[rows*rowstep + col*colstep] */
owner_t *owner;
} matrix_t;
For multithreaded programs, the owner_t needs special handling, but can be made quite thread-safe. (In practice, the easiest way to do it is to define helper functions that you must use to (temporarily) copy a matrix. It's all very clean, and very interesting, but only if you are interested in multithreaded programming. Surprisingly, none of the libraries I've yet seen do this, though!)
Although libraries like GNU Scientific Library make linear algebra and matrix and vector manipulation much easier, I believe that writing at least some of the functions and structures needed when you start, helps you understand the limitations and benefits of the libraries later on.
Wait, there's a GNU Scientific Library and you knew about this Nominal and you didn't tell me? So that one topic I had about taking the inverse of a matrix, there's already a routine written by the GNU people and you let me code that anyway? Omg, I was nerd raging so hard about having to figure out how take a stupid inverse. Bah. Oh well. Live and learn, I guess. Also, I am happy I did learn how to do it all by myself but yay, I never ever ever have to do it again!
<3 u Nominal.
Can you give an example of using the owner field?
Also, you supposedly shouldn't end your own type names with _t.
Reference: Reserved Names - The GNU C Library
"Names that end with ‘_t’ are reserved for additional type names."
Then again, they also say:
"Names beginning with a capital ‘E’ followed a digit or uppercase letter may be used for additional error code names."
That's obviously too broad as it disallows, e.g., EAST, which will surely never become an error code name.
BTW, your comment on double *data says rows where it should say row (I fixed it in the above quote).
Code:#include <stdio.h>
#include <stdlib.h>
#define MATRIX(m,r,c) ((m)->data[(r)*(m)->rowstep + (c)*(m)->colstep])
typedef struct {
long refcount;
size_t size;
double data[];
} owner_t;
typedef struct {
long rows, cols;
long rowstep, colstep;
double *data;
owner_t *owner;
} matrix_t;
matrix_t *MatrixNew(long rows, long cols) {
int i;
matrix_t *m = malloc(sizeof *m);
if (!m) return NULL;
m->rows = rows;
m->cols = cols;
m->rowstep = cols;
m->colstep = 1;
m->data = malloc(rows * cols * sizeof *m->data);
if (!m->data) {
free(m);
return NULL;
}
for (i = 0; i < rows * cols; i++)
m->data[i] = 0;
m->owner = NULL;
return m;
}
void MatrixInvert(matrix_t *m) {
long t = m->rows; m->rows = m->cols; m->cols = t;
t = m->rowstep; m->rowstep = m->colstep; m->colstep = t;
}
void MatrixDelete(matrix_t *m) {
free(m->data);
free(m);
}
int main(void) {
matrix_t *m = MatrixNew(3, 4);
if (!m) {fprintf(stderr, "MatrixNew failed\n"); exit(1);}
MATRIX(m, 1, 2) = 42;
printf("%.2f\n", MATRIX(m, 1, 2));
MatrixInvert(m);
printf("%.2f\n", MATRIX(m, 2, 1));
MatrixDelete(m);
return 0;
}
Good catch! Absolutely right. As you had fixed in the quote, the comment should be /* data[row*rowstep + col*colstep] */.
It is true that the standard C library developers recommend that, but it's a "soft" recommendation, a warning of what kind of names they might need to reserve in the future. In other words, I read their recommendation as "if we define a new type, it will be named with a _t suffix; if we define a new error code, it will be defined with an E prefix", and so on.
For libraries, say for example "Nominal Animal's Library", or nal, the recommended type names are nal_matrix and nal_owner. The library abbreviation prefix helps avoid namespace pollution, although I have a nagging feeling nal_ is already used .. :)
For example code, I often use _t suffix for types to make the code easier to read.
If you have a better type name prefix or suffix I should use instead -- as some kind of suffix or prefix is needed to distinguish type names from variable names, making example code easier to read --, I'd be more than happy to hear suggestions! ;)
Sure. For simplicity, let's keep this to single-threaded code.
That is, you can use these safely from multiple threads, as long as you manipulate a matrix (or shared matrices; that is, owners) from a single thread at a time.
First, we need the types. I'll also define safe accessors, in cases where you might have out-of-bounds column or row numbers:
The getter and setter are more for documentation, rather than actual use.Code:#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>
typedef struct {
long refcount;
size_t size;
double data[];
} nal_owner;
typedef struct {
long rows;
long cols;
long rowstep;
long colstep;
double *data;
nal_owner *owner;
} nal_matrix;
/* Safe matrix accessor (getter).
* Returns 0.0 if row,col is outside the matrix m, otherwise the matrix element.
*/
double nal_matrix_get(const nal_matrix *const m, const long row, const long col)
{
if (row >= 0L && col >= 0L && m && row < m->rows && col < m->cols && m->data)
return m->data[row * m->rowstep + col * m->colstep];
else
return 0.0;
}
/* Safe matrix accessor (setter).
* Returns 0.0 if row,col is outside the matrix m, otherwise the value set.
*/
double nal_matrix_set(nal_matrix *const m, const long row, const long col, const double value)
{
if (row >= 0L && col >= 0L && m && row < m->rows && col < m->cols && m->data)
return m->data[row * m->rowstep + col * m->colstep] = value;
else
return 0.0;
}
Next, we need a function that creates a data owner of requested size. This one tries to verify that the size does not overflow, but as I rewrote this one from scratch, it might be buggy.
Every time a matrix is no longer needed, it must be destroyed. Here is the destructor. Note that the owner is only destroyed when the refcount drops to zero.Code:/* Create a new owner structure, large enough to hold a matrix of the specified size.
*/
static nal_owner *nal_owner_create(const long row_count, const long col_count)
{
const size_t rows = (size_t)row_count,
cols = (size_t)col_count;
const size_t size = rows * cols;
const size_t bytes = size * sizeof (double);
const size_t total = bytes + sizeof (nal_owner);
nal_owner *owner;
/* Size too small? */
if (row_count < 1L || col_count < 1L) {
errno = EINVAL;
return NULL;
}
/* Size too large? */
if (size / rows != cols ||
bytes / sizeof (double) != size ||
bytes >= total) {
errno = ENOMEM;
return NULL;
}
owner = malloc(total);
if (!owner) {
errno = ENOMEM;
return NULL;
}
owner->refcount = 1L; /* Referenced by caller! */
owner->size = size; /* row_count*col_count doubles */
return owner;
}
When an independent matrix is created, the associated owner structure is also created:Code:/* Free/destroy a matrix.
* Does not affect other matrices sharing the data.
*/
void nal_matrix_destroy(nal_matrix *const m)
{
if (!m)
return;
if (m->owner)
if (m->owner->refcount-- < 1L) {
m->owner->size = 0;
free(m->owner);
}
m->rows = 0L;
m->cols = 0L;
m->rowstep = 0L;
m->colstep = 0L;
m->data = NULL;
m->owner = NULL;
}
Certain operations, like transpose, are trivial:Code:/* Create a new independent matrix.
* Returns 0 if success, errno error code otherwise
* (EINVAL or ENOMEM).
*/
int nal_matrix_create(nal_matrix *const m, const long rows, const long cols)
{
nal_owner *owner;
if (!m)
return errno = EINVAL;
m->rows = 0L;
m->cols = 0L;
m->rowstep = 0L;
m->colstep = 0L;
m->data = NULL;
m->owner = NULL;
if (rows < 1L || cols < 1L)
return errno = EINVAL;
owner = nal_owner_create(rows, cols);
if (!owner)
return errno;
m->rows = rows;
m->cols = cols;
m->rowstep = cols;
m->colstep = 1;
m->data = owner->data;
m->owner = owner;
return 0;
}
We can create a linked duplicate of a matrix; that is, create a matrix that shares its data with another matrix:Code:/* Transpose matrix m.
* Returns 0 if success, errno otherwise.
*/
int nal_matrix_transpose(nal_matrix *const m)
{
if (m) {
const long rows = m->rows,
cols = m->cols,
rowstep = m->rowstep,
colstep = m->colstep;
m->rows = cols;
m->cols = rows;
m->rowstep = colstep;
m->colstep = rowstep;
return 0;
} else
return errno = EINVAL;
}
As you can see, duplicating a matrix, or referring to the data in a matrix from another matrix, involves just incrementing the refcount of the data owner.Code:/* Duplicate matrix, keeping the data shared between the two.
*/
int nal_matrix_dup(nal_matrix *const d, const nal_matrix *const m)
{
if (!d)
return errno = EINVAL;
if (!m || m->rows < 1L || m->cols < 1L || !m->owner || m->owner->refcount < 1L) {
d->rows = 0L;
d->cols = 0L;
d->rowstep = 0L;
d->colstep = 0L;
d->data = NULL;
d->owner = NULL;
return errno = ENOENT;
}
m->owner->refcount++;
d->rows = m->rows;
d->cols = m->cols;
d->rowstep = m->rowstep;
d->colstep = m->colstep;
d->data = m->data;
d->owner = m->owner;
return 0;
}
In a nutshell: the owner owns the data, and the matrices just refer to it.
A plain duplicate/shared view is very dull; that's why we have rowstep and colstep. They allow very interesting shared matrices to be created:
While nal_matrix_row_of(), nal_matrix_column_of(), and nal_matrix_diagonal_of() could be easily written as a wrappers around nal_matrix_part_of(), I wanted to have a couple more examples of interesting shared matrix manipulations, in case it would help others to understand the logic.Code:/* Set d to be a row vector of matrix m.
* Returns 0 if success, errno otherwise.
*/
int nal_matrix_row_of(nal_matrix *const d, const nal_matrix *const m, const long row)
{
if (!d)
return errno = EINVAL;
d->rows = 0L;
d->cols = 0L;
d->rowstep = 0L;
d->colstep = 0L;
d->data = NULL;
d->owner = NULL;
if (!m || !m->owner || m->owner->size < 1)
return errno = EINVAL;
if (row < 0L || row >= m->rows)
return errno = EINVAL;
m->owner->refcount++;
d->rows = 1;
d->cols = m->cols;
d->rowstep = 0;
d->colstep = m->cols;
d->owner = m->owner;
d->data = m->data + row * m->rowstep;
return 0;
}
/* Set d to be a column vector of matrix m.
* Returns 0 if success, errno otherwise.
*/
int nal_matrix_column_of(nal_matrix *const d, const nal_matrix *const m, const long col)
{
if (!d)
return errno = EINVAL;
d->rows = 0L;
d->cols = 0L;
d->rowstep = 0L;
d->colstep = 0L;
d->data = NULL;
d->owner = NULL;
if (!m || !m->owner || m->owner->size < 1)
return errno = EINVAL;
if (col < 0L || col >= m->cols)
return errno = EINVAL;
m->owner->refcount++;
d->rows = m->rows;
d->cols = 1;
d->rowstep = m->rowstep;
d->colstep = 0;
d->owner = m->owner;
d->data = m->data + col * m->colstep;
return 0;
}
/* Set d to be a part of matrix m.
* Returns 0 if success, errno otherwise.
*/
int nal_matrix_part_of(nal_matrix *const d, const nal_matrix *const m,
const long row, const long col,
const long rows, const long cols,
const long down_rowstep, const long down_colstep,
const long right_rowstep, const long right_colstep)
{
const long lastrow = rows - 1L,
lastcol = cols - 1L;
const long upper_right_row = row + lastcol * right_rowstep,
upper_right_col = col + lastcol * right_colstep,
lower_left_row = row + lastrow * down_rowstep,
lower_left_col = col + lastcol * down_colstep,
lower_right_row = row + lastrow * down_rowstep + lastcol * right_rowstep,
lower_right_col = col + lastrow * down_colstep + lastcol * right_colstep;
if (!d)
return errno = EINVAL;
d->rows = 0L;
d->cols = 0L;
d->rowstep = 0L;
d->colstep = 0L;
d->data = NULL;
d->owner = NULL;
if (!m || !m->owner || m->owner->size < 1)
return errno = EINVAL;
/* Empty? */
if (rows < 1L || cols < 1L)
return errno = EINVAL;
/* Is the entire part within the matrix m? */
if (row < 0L || row >= m->rows ||
col < 0L || col >= m->cols ||
upper_right_row < 0L || upper_right_row >= m->rows ||
upper_right_col < 0L || upper_right_col >= m->cols ||
lower_left_row < 0L || lower_left_row >= m->rows ||
lower_left_col < 0L || lower_left_col >= m->cols ||
lower_right_row < 0L || lower_right_row >= m->rows ||
lower_right_col < 0L || lower_right_col >= m->cols)
return errno = EINVAL;
m->owner->refcount++;
d->rows = rows;
d->cols = cols;
d->rowstep = m->rowstep * down_rowstep + m->colstep * down_colstep;
d->colstep = m->rowstep * right_rowstep + m->colstep * right_colstep;
d->data = m->data + row * m->rowstep + col * m->colstep;
d->owner = m->owner;
return 0;
}
/* Set d to be the diagonal of matrix m.
* Returns 0 if success, errno otherwise.
*/
int nal_matrix_diagonal_of(nal_matrix *const d, const nal_matrix *const m)
{
if (!d)
return errno = EINVAL;
d->rows = 0L;
d->cols = 0L;
d->rowstep = 0L;
d->colstep = 0L;
d->data = NULL;
d->owner = NULL;
if (!m || !m->owner || m->owner->size < 1)
return errno = EINVAL;
m->owner->refcount++;
if (m->rows <= m->cols)
d->rows = m->rows;
else
d->rows = m->cols;
d->cols = 1;
d->rowstep = m->rowstep + m->colstep;
d->colstep = 0;
d->owner = m->owner;
d->data = m->data;
return 0;
}
To help debugging stuff, we'll need to print the matrices:
A common format for matrices is to have the number of rows and number of columns precede the actual data. Here is a function that reads such matrices:Code:void nal_matrix_fprint(FILE *const out, const nal_matrix *const m,
const char *const prefix,
const char *const format,
const char *const suffix)
{
if (out && m && m->data) {
long r, c;
for (r = 0L; r < m->rows; r++) {
if (prefix && *prefix)
fputs(prefix, out);
for (c = 0L; c < m->cols; c++) {
if (c > 0L)
fputc(' ', out);
fprintf(out, format, m->data[r * m->rowstep + c * m->colstep]);
}
if (suffix && *suffix)
fputs(suffix, out);
fputc('\n', out);
}
}
}
For illustration, here is a main() that reads a matrix (number of rows, number of columns, followed by each matrix element in row-major order, i.e. western written test order) from standard input, and manipulates it a bit.Code:int nal_matrix_fread(FILE *const in, nal_matrix *const m)
{
long rows, cols, row, col;
if (!in || !m)
return errno = EINVAL;
if (fscanf(in, " %ld %ld", &rows, &cols) != 2 || rows < 1L || cols < 1L)
return errno = EDOM;
if (nal_matrix_create(m, rows, cols))
return errno;
for (row = 0L; row < rows; row++)
for (col = 0L; col < cols; col++)
if (fscanf(in, " %lf", &m->data[row * m->rowstep + col * m->colstep]) != 1)
return errno = EDOM;
return 0;
}
As long as you nal_matrix_destroy() every matrix you create, you don't need to worry about which matrices are shared; there is no distinction between shared and unique matrices.Code:int main(void)
{
nal_matrix m, m1, m2, m3;
long i;
/* Read a matrix from standard input.
* It is preceded by the number of rows, and the number of cols.
*/
if (nal_matrix_fread(stdin, &m)) {
if (errno == EDOM)
fprintf(stderr, "Invalid matrix data.\n");
else
fprintf(stderr, "%s.\n", strerror(errno));
return 1;
}
printf("Read a %ld-row, %ld-column matrix:\n", m.rows, m.cols);
nal_matrix_fprint(stdout, &m, "\t[ ", "%9.4f", " ]");
printf("\n");
if (nal_matrix_diagonal_of(&m1, &m)) {
fprintf(stderr, "nal_matrix_diagonal_of(): %s.\n", strerror(errno));
return 1;
}
printf("Column vector of diagonal elements:\n");
nal_matrix_transpose(&m1);
nal_matrix_fprint(stdout, &m1, "\t[ ", "%9.4f", " ]");
printf("\n");
if (nal_matrix_dup(&m2, &m)) {
fprintf(stderr, "nal_matrix_dup(): %s.\n", strerror(errno));
return 1;
}
if (!nal_matrix_part_of(&m3, &m, m.rows - 2, m.cols - 2,
m.rows - 2, m.cols - 2,
-1, 0,
0, -1)) {
printf("Center elements mirrored:\n");
nal_matrix_fprint(stdout, &m3, "\t[ ", "%9.4f", " ]");
printf("\n");
nal_matrix_destroy(&m3);
}
if (!nal_matrix_part_of(&m3, &m, 0, m.cols / 2, 1+m.rows/2, 1+m.cols/2, 1, -1, 1, 1)) {
printf("Checkerboard odd tiles, rotated 45 degrees:\n");
nal_matrix_fprint(stdout, &m3, "\t[ ", "%9.4f", " ]");
printf("\n");
nal_matrix_destroy(&m3);
}
nal_matrix_destroy(&m);
for (i = 0L; i < m1.cols; i++)
m1.data[i * (m1.rowstep + m1.colstep)] = -m1.data[i * (m1.rowstep + m1.colstep)];
printf("Original matrix transposed, diagonal elements negated:\n");
nal_matrix_transpose(&m2);
nal_matrix_fprint(stdout, &m2, "\t[ ", "%9.4f", " ]");
printf("\n");
nal_matrix_destroy(&m2);
printf("Negated diagonal elements as a column vector:\n");
nal_matrix_fprint(stdout, &m1, "\t[ ", "%9.4f", " ]");
printf("\n");
nal_matrix_destroy(&m1);
return 0;
}
If thread safety is wanted, then the owner structures need to be chained, so that instead of immediately freed, they are only moved to a garbage chain. A couple of functions are used to manipulate an atomic counter (or semaphore, although semaphores may have a pretty low maximum count), say nal_lock() and nal_unlock(), which are used around accesses to matrix data via a temporary variable (and inside many of the library functions above); the garbage chain is then emptied whenever nal_unlock() is called without any other nal_lock()s pending. Some fields, like owner refcount, need either locking or atomic accesses, too.
More complex operations, like matrix multiplication, benefits from copying the matrix data to temporary storage that optimizes access patterns. The overhead of the above nal_matrix structure compared to those used by most libraries like GSL, that have only a stride (index = column + row * stride), seems to be neglible for all but smallest matrices.
If performance is important, smallest matrices -- especially 2x2, 3x3, 4x4 and 3x4 -- are best special-cased anyway.
The above example code has lots of defensive setting/clearing of structure fields that can safely be omitted in a real library. There are also lots of optimizations that can be done, and probably quite a few bugs too; I did just rewrite the above from scratch. (I'm terrible at finding copies of stuff I've already posted. I know I've posted the above already, but I just couldn't find it.)
Everyone is absolutely free to use the code shown in any way they like. I don't think it is creative enough to be copyrightable, but if it is, then it is free for any use, as long as you don't blame me if it breaks.
Questions?
Thanks, Titular Beast!
I'll play with the code and get back to you.
It may be a while....
Finally took a look through your code. Very nice. Very clear. I didn't find any bugs.
I get the basic idea. nal_owner owns the actual data. nal_matrix points into this data and uses its strides to determine the elements and order. As you say, "In a nutshell: the owner owns the data, and the matrices just refer to it."
I think a little output is called for.
I changed a couple places where you said "column" to "row".
And clearly I changed the format specifier to make the output neater.
I was looking at the following list of open-source matrix algebra software. It's interesting how many of them are written (at least in part) in Fortran. I've never looked at that language.Code:Enter matrix data:
5 5 1 2 3 4 5 6 7 8 9 0 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
Read a 5-row, 5-column matrix:
[ 1 2 3 4 5 ]
[ 6 7 8 9 0 ]
[ 11 12 13 14 15 ]
[ 16 17 18 19 20 ]
[ 21 22 23 24 25 ]
Column vector of diagonal elements:
[ 1 ]
[ 7 ]
[ 13 ]
[ 19 ]
[ 25 ]
Row vector of diagonal elements:
[ 1 7 13 19 25 ]
Center elements mirrored:
[ 19 18 17 ]
[ 14 13 12 ]
[ 9 8 7 ]
Checkerboard odd tiles, rotated 45 degrees:
[ 3 9 15 ]
[ 7 13 19 ]
[ 11 17 23 ]
Original matrix transposed, diagonal elements negated:
[ -1 6 11 16 21 ]
[ 2 -7 12 17 22 ]
[ 3 8 -13 18 23 ]
[ 4 9 14 -19 24 ]
[ 5 0 15 20 -25 ]
Negated diagonal elements as a row vector:
[ -1 -7 -13 -19 -25 ]
http://www.netlib.org/utk/people/Jac...rra/la-sw.html
A large majority of scientific programs that use matrix algebra rely on BLAS and LAPACK if you dig deep enough. There are many implementations, of course, like Intel Math Kernel Library, AMD Core Math Library, ATLAS, and OpenBLAS, that provide the same interfaces (or similar enough to use via wrappers).
Fortran is very traditional language for scientists. It does have certain features, like array slicing (and internal array slice representation), and most functions can assume unaliased data (comparable to GNU __restrict or C99 restrict keywords), which cause many to claim that Fortran is faster than C for scientific calculations; the reality is quite a bit more complicated. (Simply put, you need real C skills to produce C that beats Fortran for typical linear algebra and matrix algebra operations. So, usually it is more efficient to let human scientists use Fortran instead.)
One can achieve huge time savings for example for matrix-matrix multiplication via pre-copying the data into optimal order, and then using SSE2/3/4 or AVX vectorization (via GNU-style vectorized types; so no need for intrinsics, and no ties to any specific architecture!) with current-generation CPUs.
Unfortunately, the other BLAS/LAPACK library interfaces I mentioned only support one data order. The listed implementations are also more focused on using multiple cores for each task, than optimizing the efficiency of the single-core functions. The assumption seems to be that each CPU (as opposed to CPU core) is dedicated for a single computation task -- and that's idiotic. We can have our CPUs do much more work, if we keep data in constant flow at all levels, and that means doing more than one task per CPU, and sometimes even more than one task per CPU core. Adding internal threads, OpenMP-like, will just never yield maximum performance! I also suspect that Intel's and AMD's "turbo modes", temporarily overclocking cores of parts of cores, is an extension of that stupid paradigm. Looking at the history of CPU and microcontroller evolution, that kind of approach does seem to have a finite life expectancy. (As Linux kernel developers say, "policy" belongs in userspace, not in Kernel, and definitely not in the CPU itself.)
There are a lot of extensions and warts to the above that would make it much easier to implement matrix algebra in C. For example, optional per-thread error handler stacks to avoid having to check return values for every single operation (as a stack so more complex stacks can easily set their own temporary handler, and clean up afterwards), or pool-based allocations so that all memory related to a specific task can be destroyed at once instead of each entity at a time, and per-thread work areas allocated from the current per-thread pool but reused until the pool is destroyed. All this should not only boost the performance of the code, but also the productivity of the programmer.
Practical thread safety requires a "lock" to be taken while holding a temporary reference to matrix data, something like
Similarly, if a programmer wants to use a temporary pointer to the data of a matrix, they just need to use the two calls around the lifetime of that reference.Code:int nal_matrix_matrix_multiply_big(nal_matrix *const result, const nal_matrix *const left, const nal_matrix *const right)
{
... check left and right matrix dimensions, allocate result matrix and temporary work areas ...
nal_lock();
... copy contents of left and right to temporary storage in optimum order ...
nal_unlock();
... do computation, save result in result ...
}
The two functions do not actually hold any lock, they simply protect against destroying an owner while there might be temporary references to it. Implementation is either something like a mutex lock and unlock in both calls (no lock held in between), or possibly an atomic counter + mutex or futex in the contended case; very lightweight, anyway.
Adding a memfill(ptr, size, count), which is the opposite case of memmove() wrt. memcpy(), would help initializing the matrices and vectors efficiently. Typical loops with double assignments happen to be quite slow in practice. ([I]memfill() copies the first size bytes to the following (count-1)*size bytes.)
I believe the above are enough to give a good boost in code speed, while being simple enough for Fortran-coders to switch to C (or even to C++, as the C library should be easily interfaced to C++ too), but the socio-historical-political inertia in academia has completely thwarted all my attempts of introducing this.
If there is further interest in this, feel free to ask for details or even PM me; I'd be more than happy to help or elaborate further. But, I don't think I have the strength to fight against windmills to actually introduce, push, and maintain such a library. It is amazing how much even scientists instinctively avoid change. :(