# Thread: Select Slices of a Multidimensional Array?

1. ## Select Slices of a Multidimensional Array?

Hi all. I'm a biologist by training but now work as a "computational biologist" on an open-source science project (written in C) for a national lab, so I apologize in advance if this question has an obvious answer.

One of the operations we want to support is to have a "consumer" copy out arbitrary slices of a multidimensional array from a "producer". For example, let's suppose we have a 4x5 2D array
 10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 180 190 200

These are exposed to the library as a linear array of size 20: 10, 20, 30, 40, 50, 60,..., 200

The user code (the consumer) passes in offsets and counts (basically coordinates) that it wants to select:

Code:
```start[2] = {0, 3}; //(x start, y start)
count[2] = {3, 2};  //(x count, y count)
```
This means that for the x dimension, starting at the 0 position, give me 3 (x coordinate ranges are [0:2], and for the y dimension, starting at position 2, give me 2 (y coordinate ranges then are [3,4]).

This should result in:
130, 140, 150, 170, 180, 190

copied into the users buffer (which will be the length of 6 integers).

So what is known: we know the dimension sizes of the array (4x5), the number of dimensions (2), and the "coordinates" that the user wants.
The number of array dimensions can be any number... 1, 2, 3.. 6? 100? (i.e., x[10][5][5][30]...[x]..) as it's quite common in scientific apps to have very large dimensions for your arrays.
This is C code, and I honestly can't think of an algorithm and turn it into code to solve the problem. I come from a biology background, so I'm not too experienced with the algorithmic thinking side of coding.
Does anyone have any suggestions on how to solve this? Much help is appreciated!

Edit: the array dimensions, and number of dimensions, is not known at compile time. The problem is trivial if I write specific functions for 2d arrays, 3d arrays, but I feel like there should be a general solution to this that will work for any number of dimensions.

Edit 2: edited the text for better clarification.

2. Originally Posted by gzhispanic
Hi all. I'm a biologist by training but now work as a "computational biologist" on an open-source science project (written in C) for a national lab, so I apologize in advance if this question has an obvious answer.

One of the operations we want to support is to have a "consumer" copy out arbitrary slices of a multidimensional array from a "producer". For example, let's suppose we have a 4x5 2D array
 10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 180 190 200

These are exposed to the library as a linear array of size 20: 10, 20, 30, 40, 50, 60,..., 200

The user code (the consumer) passes in offsets and counts (basically coordinates) that it wants to select:

Code:
```start[2] = {0, 2}; //(x start, y start)
count[2] = {3, 2};  //(x count, y count)
```
This means that for the x dimension, starting at the 0 position, give me 3 (x coordinate ranges are [0:2], and for the y dimension, starting at position 2, give me 2 (y coordinate ranges then are [3,4]).

This should result in:
130, 140, 150, 170, 180, 190

copied into the users buffer (which will be the length of 6 integers).

So what is known: we know the dimension sizes of the array (4x5), the number of dimensions (2), and the "coordinates" that the user wants.
The array dimensions can be any number of dimensions... 1, 2, 3.. 6? 100? as it's quite common in scientific apps to have very large dimensions for your arrays.
This is C code, and I honestly can't think of an algorithm and turn it into code to solve the problem. I come from a biology background, so I'm not too experienced with the algorithmic thinking side of coding.
Does anyone have any suggestions on how to solve this? Much help is appreciated!

Edit: the array dimensions, and number of dimensions, is not known at compile time. The problem is trivial if I write specific functions for 2d arrays, 3d arrays, but I feel like there should be a general solution to this that will work for any number of dimensions.
If I understand this correctly, You're wanting to print out an entire row depending on the column offset. What I would probably do is allocate memory space for whatever it is that you're doing that makes it so it's not statically declared. Then, I would calculate what the maximum column is and the maximum row and define them as constants. Once you've done that, you can get user input with two numbers or however many coordinates you need, and loop on that row until you hit the maximum column.

3. Thanks for the reply. However, I don't fully understand what you mean. The real code isn't restricted to just 2D arrays; I just wrote it as such because it's an easier example to put in text.

4. What is the representation of these arbitrary dimensional arrays in your project?

I would think the generic function will be easier to write as recursive.

Code:
```N  - dimension
start[N] = {s0,s1,...sN-1}
count[N] = {c0,c1,...cN-1}

f(X,N,count,start)
{
for(x=s0;x<s0+c0;x++)
Append(f(X[x],N-1,start+1,count+1);
}```
X would be N dimensional array
X[x] will be N-1 dimensional sub-array at index x

And recursion will stop at N == 1 with trivial memmove

So the question is - Is it possible in your representation to pass X and X[x] as the same parameter to the function?

5. Originally Posted by vart
What is the representation of these arbitrary dimensional arrays in your project?

I would think the generic function will be easier to write as recursive.

Code:
```N  - dimension
start[N] = {s0,s1,...sN-1}
count[N] = {c0,c1,...cN-1}

f(X,N,count,start)
{
for(x=s0;x<s0+c0;x++)
Append(f(X[x],N-1,start+1,count+1);
}```
X would be N dimensional array
X[x] will be N-1 dimensional sub-array at index x

And recursion will stop at N == 1 with trivial memmove

So the question is - Is it possible in your representation to pass X and X[x] as the same parameter to the function?
Thanks for the quick reply! I agree, recursion is definitely the way to go here.

As to your first question: When the arrays get passed into the library, they're just standard C arrays. My understanding is that in C, arrays are linear, so in my above example, it would be:
10, 20, 30, 40, 50, 60, 70.....200

They pass the array in, and they tell us how many dimensions there are (in this case 2), and the sizes of each dimensions (4,5). It is a generalized library that is used by quite a large number of scientific applications, so the code beneath it can't really be too specific.

As to your second question: I'm not fully understanding the concept of X[x] yet. Suppose x = 1. In the example I posted, would X[x] then be 50,50,70,80?

6. Edit: Duplicate post.

7. Originally Posted by gzhispanic
Suppose x = 1. In the example I posted, would X[x] then be 50,50,70,80?
50,60,70,80

8. You are right, 50,60,70,80.

I am not sure though how I would code the X[x] idea to work for any number of dimensions.

9. Hmm, actually, I think I understand how this works. I think you might be on to something. Let me try to wrap my head around this a bit more. I think this is on the write track!!

10. Originally Posted by vart
What is the representation of these arbitrary dimensional arrays in your project?

I would think the generic function will be easier to write as recursive.

Code:
```N  - dimension
start[N] = {s0,s1,...sN-1}
count[N] = {c0,c1,...cN-1}

f(X,N,count,start)
{
for(x=s0;x<s0+c0;x++)
Append(f(X[x],N-1,start+1,count+1);
}```
X would be N dimensional array
X[x] will be N-1 dimensional sub-array at index x

And recursion will stop at N == 1 with trivial memmove

So the question is - Is it possible in your representation to pass X and X[x] as the same parameter to the function?
So, I have a question, and it's probably a stupid one. If an array is passed to you as just a pointer (i.e., int *array) as in this case, would X[1] then be the value 20 from my example?

So basically, I am writing some extensions beneath this API. The API receives these arrays as just straightup pointers (i.e., int *array) and the number of dimensions, and the dimension sizes. Essentially, it's

Code:
`write_array(void *array, int ndims, int *dim_sizes, int type);`
There are a fixed number of types that can be passed in (i.e., uint64_t, double, float, int, etc, etc), so I can cast array as needed. I left out that detail because I did not think it was relevant, so that may have been a mistake on my part

(I guess it is indeed important to know how the n-dimension array gets passed in via a single pointer! My apologies!)

11. Originally Posted by gzhispanic
copy out arbitrary slices of a multidimensional array from a "producer"
That is certainly doable, although the arbitrary number of dimensions is a bit of a pain.

Is there a reason you punish yourself with linear arrays? Or is this just code that is being converted from Fortran to C? Or are you or your colleagues more proficient in Fortran than C? (I'm not teasing or critizising you, I'm just trying to understand why you're going in this direction.)

I would use a data structure instead:
Code:
```#define _ISO_C99_SOURCE
typedef double data_t;

typedef struct {
long    size;
long    refcount;
data_t  data[];
} owner_t;

typedef struct {
long    dimensions;
long    *size;
long    *step;
owner_t *owner;
data_t  *data;
} array_t;

/* Each array_t variable needs to be initialized via
*     array_t  a = ARRAY_INIT;
* or
*     array_t  a;
*     array_init(&a);
*/
#define ARRAY_INIT { 0, NULL, NULL, NULL, NULL }

static void array_init(array_t *const arrayptr)
{
if (arrayptr)
*arrayptr = (array_t)ARRAY_INIT;
}

/* array_ptr() returns a pointer to the element,
* or 'outside' if the element is outside the array.
*/
static inline data_t *array_ptr(const array_t *const arrayptr, const long coord[], data_t *const outside)
{
long index = 0;
long i;

/* No array? */
if (!arrayptr || arrayptr->dimensions < 1L)
return outside;

for (i = 0; i < arrayptr->dimensions; i++)
if (coord[i] >= 0 || coord[i] < arrayptr->size[i])
index += coord[i] * arrayptr->step[i];
else
return outside;

return arrayptr->origin + index;
}

/* array_get() and array_set() return array_outside,
* if the referred element is outside bounds.
*/
static data_t array_outside = 0.0;

static inline data_t array_get(const array_t *const arrayptr, const long coord[])
{
return *array_ptr(arrayptr, coord, &array_outside);
}

static inline data_t array_set(const array_t *const arrayptr, const long coord[], const data_t value)
{
data_t *const ptr = array_ptr(arrayptr, coord, NULL);
if (ptr)
return *ptr = data;
else
return array_outside;
}```
If you look carefully, you can see that each dimension has their own step. (The step is in data units, not bytes; so 1 means one data_t element.)

This means you can have your data in any order (compare row-major and column-major for 2D arrays), without modifying the data. You can even swap dimensions.

The fact that the data is stored separately from the array means you can have more than one array referring to the same data. (As long as you destroy every array you no longer need, the data is discarded when the last array referring to it is discarded.)

In your case, if your consumers were to consume data in array_t format, then you wouldn't need to copy any data; you would just create another array_t referring to the same data. (These are usually called views, by the way.)

As long as the view is regular, you can use any data order. You can pick any plane, or even a diagonal. This is like C or Python slicing on steroids.

What this data structure cannot do, is fold multidimensional data into lower dimensions. It can trivially select a lesser-dimensional slice, it just cannot fold it; it cannot create a linear sequence from more than one dimension. That's why all consumers would have to consume the array_t format data on input.

12. Originally Posted by Nominal Animal
That is certainly doable, although the arbitrary number of dimensions is a bit of a pain.

Is there a reason you punish yourself with linear arrays? Or is this just code that is being converted from Fortran to C? Or are you or your colleagues more proficient in Fortran than C? (I'm not teasing or critizising you, I'm just trying to understand why you're going in this direction.)

I would use a data structure instead:
Code:
```#define _ISO_C99_SOURCE
typedef double data_t;

typedef struct {
long    size;
long    refcount;
data_t  data[];
} owner_t;

typedef struct {
long    dimensions;
long    *size;
long    *step;
owner_t *owner;
data_t  *data;
} array_t;

/* Each array_t variable needs to be initialized via
*     array_t  a = ARRAY_INIT;
* or
*     array_t  a;
*     array_init(&a);
*/
#define ARRAY_INIT { 0, NULL, NULL, NULL, NULL }

static void array_init(array_t *const arrayptr)
{
if (arrayptr)
*arrayptr = (array_t)ARRAY_INIT;
}

/* array_ptr() returns a pointer to the element,
* or 'outside' if the element is outside the array.
*/
static inline data_t *array_ptr(const array_t *const arrayptr, const long coord[], data_t *const outside)
{
long index = 0;
long i;

/* No array? */
if (!arrayptr || arrayptr->dimensions < 1L)
return outside;

for (i = 0; i < arrayptr->dimensions; i++)
if (coord[i] >= 0 || coord[i] < arrayptr->size[i])
index += coord[i] * arrayptr->step[i];
else
return outside;

return arrayptr->origin + index;
}

/* array_get() and array_set() return array_outside,
* if the referred element is outside bounds.
*/
static data_t array_outside = 0.0;

static inline data_t array_get(const array_t *const arrayptr, const long coord[])
{
return *array_ptr(arrayptr, coord, &array_outside);
}

static inline data_t array_set(const array_t *const arrayptr, const long coord[], const data_t value)
{
data_t *const ptr = array_ptr(arrayptr, coord, NULL);
if (ptr)
return *ptr = data;
else
return array_outside;
}```
If you look carefully, you can see that each dimension has their own step. (The step is in data units, not bytes; so 1 means one data_t element.)

This means you can have your data in any order (compare row-major and column-major for 2D arrays), without modifying the data. You can even swap dimensions.

The fact that the data is stored separately from the array means you can have more than one array referring to the same data. (As long as you destroy every array you no longer need, the data is discarded when the last array referring to it is discarded.)

In your case, if your consumers were to consume data in array_t format, then you wouldn't need to copy any data; you would just create another array_t referring to the same data. (These are usually called views, by the way.)

As long as the view is regular, you can use any data order. You can pick any plane, or even a diagonal. This is like C or Python slicing on steroids.

What this data structure cannot do, is fold multidimensional data into lower dimensions. It can trivially select a lesser-dimensional slice, it just cannot fold it; it cannot create a linear sequence from more than one dimension. That's why all consumers would have to consume the array_t format data on input.
Thanks for the reply. It will take me a bit to wrap my head around this!

The library and API themselves are written in C, but the codes that use it can be C/C++, Fortran, Matlab, Java.. I think they even have some ability to run python codes on it (but don't quote me on that). I'm writing this extension "beneath" this C API.

The arrays from the "producer" CAN come over the wire. I think the library was written so that any array can be written, so that's why they're passed in as a simple void* (the user also specifies which type the elements are, i.e., TYPE_DOUBLE).

I've kind of simplified the problem in my earlier post.. if I can get my head wrapped around some algorithm to solve this, I can have a good starting point when coding it.

I'm going to mull over what you wrote and most likely will have some further questions. I really appreciate everyone's help here!!!

Edit: Also, the user ("consumer) passes in the buffer to which the data is copied. In my example, I believe the user will allocate space for 6 elements and then pass in that. I have to copy data from the producers into the consumer's buffer.

13. Here's another idea for you.
Code:
```#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void slice2(int *a, int nDims, int *askips, int *b, int *start, int *count, int *bskips) {
int i;
if (nDims == 1)
memcpy(b, a + *start, *count * sizeof(*a));
else
for (i = *start; i < *start + *count; i++)
b + (i-*start) * *bskips, start+1, count+1, bskips+1);
}

void slice(int *a, int nDims, int *dims, int *b, int *start, int *count) {
int i;
int *bskips = malloc(sizeof(*bskips)*(nDims-1));

for (i = nDims - 3; i >= 0; i--)

bskips[nDims-2] = count[nDims-1];
for (i = nDims - 3; i >= 0; i--)
bskips[i] = bskips[i+1] * count[i+1];

slice2(a, nDims, askips, b, start, count, bskips);

free(bskips);
}

void print2(int *a, int nDims, int *dims, int *skips) {
int i;
if (nDims == 1) {
for (i = 0; i < *dims; i++) printf("%3d ", a[i]);
printf("\n");
}
else
for (i = 0; i < *dims; i++)
print2(a + i * *skips, nDims - 1, dims+1, skips+1);
}

void print(int *a, int nDims, int *dims) {
int i;
int *skips = malloc(sizeof(*skips)*(nDims-1));
skips[nDims-2] = dims[nDims-1];
for (i = nDims - 3; i >= 0; i--)
skips[i] = skips[i+1] * dims[i+1];
print2(a, nDims, dims, skips);
printf("\n");
free(skips);
}

int main(void) {
int a[24] = {
0,   1,   2,   3,
10,  11,  12,  13,
20,  21,  22,  23,

100, 101, 102, 103,
110, 111, 112, 113,
120, 121, 122, 123};

int dims[3] = {2, 3, 4};

int b[8];
int start[3] = {0, 1, 1};
int count[3] = {2, 2, 2};

print(a, 3, dims);
slice(a, 3, dims, b, start, count);
print(b, 3, count);

return 0;
}```

14. Originally Posted by gzhispanic
the user ("consumer) passes in the buffer to which the data is copied.
Okay, in that case you do need to copy the data anyway.

I do believe this is a working generic implementation:
Code:
```#include <string.h>

size_t copy(void *const         dstptr,
const void *const   srcptr,
const size_t        unit,
const size_t        dimensions,
const size_t *const size,
const size_t *const start,
const size_t *const count)
{
char *const dst = dstptr;
const char *src = srcptr;
size_t      total = 1;
size_t      index;

/* Count total units to copy, and
* move starting pointer. */
{   size_t stride = unit;
size_t d = dimensions;
while (d-- > 0) {
src += stride * start[d];
stride *= size[d];
total *= count[d];
}
}

/* Copy each element, using index
* as the 'counts'-basis number defining
* the coordinates as offset to src. */
for (index = 0; index < total; index++) {
size_t coords = index;
size_t stride = unit;
size_t offset = 0;
size_t d = dimensions;

/* Decompose coords into count "digits",
* using those as coordinates offset to src. */
while (d-- > 0) {
offset += stride * (coords % count[d]);
coords /= count[d];
stride *= size[d];
}

/* Copy the unit we located. */
memcpy(dst + index * unit, src + offset, unit);
}

}```
It uses the fact that the index of the destination element is
index = CD-1 iD-1 + CD-2 iD-2 + ... + C1 i1 + C0 n0
where D is the number of dimensions, C are the counts of elements along each dimension, and n define the coordinate along that dimension;
0 ≤ n[sub]i[/I] < Ci for 0 ≤ i ≤ D-1, and
0 ≤ index ≤ ∏C - 1

The size of the element in the unit parameter, for example sizeof (double).

Another approach is to use a temporary array for the current coordinates (relative to start). This works like an old-fashioned odometer, and therefore avoids the division and modulus operators; the inner loop is also much shorter, only iterating when the coordinates are reset (i.e. when the "odometer" digits actually change). Otherwise it's very similar -- it just avoids the division+modulus and longer looping for temporary memory used:

Code:
```#include <string.h>

size_t copy(void *const         dstptr,
const void *const   srcptr,
const size_t        unit,
const size_t        dimensions,
const size_t *const size,
const size_t *const start,
const size_t *const count)
{
char       *dst = dstptr;
const char *src = srcptr;
size_t      coord[dimensions];
size_t      total = 1;

/* Count total units to copy, clear coordinates,
* move starting pointer. */
{   size_t stride = unit;
size_t d = dimensions;
while (d-- > 0) {
src += stride * start[d];
stride *= size[d];
total *= count[d];
coord[d] = 0;
}
}

if (total < 1)
return 0;

do {
size_t d = dimensions - 1;
size_t stride = unit;

memcpy(dst, src, unit);
dst += unit;

coord[d]++;
src += stride;

while (d > 0 && coord[d] >= count[d]) {
src += stride * (size[d] - count[d]);
stride *= size[d];
coord[d--] = 0;
coord[d]++;
}
} while (coord[0] < count[0]);

}```
I only lightly tested the above; please test them thoroughly before using in production.

Also, you might wish to add checks in the initial dimension loop, to verify the start coordinate and end coordinates are within the dimension size:
(start[d] >= 0 && start[d] < size[d] && start[d] + count[d] <= size[d])
You might also specify the size of dst, and only do the memcpy() if the data fits in the specified buffer.

Here's the main() I used for testing:
Code:
```#include <stdio.h>
#include <string.h>

/* copy() implementation */

#define N3 5
#define N2 4
#define N1 3

int main(void)
{
double  data[N3][N2][N1];
double  linear[N3*N2*N1];
size_t  size[3] = { N3, N2, N1 };

size_t  start[3] = { 3, 1, 0 };
size_t  count[3] = { 2, 1, 3 };
size_t  n, i;

int     i3, i2, i1;

for (i3 = 0; i3 < N3; i3++)
for (i2 = 0; i2 < N2; i2++)
for (i1 = 0; i1 < N1; i1++)
data[i3][i2][i1] = 111.0 + i3*100.0 + i2*10.0 + i1*1.0;

printf("Original data:\n");
for (i3 = 0; i3 < N3; i3++) {
for (i2 = 0; i2 < N2; i2++) {
for (i1 = 0; i1 < N1; i1++)
printf(" %3g", data[i3][i2][i1]);
printf("\n");
}
printf("\n");
}

n = copy(linear, data, sizeof (double), 3, size, start, count);

printf("start = { %d, %d, %d }\n", (int)start[0], (int)start[1], (int)start[2]);
printf("count = { %d, %d, %d }\n", (int)count[0], (int)count[1], (int)count[2]);
printf("Linear data:\n");
for (i = 0; i < n; i++)
printf(" %3g", linear[i]);
printf("\n\n");

return 0;
}```
Feel free to use any of the above code in any way you wish. I think it is too obvious to be copyrightable (not enough "creative" input; many other C programmers will produce the same code given the same desired behaviour), or else in the public domain (or closest equivalent in your legal system).

15. Originally Posted by oogabooga
Here's another idea for you.
Code:
```#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void slice2(int *a, int nDims, int *askips, int *b, int *start, int *count, int *bskips) {
int i;
if (nDims == 1)
memcpy(b, a + *start, *count * sizeof(*a));
else
for (i = *start; i < *start + *count; i++)
b + (i-*start) * *bskips, start+1, count+1, bskips+1);
}

void slice(int *a, int nDims, int *dims, int *b, int *start, int *count) {
int i;
int *bskips = malloc(sizeof(*bskips)*(nDims-1));

for (i = nDims - 3; i >= 0; i--)

bskips[nDims-2] = count[nDims-1];
for (i = nDims - 3; i >= 0; i--)
bskips[i] = bskips[i+1] * count[i+1];

slice2(a, nDims, askips, b, start, count, bskips);

free(bskips);
}

void print2(int *a, int nDims, int *dims, int *skips) {
int i;
if (nDims == 1) {
for (i = 0; i < *dims; i++) printf("%3d ", a[i]);
printf("\n");
}
else
for (i = 0; i < *dims; i++)
print2(a + i * *skips, nDims - 1, dims+1, skips+1);
}

void print(int *a, int nDims, int *dims) {
int i;
int *skips = malloc(sizeof(*skips)*(nDims-1));
skips[nDims-2] = dims[nDims-1];
for (i = nDims - 3; i >= 0; i--)
skips[i] = skips[i+1] * dims[i+1];
print2(a, nDims, dims, skips);
printf("\n");
free(skips);
}

int main(void) {
int a[24] = {
0,   1,   2,   3,
10,  11,  12,  13,
20,  21,  22,  23,

100, 101, 102, 103,
110, 111, 112, 113,
120, 121, 122, 123};

int dims[3] = {2, 3, 4};

int b[8];
int start[3] = {0, 1, 1};
int count[3] = {2, 2, 2};

print(a, 3, dims);
slice(a, 3, dims, b, start, count);
print(b, 3, count);

return 0;
}```
Okay, I have to try this. This looks really good, actually, and running the code, I think it works as expected! I am just curious, what are askips and bskips doing here?

Thanks so much!!