There a a number of issues when trying to share arrays betweeen F90 and C/C++. Besides different storage conventions, F90 has a richer array syntax compared to C's very simple address based system. The two key items below are how to create code that loops through the array without being tied to the storage order, and how to interpret F90 arrays (especially array sections) from C/C++.

**A C++ solution**
Code for the example is in `template_example/F90`.

The optimal solution to the ordering problem allows the implementation language to
use identical code within an algorithm to operate efficiently on data that may
be ordered any
way. The syntactic ``holy grail'' is to be able to reference an array (or
array object) as `array[i][j]` and have the resulting data item be the same
whether the array is passed from F90 (and is thus column-major), or passed from
C (and is thus row-major). The semantic richness of C++ allows this to be done
by creating an array abstraction class and overloading the correct operators. For
random access of an array, this is an immediate solution to the ordering
problem. However access through this abstraction adds considerable overhead
when compared to the compile time optimized `array[i][j]` that occurs on a
true array.

There is still hope for an optimized abstraction, however, because random
access is not the usual method of operating on arrays. More common is the
iteration through a row, column, or (more generically) a dimension. Take as an
example matrix multiplication, matrix inverse, transpose, etc... Each of these
algorithms uses an entire dimension, and then moves to the next. The only
difference here between column and row major ordering is that in one the stride
is *one*, whereas it is the product of all previous dimensions in the
other. This is the basis for the iterator abstraction.

We begin by creating an array wrapper/abstraction class, which we can template and re-use for different data types and ranks. When we call a function in our C++ library (matrix multiply is the example here), in our interface we immediately wrap our array:

typedef Array<double, 2> double2d; // Perform mat1*mat2 = matres void FTN(c_mat_mult)(int *dims,double *mat1,double *mat2,double *matres) { double2d arr1(mat1, double2d::F90, dims[0], dims[1]); double2d arr2(mat2, double2d::F90, dims[1], dims[2]); double2d arrres(matres, double2d::F90, dims[0], dims[2]); // ... Use arr1, arr2, arrres for algorithm }

In the C interface we do a similar wrapping, but we pass `double2d::C` as a
parameter to the object to signal row-major ordering. Now in our matrix
multiply we can concentrate on the algorithm and ignore the complexities of
ordering. As a first try we take advantage of C++ operator overloading
and use the `operator[]`. The result is
identical to our pure C implementation:

for (i = 0; i < k; i++) { for (j = 0; j < n; j++) { res = 0; for (l = 0; l < m; l++) { res += (double) arr1[j][l] * (double) arr2[l][i]; } arrres[j][i] = res; } }

Unfortunately the performance of this example is poor, so we introduce a second abstraction mechanism, the dimension iterator. We achieve performance that is not likely to be topped unless the algorithms for column and row ordering are both hand written (a maintenance nightmare):

double2d::dim_iterator m1r, m2c, resi; for (i = 0; i < k; i++) { for(resi = matres.dim_begin(double2d::DIM_ITERATE, i), j = 0; !resi.done(); resi++, j++ ) { res = 0; for (m1r = mat1.dim_begin(j, double2d::DIM_ITERATE), m2c = mat2.dim_begin(double2d::DIM_ITERATE, i); !m1r.done(); m1r++, m2c++) { res += *m1r * *m2c; } *resi = res; } }

The call to `dim_begin` must specify the index of all dimensions except the
one to iterate over, which is specified as a wildcard by `DIM_ITERATE`.
Calling the `operator++` on the iterator advances by the stride value (the
optimal generic solution to the ordering problem). The iterator is overloaded
such that dereferencing it accesses the raw array element (note `*resi =
res` and `res = *m1r + *m2c`).

6.2 Array Sections

real*4 :: a(100,100) call foo(a(2:20:2, 1:10))

The above call passes in a section of the original array; every other row from 2 to 20 and the columns 1 to 10. This
capability is implemented in F90 with what is commonly called *dope vectors*. This generally means that F90 passes
a pointer to a structure describing the array slice. The F90 routine that is passed the array interprets the strides
and sizes of each dimension from the table and operates on the array ``in place'', saving the overhead of copying the
array to a contiguous block, yet allowing the subroutine to still address the array as if it were contiguous.

If the function `foo` (above) is to be implemented in C, there must be a method by which C can discover the various sizes
and strides of the array section passed. One method of doing this is the direct interpretation of the dope vector in
C, i.e.:

typedef struct dope_dec{ /* defines dope vector types for CPQ/DEC f90 compiler */ char rank; /* Number of dimensions */ char r[7]; /* Spacer */ char kind; /* Either 4 or 8 bytes */ char k[7]; /* Spacer */ long addrs; /* Base address of array */ long d[2]; /* Spacer */ long ll[MAX_RANK][3]; /* strides (bytes), length(words) and start of each dimension */ } DOPEDEC, *DOPEDEC_PTR; void foo(DOPEDEC_PTR d) { ... }

When F90 passes an array to another F90 function (or a C function masquerading behind an interface block), a pointer to the structure above is passed. The C function can discover the various sizes and strides of the array from this structure, and the calculations may proceed (you have to be cautious and create an F90 interface block for your C function, otherwise F90 will copy the entire array slice into contiguous memory, causing severe performance degradation).

The weakness of this approach is that each compiler and each platform will have a different structure, and these structures are usually proprietary and only useable through consultation with vendors or experimentation. Keeping track of the various structures on numerous platforms could be difficult and risky.

An alternative solution is to recover the strides and sizes from a set of intermediary F90 functions. This method
begins by creating a module which converts F90 arrays to a C++ class handle. The C++ class is an array descriptor, containing
the strides and sizes of the array, as well as containing methods to iterate through the array. In this module one function
is needed for each array size and type. For instance, there is a function for `real*4` of 1 dimension, another for 2-d,
and so on. The same for `integer`, `real*8`, etc... These functions query the language itself for the information
that the previous method gleaned from the headers:

interface getArrDesc module procedure arrsec_desc1, arrsec_desc2, ... end interface function arrsec_desc1(a) type(arraysection) :: arrsec_desc1 real*4 :: a(:) call arrsec_desc1c(arrsec_desc1, & size(a), a(1), a(2)) end function function arrsec_desc2(a) type(arraysection) :: arrsec_desc2 real*4 :: a(:,:) call arrsec_desc2c(arrsec_desc2, & size(a,1), size(a,2), & a(1,1), a(1,2), a(2,1)) end function function arrsec_desc3(a) type(arraysection) :: arrsec_desc3 real*4 :: a(:,:,:) call arrsec_desc3c(arrsec_desc3, & size(a,1), size(a,2), size(a,3), & a(1,1,1), a(1,1,2), a(1,2,1), a(2,1,1)) end function

Listed here are the functions that convert 1-d, 2-d and 3-d `real*4` arrays to the C++ descriptors. The function calls a C
function which is provided the sizes of each dimension, and then a unit step in each dimension. Since F90 passes scalars
by address,
each array element is passed as an address and the stride in each dimension can be deduced by simple pointer arithmetic:

// C function to dispatch class function void arrsec_desc3c_(arrsec_desc<float> **arr, int *size1, int *size2, int *size3, float *a111, float *a112, float *a121, float *a211) { (*arr) = new arrsec_desc<float>(size1, size2, size3, a111, a112, a121, a211); } // C++ class constructor for 3-d case. Class is overloaded for the different possible number // of dimensions. arrsec_desc(int *size1, int *size2, int *size3, DTYPE *a111, DTYPE *a112, DTYPE *a121, DTYPE *a211) { dim = 3; // Set up the sizes of the dimensions dims[0] = *size1; dims[1] = *size2; dims[2] = *size2; // Register the stride lengths strides[0] = a112 - a111; strides[1] = a121 - a111; strides[2] = a211 - a111; // Get the base pointer base_ptr= a111; }

The C++ class contains the data necessary to navigate the array:

template<class DTYPE> class arrsec_desc { public: // 1-D constructor. arrsec_desc(int *size, DTYPE *a1, DTYPE *a2); // 2-D constructor. arrsec_desc(int *size1, int *size2, DTYPE *a11, DTYPE *a12, DTYPE *a21); // 3-D constructor. arrsec_desc(int *size1, int *size2, int *size3, DTYPE *a111, DTYPE *a112, DTYPE *a121, DTYPE *a211); ~arrsec_desc(); float *get_element_ptr(int i); float *get_element_ptr(int i, int j); float *get_element_ptr(int i, int j, int k); int get_num_dims(); int get_dim_size(int i); private: int dim; // Number of dimensions in array int dims[MAX_DIMS]; // Sizes of each dimension int strides[MAX_DIMS]; // Stride in each dimension DTYPE *base_ptr; // Address of base array section };

These functions are used in the interface code between F90 and C/C++ as in this example:

Fortran code:

real*4 :: a(100,100) call foo(a(2:20:2, 1:10))

Fortran interface code:

interface getArrDesc module procedure, end interface subroutine foo(a) real*4 :: a(:,:) type(ESMF_ArrayDesc) :: arr arr = getArrDesc(a) ! Call the C function call c_foo(ar) ....

The implementation C function:

void foo(arrsec_desc<float> **arr) { /*... Use arr to for some calculation */ }

The memory ordering class and the array section class could easily be brought together into a single utility that provides all of the capabilities necessary to handle F90 arrays in C.