next up previous contents
Next: 7 Pointer Size Up: IMPL_repdoc Previous: 5 Templates   Contents

Subsections

6 Array Sharing

There a a number of issues when trying to share arrays betweeen F90 and C/C++. Besides different storage conventions, F90 has a richer array syntax compared to C's very simple address based system. The two key items below are how to create code that loops through the array without being tied to the storage order, and how to interpret F90 arrays (especially array sections) from C/C++.

6.1 Memory Ordering

Storage schemes in F90 and C are typically reversed. F90 stores multi-dimensional arrays in column-major and C and C++ in row-major order. Being able to access arrays stored in either order is a routine task for numerical libraries. We present an example below to demonstrate some options.

A C++ solution Code for the example is in template_example/F90.

The optimal solution to the ordering problem allows the implementation language to use identical code within an algorithm to operate efficiently on data that may be ordered any way. The syntactic ``holy grail'' is to be able to reference an array (or array object) as array[i][j] and have the resulting data item be the same whether the array is passed from F90 (and is thus column-major), or passed from C (and is thus row-major). The semantic richness of C++ allows this to be done by creating an array abstraction class and overloading the correct operators. For random access of an array, this is an immediate solution to the ordering problem. However access through this abstraction adds considerable overhead when compared to the compile time optimized array[i][j] that occurs on a true array.

There is still hope for an optimized abstraction, however, because random access is not the usual method of operating on arrays. More common is the iteration through a row, column, or (more generically) a dimension. Take as an example matrix multiplication, matrix inverse, transpose, etc... Each of these algorithms uses an entire dimension, and then moves to the next. The only difference here between column and row major ordering is that in one the stride is one, whereas it is the product of all previous dimensions in the other. This is the basis for the iterator abstraction.

We begin by creating an array wrapper/abstraction class, which we can template and re-use for different data types and ranks. When we call a function in our C++ library (matrix multiply is the example here), in our interface we immediately wrap our array:

  typedef Array<double, 2> double2d;

  // Perform mat1*mat2 = matres 
  void FTN(c_mat_mult)(int *dims,double *mat1,double *mat2,double *matres) {
    double2d arr1(mat1, double2d::F90, dims[0], dims[1]);
    double2d arr2(mat2, double2d::F90, dims[1], dims[2]);
    double2d arrres(matres, double2d::F90, dims[0], dims[2]);
    // ... Use arr1, arr2, arrres for algorithm
  }

In the C interface we do a similar wrapping, but we pass double2d::C as a parameter to the object to signal row-major ordering. Now in our matrix multiply we can concentrate on the algorithm and ignore the complexities of ordering. As a first try we take advantage of C++ operator overloading and use the operator[]. The result is identical to our pure C implementation:

  for (i = 0; i < k; i++) {
      for (j = 0; j < n; j++) {
	  res = 0;
	  for (l = 0; l < m; l++) {
	      res += (double) arr1[j][l] *  (double) arr2[l][i];
	    }
	  arrres[j][i] = res;
	}
    }

Unfortunately the performance of this example is poor, so we introduce a second abstraction mechanism, the dimension iterator. We achieve performance that is not likely to be topped unless the algorithms for column and row ordering are both hand written (a maintenance nightmare):

  double2d::dim_iterator m1r, m2c, resi;
  for (i = 0; i < k; i++) {
      for(resi = matres.dim_begin(double2d::DIM_ITERATE, i), j = 0;
          !resi.done();
          resi++, j++ ) {
          res = 0;
          for (m1r = mat1.dim_begin(j, double2d::DIM_ITERATE),
               m2c = mat2.dim_begin(double2d::DIM_ITERATE, i);
               !m1r.done();
               m1r++, m2c++) {
              res += *m1r * *m2c;
          }
          *resi = res;
      }
   }

The call to dim_begin must specify the index of all dimensions except the one to iterate over, which is specified as a wildcard by DIM_ITERATE. Calling the operator++ on the iterator advances by the stride value (the optimal generic solution to the ordering problem). The iterator is overloaded such that dereferencing it accesses the raw array element (note *resi = res and res = *m1r + *m2c).


6.2 Array Sections

It is common in F90 to pass a subarray to a function call. This is accomplished using the unique F90 array syntax, for example:

	real*4 :: a(100,100)

	call foo(a(2:20:2, 1:10))

The above call passes in a section of the original array; every other row from 2 to 20 and the columns 1 to 10. This capability is implemented in F90 with what is commonly called dope vectors. This generally means that F90 passes a pointer to a structure describing the array slice. The F90 routine that is passed the array interprets the strides and sizes of each dimension from the table and operates on the array ``in place'', saving the overhead of copying the array to a contiguous block, yet allowing the subroutine to still address the array as if it were contiguous.

If the function foo (above) is to be implemented in C, there must be a method by which C can discover the various sizes and strides of the array section passed. One method of doing this is the direct interpretation of the dope vector in C, i.e.:

typedef struct dope_dec{ /*  defines dope vector types for CPQ/DEC f90 compiler */  
  char rank;            /* Number of dimensions   */
  char r[7];            /* Spacer                 */
  char kind;            /* Either 4 or 8 bytes    */
  char k[7];            /* Spacer                 */
  long addrs;           /* Base address of array  */
  long d[2];            /* Spacer                 */
  long ll[MAX_RANK][3]; /* strides (bytes), length(words) and start of each dimension */
}               DOPEDEC, *DOPEDEC_PTR;

void foo(DOPEDEC_PTR d)
{
...
}

When F90 passes an array to another F90 function (or a C function masquerading behind an interface block), a pointer to the structure above is passed. The C function can discover the various sizes and strides of the array from this structure, and the calculations may proceed (you have to be cautious and create an F90 interface block for your C function, otherwise F90 will copy the entire array slice into contiguous memory, causing severe performance degradation).

The weakness of this approach is that each compiler and each platform will have a different structure, and these structures are usually proprietary and only useable through consultation with vendors or experimentation. Keeping track of the various structures on numerous platforms could be difficult and risky.

An alternative solution is to recover the strides and sizes from a set of intermediary F90 functions. This method begins by creating a module which converts F90 arrays to a C++ class handle. The C++ class is an array descriptor, containing the strides and sizes of the array, as well as containing methods to iterate through the array. In this module one function is needed for each array size and type. For instance, there is a function for real*4 of 1 dimension, another for 2-d, and so on. The same for integer, real*8, etc... These functions query the language itself for the information that the previous method gleaned from the headers:

        interface getArrDesc
             module procedure
               arrsec_desc1,
               arrsec_desc2,
               ...
        end interface

        function arrsec_desc1(a)
        type(arraysection) :: arrsec_desc1
        real*4 :: a(:) 
        call arrsec_desc1c(arrsec_desc1,
     &  size(a), a(1), a(2))
        end function

        function arrsec_desc2(a)
        type(arraysection) :: arrsec_desc2
        real*4 :: a(:,:)
        call arrsec_desc2c(arrsec_desc2,
     &  size(a,1), size(a,2),
     &  a(1,1), a(1,2), a(2,1)) 
        end function

        function arrsec_desc3(a)
        type(arraysection) :: arrsec_desc3
        real*4 :: a(:,:,:)
        call arrsec_desc3c(arrsec_desc3,
     &  size(a,1), size(a,2), size(a,3),
     &  a(1,1,1), a(1,1,2), a(1,2,1), a(2,1,1))
        end function

Listed here are the functions that convert 1-d, 2-d and 3-d real*4 arrays to the C++ descriptors. The function calls a C function which is provided the sizes of each dimension, and then a unit step in each dimension. Since F90 passes scalars by address, each array element is passed as an address and the stride in each dimension can be deduced by simple pointer arithmetic:

// C function to dispatch class function
void arrsec_desc3c_(arrsec_desc<float> **arr, int *size1, int *size2, int *size3, 
        float *a111, float *a112, float *a121, float *a211)
{
        (*arr) = new arrsec_desc<float>(size1, size2, size3, a111, a112, a121, a211);
}

// C++ class constructor for 3-d case.  Class is overloaded for the different possible number
// of dimensions.
arrsec_desc(int *size1, int *size2, int *size3, DTYPE *a111, DTYPE *a112, DTYPE *a121, DTYPE *a211)
{
        dim = 3;

        // Set up the sizes of the dimensions
        dims[0] = *size1; 
        dims[1] = *size2; 
        dims[2] = *size2; 

        // Register the stride lengths 
        strides[0] = a112 - a111; 
        strides[1] = a121 - a111; 
        strides[2] = a211 - a111; 

        // Get the base pointer 
        base_ptr= a111;
}

The C++ class contains the data necessary to navigate the array:

template<class DTYPE>
class arrsec_desc {
public:
// 1-D constructor.
arrsec_desc(int *size, DTYPE *a1, DTYPE *a2);

// 2-D constructor.
arrsec_desc(int *size1, int *size2, DTYPE *a11, DTYPE *a12, DTYPE *a21);

// 3-D constructor.
arrsec_desc(int *size1, int *size2, int *size3, DTYPE *a111, DTYPE *a112, DTYPE *a121, DTYPE *a211); 

~arrsec_desc();
float *get_element_ptr(int i);
float *get_element_ptr(int i, int j); 
float *get_element_ptr(int i, int j, int k); 
int get_num_dims();
int get_dim_size(int i);
private:
        int dim;                   // Number of dimensions in array
        int dims[MAX_DIMS];        // Sizes of each dimension
        int strides[MAX_DIMS];     // Stride in each dimension
        DTYPE *base_ptr;           // Address of base array section
};

These functions are used in the interface code between F90 and C/C++ as in this example:

Fortran code:

	real*4 :: a(100,100)

	call foo(a(2:20:2, 1:10))

Fortran interface code:

	interface getArrDesc
		module procedure,
	end interface

	subroutine foo(a)
	real*4 :: a(:,:)
	type(ESMF_ArrayDesc) :: arr

	arr = getArrDesc(a)

! Call the C function

	call c_foo(ar)
	
	....

The implementation C function:

void foo(arrsec_desc<float> **arr)
{
	/*... Use arr to for some calculation */
}

The memory ordering class and the array section class could easily be brought together into a single utility that provides all of the capabilities necessary to handle F90 arrays in C.


next up previous contents
Next: 7 Pointer Size Up: IMPL_repdoc Previous: 5 Templates   Contents
esmf_support@list.woc.noaa.gov