Python – Cython: Understanding typed memoryviews with an indirect_contiguous memory layout

I want to understand more about Cython's typed memoryviews (https://cython.readthedocs.io/en/stable/src/userguide/memoryviews.html) and the memory layout indirect_contiguous.

According to the documentation, indirect_contiguous is used when "the list of pointers is contiguous".

There is also an example usage:

# contiguous list of pointers to contiguous lists of ints
cdef int[::view.indirect_contiguous, ::1] b

So correct me if I'm wrong, but I assume that a "contiguous list of pointers to contiguous lists of ints" means something like the array created by the following C++ pseudocode:

// we want to create a 'contiguous list of pointers to contiguous lists of ints'

int** array;
// allocate row-pointers
// This is the 'contiguous list of pointers' related to the first dimension:
array = new int*[ROW_COUNT];

// allocate some rows, each row is a 'contiguous list of ints'
array[0] = new int[COL_COUNT]{1,2,3};

So if I understand correctly, it should be possible in my Cython code to get a memoryview from an int** like this:

cdef int** list_of_pointers = get_pointers()
cdef int[::view.indirect_contiguous, ::1] view = <int[:ROW_COUNT:view.indirect_contiguous,:COL_COUNT:1]> list_of_pointers

But I get a compile error:

cdef int[::view.indirect_contiguous, ::1] view = <int[:ROW_COUNT:view.indirect_contiguous,:COL_COUNT:1]> list_of_pointers
                                                                                                        ^                                                                                                                              
------------------------------------------------------------

memview_test.pyx:76:116: Pointer base type does not match cython.array base type

What am I doing wrong?
Did I miss any conversions or did I misunderstand the concept of indirect_contiguous?

Solution

Let's set the record straight: a typed memoryview can only be used with objects that implement the buffer protocol.

Raw C pointers obviously don't implement the buffer protocol. But you might ask why something like the following quick and dirty code works:

%%cython    
from libc.stdlib cimport calloc
def f():
    cdef int* v=<int *>calloc(4, sizeof(int))
    cdef int[:] b = <int[:4]>v
    return b[0] # leaks memory, so what?

Here, a pointer (v) is used to construct a typed memoryview (b). However, more is going on under the hood (as can be seen in the cythonized C file):

  • A Cython array (i.e. cython.view.array) is constructed, which wraps the raw pointer and exposes it via the buffer protocol
  • This array is used to create the typed memoryview.

Your understanding of what view.indirect_contiguous is used for is right: it is exactly what you want. The problem is view.array, which cannot handle this type of data layout.

view.indirect and view.indirect_contiguous correspond to PyBUF_INDIRECT in buffer-protocol terms, and for this the suboffsets field must contain meaningful values (i.e. >=0 for some dimensions). However, as can be seen in the source code, view.array doesn't have this member at all, so it has no way of representing such a complex memory layout.
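To see what a direct buffer (one without indirection) looks like from the Python side, the short sketch below inspects the metadata of a standard-library memoryview. This is only a stand-in for illustration: array.array is not cython.view.array, but it is likewise a direct exporter that publishes no suboffsets. The variable name direct is made up for this example.

```python
# Inspect the buffer metadata published by an ordinary (direct) exporter.
# array.array stands in for any exporter without indirect dimensions.
import array

direct = memoryview(array.array("i", [1, 2, 3, 4]))

print(direct.strides)     # one stride per dimension, in bytes
print(direct.suboffsets)  # empty tuple: no dimension is indirect
```

An exporter for an int** layout would instead have to report a non-negative suboffset for the first dimension.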

Where does that leave us? As pointed out by @chrisb and @DavidW on your other question, you will have to implement a wrapper that exposes your data structure via the buffer protocol.

There are data structures in Python that use an indirect memory layout, most prominently PIL arrays. A good starting point for understanding how suboffsets are supposed to work is this snippet from the Python buffer-protocol documentation:

void *get_item_pointer(int ndim, void *buf, Py_ssize_t *strides,
                       Py_ssize_t *suboffsets, Py_ssize_t *indices) {
    char *pointer = (char*)buf;     // A
    int i;
    for (i = 0; i < ndim; i++) {
        pointer += strides[i] * indices[i];  // B
        if (suboffsets[i] >=0 ) {
            pointer = *((char**)pointer) + suboffsets[i];   // C
        }
    }
    return (void*)pointer;   // D
}

In your case, strides and suboffsets would be

  • strides=[sizeof(int*), sizeof(int)] (i.e. [8,4] on a typical x86_64 machine).
  • suboffsets=[0,-1], i.e. only the first dimension is indirect.

To get the address of element [x,y], the following happens:

  • In line A, pointer is set to buf; let's call this address BUF.
  • First dimension:
    • In line B, pointer becomes BUF+x*8 and points to the location of the pointer to the x-th row.
    • Because suboffsets[0]>=0, we dereference the pointer in line C, so it now holds the address ROW_X, the start of the x-th row.
  • Second dimension:
    • In line B, we get the address of the y-th element using strides, i.e. pointer=ROW_X+4*y
    • The second dimension is direct (signaled by suboffsets[1]<0), so no dereferencing is needed.
  • We're done: pointer points to the desired address and is returned in line D.
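The walk through lines A to D can be replayed from Python with ctypes. The following is a minimal sketch, not part of any library: it builds an int**-like structure (two separately allocated rows plus a contiguous block of row pointers) and transcribes get_item_pointer to work on integer addresses. The names rows, row_pointers, get_item_address and read_int are hypothetical and chosen only for this illustration.

```python
import ctypes

INT = ctypes.c_int
INT_SIZE = ctypes.sizeof(INT)
PTR_SIZE = ctypes.sizeof(ctypes.c_void_p)

# Two separately allocated rows, like array[i] = new int[COL_COUNT].
rows = [(INT * 3)(1, 2, 3), (INT * 3)(4, 5, 6)]

# Contiguous block of row pointers, like array = new int*[ROW_COUNT].
row_pointers = (ctypes.c_void_p * len(rows))(
    *(ctypes.addressof(row) for row in rows)
)


def get_item_address(buf, strides, suboffsets, indices):
    """Transcription of get_item_pointer, operating on integer addresses."""
    pointer = buf                                    # line A
    for stride, suboffset, index in zip(strides, suboffsets, indices):
        pointer += stride * index                    # line B
        if suboffset >= 0:
            # line C: read the pointer stored here, then add the suboffset
            pointer = ctypes.c_void_p.from_address(pointer).value + suboffset
    return pointer                                   # line D


def read_int(x, y):
    """Read element [x, y] through the indirect layout."""
    address = get_item_address(
        ctypes.addressof(row_pointers),
        strides=(PTR_SIZE, INT_SIZE),
        suboffsets=(0, -1),  # only the first dimension is indirect
        indices=(x, y),
    )
    return INT.from_address(address).value


print(read_int(1, 2))  # element in row 1, column 2
```

Using sizeof at runtime instead of the hard-coded [8,4] keeps the sketch valid on platforms where pointer or int sizes differ.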

FWIW, I have implemented a library that can export int** and similar memory layouts via the buffer protocol: https://github.com/realead/indirect_buffer .
