Author | Pearu Peterson |
Created | 2021-02-05 |
The aim of this blog post is to review the current state of the CPU/CUDA Array Interfaces and the PEP 3118 Buffer Protocol in the context of NumPy and PyTorch, and to give recommendations for improving PyTorch's support for these protocols. We also survey the overall usage of array interfaces across Python libraries.
This blog post is inspired by PyTorch issue 51156.
Array Interface (Version 3) defines a protocol for objects to re-use each other's data buffers. It was created in 2005 within the NumPy project for CPU array-like objects. An object implements the array interface by providing one or more of the following attributes or methods:
__array_interface__
- a Python dictionary that contains the shape, the element type, and optionally, the data buffer address and the strides of an array-like object.
__array__()
- a method returning a NumPy ndarray view of an array-like object.
__array_struct__
- an attribute holding a pointer to a PyArrayInterface C-structure.
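For illustration, here is a minimal sketch of a producer class exposing its buffer via `__array_interface__` (the `Ramp` class is hypothetical, not part of any library; the `'<i8'` typestr assumes little-endian 64-bit integers). A real producer would point the `data` field at its own memory:

```python
import numpy

class Ramp:
    """Hypothetical array-like object exposing the CPU Array Interface."""

    def __init__(self, n):
        # The buffer is owned by a NumPy array here only for simplicity;
        # a real producer would manage its own memory.
        self._buf = numpy.arange(n, dtype=numpy.int64)

    @property
    def __array_interface__(self):
        return {
            'shape': self._buf.shape,
            'typestr': '<i8',                        # little-endian 64-bit int
            'data': (self._buf.ctypes.data, False),  # (address, read-only flag)
            'strides': None,                         # None means C-contiguous
            'version': 3,
        }

r = Ramp(4)
view = numpy.asarray(r)  # zero-copy view of the Ramp buffer
view[0] = 99             # writes through to the underlying memory
```

Note that `numpy.asarray` keeps a reference to the producer object so that the buffer outlives the view.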
Numba introduced the CUDA Array Interface (Version 2) for GPU array-like objects. An object implements the CUDA array interface by providing the attribute
__cuda_array_interface__
that holds the same information about an array-like object as __array_interface__, except that the data buffer address points to a GPU memory area.
PEP 3118 Buffer Protocol defines a Python C/API for re-using the data buffers of buffer-like objects. The Buffer Protocol can be implemented for extension types using the Python C/API but not for types defined in Python: this has been requested and discussed, but no solution exists yet. In Python, the data buffers of extension types can be accessed using the memoryview object.
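As a concrete example, the standard-library array.array type implements the Buffer Protocol in C, and memoryview gives zero-copy access to its buffer from Python:

```python
from array import array  # array.array implements the Buffer Protocol in C

buf = array('q', [1, 2, 3, 4])  # 'q' is a signed 64-bit integer
m = memoryview(buf)

print(m.format, m.shape, m.itemsize)  # -> q (4,) 8
m[0] = 99      # writes through to the original array, no copy involved
print(buf[0])  # -> 99
```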
NumPy ndarray object implements CPU Array Interface as well as Buffer Protocol for sharing its data buffers:
>>> import numpy
>>> arr = numpy.array([1, 2, 3, 4])
>>> arr.__array__()
array([1, 2, 3, 4])
>>> arr.__array_interface__
{'data': (94398925856320, False), 'strides': None, 'descr': [('', '<i8')], 'typestr': '<i8', 'shape': (4,), 'version': 3}
>>> arr.__array_struct__
<capsule object NULL at 0x7f86f8354a80>
>>> memoryview(arr)
<memory at 0x7f8824bf94c0>
NumPy ndarray can be used for wrapping arbitrary objects that implement the CPU Array Interface or Buffer Protocol:
>>> data = numpy.array([1, 2, 3, 4, 5]) # this will be the only place where memory will be located for data
>>> class A1:
... def __array__(self): return data
...
>>> class A2:
... __array_interface__ = data.__array_interface__
...
>>> class A3:
... __array_struct__ = data.__array_struct__
...
>>> a1 = numpy.asarray(A1())
>>> a1[0] = 11
>>> a2 = numpy.asarray(A2())
>>> a2[1] = 21
>>> a3 = numpy.asarray(A3())
>>> a3[2] = 31
>>> m4 = memoryview(data)
>>> a4 = numpy.frombuffer(m4, dtype=m4.format)
>>> a4[3] = 41
>>> data
array([11, 21, 31, 41, 5])
As seen above, there are at least four ways to construct a NumPy ndarray view of objects implementing different protocols. Here follows a performance test for all these cases:
>>> import timeit
>>> e4 = timeit.timeit('asarray(a)', globals=dict(asarray=numpy.frombuffer, a=m4), number=100000)
>>> round(timeit.timeit('asarray(a)', globals=dict(asarray=numpy.asarray, a=A1()), number=100000) / e4, 1)
5.8
>>> round(timeit.timeit('asarray(a)', globals=dict(asarray=numpy.asarray, a=A2()), number=100000) / e4, 1)
6.5
>>> round(timeit.timeit('asarray(a)', globals=dict(asarray=numpy.asarray, a=A3()), number=100000) / e4, 1)
4.8
So, the Array Interface methods __array__, __array_interface__, and __array_struct__ are 4.8 to 6.5 times slower than the Buffer Protocol.
By default, numpy.frombuffer(buf) returns a NumPy ndarray with dtype==numpy.float64 and discards buf.format. I think it would make sense to use buf.format for determining the dtype of the numpy.frombuffer result, as demonstrated in the following:
>>> data = numpy.array([1, 2, 3, 4, 5])
>>> buf = memoryview(data)
>>> numpy.frombuffer(buf)
array([4.9e-324, 9.9e-324, 1.5e-323, 2.0e-323, 2.5e-323])
>>> numpy.frombuffer(buf, dtype=buf.format)
array([1, 2, 3, 4, 5])
The following examples use PyTorch version 1.9.0a0.
PyTorch Tensor object implements the CPU Array Interface only partly and does not implement the Buffer Protocol:
>>> import torch
>>> t = torch.tensor([1, 2, 3, 4, 5])
>>> arr = t.__array__() # equivalent to numpy.asarray(t)
>>> arr[0] = 99
>>> t
tensor([99, 2, 3, 4, 5])
>>> t.__array_interface__
AttributeError: 'Tensor' object has no attribute '__array_interface__'
>>> t.__array_struct__
AttributeError: 'Tensor' object has no attribute '__array_struct__'
>>> memoryview(t)
TypeError: memoryview: a bytes-like object is required, not 'Tensor'
However, since the Tensor.__array__() method returns a NumPy ndarray view of the tensor's data buffer, the CPU Array Interface is effectively available for PyTorch tensors:
>>> t.__array__().__array_interface__
{'data': (94398970087872, False), 'strides': None, 'descr': [('', '<i8')], 'typestr': '<i8', 'shape': (5,), 'version': 3}
>>> t.__array__().__array_struct__
<capsule object NULL at 0x7f8692074840>
>>> memoryview(t.__array__())
<memory at 0x7f8691a96f40>
PyTorch Tensor object implements the CUDA Array Interface:
>>> t = torch.tensor([1, 2, 3, 4, 5], device='cuda')
>>> t.__cuda_array_interface__
{'typestr': '<i8', 'shape': (5,), 'strides': None, 'data': (140214628515840, False), 'version': 2}
PyTorch Tensor object cannot be used for wrapping arbitrary objects that implement the CPU Array Interface:
>>> data = numpy.array([1, 2, 3, 4, 5])
>>> class A1:
... def __array__(self): return data
...
>>> class A2:
... __array_interface__ = data.__array_interface__
...
>>> class A3:
... __array_struct__ = data.__array_struct__
...
>>> t1 = torch.as_tensor(A1())
RuntimeError: Could not infer dtype of A1
>>> t2 = torch.as_tensor(A2())
RuntimeError: Could not infer dtype of A2
>>> t3 = torch.as_tensor(A3())
RuntimeError: Could not infer dtype of A3
However, by first wrapping the objects with a NumPy ndarray, one can effectively wrap arbitrary objects with a PyTorch Tensor object:
>>> t1 = torch.as_tensor(numpy.asarray(A1()))
>>> t1[0] = 101
>>> t2 = torch.as_tensor(numpy.asarray(A2()))
>>> t2[1] = 102
>>> t3 = torch.as_tensor(numpy.asarray(A3()))
>>> t3[2] = 103
>>> data
array([101, 102, 103, 4, 5])
PyTorch Tensor object implements the Buffer Protocol only partly (or incorrectly):
>>> m4 = memoryview(data)
>>> t4 = torch.as_tensor(m4) # A copy of memoryview buffer is made!!!
>>> t4[3] = 104
>>> data
array([101, 102, 103, 4, 5])
but wrapping with a NumPy ndarray provides a workaround:
>>> t4 = torch.as_tensor(numpy.frombuffer(m4, dtype=m4.format))
>>> t4[3] = 104
>>> data
array([101, 102, 103, 104, 5])
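Whether a given construction yields a view or a copy can be verified with numpy.shares_memory; a small sketch using NumPy only (so it runs without PyTorch installed):

```python
import numpy

data = numpy.array([1, 2, 3, 4, 5])
m = memoryview(data)

view = numpy.frombuffer(m, dtype=m.format)  # wraps the existing buffer
copy = numpy.array(data)                    # always makes a fresh copy

print(numpy.shares_memory(data, view))  # -> True
print(numpy.shares_memory(data, copy))  # -> False
```

The same check applies to tensors: converting a tensor back with Tensor.numpy() and testing it against the original buffer reveals whether a copy was made along the way.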
PyTorch Tensor object can be used for wrapping arbitrary objects that implement the CUDA Array Interface:
>>> cuda_data = torch.tensor([1, 2, 3, 4, 5], device='cuda')
>>> class A5:
... __cuda_array_interface__ = cuda_data.__cuda_array_interface__
...
>>> t5 = torch.as_tensor(A5(), device='cuda') # device must be specified explicitly
>>> t5[4] = 1005
>>> cuda_data
tensor([ 1, 2, 3, 4, 1005], device='cuda:0')
To improve PyTorch's support for these protocols, I recommend the following changes:
1. Implement the torch.Tensor.__array_interface__ and torch.Tensor.__array_struct__ attributes to fully support the CPU Array Interface.
2. torch.as_tensor(obj) should succeed when obj implements the CPU Array Interface but is neither a NumPy ndarray nor a PyTorch Tensor object.
3. torch.as_tensor(obj) should use device='cuda' by default when obj implements the CUDA Array Interface. Currently, a CPU copy of a CUDA data buffer is returned from torch.as_tensor(obj), while it would be more natural to return a CUDA view of the CUDA data buffer, IMHO.
4. torch.as_tensor(buf) should return a view of the data buffer when buf is a memoryview object. Currently, a copy of the data buffer is made. BC alert!
Many Python libraries have adopted the above-mentioned array interfaces. We do not attempt to compose a complete list of such libraries here. Instead, to get a rough idea of the explicit usage of array interfaces, we use the GitHub code search tool and report the code hits for relevant search patterns as shown below (all queries were executed on March 5, 2021).
The search results about using/exposing array interfaces in Python codes:
extension:.py "__array__" 102,931 hits
extension:.py "__array_interface__" 50,970 hits
extension:.py "__array_struct__" 9,478 hits
extension:.py "__cuda_array_interface__" 424 hits
as well as in C/C++ codes:
extension:.c extension:.cpp "__array_struct__" 1,530 hits
extension:.c extension:.cpp "__array_interface__" 1,202 hits
extension:.c extension:.cpp "__cuda_array_interface__" 91 hits
extension:.c extension:.cpp "__array__" 1,574 hits (lots of unrelated hits)
The search results about exposing array interfaces in Python codes:
extension:.py "def __array__(" 57,445 hits
extension:.py "def __array_interface__(" 11,097 hits
extension:.py "def __cuda_array_interface__(" 146 hits
extension:.py "def __array_struct__(" 19 hits
The search results for some popular Python methods, given here for reference purposes only:
extension:.py "def __init__(" 33,653,170 hits
extension:.py "def __getitem__(" 2,185,139 hits
extension:.py "def __len__(" 1,802,131 hits
Clearly, the most used array interface hooks are __array__ and __array_interface__.
Currently, PyTorch implements hooks for __array__ and __cuda_array_interface__ but not for __array_interface__ nor __array_struct__, although workarounds exist when using a NumPy ndarray as an intermediate wrapper of data buffers (see above).