pearu.github.io

CSR tensor support in PyTorch

   
Author Pearu Peterson
Created 2021-04-07

The aim of this blog post is to propose a roadmap for completing the implementation of the CSR layout for PyTorch tensors that was started in PR 50937.

The CSR layout was introduced to resolve the issue of slow matrix multiplication with sparse matrices when using the existing COO layout support in PyTorch. See Roadmap for torch.sparse Matrix Product API for an overview. PR 50937 resolves this issue with considerable success: sparse matrix-vector and sparse matrix-dense matrix multiplication becomes roughly ten times faster when using the CSR layout, and a further five-fold speedup is achieved when using the Intel MKL library tools, see CSR vs COO benchmarks.
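To make the layout concrete, here is a minimal pure-Python sketch (illustrative only, not PyTorch or MKL code) of the CSR matrix-vector product. The point is that row i's nonzeros occupy the contiguous slice crow_indices[i]:crow_indices[i+1], so a row-wise product needs no searching, which is what makes CSR so much faster than COO for matrix multiplication:

```python
# Illustrative sketch of y = A @ x for a matrix A stored in CSR form.
# CSR stores a matrix as three arrays: crow_indices (row pointers),
# col_indices and values.

def csr_matvec(crow_indices, col_indices, values, x):
    """Compute y = A @ x for a CSR matrix A with len(crow_indices)-1 rows."""
    n = len(crow_indices) - 1
    y = [0.0] * n
    for i in range(n):
        # All nonzeros of row i are contiguous in values/col_indices.
        for k in range(crow_indices[i], crow_indices[i + 1]):
            y[i] += values[k] * x[col_indices[k]]
    return y

# The 2x3 matrix [[1, 0, 2], [0, 3, 0]] in CSR form:
crow = [0, 2, 3]
col = [0, 2, 1]
vals = [1.0, 2.0, 3.0]
print(csr_matvec(crow, col, vals, [1.0, 1.0, 1.0]))  # [3.0, 3.0]
```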

However, maintaining PR 50937 and preparing it for landing has become increasingly difficult for several reasons.

It has been agreed that the PR will land as is, but the work on the CSR layout support needs to continue to meet the (still evolving) PyTorch coding and testing standards.

Here, an attempt will be made to organize and discuss the follow-up tasks for completing the CSR layout support.

Unresolved discussion items

MKL and Windows build issues

MKL and macOS build issues

CSR indices support int32 and int64

See also https://github.com/pytorch/pytorch/issues/56959, which summarizes the background of the three discussion items above.

IIUC, supporting a single dtype for CSR indices tensors would be preferred. (Explain why this preference). Since COO uses only int64 as the dtype for indices, a natural choice for the dtype of CSR crow/col_indices would also be int64: the situation would be simpler for users, conversions between COO and CSR would be memory- and processor-efficient, etc. However, the big performance gain in matrix multiplication comes from the Intel MKL library tools, and with the current MKL support in PyTorch, only int32 indices can be used as inputs to MKL routines.

So, we could fix the dtype to int64 (as in COO), but at the expense of an int64->int32 conversion (and its inverse?) whenever calling an MKL routine.

We could also fix the dtype to int32 (which would be most efficient when using MKL support), but at the expense of making all conversions between COO and CSR more expensive than necessary.
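A NumPy sketch of the hidden cost of the int64 option: before every dispatch to MKL, the indices would have to be downcast to int32, which allocates and copies, and must also guard against silent overflow. The helper name here is hypothetical, for illustration only:

```python
import numpy as np

# Hypothetical helper sketching the int64 -> int32 downcast that an
# int64-indexed CSR tensor would need before each call into MKL sparse
# routines, which accept only 32-bit indices.

def indices_to_int32(indices):
    indices = np.asarray(indices, dtype=np.int64)
    if indices.size and indices.max() > np.iinfo(np.int32).max:
        # Very large sparse tensors simply cannot take the MKL path.
        raise OverflowError("CSR indices do not fit into int32; "
                            "cannot dispatch to MKL")
    # astype allocates a new buffer and copies: this is the per-call
    # overhead that fixing the dtype to int64 would incur on the MKL path.
    return indices.astype(np.int32)

col_indices = np.array([0, 2, 1], dtype=np.int64)
print(indices_to_int32(col_indices).dtype)  # int32
```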

Btw, the conversion between COO and CSR is important because the COO layout is the most human-friendly layout for constructing sparse tensors, while the CSR layout is computationally much more efficient than COO.

COO to CSR conversion

Not much to discuss here: for efficiency, implement the direct dense-to-CSR conversion in C++, avoiding an intermediate COO tensor. This will be important if we decide that the indices of COO and CSR have different dtypes; otherwise, I would not expect much performance gain.
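Whatever the route, the core step of producing a CSR tensor is compressing row indices into the crow_indices pointer array. A minimal pure-Python sketch (the C++ implementation would operate on the raw index buffers, but the logic is the same):

```python
# Illustrative sketch: converting COO-style row indices into the CSR
# crow_indices array by counting nonzeros per row and taking a
# cumulative sum. Assumes entries are grouped by row, as they are in a
# coalesced PyTorch COO tensor.

def coo_rows_to_crow_indices(row_indices, n_rows):
    counts = [0] * n_rows
    for r in row_indices:
        counts[r] += 1
    crow = [0] * (n_rows + 1)
    for i in range(n_rows):
        crow[i + 1] = crow[i] + counts[i]
    return crow

# Nonzeros at rows [0, 0, 1] of a 2-row matrix compress to [0, 2, 3].
print(coo_rows_to_crow_indices([0, 0, 1], 2))  # [0, 2, 3]
```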

Testing

Avoid COO-isms

Code quality

Deal for landing PR 50937

Main features missing