Stream-K: Work-Centric Parallel Decomposition for Dense Matrix-Matrix Multiplication on the GPU

Muhammad Osama, D. Merrill, +2 authors, John Douglas Owens

2023 · DOI: 10.1145/3572848.3577479
ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming · Citations: 33

TLDR

This paper introduces Stream-K, a work-centric parallelization of matrix multiplication (GEMM) and related computations in dense linear algebra. It provides near-perfect utilization of computing resources, regardless of how efficiently the output tiling for a given problem quantizes across the underlying processing elements.

Abstract

We introduce Stream-K, a work-centric parallelization of matrix multiplication (GEMM) and related computations in dense linear algebra. Whereas contemporary decompositions are primarily tile-based, our method operates by partitioning an even share of the aggregate inner loop iterations among physical processing elements. This provides a near-perfect utilization of computing resources, regardless of how efficiently the output tiling for any given problem quantizes across the underlying processing elements.
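
The contrast drawn in the abstract can be illustrated with a small host-side sketch. This is not the authors' implementation: the function names, the blocking parameters (`blk_m`, `blk_n`, `blk_k`), and the problem sizes are hypothetical, chosen only to show how a tile-based schedule can leave processing elements (PEs) idle while an even split of the aggregate inner-loop iterations does not. It also ignores how partial results are combined when one output tile is shared by several PEs.

```python
from math import ceil


def tile_based_utilization(m, n, blk_m, blk_n, num_pes):
    """Fraction of PE time spent on useful work when one output tile
    is the indivisible unit of work (tiles are processed in waves)."""
    tiles = ceil(m / blk_m) * ceil(n / blk_n)
    waves = ceil(tiles / num_pes)  # the last wave may be only partly full
    return tiles / (waves * num_pes)


def even_iteration_ranges(m, n, k, blk_m, blk_n, blk_k, num_pes):
    """Work-centric split (assumed form): give each PE a contiguous,
    near-equal range of the aggregate inner-loop iterations."""
    tiles = ceil(m / blk_m) * ceil(n / blk_n)
    iters_per_tile = ceil(k / blk_k)
    total_iters = tiles * iters_per_tile
    base, extra = divmod(total_iters, num_pes)
    ranges, start = [], 0
    for pe in range(num_pes):
        count = base + (1 if pe < extra else 0)  # spread the remainder
        ranges.append((start, start + count))
        start += count
    return ranges, iters_per_tile


def locate(global_iter, iters_per_tile):
    """Map a global iteration index to (output tile index, k-iteration)."""
    return divmod(global_iter, iters_per_tile)


if __name__ == "__main__":
    # Hypothetical problem: 384 x 384 output, k = 4096, 4 PEs.
    print(tile_based_utilization(384, 384, 128, 128, 4))  # 0.75
    ranges, per_tile = even_iteration_ranges(384, 384, 4096, 128, 128, 32, 4)
    for pe, (lo, hi) in enumerate(ranges):
        print(pe, hi - lo, locate(lo, per_tile), locate(hi - 1, per_tile))
```

With these assumed sizes there are 9 output tiles for 4 PEs, so the tile-based schedule needs 3 waves and reaches only 75% utilization. The even split assigns each PE 288 of the 1152 inner-loop iterations (2.25 tiles' worth), so some ranges start or end in the middle of a tile.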