Machine Learning Training on a Real Processing-in-Memory System
Juan Gómez-Luna, Yuxin Guo, et al., Onur Mutlu
TLDR
Training machine learning algorithms is a computationally intensive process that requires large amounts of training data; the resulting data movement can become the bottleneck of training if there is not enough computation and locality to amortize its cost.
Abstract
Machine learning (ML) algorithms [1]–[6] have become ubiquitous in many fields of science and technology due to their ability to learn from and improve with experience with minimal human intervention. These algorithms train by updating their model parameters in an iterative manner to improve overall prediction accuracy. However, training machine learning algorithms is a computationally intensive process that requires large amounts of training data. Accessing training data in current processor-centric systems (e.g., CPU, GPU) implies costly data movement between memory and processors, which results in high energy consumption and a large fraction of the total execution cycles. This data movement can become the bottleneck of the training process if there is not enough computation and locality to amortize its cost.
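To make the abstract's point concrete, the following is a minimal, hypothetical sketch (not code from the paper) of the iterative parameter-update loop it describes, here stochastic gradient descent on a linear model. Note that every training sample must be fetched from memory on each epoch, which is the processor-centric data movement the abstract identifies as a potential bottleneck.

```python
def sgd_linear_regression(data, lr=0.05, epochs=500):
    """Iteratively update parameters (w, b) to fit y = w*x + b.

    Each epoch streams over the entire training set; on a
    processor-centric system, every (x, y) pair is moved from
    memory to the CPU just to perform a few arithmetic operations.
    """
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y   # prediction error
            # Gradient step for squared-error loss
            w -= lr * err * x
            b -= lr * err
    return w, b

# Synthetic training data for the ground truth y = 2x + 1
data = [(x, 2.0 * x + 1.0) for x in [0.0, 0.5, 1.0, 1.5, 2.0]]
w, b = sgd_linear_regression(data)
```

With so little computation per fetched sample, the loop's cost is dominated by moving `data` through the memory hierarchy rather than by the arithmetic itself, illustrating why low compute-per-byte training kernels are candidates for processing-in-memory.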
