Research and Optimization of Neural Network Accelerator Based on NVDLA
Liang Liu, Zengmin Ren, Ting Chong
TLDR
A heterogeneous FPGA and CPU acceleration system that extends the functionality of NVDLA, together with a method for converting network models and data between common frameworks and Caffe.
Abstract
Convolutional neural networks (CNNs) are widely used in image recognition and natural language processing. As datasets grow and models become more complex, their computing and storage overheads increase, and neural network accelerators are urgently needed. NVDLA is a convolutional neural network accelerator developed by NVIDIA for deep learning inference. It offers good acceleration performance and its code is fully open source, which makes it well suited to in-depth research. However, the lack of supporting tools limits NVDLA in three ways. First, it supports too few operators, so the range of network models it can run is very limited and it can only realize simple classification. Second, the official compiler supports only the Caffe framework and not current mainstream frameworks such as PyTorch and TensorFlow. Third, before a network can run inference on the hardware, it must be quantized to meet the hardware's data format requirements. This paper studies and optimizes an NVDLA-based neural network accelerator. We design a heterogeneous FPGA and CPU acceleration system in which the CPU handles the parts of the network that NVDLA cannot, which extends the functionality of NVDLA. The division of tasks is implemented by segmenting the network so that the portion processed by the CPU is minimized, which improves NVDLA utilization and minimizes data communication overhead. We also design a method for converting network models and data between common frameworks and Caffe. Finally, we design a quantization interface and a quantization calibration table to meet the data bit-width requirements of hardware inference. The system is verified with LeNet-5, ResNet-50 and YOLOv3.
CCS CONCEPTS
• Computer systems organization • Architectures • Other architectures • Neural networks
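The network segmentation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a purely linear list of layers, and the operator set `NVDLA_SUPPORTED`, the function `segment_network`, and the example layer names are all hypothetical. Consecutive layers that target the same device are merged into one segment, so each switch between NVDLA and CPU (and the data transfer it implies) happens only where an unsupported operator forces it.

```python
from itertools import groupby

# Hypothetical operator set for illustration only; the operators actually
# supported depend on the NVDLA hardware configuration and compiler.
NVDLA_SUPPORTED = {"Convolution", "Pooling", "ReLU", "InnerProduct"}


def segment_network(layers):
    """Split a linear list of (name, op_type) layers into contiguous segments.

    Each segment is (device, [layer names]). Adjacent layers assigned to the
    same device are merged, which keeps the number of device switches minimal
    for a fixed per-layer device assignment.
    """
    def device(layer):
        _, op_type = layer
        return "NVDLA" if op_type in NVDLA_SUPPORTED else "CPU"

    return [
        (dev, [name for name, _ in group])
        for dev, group in groupby(layers, key=device)
    ]


# Toy network: the unsupported layers ("Upsample", "Detection") are assumed
# examples of operators that would fall back to the CPU.
layers = [
    ("conv1", "Convolution"),
    ("relu1", "ReLU"),
    ("up1", "Upsample"),
    ("conv2", "Convolution"),
    ("detect", "Detection"),
]
segments = segment_network(layers)
for dev, names in segments:
    print(dev, names)
```

Under these assumptions the toy network splits into four segments, alternating between NVDLA and CPU. A real network with branches would need a graph-based partition instead of a linear scan.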
