EPA: The effective pipeline architecture for CNN accelerator with high performance and computing efficiency based on FPGA

Junjie Zhang, Qiao Yin, Weicheng Hu, Yunfeng Li, Hu Li, Nan Ye, Bingyao Cao*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Thanks to the rapid development of recent Field Programmable Gate Arrays (FPGAs), the performance bottleneck of deep learning hardware accelerators has shifted to computing capability. In this paper, a novel FPGA-based Convolutional Neural Network (CNN) accelerator architecture, named the Effective Pipeline Architecture (EPA), is proposed to optimize resource usage when implementing CNN computation. Because unique storage strategies, which include many novel design details, are adopted and tuned for different CNN models and layers, high DSP computing efficiency is achieved in a fine-grained pipeline. Moreover, through kernel combination and data scheduling, the throughput of general matrix multiplication is doubled relative to traditional architectures across a large number of parallel DSP48E resources. As a result, the YOLOv2-Tiny implementation achieves 873 Giga Operations Per Second (GOPS) using 902 DSPs at 67 Frames Per Second (FPS), and the computing efficiency of most layers exceeds 90%. This improves calculation performance and efficiency compared with previous designs, which is significant for meeting increasing computing demands.
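The abstract does not say how kernel combination doubles GEMM throughput on DSP48E slices. One widely used technique, shown here only as an assumed illustration and not as EPA's confirmed method, packs two signed 8-bit operands that share a common multiplier into one wide DSP multiplication (the DSP48E2 has a 27x18 multiplier). The sketch below emulates that packing in Python, then computes DSP computing efficiency as achieved GOPS divided by theoretical peak. The names `packed_dual_multiply`, `dsp_efficiency`, the 18-bit field offset, and the assumption of 4 operations (two MACs) per DSP per cycle are all hypothetical; the abstract also does not state the clock frequency, so any clock passed in is a placeholder.

```python
# Hypothetical sketch, not the EPA paper's exact design: two signed 8-bit
# products that share one operand are obtained from a single wide multiply,
# as in common INT8 packing on a DSP48E2-style 27x18 multiplier.

SHIFT = 18  # assumed bit offset of the upper operand in the packed port


def packed_dual_multiply(a: int, b: int, w: int) -> tuple[int, int]:
    """Return (a*w, b*w) using one multiplication of a packed operand.

    a, b and w are signed 8-bit integers in [-128, 127].
    """
    for v in (a, b, w):
        if not -128 <= v <= 127:
            raise ValueError("operands must be signed 8-bit values")
    packed = (a << SHIFT) + b        # a in the upper field, b in the lower
    product = packed * w             # the single "DSP" multiplication
    low = product & ((1 << SHIFT) - 1)  # lower 18 bits, two's complement
    if low >= 1 << (SHIFT - 1):      # sign-extend the lower field
        low -= 1 << SHIFT
    # Subtracting the signed lower field removes its borrow from the upper
    # field, leaving an exact multiple of 2**SHIFT equal to a*w << SHIFT.
    high = (product - low) >> SHIFT
    return high, low


def dsp_efficiency(gops: float, num_dsp: int, clock_mhz: float,
                   ops_per_dsp_cycle: int = 4) -> float:
    """Achieved throughput divided by theoretical peak throughput.

    ops_per_dsp_cycle=4 assumes two packed MACs (2 ops each) per DSP
    per clock cycle; this is an assumption, not a figure from the paper.
    """
    peak_gops = num_dsp * ops_per_dsp_cycle * clock_mhz / 1000.0
    return gops / peak_gops


if __name__ == "__main__":
    print(packed_dual_multiply(3, -5, 7))        # two products, one multiply
    # 250 MHz is a placeholder clock; the abstract does not state one.
    print(round(dsp_efficiency(873, 902, clock_mhz=250), 3))
```

Under these assumptions, a 902-DSP design at a hypothetical 250 MHz would have a peak of 902 GOPS, so 873 GOPS would correspond to roughly 97% efficiency. The real figure depends on the paper's actual clock and packing scheme. Packing is only safe when each packed product fits in its field: b*w lies in [-16256, 16384], which fits in a signed 18-bit field.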

Original language: English
Article number: e6198
Journal: Concurrency and Computation: Practice and Experience
Volume: 35
Issue number: 18
Publication status: Published - 15 Aug 2023
Externally published: Yes

Keywords

  • CNN accelerator
  • FPGA
  • computing efficiency
  • fine-grained pipeline
  • high performance
  • pipeline architecture
