EPA: The effective pipeline architecture for CNN accelerator with high performance and computing efficiency based on FPGA

Junjie Zhang; Qiao Yin; Weicheng Hu; Yunfeng Li; Hu Li; Nan Ye; Bingyao Cao

doi:10.1002/cpe.6198

Junjie Zhang, Qiao Yin, Weicheng Hu, Yunfeng Li, Hu Li, Nan Ye, Bingyao Cao*

*Corresponding author for this work

Shanghai University

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

Thanks to advances in recent Field Programmable Gate Arrays (FPGAs), the performance bottleneck of deep learning hardware accelerators has shifted to computing capability. In this paper, a novel FPGA-based Convolutional Neural Network (CNN) accelerator architecture, named the Effective Pipeline Architecture (EPA), is proposed to optimize resource usage in the implementation of CNN computation. Unique storage strategies, with many detailed design choices, are adopted and optimized for different CNN models and layers, so that high DSP computing efficiency is achieved in the fine-grained pipeline. Moreover, compared with traditional architectures, kernel combination and data scheduling double the throughput of general matrix multiplication across a large number of parallel DSP48E resources. As a result, the implementation of Yolov2-Tiny achieves 873 Giga Operations Per Second (GOPS) using 902 DSPs at 67 Frames Per Second (FPS), and the computing efficiency in most layers exceeds 90%, which improves on the computing performance and efficiency of previous designs and helps meet growing computing requirements.
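The headline figures in the abstract can be turned into a few derived quantities. The sketch below is only illustrative arithmetic on the three reported numbers (873 GOPS, 902 DSPs, 67 FPS); the function names are hypothetical, and the per-frame figure assumes the 873 GOPS rate is sustained for the whole frame, which the abstract does not state, so it should be read as an upper bound rather than the network's actual operation count.

```python
# Figures quoted in the abstract.
THROUGHPUT_GOPS = 873.0   # reported throughput for Yolov2-Tiny
DSP_COUNT = 902           # DSP slices used by the design
FRAME_RATE_FPS = 67.0     # reported frame rate


def gops_per_dsp(gops: float, dsps: int) -> float:
    """Average throughput contributed by each DSP slice, in GOPS."""
    return gops / dsps


def frame_period_ms(fps: float) -> float:
    """Time available to process one frame, in milliseconds."""
    return 1000.0 / fps


def gop_per_frame_upper_bound(gops: float, fps: float) -> float:
    """Operations per frame (in GOP) IF the reported GOPS were sustained.

    Assumption, not stated in the abstract: the throughput figure holds
    for the entire frame time. Treat the result as an upper bound.
    """
    return gops / fps


if __name__ == "__main__":
    print(f"GOPS per DSP:         {gops_per_dsp(THROUGHPUT_GOPS, DSP_COUNT):.3f}")
    print(f"Frame period (ms):    {frame_period_ms(FRAME_RATE_FPS):.2f}")
    print(f"GOP per frame (max):  "
          f"{gop_per_frame_upper_bound(THROUGHPUT_GOPS, FRAME_RATE_FPS):.2f}")
```

Normalizing throughput by DSP count is a common way to compare FPGA accelerators that use different device sizes, which is why the per-DSP figure is computed separately from the raw GOPS value.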

Original language	English
Article number	e6198
Journal	Concurrency and Computation: Practice and Experience
Volume	35
Issue number	18
DOIs	https://doi.org/10.1002/cpe.6198
Publication status	Published - 15 Aug 2023
Externally published	Yes

Keywords

CNN accelerator
FPGA
computing efficiency
fine-grained pipeline
high performance
pipeline architecture

Cite this

@article{a4d302b436214d6e90c001b1b91c924c,

title = "EPA: The effective pipeline architecture for CNN accelerator with high performance and computing efficiency based on FPGA",

abstract = "Thanks to the great developments of the latest Field Programmable Gate Array (FPGA), the performance bottleneck of Deep Learning hardware accelerators has been converted to computing ability. In this paper, a novel FPGA-based Convolutional Neural Network (CNN) Accelerator architecture, named the Effective Pipeline Architecture (EPA) is proposed to optimize the resource usage for the implementation of the CNN calculation. As the unique storage strategies, which contain many creative designing details, are adopted and optimized for different CNN models and layers, great DSP computing efficiency can be achieved in the fine-grained pipeline. Moreover, compared with the traditional architectures, through the kernel combination and data scheduling, twice throughput for the general matrix multiplication is realized in a great many parallel DSP48E resources. As a result, the realization of Yolov2-Tiny achieves 873 Giga Operations Per Second (GOPS) by 902 DSPs with 67 Frames Per Second (FPS), and the computing efficiency in most layers can even reach more than 90%, which improves the calculation performance and efficiency comparing with the previous designs, and is significant to meet the increasing computing requirement.",

keywords = "CNN accelerator, FPGA, computing efficiency, fine-grained pipeline, high performance, pipeline architecture",

author = "Junjie Zhang and Qiao Yin and Weicheng Hu and Yunfeng Li and Hu Li and Nan Ye and Bingyao Cao",

note = "Publisher Copyright: {\textcopyright} 2021 John Wiley \& Sons, Ltd.",

year = "2023",

month = aug,

day = "15",

doi = "10.1002/cpe.6198",

language = "English",

volume = "35",

journal = "Concurrency and Computation: Practice and Experience",

issn = "1532-0626",

number = "18",

}

TY - JOUR

T1 - EPA

T2 - The effective pipeline architecture for CNN accelerator with high performance and computing efficiency based on FPGA

AU - Zhang, Junjie

AU - Yin, Qiao

AU - Hu, Weicheng

AU - Li, Yunfeng

AU - Li, Hu

AU - Ye, Nan

AU - Cao, Bingyao

PY - 2023/8/15

Y1 - 2023/8/15

AB - Thanks to the great developments of the latest Field Programmable Gate Array (FPGA), the performance bottleneck of Deep Learning hardware accelerators has been converted to computing ability. In this paper, a novel FPGA-based Convolutional Neural Network (CNN) Accelerator architecture, named the Effective Pipeline Architecture (EPA) is proposed to optimize the resource usage for the implementation of the CNN calculation. As the unique storage strategies, which contain many creative designing details, are adopted and optimized for different CNN models and layers, great DSP computing efficiency can be achieved in the fine-grained pipeline. Moreover, compared with the traditional architectures, through the kernel combination and data scheduling, twice throughput for the general matrix multiplication is realized in a great many parallel DSP48E resources. As a result, the realization of Yolov2-Tiny achieves 873 Giga Operations Per Second (GOPS) by 902 DSPs with 67 Frames Per Second (FPS), and the computing efficiency in most layers can even reach more than 90%, which improves the calculation performance and efficiency comparing with the previous designs, and is significant to meet the increasing computing requirement.

KW - CNN accelerator

KW - FPGA

KW - computing efficiency

KW - fine-grained pipeline

KW - high performance

KW - pipeline architecture

UR - http://www.scopus.com/inward/record.url?scp=85103415502&partnerID=8YFLogxK

U2 - 10.1002/cpe.6198

DO - 10.1002/cpe.6198

M3 - Article

AN - SCOPUS:85103415502

SN - 1532-0626

VL - 35

JO - Concurrency and Computation: Practice and Experience

JF - Concurrency and Computation: Practice and Experience

IS - 18

M1 - e6198

ER -
