TY - GEN
T1 - Configurable CNN Accelerator in Speech Processing based on Vector Convolution
AU - Hui, Lanqing
AU - Cao, Shan
AU - Chen, Zhiyong
AU - Li, Shan
AU - Xu, Shugong
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
AB - In speech applications, both the input feature maps (IFMs) and the kernels of neural networks vary greatly in shape and size, which poses significant challenges to hardware acceleration. This paper introduces a configurable CNN accelerator that strikes a balance between flexibility and efficiency across the various neural network models used in speech processing. A vector convolution scheme is first proposed, which rearranges IFM rows and weight values into vectors so that element-wise convolution is converted into vector operations, overcoming the limitations of kernel-centric processing. A vector processing element (VPE) structure is then introduced to accommodate the progressive downscaling of IFMs with little control overhead, and the CNN accelerator architecture is built on it. FPGA implementation results show that the proposed architecture increases throughput by 86% compared with state-of-the-art FPGA accelerators on the VGG16 network, while ensuring high DSP utilization for both 1D and 2D CNNs with various input sizes.
KW - Accelerator
KW - CNN
KW - FPGA implementation
KW - speech processing
UR - http://www.scopus.com/inward/record.url?scp=85139059515&partnerID=8YFLogxK
U2 - 10.1109/AICAS54282.2022.9869904
DO - 10.1109/AICAS54282.2022.9869904
M3 - Conference Proceeding
AN - SCOPUS:85139059515
T3 - Proceeding - IEEE International Conference on Artificial Intelligence Circuits and Systems, AICAS 2022
SP - 146
EP - 149
BT - Proceeding - IEEE International Conference on Artificial Intelligence Circuits and Systems, AICAS 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 4th IEEE International Conference on Artificial Intelligence Circuits and Systems, AICAS 2022
Y2 - 13 June 2022 through 15 June 2022
ER -