Abstract
This article proposes a hardware accelerator for image feature extraction based on the Oriented FAST (features from accelerated segment test) and Rotated BRIEF (binary robust independent elementary features), or ORB, algorithm. The architecture adopts a hybrid workflow: the three scales of an octave are processed in parallel, while multiple octaves are processed serially by time-sharing the DRAM. To support arbitrary image resolutions, a block-wise dataflow is applied to this serial-parallel architecture, and the data overlapping between adjacent blocks is reused. As a result, the on-chip memory is limited to 1.47 Mb, and the DRAM bandwidth is reduced by 33%. In addition, 3 × 3 non-maximum suppression with heap sorting is applied to balance the 2-D distribution of keypoints, which improves the valid keypoint match ratio by 7.09%. Parallel processing in keypoint detection doubles the throughput. Moreover, approximate computing and superscalar processing reduce the timing cost of orientation estimation and descriptor generation by 92.78% and 67%, respectively. Compared with the non-optimized baseline architecture, the proposed architecture reduces the total timing cost by 82.4%. The accelerator is implemented on a Xilinx MPSoC and achieves 108 fps on full-HD images at 200 MHz while consuming 873 mW.
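As an illustrative software model (not the paper's hardware design), the 3 × 3 non-maximum suppression step can be sketched as below. It assumes a 2-D array of FAST corner scores in which 0 marks a non-corner, suppresses any pixel that has a neighbor with an equal or higher score, and then keeps the N strongest survivors with a bounded min-heap. The function names `nms_3x3` and `top_k_keypoints`, the tie rule, and the global top-N selection are all assumptions; the abstract does not specify how the accelerator organizes its heap sorting.

```python
import heapq
from typing import List, Tuple

Keypoint = Tuple[int, int, int]  # (score, y, x); tuples compare by score first


def nms_3x3(scores: List[List[int]]) -> List[Keypoint]:
    """Keep pixels whose score is strictly greater than all 8 neighbours.

    A score of 0 is treated as "not a corner". Equal neighbours suppress
    each other (an assumption; a hardware design may use another tie rule).
    """
    h = len(scores)
    w = len(scores[0]) if h else 0
    survivors: List[Keypoint] = []
    for y in range(h):
        for x in range(w):
            s = scores[y][x]
            if s <= 0:
                continue
            # Compare against every in-bounds neighbour, skipping the centre.
            is_max = all(
                scores[y + dy][x + dx] < s
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
                if (dy or dx) and 0 <= y + dy < h and 0 <= x + dx < w
            )
            if is_max:
                survivors.append((s, y, x))
    return survivors


def top_k_keypoints(candidates: List[Keypoint], k: int) -> List[Keypoint]:
    """Return the k strongest candidates, strongest first, via a size-k min-heap."""
    if k <= 0:
        return []
    heap: List[Keypoint] = []
    for cand in candidates:
        if len(heap) < k:
            heapq.heappush(heap, cand)
        elif cand > heap[0]:
            # Evict the weakest kept keypoint in O(log k).
            heapq.heapreplace(heap, cand)
    return sorted(heap, reverse=True)


if __name__ == "__main__":
    scores = [
        [0, 0, 0, 0, 0],
        [0, 9, 0, 0, 0],
        [0, 5, 0, 7, 0],
        [0, 0, 0, 0, 3],
        [0, 0, 0, 0, 0],
    ]
    print(top_k_keypoints(nms_3x3(scores), 2))  # → [(9, 1, 1), (7, 2, 3)]
```

In this toy example, the scores 5 and 3 are suppressed because each sits next to a stronger corner, so only the two local maxima survive. A heap bounded at N entries needs O(log N) work per candidate rather than a full sort of all survivors, which is one plausible reason to pair NMS with heap-based selection.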
Field | Value |
---|---|
Original language | English |
Article number | 8946871 |
Pages (from-to) | 565-575 |
Number of pages | 11 |
Journal | IEEE Transactions on Very Large Scale Integration (VLSI) Systems |
Volume | 28 |
Issue number | 2 |
DOIs | |
Publication status | Published, Feb 2020 |
Externally published | Yes |
Keywords
- Feature extraction
- hardware accelerator
- Oriented FAST (features from accelerated segment test) and Rotated BRIEF (binary robust independent elementary features) (ORB)