DPNet: Dual-Path Network for Real-Time Object Detection With Lightweight Attention

In IEEE Transactions on Neural Networks and Learning Systems, by Quan Zhou, Huimin Shi, Weikang Xiang, Bin Kang, and Longin Jan Latecki

TLDR

  • The study presents DPNet, a neural network that detects objects in images in real time. Instead of a single processing path, it uses two parallel paths: one extracts high-level semantic features, and the other preserves low-level object details that repeated downsampling would otherwise discard. A lightweight attention scheme helps the network focus on the most informative parts of the image. Its lightweight self-correlation module (LSCM) captures global interactions across the image at little extra cost, and a cross-correlation variant (LCCM) relates features at neighboring scales. On three datasets (MS COCO, Pascal VOC 2007, and ImageNet), DPNet achieved a state-of-the-art trade-off between detection accuracy and efficiency. The findings suggest that lightweight attention schemes can make small, fast detectors more accurate without significantly increasing their cost.

Abstract

Recent advances in compressing high-accuracy convolutional neural networks (CNNs) have driven remarkable progress in real-time object detection. To accelerate detection, lightweight detectors typically have few convolution layers and a single-path backbone. A single-path architecture, however, involves continuous pooling and downsampling operations, which often produce coarse and inaccurate feature maps that make objects hard to localize. In addition, because of their limited capacity, recent lightweight networks are often weak at representing large-scale visual data. To address these problems, we present a dual-path network, named DPNet, with a lightweight attention scheme for real-time object detection. The dual-path architecture extracts high-level semantic features and low-level object details in parallel. Although DPNet nearly duplicates the structure of a single-path detector, its computational cost and model size do not increase significantly. To enhance representation capability, a lightweight self-correlation module (LSCM) is designed to capture global interactions with little computational overhead and few network parameters. In the neck, LSCM is extended into a lightweight cross-correlation module (LCCM) that captures mutual dependencies among neighboring-scale features. We have conducted exhaustive experiments on the MS COCO, Pascal VOC 2007, and ImageNet datasets. The experimental results demonstrate that DPNet achieves a state-of-the-art trade-off between detection accuracy and implementation efficiency. More specifically, DPNet achieves 31.3% AP on MS COCO test-dev, 82.7% mAP on the Pascal VOC 2007 test set, and 41.6% mAP on the ImageNet validation set, together with a model size of nearly 2.5M, 1.04 GFLOPs, and 164 and 196 frames/s (FPS) for [Formula: see text] input images of three datasets.
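The abstract does not spell out how LSCM and LCCM compute their correlations. As a rough illustration only, the sketch below assumes both are variants of standard dot-product (non-local style) attention, made cheaper by projecting the C input channels down to a small C_r before forming the attention map. The names correlation, self_correlation, cross_correlation, and make_params, the reduction to C_r channels, and the residual connection are hypothetical choices, not taken from the paper. Self-correlation draws queries, keys, and values from one feature map; cross-correlation draws keys and values from a neighboring-scale map.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def correlation(query_feat, key_feat, w_q, w_k, w_v, w_out):
    """Reduced-channel dot-product attention (hypothetical LSCM/LCCM stand-in).

    query_feat: (C, H, W) map that supplies the queries.
    key_feat:   (C, H2, W2) map that supplies keys and values.
    w_q, w_k, w_v: (C_r, C) projections down to C_r << C channels.
    w_out:      (C, C_r) projection back up to C channels.
    Returns query_feat plus the attended global context, shape (C, H, W).
    """
    c, h, w = query_feat.shape
    q = w_q @ query_feat.reshape(c, -1)         # (C_r, N_q)
    k = w_k @ key_feat.reshape(c, -1)           # (C_r, N_k)
    v = w_v @ key_feat.reshape(c, -1)           # (C_r, N_k)
    scores = (q.T @ k) / np.sqrt(q.shape[0])    # (N_q, N_k) pairwise similarities
    attn = softmax(scores, axis=-1)             # each query's weights sum to 1
    context = v @ attn.T                        # (C_r, N_q) weighted sum of values
    return query_feat + (w_out @ context).reshape(c, h, w)  # residual add


def self_correlation(x, params):
    """Self-correlation: queries, keys, and values all come from x."""
    return correlation(x, x, *params)


def cross_correlation(x, neighbor, params):
    """Cross-correlation: keys and values come from a neighboring-scale map."""
    return correlation(x, neighbor, *params)


def make_params(c, c_r, rng):
    """Random (w_q, w_k, w_v, w_out); a trained model would learn these."""
    shapes = [(c_r, c), (c_r, c), (c_r, c), (c, c_r)]
    return tuple(0.1 * rng.standard_normal(s) for s in shapes)


# Usage: a 20x20 map attends to itself and to a coarser 10x10 neighbor.
rng = np.random.default_rng(0)
params = make_params(c=64, c_r=8, rng=rng)
fine = rng.standard_normal((64, 20, 20))
coarse = rng.standard_normal((64, 10, 10))
print(self_correlation(fine, params).shape)           # (64, 20, 20)
print(cross_correlation(fine, coarse, params).shape)  # (64, 20, 20)
```

In this sketch the channel reduction lowers the projection cost, but the attention map still has N_q × N_k entries, so cost remains quadratic in spatial size; whatever DPNet does to keep its modules lightweight may differ.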

Overview

  • The study presents DPNet, a dual-path network for real-time object detection with a lightweight attention scheme. The dual-path architecture extracts high-level semantic features and low-level object details in parallel. A lightweight self-correlation module (LSCM) captures global interactions with little computational overhead and few parameters, and its extension in the neck, the lightweight cross-correlation module (LCCM), captures dependencies among neighboring-scale features. The goal is a state-of-the-art trade-off between detection accuracy and implementation efficiency on three datasets: MS COCO, Pascal VOC 2007, and ImageNet. The hypothesis under test is that a dual-path network with lightweight attention can improve accuracy and efficiency over single-path detectors, whose repeated downsampling and limited capacity restrict their performance.
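To make the dual-path idea concrete, here is a toy, NumPy-only sketch of a single hypothetical stage, not DPNet's actual backbone. A detail path keeps the input resolution, a semantic path pools 4x to gather wider context, and the semantic output is upsampled and fused with the detail output through a 1x1 convolution. Real layers would be learned 3x3 convolutions with normalization; the helpers avg_pool2, upsample2, conv1x1, and dual_path_stage and the random weights are illustrative assumptions.

```python
import numpy as np


def avg_pool2(x):
    """2x2 average pooling, stride 2, on a (C, H, W) map with even H and W."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def upsample2(x):
    """Nearest-neighbor 2x upsampling of a (C, H, W) map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def conv1x1(x, weight):
    """1x1 convolution: mixes channels independently at each position."""
    c, h, w = x.shape
    return (weight @ x.reshape(c, -1)).reshape(weight.shape[0], h, w)


def relu(x):
    return np.maximum(x, 0.0)


def dual_path_stage(x, w_detail, w_semantic, w_fuse):
    """Toy dual-path stage (hypothetical, not DPNet's exact design)."""
    # Detail path: stays at full resolution, so spatial detail is not pooled away.
    detail = relu(conv1x1(x, w_detail))
    # Semantic path: 4x downsampling widens context and works on 1/16 the positions.
    semantic = relu(conv1x1(avg_pool2(avg_pool2(x)), w_semantic))
    # Bring semantics back to full resolution so the two paths align.
    semantic = upsample2(upsample2(semantic))
    # Fuse by channel concatenation followed by a 1x1 convolution.
    fused = np.concatenate([detail, semantic], axis=0)
    return relu(conv1x1(fused, w_fuse))


rng = np.random.default_rng(0)
x = rng.standard_normal((16, 32, 32))
out = dual_path_stage(
    x,
    w_detail=0.1 * rng.standard_normal((8, 16)),
    w_semantic=0.1 * rng.standard_normal((8, 16)),
    w_fuse=0.1 * rng.standard_normal((16, 16)),
)
print(out.shape)  # (16, 32, 32)
```

Here the semantic path processes 1/16 as many spatial positions as the detail path, which hints at how a second path can be added without doubling the compute; the actual cost figures for DPNet are the ones reported in the paper.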

Comparative Analysis & Findings

  • The study compares DPNet with single-path lightweight detectors on three datasets. The results show that DPNet achieves a state-of-the-art trade-off between detection accuracy and implementation efficiency: 31.3% AP on MS COCO test-dev, 82.7% mAP on the Pascal VOC 2007 test set, and 41.6% mAP on the ImageNet validation set, with a model size of nearly 2.5M, 1.04 GFLOPs, and 164 and 196 frames/s (FPS) for [Formula: see text] input images of three datasets. These results come without a significant increase in computational cost or model size over a single-path design. The findings support the hypothesis that a dual-path network with lightweight attention improves both accuracy and efficiency compared with capacity-limited single-path detectors.
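The reported throughput can be turned into per-image latency with simple arithmetic. This sketch assumes that FPS was measured one image at a time (so latency is 1000 / FPS milliseconds) and that "2.5M model size" means about 2.5 million parameters stored as 32-bit floats; neither assumption is stated above, and the hardware behind the FPS figures is not given here. The helper names are hypothetical.

```python
def ms_per_frame(fps: float) -> float:
    """Average per-image latency in milliseconds implied by a throughput in FPS."""
    return 1000.0 / fps


def float32_megabytes(num_params: float) -> float:
    """Storage for num_params float32 weights (4 bytes each), in MB (10^6 bytes)."""
    return num_params * 4 / 1e6


for fps in (164, 196):
    print(f"{fps} FPS -> {ms_per_frame(fps):.2f} ms per image")
print(f"2.5M params -> {float32_megabytes(2.5e6):.1f} MB at float32")
# 164 FPS -> 6.10 ms per image
# 196 FPS -> 5.10 ms per image
# 2.5M params -> 10.0 MB at float32
```

Under these assumptions, both reported speeds stay well under the roughly 33 ms per-frame budget of a 30 FPS video stream.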

Implications and Future Directions

  • The findings are relevant to real-time computer vision, where speed and model size are constrained: they show that lightweight attention schemes can improve detection accuracy while preserving efficiency. The study also identifies a key limitation of single-path architectures: their continuous pooling and downsampling produce coarse, inaccurate feature maps that make objects harder to localize. Possible next steps include applying lightweight attention to other computer vision tasks, such as image classification or segmentation, and using transfer learning to improve how well lightweight networks represent large-scale visual data.