Sol Research

GCapNet-FSD: A heterogeneous Graph Capsule Network for Few-Shot object Detection.

Last Updated May 23, 2025 in Neural Networks: the official journal of the International Neural Network Society by Jiaxu Leng, Qianru Chen, Taiyue Chen, Feng Gao, Ji Gan, Changjun Gu, Xinbo Gao

TLDR

A new method is proposed for few-shot object detection that combines internal and external data to improve performance, showing promising results on various benchmarks.

Abstract

Few-shot object detection is a challenging task that aims to quickly adapt detectors to detect novel objects with only a minimal number of annotated examples. Although promising results have been achieved, performance still declines significantly when the number of shots decreases sharply. We argue that this shot sensitivity is due to the critical under-utilization of both internal few-shot data and external common knowledge bases. Therefore, the key insight is how to extract more discriminative notions to compensate for the insufficient task-specific information from the limited novel dataset. We propose a novel heterogeneous Graph Capsule Network for Few-Shot object Detection, named GCapNet-FSD. Specifically, we design a heterogeneous graph to combine the high-level visual capsule neurons from internal few-shot data and the stable semantic embeddings from the external easily available corpus for more discriminative task-specific representations. As a result, our proposed GCapNet-FSD is stable and robust for various settings of the shots. Our design outperforms current works in 1-shot of any split, with up to +3.7% on PASCAL VOC07&12 and +0.4% on challenging COCO benchmark, and extensive experiments on both PASCAL VOC07&12 and MS COCO benchmarks demonstrate that our GCapNet-FSD shows shot-stable detection performance and achieves significantly better performance at lower shots.
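To make the idea of fusing visual capsule vectors with external semantic embeddings more concrete, here is a minimal sketch of one round of message passing over a bipartite graph. It is not the authors' architecture: the function names (`squash`, `fuse`), the dot-product attention, and the additive update are all assumptions chosen for illustration, and the squashing function is the standard one from the capsule network literature rather than anything confirmed by the abstract.

```python
import math

def squash(vec):
    """Capsule-style squashing: keeps direction, maps length into [0, 1)."""
    norm_sq = sum(x * x for x in vec)
    if norm_sq == 0.0:
        return [0.0 for _ in vec]
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in vec]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fuse(visual_nodes, semantic_nodes):
    """One round of visual <- semantic message passing (hypothetical design).

    Each visual node attends over all semantic nodes with dot-product
    attention, adds the weighted semantic message to its own vector,
    and is squashed again so its length stays below 1.
    """
    fused = []
    for v in visual_nodes:
        weights = softmax([dot(v, s) for s in semantic_nodes])
        message = [sum(w * s[i] for w, s in zip(weights, semantic_nodes))
                   for i in range(len(v))]
        fused.append(squash([vi + mi for vi, mi in zip(v, message)]))
    return fused

# Toy inputs (hypothetical): two visual capsules, two class-word embeddings.
visual = [[1.0, 0.0], [0.0, 1.0]]
semantic = [[2.0, 0.0], [0.0, 2.0]]
fused = fuse(visual, semantic)
```

In this toy setup both node types share one embedding dimension; a real system would need learned projections to align visual and semantic spaces, which the sketch leaves out.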

Overview

The study focuses on the few-shot object detection task, which aims to quickly adapt detectors to detect novel objects with minimal annotated examples.
The researchers argue that the sharp performance decline at very low shot counts is caused by the under-utilization of both internal few-shot data and external common knowledge bases.
The primary objective is to extract more discriminative representations to compensate for the insufficient task-specific information in the limited novel dataset.
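The K-shot setting described above can be illustrated with a small sampler that keeps at most K annotated instances per novel class. This is a sketch under stated assumptions: the annotation format (a list of `(image_id, class_name)` pairs), the function name `sample_k_shot`, and the toy data are hypothetical, and the code does not reproduce the splits of any benchmark.

```python
import random
from collections import defaultdict

def sample_k_shot(annotations, novel_classes, k, seed=0):
    """Pick at most k annotated instances per novel class.

    annotations: list of (image_id, class_name) pairs, one per object instance.
    Returns a dict mapping class_name -> list of selected image_ids.
    """
    rng = random.Random(seed)  # fixed seed keeps the support set reproducible
    by_class = defaultdict(list)
    for image_id, class_name in annotations:
        if class_name in novel_classes:
            by_class[class_name].append(image_id)
    support = {}
    for class_name in sorted(by_class):
        pool = by_class[class_name]
        support[class_name] = rng.sample(pool, min(k, len(pool)))
    return support

# Toy annotations (hypothetical): "car" is treated as a base class here.
toy_annotations = [
    ("img1", "bird"), ("img2", "bird"), ("img3", "bird"),
    ("img4", "sofa"), ("img5", "car"),
]
support_set = sample_k_shot(toy_annotations, {"bird", "sofa"}, k=1)
```

With `k=1` the detector would see a single labeled instance per novel class, which is the regime where the abstract reports the largest gains.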

Comparative Analysis & Findings

The proposed GCapNet-FSD outperforms prior work in the 1-shot setting of every split on the PASCAL VOC07&12 and COCO benchmarks.
In the 1-shot setting, the gains reach up to +3.7% on PASCAL VOC07&12 and +0.4% on the more challenging COCO benchmark.
Extensive experiments on both PASCAL VOC07&12 and MS COCO benchmarks demonstrate that GCapNet-FSD shows shot-stable detection performance and achieves better performance at lower shots.

Implications and Future Directions

The study highlights the importance of combining internal few-shot data and external common knowledge bases for few-shot object detection.
Future research could explore whether complementary techniques, such as active learning or transfer learning, improve performance further.
The design could be extended to other computer vision tasks, such as image classification or segmentation, by adapting the proposed heterogeneous graph and capsule network architecture.
