In IEEE Journal of Biomedical and Health Informatics, by Wenjun Lin, Yan Hu, Luoying Hao, Huazhu Fu, Cheekong Chui, Jiang Liu
Instrument-tissue interaction detection, a task aimed at understanding surgical scenes from videos, is of great importance for building computer-assisted surgery systems. Existing methods consist of two stages, instance detection and interaction prediction; this sequential, separate model structure limits both effectiveness and efficiency, making such methods difficult to deploy on surgical robotic platforms. In this paper, we propose an end-to-end Action-Instance Progressive Learning Network (AIPNet) for this task. The model operates in three steps: action detection, instance detection, and action class refinement. Starting from coarse-scale proposals, the model progressively refines them into coarse-grained actions, which then serve as proposals for instance detection. The action predictions are further refined with instance features through late fusion. This progressive learning process improves the performance of the end-to-end model. Additionally, we introduce Dynamic Proposal Generators (DPG) to create adaptive, learnable proposals for each video frame. To address the training challenges of this multi-task model, semantic supervised training is introduced to transfer prior language knowledge, and a training label strategy is proposed to generate unrelated instrument-tissue pair labels for enhanced supervision. Experimental results on the PhacoQ and CholecQ datasets show that the proposed method achieves higher accuracy and faster processing speed than state-of-the-art models.
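To make the three-step pipeline concrete, the following is a minimal PyTorch-style sketch of the flow described above: frame-conditioned proposals from a DPG, a coarse action head, an instance head driven by those action proposals, and late fusion that refines the action classes with instance features. All class names, dimensions, and the specific conditioning scheme are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class DynamicProposalGenerator(nn.Module):
    """Hypothetical DPG: modulates a set of shared learnable queries with the
    current frame's global feature so proposals adapt to each video frame."""
    def __init__(self, num_proposals: int = 20, dim: int = 256):
        super().__init__()
        self.static_queries = nn.Parameter(torch.randn(num_proposals, dim))
        self.condition = nn.Linear(dim, dim)

    def forward(self, frame_feat: torch.Tensor) -> torch.Tensor:  # (B, dim)
        # Broadcast-add the frame-conditioned offset to the shared queries.
        return self.static_queries.unsqueeze(0) + self.condition(frame_feat).unsqueeze(1)


class AIPNetSketch(nn.Module):
    """Illustrative three-step pipeline: (1) coarse action detection,
    (2) instance detection from action proposals, (3) late-fusion refinement."""
    def __init__(self, dim: int = 256, num_actions: int = 10, num_instances: int = 15):
        super().__init__()
        # Placeholder backbone producing one global feature per frame.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, dim, kernel_size=7, stride=4, padding=3),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.dpg = DynamicProposalGenerator(dim=dim)
        self.action_head = nn.Linear(dim, num_actions)        # step 1: coarse actions
        self.instance_head = nn.Linear(dim, num_instances)    # step 2: instrument/tissue instances
        self.fusion = nn.Linear(2 * dim, num_actions)         # step 3: late fusion refinement

    def forward(self, frames: torch.Tensor):                  # frames: (B, 3, H, W)
        feat = self.backbone(frames)                          # (B, dim)
        proposals = self.dpg(feat)                            # (B, P, dim)
        coarse_actions = self.action_head(proposals)          # coarse-grained action classes
        # Action proposals serve as proposals for instance detection.
        instance_feat = proposals                             # placeholder for instance refinement
        instances = self.instance_head(instance_feat)
        # Refine action classes with instance features via late fusion.
        refined_actions = self.fusion(torch.cat([proposals, instance_feat], dim=-1))
        return coarse_actions, instances, refined_actions


# Minimal usage on a random batch of two frames.
model = AIPNetSketch()
coarse, inst, refined = model(torch.randn(2, 3, 224, 224))
```

The sketch only mirrors the data flow of the abstract (action proposals feeding instance detection, then late fusion); the paper's actual proposal refinement, matching, and semantic supervised training are not reproduced here.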