Segment anything model for medical image analysis: An experimental study.

Published in Medical Image Analysis by Maciej A. Mazurowski, Haoyu Dong, Hanxue Gu, Jichen Yang, Nicholas Konz, and Yixin Zhang

TLDR

  • The study evaluates the Segment Anything Model (SAM), a foundation model trained on over 1 billion annotations, predominantly of natural images, that is intended to segment user-defined objects of interest in an interactive manner, on medical image segmentation. Comparing SAM's performance on medical images with that of other interactive methods, the authors find that SAM performs better with box prompts than with point prompts. However, SAM's performance varies depending on the dataset and the task, and it is best for well-circumscribed objects given prompts with little ambiguity. The study concludes that SAM has the potential to make a significant impact in automated medical image segmentation, but that appropriate care needs to be applied when using it.

Abstract

Training segmentation models for medical images continues to be challenging due to the limited availability of data annotations. Segment Anything Model (SAM) is a foundation model trained on over 1 billion annotations, predominantly for natural images, that is intended to segment user-defined objects of interest in an interactive manner. While the model performance on natural images is impressive, medical image domains pose their own set of challenges. Here, we perform an extensive evaluation of SAM's ability to segment medical images on a collection of 19 medical imaging datasets from various modalities and anatomies. In our experiments, we generated point and box prompts for SAM using a standard method that simulates interactive segmentation. We report the following findings: (1) SAM's performance based on single prompts varies highly depending on the dataset and the task, from IoU=0.1135 for spine MRI to IoU=0.8650 for hip X-ray. (2) Segmentation performance appears to be better for well-circumscribed objects with less ambiguous prompts, such as the segmentation of organs in computed tomography, and poorer in various other scenarios, such as the segmentation of brain tumors. (3) SAM performs notably better with box prompts than with point prompts. (4) SAM outperforms the similar methods RITM, SimpleClick, and FocalClick in almost all single-point prompt settings. (5) When multiple point prompts are provided iteratively, SAM's performance generally improves only slightly, while the other methods improve to a level that surpasses SAM's point-based performance. We also provide illustrations of SAM's performance on all tested datasets, of iterative segmentation, and of SAM's behavior given prompt ambiguity. We conclude that SAM shows impressive zero-shot segmentation performance for certain medical imaging datasets, but moderate to poor performance for others. SAM has the potential to make a significant impact in automated medical image segmentation, but appropriate care needs to be applied when using it. Code for evaluating SAM is made publicly available at https://github.com/mazurowski-lab/segment-anything-medical-evaluation.
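
For reference, the IoU values quoted above measure overlap between a predicted binary mask and the ground-truth mask. Below is a minimal NumPy sketch of the metric; the function name and the empty-mask convention are our own illustrative choices, not taken from the paper's code:

```python
import numpy as np

def iou(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """Intersection over union between two binary masks."""
    pred, gt = pred_mask.astype(bool), gt_mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # convention: two empty masks agree perfectly
    return float(np.logical_and(pred, gt).sum() / union)
```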

Overview

  • The study evaluates the Segment Anything Model (SAM) for medical image segmentation on a collection of 19 medical imaging datasets spanning various modalities and anatomies. Point and box prompts for SAM are generated using a standard method that simulates interactive segmentation; a minimal sketch of such prompt generation is given after this list. The primary objective is to assess how well SAM performs on medical images compared with other interactive segmentation methods.
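
As a concrete illustration of the prompting interface, the sketch below derives a single point prompt and a tight bounding-box prompt from a ground-truth mask and passes them to SAM's official SamPredictor API. The checkpoint path, the dummy inputs, and the centroid heuristic are illustrative assumptions; the paper's exact prompt-simulation protocol may differ.

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

def prompts_from_mask(gt_mask: np.ndarray):
    """Derive one foreground point and a tight box from a ground-truth mask."""
    ys, xs = np.nonzero(gt_mask)
    box = np.array([xs.min(), ys.min(), xs.max(), ys.max()])  # XYXY bounding box
    point = np.array([[int(xs.mean()), int(ys.mean())]])      # mask centroid, (x, y)
    return point, box

# Placeholder inputs: a blank RGB "slice" and a square ground-truth mask.
image = np.zeros((256, 256, 3), dtype=np.uint8)
gt_mask = np.zeros((256, 256), dtype=bool)
gt_mask[96:160, 96:160] = True

# Checkpoint path and model type are placeholders; weights come from the SAM repo.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)
predictor.set_image(image)  # expects an HxWx3 uint8 RGB array

point, box = prompts_from_mask(gt_mask)
masks_pt, _, _ = predictor.predict(point_coords=point,
                                   point_labels=np.array([1]),  # 1 = foreground click
                                   multimask_output=False)
masks_box, _, _ = predictor.predict(box=box, multimask_output=False)
```

Note that a centroid can fall outside a non-convex mask; a point chosen via a distance transform, as in the iterative sketch further below, is a safer interior point.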

Comparative Analysis & Findings

  • The study compares SAM's performance on medical images with that of other interactive segmentation methods: RITM, SimpleClick, and FocalClick. SAM outperforms these methods in almost all single-point prompt settings. However, when point prompts are provided iteratively, SAM's performance generally improves only slightly, while the other methods improve to a level that surpasses SAM's point-based performance (a sketch of a typical iterative click-simulation protocol is given after this list). SAM performs notably better with box prompts than with point prompts, and segmentation performance is better for well-circumscribed objects with less ambiguous prompts, such as organs in computed tomography, and poorer in other scenarios, such as brain tumors.
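
A common way to simulate iterative point prompts in the interactive-segmentation literature around RITM and SimpleClick is to place each new click deep inside the dominant remaining error region, labeled positive for missed foreground and negative for spurious foreground. The SciPy-based sketch below is a simplified, hedged illustration of this family of protocols, not necessarily the paper's exact procedure:

```python
import numpy as np
from scipy import ndimage

def next_click(pred_mask: np.ndarray, gt_mask: np.ndarray):
    """Place the next simulated click deep inside the dominant error region.

    Returns ((x, y), label), where label 1 marks a missed-foreground click
    and label 0 a spurious-foreground click; returns None if the masks agree.
    """
    pred, gt = pred_mask.astype(bool), gt_mask.astype(bool)
    fn = gt & ~pred   # false negatives: missed foreground
    fp = ~gt & pred   # false positives: spurious foreground
    err, label = (fn, 1) if fn.sum() >= fp.sum() else (fp, 0)
    if not err.any():
        return None
    # The Euclidean distance transform peaks at the pixel farthest from the
    # error region's boundary, giving a well-interior click location.
    dist = ndimage.distance_transform_edt(err)
    y, x = np.unravel_index(np.argmax(dist), dist.shape)
    return (int(x), int(y)), label
```

With SAM, the accumulated clicks are fed back through point_coords and point_labels on each round, optionally together with the previous step's low-resolution logits via the mask_input argument.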

Implications and Future Directions

  • The study's findings suggest that SAM shows impressive zero-shot segmentation performance for certain medical imaging datasets but moderate to poor performance for others. SAM has the potential to make a significant impact in automated medical image segmentation, but appropriate care needs to be applied when using it. Future research could focus on improving SAM's performance on specific medical imaging datasets or on adapting it to different anatomies and modalities. Additionally, exploring SAM in clinical settings and integrating it with existing workflows could provide valuable insight into its practical applications.