Somalier: rapid relatedness estimation for cancer and germline studies using efficient genome sketches.

in Genome Medicine by Brent S Pedersen, Preetida J Bhetariya, Joe Brown, Stephanie N Kravitz, Gabor Marth, Randy L Jensen, Mary P Bronner, Hunter R Underhill, Aaron R Quinlan

TLDR

  • Somalier is a tool that helps researchers detect whether sequencing samples have been mixed up. It extracts a small sketch of informative genetic variation from each sample's alignments and compares the sketches to estimate how related the samples are. The study shows that Somalier can measure relatedness in large cohorts and can identify pairs of whole-genome and RNA-seq samples that come from the same individuals. This matters most in tumor studies, where often only somatic variants are reported and tools that rely on genotypes of inherited variants are hard to apply.

Abstract

When interpreting sequencing data from multiple spatial or longitudinal biopsies, detecting sample mix-ups is essential, yet more difficult than in studies of germline variation. In most genomic studies of tumors, genetic variation is detected through pairwise comparisons of the tumor and a matched normal tissue from the sample donor. In many cases, only somatic variants are reported, which hinders the use of existing tools that detect sample swaps solely based on genotypes of inherited variants. To address this problem, we have developed Somalier, a tool that operates directly on alignments and does not require jointly called germline variants. Instead, Somalier extracts a small sketch of informative genetic variation for each sample. Sketches from hundreds of germline or somatic samples can then be compared in under a second, making Somalier a useful tool for measuring relatedness in large cohorts. Somalier produces both text output and an interactive visual report that facilitates the detection and correction of sample swaps using multiple relatedness metrics. We introduce the tool and demonstrate its utility on a cohort of five glioma samples each with a normal, tumor, and cell-free DNA sample. Applying Somalier to high-coverage sequence data from the 1000 Genomes Project also identifies several related samples. We also demonstrate that it can distinguish pairs of whole-genome and RNA-seq samples from the same individuals in the Genotype-Tissue Expression (GTEx) project. Somalier is a tool that can rapidly evaluate relatedness from sequencing data. It can be applied to diverse sequencing data types and genome builds and is available under an MIT license at github.com/brentp/somalier .

Overview

  • The study introduces Somalier, a tool for detecting sample mix-ups in sequencing data, including data from multiple spatial or longitudinal biopsies of the same patient. Rather than relying on jointly called germline variants, Somalier operates directly on alignments and extracts a small sketch of informative genetic variation for each sample; sketches from hundreds of samples can then be compared in under a second. The authors evaluate the tool on three datasets: a cohort of five glioma samples, each with a normal, tumor, and cell-free DNA sample; high-coverage sequence data from the 1000 Genomes Project; and whole-genome and RNA-seq samples from the Genotype-Tissue Expression (GTEx) project. Somalier produces both text output and an interactive visual report that supports detecting and correcting sample swaps using multiple relatedness metrics. The primary objective is to show that Somalier can rapidly estimate relatedness across cancer and germline sequencing data.
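
The sketch-and-compare idea can be illustrated with a short Python example. This is a simplified sketch, not Somalier's implementation: the allele-balance cutoffs, the minimum depth, the function names, and the relatedness formula used here are all assumptions chosen for illustration. Each sample is reduced to a list of genotype calls at a fixed set of sites, and two samples are then compared by counting shared heterozygous sites and sites with opposite homozygous calls (IBS0).

```python
# Illustrative only: cutoffs and formula are assumptions, not Somalier's values.
MIN_DEPTH = 7              # hypothetical minimum read depth to call a site
HOM_REF_MAX = 0.1          # alt fraction at or below this: homozygous reference
HET_MIN, HET_MAX = 0.3, 0.7  # alt fraction in this range: heterozygous
HOM_ALT_MIN = 0.9          # alt fraction at or above this: homozygous alternate


def call_genotype(ref_count, alt_count):
    """Return 0, 1, or 2 alternate alleles, or None if the site is unclear."""
    depth = ref_count + alt_count
    if depth < MIN_DEPTH:
        return None
    alt_fraction = alt_count / depth
    if alt_fraction <= HOM_REF_MAX:
        return 0
    if HET_MIN <= alt_fraction <= HET_MAX:
        return 1
    if alt_fraction >= HOM_ALT_MIN:
        return 2
    return None


def make_sketch(allele_counts):
    """Reduce per-site (ref, alt) read counts to a list of genotype calls."""
    return [call_genotype(ref, alt) for ref, alt in allele_counts]


def compare(sketch_a, sketch_b):
    """Compare two sketches built over the same ordered list of sites."""
    n = ibs0 = ibs2 = shared_hets = hets_a = hets_b = 0
    for ga, gb in zip(sketch_a, sketch_b):
        if ga is None or gb is None:
            continue  # skip sites not confidently called in both samples
        n += 1
        hets_a += int(ga == 1)
        hets_b += int(gb == 1)
        if ga == gb:
            ibs2 += 1
            shared_hets += int(ga == 1)
        elif abs(ga - gb) == 2:
            ibs0 += 1  # opposite homozygotes: no allele shared
    denom = min(hets_a, hets_b)
    # Assumed formulation: near 1 for the same individual, near 0 if unrelated.
    relatedness = (shared_hets - 2 * ibs0) / denom if denom else float("nan")
    return {"n": n, "ibs0": ibs0, "ibs2": ibs2,
            "shared_hets": shared_hets, "relatedness": relatedness}
```

Because each sketch is only a short list of calls at fixed sites, comparing every pair of samples stays cheap even for hundreds of samples, which is the property the abstract emphasizes.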

Comparative Analysis & Findings

  • Rather than comparing experimental interventions, the study applies Somalier to three datasets. On the glioma cohort (five samples, each with a normal, tumor, and cell-free DNA sample), the authors demonstrate the tool's utility for tumor studies. On high-coverage sequence data from the 1000 Genomes Project, Somalier identifies several related samples. On GTEx, it distinguishes pairs of whole-genome and RNA-seq samples that come from the same individuals. Because sketches from hundreds of samples can be compared in under a second, the approach is practical for measuring relatedness in large cohorts.
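
Once pairwise relatedness values are available, candidate sample swaps can be flagged by checking them against the donor labels in the sample sheet. The following is a minimal sketch under stated assumptions: the function name, the input layout, and the 0.8 threshold are hypothetical and are not taken from Somalier's report.

```python
from itertools import combinations

SAME_DONOR_MIN = 0.8  # hypothetical cutoff for "same individual"


def flag_discordant_pairs(donor_of, relatedness, same_min=SAME_DONOR_MIN):
    """Flag sample pairs whose relatedness disagrees with their donor labels.

    donor_of:    dict mapping sample name -> labeled donor ID
    relatedness: dict mapping (sample_a, sample_b) -> relatedness value
    """
    flagged = []
    for a, b in combinations(sorted(donor_of), 2):
        # Accept the pair stored in either order; skip pairs with no value.
        value = relatedness.get((a, b), relatedness.get((b, a)))
        if value is None:
            continue
        same_label = donor_of[a] == donor_of[b]
        if same_label and value < same_min:
            flagged.append((a, b, "labeled same donor, low relatedness"))
        elif not same_label and value >= same_min:
            flagged.append((a, b, "labeled different donors, looks identical"))
    return flagged
```

A pair labeled as the same donor with low relatedness, together with a pair labeled as different donors that looks identical, is the typical signature of two swapped samples; related but distinct individuals fall below the cutoff and are not flagged by this simple check.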

Implications and Future Directions

  • The findings suggest that relatedness checks can be run routinely on sequencing cohorts, including tumor studies where only somatic variants are reported and genotype-based swap detection is difficult. Somalier can already be applied to diverse sequencing data types and genome builds. Natural next steps include validation on additional and larger tumor cohorts, since the glioma demonstration covers only five samples, and evaluation of its potential applications in clinical settings.