Fusion transcripts and their genomic breakpoints in polyadenylated and ribosomal RNA-minus RNA sequencing data.

in GigaScience by Youri Hoogstrate, Malgorzata A Komor, René Böttcher, Job van Riet, Harmen J G van de Werken, Stef van Lieshout, Ralf Hoffmann, Evert van den Broek, Anne S Bolijn, Natasja Dits, Daoud Sie, David van der Meer, Floor Pepers, Chris H Bangma, Geert J L H van Leenders, Marcel Smid, Pim J French, John W M Martens, Wilbert van Workum, Peter J van der Spek, Bart Janssen, Eric Caldenhoven, Christian Rausch, Mark de Jong, Andrew P Stubbs, Gerrit A Meijer, Remond J A Fijneman, Guido W Jenster

TLDR

  • The study examined how much RNA-seq data can reveal about the genomic breakpoints that underlie fusion transcripts in cancer. Using the Dr. Disco algorithm on 1,275 RNA-seq samples and comparing with whole-genome sequencing data, the authors found that most genomic breakpoints are not, or only minimally, transcribed, whereas the breakpoints of all 32 TMPRSS2-ERG-positive tumours were detectable at RNA level. They also found fusions to intergenic regions, often spanning multiple cryptic exons that potentially encode neo-antigens. The work shows that non-poly(A)-enriched RNA-seq can be used to identify expressed genomic breakpoints and their transcriptional effects.

Abstract

Fusion genes are typically identified by RNA sequencing (RNA-seq) without elucidating the causal genomic breakpoints. However, non-poly(A)-enriched RNA-seq contains large proportions of intronic reads that also span genomic breakpoints. We have developed an algorithm, Dr. Disco, that searches for fusion transcripts by taking an entire reference genome into account as search space. This includes exons but also introns, intergenic regions, and sequences that do not meet splice junction motifs. Using 1,275 RNA-seq samples, we investigated to what extent genomic breakpoints can be extracted from RNA-seq data and their implications regarding poly(A)-enriched and ribosomal RNA-minus RNA-seq data. Comparison with whole-genome sequencing data revealed that most genomic breakpoints are not, or minimally, transcribed while, in contrast, the genomic breakpoints of all 32 TMPRSS2-ERG-positive tumours were present at RNA level. We also revealed tumours in which the ERG breakpoint was located before ERG, which co-existed with additional deletions and messenger RNA that incorporated intergenic cryptic exons. In breast cancer we identified rearrangement hot spots near CCND1 and in glioma near CDK4 and MDM2 and could directly associate this with increased expression. Furthermore, in all datasets we find fusions to intergenic regions, often spanning multiple cryptic exons that potentially encode neo-antigens. Thus, fusion transcripts other than classical gene-to-gene fusions are prominently present and can be identified using RNA-seq. By using the full potential of non-poly(A)-enriched RNA-seq data, sophisticated analysis can reliably identify expressed genomic breakpoints and their transcriptional effects.

Overview

  • The study investigates to what extent genomic breakpoints can be extracted from RNA-seq data, and what this implies for poly(A)-enriched versus ribosomal RNA-minus RNA-seq data. The method is an algorithm, Dr. Disco, that searches for fusion transcripts using the entire reference genome as search space: exons, introns, intergenic regions, and sequences that do not meet splice junction motifs. It relies on the observation that non-poly(A)-enriched RNA-seq contains large proportions of intronic reads, some of which span genomic breakpoints. The study analyses 1,275 RNA-seq samples and compares the results with whole-genome sequencing data. It also shows that fusion transcripts other than classical gene-to-gene fusions can be identified using RNA-seq.
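To make the idea of a genome-wide search space concrete, the following minimal Python sketch labels a breakpoint position as exonic, intronic, or intergenic against a toy annotation. It is an illustration only and not Dr. Disco's implementation; the `Gene` type, the `classify_position` function, the gene name `GENE_A`, and all coordinates are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Gene(NamedTuple):
    """Hypothetical, simplified gene record (1-based, inclusive coordinates)."""
    name: str
    chrom: str
    start: int
    end: int
    exons: List[Tuple[int, int]]


def classify_position(chrom: str, pos: int, genes: List[Gene]) -> Tuple[str, Optional[str]]:
    """Return (region type, gene name) for a single genomic position.

    A position inside a gene body is 'exonic' if it falls within any exon,
    otherwise 'intronic'. A position outside every gene is 'intergenic'.
    """
    for gene in genes:
        if gene.chrom == chrom and gene.start <= pos <= gene.end:
            for exon_start, exon_end in gene.exons:
                if exon_start <= pos <= exon_end:
                    return ("exonic", gene.name)
            return ("intronic", gene.name)
    return ("intergenic", None)


# Toy annotation with made-up coordinates (assumption, not real gene data).
TOY_GENES = [
    Gene("GENE_A", "chr1", 1000, 5000, [(1000, 1200), (3000, 3300), (4800, 5000)]),
]

if __name__ == "__main__":
    for position in (1100, 2000, 9000):
        print(position, classify_position("chr1", position, TOY_GENES))
```

An exon-only search would consider only the first kind of position. The approach described above also considers the other two.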

Comparative Analysis & Findings

  • Comparison with whole-genome sequencing data shows that most genomic breakpoints are not, or only minimally, transcribed. In contrast, the genomic breakpoints of all 32 TMPRSS2-ERG-positive tumours were present at RNA level. The study also reveals tumours in which the ERG breakpoint was located before ERG, co-existing with additional deletions and messenger RNA that incorporated intergenic cryptic exons. In breast cancer, the study identifies rearrangement hot spots near CCND1, and in glioma near CDK4 and MDM2, and directly associates these with increased expression. Furthermore, in all datasets the study finds fusions to intergenic regions, often spanning multiple cryptic exons that potentially encode neo-antigens. These findings indicate that fusion transcripts other than classical gene-to-gene fusions are prominently present and can be identified using RNA-seq.
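The comparison between whole-genome sequencing and RNA-seq breakpoints can be pictured as a matching problem. The sketch below counts what fraction of WGS breakpoints has an RNA-derived breakpoint nearby. This summary does not describe the matching procedure the authors actually used, so the `Breakpoint` type, the function names, the 10 bp tolerance, and the example coordinates are all assumptions for illustration.

```python
from typing import List, NamedTuple


class Breakpoint(NamedTuple):
    """Hypothetical junction between two genomic positions."""
    chrom_a: str
    pos_a: int
    chrom_b: str
    pos_b: int


def same_junction(x: Breakpoint, y: Breakpoint, tolerance: int) -> bool:
    """True if both ends of x and y agree within `tolerance` bases.

    The two ends of a junction may be reported in either order,
    so y is also tested with its ends swapped.
    """
    def ends_match(p: Breakpoint, q: Breakpoint) -> bool:
        return (
            p.chrom_a == q.chrom_a
            and p.chrom_b == q.chrom_b
            and abs(p.pos_a - q.pos_a) <= tolerance
            and abs(p.pos_b - q.pos_b) <= tolerance
        )

    swapped = Breakpoint(y.chrom_b, y.pos_b, y.chrom_a, y.pos_a)
    return ends_match(x, y) or ends_match(x, swapped)


def fraction_supported(wgs: List[Breakpoint], rna: List[Breakpoint], tolerance: int = 10) -> float:
    """Fraction of WGS breakpoints with at least one matching RNA breakpoint."""
    if not wgs:
        return 0.0
    supported = sum(
        1 for w in wgs if any(same_junction(w, r, tolerance) for r in rna)
    )
    return supported / len(wgs)


# Made-up coordinates (assumption): one of two WGS breakpoints is seen in RNA.
WGS_CALLS = [Breakpoint("chr21", 100, "chr21", 900), Breakpoint("chr1", 50, "chr2", 70)]
RNA_CALLS = [Breakpoint("chr21", 905, "chr21", 97)]

if __name__ == "__main__":
    print(fraction_supported(WGS_CALLS, RNA_CALLS))
```

A low fraction in such a comparison would correspond to the finding that most genomic breakpoints are not, or only minimally, transcribed.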

Implications and Future Directions

  • The study demonstrates that non-poly(A)-enriched RNA-seq data, analysed with an approach that searches the whole genome, can reliably identify expressed genomic breakpoints and their transcriptional effects. This matters because fusion genes are typically identified by RNA-seq without elucidating the causal genomic breakpoints. The study also finds fusions to intergenic regions whose cryptic exons potentially encode neo-antigens. Future research could further explore how RNA-seq-derived breakpoints and non-classical fusion transcripts can inform cancer diagnosis and treatment.