Genome-wide detection of human variants that disrupt intronic branchpoints.

in Proceedings of the National Academy of Sciences of the United States of America by Peng Zhang, Quentin Philippot, Weicheng Ren, Wei-Te Lei, Juan Li, Peter D Stenson, Pere Soler Palacín, Roger Colobran, Bertrand Boisson, Shen-Ying Zhang, Anne Puel, Qiang Pan-Hammarström, Qian Zhang, David N Cooper, Laurent Abel, Jean-Laurent Casanova

TLDR

  • ELI5: The study presents a computer program called BPHunter that looks for rare DNA changes that interfere with splicing, the step in which a cell cuts the non-coding stretches (introns) out of a gene's RNA copy. BPHunter scans a person's DNA for changes that fall on a branchpoint, a single letter inside an intron that the cell's splicing machinery needs to recognize, and flags the ones likely to disrupt splicing. When tested on 48 changes already known to cause disease this way, BPHunter found 40 of them. It also found two new changes, one in a patient with critical COVID-19 pneumonia and one in a lymphoma patient, and both were confirmed in the lab. The work shows that branchpoint changes matter for disease and can be searched for systematically.

Abstract

Pre-messenger RNA splicing is initiated with the recognition of a single-nucleotide intronic branchpoint (BP) within a BP motif by spliceosome elements. Forty-eight rare variants in 43 human genes have been reported to alter splicing and cause disease by disrupting BP. However, until now, no computational approach was available to efficiently detect such variants in massively parallel sequencing data. We established a comprehensive human genome-wide BP database by integrating existing BP data and generating new BP data from RNA sequencing of lariat debranching enzyme DBR1-mutated patients and from machine-learning predictions. We characterized multiple features of BP in major and minor introns and found that BP and BP-2 (two nucleotides upstream of BP) positions exhibit a lower rate of variation in human populations and higher evolutionary conservation than the intronic background, while being comparable to the exonic background. We developed BPHunter as a genome-wide computational approach to systematically and efficiently detect intronic variants that may disrupt BP recognition. BPHunter retrospectively identified 40 of the 48 known pathogenic BP variants, in which we summarized a strategy for prioritizing BP variant candidates. The remaining eight variants all create AG-dinucleotides between the BP and acceptor site, which is the likely reason for missplicing. We demonstrated the practical utility of BPHunter prospectively by using it to identify a novel germline heterozygous BP variant in a patient with critical COVID-19 pneumonia and a novel somatic intronic 59-nucleotide deletion in a lymphoma patient, both of which were validated experimentally. BPHunter is publicly available from https://hgidsoft.rockefeller.edu/BPHunter and https://github.com/casanova-lab/BPHunter.
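The abstract notes that eight known variants create AG-dinucleotides between the BP and the acceptor site. As a rough illustration of what such a check could look like for a single-nucleotide substitution, here is a minimal Python sketch. It is not taken from BPHunter: the function name `creates_new_ag`, its parameters, and the example sequence are all hypothetical, and it assumes the caller supplies the intronic sequence in transcript orientation, from just after the BP through the canonical acceptor AG.

```python
def creates_new_ag(seq: str, offset: int, alt: str) -> bool:
    """Return True if substituting `alt` at `offset` creates a new AG.

    Assumptions (hypothetical, for illustration only):
      - `seq` is in transcript orientation, starts just after the BP,
        and ends with the canonical acceptor AG.
      - `offset` is a 0-based index into `seq`, upstream of that final AG.
    """
    seq = seq.upper()
    alt = alt.upper()
    window_len = len(seq) - 2  # exclude the canonical acceptor AG itself
    if not 0 <= offset < window_len:
        raise ValueError("offset must lie between the BP and the acceptor AG")

    mutated = seq[:offset] + alt + seq[offset + 1:]

    # Only the two dinucleotides that overlap the changed base can be new.
    for i in (offset - 1, offset):
        if i < 0 or i + 2 > window_len:
            continue
        if mutated[i:i + 2] == "AG" and seq[i:i + 2] != "AG":
            return True
    return False


# Made-up sequence: pyrimidine-rich stretch followed by the acceptor AG.
example = "CTTTGCTTTCAG"
print(creates_new_ag(example, 3, "A"))  # T>A gives ...TTAGC... -> True
print(creates_new_ag(example, 5, "A"))  # C>A gives ...TGATT... -> False
```

The sketch covers substitutions only. Insertions and deletions can also create an AG and would need the mutated sequence rebuilt before scanning.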

Overview

  • The study develops a computational approach for detecting rare human variants that disrupt splicing, and thereby cause disease, by altering branchpoint (BP) recognition. The authors first built a comprehensive genome-wide human BP database by combining existing BP data with new BP data derived from RNA sequencing of patients with mutations in the lariat debranching enzyme DBR1 and from machine-learning predictions. On top of this database they developed BPHunter, which systematically and efficiently detects intronic variants that may disrupt BP recognition. BPHunter retrospectively identified 40 of the 48 known pathogenic BP variants, and its practical utility was demonstrated prospectively by the identification of a novel germline heterozygous BP variant and a novel somatic intronic 59-nucleotide deletion, both validated experimentally. Together, these results underline the role of BP variants in splicing-related disease and the value of computational tools for detecting and prioritizing them.
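The core idea of matching variants against a BP database can be sketched in a few lines of Python. This is only an illustration, not BPHunter's actual implementation: the `Branchpoint` class, the `hit_positions` function, and the coordinates are hypothetical, and the sketch assumes 1-based inclusive genomic coordinates with BP-2 defined relative to the direction of transcription.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Branchpoint:
    """One BP record (hypothetical layout, not BPHunter's schema)."""
    chrom: str
    pos: int     # 1-based genomic coordinate of the BP nucleotide
    strand: str  # "+" or "-"

    def bp_minus_2(self) -> int:
        # "Upstream" follows transcription, so on the minus strand
        # BP-2 has the larger genomic coordinate.
        return self.pos - 2 if self.strand == "+" else self.pos + 2


def hit_positions(bp: Branchpoint, chrom: str, start: int, end: int) -> List[str]:
    """List which of BP / BP-2 fall inside the variant interval.

    `start` and `end` are 1-based and inclusive; a single-nucleotide
    variant has start == end, a deletion spans the deleted bases.
    """
    if chrom != bp.chrom:
        return []
    hits = []
    if start <= bp.pos <= end:
        hits.append("BP")
    if start <= bp.bp_minus_2() <= end:
        hits.append("BP-2")
    return hits


bp = Branchpoint("chr1", 1000, "+")            # made-up coordinates
print(hit_positions(bp, "chr1", 998, 998))     # SNV at BP-2 -> ['BP-2']
print(hit_positions(bp, "chr1", 990, 1048))    # 59-nt deletion -> ['BP', 'BP-2']
```

A real tool would index many BP records per chromosome rather than testing one at a time. The linear check here just keeps the strand handling easy to follow.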

Comparative Analysis & Findings

  • The central comparison is between BP positions and the genomic background. BP and BP-2 (two nucleotides upstream of BP) positions show a lower rate of variation in human populations and higher evolutionary conservation than the intronic background, at levels comparable to the exonic background. The analysis draws on a BP database assembled from three sources: existing BP data, RNA sequencing of DBR1-mutated patients, and machine-learning predictions. In retrospective testing, BPHunter identified 40 of the 48 known pathogenic BP variants; in prospective use, it identified a novel germline heterozygous BP variant and a novel somatic intronic 59-nucleotide deletion. These findings indicate that computational approaches can efficiently detect and prioritize rare variants that disrupt splicing by altering BP recognition.

Implications and Future Directions

  • The findings matter for both research and clinical practice: BP variants are a recognized cause of missplicing and disease, and BPHunter makes it practical to detect and prioritize candidate BP-disrupting intronic variants in sequencing data. Its recovery of 40 of the 48 known pathogenic BP variants, together with the two prospectively identified variants, supports its use as a screening step ahead of experimental validation. Future work could integrate additional data sources, such as functional annotations and clinical data, to improve the accuracy and relevance of the BP database and of BPHunter. Computational prediction of the impact of novel variants on splicing could also inform precision medicine strategies.