Sol Research

An interpretable survival model for diffuse large B-cell lymphoma patients using a biologically informed visible neural network.

Last Updated Sep 03, 2024 in Computational and Structural Biotechnology Journal by Jie Tan, Jiancong Xie, Jiarong Huang, Weizhen Deng, Hua Chai, Yuedong Yang

TLDR

This study proposes a new way to predict outcomes for patients with diffuse large B-cell lymphoma (DLBCL), a type of cancer. It applies a visible neural network (VNN), a neural network whose structure is informed by known biology, to patients' genetic alteration data. The resulting model, VNNSurv, outperforms baseline methods at predicting survival and identifies the genes with the greatest impact on outcome, along with the pathways through which they act. The authors also derive a genetic-based prognostic index (GPI) from these genes to help clinicians stratify patient risk and make treatment decisions.

Abstract

Diffuse large B-cell lymphoma (DLBCL) is the most common subtype of non-Hodgkin lymphoma (NHL) and is characterized by high heterogeneity. Assessment of its prognosis and genetic subtyping hold significant clinical implications. However, existing DLBCL prognostic models are mainly based on transcriptomic profiles, while genetic variation detection is more commonly used in clinical practice. In addition, current clustering-based subtyping methods mostly focus on genes with high mutation frequencies, providing insufficient explanations for the heterogeneity of DLBCL. Here, we proposed VNNSurv (https://bio-web1.nscc-gz.cn/app/VNNSurv), a survival model for DLBCL patients based on a biologically informed visible neural network (VNN). VNNSurv achieved an average C-index of 0.72 on the cross-validation set (HMRN cohort, n = 928), outperforming the baseline methods. The remarkable interpretability of VNNSurv facilitated the identification of the most impactful genes and the underlying pathways through which they act on patient outcomes. When only the 30 highest-impact genes were used as genetic input, the overall performance of VNNSurv improved, and a C-index of 0.70 was achieved on the external TCGA cohort (n = 48). Leveraging these high-impact genes, including 16 genes with low (<5 %) alteration frequencies, we devised a genetic-based prognostic index (GPI) for risk stratification and a subtype identification method. We stratified the patient group according to the International Prognostic Index (IPI) into three risk grades with significant prognostic differences. Furthermore, the defined subtypes exhibited greater prognostic consistency than clustering-based methods. Broadly, VNNSurv is a valuable DLBCL survival model. Its high interpretability has significant value for precision medicine, and its framework is scalable to other diseases.
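
The C-index values quoted above measure how well predicted risk ranks patients by survival time. As a minimal sketch of the metric (Harrell's concordance index in plain Python, not the authors' evaluation code), the function below counts the comparable patient pairs in which the patient with the shorter observed survival also received the higher risk score. The function name, the simplified handling of tied times, and the toy data are assumptions made for illustration.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index: share of comparable pairs ranked correctly.

    times:       observed follow-up times
    events:      1 if the event was observed, 0 if the patient was censored
    risk_scores: model output, where higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Simplification: skip tied times. A pair is comparable only when
        # the shorter time ends in an observed event, not in censoring.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (invented for illustration): the third patient is censored.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.6, 0.7, 0.1]
toy_c = concordance_index(toy_times, toy_events, toy_risk)
print(toy_c)  # 4 of the 5 comparable pairs are concordant, so 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported 0.72 and 0.70 in context.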

Overview

The study develops VNNSurv, a survival model for diffuse large B-cell lymphoma (DLBCL) patients built on a biologically informed visible neural network (VNN).
The model takes genetic variation data as input, the data type more commonly collected in clinical practice, rather than the transcriptomic profiles used by most existing DLBCL prognostic models. Performance is evaluated by cross-validation on the HMRN cohort (n = 928) and on an external TCGA cohort (n = 48). Beyond prediction, the study uses the model's interpretability to identify the most impactful genes and the pathways through which they act on patient outcomes, and from these genes it derives a genetic-based prognostic index (GPI) for risk stratification and a subtype identification method. It addresses three questions: whether a VNN can serve as an accurate and interpretable survival model for DLBCL, whether the GPI can stratify patient risk, and whether the resulting subtypes show greater prognostic consistency than those from clustering-based methods.
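
To illustrate what "visible" means in a visible neural network, the sketch below shows one masked layer in which each pathway node receives input only from the genes assigned to it, so every hidden unit has a biological label. This is a generic illustration of the idea and not the VNNSurv architecture: the gene and pathway names, the weights, and the tanh activation are all placeholder assumptions.

```python
import math

# Hypothetical gene-to-pathway membership (placeholder names, not from the paper).
GENES = ["gene_a", "gene_b", "gene_c"]
PATHWAYS = {
    "pathway_1": ["gene_a", "gene_b"],
    "pathway_2": ["gene_b", "gene_c"],
}


def build_mask(genes, pathways):
    """Return mask[p][g] = 1 when gene g belongs to pathway p, else 0."""
    return [[1 if g in members else 0 for g in genes]
            for members in pathways.values()]


def masked_layer(x, weights, mask, bias):
    """One 'visible' layer: each pathway node sees only its member genes."""
    out = []
    for w_row, m_row, b in zip(weights, mask, bias):
        # Weights outside the mask are zeroed, so they never contribute.
        z = b + sum(w * m * xi for w, m, xi in zip(w_row, m_row, x))
        out.append(math.tanh(z))
    return out


mask = build_mask(GENES, PATHWAYS)
weights = [[0.5, -0.2, 0.9], [0.3, 0.4, -0.1]]  # arbitrary illustrative values
patient = [1, 0, 1]  # alterations present in gene_a and gene_c
activations = masked_layer(patient, weights, mask, [0.0, 0.0])
print(dict(zip(PATHWAYS, activations)))
```

Because gene_c is not a member of pathway_1, changing its value leaves the pathway_1 activation unchanged. This structural constraint is what allows a gene's influence to be traced through named pathways to the prediction.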

Comparative Analysis & Findings

The study compares VNNSurv with baseline survival methods and compares its subtype identification method with clustering-based subtyping. VNNSurv outperforms the baselines, achieving an average C-index of 0.72 on the cross-validation set (HMRN cohort, n = 928). When only the 30 highest-impact genes are used as genetic input, the model's overall performance improves, and it reaches a C-index of 0.70 on the external TCGA cohort (n = 48). These high-impact genes, 16 of which have low (<5 %) alteration frequencies, form the basis of the genetic-based prognostic index (GPI) and the subtype identification method. The patient group is stratified, according to the International Prognostic Index (IPI), into three risk grades with significant prognostic differences, and the defined subtypes exhibit greater prognostic consistency than those from clustering-based methods.

Implications and Future Directions

The findings have implications for precision medicine and clinical decision-making in DLBCL. Because the model is interpretable, the most impactful genes and the pathways through which they act on patient outcomes can be examined directly. The genetic-based prognostic index (GPI) and the subtype identification method offer tools for risk stratification that could inform clinical decisions, and the defined subtypes show greater prognostic consistency than those from clustering-based methods. For future work, the framework could be scaled to other diseases. The work could also be extended with more data and more advanced machine learning algorithms to improve model performance.
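
The actual GPI formula is not given here, so the following is only a sketch of how a gene-based index could map a patient to one of three risk grades. It assumes a simple weighted sum over altered genes. The gene names, weights, and cutoffs are hypothetical placeholders, not values from the study.

```python
# Hypothetical per-gene weights (placeholders; the study's GPI may differ entirely).
GENE_WEIGHTS = {"gene_a": 2.0, "gene_b": 1.0, "gene_c": -1.0}

# Hypothetical score cutoffs separating the three risk grades.
LOW_CUTOFF = 1.0
HIGH_CUTOFF = 2.0


def prognostic_index(altered_genes, weights=GENE_WEIGHTS):
    """Sum the weights of a patient's altered genes; unknown genes add 0."""
    return sum(weights.get(gene, 0.0) for gene in altered_genes)


def risk_grade(score, low=LOW_CUTOFF, high=HIGH_CUTOFF):
    """Map a score to one of three grades using the two cutoffs."""
    if score < low:
        return "low"
    if score < high:
        return "intermediate"
    return "high"


example_score = prognostic_index(["gene_a", "gene_c"])  # 2.0 + (-1.0) = 1.0
example_grade = risk_grade(example_score)
print(example_score, example_grade)  # 1.0 intermediate
```

In practice, weights and cutoffs of this kind would be derived from cohort data and validated against survival outcomes rather than chosen by hand.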
