Predicting and interpreting protein and phosphoprotein abundance from pan-cancer and single-cell transcriptomes.
Abstract
Proteins that impact phenotype and disease are often approximated by RNA expression, which is a poor proxy for protein abundance. We developed DeepGxP, a deep-learning model trained on The Cancer Genome Atlas pan-cancer data, to predict protein abundance from transcriptome profiles. DeepGxP outperformed conventional models, achieving a median Pearson correlation of 0.68 (n = 187) and predictive performance of 0.74 and 0.64 for proteins with high (≥0.31) and low (<0.31) self-gene/protein correlation, respectively. We also developed DeepEnrich, an integrated gradients-based interpretation framework that identifies predictor genes and their enriched functions. For example, predictors of cyclin B1 and cyclin E2 are enriched in mitotic chromatid segregation and G2/M transition, respectively. In lung adenocarcinoma, we uncovered distinct EGFR/HER2 phosphorylation patterns in alveolar cells. In breast cancer, p53 protein, but not TP53 mRNA, correlated with survival. DeepGxP also accurately predicted the abundance of single-cell surface proteins, confirming cell identification. Our findings underscore DeepGxP's potential in decoding gene-to-protein relationships for cancer biomarker discovery.
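
The evaluation summarized in the abstract (a per-protein Pearson correlation, reported overall and split at a self-gene/protein correlation of 0.31) might be computed along the following lines. This is a minimal sketch, not the authors' code: the function names `pearson_per_column` and `summarize`, the use of the median within each group, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def pearson_per_column(a, b):
    """Pearson r between matching columns of two (samples x proteins) arrays."""
    a_c = a - a.mean(axis=0)
    b_c = b - b.mean(axis=0)
    num = (a_c * b_c).sum(axis=0)
    den = np.sqrt((a_c ** 2).sum(axis=0) * (b_c ** 2).sum(axis=0))
    return num / den


def summarize(measured, predicted, self_rna, threshold=0.31):
    """Summarize prediction accuracy overall and by self-gene/protein correlation.

    measured  : measured protein abundance (samples x proteins)
    predicted : model-predicted abundance, same shape
    self_rna  : mRNA level of each protein's own gene, same shape
    threshold : cut-off separating high and low self-correlation (0.31 in the abstract)
    """
    pred_r = pearson_per_column(predicted, measured)  # model accuracy per protein
    self_r = pearson_per_column(self_rna, measured)   # mRNA-vs-protein baseline
    high = self_r >= threshold
    return {
        "median_all": float(np.median(pred_r)),
        "median_high": float(np.median(pred_r[high])),
        "median_low": float(np.median(pred_r[~high])),
        "n_high": int(high.sum()),
        "n_low": int((~high).sum()),
    }


# Synthetic illustration only: 500 samples, 6 hypothetical proteins.
rng = np.random.default_rng(0)
self_rna = rng.normal(size=(500, 6))
measured = rng.normal(size=(500, 6))
# The first three proteins track their own mRNA closely; the rest do not.
measured[:, :3] = self_rna[:, :3] + 0.1 * rng.normal(size=(500, 3))
# Stand-in for model output: measured abundance plus noise.
predicted = measured + 0.5 * rng.normal(size=(500, 6))

result = summarize(measured, predicted, self_rna)
print(result)
```

With real data, `predicted` would come from the trained model and `self_rna` from the transcriptome column of each protein's own gene; both groups need at least one protein for the group medians to be defined.
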
| Authors: | Tsai HM, Hsiao TH, Chiu YC, Huang Y, Chuang EY, Chen Y |
|---|---|
| Journal: | iScience. 2026 Mar 20;29(3):114815. doi:10.1016/j.isci.2026.114815 |
| Year: | 2026 |
| PubMed: | PMID: 41816284 |