Truncated Robust Principal Component Analysis and Noise Reduction for Single Cell RNA Sequencing Data.
Abstract
The development of single cell RNA sequencing (scRNA-seq) has enabled innovative approaches to investigating mRNA abundances. In our study, we are interested in extracting the systematic patterns of scRNA-seq data in an unsupervised manner; to this end, we have developed two extensions of robust principal component analysis (RPCA). First, we present a truncated version of RPCA (tRPCA), which is much faster and more memory efficient than RPCA. Second, we introduce noise reduction into tRPCA through L2 regularization. Unlike RPCA, which only considers a low-rank matrix L and a sparse matrix S, the proposed method can also extract a noise matrix E inherent in modern genomic data. We demonstrate its usefulness by applying our methods to peripheral blood mononuclear cell scRNA-seq data. In particular, clustering the low-rank matrix L yields better classification of unlabeled single cells. Overall, the proposed variants are well suited for the high-dimensional and noisy data routinely generated in genomics.
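To make the L + S + E split concrete, here is a minimal sketch of one generic way to decompose a matrix into low-rank, sparse, and noise parts. It is not the authors' tRPCA algorithm. It assumes a fixed target rank `k` and a sparsity weight `lam`, and it minimizes 0.5 * ||M - L - S||_F^2 + lam * ||S||_1 subject to rank(L) <= k. Here the squared Frobenius (L2) term penalizes the dense noise E = M - L - S. The method alternates a truncated SVD step for L with elementwise soft-thresholding for S. All function names and parameters (`soft_threshold`, `truncated_low_rank`, `low_rank_sparse_noise`, `k`, `lam`, `n_iter`, `tol`) are hypothetical and chosen for illustration.

```python
import numpy as np


def soft_threshold(x, tau):
    """Elementwise shrinkage: the proximal operator of tau * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def truncated_low_rank(x, k):
    """Best rank-k approximation of x in Frobenius norm, from the top-k singular triplets."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u[:, :k] * s[:k]) @ vt[:k, :]


def low_rank_sparse_noise(m, k, lam, n_iter=200, tol=1e-8):
    """Hypothetical helper (not the paper's method): split m into L + S + E.

    Block coordinate descent on
        0.5 * ||m - L - S||_F^2 + lam * ||S||_1   subject to rank(L) <= k,
    where the squared Frobenius (L2) term penalizes the dense noise E = m - L - S.
    Each step solves its block exactly, so the objective never increases.
    """
    m = np.asarray(m, dtype=float)
    l_mat = np.zeros_like(m)
    s_mat = np.zeros_like(m)
    for _ in range(n_iter):
        l_new = truncated_low_rank(m - s_mat, k)  # exact minimizer over L given S
        s_new = soft_threshold(m - l_new, lam)    # exact minimizer over S given L
        change = np.linalg.norm(l_new - l_mat) + np.linalg.norm(s_new - s_mat)
        l_mat, s_mat = l_new, s_new
        if change <= tol * max(1.0, np.linalg.norm(m)):
            break
    e_mat = m - l_mat - s_mat  # leftover dense noise
    return l_mat, s_mat, e_mat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    low = rng.normal(size=(60, 2)) @ rng.normal(size=(2, 40))  # synthetic rank-2 signal
    spikes = np.where(rng.random((60, 40)) < 0.02, 10.0, 0.0)  # synthetic sparse outliers
    noise = 0.01 * rng.normal(size=(60, 40))                    # synthetic dense noise
    L, S, E = low_rank_sparse_noise(low + spikes + noise, k=2, lam=0.5)
    print("rank(L):", np.linalg.matrix_rank(L))
    print("relative error of L:", np.linalg.norm(L - low) / np.linalg.norm(low))
    print("nonzeros in S:", np.count_nonzero(S))
```

The demo computes a full SVD with `np.linalg.svd` and keeps only the top `k` components. For large expression matrices, a partial SVD routine such as `scipy.sparse.linalg.svds` would be the more memory-friendly choice. In the spirit of the abstract, the recovered L could then be passed to any clustering method.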
Authors: | Gogolewski K, Sykulski M, Chung NC, Gambin A. |
---|---|
Journal: | J Comput Biol. 2019 Aug;26(8):782-793 |
Year: | 2019 |