TY - JOUR
T1 - Uncover context-specific gene regulation by transcription factors and microRNAs using Bayesian sparse nonnegative factor regression
AU - Meng, Jia
AU - Chen, Yidong
AU - Huang, Yufei
N1 - Funding Information:
This project was supported by an NSF Grant (CCF-0546345) to Yufei Huang and a Qatar National Research Fund (09-874-3-235) to Yufei Huang and Yidong Chen.
PY - 2012/12
Y1 - 2012/12
N2 - In multicellular organisms, transcription factors (TFs) and microRNAs (miRNAs) are the two largest families of molecules that modulate messenger RNA (mRNA) expression through transcriptional and post-transcriptional regulation. While mRNA and microRNA expression can be measured by microarray techniques, the activities of transcription factors, as manifested by their protein expression, are still difficult to observe, which makes it a complex problem to reconstruct a collaborative gene regulatory network (GRN) of TFs and miRNAs from expression data. In this paper, a novel Bayesian sparse non-negative factor regression (BSNFR) model is proposed for modeling the joint regulation of mRNAs by TFs and miRNAs and for integrating multiple data types, including gene expression, microRNA expression, TF target genes, and microRNA targets. Powered by a Gibbs sampling solution, BSNFR can infer both the TF/microRNA-mediated mRNA regulations and the unknown TF activities. Additionally, since BSNFR directly models the non-negative activities of TFs, it avoids the sign ambiguity common to factor models and can also accurately predict the type (up or down) of each regulation. BSNFR also includes a nonparametric Bayesian model for the latent factor activities, which enables the discovery of clustering effects among samples due to (disease) subtypes. The proposed BSNFR model and the developed Gibbs sampling solution were validated on simulated systems and applied to real data from glioblastoma multiforme (GBM) patients in The Cancer Genome Atlas (TCGA). A GBM-specific gene regulatory network of TFs and miRNAs was reconstructed. This GBM network includes 107 regulations recorded in existing databases and 16 new regulations. Functional analysis suggests that the regulated genes are enriched in cell cycle and P53 pathways.
In addition, BSNFR identified 3 clusters among GBM patient samples, two of which demonstrate a significant survival difference (p=0.004). Finally, the estimated TF activities imply that EGR-1 is significantly correlated with patient survival (p=0.004) and may be used as a prognostic biomarker. The data and MATLAB code are available at: .
AB - In multicellular organisms, transcription factors (TFs) and microRNAs (miRNAs) are the two largest families of molecules that modulate messenger RNA (mRNA) expression through transcriptional and post-transcriptional regulation. While mRNA and microRNA expression can be measured by microarray techniques, the activities of transcription factors, as manifested by their protein expression, are still difficult to observe, which makes it a complex problem to reconstruct a collaborative gene regulatory network (GRN) of TFs and miRNAs from expression data. In this paper, a novel Bayesian sparse non-negative factor regression (BSNFR) model is proposed for modeling the joint regulation of mRNAs by TFs and miRNAs and for integrating multiple data types, including gene expression, microRNA expression, TF target genes, and microRNA targets. Powered by a Gibbs sampling solution, BSNFR can infer both the TF/microRNA-mediated mRNA regulations and the unknown TF activities. Additionally, since BSNFR directly models the non-negative activities of TFs, it avoids the sign ambiguity common to factor models and can also accurately predict the type (up or down) of each regulation. BSNFR also includes a nonparametric Bayesian model for the latent factor activities, which enables the discovery of clustering effects among samples due to (disease) subtypes. The proposed BSNFR model and the developed Gibbs sampling solution were validated on simulated systems and applied to real data from glioblastoma multiforme (GBM) patients in The Cancer Genome Atlas (TCGA). A GBM-specific gene regulatory network of TFs and miRNAs was reconstructed. This GBM network includes 107 regulations recorded in existing databases and 16 new regulations. Functional analysis suggests that the regulated genes are enriched in cell cycle and P53 pathways.
In addition, BSNFR identified 3 clusters among GBM patient samples, two of which demonstrate a significant survival difference (p=0.004). Finally, the estimated TF activities imply that EGR-1 is significantly correlated with patient survival (p=0.004) and may be used as a prognostic biomarker. The data and MATLAB code are available at: .
KW - Nonnegative factorization
KW - microRNA
KW - factor regression model
KW - gene network
KW - transcription factor
UR - http://www.scopus.com/inward/record.url?scp=84873598295&partnerID=8YFLogxK
U2 - 10.1142/S0218339012400037
DO - 10.1142/S0218339012400037
M3 - Article
AN - SCOPUS:84873598295
SN - 0218-3390
VL - 20
SP - 377
EP - 402
JO - Journal of Biological Systems
JF - Journal of Biological Systems
IS - 4
ER -