TY - JOUR
T1 - Enrichment constrained time-dependent clustering analysis for finding meaningful temporal transcription modules
AU - Meng, Jia
AU - Gao, Shou Jiang
AU - Huang, Yufei
N1 - Funding Information:
Funding: NSF (grant CCF-0546345 to Y.H.); National Institutes of Health (grants CA096512 and CA124332 to S.-J.G.).
PY - 2009
Y1 - 2009
N2 - Motivation: Clustering is a popular data exploration technique widely used in microarray data analysis. When dealing with time-series data, most conventional clustering algorithms, however, either use one-way clustering methods, which fail to consider the heterogeneity of temporary domain, or use two-way clustering methods that do not take into account the time dependency between samples, thus producing less informative results. Furthermore, enrichment analysis is often performed independent of and after clustering and such practice, though capable of revealing biological significant clusters, cannot guide the clustering to produce biologically significant result. Result: We present a new enrichment constrained framework (ECF) coupled with a time-dependent iterative signature algorithm (TDISA), which, by applying a sliding time window to incorporate the time dependency of samples and imposing an enrichment constraint to parameters of clustering, allows supervised identification of temporal transcription modules (TTMs) that are biologically meaningful. Rigorous mathematical definitions of TTM as well as the enrichment constraint framework are also provided that serve as objective functions for retrieving biologically significant modules. We applied the enrichment constrained time-dependent iterative signature algorithm (ECTDISA) to human gene expression time-series data of Kaposi's sarcoma-associated herpesvirus (KSHV) infection of human primary endothelial cells; the result not only confirms known biological facts, but also reveals new insight into the molecular mechanism of KSHV infection.
AB - Motivation: Clustering is a popular data exploration technique widely used in microarray data analysis. When dealing with time-series data, most conventional clustering algorithms, however, either use one-way clustering methods, which fail to consider the heterogeneity of temporary domain, or use two-way clustering methods that do not take into account the time dependency between samples, thus producing less informative results. Furthermore, enrichment analysis is often performed independent of and after clustering and such practice, though capable of revealing biological significant clusters, cannot guide the clustering to produce biologically significant result. Result: We present a new enrichment constrained framework (ECF) coupled with a time-dependent iterative signature algorithm (TDISA), which, by applying a sliding time window to incorporate the time dependency of samples and imposing an enrichment constraint to parameters of clustering, allows supervised identification of temporal transcription modules (TTMs) that are biologically meaningful. Rigorous mathematical definitions of TTM as well as the enrichment constraint framework are also provided that serve as objective functions for retrieving biologically significant modules. We applied the enrichment constrained time-dependent iterative signature algorithm (ECTDISA) to human gene expression time-series data of Kaposi's sarcoma-associated herpesvirus (KSHV) infection of human primary endothelial cells; the result not only confirms known biological facts, but also reveals new insight into the molecular mechanism of KSHV infection.
UR - http://www.scopus.com/inward/record.url?scp=66349129690&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btp235
DO - 10.1093/bioinformatics/btp235
M3 - Article
C2 - 19351618
AN - SCOPUS:66349129690
SN - 1367-4803
VL - 25
SP - 1521
EP - 1527
JO - Bioinformatics
JF - Bioinformatics
IS - 12
ER -