TY - GEN
T1 - Multi-resolution modelling of topic relationships in semantic space
AU - Wang, Wei
AU - Bargiela, Andrzej
PY - 2009
Y1 - 2009
AB - Recent techniques for document modelling provide means for transforming document representations from a high-dimensional word space to a low-dimensional semantic space. The coarse-resolution representation is often regarded as capturing the intrinsic semantic structure of the original documents. Probabilistic topic models for document modelling attempt to search for richer representations of the structure of linguistic stimuli and, as such, support the process of human cognition. The topics inferred by probabilistic topic models (latent topics) are represented as probability distributions over words. Although they are interpretable, the interpretation is not sufficiently straightforward for human understanding. Also, and perhaps more importantly, the relationships between the topics are difficult, if not impossible, to interpret. Instead of operating directly on the latent topics, we extract labelled topics from a document collection and represent them using fictitious documents. Having trained the probabilistic topic models, we propose a method for deriving relationships (more general or more specific) between the extracted topics in the semantic space. To ensure reasonable modelling accuracy in a given semantic space, we conducted experiments with various dimensionalities of the semantic space to identify optimal parameter settings in this context. Evaluation and comparison show that our method outperforms existing methods for learning concept or topic relationships on the same dataset.
KW - Document modelling
KW - Latent semantic allocation
KW - Probabilistic topic models
KW - Topic hierarchy
UR - http://www.scopus.com/inward/record.url?scp=84863242234&partnerID=8YFLogxK
M3 - Conference Proceeding
AN - SCOPUS:84863242234
SN - 0955301882
SN - 9780955301889
T3 - Proceedings - 23rd European Conference on Modelling and Simulation, ECMS 2009
SP - 813
EP - 819
BT - Proceedings - 23rd European Conference on Modelling and Simulation, ECMS 2009
T2 - 23rd European Conference on Modelling and Simulation, ECMS 2009
Y2 - 9 June 2009 through 12 June 2009
ER -