| |  | Mothe, Josiane | Combining mining and visualization tools to discover the geographic structure of a domain. Abstract: Science monitoring is a core issue in the new world of business and research. Companies and institutes need to monitor the activities of their competitors, get information on the market, changing technologies or government policies. This paper presents the Tétralogie platform that is aimed at allowing a user to interactively discover trends in scientific research and communities from large textual collections that include information about geographical location. Tétralogie consists of several agents that communicate with each other on users’ demands in order to deliver results to them. Metadata and document content are extracted before being mined. Results are displayed in the form of histograms, networks and geographical maps; these complementary types of presentations increase the possibilities of analysis compared to the use of these tools separately. We illustrate the overall process through a case study of scientific literature analysis and show how the different agents can be combined to discover the structure of a domain. The system correctly predicts the country contribution to a field in future years and allows exploration of the relationships between countries. | 2006 |
| |  | Griffiths, Thomas L. | Finding scientific topics. Abstract: A first step in identifying the content of a document is determining which topics that document addresses. We describe a generative model for documents, introduced by Blei, Ng, and Jordan [Blei, D. M., Ng, A. Y. & Jordan, M. I. (2003) J. Machine Learn. Res. 3, 993-1022], in which each document is generated by choosing a distribution over topics and then choosing each word in the document from a topic selected according to this distribution. We then present a Markov chain Monte Carlo algorithm for inference in this model. We use this algorithm to analyze abstracts from PNAS by using Bayesian model selection to establish the number of topics. We show that the extracted topics capture meaningful structure in the data, consistent with the class designations provided by the authors of the articles, and outline further applications of this analysis, including identifying hot topics by examining temporal dynamics and tagging abstracts to illustrate semantic content. | 2004 |
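
The Griffiths entry describes Markov chain Monte Carlo inference for the Blei, Ng, and Jordan topic model. As a rough illustration only, the sketch below is a minimal collapsed Gibbs sampler in plain Python. It assumes documents are already tokenized into integer word ids; the function name `lda_gibbs`, the symmetric hyperparameter defaults `alpha` and `beta`, and the fixed iteration count are hypothetical choices for the example, not values taken from the paper, and the Bayesian model selection used to pick the number of topics is not shown.

```python
import random

def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Hypothetical minimal collapsed Gibbs sampler for LDA (illustrative sketch).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    alpha, beta: assumed symmetric Dirichlet hyperparameters (illustrative defaults).
    Returns (word_topic_counts, doc_topic_counts, assignments).
    """
    rng = random.Random(seed)
    nwt = [[0] * num_topics for _ in range(vocab_size)]  # word-topic counts
    ndt = [[0] * num_topics for _ in docs]               # document-topic counts
    nt = [0] * num_topics                                # total tokens per topic
    z = []                                               # topic assignment per token

    # Assign every token to a random topic and record the counts.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            t = rng.randrange(num_topics)
            zd.append(t)
            nwt[w][t] += 1
            ndt[d][t] += 1
            nt[t] += 1
        z.append(zd)

    topics = range(num_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from all counts.
                t = z[d][i]
                nwt[w][t] -= 1
                ndt[d][t] -= 1
                nt[t] -= 1
                # Full conditional for this token's topic, up to a constant:
                # (n_wk + beta) / (n_k + W*beta) * (n_dk + alpha)
                weights = [
                    (nwt[w][k] + beta) / (nt[k] + vocab_size * beta) * (ndt[d][k] + alpha)
                    for k in topics
                ]
                # Resample the topic and add the token back into the counts.
                t = rng.choices(topics, weights=weights)[0]
                z[d][i] = t
                nwt[w][t] += 1
                ndt[d][t] += 1
                nt[t] += 1
    return nwt, ndt, z

if __name__ == "__main__":
    # Toy corpus: word ids 0-2 and 3-5 tend to co-occur.
    docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3], [0, 2, 1], [4, 5, 3, 5]]
    nwt, ndt, z = lda_gibbs(docs, num_topics=2, vocab_size=6, iters=50)
    print(ndt)
```

The counts returned after sampling can be smoothed into point estimates. Topic-word probabilities are roughly (n_wk + beta) / (n_k + W*beta), and document-topic proportions are roughly (n_dk + alpha) / (n_d + T*alpha). Collapsing out these distributions means only the integer counts need to be tracked during sampling.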