@brusilovsky

TopicNets: Visual Analysis of Large Text Corpora with Topic Modeling

, , , , , , and . ACM Trans. Intell. Syst. Technol., 3 (2): 23:1--23:26 (February 2012)
DOI: 10.1145/2089094.2089099

Abstract

We present TopicNets, a Web-based system for visual and interactive analysis of large sets of documents using statistical topic models. A range of visualization types and control mechanisms to support knowledge discovery are presented. These include corpus- and document-specific views, iterative topic modeling, search, and visual filtering. Drill-down functionality is provided to allow analysts to visualize individual document sections and their relations within the global topic space. Analysts can search across a dataset through a set of expansion techniques on selected document and topic nodes. Furthermore, analysts can select relevant subsets of documents and perform real-time topic modeling on these subsets to interactively visualize topics at various levels of granularity, allowing for a better understanding of the documents. A discussion of the design and implementation choices for each visual analysis technique is presented. This is followed by a discussion of three diverse use cases in which TopicNets enables fast discovery of information that is otherwise hard to find. These include a corpus of 50,000 successful NSF grant proposals, 10,000 publications from a large research center, and single documents including a grant proposal and a PhD thesis.

Description

TopicNets

Links and resources

Tags