Crowd Synthesis: Extracting Categories and Clusters from Complex Data

Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing, pages 989–998. New York, NY, USA, ACM (2014)
DOI: 10.1145/2531602.2531653

Abstract

Analysts synthesize complex, qualitative data to uncover themes and concepts, but the process is time-consuming and cognitively taxing, and automated techniques show mixed success. Crowdsourcing could help this process through on-demand harnessing of flexible and powerful human cognition, but it incurs other challenges, including limited attention and expertise. Further, text data can be complex, high-dimensional, and ill-structured. We address two major challenges unsolved in prior crowd clustering work: scaffolding expertise for novice crowd workers, and creating consistent and accurate categories when each worker only sees a small portion of the data. To address these challenges, we present an empirical study of a two-stage approach to enable crowds to create an accurate and useful overview of a dataset: A) we draw on cognitive theory to assess how re-representing data can shorten and focus the data on salient dimensions; and B) we introduce an iterative clustering approach that provides workers a global overview of data. We demonstrate that a classification-plus-context approach elicits the most accurate categories at the most useful level of abstraction.
