Crowd Synthesis: Extracting Categories and Clusters from Complex Data
P. André, A. Kittur, and S. Dow. Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing, pages 989--998. New York, NY, USA, ACM, (2014)
DOI: 10.1145/2531602.2531653
Abstract
Analysts synthesize complex, qualitative data to uncover themes and concepts, but the process is time-consuming, cognitively taxing, and automated techniques show mixed success. Crowdsourcing could help this process through on-demand harnessing of flexible and powerful human cognition, but incurs other challenges including limited attention and expertise. Further, text data can be complex, high-dimensional, and ill-structured. We address two major challenges unsolved in prior crowd clustering work: scaffolding expertise for novice crowd workers, and creating consistent and accurate categories when each worker only sees a small portion of the data. To address these challenges we present an empirical study of a two-stage approach to enable crowds to create an accurate and useful overview of a dataset: A) we draw on cognitive theory to assess how re-representing data can shorten and focus the data on salient dimensions; and B) introduce an iterative clustering approach that provides workers a global overview of data. We demonstrate a classification-plus-context approach elicits the most accurate categories at the most useful level of abstraction.
%0 Conference Paper
%1 citeulike:13854169
%A André, Paul
%A Kittur, Aniket
%A Dow, Steven P.
%B Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing
%C New York, NY, USA
%D 2014
%I ACM
%K crowdsourcing text-analysis
%P 989--998
%R 10.1145/2531602.2531653
%T Crowd Synthesis: Extracting Categories and Clusters from Complex Data
%U http://dx.doi.org/10.1145/2531602.2531653
%X Analysts synthesize complex, qualitative data to uncover themes and concepts, but the process is time-consuming, cognitively taxing, and automated techniques show mixed success. Crowdsourcing could help this process through on-demand harnessing of flexible and powerful human cognition, but incurs other challenges including limited attention and expertise. Further, text data can be complex, high-dimensional, and ill-structured. We address two major challenges unsolved in prior crowd clustering work: scaffolding expertise for novice crowd workers, and creating consistent and accurate categories when each worker only sees a small portion of the data. To address these challenges we present an empirical study of a two-stage approach to enable crowds to create an accurate and useful overview of a dataset: A) we draw on cognitive theory to assess how re-representing data can shorten and focus the data on salient dimensions; and B) introduce an iterative clustering approach that provides workers a global overview of data. We demonstrate a classification-plus-context approach elicits the most accurate categories at the most useful level of abstraction.
%@ 978-1-4503-2540-0
@inproceedings{citeulike:13854169,
abstract = {{Analysts synthesize complex, qualitative data to uncover themes and concepts, but the process is time-consuming, cognitively taxing, and automated techniques show mixed success. Crowdsourcing could help this process through on-demand harnessing of flexible and powerful human cognition, but incurs other challenges including limited attention and expertise. Further, text data can be complex, high-dimensional, and ill-structured. We address two major challenges unsolved in prior crowd clustering work: scaffolding expertise for novice crowd workers, and creating consistent and accurate categories when each worker only sees a small portion of the data. To address these challenges we present an empirical study of a two-stage approach to enable crowds to create an accurate and useful overview of a dataset: A) we draw on cognitive theory to assess how re-representing data can shorten and focus the data on salient dimensions; and B) introduce an iterative clustering approach that provides workers a global overview of data. We demonstrate a classification-plus-context approach elicits the most accurate categories at the most useful level of abstraction.}},
added-at = {2018-03-19T12:24:51.000+0100},
address = {New York, NY, USA},
author = {Andr{\'{e}}, Paul and Kittur, Aniket and Dow, Steven P.},
biburl = {https://www.bibsonomy.org/bibtex/219ace006d6c71d899f3ae3f174ef7e3e/aho},
booktitle = {Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work \& Social Computing},
citeulike-article-id = {13854169},
citeulike-linkout-0 = {http://portal.acm.org/citation.cfm?id=2531653},
citeulike-linkout-1 = {http://dx.doi.org/10.1145/2531602.2531653},
doi = {10.1145/2531602.2531653},
interhash = {ba0484d2633a61ffb7626323762f89b1},
intrahash = {19ace006d6c71d899f3ae3f174ef7e3e},
isbn = {978-1-4503-2540-0},
keywords = {crowdsourcing text-analysis},
location = {Baltimore, Maryland, USA},
pages = {989--998},
posted-at = {2015-12-04 19:36:23},
priority = {4},
publisher = {ACM},
series = {CSCW '14},
timestamp = {2018-03-19T12:24:51.000+0100},
title = {{Crowd Synthesis: Extracting Categories and Clusters from Complex Data}},
url = {http://dx.doi.org/10.1145/2531602.2531653},
year = 2014
}