Problematizing and Addressing the Article-as-Concept Assumption in Wikipedia
Y. Lin, B. Yu, A. Hall, and B. Hecht. Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing, pp. 2052--2067. New York, NY, USA, ACM, (2017)
DOI: 10.1145/2998181.2998274
Abstract
Wikipedia-based studies and systems frequently assume that no two articles describe the same concept. However, in this paper, we show that this article-as-concept assumption is problematic due to editors' tendency to split articles into parent articles and sub-articles when articles get too long for readers (e.g. "Portland, Oregon" and "History of Portland, Oregon" in the English Wikipedia). In this paper, we present evidence that this issue can have significant impacts on Wikipedia-based studies and systems and introduce the sub-article matching problem. The goal of the sub-article matching problem is to automatically connect sub-articles to parent articles to help Wikipedia-based studies and systems retrieve complete information about a concept. We then describe the first system to address the sub-article matching problem. We show that, using a diverse feature set and standard machine learning techniques, our system can achieve good performance on most of our ground truth datasets, significantly outperforming baseline approaches.
Description
Problematizing and Addressing the Article-as-Concept Assumption in Wikipedia
%0 Conference Paper
%1 lin2017problematizing
%A Lin, Yilun
%A Yu, Bowen
%A Hall, Andrew
%A Hecht, Brent
%B Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing
%C New York, NY, USA
%D 2017
%I ACM
%K article assumption concept wikipedia
%P 2052--2067
%R 10.1145/2998181.2998274
%T Problematizing and Addressing the Article-as-Concept Assumption in Wikipedia
%U http://doi.acm.org/10.1145/2998181.2998274
%X Wikipedia-based studies and systems frequently assume that no two articles describe the same concept. However, in this paper, we show that this article-as-concept assumption is problematic due to editors' tendency to split articles into parent articles and sub-articles when articles get too long for readers (e.g. "Portland, Oregon" and "History of Portland, Oregon" in the English Wikipedia). In this paper, we present evidence that this issue can have significant impacts on Wikipedia-based studies and systems and introduce the sub-article matching problem. The goal of the sub-article matching problem is to automatically connect sub-articles to parent articles to help Wikipedia-based studies and systems retrieve complete information about a concept. We then describe the first system to address the sub-article matching problem. We show that, using a diverse feature set and standard machine learning techniques, our system can achieve good performance on most of our ground truth datasets, significantly outperforming baseline approaches.
%@ 978-1-4503-4335-0
@inproceedings{lin2017problematizing,
abstract = {Wikipedia-based studies and systems frequently assume that no two articles describe the same concept. However, in this paper, we show that this article-as-concept assumption is problematic due to editors' tendency to split articles into parent articles and sub-articles when articles get too long for readers (e.g. "Portland, Oregon" and "History of Portland, Oregon" in the English Wikipedia). In this paper, we present evidence that this issue can have significant impacts on Wikipedia-based studies and systems and introduce the sub-article matching problem. The goal of the sub-article matching problem is to automatically connect sub-articles to parent articles to help Wikipedia-based studies and systems retrieve complete information about a concept. We then describe the first system to address the sub-article matching problem. We show that, using a diverse feature set and standard machine learning techniques, our system can achieve good performance on most of our ground truth datasets, significantly outperforming baseline approaches.},
acmid = {2998274},
added-at = {2017-08-27T12:52:43.000+0200},
address = {New York, NY, USA},
author = {Lin, Yilun and Yu, Bowen and Hall, Andrew and Hecht, Brent},
biburl = {https://www.bibsonomy.org/bibtex/22e89ff62df37d969cd7f7505318a9eac/thoni},
booktitle = {Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing},
description = {Problematizing and Addressing the Article-as-Concept Assumption in Wikipedia},
doi = {10.1145/2998181.2998274},
interhash = {31b5137eac3c7790ea3c9eb28f3177f6},
intrahash = {2e89ff62df37d969cd7f7505318a9eac},
isbn = {978-1-4503-4335-0},
keywords = {article assumption concept wikipedia},
location = {Portland, Oregon, USA},
numpages = {16},
pages = {2052--2067},
publisher = {ACM},
series = {CSCW '17},
timestamp = {2017-08-27T12:52:43.000+0200},
title = {Problematizing and Addressing the Article-as-Concept Assumption in Wikipedia},
url = {http://doi.acm.org/10.1145/2998181.2998274},
year = 2017
}