
Near-duplicate Detection by Instance-level Constrained Clustering

H. Yang and J. Callan. Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 421--428. New York, NY, USA, ACM, (2006)
DOI: 10.1145/1148170.1148243

Abstract

For the task of near-duplicate document detection, neither the traditional fingerprinting techniques used in the database community nor the bag-of-words comparison approaches used in the information retrieval community are sufficiently accurate. This is because the characteristics of near-duplicate documents differ from those of both "almost-identical" documents in the data cleaning task and "relevant" documents in the search task. This paper presents an instance-level constrained clustering approach for near-duplicate detection. The framework incorporates information such as document attributes and content structure into the clustering process to form near-duplicate clusters. Experimental results on several collections of public comments sent to U.S. government agencies on proposed new regulations demonstrate that our approach outperforms other near-duplicate detection algorithms and is about as effective as human assessors.
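The abstract does not spell out the algorithm, so the following is only a minimal sketch of what instance-level constraints in clustering can look like, not the authors' method. It assumes documents are compared by Jaccard similarity over lowercase word sets and grouped by plain single-link merging; the function names, the `threshold` parameter, and the sample comments are all hypothetical. Must-link pairs are merged unconditionally, and a merge is skipped when a cannot-link pair would end up in the same cluster.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def constrained_clusters(docs, must_link=(), cannot_link=(), threshold=0.5):
    """Group documents into clusters under pairwise instance-level constraints.

    docs: list of strings.
    must_link / cannot_link: iterables of (i, j) document index pairs.
    Conflicting constraints are not detected in this sketch.
    """
    tokens = [set(d.lower().split()) for d in docs]
    parent = list(range(len(docs)))  # union-find forest over document indices
    forbidden = list(cannot_link)

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    def violates(ra, rb):
        # A merge is forbidden if any cannot-link pair spans the two clusters.
        return any({find(i), find(j)} == {ra, rb} for i, j in forbidden)

    # Must-link constraints are applied first, regardless of similarity.
    for i, j in must_link:
        union(i, j)

    # Then merge similar pairs, most similar first, skipping forbidden merges.
    pairs = sorted(
        combinations(range(len(docs)), 2),
        key=lambda p: jaccard(tokens[p[0]], tokens[p[1]]),
        reverse=True,
    )
    for i, j in pairs:
        if jaccard(tokens[i], tokens[j]) < threshold:
            break
        ri, rj = find(i), find(j)
        if ri != rj and not violates(ri, rj):
            union(ri, rj)

    clusters = {}
    for i in range(len(docs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical public comments, made up for illustration.
comments = [
    "please protect our national forests from new roads",
    "please protect our national forests from new roads now",
    "please protect our national forests from new logging roads",
    "I support the proposed rule on mercury emissions",
]
print(constrained_clusters(comments))
print(constrained_clusters(comments, cannot_link=[(0, 2)]))
```

In a setting like the one the abstract describes, constraints of this kind could plausibly be derived from document attributes or content structure, but how the paper derives and applies them is not stated here.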

Links and resources

BibTeX key: citeulike:2295267
entry type: inproceedings
address: New York, NY, USA
booktitle: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
year: 2006
pages: 421--428
publisher: ACM
series: SIGIR '06
citeulike-article-id: 2295267
isbn: 1-59593-369-7
citeulike-linkout-1: http://dx.doi.org/10.1145/1148170.1148243
priority: 2
posted-at: 2011-01-11 17:15:07
citeulike-linkout-0: http://portal.acm.org/citation.cfm?id=1148243
location: Seattle, Washington, USA
DOI: 10.1145/1148170.1148243
url: http://dx.doi.org/10.1145/1148170.1148243

Tags

clustering

Metadata

Last modified 7 years ago
Created 7 years ago
