Categories are pages that are used to group other pages on similar subjects together. This helps users find the pages they are looking for, even if they do not know whether those pages exist or what they are called.
Every page should belong to at least one category. A page may often be in several categories. However, putting a page in too many categories may not be useful.
Wikipedia is a terrific knowledge resource, and many recent studies in artificial intelligence, information retrieval and related fields have used Wikipedia to endow computers with (some) human knowledge. Wikipedia dumps are publicly available in XML format, but they have a few shortcomings. First, they contain a lot of information that is often not needed when Wikipedia texts are used as a knowledge source (e.g., IDs of users who changed each article, timestamps of article modifications). Second, the XML dumps omit a lot of useful information that could be inferred from them, such as link tables, the category hierarchy, and the resolution of redirect links.
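As an illustration of one piece of inferred information mentioned above, the sketch below resolves redirect chains so that every redirecting title maps to its final article. It assumes the redirects have already been extracted from the dump into a plain dictionary; the function name `resolve_redirects`, the `max_hops` limit, and the sample titles are all hypothetical and not taken from any particular tool.

```python
def resolve_redirects(redirects, max_hops=10):
    """Map each redirecting title to its final target, following chains.

    `redirects` maps a source title to the title it redirects to.
    Cycles and chains longer than `max_hops` resolve to None.
    """
    resolved = {}
    for source in redirects:
        seen = {source}
        target = redirects[source]
        hops = 1
        # Follow the chain while the current target is itself a redirect.
        while target in redirects and target not in seen and hops < max_hops:
            seen.add(target)
            target = redirects[target]
            hops += 1
        # If we stopped on a redirect, the chain was cyclic or too long.
        resolved[source] = None if target in redirects else target
    return resolved


# Hypothetical sample data: two chained redirects and one cycle.
sample = {"A": "B", "B": "C", "X": "Y", "Y": "X"}
table = resolve_redirects(sample)
```

Treating cycles as unresolved (`None`) rather than raising an error is a design choice made here for simplicity; a real preprocessing pipeline may prefer to log or repair them.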
Due to an explosion of data, there has been an increasing demand for scalable machine learning and data mining algorithms in many applications, such as social network analysis, information retrieval, recommender systems, biology, multimedia, and e-commerce. The objective of this special issue is to connect academia and industry on the methods and experiences of large-scale data analysis. We look for scalable machine learning and data mining algorithms, implementations, frameworks and case studies that target real, practical scenarios for large datasets. The focus is to identify the real challenges in large-scale data mining and to investigate scalable methods and practical solutions for the core machine learning and data mining problems from both theoretical and experimental perspectives.
The M-tree is an index structure that can be used for the efficient resolution of similarity queries on complex objects that are compared using an arbitrary metric.
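The following sketch is not an M-tree implementation. It only illustrates, on a single flat level, the kind of triangle-inequality pruning that metric indexes rely on: objects are grouped around pivot objects with a covering radius, and a range query skips any group that cannot contain a match. The names `build_balls` and `range_query`, the choice of pivots, and the sample data are assumptions made for illustration.

```python
def build_balls(objects, pivots, dist):
    """Assign each object to its nearest pivot and track the covering radius."""
    balls = [{"pivot": p, "radius": 0.0, "members": []} for p in pivots]
    for obj in objects:
        d, ball = min(((dist(obj, b["pivot"]), b) for b in balls),
                      key=lambda pair: pair[0])
        ball["members"].append((obj, d))  # keep the object-to-pivot distance
        ball["radius"] = max(ball["radius"], d)
    return balls


def range_query(balls, query, r, dist):
    """Return objects within distance r of query, plus the number of distance calls."""
    results, computed = [], 0
    for ball in balls:
        d_qp = dist(query, ball["pivot"])
        computed += 1
        # No member can lie within r if the query ball and covering ball are disjoint.
        if d_qp > r + ball["radius"]:
            continue
        for obj, d_op in ball["members"]:
            # Triangle inequality: dist(query, obj) >= |d_qp - d_op|.
            if abs(d_qp - d_op) > r:
                continue
            computed += 1
            if dist(query, obj) <= r:
                results.append(obj)
    return results, computed


# Hypothetical example: integers under absolute difference, which is a metric.
def abs_dist(a, b):
    return abs(a - b)

points = [0, 1, 2, 10, 11, 12]
balls = build_balls(points, pivots=[1, 11], dist=abs_dist)
found, n_dist = range_query(balls, query=2, r=1, dist=abs_dist)
```

The pruning is only valid when `dist` satisfies the metric axioms, in particular the triangle inequality; an actual M-tree applies the same test recursively over a balanced, paged tree.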
M. Koolen, G. Kazai, and N. Craswell. WSDM '09: Proceedings of the Second ACM International Conference on Web Search and Data Mining, pp. 44--53. New York, NY, USA, ACM, (2009)
A. Turpin, and F. Scholer. Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 11--18. New York, NY, USA, ACM, (2006)
C. Daskalakis, P. Goldberg, and C. Papadimitriou. STOC '06: Proceedings of the thirty-eighth annual ACM symposium on Theory of computing, pp. 71--78. New York, NY, USA, ACM, (2006)
C. Karande, K. Chellapilla, and R. Andersen. WSDM '09: Proceedings of the Second ACM International Conference on Web Search and Data Mining, pp. 272--281. New York, NY, USA, ACM, (2009)
M. Banko, and E. Brill. ACL '01: Proceedings of the 39th Annual Meeting on Association for Computational Linguistics, pp. 26--33. Morristown, NJ, USA, Association for Computational Linguistics, (2001)
M. Hearst, and J. Pedersen. SIGIR '96: Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 76--84. New York, NY, USA, ACM, (1996)
C. Ding, T. Li, D. Luo, and W. Peng. SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, pp. 831--832. New York, NY, USA, ACM, (2008)
O. Kurland, and L. Lee. SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 306--313. New York, NY, USA, ACM, (2005)