Categories are pages that are used to group other pages on similar subjects together. This helps users find the pages they are looking for, even if they do not know whether a page exists or what it is called.
Every page should belong to at least one category. A page may often be in several categories. However, putting a page in too many categories may not be useful.
Wikipedia is a terrific knowledge resource, and many recent studies in artificial intelligence, information retrieval and related fields have used Wikipedia to endow computers with (some) human knowledge. Wikipedia dumps are publicly available in XML format, but they have a few shortcomings. First, they contain a lot of information that is often not used when Wikipedia texts are used as knowledge (e.g., ids of users who changed each article, timestamps of article modifications). Second, the XML dumps leave out a lot of useful information that could be inferred from the dump, such as link tables, the category hierarchy, and the resolution of redirection links.
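As a rough illustration of the kind of information that can be inferred from a dump, the sketch below pulls category names and redirect targets out of raw article wikitext and follows redirect chains to their final page. It is a minimal sketch, not the tool described above: the function names are hypothetical, the regular expressions cover only the common `[[Category:...]]` and `#REDIRECT [[...]]` forms, and the sample titles are made up for the example.

```python
import re

# Simplified patterns (assumption): real wikitext has more variants,
# e.g. localized namespace names or templates that add categories.
CATEGORY_RE = re.compile(r"\[\[Category:([^\]|]+)(?:\|[^\]]*)?\]\]")
REDIRECT_RE = re.compile(r"#REDIRECT\s*\[\[([^\]|#]+)", re.IGNORECASE)


def extract_categories(wikitext):
    """Return the category names an article's wikitext assigns itself to."""
    return [name.strip() for name in CATEGORY_RE.findall(wikitext)]


def redirect_target(wikitext):
    """Return the redirect target title, or None if the page is not a redirect."""
    match = REDIRECT_RE.match(wikitext.lstrip())
    return match.group(1).strip() if match else None


def resolve_redirect(title, redirects):
    """Follow a chain of redirects (title -> target) to the final title.

    Stops as soon as a title repeats, so a redirect loop cannot hang the call.
    """
    seen = set()
    while title in redirects and title not in seen:
        seen.add(title)
        title = redirects[title]
    return title


# Hypothetical sample data, standing in for pages read from a dump.
pages = {
    "US": "#REDIRECT [[USA]]",
    "USA": "#REDIRECT [[United States]]",
    "United States": "A country. [[Category:Countries]] [[Category:North America|US]]",
}
redirects = {t: redirect_target(text) for t, text in pages.items()
             if redirect_target(text) is not None}
final_title = resolve_redirect("US", redirects)
categories = extract_categories(pages[final_title])
```

Building the `redirects` table once and resolving titles against it keeps link tables consistent, since every link to a redirect page can then be rewritten to point at its final target.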
Due to an explosion of data, there has been an increasing demand for scalable machine learning and data mining algorithms in many applications, such as social network analysis, information retrieval, recommendation systems, biology applications, multimedia, and e-commerce. The objective of this special issue is to connect academia and industry on the methods and experiences of large-scale data analysis. We look for scalable machine learning and data mining algorithms, implementations, frameworks and case studies that target real and practical scenarios for large datasets. The focus is to identify the real challenges in large-scale data mining and to investigate scalable methods and practical solutions for the core machine learning and data mining problems from both theoretical and experimental perspectives.
The M-tree is an index structure that can be used for the efficient resolution of similarity queries on complex objects to be compared using an arbitrary metric.
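The idea that makes metric indexes of this kind efficient is that the triangle inequality lets a query discard objects without computing their distance to the query. The sketch below shows that pruning step in a deliberately flattened form: a single pivot with precomputed distances, rather than the balanced tree of routing objects and covering radii that an actual M-tree maintains. The function names and the sample data are assumptions made for the example.

```python
def build_index(objects, pivot, dist):
    """Precompute the distance from one pivot to every indexed object."""
    return [(obj, dist(pivot, obj)) for obj in objects]


def range_query(index, pivot, query, radius, dist):
    """Return all objects within `radius` of `query`, plus the number of
    distance computations performed at query time.

    Pruning rule: for any metric, |d(q, p) - d(p, o)| <= d(q, o).
    If the left side already exceeds the radius, o cannot be a result,
    so d(q, o) never has to be computed.
    """
    d_query_pivot = dist(query, pivot)
    computed = 1  # the query-to-pivot distance above
    results = []
    for obj, d_pivot_obj in index:
        if abs(d_query_pivot - d_pivot_obj) > radius:
            continue  # pruned by the triangle inequality
        computed += 1
        if dist(query, obj) <= radius:
            results.append(obj)
    return results, computed


def absolute_difference(a, b):
    """A trivial metric on numbers; any function satisfying the metric axioms works."""
    return abs(a - b)


# Hypothetical data: five numbers indexed against pivot 0.
objects = [1, 2, 8, 9, 15]
index = build_index(objects, 0, absolute_difference)
matches, distance_calls = range_query(index, 0, 8.5, 1.0, absolute_difference)
```

Here only two of the five objects survive the pruning test, so the query computes three distances instead of six. The saving matters most when the metric itself is expensive, as with edit distance on strings or distances between image features.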
M. Koolen, G. Kazai, and N. Craswell. WSDM '09: Proceedings of the Second ACM International Conference on Web Search and Data Mining, page 44--53. New York, NY, USA, ACM, (2009)
A. Turpin, and F. Scholer. Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, page 11--18. New York, NY, USA, ACM, (2006)
C. Daskalakis, P. Goldberg, and C. Papadimitriou. STOC '06: Proceedings of the thirty-eighth annual ACM symposium on Theory of computing, page 71--78. New York, NY, USA, ACM, (2006)
C. Karande, K. Chellapilla, and R. Andersen. WSDM '09: Proceedings of the Second ACM International Conference on Web Search and Data Mining, page 272--281. New York, NY, USA, ACM, (2009)
M. Banko, and E. Brill. ACL '01: Proceedings of the 39th Annual Meeting on Association for Computational Linguistics, page 26--33. Morristown, NJ, USA, Association for Computational Linguistics, (2001)
M. Hearst, and J. Pedersen. SIGIR '96: Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval, page 76--84. New York, NY, USA, ACM, (1996)
C. Ding, T. Li, D. Luo, and W. Peng. SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, page 831--832. New York, NY, USA, ACM, (2008)
O. Kurland, and L. Lee. SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, page 306--313. New York, NY, USA, ACM, (2005)
K. Avrachenkov, V. Dobrynin, D. Nemirovsky, S. Pham, and E. Smirnova. SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, page 873--874. New York, NY, USA, ACM, (2008)
J. Tang, and P. Lewis. CIVR '08: Proceedings of the 2008 international conference on Content-based image and video retrieval, page 105--112. New York, NY, USA, ACM, (2008)
D. Bollegala, Y. Matsuo, and M. Ishizuka. WSDM '09: Proceedings of the Second ACM International Conference on Web Search and Data Mining, page 104--113. New York, NY, USA, ACM, (2009)
G. Erkan. Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, page 479--486. Morristown, NJ, USA, Association for Computational Linguistics, (2006)
D. Arthur, and S. Vassilvitskii. SODA '07: Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, page 1027--1035. Philadelphia, PA, USA, Society for Industrial and Applied Mathematics, (2007)
J. Kamps, and M. Koolen. WSDM '09: Proceedings of the Second ACM International Conference on Web Search and Data Mining, page 232--241. New York, NY, USA, ACM, (2009)
T. Hu, H. Xiong, W. Zhou, S. Sung, and H. Luo. SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, page 871--872. New York, NY, USA, ACM, (2008)
D. Arthur, and S. Vassilvitskii. SCG '06: Proceedings of the twenty-second annual symposium on Computational geometry, page 144--153. New York, NY, USA, ACM, (2006)
X. Liu, and W. Croft. ECIR'08: Proceedings of the IR research, 30th European conference on Advances in information retrieval, page 454--462. Berlin, Heidelberg, Springer-Verlag, (2008)
A. Leuski. CIKM '01: Proceedings of the tenth international conference on Information and knowledge management, page 33--40. New York, NY, USA, ACM, (2001)
N. Slonim, and N. Tishby. SIGIR '00: Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval, page 208--215. New York, NY, USA, ACM, (2000)
W. Xu, X. Liu, and Y. Gong. SIGIR '03: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval, page 267--273. New York, NY, USA, ACM, (2003)
B. Dorow, and D. Widdows. Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 2, page 79--82. Morristown, NJ, USA, Association for Computational Linguistics, (2003)
R. West, D. Precup, and J. Pineau. Proceeding of the 18th ACM conference on Information and knowledge management, page 1097--1106. New York, NY, USA, ACM, (2009)