What would be a good way to extract headlines, dates, and authors from news articles? It seems easy to write a scraper using XPath or similar to extract this information from a single site, but I'm not sure what a more scalable solution would be if you're extracting from, say, 10,000 sites.
Tagging, folksonomy, distributed classification, ethnoclassification: however it is labelled, the concept of users creating and aggregating their own metadata is gaining ground on the internet. This literature review briefly defines the topic at hand, looking at current implementations and summarizing key advantages and disadvantages of distributed classification systems with reference to prominent folksonomy commentators. After considering whether distributed classification can replace expert catalogers entirely, it concludes that distributed classification can make an important contribution to digital information organisation, but that it may need to be integrated with more traditional organisation tools to overcome its current weaknesses.
Full text content of the book Search User Interfaces, written by Marti Hearst and published by Cambridge University Press, 2009. Chapter 3: Models of the Information Seeking Process
Goodgecko is an easy-to-use tool to gather and analyze customer feedback. It works on the web, on mobile and with physical retail spaces such as restaurants.
My paper Telling Experts from Spammers: Expertise Ranking in Folksonomies, a joint work written together with fellow Ph.D. candidate Ching-Man Au Yeung from
T. Ley, and P. Seitlinger. CEUR Workshop Proceedings of the International Workshop on Adaptation in Social and Semantic Web (SASWeb2010), 590, page 13--18. (2010)
R. Wetzker, C. Zimmermann, C. Bauckhage, and S. Albayrak. WSDM '10: Proceedings of the Third ACM International Conference on Web Search and Data Mining, page 71--80. New York, NY, USA, ACM, (2010)
N. Weber, T. Nelkner, K. Schoefegger, and S. Lindstaedt. Proceedings of the Third International Workshop on Mashup Personal Learning Environments (MUPPLE09), in conjunction with the 5th European Conference on Technology Enhanced Learning (EC-TEL2010), Barcelona, Spain, (September 2010)
M. Noll, C. man Au Yeung, N. Gibbins, C. Meinel, and N. Shadbolt. SIGIR '09: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, page 612--619. New York, NY, USA, ACM, (2009)
A. Shepitsen, J. Gemmell, B. Mobasher, and R. Burke. RecSys '08: Proceedings of the 2008 ACM conference on Recommender systems, page 259--266. New York, NY, USA, ACM, (2008)
J. Fogarty, R. Baker, and S. Hudson. GI '05: Proceedings of Graphics Interface 2005, page 129--136. School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada, Canadian Human-Computer Communications Society, (2005)
K. Schoefegger, P. Seitlinger, and T. Ley. Proceedings of the 1st Workshop on Recommender Systems for Technology Enhanced Learning (RecSysTEL 2010), 1, page 2829--2838. (Sep 7, 2010)
K. Schoefegger, P. Seitlinger, and T. Ley. It’s about time: Exploring temporality in group learning. Alpine Rendez-Vous, Garmisch-Partenkirchen, December 2009, (2009)
M. Szomszor, I. Cantador, and H. Alani. Proceedings of the 19th ACM Conference on Hypertext and Hypermedia (Hypertext 2008), page 33--42. New York, NY, USA, ACM, (June 2008)
J. Teevan, S. Dumais, and E. Horvitz. SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, page 449--456. New York, NY, USA, ACM Press, (2005)
X. Shen, B. Tan, and C. Zhai. CIKM '05: Proceedings of the 14th ACM international conference on Information and knowledge management, page 824--831. New York, NY, USA, ACM Press, (2005)
K. Sugiyama, K. Hatano, and M. Yoshikawa. WWW '04: Proceedings of the 13th international conference on World Wide Web, page 675--684. New York, NY, USA, ACM Press, (2004)
M. Riedl, and R. Amant. AAMAS '03: Proceedings of the second international joint conference on Autonomous agents and multiagent systems, page 361--368. New York, NY, USA, ACM Press, (2003)
K. Ehrlich, and N. Shami. CHI '08: Proceeding of the twenty-sixth annual SIGCHI conference on Human factors in computing systems, page 1093--1096. New York, NY, USA, ACM, (2008)
A. Budura, D. Bourges-Waldegg, and J. Riordan. CSE '09: Proceedings of the 2009 International Conference on Computational Science and Engineering, page 34--41. Washington, DC, USA, IEEE Computer Society, (2009)
E. Michlmayr, and S. Cayzer. Proceedings of the Workshop on Tagging and Metadata for Social Information Organization, 16th International World Wide Web Conference, (2007)
S. Tartir, I. Arpinar, M. Moore, A. Sheth, and B. Aleman-Meza. Proceedings of IEEE Workshop on Knowledge Acquisition from Distributed, Autonomous, Semantically Heterogeneous Data and Knowledge Sources, (2005)
J. Tang, H. fung Leung, Q. Luo, D. Chen, and J. Gong. IJCAI'09: Proceedings of the 21st international joint conference on Artificial intelligence, page 2089--2094. San Francisco, CA, USA, Morgan Kaufmann Publishers Inc., (2009)