The Model Organism Databases (MODs) are working with the InterMine group to enable faster comparative studies and develop tools that make analysis accessible to the wider scientific community.
Even in the most wildly optimistic projections, data mining isn't tenable for uncovering future terrorist plots. We're not trading privacy for security; we're giving up privacy and getting no security in return.
Baker provides us with a fascinating guide to the world of "The Numerati" who use the data we produce every day (click web pages, flip channels, drive through automatic toll booths, shop with credit cards, and make cell phone calls) to profile us as workers, shoppers, patients, voters, potential terrorists, and lovers.
Online repository of large data sets for researchers in knowledge discovery and data mining. Includes Discrete Sequence Data, Image Data, Multivariate Data, Relational Data, Spatio-Temporal Data, Text (corpora), Time Series, and Web Data (web pages and log files).
The Software Environment for the Advancement of Scholarly Research (SEASR), funded by the Andrew W. Mellon Foundation, provides a research and development environment capable of powering leading-edge digital humanities initiatives.
The data here is useful for testing classification and clustering, and the accuracy of indexing techniques. However, the datasets are too small to support claims about the efficiency of indexing.
The Digging into Data Challenge is an international grant competition sponsored by four leading research agencies: the Joint Information Systems Committee (JISC) from the United Kingdom, the National Endowment for the Humanities (NEH) from the United States, the National Science Foundation (NSF) from the United States, and the Social Sciences and Humanities Research Council (SSHRC) from Canada.
GroupLens is a research lab in the Department of Computer Science and Engineering at the University of Minnesota. Datasets include MovieLens, WikiLens, Book-Crossing, Jester Joke, and EachMovie.
This work is in the general area of sentiment analysis, opinion extraction or opinion mining, and feature-based opinion summarization from the user-generated content or user-generated media on the Web, e.g., reviews, forum and group discussions, and blogs. The area is also closely related to sentiment classification.
DataSift provides very granular and modular ‘sifting’ functions from a wide range of social and web input feeds, augmenting them with sentiment analysis, storage and analytics to offer an unrivalled service platform which leverages the cloud and scales infinitely. The world is moving to streams, and consumers will consume and curate their own news. DataSift follows this paradigm shift and seeks to become the platform of choice for stream curation, consumption, and ultimately monetization. The end visualizations are unlimited and bounded only by your imagination.
The Datawrangling blog was put on the back burner last May while I focused on my startup. Now that I have some bandwidth again, I am getting back to work on several pet projects (including the Amazon EC2 Cluster).
R. Agrawal, S. Gollapudi, A. Kannan, and K. Kenthapadi. Proceedings of the 20th International Conference Companion on World Wide Web, page 483--492. New York, NY, USA, ACM, (2011)
M. Houbraken, C. Sun, E. Smirnov, and K. Driessens. Adjunct Publication of the 25th Conference on User Modeling, Adaptation and Personalization, page 147--152. New York, NY, USA, ACM, (2017)
B. Bullock, H. Lerch, A. Roßnagel, A. Hotho, and G. Stumme. Proceedings of the 11th International Conference on Knowledge Management and Knowledge Technologies, page 15:1--15:8. New York, NY, USA, ACM, (2011)
J. Abowd, L. Vilhuber, and W. Block. Privacy in Statistical Databases, volume 7556 of Lecture Notes in Computer Science, Springer Berlin Heidelberg, (2012)
L. Vocht, S. Softic, M. Ebner, and H. Mühlburger. Proceedings of the 11th International Conference on Knowledge Management and Knowledge Technologies, page 43:1--43:9. New York, NY, USA, ACM, (2011)
E. Müller, I. Assent, R. Krieger, T. Jansen, and T. Seidl. Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, page 1089--1092. New York, NY, USA, ACM, (2008)
D. Maniyar and I. Nabney. Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, page 643--648. New York, NY, USA, ACM, (2006)
W. Kammergruber, M. Viermetz, K. Ehms, and M. Langen. 6th International Conference on Next Generation Web Services Practices (NWeSP 2010), (2010). Best student paper award.
B. Mobasher, H. Dai, T. Luo, and M. Nakagawa. Proc. of the 3rd int. workshop on Web information and data management (WIDM '01), page 9--15. New York, NY, USA, ACM, (2001)
C. Bird, A. Gourley, P. Devanbu, M. Gertz, and A. Swaminathan. MSR '06: Proceedings of the 2006 international workshop on Mining software repositories, page 137--143. New York, NY, USA, ACM, (2006)
E. Gabrilovich and S. Markovitch. IJCAI'07: Proceedings of the 20th international joint conference on Artificial intelligence, page 1606--1611. San Francisco, CA, USA, Morgan Kaufmann Publishers Inc., (2007)
D. Newman, J. Lau, K. Grieser, and T. Baldwin. Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, page 100--108. Los Angeles, California, Association for Computational Linguistics, (June 2010)
C. Romero, S. Ventura, J. Delgado, and P. Bra. Creating New Learning Experiences on a Global Scale. Second European Conference on Technology Enhanced Learning, EC-TEL 2007, page 293--305. Crete, Greece, Springer, (2007)
U. Fayyad. KDD '07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, page 2--3. New York, NY, USA, ACM, (2007)