Infochimps.org
Free Redistributable Rich Data Sets
There are many sources to find out something about everything. Until now, there’s been no good place for you to find out everything about something.
The infochimps.org community is assembling and interconnecting the world's best repository for raw data -- a sort of giant free almanac, with tables on everything you can put in a table. Built by data nerds, used by data nerds, it's a central source for the information you need to power the projects the world needs.
The Datawrangling blog was put on the back burner last May while I focused on my startup. Now that I have some bandwidth again, I am getting back to work on several pet projects (including the Amazon EC2 Cluster).
What makes something “Information Visualization?” Is it just visual titillation? Or is it a tool that interprets, analyzes, and facilitates deeper understanding of data?
The burgeoning interest in R demonstrates that there’s demand for analytics to solve real, business-critical problems in a broad spectrum of companies and roles, and that some of the incumbent analytics offerings, in particular SAS and SPSS, don’t sufficiently meet the growing need for analytics in many major companies. Annotated link http://www.diigo.com/bookmark/http%3A%2F%2Fspotfire.tibco.com%2Fcommunity%2Fblogs%2Fenterpriseanalytics%2Farchive%2F2009%2F01%2F08%2Fanalytics-in-the-nyt.aspx
Various US databases provided by federal government agencies. Census, Labor Statistics, Transportation, Economics. Also: A 3D Version of the PubChem Library, Annotated Human Genome Data.
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups. The collection has become a popular data set for experiments in text applications of machine learning techniques, such as text classification and text clustering.
YQL (Yahoo Query Language) works with arbitrary structured (XML or JSON) documents with repeating elements, such as a list of restaurants or search results. Different "known" collections of these items are presented as "tables" in the YQL syntax, and are notionally namespaced based on the service providing the data.
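As a rough illustration of the idea above (this is not actual YQL and does not call any Yahoo service), the sketch below treats the repeating elements of a JSON document as the rows of a "table" whose name is namespaced by the providing service. The table name `local.search`, the sample data, and the `select` helper are all hypothetical.

```python
import json

# Hypothetical JSON document with repeating elements (a list of restaurants).
DOCUMENT = json.loads("""
{"results": [
  {"name": "Blue Door", "city": "Austin", "rating": 4.5},
  {"name": "Noodle Bar", "city": "Austin", "rating": 3.9},
  {"name": "Casa Luna", "city": "Denver", "rating": 4.2}
]}
""")

# Hypothetical registry: "table" names are namespaced by the providing service.
TABLES = {"local.search": DOCUMENT["results"]}


def select(table, fields, where=None):
    """Return the chosen fields of each row in a namespaced table that passes `where`."""
    rows = TABLES[table]
    if where is not None:
        rows = [row for row in rows if where(row)]
    return [{field: row[field] for field in fields} for row in rows]


# Roughly analogous to: SELECT name FROM local.search WHERE city = 'Austin'
austin = select("local.search", ["name"], where=lambda r: r["city"] == "Austin")
print(austin)
```

The registry is a plain dictionary so that the namespacing stays visible: each key pairs a service prefix with a collection name, and each value is the list of repeating items.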
A torrent tracker for public datasets. If you are a scientist, a research developer, or simply interested, you can find and download datasets here; if you own a dataset, you can publish it on this site and become a torrent seeder.
The Knowledge Network for Energy Transitions (KNET) is a global network of scholars, educators and organizations who study the economic, political, cultural, environmental and technological aspects of changes in society's major energy systems.
Designed and produced by the World Wide Web Foundation, the Web Index is the world’s first multi-dimensional measure of the Web’s growth, utility and impact on people and nations.
By donating your bookmarks, you let GiveALink analyze your preferences along with those of many other people. We will mine the resulting collection for interesting insights and use the information to develop novel applications. We will also share bookmark data with the Web research community, hoping to foster the development of many novel Web mining techniques and applications to search, recommendation, navigation, personalization and visualization of the Web.
Last week, Sam explored trends in the technology jobs market, suggesting that significant opportunities only reveal themselves when examining both the available jobs and the underlying trends in demand for skills. Coincidentally, on the same day that Sam’s piece was published, The New York Times suggested that “the sexy job in the next 10 years will be statisticians.”
Find and download data in any format, from financial to social networking to GIS data. Or sell data in our data marketplace, at a price you set. We have large data sets, spreadsheets, and databases packed with statistics.
Social bookmark tools are rapidly emerging on the Web. In such systems users are setting up lightweight conceptual structures called folksonomies. The reason for their immediate success is the fact that no specific skills are needed for participating. At
Looks at contemporary American culture through the austere lens of statistics. Each image portrays a specific quantity of something: fifteen million sheets of office paper (five minutes of paper use); 106,000 aluminum cans (thirty seconds of can consumption)
The Bioinformatics Links Directory features curated links to molecular resources, tools and databases. The links listed in this directory are selected on the basis of recommendations from bioinformatics experts in the field. We also rely on input from our community of bioinformatics users for suggestions.
ImageNet is an image database organized according to the WordNet hierarchy (currently only the nouns), in which each node of the hierarchy is depicted by hundreds and thousands of images. Currently we have an average of over five hundred images per node. We hope ImageNet will become a useful resource for researchers, educators, students and all of you who share our passion for pictures.
The National Digital Archive of Datasets (NDAD) preserves and provides online access to archived digital datasets and documents from UK central government departments. Our collection spans 40 years of recent history, with the earliest available dataset dating back to about 1963.
GroupLens is a research lab in the Department of Computer Science and Engineering at the University of Minnesota. Datasets include MovieLens, Wikilens, Book-Crossing, Jester Joke, and EachMovie.
The Open Economics project provides open content, data and code related to Economics. This site itself provides interfaces to some (though not all) of the Open Economics datasets and models.
StatLib is a system for distributing statistical software, datasets, and information. It started in 1989 and is hosted by the Department of Statistics at Carnegie Mellon University.
F. Alam, U. Qazi, M. Imran, and F. Ofli. (2021). arXiv:2104.03090. Comment: Accepted in ICWSM-2021, Twitter datasets, Textual content, Natural disasters, Crisis Informatics.
C. Baker, C. Fillmore, and J. Lowe. Proceedings of the 17th International Conference on Computational Linguistics, page 86-90. Morristown, NJ, USA, Association for Computational Linguistics, (1998)
D. Brain and G. Webb. Proceedings of the Fourth Australian Knowledge Acquisition Workshop (AKAW '99), page 117-128. Sydney, The University of New South Wales, (1999)
D. Brain and G. Webb. Lecture Notes in Computer Science 2431: Principles of Data Mining and Knowledge Discovery: Proceedings of the Sixth European Conference (PKDD 2002), page 62-73. Berlin/Heidelberg, Springer-Verlag, (2002)