Zanran helps you to find ‘semi-structured’ data on the web. This is the numerical data that people have presented as graphs, tables and charts. For example, the data could be a graph in a PDF report, a table in an Excel spreadsheet, or a bar chart shown as an image in an HTML page. Put more simply: Zanran is Google for data. At present, we extract tables and images from HTML, PDF and Excel files, and we will be processing PowerPoint and Word documents in the near future.
DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia, and to link other data sets on the Web to Wikipedia data.
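As a rough illustration of what such a query can look like, the sketch below builds a request URL for DBpedia's public SPARQL endpoint at https://dbpedia.org/sparql without actually contacting it. The helper name `build_query_url`, the example query, and the use of the `format` parameter to request JSON results are assumptions made for this example, not something the DBpedia description above specifies.

```python
from urllib.parse import urlencode

# Public DBpedia SPARQL endpoint. This sketch only builds the URL;
# it does not send a request.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def build_query_url(sparql: str, endpoint: str = DBPEDIA_ENDPOINT) -> str:
    """Return a GET URL that carries the SPARQL query (hypothetical helper).

    Assumption: the endpoint accepts a `format` parameter selecting
    JSON results; other endpoints may rely on an Accept header instead.
    """
    params = {"query": sparql, "format": "application/sparql-results+json"}
    return endpoint + "?" + urlencode(params)


# Illustrative query: the ten most populous cities with more than
# one million inhabitants, using DBpedia ontology terms.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?city ?population WHERE {
  ?city a dbo:City ;
        dbo:populationTotal ?population .
  FILTER (?population > 1000000)
}
ORDER BY DESC(?population)
LIMIT 10
"""

url = build_query_url(QUERY)
```

The resulting `url` can be opened in a browser or fetched with any HTTP client. Keeping URL construction separate from the network call makes the query easy to inspect before it is sent.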
The Web Ecology Project is an interdisciplinary research group based in Boston, Massachusetts, focusing on using large-scale data mining to analyze the system-wide flows of culture and community online. In addition to the task of understanding culture on the web through quantitative research and rigorous experimentation, we are attempting to build a science around community management and social media. To that end, we are building tools and conducting research that enable planners to launch data-driven campaigns backed by network science.
I have come to realize how hard it is for an everyday programmer to get access to even the most basic factual data. If you want to experiment with a new driving directions algorithm, getting the data is infinitely more difficult than coming up with the algorithm; you have to hire a lawyer and sign a contract with a company that collects that data in the country you are developing for. If you want to write an open source TiVo competitor, you need television listings data for every cable provider in the country, but your options are tenuous at best.
VisualEyes is a web-based authoring tool developed at the University of Virginia to weave images, maps, charts, video and data into highly interactive and compelling dynamic visualizations. VisualEyes enables scholars to present selected primary source materials and research findings while encouraging active inquiry and hands-on learning among general and targeted audiences. It communicates through the use of dynamic displays – or "visualizations" – that organize and present meaningful information in both traditional and multimedia formats, such as audio-video, animation, charts, maps, data, and interactive timelines. The effective use of the visualizations can reveal and illuminate relationships between multiple kinds of information across time and space far more effectively than words alone.
The United Nations Statistics Division (UNSD) of the Department of Economic and Social Affairs (DESA) has launched a new internet-based data service for the global user community. It brings UN statistical databases within easy reach of users through a single entry point (http://data.un.org/) from which users can now search and download a variety of statistical resources of the UN System.
The UCSD-Nature Signaling Gateway is a comprehensive and up-to-the-minute resource for anyone interested in signal transduction. This Gateway represents a unique collaboration between the University of California San Diego (UCSD) and Nature Publishing Group and is designed to facilitate navigation of the complex world of research into cellular signaling. Information and data presented here are freely available to all. It is powered by the San Diego Supercomputer Center (SDSC).
The U.S. Census Grids provide raster data sets that include not only population and housing counts, but a wide variety of socioeconomic characteristics. These gridded data sets transform irregularly shaped census block and block group boundaries into a regular surface – a raster grid – for faster and easier analysis.
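To make the idea of gridding concrete, here is a minimal sketch that sums block-level counts into square cells of a regular grid. It is a deliberate simplification: each block is reduced to a single centroid point, whereas a real gridding workflow would more likely apportion each block's count across cells by area of overlap. The function name `grid_counts`, the coordinates, and the populations are all hypothetical and are not taken from the U.S. Census Grids themselves.

```python
from collections import defaultdict
from math import floor


def grid_counts(blocks, cell_size, origin=(0.0, 0.0)):
    """Sum block populations into square cells keyed by (row, col).

    `blocks` is an iterable of (x, y, population) tuples, where (x, y)
    is the block centroid in the same units as `cell_size`.
    """
    ox, oy = origin
    grid = defaultdict(int)
    for x, y, population in blocks:
        col = floor((x - ox) / cell_size)  # column index from x offset
        row = floor((y - oy) / cell_size)  # row index from y offset
        grid[(row, col)] += population
    return dict(grid)


# Hypothetical block centroids and populations, for illustration only.
blocks = [
    (0.2, 0.3, 120),
    (0.8, 0.1, 80),
    (1.5, 0.4, 200),
    (1.1, 1.9, 50),
]

cells = grid_counts(blocks, cell_size=1.0)
```

With a cell size of 1.0, the first two blocks fall into the same cell and their counts are added together. This is the sense in which irregular units become a regular surface that can be compared cell by cell.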