Data mining (DM), also called Knowledge Discovery in Databases (KDD) or Knowledge Discovery and Data Mining, is the process of automatically searching large volumes of data for patterns using tools such as classification, association rule mining, clustering...
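Association rule mining, one of the tools named above, is usually scored by support (the fraction of transactions containing an itemset) and confidence (how often the right-hand side holds when the left-hand side does). The sketch below is a minimal illustration. It assumes a tiny hypothetical basket dataset and enumerates item pairs by brute force. Practical miners such as Apriori prune candidate itemsets instead.

```python
from itertools import combinations

# Hypothetical toy transactions (market baskets), not data from the source.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def rules(transactions, min_support=0.5, min_confidence=0.6):
    """Brute-force single-item rules lhs -> rhs over frequent item pairs."""
    items = sorted(set().union(*transactions))
    found = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue  # the pair is not frequent enough to yield a rule
        for lhs, rhs in ((a, b), (b, a)):
            # confidence(lhs -> rhs) = support(lhs and rhs) / support(lhs)
            confidence = pair_support / support({lhs}, transactions)
            if confidence >= min_confidence:
                found.append((lhs, rhs, pair_support, confidence))
    return found


for lhs, rhs, s, c in rules(transactions):
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

The thresholds `min_support` and `min_confidence` are illustrative values. With this data, "butter -> milk" reaches confidence 1.0 because every basket with butter also contains milk.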
Text mining and web scraping involve chunk parsing and recognition of named entities (institutions, dates, titles)... The extraction of named entities is mostly based on a strategy that combines lookup in gazetteers (lists of companies, cities, etc.) with...
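The lookup half of that strategy can be sketched as follows. The gazetteers are small and hypothetical. The excerpt is cut off before naming the second half of the strategy, so a single hand-written regular expression for dates stands in for it here. That substitution is an assumption and is not the source's method. Overlapping matches are resolved leftmost-longest, so "Acme Corp" wins over "Acme".

```python
import re

# Hypothetical gazetteers; real systems use large curated lists.
GAZETTEERS = {
    "ORG": ["Acme", "Acme Corp"],
    "CITY": ["Springfield"],
}

# Assumed stand-in for the rule-based half of the strategy: ISO-style dates.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def tag_entities(text):
    """Return (start, end, label, surface) tuples for non-overlapping entities."""
    candidates = []
    # Gazetteer lookup: every listed name found as a whole-word match.
    for label, names in GAZETTEERS.items():
        for name in names:
            for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
                candidates.append((m.start(), m.end(), label, m.group()))
    # Pattern matching for entity types that lists cannot enumerate.
    for m in DATE_PATTERN.finditer(text):
        candidates.append((m.start(), m.end(), "DATE", m.group()))
    # Leftmost-longest resolution: sort by start, longer spans first, skip overlaps.
    candidates.sort(key=lambda c: (c[0], -(c[1] - c[0])))
    entities, last_end = [], 0
    for cand in candidates:
        if cand[0] >= last_end:
            entities.append(cand)
            last_end = cand[1]
    return entities


print(tag_entities("Acme Corp opened an office in Springfield on 2024-01-15."))
```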
The semantic web must "explain the meaning of words" to computers. Some semantic technologies take a "bottom up" approach, embedding semantic annotations (metadata) into web content. "Top down" technologies analyze information without metadata using some form of...
It is important to differentiate between text data mining and information access (or information retrieval, as it is more widely known)... the goal of data mining is to discover or derive new information from data, finding patterns across datasets, and/or...
This paper describes Seeker, a platform for large-scale text analytics, and SemTag, an application written on the platform to perform automated semantic tagging of large corpora. We apply SemTag to a collection of approximately 264 million web pages, and...
The 12 winners in Microsoft Live Labs' “Accelerating Search in Academic Research” are part of a quest to identify bold and innovative approaches to information retrieval, data mining, machine learning and human/computer interactions. Here they (and the...
Yesterday, I had dinner with two people from yet another startup that uses tagging and collaborative filtering in the same sentence. So are tags and collaborative filtering a marriage made in heaven? It's a promising approach, but there are challenges in...
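One way tags can feed collaborative filtering is to treat each user's tag counts as a profile vector. The system then recommends items bookmarked by the most similar user. The sketch below assumes that approach, uses plain cosine similarity, and works on hypothetical users, items, and tags. It does not reflect how the startup mentioned above works.

```python
from collections import Counter
from math import sqrt

# Hypothetical bookmarks: user -> list of (item, tag) pairs.
BOOKMARKS = {
    "ann": [("semtag-paper", "semanticweb"), ("foaf-intro", "semanticweb"), ("foaf-intro", "rdf")],
    "bob": [("foaf-intro", "rdf"), ("rdf-primer", "rdf"), ("rdf-primer", "semanticweb")],
    "cat": [("bread-blog", "cooking"), ("recipe-site", "cooking")],
}


def tag_profile(user):
    """Count how often `user` applied each tag."""
    return Counter(tag for _, tag in BOOKMARKS[user])


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(user):
    """Items bookmarked by the most tag-similar other user but not yet by `user`."""
    profile = tag_profile(user)
    others = [u for u in BOOKMARKS if u != user]
    nearest = max(others, key=lambda u: cosine(profile, tag_profile(u)))
    seen = {item for item, _ in BOOKMARKS[user]}
    return sorted({item for item, _ in BOOKMARKS[nearest]} - seen)


print(recommend("ann"))
```

The sketch also exposes one challenge of this approach. Tags are free text, so variants such as "semanticweb" and "semantic-web" would count as unrelated dimensions unless they are normalized first.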
Social bookmark tools are rapidly emerging on the Web. In such systems, users are setting up lightweight conceptual structures called folksonomies. The reason for their immediate success is the fact that no specific skills are needed for participating. At...
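A folksonomy is commonly formalized as a set of (user, tag, resource) assignments. Counting which tags are applied to the same resources is one simple way to surface the lightweight conceptual structure the excerpt describes. The sketch below uses hypothetical data and ignores which user made each assignment.

```python
from collections import Counter
from itertools import combinations

# Hypothetical tag assignments: (user, tag, resource) triples.
ASSIGNMENTS = {
    ("ann", "semanticweb", "foaf-spec"),
    ("ann", "rdf", "foaf-spec"),
    ("bob", "rdf", "foaf-spec"),
    ("bob", "semanticweb", "rdf-primer"),
    ("bob", "rdf", "rdf-primer"),
    ("cat", "cooking", "bread-blog"),
}


def tag_cooccurrence(assignments):
    """Count on how many resources each unordered pair of tags appears together."""
    tags_by_resource = {}
    for _, tag, resource in assignments:
        tags_by_resource.setdefault(resource, set()).add(tag)
    pairs = Counter()
    for tags in tags_by_resource.values():
        # Sorting makes ("rdf", "semanticweb") and ("semanticweb", "rdf") the same key.
        for pair in combinations(sorted(tags), 2):
            pairs[pair] += 1
    return pairs


print(tag_cooccurrence(ASSIGNMENTS))
```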
FOAF facilitates the creation of the Semantic Web equivalent of the archetypal personal homepage: My name is Leigh, this is a picture of me, I'm interested in XML, and here are some links to my friends. And just like the HTML version, FOAF documents can be...
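The homepage described above maps onto a handful of FOAF properties: foaf:name, foaf:depiction for the picture, foaf:interest for a page about a topic of interest, foaf:knows for friends, and foaf:homepage. The sketch below renders such a profile as Turtle from Python. The picture URL, the friend's name, and the homepage are hypothetical placeholders. The function does no escaping, so values containing quotes or angle brackets would produce invalid Turtle.

```python
FOAF_NS = "http://xmlns.com/foaf/0.1/"


def foaf_turtle(name, picture_url, interest_page, friends):
    """Render a minimal FOAF profile as Turtle text.

    `friends` is a list of (friend_name, friend_homepage) pairs. Values are
    inserted verbatim (no escaping), so this is only safe for simple strings.
    """
    props = [
        "a foaf:Person",
        f'foaf:name "{name}"',
        f"foaf:depiction <{picture_url}>",  # a picture of this person
        f"foaf:interest <{interest_page}>",  # a page about a topic of interest
    ]
    for friend_name, homepage in friends:
        props.append(
            f'foaf:knows [ a foaf:Person ; foaf:name "{friend_name}" ; '
            f"foaf:homepage <{homepage}> ]"
        )
    # The profile subject is an anonymous node; predicates are separated by ";".
    body = " ;\n    ".join(props)
    return f"@prefix foaf: <{FOAF_NS}> .\n\n[] {body} .\n"


print(foaf_turtle(
    "Leigh",
    "http://example.org/leigh.jpg",
    "https://www.w3.org/XML/",
    [("Alice", "http://example.org/alice/")],
))
```

The sketch uses an anonymous node instead of a URI for the person. FOAF profiles often identify people indirectly through properties like these rather than by a fixed identifier.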
It used to be you had to get a warrant to monitor a person or a group of people. Today, it is increasingly easy to monitor ideas. And then track them back to people. Most of us don't have access to the databases, software, or computing power of the NSA, ...
When you pay attention to something (and when you ignore something), data is created. This “attention data” is a valuable resource that reflects your interests, your activities and your values, and it serves as a proxy for your attention. To capture...
Google Zeitgeist reports on a compilation of searches and queries over time, space, and attention; these snapshots reveal a bit of the human condition. (Zeitgeist: intellectual, moral, cultural climate of an era)
Hyperlinking is the foundation of the web. As users add new content, and new sites, it is bound in to the structure of the web by other users discovering the content and linking to it. Much as synapses form in the brain, with associations becoming stronger...
Internet singularity? Microsoft's Gary Flake, launching Live Labs, says it's "a deeper and tighter coupling between the online and offline worlds," while a blog commenter says, "There can be no Web 2.0 Singularity in the enterprise until organizations...