Our new term extraction service analyzes text and an optional query, returning a list of the key concepts from the text. You can use the service for a variety of different purposes. For example, Y!Q uses it to determine key concepts within the search context and then uses those terms for augmenting a user's search query.
So it’s pretty clear by now that statistics and machine learning aren’t very different fields. I was recently pointed to a very amusing comparison by the excellent statistician, and machine learning expert, Robert Tibshirani. Reproduced here:

Glossary
Machine learning | Statistics
network, graphs | model
weights | parameters
learning | fitting
generalization | test set performance
supervised learning | regression/classification
unsupervised learning | density estimation, clustering
large grant = $1,000,000 | large grant = $50,000
nice place to have a meeting: Snowbird, Utah, French Alps | nice place to have a meeting: Las Vegas in August
Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets. At the present time, Pig's infrastructure layer consists of a compiler that produces sequences of Map-Reduce programs, for which large-scale parallel implementations already exist. Pig's language layer currently consists of a textual language called Pig Latin, which has the following key properties:
* Ease of programming. It is trivial to achieve parallel execution of simple, "embarrassingly parallel" data analysis tasks.
* Optimization opportunities. The way in which tasks are encoded permits the system to optimize their execution automatically.
* Extensibility.
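As a rough illustration of the Map-Reduce pattern that Pig's compiler targets, here is a minimal single-process sketch in Python. It is not Pig Latin and not Pig's generated code; the function names (`map_phase`, `shuffle`, `reduce_phase`) and the sample records are hypothetical, chosen only to show why such tasks parallelize easily: each record is mapped independently, and each key is reduced independently.

```python
from collections import defaultdict


def map_phase(record):
    """Emit one (word, 1) pair per word; each record is handled independently."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group all emitted values by key, as a Map-Reduce framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine the values for one key; each key is reduced independently."""
    return key, sum(values)


# Hypothetical input records, for illustration only.
records = ["pig latin", "pig compiles to map reduce", "map reduce"]

mapped = [pair for record in records for pair in map_phase(record)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

In a real deployment the map and reduce calls would run on different machines; here they run sequentially so the data flow stays visible.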
SentiWordNet is a lexical resource for opinion mining. SentiWordNet assigns to each synset of WordNet three sentiment scores: positivity, negativity, objectivity.
CiteSeerX - Document Details (Isaac Councill, Lee Giles): The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensure high generalization ability of the learning machine. The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors. We here extend this result to non-separable training data.
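The core idea in this abstract, a non-linear map into a feature space followed by a linear decision surface, can be seen in a toy Python sketch. This is not a support-vector network: nothing is trained, and the feature map, weights, bias, and data points are all hand-picked assumptions for illustration. It only shows that points no single threshold can separate in one dimension become linearly separable after mapping x to (x, x*x).

```python
def feature_map(x):
    """Hypothetical non-linear map from a 1-D input to a 2-D feature space."""
    return (x, x * x)


def linear_decision(features, weights, bias):
    """Classify by the sign of a linear function of the features."""
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if score > 0 else -1


def separable_by_threshold(xs, ys):
    """Check whether any single threshold on the raw 1-D inputs fits the labels."""
    pts = sorted(xs)
    midpoints = [(a + b) / 2 for a, b in zip(pts, pts[1:])]
    candidates = [pts[0] - 1.0] + midpoints + [pts[-1] + 1.0]
    for t in candidates:
        for sign in (1, -1):
            if all((sign if x > t else -sign) == y for x, y in zip(xs, ys)):
                return True
    return False


# Illustrative data: the outer points belong to class +1, the inner ones to -1.
inputs = [-2.0, -1.0, 1.0, 2.0]
labels = [1, -1, -1, 1]

# Hand-picked (not learned) linear surface in feature space: x*x - 2.5 = 0.
weights = (0.0, 1.0)
bias = -2.5

predictions = [linear_decision(feature_map(x), weights, bias) for x in inputs]
print(separable_by_threshold(inputs, labels))  # False
print(predictions)  # [1, -1, -1, 1]
```

An actual support-vector network would choose the separating surface by maximizing the margin over the training data rather than fixing the weights by hand.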
I posted an earlier version of this data mining blog list in a previous post on DMR. Here is an updated version (blogs recently added to the list have the logo
Facebook's f8 conference is shaping up to have quite a few improvements in store for developers, and we think we've come across another one: a change to Facebook's data retention policy. Yesterday, Facebook employee Monica Keller (who left MySpace to join the company last month) took part in a conversation on Twitter that seemed to indicate that developers may no longer have to delete user data. The possible change came to light after Gnip CEO Eric Marcoullier gently chided Keller about developers being unable to store any user data, to which she responded,
They focus on patterns of connectivity and self-organizing behavior in economic and social networks and how these new structures lead to resilience, adaptability, agility, transparency, and innovation.
Eureqa is a software tool for detecting equations and hidden mathematical relationships in your data. Its primary goal is to identify the simplest mathematical formulas which could describe the underlying mechanisms that produced the data. Eureqa is free to download and use.
The power of open-source development turns RapidMiner into one of the most widely used data mining and predictive analysis solutions worldwide. On the other hand, the company Rapid-I controls and guides the development and ensures that RapidMiner is also an enterprise-capable solution. Learn more about how the Enterprise Edition of RapidMiner can help you get better results in less time.
The Laboratory for Advanced Computing develops technologies for high performance computing, high performance networking, internet computing, data mining and related areas.
The principle underlying the minimum description length principle, of which there is a famous criticism in the film Contact with Jodie Foster:
The easiest answer may not always be the correct/most appropriate one. :)
Textpresso is a text-mining system for scientific literature. Textpresso's two major elements are (1) access to full text, so that entire articles can be searched, and (2) introduction of categories of biological concepts and classes that relate two objects (e.g., association, regulation) or describe one (e.g., methods). A search engine enables the user to search for one or a combination of these categories and/or keywords within an entire literature.
Catcher in the what?

Something that interests me greatly is the relationship between inherent value and social value. For example, a book (let's say one of those Harry Something-or-other books) has some inherent value, but it also has social value. This social value exists in its function in our social networks. Did you read it? I did! Me too! Let's strengthen our bond! The relationship between these two value systems gets more interesting when they diverge. It is possible for something to be not really that good (i.e. having low inherent value) but function with high social value.
With Google Insights for Search, you can compare search volume patterns across specific regions, categories, time frames and properties. See examples of how you can use Google Insights for Search.
R. Agrawal, S. Gollapudi, A. Kannan, and K. Kenthapadi. Proceedings of the 20th International Conference Companion on World Wide Web, page 483--492. New York, NY, USA, ACM, (2011)
M. Houbraken, C. Sun, E. Smirnov, and K. Driessens. Adjunct Publication of the 25th Conference on User Modeling, Adaptation and Personalization, page 147--152. New York, NY, USA, ACM, (2017)
B. Bullock, H. Lerch, A. Roßnagel, A. Hotho, and G. Stumme. Proceedings of the 11th International Conference on Knowledge Management and Knowledge Technologies, page 15:1--15:8. New York, NY, USA, ACM, (2011)