Build Real-time Big Data Applications on Apache HBase. Open Source. Apache 2.0 Licensed. Provides a natural, intuitive toolkit for predictive modeling with a machine learning library.
jWebSocket is a pure Java/JavaScript high speed bidirectional communication solution for the Web - secure, reliable and fast. Provides easy integration into existing Tomcat web applications.
Tangle is a JavaScript library for creating reactive documents. Your readers can interactively explore possibilities, play with parameters, and see the document update immediately. Tangle is super-simple and easy to learn.
sigma.js is an open-source, lightweight JavaScript library for drawing graphs using the HTML canvas element. It is designed especially for interactive display of static graphs exported from graph visualization software such as Gephi, and for dynamic display of graphs generated on the fly.
An extensible and configurable PDF manipulation layer. It is a ready-to-use Java library for manipulating PDF documents without having to deal with the low-level API.
The Mozenda Scraper provides web data extraction software and web screen scraping tools that make it easy to capture nearly any content from the web. See how you can start getting data from the web in minutes.
An example of a toy spelling corrector that achieves 80 or 90% accuracy at a processing speed of at least 10 words per second in less than a page of Python code.
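A minimal sketch of how a corrector of this size can work, not necessarily the approach taken in the linked article: count word frequencies in a training text, generate every string one edit away from the input, and return the most frequent known candidate. The training text, `COUNTS`, and the function names are assumptions made for illustration.

```python
import re
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def train(text):
    """Count how often each lowercase word occurs in the training text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def edits1(word):
    """Return every string one delete, transpose, replace, or insert away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def correct(word, counts):
    """Return the word if known, else its most frequent one-edit neighbour."""
    if word in counts:
        return word
    candidates = [w for w in edits1(word) if w in counts]
    return max(candidates, key=counts.get) if candidates else word


# Tiny, hypothetical training corpus; a real corrector needs far more text.
COUNTS = train("the quick brown fox jumps over the lazy dog the end")
```

With this toy corpus, `correct("teh", COUNTS)` yields `"the"`, while an unknown word with no known neighbour is returned unchanged.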
We are a community of motherfucking programmers who have been humiliated by software development methodologies for years. We are tired of XP, Scrum, Kanban, Waterfall, Software Craftsmanship (aka XP-Lite) and anything else getting in the way of...Programming, Motherfucker.
JADE (Java Agent DEvelopment Framework) is a software framework fully implemented in Java. It simplifies the implementation of multi-agent systems through middleware that complies with the FIPA specifications and through a set of graphical tools that support the debugging and deployment phases.
RequireJS is a JavaScript file and module loader. It is optimized for in-browser use, but it can be used in other JavaScript environments, like Rhino and Node. Using a modular script loader like RequireJS will improve the speed and quality of your code.
Despite the many JavaScript libraries that are available today, I cannot find one that makes it easy to add keyboard shortcuts (or accelerators) to your JavaScript app. This is because keyboard shortcuts were only used in JavaScript games; no serious web application used keyboard shortcuts to navigate around its interface. But Google apps like Google Reader and Gmail changed that. So, I have created a function to make adding shortcuts to your application much easier.
Source code to repeat the paper evaluation: We present the first unsupervised approach to the problem of learning a semantic parser, using Markov logic. Our USP system transforms dependency trees into quasi-logical forms, recursively induces lambda forms from these, and clusters them to abstract away syntactic variations of the same meaning. The MAP semantic parse of a sentence is obtained by recursively assigning its parts to lambda-form clusters and composing them. We evaluate our approach by using it to extract a knowledge base from biomedical abstracts and answer questions. USP substantially outperforms TextRunner, DIRT and an informed baseline on both precision and recall on this task.
Lately I’ve been working on evaluating and comparing algorithms capable of extracting useful content from arbitrary HTML documents. I have made a feature-wise comparison of related software and APIs.
Spark is an open source cluster computing system that aims to make data analytics fast — both fast to run and fast to write. To run programs faster, Spark provides primitives for in-memory cluster computing: your job can load data into memory and query it repeatedly much quicker than with disk-based systems like Hadoop.
AWS Elastic Beanstalk is an even easier way for developers to quickly deploy and manage applications in the AWS cloud without having to worry about the physical infrastructure or the resource configuration that accompanies setting up that infrastructure. You simply upload your application and AWS Elastic Beanstalk automatically handles the deployment details of capacity provisioning, load balancing, auto-scaling, and application health monitoring, while allowing you to change configuration settings and deploy new versions.
Trying to find a name for a company, project, algorithm, product? Acronym Creator helps you generate a name that is an acronym or abbreviation. With this acronym builder, abbreviation maker, name generator, label finder - whatever you call it - you can make your own acronyms and have fun!
The professional, open source development tool for the open web. Develop and test your entire web application using a single environment. With support for the latest browser technology specs such as HTML5, CSS3 and JavaScript; and Ruby, Rails, PHP & Python on the server side. We've got you covered!
Node.js provides its own assert module with some really useful functions for creating basic tests. However, the reporting and running of these assertions can become complicated, especially with asynchronous code. How can you be sure that all assertions ran? Or that they ran in the correct order? This is where nodeunit comes in, a tool for defining and running unit tests in the simplest way possible.
The Javatools are a collection of Java classes for a variety of small tasks, such as parsing, database interaction or file handling. They were developed by Fabian M. Suchanek for the YAGO-NAGA project. The Javatools are licensed under a Creative Commons Attribution 3.0 License by the YAGO-NAGA team.
A simple Web server with only 200 lines of C source code. In this article, Nigel Griffiths provides a copy of this Web server and includes the source code as well. You can see exactly what it can and can't do.
T-Rex (Trainable Relation Extraction) is a highly configurable machine learning-based Information Extraction from Text framework, which includes tools for document classification, entity extraction and relation extraction.
With proper mark-up/logic separation, a POJO data model, and a refreshing lack of XML, Apache Wicket makes developing web-apps simple and enjoyable again.
Markup Language for Temporal and Event Expressions - TimeML is a robust specification language for events and temporal expressions in natural language.
Protégé is a free, open source ontology editor and knowledge-base framework.
The Protégé platform supports two main ways of modeling ontologies via the Protégé-Frames and Protégé-OWL editors. Protégé ontologies can be exported into a variety of formats including RDF(S), OWL, and XML Schema.
Protégé is based on Java, is extensible, and provides a plug-and-play environment that makes it a flexible base for rapid prototyping and application development.
The OntoLT approach aims at a more direct connection between ontology engineering and linguistic analysis. OntoLT is a Protégé plug-in with which concepts (Protégé classes) and relations (Protégé slots) can be extracted automatically from linguistically annotated text collections. It provides mapping rules, defined in a precondition language, that allow for a mapping between linguistic entities in text and class/slot candidates in Protégé.
This workshop will gather researchers in a variety of fields that contribute to the automated construction of knowledge bases. It will be held at Xerox Research Centre Europe, near Grenoble (France), May 17-19, 2010.
andLinux runs Linux natively inside Windows. It is a complete Ubuntu Linux system running seamlessly in Windows 2000-based systems (2000, XP, 2003, Vista, 7; 32-bit versions only).
qooxdoo is a comprehensive and innovative framework for creating rich internet applications (RIAs). Leveraging object-oriented JavaScript allows developers to build impressive cross-browser applications. No HTML, CSS, or DOM knowledge is needed.
Our goal is to develop a probabilistic knowledge base that mirrors the content of the web. We are developing a system that uses semi-supervised learning methods to learn to extract symbolic knowledge from unstructured text and HTML. We are exploring methods of continuous learning, where our system runs 24x7, continuously learning to read better, and continuously extracting facts from the web.
ConceptNet represents data in the form of a semantic network, and makes it available to be used in natural language processing and intelligent user interfaces.
MegaMap is a Java implementation of a map (or hashtable) that can store an unbounded amount of data, limited only by the amount of disk space available. Objects stored in the map are persisted to disk. Good performance is achieved by an in-memory cache. The MegaMap can, for all practical purposes, be thought of as a map implementation with unlimited storage space.
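The idea of a disk-persisted map fronted by an in-memory cache can be sketched in Python as below. This is an illustration of the general technique only and does not reflect MegaMap's Java API; the class name `DiskBackedMap`, the `cache_size` parameter, and the least-recently-used eviction policy are all assumptions.

```python
import os
import shelve
import tempfile
from collections import OrderedDict


class DiskBackedMap:
    """Map whose entries live on disk, with a small LRU cache in memory.

    Keys must be strings because the backing store is a shelve file.
    """

    def __init__(self, path, cache_size=2):
        self._store = shelve.open(path)   # persistent key/value store on disk
        self._cache = OrderedDict()       # recently used entries, oldest first
        self._cache_size = cache_size

    def _remember(self, key, value):
        self._cache[key] = value
        self._cache.move_to_end(key)
        if len(self._cache) > self._cache_size:
            self._cache.popitem(last=False)  # evict the least recently used

    def __setitem__(self, key, value):
        self._store[key] = value          # write through to disk
        self._remember(key, value)

    def __getitem__(self, key):
        if key in self._cache:            # cache hit: no disk access
            self._cache.move_to_end(key)
            return self._cache[key]
        value = self._store[key]          # cache miss: read from disk
        self._remember(key, value)
        return value

    def close(self):
        self._store.close()


# Usage: three entries are stored, but only two fit in the in-memory cache.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_map")
demo = DiskBackedMap(demo_path, cache_size=2)
demo["a"], demo["b"], demo["c"] = 1, 2, 3
first = demo["a"]   # "a" was evicted from the cache, so this read hits disk
demo.close()
```

Writing through to disk on every update keeps the sketch simple; a production design would likely batch writes, which this example does not attempt.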
Cibyl is a programming environment and binary translator that allows compiled C programs to execute on J2ME-capable phones. Cibyl uses GCC to compile the C programs to MIPS binaries, and these are then recompiled into Java bytecode.
NestedVM provides binary translation for Java Bytecode. This is done by having GCC compile to a MIPS binary which is then translated to a Java class file. Hence any application written in C, C++, Fortran, or any other language supported by GCC can be run in 100% pure Java with no source changes.
Jericho HTML Parser is a Java library allowing analysis and manipulation of parts of an HTML document, including server-side tags, while reproducing verbatim any unrecognised or invalid HTML.
CLEANEVAL is a shared task and competitive evaluation on the topic of cleaning arbitrary web pages, with the goal of preparing web data for use as a corpus, for linguistic and language technology research and development.
Emacs is the extensible, customizable, self-documenting real-time display editor. This Info file describes how to edit with Emacs and some of how to customize it; it corresponds to GNU Emacs version 23.1.
This DVD-ROM from the Deutsche Nationalbibliothek (German National Library) contains the Personennamendatei (PND, name authority file), the Schlagwortnormdatei (SWD, subject headings authority file), and the Gemeinsame Körperschaftsdatei (GKD, corporate bodies authority file), and can be obtained directly from the Deutsche Nationalbibliothek.
SmartGWT is a GWT based framework that allows you to not only utilize its comprehensive widget library for your application UI, but also tie these widgets in with your server-side for data management. SmartGWT is based on the powerful and mature SmartClient library.
Joda-Time provides a quality replacement for the Java date and time classes. The design allows for multiple calendar systems, while still providing a simple API. The 'default' calendar is the ISO8601 standard which is used by XML. The Gregorian, Julian, Buddhist, Coptic, Ethiopic and Islamic systems are also included, and we welcome further additions. Supporting classes include time zone, duration, format and parsing.
The POI project consists of APIs for manipulating various file formats based upon Microsoft's OLE 2 Compound Document format, and Office OpenXML format, using pure Java. In short, you can read and write MS Excel files using Java. In addition, you can read and write MS Word and MS PowerPoint files using Java.
PojoCache is an in-memory, transactional, and replicated POJO (plain old Java object) cache system that allows users to operate on a POJO transparently without active user management of either replication or persistency aspects. This tutorial focuses on the usage of the PojoCache API.
Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, a web administration interface and many more features. It runs in a Java servlet container such as Tomcat.
This is the project page for SecondString, an open-source Java-based package of approximate string-matching techniques. This code was developed by researchers at Carnegie Mellon University from the Center for Automated Learning and Discovery, the Department of Statistics, and the Center for Computer and Communications Security.
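Edit distance is one classic approximate string-matching technique. The sketch below is plain Python written for illustration and does not use or mirror the SecondString API; the function names `levenshtein` and `similarity` are assumptions.

```python
def levenshtein(a, b):
    """Minimum number of single-character inserts, deletes, and
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))    # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]                     # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,          # delete ca
                current[j - 1] + 1,       # insert cb
                previous[j - 1] + cost,   # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Scale the distance to [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest
```

For example, `levenshtein("kitten", "sitting")` is 3: two substitutions and one insertion.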
This is an overview of the open source NLP and machine learning tools for text mining, information extraction, text classification, clustering, approximate string matching, language parsing and tagging, and more.
Webstemmer is a web crawler and HTML layout analyzer that automatically extracts the main text of a news site without banners, ads, and/or navigation links mixed in.
Fußball Studio is freeware for managing and evaluating football (soccer) leagues and tournaments. It comes with the Bundesliga database, which contains complete data for the 1st and 2nd Bundesliga.
F. Reichartz, H. Korte, and G. Paass. KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, page 773--782. New York, NY, USA, ACM, (2010)
F. Suchanek, G. Ifrim, and G. Weikum. 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2006), page 712--717. New York, NY, USA, ACM, (2006)
P. Pantel, D. Ravichandran, and E. Hovy. Proceedings of the 20th international conference on Computational Linguistics (COLING-04), page 771--777. Geneva, Switzerland, Association for Computational Linguistics, (2004)
A. Carlson, J. Betteridge, R. Wang, E. Jr., and T. Mitchell. WSDM '10: Proceedings of the third ACM international conference on Web search and data mining, page 101--110. New York, NY, USA, ACM, (2010)
D. Downey, M. Broadhead, and O. Etzioni. Proc. of the Twentieth International Joint Conference on Artificial Intelligence (IJCAI'07), Hyderabad, India, (January 2007)
P. Pantel, and M. Pennacchiotti. Ontology Learning and Population: Bridging the Gap between Text and Knowledge, volume 167 of Frontiers in Artificial Intelligence and Applications, IOS Press, (2008)
P. Pantel, and M. Pennacchiotti. Proc. of the International Conference on Computational Linguistics/Association, page 113--120. Sydney, Australia, ACL Press, (17th-21st July 2006)
E. Riloff, C. Schafer, and D. Yarowsky. Proceedings of the 19th international conference on Computational linguistics, page 1--7. Morristown, NJ, USA, Association for Computational Linguistics, (2002)
E. Riloff, and R. Jones. AAAI '99/IAAI '99: Proceedings of the sixteenth national conference on Artificial intelligence and the eleventh Innovative applications of artificial intelligence conference innovative applications of artificial intelligence, page 474--479. Menlo Park, CA, USA, American Association for Artificial Intelligence, (1999)
M. Mintz, S. Bills, R. Snow, and D. Jurafsky. Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, page 1003--1011. Suntec, Singapore, Association for Computational Linguistics, (August 2009)
F. Reichartz, H. Korte, and G. Paass. Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, page 365--368. Suntec, Singapore, Association for Computational Linguistics, (August 2009)
H. Isozaki, and H. Kazawa. Proceedings of the 19th international conference on Computational linguistics, page 1--7. Morristown, NJ, USA, Association for Computational Linguistics, (2002)