T-Rex (Trainable Relation Extraction) is a highly configurable machine learning-based Information Extraction from Text framework, which includes tools for document classification, entity extraction and relation extraction.
With proper mark-up/logic separation, a POJO data model, and a refreshing lack of XML, Apache Wicket makes developing web-apps simple and enjoyable again.
Markup Language for Temporal and Event Expressions - TimeML is a robust specification language for events and temporal expressions in natural language.
Protégé is a free, open source ontology editor and knowledge-base framework.
The Protégé platform supports two main ways of modeling ontologies via the Protégé-Frames and Protégé-OWL editors. Protégé ontologies can be exported into a variety of formats including RDF(S), OWL, and XML Schema.
Protégé is based on Java, is extensible, and provides a plug-and-play environment that makes it a flexible base for rapid prototyping and application development.
The OntoLT approach aims at a more direct connection between ontology engineering and linguistic analysis. OntoLT is a Protégé plug-in with which concepts (Protégé classes) and relations (Protégé slots) can be extracted automatically from linguistically annotated text collections. It provides mapping rules, defined in a precondition language, that allow for a mapping between linguistic entities in text and class/slot candidates in Protégé.
This workshop will gather researchers in a variety of fields that contribute to the automated construction of knowledge bases. It will be held at Xerox Research Centre Europe, near Grenoble (France), May 17-19, 2010.
andLinux runs Linux natively inside Windows. It is a complete Ubuntu Linux system running seamlessly in Windows 2000 based systems (2000, XP, 2003, Vista, 7; 32-bit versions only).
qooxdoo is a comprehensive and innovative framework for creating rich internet applications (RIAs). Leveraging object-oriented JavaScript allows developers to build impressive cross-browser applications. No HTML, CSS, or DOM knowledge is needed.
Our goal is to develop a probabilistic knowledge base that mirrors the content of the web. We are developing a system that uses semi-supervised learning methods to learn to extract symbolic knowledge from unstructured text and HTML. We are exploring methods of continuous learning, where our system runs 24x7, continuously learning to read better, and continuously extracting facts from the web.
ConceptNet represents data in the form of a semantic network, and makes it available to be used in natural language processing and intelligent user interfaces.
MegaMap is a Java implementation of a map (or hashtable) that can store an unbounded amount of data, limited only by the amount of disk space available. Objects stored in the map are persisted to disk. Good performance is achieved by an in-memory cache. The MegaMap can, for all practical purposes, be thought of as a map implementation with unlimited storage space.
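The idea described above (every entry persisted to disk, with an in-memory cache in front) can be sketched in a few lines. This is an illustrative sketch only, not MegaMap's API or implementation: the class name `DiskBackedMap`, its methods, and the `cache_size` parameter are hypothetical, and it assumes Python's standard `shelve` module as the on-disk store with a simple least-recently-used cache.

```python
import os
import shelve
import tempfile
from collections import OrderedDict


class DiskBackedMap:
    """Hypothetical sketch: a map persisted to disk, fronted by a small LRU cache."""

    def __init__(self, path, cache_size=2):
        self._store = shelve.open(path)  # every entry lives on disk (keys must be str)
        self._cache = OrderedDict()      # recently used entries kept in memory
        self._cache_size = cache_size

    def _remember(self, key, value):
        # Mark the entry as most recently used and evict the oldest if over capacity.
        self._cache[key] = value
        self._cache.move_to_end(key)
        if len(self._cache) > self._cache_size:
            self._cache.popitem(last=False)

    def put(self, key, value):
        self._store[key] = value         # write through to disk
        self._remember(key, value)

    def get(self, key, default=None):
        if key in self._cache:           # cache hit: no disk access
            self._cache.move_to_end(key)
            return self._cache[key]
        if key in self._store:           # cache miss: load from disk
            value = self._store[key]
            self._remember(key, value)
            return value
        return default

    def close(self):
        self._store.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "demo_map")
    m = DiskBackedMap(path, cache_size=2)
    for i in range(5):
        m.put("k%d" % i, i * i)
    print(m.get("k0"))  # served from disk, since only two entries stay cached
    m.close()
```

Writing through on every `put` keeps the sketch simple; a real implementation might instead batch or defer disk writes.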
Cibyl is a programming environment and binary translator that allows compiled C programs to execute on J2ME-capable phones. Cibyl uses GCC to compile the C programs to MIPS binaries, and these are then recompiled into Java bytecode.
NestedVM provides binary translation for Java Bytecode. This is done by having GCC compile to a MIPS binary which is then translated to a Java class file. Hence any application written in C, C++, Fortran, or any other language supported by GCC can be run in 100% pure Java with no source changes.
Jericho HTML Parser is a Java library allowing analysis and manipulation of parts of an HTML document, including server-side tags, while reproducing verbatim any unrecognised or invalid HTML.
CLEANEVAL is a shared task and competitive evaluation on the topic of cleaning arbitrary web pages, with the goal of preparing web data for use as a corpus, for linguistic and language technology research and development.
Emacs is the extensible, customizable, self-documenting real-time display editor. This Info file describes how to edit with Emacs and some of how to customize it; it corresponds to GNU Emacs version 23.1.
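This DVD-ROM from the German National Library (Deutsche Nationalbibliothek) contains the Personennamendatei (PND, the name authority file for persons), the Schlagwortnormdatei (SWD, the subject headings authority file), and the Gemeinsame Körperschaftsdatei (GKD, the corporate bodies authority file). It can be obtained directly from the German National Library.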
SmartGWT is a GWT based framework that allows you to not only utilize its comprehensive widget library for your application UI, but also tie these widgets in with your server-side for data management. SmartGWT is based on the powerful and mature SmartClient library.
Joda-Time provides a quality replacement for the Java date and time classes. The design allows for multiple calendar systems, while still providing a simple API. The 'default' calendar is the ISO8601 standard which is used by XML. The Gregorian, Julian, Buddhist, Coptic, Ethiopic and Islamic systems are also included, and we welcome further additions. Supporting classes include time zone, duration, format and parsing.
The POI project consists of APIs for manipulating various file formats based upon Microsoft's OLE 2 Compound Document format, and Office OpenXML format, using pure Java. In short, you can read and write MS Excel files using Java. In addition, you can read and write MS Word and MS PowerPoint files using Java.
PojoCache is an in-memory, transactional, and replicated POJO (plain old Java object) cache system that allows users to operate on a POJO transparently without active user management of either replication or persistency aspects. This tutorial focuses on the usage of the PojoCache API.
Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, a web administration interface and many more features. It runs in a Java servlet container such as Tomcat.
This is the project page for SecondString, an open-source Java-based package of approximate string-matching techniques. This code was developed by researchers at Carnegie Mellon University from the Center for Automated Learning and Discovery, the Department of Statistics, and the Center for Computer and Communications Security.
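As an illustration of what approximate string matching means, the sketch below computes the Levenshtein edit distance, one classic technique of this kind, and derives a normalized similarity score from it. It is a generic example and is not taken from the SecondString code or its API; the function names `levenshtein` and `similarity` are chosen here for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a                      # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))   # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Scale the distance to [0, 1], where 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
    print(similarity("kitten", "sitting"))
```

Keeping only two rows of the dynamic-programming table holds memory use proportional to the shorter string rather than to the product of both lengths.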
This is an overview of the open source NLP and machine learning tools for text mining, information extraction, text classification, clustering, approximate string matching, language parsing and tagging, and more.
F. Reichartz, H. Korte, and G. Paass. KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, page 773--782. New York, NY, USA, ACM, (2010)
F. Suchanek, G. Ifrim, and G. Weikum. 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2006), page 712--717. New York, NY, USA, ACM, (2006)
P. Pantel, D. Ravichandran, and E. Hovy. Proceedings of the 20th international conference on Computational Linguistics (COLING-04), page 771--777. Geneva, Switzerland, Association for Computational Linguistics, (2004)
A. Carlson, J. Betteridge, R. Wang, E. R. Hruschka Jr., and T. Mitchell. WSDM '10: Proceedings of the third ACM international conference on Web search and data mining, page 101--110. New York, NY, USA, ACM, (2010)
D. Downey, M. Broadhead, and O. Etzioni. Proc. of the Twentieth International Joint Conference on Artificial Intelligence (IJCAI'07), Hyderabad, India, (January 2007)
P. Pantel and M. Pennacchiotti. Ontology Learning and Population: Bridging the Gap between Text and Knowledge, volume 167 of Frontiers in Artificial Intelligence and Applications, IOS Press, (2008)
P. Pantel and M. Pennacchiotti. Proc. of the International Conference on Computational Linguistics/Association, page 113--120. Sydney, Australia, ACL Press, (17th-21st July 2006)
E. Riloff, C. Schafer, and D. Yarowsky. Proceedings of the 19th international conference on Computational linguistics, page 1--7. Morristown, NJ, USA, Association for Computational Linguistics, (2002)
E. Riloff and R. Jones. AAAI '99/IAAI '99: Proceedings of the sixteenth national conference on Artificial intelligence and the eleventh Innovative applications of artificial intelligence conference, page 474--479. Menlo Park, CA, USA, American Association for Artificial Intelligence, (1999)
M. Mintz, S. Bills, R. Snow, and D. Jurafsky. Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, page 1003--1011. Suntec, Singapore, Association for Computational Linguistics, (August 2009)
F. Reichartz, H. Korte, and G. Paass. Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, page 365--368. Suntec, Singapore, Association for Computational Linguistics, (August 2009)
H. Isozaki and H. Kazawa. Proceedings of the 19th international conference on Computational linguistics, page 1--7. Morristown, NJ, USA, Association for Computational Linguistics, (2002)
B. Zapirain, E. Agirre, and L. Màrquez. Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, page 73--76. Suntec, Singapore, Association for Computational Linguistics, (August 2009)