ASV Toolbox is a modular collection of tools for the exploration of written language data. They work either on word lists or text and solve several linguistic classification and clustering tasks. The topics covered contain language detection, POS-tagging, base form reduction, named entity recognition, and terminology extraction.
The Calais web service automatically attaches rich semantic metadata to the content you submit. Using natural language processing, machine learning and other methods, Calais categorizes and links your document with entities (people, places, organizations,
This is an overview of the open source NLP and machine learning tools for text mining, information extraction, text classification, clustering, approximate string matching, language parsing and tagging, and more.
Alle Programme und Resourcen auf der Liste sind frei, d.h. kostenlos (für Forschungszwecke) verfügbar, auf deutschsprachige Texte anwendbar und sofort startklar, d.h. sie müssen nicht erst mit Hilfe von z.B. annotierten Korpora trainiert werden. Die Liste ist natürlich unvollständig (Stand 22.5.2007).
AI Related Ruby Extensions This page will maintain list of AI related extensions/modules/gems for the Ruby programming language. Please contact me if you know something I missed.
This is the project page for SecondString, an open-source Java-based package of approximate string-matching techniques. This code was developed by researchers at Carnegie Mellon University from the Center for Automated Learning and Discovery, the Department of Statistics, and the Center for Computer and Communications Security.
The string2string library is an open-source tool that offers a comprehensive suite of efficient algorithms for a broad range of string-to-string problems. It includes both traditional algorithmic solutions and recent advanced neural approaches to address various problems in pairwise string alignment, distance measurement, lexical and semantic search, and similarity analysis. Additionally, the library provides several helpful visualization tools and metrics to facilitate the interpretation and analysis of these methods.
OpenNLP is an organizational center for open source projects related to natural language processing. It hosts a variety of java-based NLP tools which perform sentence detection, tokenization, pos-tagging, chunking and parsing, named-entity detection, and coreference using the OpenNLP Maxent machine learning package.
TIGER API is a library which allows Java programmers to easily access the structure of any corpus given as a TIGER-XML file. It can process the TIGER corpus and any other corpus encoded in TIGER-XML. The underlying API specifies a Java object model for corpora encoded in TIGER-XML and provides methods for traversing syntax trees and accessing elements such as sentences, syntax graph nodes, and their attributes.
In October, I read a fascinating article on GQ.com about head injuries among former NFL players. Written by Jeanne Marie Laskas, the article was a forensic
Source code to repeat the paper evaluation: We present the first unsupervised approach to the problem of learning a semantic parser, using Markov logic. Our USP system transforms dependency trees into quasi-logical forms, recursively induces lambda forms from these, and clusters them to abstract away syntactic variations of the same meaning. The MAP semantic parse of a sentence is obtained by recursively assigning its parts to lambda-form clusters and composing them. We evaluate our approach by using it to extract a knowledge base from biomedical abstracts and answer questions. USP substantially outperforms TextRunner, DIRT and an informed baseline on both precision and recall on this task.