The Natural Programming Project is working on making programming languages and environments easier to learn, more effective, and less error prone. We are taking a human-centered approach, first studying how people perform their tasks and then designing languages and environments around people's natural tendencies. We focus on all kinds of programmers, including professional programmers, novices who are trying to become experts, and end users who program to support other jobs or hobbies, such as multimedia authoring, simulations, teaching, prototyping, and other computer-supported activities.
NGramJ is a Java-based library containing two types of n-gram-based applications. Its main focus is to provide robust, state-of-the-art language recognition.
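A common way to do n-gram language recognition is the rank-order profile method of Cavnar & Trenkle. The source does not say whether NGramJ uses exactly this method, so treat the following as a minimal sketch of the general technique, not NGramJ's implementation. It builds a ranked profile of character n-grams (n = 1 to 5, with words padded by `_`) for each language and for the input text, then picks the language with the smallest "out-of-place" distance. The class and method names (`NGramLanguageGuesser`, `profile`, `outOfPlace`, `classify`) and the constants `MAX_N` and `PROFILE_SIZE` are hypothetical and are not part of NGramJ's API.

```java
import java.util.*;
import java.util.stream.Collectors;

// Hypothetical sketch of Cavnar & Trenkle-style language guessing; not NGramJ's API.
public class NGramLanguageGuesser {
    // Longest character n-gram considered (n = 1..MAX_N).
    static final int MAX_N = 5;
    // Number of top-ranked n-grams kept per profile (illustrative choice).
    static final int PROFILE_SIZE = 300;

    /** Builds a rank-ordered n-gram profile: the most frequent n-gram gets rank 0. */
    static Map<String, Integer> profile(String text) {
        Map<String, Integer> counts = new HashMap<>();
        // Split on runs of non-letters; pad each word with '_' to mark word boundaries.
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (word.isEmpty()) continue;
            String padded = "_" + word + "_";
            for (int n = 1; n <= MAX_N; n++) {
                for (int i = 0; i + n <= padded.length(); i++) {
                    counts.merge(padded.substring(i, i + n), 1, Integer::sum);
                }
            }
        }
        // Sort by descending frequency, breaking ties alphabetically for determinism.
        List<String> ranked = counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(PROFILE_SIZE)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
        Map<String, Integer> ranks = new HashMap<>();
        for (int r = 0; r < ranked.size(); r++) ranks.put(ranked.get(r), r);
        return ranks;
    }

    /** Out-of-place distance: sum of rank differences; n-grams absent from the language cost PROFILE_SIZE. */
    static int outOfPlace(Map<String, Integer> doc, Map<String, Integer> lang) {
        int total = 0;
        for (Map.Entry<String, Integer> e : doc.entrySet()) {
            Integer r = lang.get(e.getKey());
            total += (r == null) ? PROFILE_SIZE : Math.abs(r - e.getValue());
        }
        return total;
    }

    /** Returns the label of the language profile closest to the text's profile. */
    static String classify(String text, Map<String, Map<String, Integer>> langs) {
        Map<String, Integer> doc = profile(text);
        String best = null;
        int bestDist = Integer.MAX_VALUE;
        for (Map.Entry<String, Map<String, Integer>> e : langs.entrySet()) {
            int d = outOfPlace(doc, e.getValue());
            if (d < bestDist) {
                bestDist = d;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy training texts; a real system would train on much larger corpora.
        Map<String, Map<String, Integer>> langs = new HashMap<>();
        langs.put("english", profile("the quick brown fox jumps over the lazy dog and the cat"));
        langs.put("german", profile("der schnelle braune fuchs springt auf den faulen hund und die katze"));
        System.out.println(classify("the dog and the fox", langs)); // prints "english"
    }
}
```

Using ranks rather than raw frequencies makes profiles from texts of different lengths directly comparable, and the fixed penalty for missing n-grams is what lets very short inputs still be classified.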
Speech technology could allow everyone to participate in today's information revolution and help bridge the language barrier. Unfortunately, building speech processing systems requires significant resources. With some 6,900 languages in the world, speech processing has traditionally been prohibitively expensive for all but the most economically viable languages. Despite recent improvements in speech processing, supporting a new language is still a skilled job requiring significant effort from trained individuals. SPICE aims to overcome both limitations, the high resource cost and the need for expert effort, by providing an interactive language creation and evaluation toolkit that lets anyone develop speech processing models, collect appropriate data for model building, and evaluate the results, enabling iterative improvement.
Stanford CoreNLP provides a set of natural language analysis tools. Among other things, it can give the base forms of words and their parts of speech; recognize whether words are names of companies, people, and so on; normalize dates, times, and numeric quantities; mark up the structure of sentences in terms of phrases and word dependencies; indicate which noun phrases refer to the same entities; indicate sentiment; and extract open-class relations between mentions.
Libtextcat is a library with functions that implement the classification technique described in Cavnar & Trenkle, "N-Gram-Based Text Categorization" [1]. It was primarily developed for language guessing, a task on which it is known to perform with near-pe