Elefant (Efficient Learning, Large-scale Inference, and Optimisation Toolkit) is an open source library for machine learning licensed under the Mozilla Public License (MPL). We develop a machine learning toolkit which provides
algorithms for machine learning utilising the power of multi-core/multi-threaded processors/operating systems (Linux, Windows, Mac OS X),
a graphical user interface for users who want to quickly prototype machine learning experiments,
tutorials to support learning about Statistical Machine Learning (Statistical Machine Learning at The Australian National University), and
detailed and precise documentation for each of the above.
The workshop aims to discuss key issues and practices of semantic mining. Thanks to Linked Open Data initiatives and robust techniques for semantic annotation of Web, social, and sensor data, more semantic data is available. Many research efforts have been directed toward demonstrating semantic techniques to analyze and mine this growing resource. The workshop will provide a cross-disciplinary forum for researchers to showcase their innovations and efforts, to strengthen existing bonds, and to create new connections among different communities. We solicit contributions on the research and practice of mining data semantics, including theory, algorithms, and applications from computer science, life science, healthcare and other domains.
A great deal of research has focused on algorithms for learning features from unlabeled data. Indeed, much progress has been made on benchmark datasets like NORB and CIFAR by employing increasingly complex unsupervised learning algorithms and deep models. In this paper, however, we show that several very simple factors, such as the number of hidden nodes in the model, may be as important to achieving high performance as the choice of learning algorithm or the depth of the model. Specifically, we will apply several off-the-shelf feature learning algorithms (sparse auto-encoders, sparse RBMs, K-means clustering, and Gaussian mixtures) to the NORB and CIFAR datasets using only single-layer networks. We then present a detailed analysis of the effect of changes in the model setup: the receptive field size, the number of hidden nodes (features), the step-size (“stride”) between extracted features, and the effect of whitening. Our results show that large numbers of hidden nodes and dense feature extraction are as critical to achieving high performance as the choice of algorithm itself; so critical, in fact, that when these parameters are pushed to their limits, we are able to achieve state-of-the-art performance on both CIFAR and NORB using only a single layer of features. More surprisingly, our best performance is based on K-means clustering, which is extremely fast, has no hyper-parameters to tune beyond the model structure itself, and is very easy to implement. Despite the simplicity of our system, we achieve performance beyond all previously published results on the CIFAR-10 and NORB datasets (79.6% and 97.0% accuracy respectively).
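The setup parameters named in the abstract (receptive field size, stride, number of features) can be illustrated with a minimal pure-Python sketch of single-layer K-means feature learning. It is not the authors' implementation: the function names, the toy 8x8 random images, the per-patch normalization used in place of full whitening, and the hard-assignment histogram encoding are all assumptions made for illustration.

```python
import random

def extract_patches(image, field, stride):
    """Slide a field x field window over a 2-D image (list of rows) and
    return each window flattened into a list of floats."""
    rows, cols = len(image), len(image[0])
    patches = []
    for r in range(0, rows - field + 1, stride):
        for c in range(0, cols - field + 1, stride):
            patches.append([float(image[r + i][c + j])
                            for i in range(field) for j in range(field)])
    return patches

def normalize(patch, eps=1e-8):
    """Subtract the mean and divide by the standard deviation.
    Assumption: this stands in for whitening, which is omitted here."""
    mean = sum(patch) / len(patch)
    var = sum((v - mean) ** 2 for v in patch) / len(patch)
    scale = (var + eps) ** 0.5
    return [(v - mean) / scale for v in patch]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda k: sq_dist(point, centroids[k]))

def kmeans(points, k, iterations=10, seed=0):
    """Plain Lloyd's algorithm; k plays the role of the number of features."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for idx, group in enumerate(groups):
            if group:  # keep the old centroid if no point was assigned
                centroids[idx] = [sum(col) / len(group) for col in zip(*group)]
    return centroids

def encode(image, centroids, field, stride):
    """Histogram of nearest-centroid assignments over all patches of an image."""
    counts = [0] * len(centroids)
    for p in extract_patches(image, field, stride):
        counts[nearest(normalize(p), centroids)] += 1
    return counts

# Toy data: five random 8x8 "images" (hypothetical, for illustration only).
rng = random.Random(1)
images = [[[rng.random() for _ in range(8)] for _ in range(8)] for _ in range(5)]
train = [normalize(p) for img in images
         for p in extract_patches(img, field=4, stride=2)]
centroids = kmeans(train, k=6)
features = encode(images[0], centroids, field=4, stride=2)
```

A smaller stride yields more patches per image (denser extraction), and a larger `k` yields more features; these are the two knobs the abstract identifies as most critical.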
HEigen is a spectral analysis tool which computes the top k eigenvalues and corresponding eigenvectors of extremely large (~billions of nodes and edges) graphs. HEigen runs on top of the Hadoop platform.
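To make "top k eigenvalues" concrete, here is a small single-machine sketch that uses power iteration with deflation on an adjacency list. This is a swapped-in textbook technique chosen for brevity, not HEigen's own algorithm or its Hadoop implementation, which the description above does not specify; the function names and the toy graph are hypothetical. It assumes an undirected graph (symmetric adjacency matrix) whose leading eigenvalues are distinct in magnitude.

```python
import math
import random

def matvec(adj, x):
    """y = A x for an undirected graph stored as {node: [neighbors]},
    with nodes numbered 0..n-1. Only existing edges are touched."""
    return [sum(x[j] for j in adj[i]) for i in range(len(x))]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def top_k_eigs(adj, k, iterations=300, seed=0):
    """Return [(eigenvalue, eigenvector), ...] for the k eigenvalues of
    largest magnitude, found one at a time by power iteration."""
    rng = random.Random(seed)
    n = len(adj)
    found = []
    for _ in range(k):
        x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iterations):
            # Deflation: remove components along eigenvectors already found.
            for _, v in found:
                c = dot(x, v)
                x = [xi - c * vi for xi, vi in zip(x, v)]
            y = matvec(adj, x)
            norm = math.sqrt(dot(y, y))
            x = [yi / norm for yi in y]
        # Rayleigh quotient of the unit vector x estimates the eigenvalue.
        found.append((dot(x, matvec(adj, x)), x))
    return found

# Toy graph: a 4-clique (nodes 0-3) plus a separate triangle (nodes 4-6).
adj = {
    0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2],
    4: [5, 6], 5: [4, 6], 6: [4, 5],
}
eigs = top_k_eigs(adj, k=2)
```

For this toy graph the two leading eigenvalues are 3 (from the 4-clique) and 2 (from the triangle). The only operation that touches the graph is the matrix-vector product, which is the step a distributed system would need to parallelise.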
Atom Interface is a novel interactive visualization of single/multiple tree structures. It is based on the metaphor of electrons, atoms and molecules. For mo...