- How to remove ._ files when creating tarballs on OS X.
- If you install MacPorts, you can install gcc_select and then choose your GCC version:
  - Install it: /opt/local/bin/port install gcc_select
  - List the available versions: port select --list gcc
  - Select a version: sudo port select --set gcc gcc40
- The following article describes how to configure a CentOS 5.x-based or CentOS 6.x-based system to use the Fedora EPEL repos and the third-party Remi package repos. These package repositories are not officially supported by CentOS, but they provide much more current versions of popular applications like PHP or MySQL.
- Bag-of-words document data sets.
- Elefant (Efficient Learning, Large-scale Inference, and Optimisation Toolkit) is an open source library for machine learning licensed under the Mozilla Public License (MPL). We develop an open source machine learning toolkit which provides
- A variety of ways to count bits fast.
- Using GCC ASM syntax for SIMD instructions.
- Discussion about throughput of SIMD XOR instructions on x86.
- The FOSS in Research and Student Innovation Miniconf brings together researchers and students with an active interest in Free and Open Source Software with the broader Linux.conf.au community to highlight exciting work taking place within the often esoteric world of academia and educational institutions. The Miniconf is part of Linux.conf.au 2011, being held at the QUT Gardens Point Campus in Brisbane, Queensland in January. Topics are split into two streams: FOSS in Research, which invites presentations on research relating to Free and Open Source Software; and Student Innovation, which explores new and exciting work in the FOSS world conducted by students. Presentations may be proposed in a 25-minute talk format (20 minutes talk + 5 minutes discussion).
- Information on comparing floats.
- This website provides tutorials and sample course content so CS students and educators can learn more about current computing technologies and paradigms. In particular, this content is Creative Commons licensed, which makes it easy for CS educators to use in their own classes. The Courses section contains tutorials, lecture slides, and problem sets for a variety of topic areas: AJAX Programming, Algorithms, Distributed Systems, Web Security, and Languages. In the Tools 101 section, you will find a set of introductions to some common tools used in Computer Science such as version control systems and databases. The CS Curriculum Search will help you find teaching materials that have been published to the web by faculty from CS departments around the world. You can refine your search to display just lectures, assignments or reference materials for a set of courses.
- This is a "tree of all knowledge" category, a top-level place to start when browsing Wikipedia categories for articles. This is the top level in terms of encyclopedia article function and content. It is intended to contain all and only the few most fundamental ontological categories which can reasonably be expected to contain every possible Wikipedia article under their category trees. These categories are: physical entities; biological entities; social entities; and intellectual entities. An alternative root category, based on a somewhat more detailed initial classification, is Category:Main topic classifications.
- This is a list of Wikipedia's major topic classifications. These are used throughout Wikipedia to organize the presentation of links to articles on its various reference systems, including Wikipedia's lists, portals, and categories.
- Wikipedia is a terrific knowledge resource, and many recent studies in artificial intelligence, information retrieval and related fields used Wikipedia to endow computers with (some) human knowledge. Wikipedia dumps are publicly available in XML format, but they have a few shortcomings. First, they contain a lot of information that is often not used when Wikipedia texts are used as knowledge (e.g., ids of users who changed each article, timestamps of article modifications). On the other hand, the XML dumps do not contain a lot of useful information that could be inferred from the dump, such as link tables, category hierarchy, resolution of redirection links etc.
- Free and Open Source Software (FOSS) for Sun Microsystems' Solaris.
- THE ENGINEER'S ULTIMATE GUIDE TO WAVELET ANALYSIS: The Wavelet Tutorial
- Commonly used commands from the vim text editor.
- BitC is a new systems programming language. It seeks to combine the flexibility, safety, and richness of Standard ML or Haskell with the low-level expressiveness of C.
- The GWT Window Manager provides a high-level windowing system for GWT applications. It offers a desktop component, dialog features, free-floating windows and more. Try it yourself and feel free to use it; it's free!
- Due to an explosion of data, there has been an increasing demand for scalable machine learning and data mining algorithms in many applications, such as social network analysis, information retrieval, recommendation systems, biology applications, multimedia, and e-commerce. The objective of this special issue is to connect academia and industry on the methods and experiences of large-scale data analysis. We look for scalable machine learning, data mining algorithms, implementations, frameworks and case studies that target real and practical scenarios for large datasets. The focus is to identify the real challenges in large-scale data mining and to investigate the scalable methods and practical solutions of the core machine learning and data mining problems with respect to both theoretical and experimental perspectives.
- The M-tree is an index structure that can be used for the efficient resolution of similarity queries on complex objects to be compared using an arbitrary metric.
- A DBSCAN implementation in C++ using Boost and the STL.
- Consensus clustering has emerged as an important elaboration of the classical clustering problem. Consensus clustering, also called aggregation of clustering (or partitions), refers to the situation in which a number of different (input) clusterings have been obtained for a particular dataset and it is desired to find a single (consensus) clustering which is a better fit in some sense than the existing clusterings. Consensus clustering is thus the problem of reconciling clustering information about the same data set coming from different sources or from different runs of the same algorithm. When cast as an optimization problem, consensus clustering is known as median partition, and has been shown to be NP-complete.
- OnlyWire syndicates your content and articles to the web's top social networking sites with a single button click. The OnlyWire Bookmark & Share button gives your website and blog visitors the ability to post your content to all of their social networking sites.
- Information on convex hull algorithms, including algorithms for high dimensional spaces.
- Broadly speaking, there are two no free lunch theorems. One for supervised machine learning and one for search/optimization.
- Pseudocode package for LaTeX that formats pseudocode in the style of the Introduction to Algorithms textbook.
- Categories are pages that are used to group other pages on similar subjects together. This is done to help users find the pages they are looking for, even if they do not know whether it exists or what it is called. Every page should belong to at least one category. A page may often be in several categories. However, putting a page in too many categories may not be useful.
- Python documentation for os.path
- How do you write standalone scripts which make use of Django components?
- Résumé, Curriculum Vitae or simply CV is an important brief about your professional life. It is likely to be one of the first contacts with a prospective employer. Curriculum Vitae means course of life in Latin. So what exactly should a Résumé contain and how detailed should it be? There is no silver bullet answer. ...
- MegaMap is a Java implementation of a map (or hashtable) that can store an unbounded amount of data, limited only by the amount of disk space available. Objects stored in the map are persisted to disk. Good performance is achieved by an in-memory cache. The MegaMap can, for all practical reasons, be thought of as a map implementation with unlimited storage space.
- EM has been shown to have favorable convergence properties, automatic satisfaction of constraints, and fast convergence. The next section explains the traditional approach to deriving the EM algorithm and proving its convergence property. Section 3.3 covers the interpretation of the EM algorithm as the maximization of two quantities: the entropy and the expectation of complete-data likelihood. Then, the K-means algorithm and the EM algorithm are compared. The conditions under which the EM algorithm is reduced to the K-means are also explained. The discussion in Section 3.4 generalizes the EM algorithm described in Sections 3.2 and 3.3 to problems with partial-data and hidden-state. We refer to this new type of EM as the doubly stochastic EM. Finally, the chapter is concluded in Section 3.5.
- In a recent piece called Strong Typing vs. Strong Testing, noted programmer and author Bruce Eckel makes an argument that dynamically typed languages such as Python are superior to statically typed languages such as Java and C++. I've done quite a bit of Python and Java programming, and even a little C++, so I can appreciate his position, but I think the conclusion goes too far. Whether Python is more productive than C++ or Java is one thing, whether static typing in general should be abandoned is quite another.
- MogileFS is our open source distributed filesystem.
- This is a guide to the LaTeX markup language. It is intended that this can serve as a useful resource for everyone from new users who wish to learn, to old hands who need a quick reference.
- The MCL algorithm is short for the Markov Cluster Algorithm, a fast and scalable unsupervised cluster algorithm for graphs based on simulation of (stochastic) flow in graphs.
- flexbackup is for you if you have a single or small number of machines, amanda is "too much", and tarring things up by hand isn't nearly enough...
- JCublas provides Java bindings for the NVIDIA CUDA BLAS implementation, thus making the parallel processing power of modern graphics hardware available for Java programs.
- In mathematics and physics, a small-world network is a type of mathematical graph in which most nodes are not neighbors of one another, but most nodes can be reached from every other by a small number of hops or steps. A small world network, where nodes represent people and edges connect people that know each other, captures the small world phenomenon of strangers being linked by a mutual acquaintance.
- The Community Z Tools (CZT) project is building a set of tools for editing, typechecking and animating formal specifications written in the Z specification language, with some support for Z extensions such as Object-Z, Circus, and TCOZ. These tools are all built using the CZT Java framework for Z tools.
- This series of three talks will give a nontechnical, high level overview of geometric complexity theory (GCT), which is an approach to the P vs. NP problem via algebraic geometry, representation theory, and the theory of a new class of quantum groups, called nonstandard quantum groups, that arise in this approach.
- A list of cluster papers. Includes some links to source code.
- Locality-Sensitive Hashing (LSH) is an algorithm for solving the (approximate/exact) Near Neighbor Search in high dimensional spaces.
- Delta Debugging automates the scientific method of debugging. The Delta Debugging algorithm isolates failure causes automatically - by systematically narrowing down failure-inducing circumstances until a minimal set remains.
- Netlib is a collection of mission-critical software components for linear algebra systems (i.e. working with vectors or matrices). Netlib libraries are written in C, Fortran or optimised assembly code. A Java translation has been provided by the F2J project but it does not take advantage of optimised system libraries.
- If two numbers b and c have the property that their difference b-c is integrally divisible by a number m (i.e., (b-c)/m is an integer), then b and c are said to be "congruent modulo m."
- Welcome to OSDev.org, the largest online community of operating system developers. If you want to learn how to write your own OS we have all the information to get you started. Read our OS development wiki to learn where to start. The forums are a great place to discuss OS theory and ask for help when you get stuck. Don't forget to add a link on the OS List to your OS project once it gets going.
- Compare k-means and PAM. PAM is also known as k-medoids.
- A list of open source collection libraries for Java.
- fastutil extends the Java™ Collections Framework by providing type-specific maps, sets, lists and queues with a small memory footprint and fast access and insertion; it also includes a fast I/O API for binary and text files. It is free software distributed under the GNU Lesser General Public License.
- Useful bullet points on different types of clustering.
- The notes below apply to technical papers in computer science and electrical engineering, with emphasis on papers in systems and networks.
- In mathematical logic, Gödel's incompleteness theorems, proved by Kurt Gödel in 1931, are two theorems stating inherent limitations of all but the most trivial formal systems for arithmetic of mathematical interest. The theorems are of considerable importance to the philosophy of mathematics. They are widely regarded as showing that Hilbert's program to find a complete and consistent set of axioms for all of mathematics is impossible, thus giving a negative answer to Hilbert's second problem.
- While still primarily a research project, transactional memory shows promise for making parallel programming easier.
- Comparison of naive Bayes classifiers, support vector machines and modular multilayer perceptron neural networks.
- The goal of this book is to provide practical information on how to gain the largest possible benefit from your connection to the Internet. By applying the monitoring and optimisation techniques discussed here, the effectiveness of your network can be significantly improved.
- Excellence of any sort--excellent dancing, excellent quarterbacking, excellent woodworking--has no waste. You fix wordy writing by doing the same job using fewer words.
- We've all heard of 'six degrees of separation', the idea that everyone in the world can be connected in just a few steps. But what if those steps don't just relate to people but also to viruses, neurons, proteins and even to fashion trends? What if this 'six degrees of separation' allowed us an insight into something at the core of Nature?
- This tool performs spectral clustering using either a sparse similarity matrix (nearest neighbors) or the Nyström method.
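
The definition of congruence modulo m quoted in the list above (b and c are congruent modulo m when b-c is divisible by m) can be checked directly. The following is a minimal sketch in Python; the function name `is_congruent` is a hypothetical helper chosen for illustration and does not come from any of the linked pages.

```python
def is_congruent(b: int, c: int, m: int) -> bool:
    """Return True if b and c are congruent modulo m, i.e. m divides b - c."""
    if m == 0:
        # Division by zero is undefined, so a zero modulus is rejected here.
        raise ValueError("modulus m must be nonzero")
    return (b - c) % m == 0


print(is_congruent(17, 5, 12))   # True: 17 - 5 = 12 is divisible by 12
print(is_congruent(17, 6, 12))   # False: 17 - 6 = 11 is not divisible by 12
print(is_congruent(-1, 11, 12))  # True: -1 - 11 = -12 is divisible by 12
```

Testing the difference rather than comparing `b % m` with `c % m` mirrors the wording of the definition; for integers the two checks agree.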

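For the entry on comparing floats, one common approach is to compare with a tolerance instead of testing exact equality. The sketch below uses Python's standard `math.isclose`; the specific values and the `abs_tol` setting are illustrative assumptions, not recommendations taken from the linked page.

```python
import math

a = 0.1 + 0.2
b = 0.3

# Exact equality fails because 0.1, 0.2 and 0.3 are not exactly representable in binary.
print(a == b)              # False

# A relative tolerance (default rel_tol=1e-09) treats the two values as equal.
print(math.isclose(a, b))  # True

# Relative tolerance alone is not enough near zero; an absolute tolerance is needed there.
print(math.isclose(1e-12, 0.0))                # False
print(math.isclose(1e-12, 0.0, abs_tol=1e-9))  # True
```

Which tolerance is appropriate depends on the magnitude of the values and on how much rounding error the computation accumulates.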