English translation of selected chapters of the WikiWord thesis "Automatischer Aufbau eines multilingualen Thesaurus durch Extraktion semantischer und lexikalischer Relationen aus der Wikipedia" by Daniel Kinzler. Translation by the author.
My diploma thesis about a system to automatically build a multilingual thesaurus from Wikipedia, "WikiWord", is finally done. I handed it in yesterday. My research will hopefully help to make Wikipedia more accessible for automatic processing.
The data here is useful for testing classification and clustering, and for measuring the accuracy of indexing techniques. However, the datasets are too small to support claims about the efficiency of indexing.
The UDC Summary of around 2,000 classes has been online since October 2009 and can now be browsed in 10 languages here.
The UDC Summary is fully aligned with the UDC MRF 2009, which is going to be released in the following months. This set is made available for free use under the Creative Commons Attribution Share Alike 3.0 license (CC-BY-SA).
The origins of the Mundaneum date back to the end of the 19th century. Created on the initiative of two Belgian lawyers, Paul Otlet and Henri La Fontaine, the project aimed to gather all of the world's knowledge and to classify it according to the system of Classi
Collection of information about biodiversity compiled collaboratively by hundreds of expert and amateur contributors. Contains pictures, text, and other information for species living or extinct and the hierarchy of life, phylogeny and evolution.
Other articles where Thesaurus is discussed: library: Thesauri: A new use of the term thesaurus, now widespread, dates from the early 1950s in the work of H.P. Luhn, at International Business Machines Corporation (IBM), who was searching for a computer process that could create a list of authorized terms for the indexing…
"They are built to be human-usable (...) are targeted primarily for storage/retrieval of personal information and serendipitous discovery of group information . (...) The development communities for each are abuzz with ideas for exploiting the structure"
Bow (or libbow) is a library of C code useful for writing statistical text analysis, language modeling and information retrieval programs. The current distribution includes the library, as well as front-ends for document classification (rainbow), document
In this post, I want to show how I use NLTK for preprocessing and tokenization, but then apply machine learning techniques (e.g. building a linear SVM using stochastic gradient descent) using Scikit-Learn.
In this paper we propose a type of Bayesian network that we call the hierarchical Bayesian network (HBN) classifier. We present algorithms for the construction of HBN classifiers and test them on the Reuters text categorization test collection
"(...) tagging system is not "controlled" in this sense (...), but I'm wondering whether its web-scale nature can provide some benefit that one would not expect."
TIE is a project for application identification through network traffic analysis (aka Traffic Classification, Traffic Identification, etc.). We aim at building a common platform for the study and the development of traffic classification techniques by fostering collaboration among researchers and practitioners. TIE offers an open-source platform working as a multiple classifier system able to combine multiple classification techniques (implemented as separate plugins) and adopting different strategies of decision combination.
C. Hoede and L. Zhang. Proceedings of the 9th International Conference on Conceptual Structures (ICCS 2001), volume 2120 of Lecture Notes in Computer Science, page 15--28. Springer, (2001)
J. Hopcroft, T. Lou, and J. Tang. Proceedings of the 20th ACM International Conference on Information and Knowledge Management, page 1137--1146. New York, NY, USA, ACM, (2011)
S. Wu, J. Hofman, W. Mason, and D. Watts. Proceedings of the 20th international conference on World wide web, page 705--714. New York, NY, USA, ACM, (2011)
G. Krempl, D. Bodnar, and A. Hrubos. Advances in Intelligent Data Analysis XIV - 14th Int. Symposium, IDA 2015, St. Etienne, France, volume 9385 of Lecture Notes in Computer Science, page XXII--XXIII. Springer, (2015)
D. Shen, Z. Chen, Q. Yang, H. Zeng, B. Zhang, Y. Lu, and W. Ma. Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, page 242--249. New York, NY, USA, ACM, (2004)