Autojar creates jar archives of minimal size from various sources (your own classes, directories, archives). Starting from one or more classes, it recursively searches the bytecode for further classes that are used; these are extracted from their archive if necessary and copied into the output file. The resulting archive contains all classes that are actually used, and only those. This makes it possible, for example, to keep the size and load time of applets small or to make applications independent of installed libraries.
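The recursive search described above is essentially a reachability computation over class references. The following is a minimal sketch of that idea only, not Autojar's implementation: it assumes the references have already been read from the bytecode into a plain dictionary, and the names `reachable_classes` and `REFERENCES` as well as all class names are hypothetical.

```python
from collections import deque


def reachable_classes(roots, references):
    """Return every class reachable from `roots` by following `references`.

    `references` maps a class name to the class names it uses. A real tool
    would obtain this by parsing class files; here it is given directly.
    """
    seen = set(roots)
    queue = deque(roots)
    while queue:
        current = queue.popleft()
        for dep in references.get(current, ()):
            if dep not in seen:  # visit each class once, so cycles terminate
                seen.add(dep)
                queue.append(dep)
    return seen


# Hypothetical reference graph (assumption, for illustration only).
REFERENCES = {
    "app.Main": ["lib.Parser", "lib.Logger"],
    "lib.Parser": ["lib.Token"],
    "lib.Logger": [],
    "lib.Token": [],
    "lib.Unused": ["lib.Token"],  # never referenced from app.Main
}

needed = reachable_classes(["app.Main"], REFERENCES)
```

Only the classes in `needed` would be copied into the output archive; `lib.Unused` is left out because nothing reachable from `app.Main` refers to it.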
The boilerpipe library provides algorithms to detect and remove the surplus "clutter" (boilerplate, templates) around the main textual content of a web page.
The library already provides specific strategies for common tasks (for example: news article extraction) and may also be easily extended for individual problem settings.
Extraction is very fast (milliseconds), needs only the input document (no global or site-level information), and is usually quite accurate.
Boilerpipe is a Java library written by Christian Kohlschütter. It is released under the Apache License 2.0.
The PDF format is one of the most common eBook types that you are likely to come across on the Internet. While OS X supports the reading of PDF files using the application Preview, and Apple’s portable devices now have
Extract, Transform, and Load (ETL) is a process in data warehousing that involves
* extracting data from outside sources,
* transforming it to fit business needs (which can include quality levels), and ultimately
* loading it into the end target, i.e. the data warehouse.
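The three steps above can be sketched with the Python standard library alone. This is a toy illustration under stated assumptions, not a production pipeline: the CSV text stands in for an outside source, an in-memory SQLite database stands in for the data warehouse, and the column names, the `orders` table, and the quality rule (drop rows without an amount) are all made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical source data (assumption): one row has a missing amount.
RAW_CSV = """order_id,customer,amount
1,alice,19.99
2,bob,
3,carol,5.00
"""


def extract(text):
    """Extract: read raw rows from the outside source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: enforce a quality rule and normalize types and units."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # quality rule: skip rows with no amount
        cleaned.append((
            int(row["order_id"]),
            row["customer"].title(),
            round(float(row["amount"]) * 100),  # store whole cents
        ))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount_cents INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT order_id, customer, amount_cents FROM orders ORDER BY order_id"
).fetchall()
```

Keeping the three stages as separate functions mirrors the definition: each stage can be tested or replaced on its own, for example by swapping the CSV reader for another source without touching the transform or load code.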
E. Rauch, M. Bukatin, and K. Baker. Proceedings of the HLT-NAACL 2003 Workshop on Analysis of Geographic References, Volume 1, pages 50-54. Stroudsburg, PA, USA, Association for Computational Linguistics (2003)