bookmarks  2

  •  

    This project aims to develop an efficient rule based extractor of entries of references, located in scientific articles in English language. The application takes a pdf file or a directory of pdf and then returns an html file, containing the list of all entries with their respective title. Moreover the title of the article cited is searched through Google Web Service to get the URL that identifying the article on the web. If the URL provides on the page a Bibtex entry, this will appear in the html output under the relative entries, stolen from some typical site like citeseer, ieeexlpore etc. The application does not make search over pdf file based on images.
    15 years ago by @pitman
    (0)
     
     
  •  

    Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries. For more information about Tika, please see the list of supported document formats and the available documentation . You can find the latest release on the download page . See the Getting Started guide for instructions on how to start using Tika. Tika is a subproject of Apache Lucene . Lucene is a project of the Apache Software Foundation .
    15 years ago by @pitman
    (0)
     
     
  • ⟨⟨
  • 1
  • ⟩⟩

publications  

    No matching posts.
  • ⟨⟨
  • ⟩⟩