pitman > extraction

bookmarks (hide)2
display
all
bookmarks only
bookmarks per page
5
10
20
50
100
sort by
added at
title
RSS
BibTeX
XML

1pdftoref - Google Code
This project aims to develop an efficient rule based extractor of entries of references, located in scientific articles in English language. The application takes a pdf file or a directory of pdf and then returns an html file, containing the list of all entries with their respective title. Moreover the title of the article cited is searched through Google Web Service to get the URL that identifying the article on the web. If the URL provides on the page a Bibtex entry, this will appear in the html output under the relative entries, stolen from some typical site like citeseer, ieeexlpore etc. The application does not make search over pdf file based on images.
15 years ago by @pitman
show all tags
bibliography
extraction
pdf
reference
bibliographyextractionpdfreference
(0)
copydelete
- community post
- history of this post
1Apache Tika - Apache Tika
Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries. For more information about Tika, please see the list of supported document formats and the available documentation . You can find the latest release on the download page . See the Getting Started guide for instructions on how to start using Tika. Tika is a subproject of Apache Lucene . Lucene is a project of the Apache Software Foundation .
15 years ago by @pitman
show all tags
data
extraction
parsing
structured
dataextractionparsingstructured
(0)
copydelete
- community post
- history of this post

⟨⟨
⟨
1
⟩
⟩⟩

publications (hide)
display
all
publications only
publications per page
5
10
20
50
100
sort by
added at
title
author
publication date
entry type
help for advanced sorting...
RSS
BibTeX
RDF
more...

No matching posts.

⟨⟨
⟨
⟩
⟩⟩

BibSonomy

bookmarks (hide)2
display
all
bookmarks only
bookmarks per page
5
10
20
50
100
sort by
added at
title
RSS
BibTeX
XML

1pdftoref - Google Code

1Apache Tika - Apache Tika

publications (hide)
display
all
publications only
publications per page
5
10
20
50
100
sort by
added at
title
author
publication date
entry type
help for advanced sorting...
RSS
BibTeX
RDF
more...

browse

related tags

concepts

tags

BibSonomy

bookmarks (hide)2 displayallbookmarks onlybookmarks per page5102050100 sort byadded attitle RSSBibTeXXML

1pdftoref - Google Code

1Apache Tika - Apache Tika

publications (hide) displayallpublications onlypublications per page5102050100 sort byadded attitleauthorpublication dateentry typehelp for advanced sorting... RSSBibTeXRDFmore...

browse

related tags

concepts

tags

bookmarks (hide)2
display
all
bookmarks only
bookmarks per page
5
10
20
50
100
sort by
added at
title
RSS
BibTeX
XML

publications (hide)
display
all
publications only
publications per page
5
10
20
50
100
sort by
added at
title
author
publication date
entry type
help for advanced sorting...
RSS
BibTeX
RDF
more...