Towards domain-independent information extraction from web tables

W. Gatterbauer, P. Bohunsky, M. Herzog, B. Krüpl, and B. Pollak. Proceedings of the 16th international conference on World Wide Web (WWW '07), pages 71–80. ACM, New York, NY, USA (2007).
DOI: 10.1145/1242572.1242583

Abstract

Traditionally, information extraction from web tables has focused on small, more or less homogeneous corpora, often based on assumptions about the use of <table> tags. A multitude of different HTML implementations of web tables make these approaches difficult to scale. In this paper, we approach the problem of domain-independent information extraction from web tables by shifting our attention from the tree-based representation of webpages to a variation of the two-dimensional visual box model used by web browsers to display the information on the screen. The topological and style information thereby obtained allows us to fill the gap created by missing domain-specific knowledge about content and table templates. We believe that, in a future step, this approach can become the basis for a new way of large-scale knowledge acquisition from the current "Visual Web."

Links and resources

BibTeX key: gatterbauer2007towards
entry type: inproceedings
address: New York, NY, USA
booktitle: Proceedings of the 16th international conference on World Wide Web
year: 2007
pages: 71--80
publisher: ACM
series: WWW '07
timestamp: 2012-09-20 02:56:03
username: porta
intrahash: a5be13781838c20be5ec3bc4ad72556b
location: Banff, Alberta, Canada
acmid: 1242583
isbn: 978-1-59593-654-7
interhash: 61bd631988fe5a7495e3c54586d794f9
numpages: 10
groups: public
DOI: 10.1145/1242572.1242583
url: http://doi.acm.org/10.1145/1242572.1242583

BibSonomy