Efficiently incorporating user feedback into information extraction and integration programs

X. Chai, B. Vuong, A. Doan, and J. Naughton.
Proceedings of the 35th SIGMOD international conference on Management of data, pages 87--100. New York, NY, USA, ACM, (2009)
DOI: 10.1145/1559845.1559857

Abstract

Many applications increasingly employ information extraction and integration (IE/II) programs to infer structures from unstructured data. Automatic IE/II programs are inherently imprecise. Hence such programs often make many IE/II mistakes, and thus can significantly benefit from user feedback. Today, however, there is no good way to automatically provide and process such feedback. When finding an IE/II mistake, users often must alert the developer team (e.g., via email or Web form) about the mistake, and then wait for the team to manually examine the program internals to locate and fix the mistake, a slow, error-prone, and frustrating process. In this paper we propose a solution for users to directly provide feedback and for IE/II programs to automatically process such feedback. In our solution a developer U uses hlog, a declarative IE/II language, to write an IE/II program P. Next, U writes declarative user feedback rules that specify which parts of P's data (e.g., input, intermediate, or output data) users can edit, and via which user interfaces. The so-augmented program P is then executed and enters a loop of waiting for and incorporating user feedback. Given user feedback F on a data portion of P, we show how to automatically propagate F to the rest of P, and to seamlessly combine F with prior user feedback. We describe the syntax and semantics of hlog, a baseline execution strategy, and then various optimization techniques. Finally, we describe experiments with real-world data that demonstrate the promise of our solution.

BibTeX key: chai2009efficiently
entry type: inproceedings
address: New York, NY, USA
booktitle: Proceedings of the 35th SIGMOD international conference on Management of data
year: 2009
pages: 87--100
publisher: ACM
location: Providence, Rhode Island, USA
acmid: 1559857
isbn: 978-1-60558-551-2
numpages: 14
DOI: 10.1145/1559845.1559857
url: http://doi.acm.org/10.1145/1559845.1559857

Comments and Reviews

@jaeschke 12 years ago
The paper presents an information extraction (IE) approach that builds on a declarative language (hlog, an extension of xlog, similar to Datalog) to describe the extraction rules. As with incremental updates of materialized views, user input updates the extracted data (not the rules!). The basic idea is to store, together with the user's input, the range of the input data from which the affected tuples come, so that later, upon re-execution of the execution plan, the user feedback can be propagated efficiently. Very well written and interesting. Although I expected to find something more sophisticated (user feedback that directly modifies or generates rules), I must acknowledge that even this approach is complicated enough.

Cite this publication

%0 Conference Paper
%1 chai2009efficiently
%A Chai, Xiaoyong
%A Vuong, Ba-Quy
%A Doan, AnHai
%A Naughton, Jeffrey F.
%B Proceedings of the 35th SIGMOD international conference on Management of data
%C New York, NY, USA
%D 2009
%I ACM
%K
%P 87--100
%R 10.1145/1559845.1559857
%T Efficiently incorporating user feedback into information extraction and integration programs
%U http://doi.acm.org/10.1145/1559845.1559857
%X Many applications increasingly employ information extraction and integration (IE/II) programs to infer structures from unstructured data. Automatic IE/II are inherently imprecise. Hence such programs often make many IE/II mistakes, and thus can significantly benefit from user feedback. Today, however, there is no good way to automatically provide and process such feedback. When finding an IE/II mistake, users often must alert the developer team (e.g., via email or Web form) about the mistake, and then wait for the team to manually examine the program internals to locate and fix the mistake, a slow, error-prone, and frustrating process. In this paper we propose a solution for users to directly provide feedback and for IE/II programs to automatically process such feedback. In our solution a developer U uses hlog, a declarative IE/II language, to write an IE/II program P. Next, U writes declarative user feedback rules that specify which parts of P's data (e.g., input, intermediate, or output data) users can edit, and via which user interfaces. Next, the so-augmented program P is executed, then enters a loop of waiting for and incorporating user feedback. Given user feedback F on a data portion of P, we show how to automatically propagate F to the rest of P, and to seamlessly combine F with prior user feedback. We describe the syntax and semantics of hlog, a baseline execution strategy, and then various optimization techniques. Finally, we describe experiments with real-world data that demonstrate the promise of our solution.
%@ 978-1-60558-551-2

@inproceedings{chai2009efficiently,
  abstract = {Many applications increasingly employ information extraction and integration (IE/II) programs to infer structures from unstructured data. Automatic IE/II are inherently imprecise. Hence such programs often make many IE/II mistakes, and thus can significantly benefit from user feedback. Today, however, there is no good way to automatically provide and process such feedback. When finding an IE/II mistake, users often must alert the developer team (e.g., via email or Web form) about the mistake, and then wait for the team to manually examine the program internals to locate and fix the mistake, a slow, error-prone, and frustrating process. In this paper we propose a solution for users to directly provide feedback and for IE/II programs to automatically process such feedback. In our solution a developer U uses hlog, a declarative IE/II language, to write an IE/II program P. Next, U writes declarative user feedback rules that specify which parts of P's data (e.g., input, intermediate, or output data) users can edit, and via which user interfaces. Next, the so-augmented program P is executed, then enters a loop of waiting for and incorporating user feedback. Given user feedback F on a data portion of P, we show how to automatically propagate F to the rest of P, and to seamlessly combine F with prior user feedback. We describe the syntax and semantics of hlog, a baseline execution strategy, and then various optimization techniques. Finally, we describe experiments with real-world data that demonstrate the promise of our solution.},
  acmid = {1559857},
  added-at = {2012-06-19T17:05:41.000+0200},
  address = {New York, NY, USA},
  author = {Chai, Xiaoyong and Vuong, Ba-Quy and Doan, AnHai and Naughton, Jeffrey F.},
  biburl = {https://www.bibsonomy.org/bibtex/2d6c9fbf442a935dc0618107f8fb54d44/jaeschke},
  booktitle = {Proceedings of the 35th SIGMOD international conference on Management of data},
  doi = {10.1145/1559845.1559857},
  interhash = {5860215447e374b059597c0e3864e388},
  intrahash = {d6c9fbf442a935dc0618107f8fb54d44},
  isbn = {978-1-60558-551-2},
  keywords = {},
  location = {Providence, Rhode Island, USA},
  numpages = {14},
  pages = {87--100},
  publisher = {ACM},
  timestamp = {2012-06-19T17:05:41.000+0200},
  title = {Efficiently incorporating user feedback into information extraction and integration programs},
  url = {http://doi.acm.org/10.1145/1559845.1559857},
  year = 2009
}