Inproceedings

Efficiently incorporating user feedback into information extraction and integration programs

, , , and .
Proceedings of the 35th SIGMOD international conference on Management of data, pages 87-100. New York, NY, USA, ACM, (2009)
DOI: 10.1145/1559845.1559857

Abstract

Many applications increasingly employ information extraction and integration (IE/II) programs to infer structures from unstructured data. Automatic IE/II programs are inherently imprecise. Hence such programs often make many IE/II mistakes, and thus can significantly benefit from user feedback. Today, however, there is no good way to automatically provide and process such feedback. When finding an IE/II mistake, users often must alert the developer team (e.g., via email or Web form) about the mistake, and then wait for the team to manually examine the program internals to locate and fix the mistake, a slow, error-prone, and frustrating process.

In this paper we propose a solution for users to directly provide feedback and for IE/II programs to automatically process such feedback. In our solution a developer U uses hlog, a declarative IE/II language, to write an IE/II program P. Next, U writes declarative user feedback rules that specify which parts of P's data (e.g., input, intermediate, or output data) users can edit, and via which user interfaces. Then the so-augmented program P is executed and enters a loop of waiting for and incorporating user feedback. Given user feedback F on a data portion of P, we show how to automatically propagate F to the rest of P, and to seamlessly combine F with prior user feedback. We describe the syntax and semantics of hlog, a baseline execution strategy, and then various optimization techniques. Finally, we describe experiments with real-world data that demonstrate the promise of our solution.
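The loop described in the abstract (run P, accept feedback F on one data portion, propagate F downstream, merge it with earlier feedback) can be illustrated with a toy sketch. This is not hlog and not the authors' algorithm: the "name: price" input format, the functions `extract`, `apply_feedback`, and `output`, and the policy that newer feedback overrides older feedback on the same tuple are all hypothetical assumptions. Feedback here targets an intermediate relation and is propagated by simply recomputing the downstream rule from scratch; none of the paper's optimizations are reproduced.

```python
# Hypothetical sketch, not the paper's implementation: feedback on intermediate data.
from typing import Dict, List, Set, Tuple

Key = Tuple[str, int]   # hypothetical tuple id: (doc_id, line number)
Row = Dict[str, str]    # one tuple of the intermediate relation
Edit = tuple            # ("delete", key) or ("update", key, field, value)

def extract(docs: Dict[str, str]) -> Dict[Key, Row]:
    """Toy extractor: every 'name: price' line becomes one intermediate tuple."""
    rows: Dict[Key, Row] = {}
    for doc_id, text in docs.items():
        for i, line in enumerate(text.splitlines()):
            if ":" in line:
                name, price = line.split(":", 1)
                rows[(doc_id, i)] = {"name": name.strip(), "price": price.strip()}
    return rows

def apply_feedback(rows: Dict[Key, Row], feedback: List[Edit]) -> Dict[Key, Row]:
    """Overlay all feedback, oldest first, so a newer edit wins on the same key and field."""
    fixed = {key: dict(row) for key, row in rows.items()}  # copy; never mutate extractor output
    for op, key, *rest in feedback:
        if op == "delete":
            fixed.pop(key, None)
        elif op == "update" and key in fixed:
            field, value = rest
            fixed[key][field] = value
    return fixed

def output(rows: Dict[Key, Row]) -> Set[str]:
    """Downstream rule: names of items whose price parses as a number above 10."""
    result: Set[str] = set()
    for row in rows.values():
        try:
            if float(row["price"]) > 10:
                result.add(row["name"])
        except ValueError:
            pass  # an unparsable price silently drops the tuple, like an imprecise rule
    return result

docs = {"d1": "Lamp: 25\nMug: 8\nChair: 4O"}  # "4O" mimics an OCR-style error
rows = extract(docs)
print(sorted(output(rows)))                            # ['Lamp']
feedback: List[Edit] = [("update", ("d1", 2), "price", "40")]
print(sorted(output(apply_feedback(rows, feedback))))  # ['Chair', 'Lamp']
feedback.append(("delete", ("d1", 0)))                 # combined with the earlier fix
print(sorted(output(apply_feedback(rows, feedback))))  # ['Chair']
```

Keeping the raw extractor output untouched and replaying the full feedback log on top of it makes "combine F with prior feedback" a matter of ordering; a real system would avoid recomputing everything on each edit.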

Users

  • @jaeschke
  • @dblp

Comments and Reviews

  • @jaeschke
    12 years ago
    The paper presents an information extraction (IE) approach that builds on a declarative language (hlog, an extension of xlog, similar to Datalog) for describing the extraction rules. As with incremental maintenance of materialized views, user input updates the extracted data (not the rules!). The basic idea is to store, together with the user's input, the range of input data from which the affected tuples originate, so that later, when the execution plan is re-executed, the user feedback can be propagated efficiently. Very well written and interesting. Although I expected something more sophisticated (user feedback that directly modifies or generates rules), I must acknowledge that even this approach is complicated enough.
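The reviewer's summary of the core idea (keep each user edit together with the input range that produced the affected tuple, so the edit can be re-applied when the plan runs again) can be sketched in Python as below. Everything here is a hypothetical illustration, not the paper's mechanism: the `$<digits>` extraction rule, the names `extract`, `FeedbackStore`, `record`, and `reapply`, and the guard that ignores an edit once the text under its span has changed are all assumptions. The sketch re-runs the extractor in full; it only shows how span-keyed feedback can be matched back to regenerated tuples, not the incremental re-execution the review alludes to.

```python
# Hypothetical sketch, not the paper's implementation: span-keyed user feedback.
import re
from typing import Dict, List, Optional, Tuple

Span = Tuple[str, int, int]         # (doc_id, start, end) character offsets in the input
Extracted = List[Tuple[Span, str]]  # extracted values, each tagged with its source span

PRICE = re.compile(r"\$(\d+)")      # hypothetical extraction rule

def extract(docs: Dict[str, str]) -> Extracted:
    """Toy extractor: each '$<digits>' match becomes a tuple tagged with its input span."""
    out: Extracted = []
    for doc_id, text in sorted(docs.items()):
        for m in PRICE.finditer(text):
            out.append(((doc_id, m.start(), m.end()), m.group(1)))
    return out

class FeedbackStore:
    """Keeps user edits keyed by the input span that produced the edited tuple."""

    def __init__(self) -> None:
        # span -> (input text at that span when the edit was made, new value or None for delete)
        self._edits: Dict[Span, Tuple[str, Optional[str]]] = {}

    def record(self, docs: Dict[str, str], span: Span, new_value: Optional[str]) -> None:
        doc_id, start, end = span
        self._edits[span] = (docs[doc_id][start:end], new_value)

    def reapply(self, docs: Dict[str, str], tuples: Extracted) -> Extracted:
        """Match fresh tuples to stored edits by span; skip edits whose source text changed."""
        result: Extracted = []
        for span, value in tuples:
            doc_id, start, end = span
            edit = self._edits.get(span)
            if edit is not None and docs[doc_id][start:end] == edit[0]:
                if edit[1] is None:
                    continue       # the user deleted this tuple
                value = edit[1]    # the user corrected this tuple
            result.append((span, value))
        return result

docs = {"a": "Lamp $25, Mug $8"}
store = FeedbackStore()
store.record(docs, ("a", 14, 16), "18")  # user: the Mug price should be 18
docs["b"] = "Chair $40"                   # input grows; the plan is re-executed
print(store.reapply(docs, extract(docs)))
# [(('a', 5, 8), '25'), (('a', 14, 16), '18'), (('b', 6, 9), '40')]
```

Keying edits by source span rather than by tuple identity is what lets the correction survive re-execution: the regenerated tuple is new, but it comes from the same stretch of input.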