Abstract

Modern data-driven frameworks often have to process large amounts of data periodically. Hence, they often operate under time or space constraints. This also holds for Linked Data-driven frameworks when processing RDF data, in particular, when they perform link discovery tasks. In this work, we present a novel approach for link discovery under constraints pertaining to the expected recall of a link discovery task. Given a link specification, the approach aims to find a subsumed link specification that achieves a lower run time than the input specification while abiding by a predefined constraint on the expected recall it has to achieve. Our approach, dubbed LIGER, combines downward refinement operators with monotonicity assumptions to detect such specifications. We evaluate our approach on seven datasets. Our results suggest that the different implementations of LIGER can detect subsumed specifications that abide by expected recall constraints efficiently, thus leading to significantly shorter overall run times than our baseline.
