<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:burst="http://xmlns.com/burst/0.1/" xmlns:xsd="http://www.w3.org/2001/XMLSchema#" xmlns="http://purl.org/rss/1.0/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/" xmlns:swrc="http://swrc.ontoware.org/ontology#" xmlns:cc="http://web.resource.org/cc/"><channel rdf:about="http://www.bibsonomy.org/bibtex/3e9b05638c537f23a276ef4e09d4b9d4"><title>BibSonomy publications for /bibtex/3e9b05638c537f23a276ef4e09d4b9d4</title><link>http://www.bibsonomy.org/bibtex/3e9b05638c537f23a276ef4e09d4b9d4</link><description>BibSonomy RSS feed for /bibtex/3e9b05638c537f23a276ef4e09d4b9d4</description><dc:date>2012-02-17T08:27:23+01:00</dc:date><items><rdf:Seq><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/2e8d7e47dafc145c54846bb69e1c1be39/stroeh"/><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/278b3f3faced79adfcda4e3a57f7e57ff/mstrohm"/><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/22948189a910501dfdb86469a3e13505a/neilernst"/></rdf:Seq></items></channel><item rdf:about="http://www.bibsonomy.org/bibtex/2e8d7e47dafc145c54846bb69e1c1be39/stroeh"><title>On the resemblance and containment of documents</title><link>http://www.bibsonomy.org/bibtex/2e8d7e47dafc145c54846bb69e1c1be39/stroeh</link><dc:creator>stroeh</dc:creator><dc:date>2011-07-07T11:07:43+02:00</dc:date><dc:subject>detection duplicate resemblance </dc:subject><content:encoded>&lt;span class=&#034;authorEditorList&#034;&gt;&lt;a href=&#034;/author/Broder&#034;&gt;Andrei Z. Broder&lt;/a&gt; &lt;/span&gt;&lt;em&gt;Compression and Complexity of Sequences, &lt;/em&gt;&lt;em&gt;page 21--29. 
&lt;/em&gt;&lt;em&gt;Salerno, Italy, &lt;/em&gt;&lt;em&gt;IEEE Computer Society Press, &lt;/em&gt;(&lt;em&gt;June 1997&lt;/em&gt;)</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/detection"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/duplicate"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/resemblance"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/2e8d7e47dafc145c54846bb69e1c1be39/stroeh"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/2e8d7e47dafc145c54846bb69e1c1be39/stroeh"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#InProceedings"/><owl:sameAs rdf:resource="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.24.779&amp;rep=rep1&amp;type=pdf"/><swrc:date>Thu Jul 07 11:07:43 CEST 2011</swrc:date><swrc:address>Salerno, Italy</swrc:address><swrc:booktitle>Compression and Complexity of Sequences</swrc:booktitle><swrc:month>June</swrc:month><swrc:pages>21--29</swrc:pages><swrc:publisher><swrc:Organization swrc:name="IEEE Computer Society Press"/></swrc:publisher><swrc:title>On the resemblance and containment of documents</swrc:title><swrc:year>1997</swrc:year><swrc:keywords>detection duplicate resemblance </swrc:keywords><swrc:abstract>Given two documents A and B we define two mathematical notions: their
	resemblance r(A, B) and their containment c(A, B) that seem to capture
	well the informal notions of “roughly the same” and “roughly
	contained.” The basic idea is to reduce these issues to set intersection
	problems that can be easily evaluated by a process of random sampling
	that can be done independently for each document. Furthermore, the
	resemblance can be evaluated using a fixed size sample for each
	document. This paper discusses the mathematical properties of these
	measures and the efficient implementation of the sampling process
	using Rabin (1981) fingerprints</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="3" swrc:key="priority"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="562668" swrc:key="citeulike-article-id"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Andrei Z. Broder"/></rdf:_1></rdf:Seq></swrc:author></rdf:Description></burst:publication><description>Not previously uploaded</description></item><item rdf:about="http://www.bibsonomy.org/bibtex/278b3f3faced79adfcda4e3a57f7e57ff/mstrohm"><title>On the Resemblance and Containment of Documents</title><link>http://www.bibsonomy.org/bibtex/278b3f3faced79adfcda4e3a57f7e57ff/mstrohm</link><dc:creator>mstrohm</dc:creator><dc:date>2009-08-19T01:22:38+02:00</dc:date><dc:subject>INFLUENTIAL information-retrieval similarity </dc:subject><content:encoded>&lt;span class=&#034;authorEditorList&#034;&gt;&lt;a href=&#034;/author/Broder&#034;&gt;A. Broder&lt;/a&gt; &lt;/span&gt;&lt;em&gt;SEQUENCES &amp;#039;97: Proceedings of the Compression and Complexity of Sequences 1997, &lt;/em&gt;&lt;em&gt;page 21. 
&lt;/em&gt;&lt;em&gt;Washington, DC, USA, &lt;/em&gt;&lt;em&gt;IEEE Computer Society, &lt;/em&gt;(&lt;em&gt;1997&lt;/em&gt;)</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/INFLUENTIAL"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/information-retrieval"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/similarity"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/278b3f3faced79adfcda4e3a57f7e57ff/mstrohm"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/278b3f3faced79adfcda4e3a57f7e57ff/mstrohm"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#InProceedings"/><swrc:date>Wed Aug 19 01:22:38 CEST 2009</swrc:date><swrc:address>Washington, DC, USA</swrc:address><swrc:booktitle>SEQUENCES &#039;97: Proceedings of the Compression and Complexity of Sequences 1997</swrc:booktitle><swrc:pages>21</swrc:pages><swrc:publisher><swrc:Organization swrc:name="IEEE Computer Society"/></swrc:publisher><swrc:title>On the Resemblance and Containment of Documents</swrc:title><swrc:year>1997</swrc:year><swrc:keywords>INFLUENTIAL information-retrieval similarity </swrc:keywords><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="A. Broder"/></rdf:_1></rdf:Seq></swrc:author></rdf:Description></burst:publication><description>on shingles</description></item><item rdf:about="http://www.bibsonomy.org/bibtex/22948189a910501dfdb86469a3e13505a/neilernst"><title>On the resemblance and containment of documents</title><link>http://www.bibsonomy.org/bibtex/22948189a910501dfdb86469a3e13505a/neilernst</link><dc:creator>neilernst</dc:creator><dc:date>2006-09-25T06:32:37+02:00</dc:date><dc:subject>shingles database </dc:subject><content:encoded>&lt;span class=&#034;authorEditorList&#034;&gt;&lt;a href=&#034;/author/Broder&#034;&gt;A. Z. Broder&lt;/a&gt; &lt;/span&gt;&lt;em&gt;Compression and Complexity of Sequences, &lt;/em&gt;&lt;em&gt;page 21--29. 
&lt;/em&gt;&lt;em&gt;Salerno, Italy, &lt;/em&gt;&lt;em&gt;IEEE Computer Society Press, &lt;/em&gt;(&lt;em&gt;June 1997&lt;/em&gt;)</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/shingles"/><rdf:li rdf:resource="http://www.bibsonomy.org/tag/database"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/22948189a910501dfdb86469a3e13505a/neilernst"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/22948189a910501dfdb86469a3e13505a/neilernst"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#InProceedings"/><owl:sameAs rdf:resource="http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=666900"/><swrc:date>Mon Sep 25 06:32:37 CEST 2006</swrc:date><swrc:address>Salerno, Italy</swrc:address><swrc:booktitle>Compression and Complexity of Sequences</swrc:booktitle><swrc:month>June</swrc:month><swrc:pages>21--29</swrc:pages><swrc:publisher><swrc:Organization swrc:name="IEEE Computer Society Press"/></swrc:publisher><swrc:title>On the resemblance and containment of documents</swrc:title><swrc:year>1997</swrc:year><swrc:keywords>shingles database </swrc:keywords><swrc:abstract>Given two documents A and B we define two mathematical notions: their
	resemblance r(A, B) and their containment c(A, B) that seem to capture
	well the informal notions of “roughly the same” and “roughly
	contained.” The basic idea is to reduce these issues to set intersection
	problems that can be easily evaluated by a process of random sampling
	that can be done independently for each document. Furthermore, the
	resemblance can be evaluated using a fixed size sample for each
	document. This paper discusses the mathematical properties of these
	measures and the efficient implementation of the sampling process
	using Rabin (1981) fingerprints</swrc:abstract><swrc:hasExtraField><swrc:Field swrc:value="3" swrc:key="priority"/></swrc:hasExtraField><swrc:hasExtraField><swrc:Field swrc:value="562668" swrc:key="citeulike-article-id"/></swrc:hasExtraField><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="A. Z. Broder"/></rdf:_1></rdf:Seq></swrc:author></rdf:Description></burst:publication><description>Not previously uploaded</description></item></rdf:RDF>
