can now generate all pairs $i,j$ for which some $x_i^\pi$ value is present in both of their sketches. From these we can compute, for each pair $i,j$ with non-zero sketch overlap, a count of the number of $x_i^\pi$ values they have in common. By applying a preset threshold, we identify which pairs $i,j$ have heavily overlapping sketches. For instance, if sketches contain 200 values and the threshold were 80%, the count would need to be at least 160 for any pair $i,j$. As we identify such pairs, we run union-find to group documents into near-duplicate ``syntactic clusters''. This is essentially a variant of the single-link clustering algorithm introduced in Section 17.2 (page [*]).
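The pairwise counting and union-find step above can be sketched as follows. This is a minimal illustration, not the book's implementation; it assumes each document's sketch has already been computed as a set of min-hash values (the document IDs and sketch contents below are hypothetical):

```python
from collections import defaultdict
from itertools import combinations

def cluster_near_duplicates(sketches, threshold=0.8):
    """Group documents whose sketches overlap in at least
    `threshold` of their values.

    sketches: dict mapping document id -> set of sketch values
    (e.g. 200 min-hash values per document).
    """
    # Invert the sketches: for each value, the documents containing it.
    postings = defaultdict(list)
    for doc, sketch in sketches.items():
        for value in sketch:
            postings[value].append(doc)

    # Count, for each pair of documents, how many values they share.
    overlap = defaultdict(int)
    for docs in postings.values():
        for i, j in combinations(sorted(docs), 2):
            overlap[(i, j)] += 1

    # Union-find structure to merge pairs exceeding the threshold.
    parent = {d: d for d in sketches}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    sketch_size = len(next(iter(sketches.values())))  # e.g. 200
    for (i, j), count in overlap.items():
        if count >= threshold * sketch_size:
            parent[find(i)] = find(j)

    # Collect the resulting syntactic clusters.
    clusters = defaultdict(set)
    for d in sketches:
        clusters[find(d)].add(d)
    return list(clusters.values())
```

Because only pairs with non-zero overlap ever appear in `overlap`, the quadratic all-pairs comparison is avoided for documents that share no sketch values at all.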
BibTeX output now includes tags in a field called "keywords", which is more common than "tags". When importing BibTeX, both fields are merged.
If you post a single BibTeX snippet that you already have, the duplicate is now shown on the edit BibTeX page. The postBookmark button has also been updated: text selected on the page is now automatically included in the description/comment field. On the settings page you can now update your email address, as well as your homepage and real name.
How do you find similar pictures in a large collection when they come in different formats, resolutions, and rotations?
How do you find all the duplicates in a huge collection of music files in different formats?
How do you find duplicate text files or binary files on your computer?
Do you get a separate program to handle each case, or would you rather have one program that does it all?
Here is my solution:
DupeFinder is a simple application for locating, moving, renaming and deleting duplicate files in a directory structure. It's perfect both for users who haven't kept their hard drives very well organized and need to do some cleaning to free space, and for users who like to keep lots of backup copies of important data "just in case" something bad should happen.
Duplicate Files Finder is a cross-platform application for finding and removing duplicate files by deleting, creating hardlinks or creating symbolic links. A special algorithm minimizes the amount of data read from disk, so the program is very fast.
dupeGuru is a tool to find duplicate files on your computer. It can scan either filenames or contents. The filename scan features a fuzzy matching algorithm that can find duplicate filenames even when they are not exactly the same. dupeGuru runs on Windows, Mac OS X and Linux.
dupeGuru is efficient. Find your duplicate files in minutes, thanks to its quick fuzzy matching algorithm. dupeGuru not only finds filenames that are the same, but it also finds similar filenames.
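dupeGuru's own matching engine is proprietary to the project, but the general idea of fuzzy filename matching can be illustrated with Python's standard-library `difflib`, which scores how similar two strings are; the filenames and threshold below are made up for the example:

```python
from difflib import SequenceMatcher
from itertools import combinations

def similar_filenames(names, threshold=0.8):
    """Return pairs of filenames whose similarity ratio is at
    least `threshold` (1.0 means identical)."""
    pairs = []
    for a, b in combinations(names, 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 2)))
    return pairs
```

With this kind of scoring, `"vacation_2011.jpg"` and `"vacation_2011 (1).jpg"` are flagged as likely duplicates even though the strings are not equal, which is exactly the behavior the filename scan describes.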
dupeGuru is customizable. You can tweak its matching engine to find exactly the kind of duplicates you want to find. The Preference page of the help file lists all the scanning engine settings you can change.
dupeGuru is safe. Its engine has been especially designed with safety in mind. Its reference directory system as well as its grouping system prevent you from deleting files you didn't mean to delete.
Do whatever you want with your duplicates. Not only can you delete the duplicate files dupeGuru finds, but you can also move or copy them elsewhere. There are also multiple ways to filter and sort your results to easily weed out false duplicates (for low-threshold scans).
Supported languages: English, French.
Requirements
Mac OS X: 10.5 and up (Leopard, Snow Leopard or Lion). PowerPC or Intel. (Last version to support Tiger: v2.8.2)
Windows: 2k/XP/Vista/Win7.
Linux: Ubuntu 10.04