group :: regio | BibSonomy

bookmarks (508)

1perma.cc
Perma.cc helps scholars, journals and courts create permanent links to the online sources cited in their work.
10 years ago by @jaeschke
bookmark
web
citation
permanent
perma
archive
link
internet
1Host Link Graph JISC UK Web Domain Dataset (1996-2010)
UK Web Archive Open Data
10 years ago by @jaeschke
web
dataset
uk
data
host
archive
link
graph
jisc
1JISC UK Web Domain Dataset (1996-2013)
UK Web Archive Open Data
10 years ago by @jaeschke
domain
web
dataset
uk
data
archive
open
jisc
3HTTP Archive
The HTTP Archive tracks how the Web is built.
10 years ago by @jaeschke
web
http
archive
internet
3Raw
http://raw.densitydesign.org/
10 years ago by @jaeschke
diagram
web
plot
visualization
alluvial
graphics
1Oxford Internet Institute - Research - Projects - Wikipedia's Networks and Geographies: Representation and Power in Peer-Produced Content
This project brings together OII research fellows and doctoral students to shed light on the incorporation of new users and information into the Wikipedia community.
10 years ago by @jaeschke
web
webscience
oxford
project
wikipedia
institute
internet
research
1Web Science
http://eprints.soton.ac.uk/262615/1/Web%20Science.htm
10 years ago by @jaeschke
science
web
webscience
1WDC - Hyperlink Graphs
This page provides two large hyperlink graphs for public download. The graphs have been extracted from the 2012 and 2014 versions of the Common Crawl web corpora. The 2012 graph covers 3.5 billion web pages and 128 billion hyperlinks between these pages. To the best of our knowledge, the graph is the largest hyperlink graph that is available to the public outside companies such as Google, Yahoo, and Microsoft. The 2014 graph covers 1.7 billion web pages connected by 64 billion hyperlinks. Below we provide instructions on how to download the graphs as well as basic statistics about their topology.
10 years ago by @jaeschke
web
dataset
link
graph
1WIRE Workshop | Cambridge, MA: June 17 – 18, 2014
http://wp.comminfo.rutgers.edu/nsfia/
10 years ago by @jaeschke
web
wire
workshop
archive
internet
1ACM Web Science Conference 2014 (WebSci14)
http://www.websci14.org/
11 years ago by @hotho
science
pc
web
conference
2014
acm
3WebCite
http://webcitation.org/
11 years ago by @jaeschke
science
web
citation
webcite
3Web Data Mining, book by Bing Liu
Web data mining techniques and algorithms
11 years ago by @jaeschke
web
mining
book
data
algorithm
6Speaking JavaScript
http://speakingjs.com/es5/
11 years ago by @jaeschke
reference
web
book
tutorial
manual
javascript
programming
1Ian Milligan | A Digital, Public, and Youth Historian of 20th-Century Canada
A Digital, Public, and Youth Historian of 20th-Century Canada (by Ian Milligan)
11 years ago by @jaeschke
humanities
web
archive
history
warc
digital
1Welcome to iamResearcher
The Open Knowledge Network
11 years ago by @jaeschke
web
social
researcher
network
gaw
research
1Web Archive Analysis Workshop - Internet Research - IA Webteam Confluence
https://webarchive.jira.com/wiki/display/Iresearch/Web+Archive+Analysis+Workshop
11 years ago by @jaeschke
web
wat
archive
hadoop
analysis
pig
warc
internet
1internetarchive/ia-web-commons · GitHub
Contribute to ia-web-commons development by creating an account on GitHub.
11 years ago by @jaeschke
web
archive
hadoop
warc
2CLARIN-NL | CLARIN-NL
The CLARIN infrastructure is a research infrastructure intended for humanities researchers that work with language data and tools.
11 years ago by @hotho
web
data
infrastructure
text
tools
3WDC - Hyperlink Graph
This page provides a large hyperlink graph for public download. The graph has been extracted from the Common Crawl 2012 web corpus and covers 3.5 billion web pages and 128 billion hyperlinks between these pages. To the best of our knowledge, this graph is the largest hyperlink graph that is available to the public outside companies such as Google, Yahoo, and Microsoft. Below we provide instructions on how to download the graph as well as basic statistics about its topology.
11 years ago by @hotho
hyperlink
web
dataset
graph
1ia-web-commons/src/main/java/org/archive/hadoop/ResourceRecordReader.java at master · internetarchive/ia-web-commons
https://github.com/internetarchive/ia-web-commons/blob/master/src/main/java/org/archive/hadoop/ResourceRecordReader.java
11 years ago by @jaeschke
bigdata
web
archive
crawling
hadoop
analysis
warc
programming

publications (550)

1What every researcher should know about searching – clarified concepts, search advice, and an agenda to improve finding in academia
M. Gusenbauer, and N. Haddaway. Research Synthesis Methods, 12 (2): 136-147 (2021)
2 months ago by @jaeschke
web
search
research
2Click Models for Web Search
A. Chuklin, I. Markov, and M. de Rijke. Springer International Publishing, (2015)
5 months ago by @jaeschke
web
click
book
model
search
1Effects of European Union Funding and International Collaboration on Estonian Scientific Impact
T. Hirv. Journal of Scientometric Research, (January 2018)
6 months ago by @tobias.koopmann
Estonia,
Web
impact,
Collaboration,
Funding
sources,
of
Scientific
Research
diss
social-science-related-work
Science
1A Comprehensive Study of Features and Algorithms for URL-Based Topic Classification
E. Baykan, M. Henzinger, L. Marian, and I. Weber. Transactions on the Web, 5 (3): 1--29 (July 2011)
11 months ago by @jaeschke
web
link
classification
url
1Blocking and Filtering Techniques for Entity Resolution
G. Papadakis, D. Skoutas, E. Thanos, and T. Palpanas. ACM Computing Surveys, 53 (2): 1--42 (March 2020)
12 months ago by @jaeschke
semantic
web
data
blocking
resolution
ner
knowledge
graph
open
filtering
linked
entity
2Construction of Knowledge Graphs: State and Challenges
M. Hofer, D. Obraczka, A. Saeedi, H. Köpcke, and E. Rahm. (2023). arXiv:2302.11509. Comment: 43 pages, 5 figures, 3 tables.
12 months ago by @jaeschke
lod
semantic
web
data
survey
knowledge
graph
open
linked
3What's Really New on the Web?: Identifying New Pages from a Series of Unstable Web Snapshots
M. Toyoda, and M. Kitsuregawa. Proceedings of the 15th International Conference on World Wide Web, page 233--241. New York, NY, USA, ACM, (2006)
a year ago by @tobias.koopmann
web
3Focused Crawl of Web Archives to Build Event Collections
M. Klein, L. Balakireva, and H. Van de Sompel. Proceedings of the 10th ACM Conference on Web Science, page 333--342. New York, NY, USA, ACM, (2018)
a year ago by @tobias.koopmann
web
2CopyCat: Near-Duplicates Within and Between the ClueWeb and the Common Crawl
M. Fröbe, J. Bevendorff, L. Gienapp, M. Völske, B. Stein, M. Potthast, and M. Hagen. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, (July 2021)
2 years ago by @jaeschke
web
detection
common
copycat
duplicate
crawl
2DSDD: Domain-Specific Dataset Discovery on the Web
H. Zhang, A. Santos, and J. Freire. Proceedings of the 30th ACM International Conference on Information & Knowledge Management, ACM, (October 2021)
2 years ago by @jaeschke
unknowndata
web
dataset
data
discovery
crawling
2Analyzing the Web: Are Top Websites Lists a Good Choice for Research?
T. Alby, and R. Jäschke. Proceedings of the International Conference on Theory and Practice of Digital Libraries, page 11--25. Cham, Springer, (2022)
2 years ago by @jaeschke
science
myown
web
tpdl
commoncrawl
archive
2022
alexa
crawl
research
2Dataset or Not? A Study on the Veracity of Semantic Markup for Dataset Pages
T. Alrashed, D. Paparas, O. Benjelloun, Y. Sheng, and N. Noy. The Semantic Web -- ISWC 2021, page 338--356. Cham, Springer International Publishing, (2021)
2 years ago by @jaeschke
unknowndata
web
dataset
semantics
markup
semanticweb
extraction
3Where are the Datasets? A case study on the German Academic Web Archive
Y. Younes, S. Tiesler, R. Jäschke, and B. Mathiak. Proceedings of the Web Archiving and Digital Libraries Workshop at JCDL 2022, (2022)
2 years ago by @jaeschke
myown
german
unknowndata
web
dataset
academic
2022
gaw
crawl
2WebFormer: The Web-page Transformer for Structure Information Extraction
Q. Wang, Y. Fang, A. Ravula, F. Feng, X. Quan, and D. Liu. Proceedings of the ACM Web Conference 2022, ACM, (April 2022)
2 years ago by @jaeschke
web
deeplearning
transformer
page
html
ie
webformer
information
extraction
plk
1ArchiveSpark
H. Holzmann, V. Goel, and A. Anand. Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries, ACM, (June 2016)
3 years ago by @jaeschke
archivespark
web
spark
archive
warc
4Googleology is Bad Science
A. Kilgarriff. Computational Linguistics, 33 (1): 147--151 (March 2007)
3 years ago by @jaeschke
science
web
google
3The Semantic Web - ISWC 2021 - 20th International Semantic Web Conference, ISWC 2021, Virtual Event, October 24-28, 2021, Proceedings
A. Hotho, E. Blomqvist, S. Dietze, A. Fokoue, Y. Ding, P. Barnaghi, A. Haller, M. Dragoni, and H. Alani (Eds.) volume 12922 of Lecture Notes in Computer Science, Springer, (2021)
3 years ago by @hotho
myown
semantic
web
conference
2021
2Improving Relevance Prediction for Focused Web Crawlers
M. Safran, A. Althagafi, and D. Che. 2012 IEEE/ACIS 11th International Conference on Computer and Information Science, page 161-166. (May 2012)
3 years ago by @parismic
web
crawler
unkowndata
relevance
1Archiving information from geotagged tweets to promote reproducibility and comparability in social media research
K. Kinder-Kurlanda, K. Weller, W. Zenk-Möltgen, J. Pfeffer, and F. Morstatter. Big Data & Society, 4 (2): 205395171773633 (November 2017)
3 years ago by @jaeschke
web
twitter
archive
tweets
4AggregateRank: Bringing order to web sites
G. Feng, T. Liu, Y. Wang, Y. Bao, Z. Ma, X. Zhang, and W. Ma. Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '06, ACM Press, (2006)
3 years ago by @jaeschke
retrieval
web
ir
aggregaterank
ranking
search
information
pagerank
