2018. Which parts of the web should be archived for future generations? The Deutsche Nationalbibliothek (German National Library) is currently exploring this question and surveying internet users. In this interview, Deputy Director Ute Schwens talks about the current state of web archiving and the effects of the new copyright law.
2012. Metadata Statistics for a Large Web Corpus
ABSTRACT
We provide an analysis of the adoption of metadata standards on the Web based on a large crawl of the Web. In particular, we look at what forms of syntax and vocabularies publishers are using to mark up data inside HTML pages. We also describe the process that we have followed and the difficulties involved in web data extraction.
This is the public wiki for the Heritrix archival crawler project. Heritrix is the Internet Archive’s open-source, extensible, web-scale, archival-quality web crawler project. Heritrix (sometimes spelled heretrix, or misspelled or mis-said as heratrix/heritix/heretix/heratix) is an archaic word for heiress (woman who inherits).
arXiv is a free distribution service and an open-access archive of 2,324,586 scholarly articles in the fields of physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics. Materials on this site are not peer-reviewed by arXiv.
The HAL Thèses server aims to promote the online self-archiving of doctoral theses and habilitations à diriger des recherches (HDR), which are important documents for scientific communication among researchers.
arXiv is a free distribution service and an open-access archive of 2,318,920 scholarly articles in the fields of physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics. Materials on this site are not peer-reviewed by arXiv.
ArXiv is an open archive of electronic preprints of scientific articles in the fields of physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics, and it is freely accessible over the Internet.
HAL is an online platform developed in 2001 by the Centre pour la communication scientifique directe (CCSD) of the CNRS, intended for the deposit and dissemination of researchers' articles, published or unpublished, and of theses originating from French or foreign teaching and research institutions and from public or private laboratories. Access to the data is free, but their use or reuse is not necessarily so.
When learning a second language (L2), one of the most important but tedious components that often demoralizes students with its ineffectiveness and inefficiency is vocabulary acquisition, or more simply put, memorizing words. In light of such, a personalized and educational vocabulary recommendation system that traces a learner’s vocabulary knowledge state would have an immense learning impact as it could resolve both issues. Therefore, in this paper, we propose and release data for a novel task called Pedagogical Word Recommendation (PWR). The main goal of PWR is to predict whether a given learner knows a given word based on other words the learner has already seen. To elaborate, we collect this data via an Intelligent Tutoring System (ITS) called Santa that is serviced to ∼1M L2 learners who study for the standardized English exam, TOEIC. As a feature of this ITS, students can directly indicate words they do not know from the questions they solved to create wordbooks. Finally, we report the evaluation results of a Neural Collaborative Filtering approach along with an exploratory data analysis and discuss the impact and efficacy of this dataset as a baseline for future studies on this task.
The present article reviews a series of selected functional and structural magnetic resonance imaging (MRI) studies focusing on the neuroplasticity of second language vocabulary acquisition as a function of linguistic experience. A clear-cut picture emerging from the review is that brain changes induced by second language vocabulary acquisition are observed at both functional and structural levels. Importantly, second language experience is even able to shape brain structures in short-term training of a few weeks. The evidence that linguistic experience can sculpt the brain in late second language learners, and even solely after a short-term laboratory training, constitutes a strong argument against theoretical approaches postulating that environmental factors are relatively unimportant for language development. Rather, combined neuroimaging data lend support to the determining role of linguistic experience in linguistic knowledge emergence during second language acquisition, at least at the lexical level.
A. Clauset, C. Shalizi, and M. Newman. (2007). arXiv:0706.1062. Comment: 43 pages, 11 figures, 7 tables, 4 appendices; code available at http://www.santafe.edu/~aaronc/powerlaws/.
F. Hoppe, T. Tietz, D. Dessì, N. Meyer, M. Sprau, M. Alam, and H. Sack. Proceedings of the Third Workshop on Humanities in the Semantic Web (WHiSe), co-located with 15th Extended Semantic Web Conference (ESWC), page 15--20. CEUR WS, (2020). Event place: Virtual Conference.
O. Vsesviatska, T. Tietz, F. Hoppe, M. Sprau, N. Meyer, D. Dessì, and H. Sack. Proceedings of the 36th Annual ACM Symposium on Applied Computing (ACM SAC), page 1855--1863. Association for Computing Machinery, (2021). Event place: Virtual Conference.
F. Hoppe, T. Tietz, D. Dessì, M. Sprau, M. Alam, and H. Sack. Proceedings of the Third Workshop on Humanities in the Semantic Web (WHiSe 2020) co-located with 15th Extended Semantic Web Conference (ESWC 2020), Heraklion, Greece, June 2, 2020 (online), volume 2695 of CEUR Workshop Proceedings, page 15--20. CEUR-WS.org, (2020)
M. Vafaie, O. Bruns, D. Dessì, N. Pilz, and H. Sack. Proceedings of the 6th International Workshop on Computational History (HistoInformatics 2021) co-located with ACM/IEEE Joint Conference on Digital Libraries 2021 (JCDL 2021), Online event, September 30-October 1, 2021, volume 2981 of CEUR Workshop Proceedings, CEUR-WS.org, (2021)
M. Paris, and R. Jäschke. Proceedings of the 14th International Conference on Knowledge Science, Engineering and Management, volume 12816 of Lecture Notes in Artificial Intelligence, page 1--14. Springer, (2021)
M. Spaniol, and G. Weikum. Proceedings of the 21st international conference companion on World Wide Web - WWW '12 Companion, ACM Press, (2012)
A. Spitz, J. Strötgen, and M. Gertz. Companion Proceedings of the The Web Conference 2018, page 1731--1736. Republic and Canton of Geneva, Switzerland, International World Wide Web Conferences Steering Committee, (2018)
J. Singh, and A. Anand. Proceedings of the 2017 Conference on Conference Human Information Interaction and Retrieval, page 361--364. New York, NY, USA, ACM, (2017)
T. Tunsch. EVA 2011 Berlin: 9.-11. November 2011 in den Staatlichen Museen zu Berlin am Kulturforum Potsdamer Platz; Elektronische Medien & Kunst, Kultur, Historie; die 18. Berliner Veranstaltung der Internationalen EVA-Serie Electronic Imaging & the Visual Arts; Konferenzband, page 23--42. Berlin, Staatliche Museen zu Berlin, Gesellschaft z. Förderung angewandter Informatik, EVA Conferences International, (2011)
H. SalahEldeen, and M. Nelson. Proceedings of the Second International Conference on Theory and Practice of Digital Libraries, page 125--137. Berlin/Heidelberg, Springer, (2012)
D. Gomes, M. Costa, D. Cruz, J. Miranda, and S. Fontes. Proceedings of the 22nd International Conference on World Wide Web, page 1059--1066. New York, NY, USA, ACM, (2013)
O. Alonso, J. Strötgen, R. Baeza-Yates, and M. Gertz. Proceedings of the 1st International Temporal Web Analytics Workshop, volume 707 of CEUR Workshop Proceedings, page 1--8. CEUR-WS.org, (2011)
H. SalahEldeen, and M. Nelson. Proceedings of the 22nd International Conference on World Wide Web, page 1075--1082. Republic and Canton of Geneva, Switzerland, International World Wide Web Conferences Steering Committee, (2013)
S. Nunes, C. Ribeiro, and G. David. Proceedings of the 9th Annual ACM International Workshop on Web Information and Data Management, page 129--136. New York, NY, USA, ACM, (2007)
E. Boese, and A. Howe. Proceedings of the 14th ACM International Conference on Information and Knowledge Management, page 632--639. New York, NY, USA, ACM, (2005)
A. Jatowt, C. Au Yeung, and K. Tanaka. Proceedings of the 22nd ACM International Conference on Conference on Information & Knowledge Management, page 2273--2278. New York, NY, USA, ACM, (2013)
M. Costa, D. Gomes, F. Couto, and M. Silva. Proceedings of the 22nd International Conference on World Wide Web Companion, page 1045--1050. Republic and Canton of Geneva, Switzerland, International World Wide Web Conferences Steering Committee, (2013)
R. Doumat, E. Egyed-Zsigmond, and J. Pinon. Case-Based Reasoning. Research and Development, volume 6176 of Lecture Notes in Computer Science, Springer, Berlin/Heidelberg, (2010)
D. Gomes, J. Miranda, and M. Costa. Research and Advanced Technology for Digital Libraries, volume 6966 of Lecture Notes in Computer Science, Springer, Berlin/Heidelberg, (2011)
J. Abowd, L. Vilhuber, and W. Block. Privacy in Statistical Databases, volume 7556 of Lecture Notes in Computer Science, Springer Berlin Heidelberg, (2012)
B. Redslob. Materialien zur Information und Dokumentation, Band 17, Verlag für Berlin-Brandenburg, Potsdam, (2002). Statement of responsibility: Beate Redslob ... et al.; source database: NEBIS; format: print; extent: 170 pp., ill.
F. Bischoff, and U. Schäfer. Forschung in der digitalen Welt - Sicherung, Erschließung und Aufbereitung von Wissensbeständen, Hamburg University Press, Hamburg, (2006)
Y. Chung, M. Toyoda, and M. Kitsuregawa. Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web, page 9--16. New York, NY, USA, ACM, (2009)
M. Kitsuregawa, T. Tamura, M. Toyoda, and N. Kaji. Progress in WWW Research and Development, volume 4976 of Lecture Notes in Computer Science, Springer, Berlin/Heidelberg, (2008)
P. Lyman. Building a National Strategy for Preservation: Issues in Digital Media Archiving, Council on Library and Information Resources, Washington, D.C. and Library of Congress, Washington, D.C., (April 2002)