Abstract. To help web applications understand the content of HTML pages, an increasing number of websites have started to annotate structured data within their pages using markup formats such as Microdata, RDFa, and Microformats. These annotations are used by Google, Yahoo!, Yandex, Bing, and Facebook to enrich search results and to display entity descriptions within their applications. In this paper, we present a series of publicly accessible Microdata, RDFa, and Microformat datasets that we have extracted from three large web corpora dating from 2010, 2012, and 2013.