Abstract. To help web applications understand the content of HTML pages, an increasing number of websites have started to annotate structured data within their pages using markup formats such as Microdata, RDFa, and Microformats. These annotations are used by Google, Yahoo!, Yandex, Bing, and Facebook to enrich search results and to display entity descriptions within their applications. In this paper, we present a series of publicly accessible Microdata, RDFa, and Microformats datasets that we have extracted from three large web corpora dating from 2010, 2012, and 2013.