The Blog Authorship Corpus consists of the collected posts of 19,320 bloggers gathered from blogger.com in August 2004. The corpus contains a total of 681,288 posts and over 140 million words, or approximately 35 posts and 7,250 words per blogger.
L. Antiqueira, T. Pardo, M. Nunes, O. Oliveira Jr., and L. Costa. Fourth Workshop in Information and Human Language Technology (TIL'06), in the Proceedings of the International Joint Conference IBERAMIA-SBIA-SBRN, Ribeirão Preto, Brazil, ICMC-USP, (October 2006)
T. Adler and L. de Alfaro. WWW '07: Proceedings of the 16th International Conference on World Wide Web, pages 261-270. New York, NY, USA, ACM Press, (2007)
M. Pavlov and R. Ichise. Proceedings of the Workshop on Finding Experts on the Web with Semantics (FEWS2007) at ISWC/ASWC2007, Busan, South Korea, (November 2007)