The Blog Authorship Corpus consists of the collected posts of 19,320 bloggers gathered from blogger.com in August 2004. The corpus incorporates a total of 681,288 posts and over 140 million words, or approximately 35 posts and 7,250 words per person.
Aman Shakya, Vilas Wuwongse, Hideaki Takeda, and Ikki Ohmukai. Proceedings of the Semantic Authoring, Annotation and Knowledge Markup, pages 47-54. Whistler, British Columbia, Canada, (October 2007). Located at the 4th International Conference on Knowledge Capture (KCap 2007).
Aman Shakya, Hideaki Takeda, Vilas Wuwongse, and Ikki Ohmukai. Proceedings of the IADIS International Conference WWW/Internet 2007, 1, pages 371-380. Vila Real, Portugal, International Association for Development of the Information Society, IADIS Press, (October 2007).