The Grid Corpus is a large multitalker audiovisual sentence corpus designed to support joint computational-behavioral studies in speech perception. In brief, the corpus consists of high-quality audio and video (facial) recordings of 1000 sentences spoken by each of 34 talkers (18 male, 16 female), for a total of 34,000 sentences. Sentences are of the form "put red at G9 now". audio_25k.zip contains the wav-format utterances at a 25 kHz sampling rate, in a separate directory per talker. alignments.zip provides word-level time alignments, again separated by talker. s1.zip, s2.zip, etc. contain .jpg videos for each talker (note that due to an oversight, no video for talker t21 is available). The Grid Corpus is described in detail in the paper jasagrid.pdf included in the dataset.
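As a minimal sketch of working with the files described above, the snippet below reads basic properties of one 25 kHz wav utterance and parses a word-level alignment. The alignment layout assumed here (whitespace-separated "start end word" lines) and the example timestamps are assumptions for illustration only; the exact format and time units should be checked against the corpus documentation.

```python
import wave

def load_audio_info(wav_path):
    """Return (sample_rate, n_frames) for one utterance wav file."""
    with wave.open(wav_path, "rb") as w:
        return w.getframerate(), w.getnframes()

def parse_alignment(text):
    """Parse word-level alignment lines assumed to be 'start end word'.

    The time unit is not specified here; consult the corpus
    documentation (jasagrid.pdf) for the actual convention.
    """
    entries = []
    for line in text.strip().splitlines():
        start, end, word = line.split()
        entries.append((int(start), int(end), word))
    return entries

# Hypothetical alignment for the sentence "put red at G9 now"
example = """0 5000 put
5000 9000 red
9000 11000 at
11000 16000 g9
16000 20000 now"""

words = parse_alignment(example)
print(words[0])  # (0, 5000, 'put')
```

A real workflow would call load_audio_info on a path such as a talker subdirectory inside audio_25k.zip after extraction, and parse_alignment on the matching file from alignments.zip.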