The Grid Corpus is a large multitalker audiovisual sentence corpus designed to support joint computational-behavioral studies in speech perception. In brief, the corpus consists of high-quality audio and video (facial) recordings of 1000 sentences spoken by each of 34 talkers (18 male, 16 female), for a total of 34000 sentences. Sentences are of the form "put red at G9 now".
audio_25k.zip contains the wav format utterances at a 25 kHz sampling rate, in a separate directory per talker.
alignments.zip provides word-level time alignments, again separated by talker.
s1.zip, s2.zip, etc. contain .jpg videos for each talker (note that, due to an oversight, no video for talker t21 is available).
The Grid Corpus is described in detail in the paper jasagrid.pdf, included in the dataset.
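As a rough illustration of the sentence form quoted above, the following Python sketch splits a Grid-style sentence into its slots and checks the stated corpus size. The slot names (`command`, `color`, and so on) and the function `parse_grid_sentence` are hypothetical labels chosen for this example; they are not taken from the corpus files.

```python
from typing import NamedTuple

# Corpus size as stated in the description.
NUM_TALKERS = 34
SENTENCES_PER_TALKER = 1000
TOTAL_SENTENCES = NUM_TALKERS * SENTENCES_PER_TALKER


class GridSentence(NamedTuple):
    # Slot names are illustrative assumptions, not official labels.
    command: str
    color: str
    preposition: str
    letter: str
    digit: str
    adverb: str


def parse_grid_sentence(text: str) -> GridSentence:
    """Split a sentence such as 'put red at G9 now' into six slots."""
    words = text.split()
    if len(words) != 5:
        raise ValueError(f"expected 5 words, got {len(words)}: {text!r}")
    command, color, preposition, code, adverb = words
    # The fourth word is assumed to be one letter followed by one digit.
    if len(code) != 2 or not code[0].isalpha() or not code[1].isdigit():
        raise ValueError(f"expected a letter-digit code, got {code!r}")
    return GridSentence(command, color, preposition, code[0], code[1], adverb)


example = parse_grid_sentence("put red at G9 now")
```

The parser deliberately does not check words against a vocabulary, since the description gives only one example sentence.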
The purpose of these datasets is to support equivalence and subsumption ontology matching. There are five ontology pairs extracted from MONDO and UMLS:

Source    Ontology Pair    Category
MONDO     OMIM-ORDO        Disease
MONDO     NCIT-DOID        Disease
UMLS      SNOMED-FMA       Body
UMLS      SNOMED-NCIT      Pharm
UMLS      SNOMED-NCIT      Neoplas

Each pair is associated with three folders: "raw_data", "equiv_match", and "subs_match", corresponding to the downloaded source ontologies, the package for equivalence matching, and the package for subsumption matching. See detailed documentation at: https://krr-oxford.github.io/DeepOnto/#/om_resources. See the incoming OAEI Bio-ML track at: https://www.cs.ox.ac.uk/isg/projects/ConCur/oaei/. See our resource paper at: https://arxiv.org/abs/2205.03447.
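The five pairs and three per-pair folders listed above could be held in a small lookup structure, as in this minimal Python sketch. The names `ONTOLOGY_PAIRS`, `FOLDER_PURPOSE`, and `pairs_for_source` are hypothetical and exist only for this example; the sketch does not assume anything about how the downloaded archives are actually laid out on disk.

```python
from typing import List, Tuple

# (source, ontology pair, category), as listed in the description.
ONTOLOGY_PAIRS: List[Tuple[str, str, str]] = [
    ("MONDO", "OMIM-ORDO", "Disease"),
    ("MONDO", "NCIT-DOID", "Disease"),
    ("UMLS", "SNOMED-FMA", "Body"),
    ("UMLS", "SNOMED-NCIT", "Pharm"),
    ("UMLS", "SNOMED-NCIT", "Neoplas"),
]

# Folder name -> what the description says it contains.
FOLDER_PURPOSE = {
    "raw_data": "downloaded source ontologies",
    "equiv_match": "package for equivalence matching",
    "subs_match": "package for subsumption matching",
}


def pairs_for_source(source: str) -> List[Tuple[str, str]]:
    """Return (ontology pair, category) entries for one source."""
    return [(pair, cat) for src, pair, cat in ONTOLOGY_PAIRS if src == source]


umls_pairs = pairs_for_source("UMLS")
```

Because SNOMED-NCIT appears twice, the category is kept alongside the pair name so the two entries stay distinguishable.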
R. Bommasani, and C. Cardie. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), page 8075--8096. Online, Association for Computational Linguistics, (November 2020)
T. McCoy, E. Pavlick, and T. Linzen. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, page 3428--3448. Florence, Italy, Association for Computational Linguistics, (July 2019)
O. Kashefi, and R. Hwa. Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020), page 200--208. Online, Association for Computational Linguistics, (November 2020)
G. Cohen, S. Afshar, J. Tapson, and A. van Schaik. (2017). cite arxiv:1702.05373. Comment: The dataset is now available for download from https://www.westernsydney.edu.au/bens/home/reproducible_research/emnist. This link is also included in the revised article.
M. Baroni, F. Chantree, A. Kilgarriff, and S. Sharoff. Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco, European Language Resources Association (ELRA), (May 2008)
K. Jiang, D. Wu, and H. Jiang. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), page 318--323. (2019)
J. Berant, A. Chou, R. Frostig, and P. Liang. Proceedings of the 2013 conference on empirical methods in natural language processing, page 1533--1544. (2013)
P. Rajpurkar, J. Zhang, K. Lopyrev, and P. Liang. (2016). cite arxiv:1606.05250. Comment: To appear in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP).
X. Wang, Z. Wang, X. Han, W. Jiang, R. Han, Z. Liu, J. Li, P. Li, Y. Lin, and J. Zhou. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), page 1652--1671. Online, Association for Computational Linguistics, (November 2020)
S. Wunderlich, M. Ring, D. Landes, and A. Hotho. International Joint Conference: 12th International Conference on Computational Intelligence in Security for Information Systems (CISIS 2019) and 10th International Conference on EUropean Transnational Education (ICEUTE 2019) - Seville, Spain, May 13-15, 2019, Proceedings, volume 951 of Advances in Intelligent Systems and Computing, page 14--24. Springer, (2019)
R. Snow, B. O'Connor, D. Jurafsky, and A. Ng. EMNLP '08: Proceedings of the Conference on Empirical Methods in Natural Language Processing, page 254--263. Morristown, NJ, USA, Association for Computational Linguistics, (2008)