The purpose of these datasets is to support equivalence and subsumption ontology matching. There are five ontology pairs extracted from MONDO and UMLS: Source Ontology Pair Category MONDO OMIM-ORDO Disease MONDO NCIT-DOID Disease UMLS SNOMED-FMA Body UMLS SNOMED-NCIT Pharm UMLS SNOMED-NCIT Neoplas Each pair is associated with three folders: "raw_data", "equiv_match", and "subs_match", corresponding to the downloaded source ontologies, the package for equivalence matching, and the package for subsumption matching. See detailed documentation at: https://krr-oxford.github.io/DeepOnto/#/om_resources. See the incoming OAEI Bio-ML track at: https://www.cs.ox.ac.uk/isg/projects/ConCur/oaei/. See our resource paper at: https://arxiv.org/abs/2205.03447.
The Grid Corpus is a large multitalker audiovisual sentence corpus designed to support joint computational-behavioral studies in speech perception. In brief, the corpus consists of high-quality audio and video (facial) recordings of 1000 sentences spoken by each of 34 talkers (18 male, 16 female), for a total of 34000 sentences. Sentences are of the form "put red at G9 now". audio_25k.zip contains the wav format utterances at a 25 kHz sampling rate in a separate directory per talker alignments.zip provides word-level time alignments, again separated by talker s1.zip, s2.zip etc contain .jpg videos for each talker [note that due to an oversight, no video for talker t21 is available] The Grid Corpus is described in detail in the paper jasagrid.pdf included in the dataset.
A. Jaiswal, S. Singh, и S. Tripathy. 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), стр. 1-6. IEEE, (июля 2023)