
BibSonomy :: user :: unhammer

The blue social bookmark and publication sharing system.
 

bookmarks (1)
    … monolingual, parallel and annotated corpora. There are fourteen monolingual corpora, including both written and (for some languages) spoken data for fourteen South Asian languages: Assamese, Bengali, Gujarati, Hindi, Kannada, Kashmiri, Malayalam, Marathi, Oriya, Punjabi, Sinhala, Tamil, Telugu and Urdu. The EMILLE monolingual corpora contain approximately 92,799,000 words (including 2,627,000 words of transcribed spoken data for Bengali, Gujarati, Hindi, Punjabi and Urdu). The parallel corpus consists of 200,000 words of English text with accompanying translations into Hindi, Bengali, Punjabi, Gujarati and Urdu. The annotated component comprises the Urdu monolingual and parallel corpora annotated for parts of speech, together with twenty written Hindi corpus files annotated to show the nature of demonstrative use. The corpus is marked up in CES-compliant SGML and encoded in Unicode.
    Tags: Assamese Bengali Gujarati Hindi Kannada Kashmiri Malayalam Marathi Oriya Punjabi Sinhala Tamil Telegu Urdu corpus parallel · posted by unhammer on Apr 27, 2009, 2:57 PM

publications (4)