- monolingual, parallel and annotated corpora. There are fourteen monolingual corpora, including both written and (for some languages) spoken data, for fourteen South Asian languages: Assamese, Bengali, Gujarati, Hindi, Kannada, Kashmiri, Malayalam, Marathi, Oriya, Punjabi, Sinhala, Tamil, Telugu and Urdu. The EMILLE monolingual corpora contain approximately 92,799,000 words (including 2,627,000 words of transcribed spoken data for Bengali, Gujarati, Hindi, Punjabi and Urdu). The parallel corpus consists of 200,000 words of text in English and its accompanying translations in Hindi, Bengali, Punjabi, Gujarati and Urdu. The annotated component includes the Urdu monolingual and parallel corpora annotated for parts of speech, together with twenty written Hindi corpus files annotated to show the nature of demonstrative use. The corpus is marked up using CES-compliant SGML and encoded using Unicode.
- Parallel corpora, freely available
- Proceedings of Treebanks and Linguistic Theories TLT '06, Prague, ÚFAL, (2006)
- Proceedings of the Corpus Linguistics 2001 Conference, page 466--475. Lancaster, UK, UCREL, (2001)
- Proceedings of the 31st annual meeting on Association for Computational Linguistics, page 17--22. Morristown, NJ, USA, Association for Computational Linguistics, (1993)
- Conference Proceedings: the tenth Machine Translation Summit, page 79--86. Phuket, Thailand, AAMT, (2005)
- Proceedings of the first SIGHAN Workshop on Chinese Language Processing, 18, page 1--5. Morristown, NJ, Association for Computational Linguistics, (2002)
- HLT '94: Proceedings of the workshop on Human Language Technology, page 114--119. Morristown, NJ, USA, Association for Computational Linguistics, (1994)
- (2007)
- Cambridge University Press, (2008)
- (2008). Submitted to the Research Council of Norway.
- Uppsala University, Uppsala, (2003). Studia Linguistica Upsaliensia 1, ISSN 1652-1366, ISBN 91-554-5815-7.
- (2006)
- Meta 43(4):542--556 (1998)
- A Festschrift for Kjell Johan Sæbø -- in partial fulfilment of the requirements for the celebration of his 50th birthday, Unipub, Oslo, (2006)
- Nordic Journal of Linguistics 30(02):185--208 (2007)
- Blackwell Publishing, Malden, Mass., (2008)
- Corpus Linguistics and Linguistic Theory 1(2):277--294 (2005)
- International Journal of Corpus Linguistics (2003)
- (2008)
- (2006)
- (2007)

