The chapter discusses the various types of corpora and provides a sense of how words behave inside them. Quantitative exploration of individual words in a corpus is shown using frequency and information content measures. Quantitative exploration of co-occurrences of words, called collocations, is shown using pointwise mutual information and other measures. Concordancers, tools for viewing words in their immediate context within a corpus, are introduced for qualitative exploration of corpora. Experiment: comparing word frequencies between domain-specific corpora.
%0 Book Section
%1 barriere_exploring_2016
%A Barrière, Caroline
%B Natural Language Understanding in a Semantic Web Context
%D 2016
%I Springer International Publishing
%K linguistik
%P 59--84
%R 10.1007/978-3-319-41337-2_5
%T Exploring Corpora
%U http://link.springer.com/chapter/10.1007/978-3-319-41337-2_5
%X The chapter discusses the various types of corpora and provides a sense of how words behave inside them. Quantitative exploration of individual words in a corpus is shown using frequency and information content measures. Quantitative exploration of co-occurrences of words, called collocations, is shown using pointwise mutual information and other measures. Concordancers, tools for viewing words in their immediate context within a corpus, are introduced for qualitative exploration of corpora. Experiment: comparing word frequencies between domain-specific corpora.
%@ 978-3-319-41335-8 978-3-319-41337-2