Named Entity Recognition Using Web Document Corpus
W. Karaa. International Journal of Managing Information Technology (IJMIT), 3 (1): 46-55 (February 2011)
Abstract
This paper introduces a named entity recognition approach for textual corpora. A named entity (NE) can be a location, person, organization, date, time, etc., characterized by instances. An NE is found in texts accompanied by contexts: the words to the left or right of the NE. The work mainly aims at identifying the contexts that indicate an NE's type. For example, the occurrence of the word "President" in a text means that this word, or context, may be followed by the name of a president, as in "President Obama". Likewise, a word preceded by the string "footballer" is likely to be the name of a footballer. NE recognition may therefore be viewed as a classification task in which every word is assigned to an NE class according to its context. The aim of this study is to identify and classify the contexts that are most relevant for recognizing an NE, namely those frequently found with it. A learning approach is then proposed that uses a training corpus of web documents constructed from learning examples. Frequency representations and modified tf-idf representations are used to calculate context weights based on context frequency, learning example frequency, and document frequency in the corpus.
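The context-weighting idea in the abstract can be illustrated with a small Python sketch. The paper's exact "modified tf-idf" formula is not reproduced here, so the scheme below is an assumption. A context is taken to be the single word directly left or right of a known NE instance (a learning example). Its weight is the product of three factors: its context frequency, the share of learning examples it co-occurs with, and a smoothed idf of that word over the corpus. The function names `contexts_of` and `context_weights` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def contexts_of(tokens, examples):
    """Yield (example, context) pairs. A context is the single word directly
    left ("L") or right ("R") of a token that is a known NE instance."""
    for i, tok in enumerate(tokens):
        if tok in examples:
            if i > 0:
                yield tok, ("L", tokens[i - 1])
            if i + 1 < len(tokens):
                yield tok, ("R", tokens[i + 1])


def context_weights(documents, examples):
    """Weight every context of one NE class (hypothetical scheme, not the
    paper's exact formula): cf * (share of examples seen) * smoothed idf."""
    cf = Counter()                # context frequency over the whole corpus
    seen_with = defaultdict(set)  # learning examples each context co-occurs with
    df = Counter()                # number of documents containing each word
    for doc in documents:
        tokens = doc.split()      # naive whitespace tokenization
        df.update(set(tokens))
        for example, ctx in contexts_of(tokens, examples):
            cf[ctx] += 1
            seen_with[ctx].add(example)
    n_docs, n_examples = len(documents), len(examples)
    weights = {}
    for ctx, freq in cf.items():
        ef = len(seen_with[ctx]) / n_examples
        idf = math.log(n_docs / df[ctx[1]]) + 1  # df >= 1: the word occurs in its own doc
        weights[ctx] = freq * ef * idf
    return weights


if __name__ == "__main__":
    docs = [
        "President Obama met the press",
        "President Sarkozy visited Berlin",
        "the footballer Messi scored",
        "Obama spoke to the nation",
    ]
    weights = context_weights(docs, {"Obama", "Sarkozy"})
    for ctx, score in sorted(weights.items(), key=lambda kv: -kv[1]):
        print(ctx, round(score, 3))
```

On this toy corpus the left context ("L", "President") ranks first, with a weight of about 3.386. It co-occurs with both example presidents, while each right context is seen with only one. Single-token matching and whitespace tokenization are simplifications of this sketch. Multi-word names would need a span matcher.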
%0 Journal Article
%1 noauthororeditor
%A Karaa, Wahiba Ben Abdessalem
%D 2011
%J International Journal of Managing Information Technology (IJMIT)
%K Information extraction; Learning; Named entity; tf-idf; Web document
%N 1
%P 46-55
%T Named Entity Recognition Using Web Document Corpus
%U http://airccse.org/journal/ijmit/papers/3111ijmit04.pdf
%V 3
@article{noauthororeditor,
added-at = {2020-06-05T06:07:10.000+0200},
author = {Karaa, Wahiba Ben Abdessalem},
biburl = {https://www.bibsonomy.org/bibtex/24cfe7bdda83efec14d12d7cf19d54fc5/ijmit_journal},
interhash = {175aedf9e2892e4286bdd67b3c5f5a6b},
intrahash = {4cfe7bdda83efec14d12d7cf19d54fc5},
journal = {International Journal of Managing Information Technology (IJMIT)},
keywords = {Information extraction, Learning, Named entity, tf-idf, Web document},
language = {English},
month = {February},
number = 1,
pages = {46--55},
timestamp = {2020-06-05T06:07:10.000+0200},
title = {Named Entity Recognition Using Web Document Corpus},
url = {http://airccse.org/journal/ijmit/papers/3111ijmit04.pdf},
volume = 3,
year = 2011
}