
The impact of vocabulary normalization

D. Binkley and D. Lawrie. Journal of Software: Evolution and Process, 27 (4): 255--273 (2015)

Abstract

Software development, evolution, and maintenance depend on ever-increasing tool support. Recent tools have incorporated increasing analysis of the natural language found in source code, predominately in the identifiers and comments. However, when coders combine abbreviations and acronyms to form multi-word identifiers, they, in essence, invent new vocabulary making the source code's vocabulary differ from that of other software artifacts. This vocabulary mismatch is a potential problem for many techniques imported from information retrieval and natural language processing, which implicitly assume the use of a single common vocabulary. Vocabulary normalization aims to bring the vocabulary of the source in line with that of other artifacts. A prior small-scale experiment demonstrated the value of vocabulary normalization for C code. A more comprehensive experiment using Java code is presented where normalization fails to bring benefit. To investigate the potential underlying causes, over 20,000 non-dictionary words extracted from the program JabRef were normalized by hand (often requiring significant external information). The experiment, repeated using the hand-normalized identifiers, again found that normalization brought no improvement. In response to this unexpected result, the vocabulary differences between Java and C code are considered and used to help frame directions for future work. Copyright © 2015 John Wiley & Sons, Ltd.
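To make the idea of vocabulary normalization concrete, the following is a minimal illustrative sketch, not the authors' method or tooling. It assumes normalization has two steps: splitting a multi-word identifier into its parts (on underscores, digits, and camelCase boundaries), then expanding abbreviations into dictionary words. The `ABBREVIATIONS` table and the function names `split_identifier` and `normalize` are hypothetical; real normalizers need far richer, context-sensitive expansion sources, which is part of why the hand normalization described above required significant external information.

```python
import re

# Hypothetical abbreviation table, for illustration only.
ABBREVIATIONS = {
    "str": "string",
    "num": "number",
    "bib": "bibliography",
    "cnt": "count",
    "idx": "index",
}


def split_identifier(identifier: str) -> list[str]:
    """Split an identifier on underscores, digits, and camelCase boundaries."""
    words = []
    for part in re.split(r"[_\W\d]+", identifier):
        # The first alternative keeps acronyms together, so that
        # "parseHTTPResponse" yields "parse", "HTTP", "Response".
        words.extend(re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+", part))
    return [w.lower() for w in words if w]


def normalize(identifier: str) -> list[str]:
    """Split an identifier, then expand any abbreviation found in the table."""
    return [ABBREVIATIONS.get(w, w) for w in split_identifier(identifier)]


if __name__ == "__main__":
    print(split_identifier("parseHTTPResponse"))  # ['parse', 'http', 'response']
    print(normalize("bibEntryCnt"))               # ['bibliography', 'entry', 'count']
```

A word missing from the table passes through unchanged, so any such sketch can only be as good as its expansion source.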

Links and resources

BibTeX key: Binkley2015a
entry type: article
year: 2015
journal: Journal of Software: Evolution and Process
number: 4
pages: 255--273
volume: 27
file: :Binkley 2015 impact of vocabulary normalization.pdf:PDF


Cite this publication

@article{Binkley2015a,
  abstract = {Software development, evolution, and maintenance depend on ever-increasing tool support. Recent tools have incorporated increasing analysis of the natural language found in source code, predominately in the identifiers and comments. However, when coders combine abbreviations and acronyms to form multi-word identifiers, they, in essence, invent new vocabulary making the source code's vocabulary differ from that of other software artifacts. This vocabulary mismatch is a potential problem for many techniques imported from information retrieval and natural language processing, which implicitly assume the use of a single common vocabulary. Vocabulary normalization aims to bring the vocabulary of the source in line with that of other artifacts. A prior small-scale experiment demonstrated the value of vocabulary normalization for C code. A more comprehensive experiment using Java code is presented where normalization fails to bring benefit. To investigate the potential underlying causes, over 20,000 non-dictionary words extracted from the program JabRef were normalized by hand (often requiring significant external information). The experiment, repeated using the hand-normalized identifiers, again found that normalization brought no improvement. In response to this unexpected result, the vocabulary differences between Java and C code are considered and used to help frame directions for future work. Copyright © 2015 John Wiley & Sons, Ltd.},
  added-at = {2015-05-04T00:48:05.000+0200},
  author = {Binkley, Dave and Lawrie, Dawn},
  biburl = {https://www.bibsonomy.org/bibtex/2ac147b347c8c3d7cf904d67197cb4702/publishnetwork},
  file = {:Binkley 2015 impact of vocabulary normalization.pdf:PDF},
  interhash = {ddd4c53746f7c1310e717086b0f6a7b0},
  intrahash = {ac147b347c8c3d7cf904d67197cb4702},
  journal = {Journal of Software: Evolution and Process},
  keywords = {abbreviationsidentifier architecture articles associates bought contentmanagement},
  number = 4,
  pages = {255--273},
  timestamp = {2015-06-17T15:09:01.000+0200},
  title = {The impact of vocabulary normalization},
  volume = 27,
  year = 2015
}