Inproceedings

Automatic Classification of Large Changes into Maintenance Categories

A. Hindle, D. German, M. Godfrey, and R. Holt.
International Conference on Program Comprehension, Vancouver (May 2009), in press.

Abstract

Large software systems undergo significant evolution during their lifespan, yet individual changes are often not well documented. In this work, we seek to automatically classify large changes into categories of maintenance tasks (corrective, adaptive, perfective, feature addition, and non-functional improvement) using Machine Learning techniques. In a previous paper, we found that many commits could be classified easily and reliably based solely on manual analysis of the commit metadata and commit messages (i.e., without reference to the source code). Our extension automates this classification by training Machine Learners on features extracted from the commit metadata, such as the word distribution of the commit message, the commit author, and the modules modified. We validated the learners via 10-fold cross validation, which achieved accuracies consistently above 50%, indicating good to fair results. We found that the identity of a commit's author provided much information about its maintenance class, almost as much as the words of the commit message. This implies that, for most large commits, the source control system (SCS) commit message plus the identity of the commit author is enough information to accurately and automatically categorize the nature of the maintenance task.
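The abstract describes learning a maintenance category from commit metadata alone. The sketch below is a minimal illustration under our own assumptions, not the authors' pipeline: the `Commit` record, treating a "module" as a file's top-level directory, and all helper names are hypothetical, and a multinomial Naive Bayes learner stands in for the unspecified Machine Learners. It combines the three feature families the abstract names (message words, author, modules modified) into one bag of features and scores them with 10-fold cross validation.

```python
import math
import random
import re
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Commit:
    """Hypothetical commit record: only metadata, no source code."""
    message: str
    author: str
    paths: List[str]  # files modified by the commit


def extract_features(commit: Commit) -> Counter:
    """Bag of prefixed features so words, authors, and modules never collide."""
    features: Counter = Counter()
    # Word distribution of the commit message (lowercased alphabetic tokens).
    for word in re.findall(r"[a-z]+", commit.message.lower()):
        features["word:" + word] += 1
    # Identity of the commit author.
    features["author:" + commit.author] = 1
    # Modules touched; assumption: a module is the top-level directory.
    for path in commit.paths:
        features["module:" + path.split("/")[0]] = 1
    return features


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle range(n) and return k (train, test) index splits."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Round-robin assignment: fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    return [
        ([j for other, fold in enumerate(folds) if other != i for j in fold], folds[i])
        for i in range(k)
    ]


Model = Tuple[Counter, Dict[str, Counter], set]


def train_nb(examples: List[Tuple[Counter, str]]) -> Model:
    """Multinomial Naive Bayes counts (a stand-in learner, not the paper's)."""
    class_counts: Counter = Counter(label for _, label in examples)
    feature_counts: Dict[str, Counter] = defaultdict(Counter)
    vocab: set = set()
    for feats, label in examples:
        feature_counts[label].update(feats)
        vocab.update(feats)
    return class_counts, feature_counts, vocab


def predict_nb(model: Model, feats: Counter) -> str:
    """Pick the label with the highest Laplace-smoothed log-probability."""
    class_counts, feature_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = "", -math.inf
    for label, count in class_counts.items():
        denom = sum(feature_counts[label].values()) + len(vocab)
        score = math.log(count / total)
        for f, n in feats.items():
            if f in vocab:  # ignore features never seen in training
                score += n * math.log((feature_counts[label][f] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def cross_validate(commits: List[Commit], labels: List[str], k: int = 10) -> float:
    """Fraction of commits whose held-out prediction matches its label."""
    examples = [(extract_features(c), y) for c, y in zip(commits, labels)]
    correct = 0
    for train, test in k_fold_indices(len(examples), k):
        model = train_nb([examples[i] for i in train])
        correct += sum(predict_nb(model, examples[i][0]) == examples[i][1] for i in test)
    return correct / len(examples)
```

Because every feature carries a family prefix, one could drop, say, the `author:` features and rerun `cross_validate` to compare feature families; that is only an illustration of the idea, not a reproduction of the paper's experiments.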

BibTeX key: hindle09icpc
entry type: inproceedings
address: Vancouver
booktitle: International Conference on Program Comprehension
year: 2009
month: May
note: in press

Cite this publication

@inproceedings{hindle09icpc,
  author    = {Hindle, Abram and German, Daniel M. and Godfrey, Michael W. and Holt, Richard C.},
  title     = {Automatic Classification of Large Changes into Maintenance Categories},
  booktitle = {International Conference on Program Comprehension},
  address   = {Vancouver},
  month     = may,
  year      = 2009,
  note      = {in press},
  keywords  = {evolution machine-learning maintenance software},
  abstract  = {Large software systems undergo significant evolution during their lifespan, yet individual changes are often not well documented. In this work, we seek to automatically classify large changes into categories of maintenance tasks (corrective, adaptive, perfective, feature addition, and non-functional improvement) using Machine Learning techniques. In a previous paper, we found that many commits could be classified easily and reliably based solely on manual analysis of the commit metadata and commit messages (i.e., without reference to the source code). Our extension automates this classification by training Machine Learners on features extracted from the commit metadata, such as the word distribution of the commit message, the commit author, and the modules modified. We validated the learners via 10-fold cross validation, which achieved accuracies consistently above 50\%, indicating good to fair results. We found that the identity of a commit's author provided much information about its maintenance class, almost as much as the words of the commit message. This implies that, for most large commits, the source control system (SCS) commit message plus the identity of the commit author is enough information to accurately and automatically categorize the nature of the maintenance task.},
  added-at  = {2009-03-08T23:33:18.000+0100},
  biburl    = {https://www.bibsonomy.org/bibtex/233287fee72a60711dd47c1dc8d84ae13/neilernst},
  interhash = {fe3464fc3e7c4c5916cef498c0e3fa52},
  intrahash = {33287fee72a60711dd47c1dc8d84ae13},
  timestamp = {2009-03-08T23:33:18.000+0100}
}
