Big Data = Big Insights? Operationalising Brooks' Law in a Massive GitHub Data Set
C. Gote, P. Mavrodiev, F. Schweitzer, and I. Scholtes. (2022). arXiv:2201.04588. Comment: Conference: ICSE 2022 - The 44th International Conference on Software Engineering, 25 pages, 4 figures, 3 tables.
Abstract
Massive data from software repositories and collaboration tools are widely
used to study social aspects in software development. One question that several
recent works have addressed is how a software project's size and structure
influence team productivity, a question famously considered in Brooks' law.
Recent studies using massive repository data suggest that developers in larger
teams tend to be less productive than developers in smaller teams. Despite using
similar methods and data, other studies argue for a positive linear or even
super-linear relationship between team size and productivity, thus contesting
the view of software economics that software projects exhibit diseconomies of
scale. In our work, we study challenges that can explain the disagreement
between recent studies of developer productivity in massive repository data. We
further provide, to the best of our knowledge, the largest curated corpus of
GitHub projects tailored to investigate the influence of team size and
collaboration patterns on individual and collective productivity. Our work
contributes to the ongoing discussion on the choice of productivity metrics in
the operationalisation of hypotheses about determinants of successful software
projects. It further highlights general pitfalls in big data analysis and shows
that the use of bigger data sets does not automatically lead to more reliable
insights.
%0 Generic
%1 gote2022insights
%A Gote, Christoph
%A Mavrodiev, Pavlin
%A Schweitzer, Frank
%A Scholtes, Ingo
%D 2022
%K caidas-area-css
%T Big Data = Big Insights? Operationalising Brooks' Law in a Massive
GitHub Data Set
%U http://arxiv.org/abs/2201.04588
%X Massive data from software repositories and collaboration tools are widely
used to study social aspects in software development. One question that several
recent works have addressed is how a software project's size and structure
influence team productivity, a question famously considered in Brooks' law.
Recent studies using massive repository data suggest that developers in larger
teams tend to be less productive than developers in smaller teams. Despite using
similar methods and data, other studies argue for a positive linear or even
super-linear relationship between team size and productivity, thus contesting
the view of software economics that software projects exhibit diseconomies of
scale. In our work, we study challenges that can explain the disagreement
between recent studies of developer productivity in massive repository data. We
further provide, to the best of our knowledge, the largest curated corpus of
GitHub projects tailored to investigate the influence of team size and
collaboration patterns on individual and collective productivity. Our work
contributes to the ongoing discussion on the choice of productivity metrics in
the operationalisation of hypotheses about determinants of successful software
projects. It further highlights general pitfalls in big data analysis and shows
that the use of bigger data sets does not automatically lead to more reliable
insights.
@misc{gote2022insights,
abstract = {Massive data from software repositories and collaboration tools are widely
used to study social aspects in software development. One question that several
recent works have addressed is how a software project's size and structure
influence team productivity, a question famously considered in Brooks' law.
Recent studies using massive repository data suggest that developers in larger
teams tend to be less productive than developers in smaller teams. Despite using
similar methods and data, other studies argue for a positive linear or even
super-linear relationship between team size and productivity, thus contesting
the view of software economics that software projects exhibit diseconomies of
scale. In our work, we study challenges that can explain the disagreement
between recent studies of developer productivity in massive repository data. We
further provide, to the best of our knowledge, the largest curated corpus of
GitHub projects tailored to investigate the influence of team size and
collaboration patterns on individual and collective productivity. Our work
contributes to the ongoing discussion on the choice of productivity metrics in
the operationalisation of hypotheses about determinants of successful software
projects. It further highlights general pitfalls in big data analysis and shows
that the use of bigger data sets does not automatically lead to more reliable
insights.},
added-at = {2023-01-23T10:31:00.000+0100},
author = {Gote, Christoph and Mavrodiev, Pavlin and Schweitzer, Frank and Scholtes, Ingo},
biburl = {https://www.bibsonomy.org/bibtex/212a83210fffee035016c86e6f58247e8/ifland},
interhash = {0e04b79dacb27448c25a7c72d682b53b},
intrahash = {12a83210fffee035016c86e6f58247e8},
keywords = {caidas-area-css},
note = {arXiv:2201.04588. Comment: Conference: ICSE 2022 - The 44th International Conference on Software Engineering, 25 pages, 4 figures, 3 tables},
timestamp = {2023-01-23T10:31:00.000+0100},
title = {Big Data = Big Insights? Operationalising {Brooks'} Law in a Massive
{GitHub} Data Set},
url = {http://arxiv.org/abs/2201.04588},
year = 2022
}