The evolution of a software system can be studied in terms of how various properties, as reflected by software metrics, change over time. Current models of software evolution allow inferences to be drawn about certain attributes of a software system, for instance its architecture, its complexity, and the impact of that complexity on development effort. However, an inherent limitation of these models is that they do not provide any direct insight into where growth takes place. In particular, they cannot be used to assess the impact of evolution on the underlying distribution of size and complexity among the various classes.

Such an analysis is needed in order to answer questions such as 'do developers tend to evenly distribute complexity as systems get bigger?' and 'do large and complex classes get bigger over time?'. These questions are of more than passing interest: by understanding what typical and successful software evolution looks like, we can identify anomalous situations and take action earlier than might otherwise be possible. Information gained from an analysis of the distribution of growth will also show whether there are consistent boundaries within which a software design structure exists.

The specific research questions that we address in Chapter 5 (Growth Dynamics) of the thesis this data accompanies are: (1) What is the nature of the distribution of software size and complexity measures? (2) How does the profile and shape of this distribution change as software systems evolve? (3) Is the rate and nature of change erratic? (4) Do large and complex classes become bigger and more complex as software systems evolve?

In our study of metric distributions, we focused on 10 different measures that span a range of size and complexity measures. To assess assigned responsibilities we use two metrics, Load Instruction Count and Store Instruction Count. Both metrics provide a measure of the frequency of state changes in data containers within a system.
Number of Branches, on the other hand, records all branch instructions and is used to measure structural complexity at the class level. This measure is equivalent to Weighted Method Count (WMC) as proposed by Chidamber and Kemerer (1994) if a weight of 1 is applied for all methods and the complexity measure used is cyclomatic complexity. We use Fan-Out Count and Type Construction Count to obtain insight into the dynamics of the software systems. The former documents the degree of delegation, whereas the latter counts the frequency of object instantiations.

The remaining metrics provide structural size and complexity measures. In-Degree Count and Out-Degree Count reveal the coupling of classes within a system. These measures are extracted from the type dependency graph that we construct for each analyzed system. The vertices in this graph are classes, and the edges are directed links between classes. We associate popularity (i.e., the number of incoming links) with In-Degree Count and usage or delegation (i.e., the number of outgoing links) with Out-Degree Count. Number of Methods, Public Method Count, and Number of Attributes are typical object-oriented size measures and provide insight into the extent of data and functionality encapsulation.

The data set is distributed as a .zip archive of about 0.5 MB in total, containing 4 .txt files and 1 .log file. The raw metric data is provided in comma-separated values (CSV) format, and the first line of the CSV data contains the header. A detailed output of the statistical analysis undertaken is provided as log files generated directly from Stata (statistical analysis software).
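As an illustration of how In-Degree Count and Out-Degree Count relate to the type dependency graph described above, the following is a minimal sketch and not the tooling used in the thesis. It assumes the graph is available as a list of (source, target) class pairs; the function name `degree_counts`, the example class names, and the choices to collapse duplicate links and ignore self-references are all assumptions made for this example.

```python
from collections import defaultdict


def degree_counts(dependencies):
    """Return {class_name: (in_degree, out_degree)} for a type dependency graph.

    `dependencies` is an iterable of (source, target) pairs, where `source`
    is a class that depends on the class `target`.
    Assumptions for this sketch: repeated links between the same two classes
    count once, and a class depending on itself is ignored.
    """
    outgoing = defaultdict(set)  # class -> set of classes it depends on
    incoming = defaultdict(set)  # class -> set of classes that depend on it
    classes = set()

    for source, target in dependencies:
        classes.update((source, target))
        if source == target:
            continue  # assumption: self-references do not count as coupling
        outgoing[source].add(target)
        incoming[target].add(source)

    return {
        name: (len(incoming[name]), len(outgoing[name]))
        for name in sorted(classes)
    }


# Hypothetical example graph; the class names are invented for illustration.
edges = [
    ("Order", "Customer"),
    ("Order", "Item"),
    ("Invoice", "Order"),
    ("Invoice", "Customer"),
    ("Order", "Customer"),  # duplicate link, counted once
]

counts = degree_counts(edges)
for name, (in_degree, out_degree) in counts.items():
    print(f"{name}: in-degree={in_degree}, out-degree={out_degree}")
```

In this example `Customer` is the most popular class (two incoming links), while `Order` and `Invoice` delegate the most (two outgoing links each).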
Parent title
Originally presented as an appendix to: Vasa, R. (2010). Growth and change dynamics in open source software systems. PhD thesis