Code refactoring is widely practiced by software developers. There is an explicit assumption that code refactoring improves the structural quality of a software project, thereby also reducing its bug proneness. However, refactoring is often applied with different purposes in practice. Depending on the complexity of certain refactorings, developers might unconsciously make the source code more susceptible to bugs. In this paper, we present a longitudinal study of 5 Java open source projects, in which 20,689 refactorings and 1,033 bug reports were analyzed. We found that many bugs are introduced in the refactored code as soon as the first immediate change is made to it. Furthermore, code elements affected by refactorings performed in conjunction with other changes are more prone to have bugs than those affected by pure refactorings.
This phase consists of selecting a set of software projects for analysis. We relied on open source projects selected from GitHub. To conduct our study, we selected 5 open source projects that meet three criteria. First, they are highly popular on GitHub and come from different domains. Second, users actively use their issue tracking systems, such as Bugzilla and the GitHub issue management system, for bug reporting and improvement suggestions. Third, at least 90% of the code repository is written in Java, which is a very popular language. Table 1 provides general data about the analyzed projects. The first column presents the name of the software project. The second column presents the number of lines of code. The third column presents the number of classes. The fourth column presents the analyzed period. The fifth column presents the number of commits. The sixth column presents the number of bug reports.
Software Project | LOC | #Classes | Analyzed Period | #Commits | #Bug Reports |
---|---|---|---|---|---|
Ant | 137,314 | 1,784 | 2000-01 to 2016-07 | 13,331 | 70 |
Derby | 1,760,766 | 3,741 | 2004-08 to 2016-12 | 8,135 | 173 |
Okhttp | 49,739 | 642 | 2011-05 to 2016-08 | 2,645 | 270 |
Presto | 350,976 | 4,146 | 2012-08 to 2016-08 | 8,056 | 296 |
Tomcat | 668,720 | 2,275 | 2006-03 to 2016-12 | 296 | 282 |
We chose to study the 11 most commonly investigated refactoring types in the literature (Murphy-Hill, Parnin and Black, 2012). These refactoring types are defined in Fowler's catalog (Fowler, 1991). We used the Refactoring Miner tool (Tsantalis et al., 2013) to identify refactoring operations in the selected projects. Tsantalis et al. have reported that Refactoring Miner has a precision of 96.4% and low rates of false positives for all refactoring types, which we confirmed in our validation process, as discussed in Phase 3. Refactoring Miner detects all 11 refactoring types investigated in our study (Murphy-Hill, Parnin and Black, 2012). We identified 20,689 refactoring operations in total. Table 2 presents the refactoring types analyzed in our study. The first column presents the refactoring type. The second column describes the problem that each refactoring type is intended to address. The third column describes the solution that each refactoring type applies.
Refactoring Type | Problem | Solution |
---|---|---|
Extract Method | Parts of code should be gathered in a single method | Create a new method with the extracted code |
Extract Interface | Classes implement commonly used resources, or two classes have part of their interfaces in common | Extract the common subset into an interface |
Extract Superclass | There are two classes with similar features | Create a superclass and move the common features to the superclass |
Inline Method | A method body is more obvious than the method itself | Replace calls to the method with the method's content and delete the method itself |
Move Field | A field is, or will be, used by another class more than the class in which it's defined | Create a new field in the target class and change all its users |
Move Method | A method is, or will be, using or used by more features of another class than the class in which it is defined | Create a new method with a similar body in the class it uses most. Either turn the old method into a simple delegation, or remove it altogether |
Rename Method | The name of a method does not reveal its purpose | Change the name of the method |
Pull Up Field | Two subclasses have the same field | Move the field to the superclass |
Pull Up Method | There are methods with identical results on subclasses | Move them to the superclass |
Push Down Field | A field is used only by some subclasses | Move the field to those subclasses |
Push Down Method | The behavior on a superclass is relevant only for some of its subclasses | Move it to those subclasses |
In this work, we consider as refactored elements all those directly affected by the refactoring. If a refactoring is applied only in a method body, only this method is considered a refactored element. For instance, let us consider the Move Method refactoring. In this refactoring type, a method m is moved from class A to class B. Hence, the refactored elements in this case would be {m, A, B}. All callers of m are affected by this refactoring, but we do not consider them refactored elements. As another example, let us consider the Rename Method refactoring. In this scenario, a new name is given to the method m and the refactored element set would be just {m}. A different refactored element set is used for each refactoring type. Table 3 presents the refactored elements considered for each type of refactoring.
Refactoring | Refactored Elements |
---|---|
Extract Interface | classes implementing the new interface. |
Extract Method | (i) method created; (ii) method from where the new method was extracted; and (iii) class containing both methods. |
Extract Superclass | (i) classes extending the new class; and (ii) new class created. |
Inline Method | (i) the method which received the new code; and (ii) class containing the method. |
Move Field | the two classes affected by the change: the class in which the field used to reside and the class that received the field. |
Move Method | the two classes affected by the change: the class in which the method used to reside and the class that received the method. |
Pull Up Field | the two classes affected by the change: the class in which the field used to reside and the class that received the field. |
Pull Up Method | the two classes affected by the change: the class in which the method used to reside and the class that received the method. |
Push Down Field | the two classes affected by the change: the class in which the field used to reside and the class that received the field. |
Push Down Method | the two classes affected by the change: the class in which the method used to reside and the class that received the method. |
Rename Method | the renamed method and the class that contains it. |
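The mapping in Table 3 can be read as a lookup from refactoring type to the roles whose code elements count as refactored. The following is a minimal sketch of that reading, not the tooling used in the study; the role names (`origin_class`, `destination_class`, and so on) and the function `refactored_elements` are hypothetical labels introduced only for illustration.

```python
# Hypothetical role names; Table 3 only describes the roles in prose.
ROLES_BY_TYPE = {
    "Extract Interface": ("implementing_classes",),
    "Extract Method": ("created_method", "source_method", "containing_class"),
    "Extract Superclass": ("extending_classes", "new_class"),
    "Inline Method": ("receiving_method", "containing_class"),
    "Move Field": ("origin_class", "destination_class"),
    "Move Method": ("origin_class", "destination_class"),
    "Pull Up Field": ("origin_class", "destination_class"),
    "Pull Up Method": ("origin_class", "destination_class"),
    "Push Down Field": ("origin_class", "destination_class"),
    "Push Down Method": ("origin_class", "destination_class"),
    "Rename Method": ("renamed_method", "containing_class"),
}


def refactored_elements(refactoring_type, roles):
    """Return the set of refactored elements for one detected refactoring.

    `roles` maps a role name to one element name or to a collection of names.
    Roles not listed for the refactoring type (e.g. callers) are ignored.
    """
    elements = set()
    for role in ROLES_BY_TYPE[refactoring_type]:
        value = roles.get(role, ())
        if isinstance(value, str):
            elements.add(value)       # a single element name
        else:
            elements.update(value)    # several elements, e.g. all subclasses
    return elements
```

For example, `refactored_elements("Move Method", {"origin_class": "A", "destination_class": "B"})` yields the two classes listed in the Move Method row of Table 3.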
We conducted a manual validation of the refactorings identified by the Refactoring Miner tool to ensure the reliability of our data. This validation covered a random set of refactoring operations from different refactoring types, since the precision of the Refactoring Miner tool could vary due to the rules implemented to detect each refactoring type. We recruited ten undergraduate students to analyze the samples. The samples were divided into ten disjoint sets, and each student validated a different one. After applying a statistical test with a confidence level of 95%, we observed a high precision of the tool for each refactoring type, with a median of 88.36%. By applying the Grubbs outlier test (Grubbs, 1969) (alpha = 0.05), we could not find any outliers, indicating that no refactoring type strongly influences the median precision found. Thus, the obtained results represent a key factor in the reliability of the results reported in this study.
We also evaluated root-canal and floss refactoring by manually inspecting a randomly selected sample of 2,119 refactorings. We manually analyzed whether the changes performed during the refactoring modify the behavior of the code. We classified a change as floss refactoring when there are behavioral changes, such as an addition of methods or changes in the method body that are not related to refactoring transformations. When we did not identify behavioral changes, the refactoring was classified as root-canal. This inspection was performed by three researchers, two of whom are very experienced refactoring researchers. The most experienced one resolved the conflicts. As a result, we found that developers apply root-canal refactoring in 31.5% of the cases. The confidence level for this number is 95% with a confidence interval of 5%.
We selected bug reports with status resolved fixed, verified fixed, closed, or closed fixed for analysis. Furthermore, we chose to analyze only bugs labeled as bug in the issue tracking system. Table 1 presents the number of bug reports for each software project (column #Bug Reports).
A common practice among developers is to include the bug report number in the commit comment whenever they fix a bug associated with it (Śliwerski, Zimmermann, and Zeller, 2005). Therefore, to map a bug report to its fix commit, we automatically search log messages for references to bug reports such as "bug 23442" or "fix for bug 23442", as proposed by Dallmeier and Zimmermann (Dallmeier and Zimmermann, 2007). We ignored bug reports for which we could not find the fix commit because, without the fix commit, we cannot find the fixed files. Thus, these bug reports are considered not functional (Ye, Bunescu, and Liu, 2014). We consider as buggy elements all code elements that were modified in the fix commit.
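The log-message search described above can be sketched as a regular-expression match over commit messages. This is an illustrative sketch under assumptions, not the script used in the study: the pattern `BUG_REF`, the function `link_fix_commits`, and the `(sha, message)` input shape are hypothetical, and real projects may need additional patterns for their own conventions.

```python
import re

# Assumed pattern: the word "bug", an optional "#", then the report number,
# which covers messages such as "bug 23442" or "fix for bug 23442".
BUG_REF = re.compile(r"\bbug\s*#?\s*(\d+)\b", re.IGNORECASE)


def link_fix_commits(commits, bug_ids):
    """Map each known bug report id to the commits whose message mentions it.

    commits: iterable of (sha, message) pairs.
    bug_ids: set of report ids selected for analysis.
    Reports that never appear in a message are absent from the result,
    which corresponds to ignoring reports without a fix commit.
    """
    links = {}
    for sha, message in commits:
        for match in BUG_REF.finditer(message):
            bug_id = int(match.group(1))
            if bug_id in bug_ids:
                links.setdefault(bug_id, []).append(sha)
    return links
```

The word boundary before "bug" keeps words such as "debug" from matching, and restricting matches to `bug_ids` avoids linking numbers that do not correspond to a selected report.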
Given the bug-fix commit and the bug-fix elements identified, we used the bug-introducing change identification algorithm proposed by Śliwerski, Zimmermann, and Zeller (the SZZ algorithm) to identify when the bug was introduced in the project. SZZ is currently the most used algorithm for automatically identifying fix-inducing commits (Costa et al., 2017). SZZ identifies the lines modified in a bug-fixing commit and then, for each of those lines, identifies the fix-inducing change immediately before it. As the original version of SZZ may produce false positives and false negatives, we used a combination of heuristics proposed by Kim et al. and by Williams and Spacco. Kim et al. mention two limitations of the original SZZ: (i) not all changes are fixes, i.e., even if a file change is defined as a bug-fix by developers, not all hunks in the change are bug-fixes; (ii) there is not enough information in bug tracking systems, and because of this an incorrect bug-inducing commit may be chosen. Using their approach, we can remove 38-51% of false positives and 14% of false negatives as compared to the original implementation of SZZ. SZZ outputs a list of commits related to the introduction of the bug in the software system. The results provided by SZZ are used to compute the distance between the refactored commit and the commit where the bug was introduced (see Phase 7). For analysis purposes, we considered only the newest commit reported by SZZ.
Previous research (Herzig, Just, and Zeller, 2013) mentions that bug report classifications are unreliable. Thus, we performed a manual classification of bug reports to identify which ones actually represent bugs in the Apache Tomcat, Apache Derby, and Apache Ant projects. This classification was performed in pairs by 14 researchers. Each person in a pair was responsible for manually classifying the same bug report as "bug" or "not bug". When there was a divergence in opinion, the pair discussed the report and agreed on its final classification. In the final analysis, we considered only bug reports that represent bugs in these projects. We manually validated 1,477 bug reports, of which 516 (35.00%) were classified as "bug" and 961 (65.00%) as "not bug".
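The core SZZ step and the "newest commit" selection can be illustrated with a small sketch. It assumes that blame information for the version just before the fix is already available as an in-memory mapping; the names `fix_inducing_commits`, `newest_commit`, `blame`, and `commit_time` are hypothetical, and the heuristics of Kim et al. and Williams and Spacco are deliberately not modeled here.

```python
def fix_inducing_commits(fixed_lines, blame):
    """Return the commits that last touched the lines changed by a fix.

    fixed_lines: iterable of (file, line_number) pairs modified by the fix.
    blame: dict mapping (file, line_number) to the commit that last changed
           that line before the fix (assumed to be precomputed).
    Lines without blame information are skipped.
    """
    return {blame[line] for line in fixed_lines if line in blame}


def newest_commit(candidates, commit_time):
    """Pick the most recent candidate, or None when there are no candidates.

    commit_time: dict mapping a commit id to a comparable timestamp.
    """
    return max(candidates, key=lambda sha: commit_time[sha], default=None)
```

Keeping the two steps separate makes it easy to swap in a stricter candidate filter without touching the selection of the newest commit.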
To answer our RQ, we compute the distance in number of changes between the refactored commit and the bug code commit. To do that, we take into account only commits where the buggy element was touched by any change.
To measure the bug proneness of refactored code elements, we computed the quartiles of the distance values and observed how far from or how close to a refactoring operation a bug appears, according to the distance classification. Figure 2 presents an example of the bug proneness of refactored code elements. In the figure, method X was refactored in commit 1 and had a bug in commit 10. From commit 1 to commit 10, method X was changed 2 times (in commits 3 and 5). Thus, we say that the distance between the refactored commit and the bug code commit (Distance(r, b)) is equal to 2. In this case, the bug is close to the refactored commit. In our RQ, we will also analyze the bug proneness of each refactoring tactic, namely root-canal and floss refactoring. In the end, we will compare whether root-canal refactoring is more bug-prone than floss refactoring.
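The distance computation in the example above can be sketched as follows. This is a minimal illustration, assuming the history of an element is available as an ordered list of the commits that touched it; the function names `distance` and `quartile_of` are hypothetical, and the quartile numbering (1 for the closest bugs, 4 for the farthest) is an assumption made for the sketch.

```python
from statistics import quantiles


def distance(element_history, refactoring_commit, bug_commit):
    """Count the changes to an element between a refactoring and a bug.

    element_history: commits that touched the element, oldest first.
    Both the refactoring commit and the bug commit must appear in it.
    """
    start = element_history.index(refactoring_commit)
    end = element_history.index(bug_commit)
    if end <= start:
        raise ValueError("the bug commit must come after the refactoring")
    return end - start - 1  # commits strictly between the two


def quartile_of(value, distances):
    """Return 1..4 depending on where `value` falls among all distances."""
    q1, q2, q3 = quantiles(distances, n=4)
    if value <= q1:
        return 1
    if value <= q2:
        return 2
    if value <= q3:
        return 3
    return 4
```

With the history of method X from the example, `distance([1, 3, 5, 10], 1, 10)` evaluates to 2, matching Distance(r, b) in the text.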
# | Artefact | Description |
---|---|---|
1 | Distances by Project | This file contains the complete list of all relationships between refactorings and bugs analyzed in this study. There is a file per software project. |
2 | Submitted Paper | Complete text submitted to ICSE 2018 |
For any questions or suggestions, please contact the authors of this work.
# | Name | Email |
---|---|---|
1 | Isabella Ferreira | iferreira@inf.puc-rio.br |
2 | Eduardo Fernandes | emfernandes@inf.puc-rio.br |
3 | Diego Cedrim | dcgrego@inf.puc-rio.br |
4 | Anderson Uchôa | auchoa@inf.puc-rio.br |
5 | Ana Carla Bibiano | abibiano@inf.puc-rio.br |
6 | Alessandro Garcia | afgarcia@inf.puc-rio.br |
7 | João Lucas Correia | jlmc@ic.ufal.br |
8 | Filipe Santos | filipebatista@ic.ufal.br |
9 | Gabriel Nunes | gabrielnunes@ic.ufal.br |
10 | Caio Barbosa | cbvs@ic.ufal.br |
11 | Baldoino Fonseca | baldoino@ic.ufal.br |
12 | Rafael de Mello | rmaiani@inf.puc-rio.br |