Browsing by Author "Schneider, K.A."
Now showing 1 - 10 of 10
An automatic framework for extracting and classifying near-miss clone genealogies (IEEE, 2011)
Saha, R.K.; Roy, C.K.; Schneider, K.A.
Extracting code clone genealogies across multiple versions of a program and classifying them according to their change patterns underlies the study of code clone evolution. While there are a few studies in the area, the approaches do not handle near-miss clones well and the associated tools are often computationally expensive. To address these limitations, we present a framework for automatically extracting both exact and near-miss clone genealogies across multiple versions of a program and for identifying their change patterns using a few key similarity factors. We have developed a prototype clone genealogy extractor, applied it to three open source projects including the Linux kernel, and evaluated its accuracy in terms of precision and recall. Our experience shows that the prototype is scalable, adaptable to different clone detection tools, and can automatically identify evolution patterns of both exact and near-miss clones by constructing their genealogies.

Bug introducing changes: A case study with Android (IEEE, 2012)
Asaduzzaman, M.; Bullock, M.C.; Roy, C.K.; Schneider, K.A.
Changes, a rather inevitable part of software development, can have maintenance implications if they introduce bugs into the system. By isolating and characterizing these bug-introducing changes, it is possible to uncover potentially risky source code entities or issues that produce bugs. In this paper, we mine the bug-introducing changes in the Android platform by mapping bug reports to the changes that introduced the bugs. We then use the change information to look for both potentially problematic parts and development dynamics that can have maintenance implications.
We believe that the results of our study can help better manage Android software development.

Connectivity of co-changed method groups: a case study on open source systems (IBM, 2012)
Mondal, M.; Roy, C.K.; Schneider, K.A.
Software maintenance is an important and challenging phase of the software development life cycle because changes made during this phase without proper awareness of dependencies among program modules can introduce faults into the software system. There is also a common intuition that cloned code introduces additional software maintenance challenges and difficulties. To support maintenance activities, we consider two issues: (i) identifying coding characteristics that cause extensive source code modifications, and (ii) providing guidance for minimizing source code modifications. Focusing on these two issues, we investigated the effects of method sharing (among different functionalities) on method co-changeability and source code modifications. We proposed and empirically evaluated two metrics: (i) COMS (Co-changeability of Methods) and (ii) CCMS (Connectivity of Co-changed Method Groups). COMS measures the extent to which a method co-changes with other methods. CCMS quantifies the extent to which a particular functionality in a software system is connected with other functionalities in that system. In other words, CCMS measures the intensity of method sharing among different functionalities or tasks (defined later). We investigated the impact of CCMS on COMS and on source code modifications.
Our comprehensive study of hundreds of revisions of six open source subject systems covering three programming languages (Java, C and C#) suggests that (i) higher CCMS causes higher COMS as well as increased source code modifications, (ii) COMS in the cloned regions of a software system is negligible compared to COMS in the non-cloned regions, and (iii) in spite of some issues (described later), cloning can be a possible way to reduce CCMS.

Dispersion of changes in cloned and non-cloned code (IEEE, 2012)
Mondal, M.; Roy, C.K.; Schneider, K.A.
Currently, the impacts of clones on software maintenance activities are being investigated by different researchers in different ways. Comparative stability analysis of the cloned and non-cloned regions of a subject system is a well-known way of measuring these impacts, where the hypothesis is that the more stable a region is, the less harmful it is for maintenance. However, each of the existing stability measurement methods fails to address one important characteristic of the changes happening in the cloned and non-cloned regions of software systems: their dispersion. Change dispersion of a particular region quantifies the extent to which the changes are scattered over that region. The intuition is that more dispersed changes require more effort to be spent in the maintenance phase. Measuring dispersion requires the extraction of method genealogies. In this paper, we have measured the dispersion of changes in the cloned and non-cloned regions of several subject systems using a concurrent and robust framework for method genealogy extraction. We implemented the framework on the Actor Architecture platform, which facilitates coarse-grained parallelism with asynchronous message-passing capabilities.
Our experimental results on 12 open-source subject systems written in three different programming languages (Java, C and C#), using two clone detection tools, suggest that changes in cloned regions are more dispersed than changes in non-cloned regions. Also, Type-3 clones exhibit more dispersion than Type-1 and Type-2 clones. The subject systems written in Java and C show higher dispersion as well as increased maintenance effort compared to the subject systems written in C#.

An Empirical Study of the Impacts of Clones in Software Maintenance (IEEE, 2011)
Mondal, M.; Rahman, M.S.; Saha, R.K.; Krinke, J.; Schneider, K.A.
The impact of clones on software maintenance has long been debated: are clones beneficial or not? Some researchers argue that clones lead to additional changes during the maintenance phase and thus increase the overall maintenance effort. Moreover, they note that inconsistent changes to clones may introduce faults during evolution. On the other hand, other researchers argue that cloned code exhibits more stability than non-cloned code. Such contradictory outcomes may be a consequence of using different methodologies, different clone detection tools, different impact assessment metrics, and different subject systems. To understand the conflicting results from these studies, we plan to conduct a comprehensive empirical study using a common framework incorporating nine existing methods that yielded mostly contradictory findings. Our research strategy involves implementing each of these methods using four clone detection tools and evaluating the methods on more than fifteen subject systems of different languages and of a diverse nature.
We believe that our study will help eliminate tool and study biases and resolve conflicts regarding the impacts of clones on software maintenance.

Evaluating Code Clone Genealogies at Release Level: An Empirical Study (IEEE, 2010)
Saha, R.K.; Asaduzzaman, M.; Zibran, M.F.; Roy, C.K.; Schneider, K.A.
Code clone genealogies show how clone groups evolve as the associated software system evolves, and thus could provide important insights into the maintenance implications of clones. In this paper, we provide an in-depth empirical study evaluating clone genealogies in evolving open source systems at the release level. We develop a clone genealogy extractor, examine 17 open source C, Java, C++ and C# systems of diverse varieties, and study different dimensions of how clone groups evolve with the evolution of the software systems. Our study shows that the majority of the clone groups in the clone genealogies either propagate without any syntactic changes or change consistently in subsequent releases, and that many of the genealogies remain alive during the evolution. These findings seem to be consistent with the findings of a previous study that clones may not be as detrimental to software maintenance as is often believed (at least by many of us), and that instead of aggressively refactoring clones, we should possibly focus on tracking and managing clones during the evolution of software systems.

Improving the detection accuracy of evolutionary coupling (IEEE, 2013)
Mondal, M.; Roy, C.K.; Schneider, K.A.
If two or more program entities (e.g., files, classes, methods) co-change frequently during software evolution, these entities are said to have evolutionary coupling. The entities that frequently co-change (i.e., exhibit evolutionary coupling) are likely to have logical coupling (or dependencies) among them.
Association rules and two related measurements, Support and Confidence, have been used to predict whether two or more co-changing entities are logically coupled. In this paper, we propose and investigate a new measurement, Significance, that has the potential to improve the detection accuracy of association rule mining techniques. Our preliminary investigation of four open-source subject systems indicates that the proposed measurement is capable of extracting coupling relationships even from infrequently co-changed entity sets that might seem insignificant when only Support and Confidence are considered. Used together with Support and Confidence, Significance has the potential to predict logical coupling with higher precision and recall.

Insight into a method co-change pattern to identify highly coupled methods: An empirical study (IEEE, 2013)
Mondal, M.; Roy, C.K.; Schneider, K.A.
In this paper, we describe an empirical study of a unique method co-change pattern that has the potential to pinpoint design deficiencies in a software system. We automatically identify this pattern by inspecting the method co-change history using reasonable constraints on method association rules. We also investigate the effect of code clones on the method co-changes identified according to the pattern, because there is a common intuition that clone fragments from the same clone class often require corresponding changes to ensure they remain consistent with each other. According to our in-depth investigation of hundreds of revisions of seven open-source software systems, considering three types of clones (Type 1, Type 2, Type 3), our identified pattern helps us detect methods that are logically coupled with multiple other methods and that exhibit a significantly higher modification frequency than other methods. Reflecting the semantics of the pattern, we call the methods it detects MMCGs (Methods appearing in Multiple Commit Groups).
MMCGs can be considered candidates for restructuring in order to minimize coupling and reduce the change-proneness of a software system. According to our observations, code clones have a significant effect on both method co-changes and MMCGs. We believe that clone refactoring can help us minimize evolutionary coupling among methods.

On the Effectiveness of Simhash for Detecting Near-Miss Clones in Large Scale Software Systems (IEEE, 2011)
Uddin, M.S.; Roy, C.K.; Schneider, K.A.; Hindle, A.
Clone detection techniques essentially cluster textually, syntactically and/or semantically similar code fragments in or across software systems. For large datasets, similarity identification is costly in terms of both time and memory, especially when detecting near-miss clones, where lines could be modified, added and/or deleted in the copied fragments. The capability and effectiveness of a clone detection tool mostly depend on the code similarity measurement technique it uses. A variety of similarity measurement approaches have been used for clone detection, including fingerprint-based approaches, which have had varying degrees of success notwithstanding some limitations. In this paper, we investigate the effectiveness of simhash, a state-of-the-art fingerprint-based data similarity measurement technique, for detecting both exact and near-miss clones in large-scale software systems. Our experimental data show that simhash is indeed effective in identifying various types of clones in a software system despite wide variations in experimental circumstances.
The approach is also suitable as a core capability for building other tools, such as tools for incremental clone detection, code searching, and clone management.

SimCad: An extensible and faster clone detection tool for large scale software systems (IEEE, 2013)
Uddin, M.S.; Roy, C.K.; Schneider, K.A.
Code cloning is an inevitable phenomenon in the evolution of software systems. To reduce the harmful effects of clones in software evolution, they need to be identified both correctly and efficiently. There might be various types of clones in a software system. Earlier research shows that detecting near-miss clones in large datasets is costly in terms of time and memory, and few of the clone detection tools available in practice have been found effective in that regard. In this paper, we present SimCad, a standalone clone detection tool. It is based on a highly scalable and faster clone detection algorithm designed to detect both exact and near-miss clones in large-scale software systems. One notable aspect of SimCad is that its clone detection function is made more portable by packaging it into a library called SimLib. SimLib can thus be used as an off-the-shelf clone detection library that can easily be integrated into other applications that work with detected clones. For example, a standalone tool or an Integrated Development Environment (IDE) plugin can use SimLib for real-time clone detection while providing its own services, such as clone visualization and/or clone management. We hope that both researchers and developers will benefit from using these tools in detecting and managing clones in software.
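The simhash technique evaluated in the Uddin et al. (2011) entry above can be illustrated with a small sketch of the general fingerprinting scheme. For each bit position, every token's hash casts a +1 or -1 vote. The sign of each bit's total sets the corresponding fingerprint bit. The Hamming distance between two fingerprints then serves as a rough dissimilarity measure. This is a hedged illustration, not the paper's, SimCad's, or SimLib's implementation. The following are all assumptions: the whitespace tokenization, the equal token weights, the 64-bit BLAKE2b token hash, and the `max_distance` threshold in the hypothetical `is_candidate_clone` helper.

```python
import hashlib

BITS = 64  # fingerprint width; 64 bits is an assumed, commonly used size


def token_hash(token: str) -> int:
    """Map a token to a 64-bit integer via an 8-byte BLAKE2b digest."""
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=BITS // 8).digest()
    return int.from_bytes(digest, "big")


def simhash(fragment: str) -> int:
    """Return a simhash fingerprint of a code fragment.

    Assumption: naive whitespace tokenization, every token weighted equally.
    """
    votes = [0] * BITS
    for token in fragment.split():
        h = token_hash(token)
        for i in range(BITS):
            # Each token votes +1 for bit i if that bit of its hash is set, else -1.
            votes[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, v in enumerate(votes):
        if v > 0:  # bit i is set only when the +1 votes outnumber the -1 votes
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_candidate_clone(f1: str, f2: str, max_distance: int = 3) -> bool:
    """Flag two fragments as a candidate clone pair if their fingerprints are close.

    max_distance=3 is a hypothetical threshold chosen for illustration only.
    """
    return hamming_distance(simhash(f1), simhash(f2)) <= max_distance


if __name__ == "__main__":
    original = "int total = 0 ; for ( i = 0 ; i < n ; i ++ ) total += a [ i ] ;"
    renamed = "int sum = 0 ; for ( i = 0 ; i < n ; i ++ ) sum += a [ i ] ;"
    print(hamming_distance(simhash(original), simhash(original)))  # 0
    print(hamming_distance(simhash(original), simhash(renamed)))
```

A fingerprint bit flips only when enough token votes change. As a result, fragments that share most of their tokens tend to end up only a few bits apart, which is what lets a Hamming-distance threshold stand in for near-miss similarity. A practical tool would likely normalize the code first (for example, pretty-printing or renaming identifiers), and this sketch omits that step.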