On the Effectiveness of Simhash for Detecting Near-Miss Clones in Large Scale Software Systems

dc.contributor.author: Uddin, M.S.
dc.contributor.author: Roy, C.K.
dc.contributor.author: Schneider, K.A.
dc.contributor.author: Hindle, A.
dc.date.accessioned: 2015-07-29T19:39:12Z
dc.date.available: 2015-07-29T19:39:12Z
dc.date.issued: 2011
dc.description.abstract: Clone detection techniques essentially cluster textually, syntactically and/or semantically similar code fragments in or across software systems. For large datasets, similarity identification is costly in terms of both time and memory, especially when detecting near-miss clones, where lines may be modified, added and/or deleted in the copied fragments. The capability and effectiveness of a clone detection tool depend mostly on the code similarity measurement technique it uses. A variety of similarity measurement approaches have been used for clone detection, including fingerprint-based approaches, which have had varying degrees of success notwithstanding some limitations. In this paper, we investigate the effectiveness of simhash, a state-of-the-art fingerprint-based data similarity measurement technique, for detecting both exact and near-miss clones in large-scale software systems. Our experimental data show that simhash is indeed effective in identifying various types of clones in a software system despite wide variations in experimental circumstances. The approach is also suitable as a core capability for building other tools, such as tools for incremental clone detection, code searching, and clone management. (en_US)
dc.description.refereed: Yes (en_US)
dc.identifier.doi: 10.1109/WCRE.2011.12
dc.identifier.uri: http://hdl.handle.net/1880/50714
dc.identifier.uri: https://doi.org/10.11575/PRISM/46183
dc.publisher: IEEE (en_US)
dc.publisher.url: http://dx.doi.org/10.1109/WCRE.2011.12 (en_US)
dc.title: On the Effectiveness of Simhash for Detecting Near-Miss Clones in Large Scale Software Systems (en_US)
dc.type: unknown