PRISM :: Browsing by Author "Walker, Robert James"


Now showing 1 - 5 of 5

Open Access
API Usage Templates via Structural Generalization
(2023-05-03) Mahmoud, May Abdelrheem Sayed; Walker, Robert James; Denzinger, Jorg; Maurer, Frank O.; Aycock, John Daniel; Hindle, Abram
Application programming interfaces (APIs) are key in software development, but determining how to use one can be challenging. Developers often refer to a small set of API usage examples, analyzing the information in them to understand the API usage and adapting them to their own context. Generalization of these examples would aid in understanding their commonalities and differences, thereby reducing information overload. Work on API usage mining seeks recurrent information in usage examples. Some approaches seek frequent subsequences of method calls (e.g., Monperrus et al., 2010; Wasylkowski and Zeller, 2011; Fowkes and Sutton, 2016). Others use graph-based representations, applying frequent subgraph mining techniques (e.g., Nguyen et al., 2009; Amann et al., 2019). However, all such approaches focus on frequently occurring commonalities; this results in either excluding variations in the usage of the API elements in similar contexts or subdividing such variations across several patterns, forcing developers to manually determine variability in the API elements’ usage. Approaches that aim to select the best examples (e.g., Moreno et al., 2013) ignore variation. Approaches that generate examples (e.g., Barnaby et al., 2020) focus on producing maximally succinct examples rather than representing whatever commonality is present. In this thesis, we propose ASGard (for API usage templates via Structural Generalization), a novel approach that automatically generates API usage templates from usage examples based on the generalization of the examples’ syntactic structure and some semantic structure. API usage templates are a code-based representation generalizing similar API usage contexts, showing the commonality of the usage examples, where the varying aspects of the input examples are replaced with structural variables intended as placeholders. ASGard takes as input a set of API usage examples and a simple indication of the API of interest. We proceed in two phases. 
(1) For the sake of improved performance, we cluster the examples based on the similarity of the API usage. (2) We then use an approximation of the formalism of E-generalization (Burghardt, 2005) to infer API usage templates from the examples. We start by matching the nodes of the ASTs of the examples, seeking to preserve common elements in the nodes while abstracting away the differences. The generalization proceeds iteratively, permitting increasing abstraction of the template as long as no API usage information is eliminated. The final templates are representations of the generalized ASTs. We perform a manual evaluation of the output templates from ASGard, which generalize a set of 231 usage examples across 5 different APIs, finding that our approach provides a mean 62% coverage of the API usage elements found in the usage examples as opposed to 48% coverage by the best alternative. Furthermore, we automatically evaluate the templates from our approach and the code representation of the patterns generated from PAM and MUDetect (two prominent API usage mining approaches), using a total of 1,954 API usage examples across 59 different APIs. We measure two aspects of the quality of the resulting templates: (1) how complete each template is relative to each concrete example; and (2) how well each template set compresses the set of API usage examples. We find that, compared to the output from PAM and MUDetect, ASGard provides templates that have superior completeness (51% vs. 12% for PAM and 25% for MUDetect) and far superior compression (81% vs. 54% for PAM and 26% for MUDetect). We perform a user study on ASGard with 12 participants to compare the use of these templates against the use of MUDetect in solving programming tasks. We find that participants solved the programming tasks in significantly less time with ASGard: 48% for a coding task and 31% for a debugging task. 
Participants expressed a preference for using ASGard templates and perceived that the approach helped them better understand the API usage; they were more willing to use the approach again than the best alternative.
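To make the idea of replacing varying parts of examples with placeholder variables concrete, here is a minimal sketch of plain syntactic anti-unification over two small trees. It is an illustration under stated assumptions, not ASGard itself: the abstract describes an approximation of E-generalization plus clustering, neither of which is modelled here, and the function name `anti_unify`, the tuple encoding of trees, and the `$0`-style variable names are all hypothetical.

```python
from itertools import count


def anti_unify(a, b, table=None, fresh=None):
    """Generalize two trees into one template (hypothetical helper).

    A tree is either a string leaf or a tuple (label, child, ...).
    Parts that are equal are kept; parts that differ are replaced by a
    variable "$0", "$1", ... The same pair of differing subtrees is
    always mapped to the same variable.
    """
    if table is None:
        table = {}
    if fresh is None:
        fresh = count()
    if a == b:
        return a
    same_shape = (
        isinstance(a, tuple) and isinstance(b, tuple)
        and a and b and a[0] == b[0] and len(a) == len(b)
    )
    if same_shape:
        # Same node label and arity: keep the node, generalize children.
        children = tuple(
            anti_unify(x, y, table, fresh) for x, y in zip(a[1:], b[1:])
        )
        return (a[0],) + children
    # The subtrees differ: abstract them into a placeholder variable.
    if (a, b) not in table:
        table[(a, b)] = f"${next(fresh)}"
    return table[(a, b)]


# Two made-up usage examples that differ only in one argument name.
example_1 = ("call", "reader.readLine", ("arg", "buffer"))
example_2 = ("call", "reader.readLine", ("arg", "line"))
template = anti_unify(example_1, example_2)
```

In this sketch the shared call is preserved and only the differing argument becomes a variable, which mirrors the stated goal of keeping common elements while abstracting away differences.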
Open Access
Dependency Detection and Migration in Software Systems and Libraries
(2014-10-01) Cossette, Bradley Edward; Walker, Robert James
Software systems must change over time or risk becoming obsolete, but direct changes can impact dependent functionality. Software developers perform change impact analysis and redress using automated tools to identify dependency relationships affected by change, and to recommend adaptations. However, these tools are restricted in their application: dependency analysis tools are language-specific but many systems are implemented using multiple languages, while recommenders are poor at identifying adaptations for change impacts from external libraries. This work proposes using semi-automated approaches for supporting change impact analysis and redress, in which the developer is relied upon to provide key details of dependency syntax or examples of correct adaptations. From such information, tool support can be generated that is appropriately configured for the developer’s software, and that can be further refined by the developer through additional details or examples until it is sufficiently accurate for their needs. Four studies validate this thesis. The first involves the semi-automated DSKETCH tool for polylingual dependency analysis, which requires developers to detail only key syntax using a simplified notation; it then uses this syntax to identify where potential dependencies exist. Developers were able to successfully configure and use DSKETCH on polylingual systems with only a short period of training. The second study examines how software libraries and their application programming interfaces (APIs) evolve over successive versions. The study found that existing recommenders are generally unsuccessful, and that most observed changes could not be automatically migrated. The third study introduces the UMAMI tool that detects correspondences between the syntactic structure of the old API functionality and possible replacements in the new library version to recommend adaptations for API changes. 
The fourth study examines how change recommenders could be hybridized in a flexible fashion, by relying on developer-provided examples of correct redress of API changes to tailor recommendations to a library’s particular characteristics.
Open Access
Efficient Extension of a Software Analysis Framework to Additional Languages
(2023-07) Soltanpour, Shahryar; Walker, Robert James; Aycock, John; Oehlberg, Lora
In the current era of software development, multi-language codebases are common, and change propagation in these codebases is challenging. The existing change propagation tool ModCP is a solution that can assist software developers with propagating changes across several languages, but only one at a time. However, ModCP has some architectural problems that make support for new languages hard to develop and to maintain over the long term. In addition, supporting change propagation across code snippets consisting of a programming language embedded inside a different programming language would be a useful feature for ModCP. To achieve this, we must detect the embedded code snippets in the code being analyzed by ModCP. In this thesis, we develop a new, more efficient architecture for ModCP, involving a single abstract model that each language extends for its usage, resulting in complete isolation between language results. We compare our approach with a baseline version that uses the same, concrete model for all languages and adds new models when necessary. Our approach reduces code complexity and development time and makes code more compatible with best practices of development compared to the baseline. Moreover, we design a system for ModCP to guess and validate the programming language used in code snippets, based on the initial detection of keywords, as input to execute change propagation for code in multiple languages embedded inside each other. We compare our keyword detection approach with existing deep learning and brute force approaches and show that our method is the best choice if accuracy, performance, and scalability are needed simultaneously.
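As a rough illustration of guessing a snippet's language from keywords, the sketch below counts keyword hits per language and returns the best-scoring candidate. Everything in it is an assumption made for illustration: the keyword sets, the name `guess_language`, and the scoring rule are not taken from ModCP, and the validation step mentioned in the abstract is not modelled.

```python
import re

# Hypothetical, deliberately tiny keyword sets; a real tool would need more.
KEYWORDS = {
    "python": {"def", "elif", "import", "lambda", "None"},
    "java": {"public", "class", "void", "extends", "implements"},
    "sql": {"select", "from", "where", "insert", "join"},
}

# Languages whose keywords are matched case-insensitively.
CASE_INSENSITIVE = {"sql"}


def guess_language(snippet):
    """Return the language with the most keyword hits, or None if no hits."""
    tokens = set(re.findall(r"[A-Za-z_]+", snippet))
    lowered = {token.lower() for token in tokens}
    scores = {}
    for language, keywords in KEYWORDS.items():
        pool = lowered if language in CASE_INSENSITIVE else tokens
        scores[language] = len(keywords & pool)
    # On a tie, max() keeps the first language in KEYWORDS order.
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


embedded = "SELECT name FROM users WHERE id = 1"
detected = guess_language(embedded)
```

A guess of this kind is cheap to compute, which is consistent with the abstract's emphasis on performance and scalability; any guess would still need a separate validation pass before being trusted.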
Open Access
Graph Generalization for Software Engineering
(2024-01-23) Kianifar, Mohammad Reza; Walker, Robert James; Denzinger, Jorg; Maleki, Farhad
Graph generalization is a powerful concept with a wide range of potential applications, not only within software engineering but also across various domains. While established algorithms exist for generalizing simple graphs, such as trees, the development of practical methods for applying generalization techniques to more complex graphs remains a critical challenge. In this thesis, we introduce a novel formal model and algorithm, referred to as GGA (Graph Generalization Algorithm), dedicated to generalizing labelled directed graphs. We evaluate GGA by focusing on key aspects including its information preservation relative to its input graphs, its scalability in execution, and its use in three applications, each utilizing a different kind of graph: (1) abstract syntax trees (ASTs); (2) class graphs; and (3) call graphs. In the first case, GGA is compared against ASGard and Diff-Sitter, two existing approaches for tree-based generalization; in the latter two cases, GGA is compared against Diff and CodeMetrics. Our findings reveal GGA's superiority over the alternatives. In the AST application, GGA outperforms ASGard by an average of 5-18% on metrics related to information preservation. GGA's results for the AST differencing context also matched 100% with Diff-Sitter by applying symmetrical filtering to skip strictness configuration. In the context of class graphs, GGA achieves 77.1% in precision@5, while in the case of call graphs, it exhibits 60% in precision@5. We also performed an extensive performance test for the first two applications, and the results show that GGA's execution time scales linearly with respect to the product of vertex count and edge count. Our research not only introduces a novel algorithm for graph generalization but also demonstrates its ability to preserve information in diverse applications while performing efficiently. 
These results signify the potential of GGA to advance the field of graph generalization and its practical applicability across various domains, specifically in software engineering.
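For readers unfamiliar with the precision@5 figures quoted above, the following sketch computes precision@k in its usual sense: the fraction of the top k ranked results that are relevant. The function name and the sample data are hypothetical, and the abstract does not say how relevance was judged in the thesis.

```python
def precision_at_k(ranked, relevant, k=5):
    """Fraction of the first k ranked items that appear in `relevant`."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked[:k]
    hits = sum(1 for item in top_k if item in relevant)
    # Divide by k even if fewer than k items were returned.
    return hits / k


# Made-up ranking of six candidates, four of which are relevant.
ranking = ["a", "b", "c", "d", "e", "f"]
relevant_items = {"a", "c", "e", "f"}
score = precision_at_k(ranking, relevant_items, k=5)
```

Here three of the top five candidates are relevant, so the score is 0.6; the relevant item ranked sixth does not count.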
Open Access
Towards Usable API Documentation
(2023-07) Khan, Junaed Younus; Uddin, Gias; Barcomb, Ann; Walker, Robert James
The learning and usage of an API is supported by documentation. Like source code, API documentation is itself a software product. Several research results show that bad design in API documentation can make the reuse of API features difficult. Indeed, similar to code smells, poorly designed API documentation can also exhibit 'smells'. Such documentation smells can be described as bad documentation styles that do not necessarily produce incorrect documentation but make the documentation difficult to understand and use. This thesis aims to enhance API documentation usability by addressing such documentation smells in three phases. In the first phase, we developed a catalog of five API documentation smells by consulting literature on API documentation issues and online developer discussion. We validated their presence in the real world by creating a benchmark of 1K official Java API documentation units and conducting a survey of 21 developers. The developers confirmed that these smells hinder their productivity and called for automatic detection and fixing. In the second phase, we developed machine-learning models to detect the smells using the 1K benchmark; however, they performed poorly when evaluated on larger and more diverse documentation sources. We explored more advanced models and employed re-training and hyperparameter tuning to further improve the performance. Our best-performing model, RoBERTa, achieved F1-scores of 0.71-0.93 in detecting different smells. In the third phase, we first focused on evaluating the feasibility and impact of fixing various smells in the eyes of practitioners. Through a second survey of 30 practitioners, we found that fixing the lazy smell was perceived as the most feasible and impactful. However, there was no universal consensus on whether and how other smells can/should be fixed. 
Finally, we proposed a two-stage pipeline for fixing lazy documentation, involving additional textual description and documentation-specific code example generation. Our approach utilized a large language model, GPT-3, to generate enhanced documentation based on non-lazy examples and to produce code examples. The generated code examples were refined iteratively until they were error-free. Our technique demonstrated a high success rate with a significant number of lazy documentation instances being fixed and error-free code examples being generated.
