Software Evolution and Maintenance: Reengineering (PDF)
Tripathy & Naik
Summary
This document provides a detailed overview of the software reengineering process, including the phase reengineering model, the techniques used for code reverse engineering, and the relationship between decompilation and reverse engineering.
Full Transcript
Software Evolution and Maintenance: A Practitioner's Approach. Chapter 4: Reengineering. © Tripathy & Naik

4.4.2 Source Code Reengineering Reference Model

The framework, depicted in Figure 4.6, consists of four kinds of elements:
– function,
– documentation,
– repository database, and
– metrication.

Figure 4.6 Source code reengineering reference model (© IEEE, 1990)

4.4.3 Phase Reengineering Model

The model comprises five phases: analysis and planning, renovation, target system testing, redocumentation, and acceptance testing and system transition, as depicted in Figure 4.8.

Figure 4.8 Software reengineering process phases (© IEEE, 1992)

Analysis and planning: Analysis addresses three technical issues and one economic issue.
– The first technical issue concerns the present state of the system to be reengineered and understanding its properties.
– The second technical issue concerns the identification of the need for the system to be reengineered.
– The third technical issue concerns the specification of the characteristics of the new system to be produced.

Renovation: An operational system is modified into the target system in the renovation phase. Two main aspects of a system are considered in this phase:
(i) Representation of the system. This usually refers to the source code, but it may include the design model and the requirements specification of the existing system.
(ii) Representation of external data. This refers to the database and/or data files used by the system. Often the external data are reengineered as well; this is known as data reengineering.
An operational system can be renovated in many ways, depending upon the objectives of the project, the approach followed, and the starting representation of the system. The starting representation can be source code, design, or requirements. Table 4.1, discussed earlier, recommends several alternatives for renovating a system.

Renovation example: Consider a project in which the objective is to re-code the system from Fortran to C. Figure 4.9 shows the three possible replacement strategies. First, program migration is used to perform source-to-source translation. Second, a high-level design is constructed from the operational source code, say, in Fortran, and the resulting design is re-implemented in the target language, C in this case. Finally, a mix of compilation and decompilation is used to obtain the system implementation in C.

Figure 4.9 Replacement strategies for recoding
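To make the first strategy concrete, here is a deliberately minimal source-to-source translation sketch. It assumes a tiny, idealized Fortran subset (simple DO loops and assignments); the regex rules and the translate helper are illustrative inventions, not the migration method described in the book.

```python
import re

# A toy sketch of program migration (source-to-source translation),
# assuming a tiny Fortran subset: DO loops over an integer variable and
# simple assignments. Real migration tools parse the full language;
# this regex-based toy only illustrates the idea.

DO_RE = re.compile(r"^\s*DO\s+(\w+)\s*=\s*(\w+)\s*,\s*(\w+)\s*$", re.IGNORECASE)
END_RE = re.compile(r"^\s*END\s*DO\s*$", re.IGNORECASE)

def translate(fortran_lines):
    c_lines = []
    for line in fortran_lines:
        m = DO_RE.match(line)
        if m:
            var, lo, hi = m.groups()
            c_lines.append(f"for (int {var} = {lo}; {var} <= {hi}; {var}++) {{")
        elif END_RE.match(line):
            c_lines.append("}")
        else:
            # Assignments survive unchanged, with a trailing semicolon.
            c_lines.append(line.strip() + ";")
    return c_lines

source = ["DO i = 1, 10", "total = total + i", "END DO"]
print("\n".join(translate(source)))
# for (int i = 1; i <= 10; i++) {
# total = total + i;
# }
```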
Target system testing: In this phase, system testing detects faults in the target system, that is, faults that might have been introduced during reengineering. Fault detection is performed by applying the target system test plan to the target system. The same testing strategies, techniques, methods, and tools that are used in software development are used during reengineering. For example, apply the existing system-level test cases to both the existing and the new system. Assuming that the two systems have identical requirements, the test results from both scenarios must be the same.

Redocumentation: In the redocumentation phase, documentation is rewritten, updated, and/or replaced to reflect the target system. Documents are revised according to the redocumentation plan. The two major tasks within this phase are: (i) analyze the new source code, and (ii) create the documentation.

Acceptance and system transition: In this final phase, the reengineered system is evaluated by performing acceptance testing. Acceptance criteria should already have been established at the beginning of the project. Should the reengineered system pass those tests, preparation begins to transition to the new system. On the other hand, if the reengineered system fails some tests, the faults must be fixed; in some cases, those faults are fixed after the target system is deployed. Upon completion of the acceptance tests, the reengineered system is made operational, and the old system is put out of service. System transition is guided by the previously developed transition plan.

4.6 Techniques Used for Reverse Engineering

The well-known analysis techniques that facilitate reverse engineering are:
1. Lexical analysis.
2. Syntactic analysis.
3. Control flow analysis.
4. Data flow analysis.
5. Program slicing.
6. Visualization.
7. Program metrics.

4.6.1 Lexical Analysis

Lexical analysis is the process of decomposing the sequence of characters in the source code into its constituent lexical units (tokens). A program performing lexical analysis is called a lexical analyzer, and it is part of a programming language's compiler. Typically it uses rules describing lexical program structures, expressed in a mathematical notation called regular expressions. Modern lexical analyzers are built automatically using tools called lexical analyzer generators, such as lex and flex (fast lexical analyzer).

4.6.2 Syntax Analysis

Syntactic analysis is performed by a parser. Like lexical analyzers, parsers can be constructed automatically from a description of the grammatical properties of a programming language. YACC is one of the most commonly used parsing tools. Two types of representations are used to hold the results of syntactic analysis: the parse tree and the abstract syntax tree. A parse tree contains details unrelated to actual program meaning, such as punctuation, whose role is to direct the parsing process; grouping parentheses, for example, are implicit in the tree structure and can be pruned from the parse tree. Removal of those extraneous details produces a structure called an abstract syntax tree (AST). An AST contains just those details that relate to the actual meaning of a program. Many tools have been built on the AST concept: to understand a program, an analyst makes a query in terms of node types, and the query is interpreted by a tree walker to deliver the requested information.
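As a small illustration of AST-based querying, the sketch below uses Python's standard ast module as the parser and a tree walk as the "tree walker". The slides do not prescribe a particular tool, so this is only one possible realization; the sample source code is invented.

```python
import ast

# A minimal sketch of AST-based analysis: parse source code, then answer
# a query phrased in terms of node types (here: "list every function
# definition and the names it calls").

source = """
def area(r):
    return 3.14159 * square(r)

def square(x):
    return x * x
"""

tree = ast.parse(source)

for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef):
        calls = [n.func.id for n in ast.walk(node)
                 if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)]
        print(f"{node.name} defined at line {node.lineno}, calls: {calls}")
# area defined at line 2, calls: ['square']
# square defined at line 5, calls: []
```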
4.6.3 Control Flow Analysis

After the structure of a program has been determined, control flow analysis (CFA) can be performed on it. The two kinds of control flow analysis are:
– Intraprocedural: It shows the order in which statements are executed within a subprogram.
– Interprocedural: It shows the calling relationships among program units.

Intraprocedural analysis: The idea of basic blocks is central to constructing a control flow graph (CFG). A basic block is a maximal sequence of program statements such that execution enters at the top of the block and leaves only at the bottom, via a conditional or an unconditional branch statement. A basic block is represented by one node in the CFG, and an arc indicates possible flow of control from one node to another. A CFG can be constructed directly from an AST by walking the tree to determine basic blocks and then connecting the blocks with control flow arcs.

Interprocedural analysis: Interprocedural analysis is performed by constructing a call graph. Calling relationships between the subroutines in a program are represented as a call graph, which is basically a directed graph: a procedure in the source code is represented by a node in the graph, and an edge from node f to node g indicates that procedure f calls procedure g. Call graphs can be static or dynamic. A dynamic call graph is an execution trace of the program; it is exact, but it describes only one run of the program. A static call graph, on the other hand, represents every possible run of the program.

4.6.4 Data Flow Analysis

Data flow analysis (DFA) concerns how the values of defined variables flow through and are used in a program. CFA can detect the possibility of loops, whereas DFA can determine data flow anomalies. One example of a data flow anomaly is a reference to an undefined variable. Another example is a variable that is successively defined without being referenced in between. Data flow analysis enables the identification of code that can never execute, variables that might not be defined before they are used, and statements that might have to be altered when a bug is fixed. Control flow analysis cannot answer the question: which program statements are likely to be impacted by the execution of a given assignment statement? To answer this kind of question, an understanding of definitions (defs) of variables and references (uses) of variables is required. If a variable appears on the left-hand side of an assignment statement, then the variable is said to be defined. If a variable appears on the right-hand side of an assignment statement, then it is said to be referenced in that statement.
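The following is a minimal def/use bookkeeping sketch that flags the two anomalies just described. It assumes a toy statement representation (a pair of defined variable and used variables) invented here for illustration; real DFA runs over the CFG and must handle branches and loops.

```python
# A def/use sketch over straight-line code. Each statement is a toy pair
# (defined_var, used_vars). It flags two classic data flow anomalies:
# a use of a variable with no prior definition, and a redefinition of a
# variable whose previous value was never referenced.

statements = [
    ("x", []),           # x = 1
    ("y", ["x", "z"]),   # y = x + z   (z is used but never defined)
    ("x", []),           # x = 5       (previous x was referenced, OK)
    ("y", ["x"]),        # y = x       (previous y was never referenced)
]

defined = set()
unused_def = {}  # variable -> index of its not-yet-referenced definition

for i, (d, uses) in enumerate(statements):
    for u in uses:
        if u not in defined:
            print(f"stmt {i}: use of undefined variable '{u}'")
        unused_def.pop(u, None)  # that definition has now been referenced
    if d in unused_def:
        print(f"stmt {i}: '{d}' redefined; definition at stmt {unused_def[d]} never referenced")
    defined.add(d)
    unused_def[d] = i
# stmt 1: use of undefined variable 'z'
# stmt 3: 'y' redefined; definition at stmt 1 never referenced
```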
4.6.5 Program Slicing

Originally introduced by Mark Weiser, program slicing has served as the basis of numerous tools. In Weiser's definition, a slicing criterion of a program P is a pair <p, v>, where p is a program point and v is a subset of the variables in P. A program slice is a portion of a program with an execution behavior identical to that of the initial program with respect to a given criterion, but it may have a reduced size.

A backward slice with respect to a variable v and a given point p comprises all instructions and predicates which affect the value of v at point p. Backward slices answer the question "What program components might affect a selected computation?" The dual of backward slicing is forward slicing. With respect to a variable v and a point p in a program, a forward slice comprises all the instructions and predicates which may depend on the value of v at p. Forward slicing answers the question "What program components might be affected by a selected computation?"

(Figure-only slides: example of a backward slice; example of a forward slice.)

4.6.6 Visualization

Software visualization is a useful strategy for enabling a user to better understand software systems. In this strategy, a software system is represented by means of a visual object to gain some insight into how the system has been structured. The visual representation of a software system impacts the effectiveness of the code analysis or design recovery techniques. Two important notions in designing software visualization using 3D graphics and virtual reality technology are:
– Representation: the depiction of a single component by means of graphical and other media.
– Visualization: a configuration of an interrelated set of individual representations and related information, making up a higher-level component.
For effective software visualization, one needs to consider the properties and structure of the symbols used in software representation and visualization.

4.6.7 Program Metrics

Based on a module's fan-in and fan-out information flow characteristics, Henry and Kafura define a complexity metric, Cp = (fan-in × fan-out)². A large fan-in and a large fan-out may be symptoms of a poor design. Six design metrics are found in the Chidamber-Kemerer (CK) metric suite:
– Weighted Methods per Class (WMC): the number of methods implemented within a given class.
– Lack of Cohesion in Methods (LCOM): for each attribute in a given class, calculate the percentage of the methods in the class using that attribute; next, compute the average of all those percentages, and subtract the average from 100 percent.
– Coupling Between Object Classes (CBO): the number of distinct non-inheritance-related classes on which a given class is coupled.
– Depth of Inheritance Tree (DIT): the length of the longest path from a given class to the root in the inheritance hierarchy.
– Number of Children (NOC): the number of classes that directly inherit from a given class.
– Response For a Class (RFC): the number of methods that can potentially be executed in response to a message received by an object of the class.
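As a hedged illustration of the CK metrics, the sketch below computes three of them (WMC, DIT, NOC) for live Python classes via reflection. The class hierarchy is invented for the example; industrial tools compute these metrics from source code or bytecode instead of live objects.

```python
import inspect

# A minimal sketch computing three CK metrics (WMC, DIT, NOC) for live
# Python classes using reflection. Toy class hierarchy for illustration.

class Shape:
    def area(self): ...
    def perimeter(self): ...

class Circle(Shape):
    def area(self): ...

class Square(Shape):
    def area(self): ...

def wmc(cls):
    # WMC with all method weights equal to 1: methods defined in the class itself.
    return sum(1 for _, m in vars(cls).items() if inspect.isfunction(m))

def dit(cls):
    # Number of ancestors in the MRO; under single inheritance this
    # equals the depth of the class in the inheritance tree.
    return len(cls.__mro__) - 1

def noc(cls):
    # Classes that directly inherit from cls.
    return len(cls.__subclasses__())

for c in (Shape, Circle, Square):
    print(f"{c.__name__}: WMC={wmc(c)} DIT={dit(c)} NOC={noc(c)}")
# Shape: WMC=2 DIT=1 NOC=2
# Circle: WMC=1 DIT=2 NOC=0
# Square: WMC=1 DIT=2 NOC=0
```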
4.7 Decompilation Versus Reverse Engineering

Figure 4.14 Relationship between decompilation and traditional reengineering (© 2007)

A decompiler takes an executable binary file and attempts to produce readable high-level language source code from it. The output will, in general, not be the same as the original source code, and may not even be in the same language. The decompiler does not recover the original programmers' annotations, which provide vital instructions as to the functioning of the software. Disassemblers are programs that take a program's executable binary as input and generate text files that contain the assembly language code for the entire program or parts of it. Decompilation, or disassembly, is a reverse engineering process, since it creates representations of the system at a higher level of abstraction. However, traditional reverse engineering from source code entails the recognition of "goals" or "plans", which must be known in advance. Compilation, by contrast, is not considered part of forward engineering, since it is an automatic step.

Historically, decompilers aided program migration from one machine to another. As decompilation capabilities have increased, a wide range of potential applications has emerged. Examples of new applications are:
– recovery of lost source code,
– error correction,
– security testing,
– learning algorithms, and
– recovery of someone else's source code.
Not all uses of decompilers are legal. Most of the applications must be examined from the patent and/or copyright infringement point of view. It is recommended to seek legal counsel before starting any low-level reverse engineering project.

4.8 Data Reverse Engineering

Data Reverse Engineering (DRE) is defined as "the use of structured techniques to reconstitute the data assets of an existing system". By means of structured techniques, existing situations are analyzed and models are constructed prior to developing the new system. The two vital aspects of a DRE process are: (i) recover data assets that are valuable; and (ii) reconstitute the recovered data assets to make them more useful. The purposes of DRE are as follows:
1. Knowledge acquisition.
2. Tentative requirements.
3. Documentation.
4. Integration.
5. Data administration.
6. Data conversion.
7. Software assessment.
8. Quality assessment.
9. Component reuse.

Reverse engineering of a data-oriented application, including its user interface, begins with DRE. Recovering the specifications, that is, the conceptual schema in the database realm, of such applications is known as database reverse engineering (DBRE). A DBRE process facilitates understanding and redocumenting an application's database and files. By means of a DBRE process, one can recreate the complete logical and conceptual schemas from a database's physical schema. The conceptual schema is an abstract, implementation-independent description of the stored data. A logical schema describes the data structures in the concrete form in which they are implemented by the data manager. The physical schema of a database implements the logical schema by describing the physical constructs. A deep understanding of the forward design process is needed to reverse engineer a database.
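Before turning to the forward design phases, here is a small, hedged sketch of the extraction side of DBRE: it recovers a logical-schema description from a live SQLite database by querying its catalog. The table and column names are invented for the example.

```python
import sqlite3

# A minimal data-structure-extraction sketch: recover table and column
# definitions (a logical-schema view) from an SQLite database via its
# catalog. The example schema is invented for illustration.

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id),
        total REAL
    );
""")

tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
for t in tables:
    print(t)
    for cid, name, ctype, notnull, default, pk in conn.execute(f"PRAGMA table_info({t})"):
        flags = (" PRIMARY KEY" if pk else "") + (" NOT NULL" if notnull else "")
        print(f"  {name} {ctype}{flags}")
    # Foreign keys hint at the relationships needed for the conceptual schema.
    for row in conn.execute(f"PRAGMA foreign_key_list({t})"):
        print(f"  FK: {row[3]} -> {row[2]}({row[4]})")
```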
The forward design process of a database comprises three basic phases, as follows:
– Conceptual phase: In this phase, user requirements are gathered, studied, and formalized into a conceptual schema.
– Logical phase: In this phase, the conceptual schema is expressed as a simple model suitable for optimization reasoning.
– Physical phase: Now the logical schema is described in the data description language (DDL) of the data management system and the host programming language.

The DBRE process is divided into two main phases, namely:
– data structure extraction, and
– data structure conceptualization.
The two phases relate to the recovery of two different schemas: the first retrieves the present structure of the data from their host language representation, and the second retrieves a conceptual schema that describes the semantics underlying the existing data structures.

4.9 Reverse Engineering Tools

Software reverse engineering is a complex process that tools can only support, not completely automate; human intervention is needed in any reverse engineering project. The tools can provide a new view of the product, as shown in Figure 4.16. The basic structure of reverse engineering tools is as follows: the software system to be reverse engineered is analyzed; the results of the analysis are stored in an information base; and view composers use the information base to produce alternative views of the system. A minimal sketch of this structure follows the chapter summary below.

Figure 4.16 Basic structure of reverse engineering tools (© IEEE, 1990)

Summary

– General Idea
– Reengineering Concepts
– A General Model for Software Reengineering
– Reengineering Process
– Code Reverse Engineering
– Techniques Used for Reverse Engineering
– Decompilation Versus Reverse Engineering
– Data Reverse Engineering
– Reverse Engineering Tools
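Closing out Section 4.9, the sketch below is one hedged realization of the analyzer / information base / view composer structure from Figure 4.16. The division into a single analyze function and a single view composer, and the sample system, are illustrative simplifications.

```python
import ast

# A minimal sketch of the reverse engineering tool structure: an
# analyzer populates an information base, and a view composer renders an
# alternative view of the system (here, a caller -> callee listing).

def analyze(source):
    """Analyzer: parse the system and record facts in an information base."""
    info_base = {"calls": []}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            for n in ast.walk(node):
                if isinstance(n, ast.Call) and isinstance(n.func, ast.Name):
                    info_base["calls"].append((node.name, n.func.id))
    return info_base

def call_graph_view(info_base):
    """View composer: produce one alternative view from the information base."""
    return "\n".join(f"{caller} -> {callee}" for caller, callee in info_base["calls"])

system = """
def main():
    load()
    report()

def load():
    parse_file()
"""
print(call_graph_view(analyze(system)))
# main -> load
# main -> report
# load -> parse_file
```

Other view composers (a metrics view, a module dependency view) could read the same information base, which is the point of the architecture: analysis is done once, and alternative views are composed from the stored results.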