On-the-methodology-of-three-way-structured-merge 2023.pdf

Journal of Systems Architecture 145 (2023) 103011 Contents lists available at ScienceDirect Journal of Systems Architecture journal homepage: www.elsevier.com/locate/sysarc On the methodology of three-way structured merge in version control systems: Top-down, bottom-up, or both Fengmin Zhu a,d ,1 ,2 , Xingyu Xie a ,2 , Dongyu Feng a , Na Meng e , Fei He a,b,c ,∗ a School of Software, Tsinghua University, China Key Laboratory for Information System Security, MoE, China Beijing National Research Center for Information Science and Technology, China d Max Planck Institute for Software Systems, Germany e Virginia Tech, USA b c ARTICLE INFO Keywords: Version control systems Three-way merging Structured merging Shifted code ABSTRACT Three-way merging is an essential component of version control systems. Despite the efficiency of the conventional line-based textual methods, syntax-based structured approaches have demonstrated benefits in improving merge accuracy. Prior structured merging approaches visit abstract syntax trees in a topdown manner, which struggles to identify and merge shifted code generally. This work introduces a novel methodology combining a top-down and a bottom-up visit of abstract syntax trees, which manipulates shifted code effectively and elegantly. Especially, we reduce the merge problem of ordered lists to computing a topological sort of strongly connected components of the constraint graph. Compared with four representative merge tools in 40,533 real-world merge scenarios, our approach achieves the highest merge accuracy while being 2.5 x as fast as a state-of-the-art structured merge tool. 1. Introduction Thanks to the wide application of version control systems such as Git and SVN, three-way merging has become an indispensable task in contemporary software development. A three-way merge scenario (𝑏𝑎𝑠𝑒, 𝑙𝑒𝑓 𝑡, 𝑟𝑖𝑔ℎ𝑡) consists of three versions of a program, where the two variants 𝑙𝑒𝑓 𝑡 and 𝑟𝑖𝑔ℎ𝑡 are both evolved independently, possibly by different developers, from their ancestor 𝑏𝑎𝑠𝑒. A three-way merge algorithm integrates the changes made by the variants and produces a merged version called a target. When the two variants (i.e., branches) introduce changes that are contradicting, according to three-way merge principles, a conflict shall be reported, leaving the developers to manually resolve them. The three-way merge principles conservatively describe whether the changes could be correctly incorporated. Unstructured merge is a mature merging approach that regards programs as a sequence of lines of plain text. Since the context-free syntax is neglected, merge accuracy is yet unsatisfying—studies [1,2] have shown the presence of false conflicts (i.e., the conflicts that should have been avoided), which increases the user burden of manual resolution. To enhance merge accuracy, structured merge, representing programs as abstract syntax trees (ASTs), has gained significant research interest in recent decades [2–8]. A structured merge algorithm takes a set of mappings between different program versions as input and computes a merged version as output. The mappings are obtained by AST differencing (also known as AST matching) algorithms [9–11]. To compute a target AST, a structured merge algorithm needs to traverse the input ASTs (i.e., 𝑏𝑎𝑠𝑒, 𝑙𝑒𝑓 𝑡, and 𝑟𝑖𝑔ℎ𝑡). Prior approaches [2– 8] all use a top-down order, which is quite natural and intuitive as it follows the structure of ASTs. Such a top-down AST comparison is usually restricted to be level-wise—only AST nodes at the same level (or depth) get compared, which makes it hard to detect if one piece of code is shifted into another, namely shifted code [12]. To identify shifted code in a top-down manner, one could search for the largest common embedded subtree. This problem, however, is known to be  -hard and difficult to approximate for general cases [13]. A more scalable approach is to employ syntax-aware looking ahead matching [12], but: looking ahead is only enabled for a few types of AST nodes; the maximum looking-ahead distance is short for efficiency considerations. The work in [12] focuses on the AST matching problem; how to correctly merge shifted code remains an issue. Thinking oppositely, we find a bottom-up traversing order a better option. The key to detecting shifted code is to allow node mappings ∗ Corresponding author at: School of Software, Tsinghua University, China. E-mail address: [email protected] (F. He). 1 Early revisions of this work were done when Fengmin Zhu was in Tsinghua University. 2 Fengmin Zhu and Xingyu Xie contributed equally. https://doi.org/10.1016/j.sysarc.2023.103011 Received 5 February 2023; Received in revised form 31 July 2023; Accepted 6 October 2023 Available online 13 October 2023 1383-7621/© 2023 Elsevier B.V. All rights reserved. Journal of Systems Architecture 145 (2023) 103011 F. Zhu et al. across AST levels, which is natural and easier via a bottom-up manner. Meanwhile, top-down merging cannot handle across-level mappings, which means bottom-up merging is needed as follow-up of the matching phase. Sometimes, the bottom-up visit alone incurs redundant computations. As an extreme example, if 𝑙𝑒𝑓 𝑡 is the same as 𝑏𝑎𝑠𝑒, meaning no changes are introduced on 𝑙𝑒𝑓 𝑡, then by three-way merge principles, 𝑟𝑖𝑔ℎ𝑡 introduces unique changes and should be the target version— there is no need to further inspect any of their descendants as in a bottom-up manner. We fix this issue by bringing in top-down merging. Combining a top-down pruning pass and a bottom-up pass, we present a novel three-way structured merge algorithm, where the trivial merge scenarios are processed in the former pass, and other nontrivial merge scenarios, which may involve shifted code, are carefully operated in the latter pass. Because our algorithm is non-backtracking, the time complexity is linear. Like in JDime [2], we distinguish if the children list of an AST node can be ‘‘safely permuted’’ (i.e., the permutation preserves semantics). If not, the list is called ordered (e.g., a sequence of statements) and a good merge algorithm should preserve, as much as possible, the original occurrence order of the children in the merged versions, which we call order-preservation. We reduce this problem into computing strongly connected components and a topological sort of the directed acyclic graph formed by the components, which is solvable by linear-time graph algorithms, e.g., Tarjan’s algorithm [14]. We implemented our approach as a structured merge tool called Mastery.3 To measure its usability and practicality in real scenarios, we extract 40,533 merge scenarios from 78 open-source Java projects. We identified that shifted code occurs in 38.54% merge scenarios, and conducted experimental comparisons with four representative tools: JDime (structured), jFSTMerge (tree-based semistructured), IntelliMerge (graph-based semistructured), and GitMerge (unstructured). Our results show: (1) Mastery achieves the highest merge accuracy of 82.90%; (2) Mastery reports the fewest 9.33% conflicts and the fewest 7,057 conflict blocks, excluding radical IntelliMerge; (3) Mastery is about 2.5× as fast as JDime, and about 1.4× as fast as jFSTMerge. Our tool and evaluation data are publicly available: https://github.com/thufv/mastery/. Organization of this paper. We start in Section 2 with an introduction to three-way structured merging. In Section 3, we provide a bird’seye view of our whole merging framework. Later on, we explain the technical details of our matching algorithm in Section 4 and merging algorithm in Section 5. We briefly present the tool implementation in Section 6 and carefully report our evaluation results in Section 7. Finally, we discuss related work in Section 8 and conclude the paper in Section 9. 2. Preliminaries AST nodes. In structured merging, programs are represented as abstract syntax trees (ASTs), parsed from source files. An AST is a labeled rooted tree with four types of nodes, each annotated with the name of its production rule in the grammar, called its label (𝑙𝑏𝑙). Node 𝑣 𝖫𝖾𝖺𝖿(𝑙𝑏𝑙, 𝑥) 𝖢𝗍𝗈𝗋𝑘 (𝑙𝑏𝑙, 𝑣1 , … , 𝑣𝑘 ) 𝖴𝖫𝗂𝗌𝗍(𝑙𝑏𝑙, {𝑣1 , … , 𝑣𝑛 }) 𝖮𝖫𝗂𝗌𝗍(𝑙𝑏𝑙, [𝑣1 , … , 𝑣𝑛 ]) (leaf) (𝑘-ary constructor) (unordered list) (ordered list) A leaf node represents a lexical token where the value 𝑥 is the text of that token. A 𝑘-ary constructor node has exactly 𝑘 children as its arguments. For instance, an if-statement – consisting of a Boolean condition, a true branch, and a false branch – is represented as a 3-ary constructor node. An arbitrary number of children is allowed in a list node, which is further divided into unordered – children can be safely permuted – and ordered (the opposite). For instance, a class member declaration list is unordered, while a statement list is ordered. The children of an unordered list node are denoted by a set {𝑣1 , … , 𝑣𝑘 }, while the children of an ordered list node are denoted by a list [𝑣1 , … , 𝑣𝑘 ] ([] for an empty list). In case of merge conflicts, we introduce conflicting nodes in target ASTs. We assume that conflict nodes can never appear in any input AST (i.e., 𝑏𝑎𝑠𝑒, 𝑙𝑒𝑓 𝑡, or 𝑟𝑖𝑔ℎ𝑡), as any previously reported conflicts should have already been resolved at the moment a new merge process is requested. AST matching. A merging algorithm relies on a set of mappings between different versions of the programs to compute the merged version. The set of mappings between two ASTs 𝑇1 and 𝑇2 are represented by a matching set  = {(𝑢𝑖 , 𝑣𝑖 )}𝑖 , where each pair (𝑢𝑖 , 𝑣𝑖 ) consists of two nodes 𝑢𝑖 ∈ 𝑇1 and 𝑣𝑖 ∈ 𝑇2 . Intuitively, two mapped nodes are likely to be the same node in different versions: if we transform 𝑇1 to 𝑇2 by inserting and deleting nodes as few as possible, 𝑢𝑖 will probably become 𝑣𝑖 along the transformation. The mappings shall be injective—two nodes cannot be matched to the same node on the other AST simultaneously. Moreover, the matched nodes shall have the same label. Contributions. To sum up, this paper makes the following contributions: • We present a novel structured merge algorithm that visits ASTs in both a top-down and a bottom-up manner. The top-down pruning pass avoids a mass of redundant computations. The bottomup pass makes it possible to handle shifted code elegantly and efficiently. • The proposed merge algorithm is linear-time. • We adapt a state-of-the-art tree matching algorithm to better fit our merge algorithm. • We conduct comprehensive experiments on real-world merge scenarios. Results show that Mastery is competitive with state-ofthe-art merge tools in the aspects of merge accuracy, the number of conflicts, and efficiency. Definition 1 (Shifted Code). Given two mappings (𝑢′ , 𝑣′ ), (𝑢, 𝑣) ∈ , if 𝑢 is a child of 𝑢′ , whereas 𝑣 is not a child (i.e., direct descendant) but a later descendant of 𝑣′ , then (𝑢, 𝑣) is said a shifted mapping. Meanwhile, the code fragment corresponding to the subtree of 𝑣 is called a shifted code. This paper is an extended version of a preliminary conference paper [15]. Compared to [15], this paper gives the following new materials. First, we elucidate how we utilize and adapt the state-of-theart matching algorithm, GumTree [9], in order to make it better fit our merging algorithm. Second, we improve the ordered merging algorithm and report new experimental results. Whereas [15] adopts a topological sorting algorithm for merging ordered lists, in this paper, we compute strongly connected components in advance, trying to merge each component and concatenate the merged results of components in a topological sort of the components. This improvement makes Mastery able to handle trivial order-altering changes, e.g., swapping statements. 3 ∶∶= ∣ ∣ ∣ For example, on the merge scenario shown in Fig. 1: in 𝑙𝑒𝑓 𝑡, the code fragment of the InfixExpr is a shifted code and is shifted into a CastExpr; in 𝑟𝑖𝑔ℎ𝑡, the code fragment of the ExprStmt is a shifted code and is shifted into a ForStmt. Three-way merge principles. A three-way merge algorithm must abide by a couple of principles, as presented in Table 1. To avoid repetition, the dual cases of rows 1, 3, and 4 are not displayed. The first two rules are applicable to all types of nodes. If a node is modified by exactly one of the variants, then the change is unique and itself gives the target (row 1). If a node is concurrently modified by both variants inconsistently, a conflict is reported, as the algorithm has no adequate information to decide which one to take (row 2). If the modifications are equal, then of course these changes are consistent and thus not Merging abstract syntax trees in a reasonable way. 2 Journal of Systems Architecture 145 (2023) 103011 F. Zhu et al. 4. Matching algorithm This section presents our tree-matching algorithm. We adapt GumTree [9] to compute the set of mappings between two ASTs . We choose GumTree because it can search for cross-level mappings, which enables the detection of shifted code. We will first review its main steps in Section 4.1 and then introduce our adaptions in Section 4.2. When it comes to a three-way merge scenario, we have in total three ASTs to compare: between which pairs shall we perform the underlying AST matching algorithms such as GumTree? Prior works [2,3,12] decided to do it between each pair of them, but we find this mechanism can sometimes lead to anti-intuitive results. To tackle this issue, we will propose a revised mechanism in Section 4.3. Fig. 1. A merge scenario with shifted code. (Shifted mappings are depicted as dashed arrows.) 4.1. GumTree overview Table 1 Three-way merge principles (dual cases omitted). Type Version 𝑏𝑎𝑠𝑒 Version 𝑙𝑒𝑓 𝑡 Version 𝑟𝑖𝑔ℎ𝑡 Target 𝑇 Explanation 1 2 Node Node 𝑒 𝑒 𝑒 𝑒𝐿 𝑒′ 𝑒𝑅 𝑒′ conflict left-change inconsistent change 3 4 List List 𝑒 ∈𝑏𝑎𝑠𝑒 𝑒 ∉𝑏𝑎𝑠𝑒 e ∉𝑙𝑒𝑓 𝑡 𝑒 ∈𝑙𝑒𝑓 𝑡 e ∈𝑟𝑖𝑔ℎ𝑡 𝑒 ∉𝑟𝑖𝑔ℎ𝑡 𝑒∉𝑇 𝑒 ∈ 𝑇 or conflict left-deletion left-insertion In a nutshell, the GumTree algorithm traverses the two input ASTs and searches for three kinds of mappings between them. An edit script can be deduced from the mappings using algorithms such as [16]. However, this step is not needed in our merge framework. To identify mappings, GumTree first performs a top-down visit of ASTs in decreasing height, finding isomorphic (equal) subtrees between them, each of which forms an anchor mapping. Then, it performs a bottomup visit for identifying mappings between yet unmatched nodes. A container mapping is established if the two nodes have a large number of descendants (greater than a threshold) being matched, or, intuitively, being ‘‘quite similar’’. Once a container mapping is found, recovery mappings are then established (e.g., using the RTED algorithm [17]) between some of their unmatched descendants. GumTree is capable of identifying cross-level mappings, making the detection of shifted code available. Cross-level mappings can be established in both passes: (1) in the top-down pass, anchor mappings are established between two subtrees of the same height – not the same depth – thus cross-level isomorphic subtrees are probably regarded as matches; (2) in the bottom-up pass, container mappings are established if the two subtrees are similar, and again their depths may differ; the situation is the same for recovery mappings. conflicting. The last two rules are applicable for only (ordered and unordered) list nodes. If an element of 𝑏𝑎𝑠𝑒 presents in exactly one of the variants, then it is regarded as being removed and will be excluded from the target list (row 3). In contrast, if a new node is introduced in exactly one of the variants, it will be inserted into the target list (row 4). For ordered lists, conflicts may occur if the insertion position is ambiguous. For unordered lists, the insertion position is arbitrary and thus not conflicting. 3. Framework overview Fig. 2 presents the main process and the workflow of Mastery. The tree matcher and the tree merger only manipulate ASTs produced by a parser, and thus are language-agnostic. To enable structured merging for a certain programming language , a parser that builds ASTs from source code written in , and a pretty printer that outputs source code from ASTs are required to be integrated into the framework. The framework takes three source files (they should be in the same language) as input, which forms a three-way merge scenario, and outputs another file as the merge result. The workflow consists of four sequential steps: 4.2. Our adaption: Monotone matchings Classical unstructured merging approaches [18] exploit line-based matching. The matching task is to compute the longest common sequence of two sequences of lines, where a common element in the longest common sequence corresponds to a mapping. Thus, the matching between sequences must be non-crossing, i.e., the linear order of a sequence is preserved on the matching. As a natural extension of this non-crossing property, the mappings between trees ought to preserve the ancestor-descendant relationship. This is what we call monotone. 1. A parser translates the three source files into three ASTs, referred to as 𝑏𝑎𝑠𝑒, 𝑙𝑒𝑓 𝑡 and 𝑟𝑖𝑔ℎ𝑡. 2. A tree matcher (Section 4) compares and establishes mappings between the three ASTs. In our three-way tree matching algorithm, 𝑏𝑎𝑠𝑒 is compared with 𝑙𝑒𝑓 𝑡 and 𝑟𝑖𝑔ℎ𝑡 respectively, and each produces a matching set 𝐿 and 𝑅 . The tree matching algorithm is an adaption of GumTree [9]. 3. A tree merger (Section 5) then applies amalgamation on ASTs with the help of 𝐿 and 𝑅 , and builds a target AST according to three-way merge principles. The algorithm is composed of two passes: the top-down pruning pass identifies trivial merge scenarios and processes them in advance; the bottom-up merging pass processes the remaining non-trivial merge scenarios using tree amalgamation algorithms. In case conflicts occur, special conflicting nodes are created to record the conflicting blocks. 4. A pretty printer traverses the target AST and outputs source code into a file. If the target AST contains conflict nodes, they will be displayed using standard conflicting markers (as in GitMerge). Definition 2 (Monotonicity). A matching set  is monotone if the following holds: for every (𝑠, 𝑡) ∈ , if there exists 𝑢 ∈ 𝑠 (𝑢 is a descendant of 𝑠) such that (𝑢, 𝑣) ∈  for some node 𝑣, then 𝑣 ∈ 𝑡 (𝑣 is a descendant of 𝑡). To compute a monotone matching set, we use an ad-hoc filter strategy. Every time a new mapping is identified, we check if the monotonicity property is preserved by definition. If it is preserved, the new mapping is accepted, otherwise, rejected. Noting the fact that ‘‘for two given nodes 𝑥 and 𝑦 of a tree 𝑇 , 𝑥 is an ancestor of 𝑦 if and only if 𝑥 occurs before 𝑦 in the preorder and after 𝑦 in the postorder’’, the monotonicity (Definition 2) could be described by the order of pairs of numbers. Thus, by precomputing the preorder and postorder of the nodes and caching previously-computed results, the checking procedure takes only constant time. In other words, it does not increase the worst-case time complexity of GumTree. 3 Journal of Systems Architecture 145 (2023) 103011 F. Zhu et al. Fig. 2. Workflow of our merge framework Mastery. from each other. In reality, they are usually developed independently, and sometimes the developers may even not know each other (common in open-source projects). Thus, the two variants are not that related. As the essence of the three-way merge is to integrate changes – rather than to recognize differences – we find it unnecessary to compare 𝑙𝑒𝑓 𝑡 and 𝑟𝑖𝑔ℎ𝑡. One may concern that the above strategy can sometimes miss a mapping that should be established between the two variants (since they are not compared to each other at all). To identify such mappings and to preserve the transitivity in the meantime, we could additionally compare and establish mappings between the nodes of the two variants that are not matched with any node of 𝑏𝑎𝑠𝑒. 5. Merging Algorithm Fig. 3. A merge scenario (𝑏𝑎𝑠𝑒, 𝑙𝑒𝑓 𝑡, 𝑟𝑖𝑔ℎ𝑡) with established mappings (dashed or dotted lines). The red dotted line shows a mapping established by comparing 𝑙𝑒𝑓 𝑡 and 𝑟𝑖𝑔ℎ𝑡. Assume the similarity (measured by a heuristic cost function) between Field1 and Field2 exceeds a given threshold. Given a three-way merge scenario (𝑏𝑎𝑠𝑒, 𝑙𝑒𝑓 𝑡, 𝑟𝑖𝑔ℎ𝑡), our merging algorithm accepts two matching sets from the matching algorithm as input—𝐿 the matches between 𝑏𝑎𝑠𝑒 and 𝑙𝑒𝑓 𝑡, and 𝑅 the matches between 𝑏𝑎𝑠𝑒 and 𝑟𝑖𝑔ℎ𝑡. The algorithm generates a new tree, namely a target AST, as the merge result. Merging is performed on the matched nodes only, as unmatched nodes are assumed to have no relation. In the case of three-way merging, the two variants should match the base version. Formally, a merge scenario (𝑏, 𝑙, 𝑟) is said proper if (𝑏, 𝑙) ∈ 𝐿 and (𝑏, 𝑟) ∈ 𝑅 . The merge algorithm only needs to manipulate proper merge scenarios. Non-proper merge scenarios can be safely omitted because they are regarded as deletions and thus do not appear in the target. Fig. 4 presents the high-level workflow of our merge algorithm. It consists of a top-down pass followed by a bottom-up one. In the topdown pass (see Section 5.1 for details), input ASTs get traversed in pre-order, and any trivial merge scenario – any two of the three versions are equal – is processed immediately. Meanwhile, we collect other nontrivial proper merge scenarios in the list 𝑆. In the bottom-up pass, the merge scenarios in 𝑆 get processed in the reverse order, i.e., in postorder. Since matched nodes have the same label, the three versions in a proper merge scenario must be homogeneous—they must all be leaf nodes, constructor nodes, unordered list nodes, or ordered list nodes. A challenging problem in the bottom-up pass is that: a sub-scenario (𝑏, 𝑙, 𝑟) may not be proper, say 𝑏 and 𝑙 do not match. Even though it is rational to assume 𝑏 matches some descendant of 𝑙 (or else we simply report a conflict), which happens when 𝑙 has shifted code. We encode this condition as a relevant-to relation: 𝑢 is relevant to 𝑣, written 𝑢 ≃ 𝑣, iff there exists a descendant 𝑤 ∈ 𝑣 such that 𝑢 matches 𝑤. With this notion, the assumption we make on a sub-scenario (𝑏, 𝑙, 𝑟) is given by 𝑏 ≃ 𝑙 ∧ 𝑏 ≃ 𝑟. Merging such a sub-scenario requires us to take shifted code into account. We will present this algorithm in Section 5.3. The merge result of the merge scenario (𝑏, 𝑙, 𝑟) is recorded in a map  so that the algorithm can query it later on demand. Instead of using the entire merge scenario (𝑏, 𝑙, 𝑟) as the index (or key) for the map , realizing that any node 𝑏 of 𝑏𝑎𝑠𝑒 appears in at most one merge scenario (by the injectivity of the matching sets), we simply use 𝑏 as the index. In the end, the target AST of the top-most merge scenario (𝑏𝑎𝑠𝑒, 𝑙𝑒𝑓 𝑡, 𝑟𝑖𝑔ℎ𝑡) is obtained by querying (𝑏𝑎𝑠𝑒). 4.3. Which ASTs to compare? Given a three-way merge scenario (𝑏𝑎𝑠𝑒, 𝑙𝑒𝑓 𝑡, 𝑟𝑖𝑔ℎ𝑡), which pairs of ASTs should be compared? A trivial solution (used by [2,3,12]) might be: to compare each pair of them, i.e., (𝑏𝑎𝑠𝑒, 𝑙𝑒𝑓 𝑡), (𝑏𝑎𝑠𝑒, 𝑟𝑖𝑔ℎ𝑡), (𝑙𝑒𝑓 𝑡, 𝑟𝑖𝑔ℎ𝑡). Each tree matching problem yields a matching set, denoted by 𝐵𝐿 , 𝐵𝑅 and 𝐿𝑅 . This looks like a reasonable method at first sight, however, we recognize an exhibition of anti-intuitive circumstances. For example, it could be the case that there exist nodes 𝑏 ∈ 𝑏𝑎𝑠𝑒, 𝑙 ∈ 𝑙𝑒𝑓 𝑡 and 𝑟 ∈ 𝑟𝑖𝑔ℎ𝑡 such that (𝑏, 𝑙) ∈ 𝐵𝐿 and (𝑙, 𝑟) ∈ 𝐿𝑅 , meanwhile (𝑏, 𝑟) ∉ 𝐵𝑅 . Typically, it is expected that the matching relation is transitive, say given (𝑏, 𝑙) ∈ 𝐵𝐿 and (𝑙, 𝑟) ∈ 𝐿𝑅 , we should also have (𝑏, 𝑟) ∈ 𝐵𝑅 . However, the above case violates the transitivity. Fig. 3 presents a merge scenario (𝑏𝑎𝑠𝑒, 𝑙𝑒𝑓 𝑡, 𝑟𝑖𝑔ℎ𝑡). Neglecting the mappings, one can recognize that both Field1 and Field2 are deletions – by row 3 of Table 1. Therefore, the expected merge result is simply a class definition without any field. We now take a closer look at the established mappings: 𝑣1 is matched with 𝑣3 (isomorphic), and 𝑣3 is matched with 𝑣4 (though not isomorphic, they are mapped because of their high similarity). However, 𝑣1 is not matched with 𝑣4 , which violates the transitivity of the matching relation. Instead, we see 𝑣2 is matched with 𝑣4 , as they are isomorphic. In the subsequent AST amalgamation pass, we have to answer a tricky question: shall we perform merging on the merge scenario (𝑣1 , 𝑣3 , 𝑣4 )? If yes, then by row 1 of Table 1, Field2 gives the merge result of the above merge scenario. Later, Field2 will be included—however, it should be regarded as a deletion. If no, then there would be no way to figure out that Field1 is a deletion. In either case, the amalgamation algorithm will not give the expected merge. To resolve the above dilemma, we propose to exclude the comparison between 𝑙𝑒𝑓 𝑡 and 𝑟𝑖𝑔ℎ𝑡. In the above example, that means the mapping (𝑣3 , 𝑣4 ) (red dotted line) no longer exists and by three-way merge principles, the expected merge will be successfully produced. In this way, (𝑏, 𝑙, 𝑟) is regarded as a ‘‘proper’’ merge scenario (so that amalgamation is performed on it) if and only if (𝑏, 𝑙) ∈ 𝐵𝐿 and (𝑏, 𝑟) ∈ 𝐵𝑅 . Our rationale is that, since the two variants both evolve from 𝑏𝑎𝑠𝑒, they each should be compared with 𝑏𝑎𝑠𝑒 for identifying the changes they made. Conversely, the two variants are neither evolved 5.1. Top-down merging In the top-down pass, we visit 𝑏𝑎𝑠𝑒 in a descendant recursive manner by invoking TopDownVisit (Algorithm 1). This function returns all non-trivial merge scenarios that need processing later in the bottom-up 4 Journal of Systems Architecture 145 (2023) 103011 F. Zhu et al. Fig. 4. High-level workflow of our merge algorithm. Algorithm 1: Top-down pruning pass 1 2 3 4 5 6 Function TopDownVisit(𝑏: Node): 𝑆 ← []; if ∃𝑙 ∈ 𝑙𝑒𝑓 𝑡, 𝑟 ∈ 𝑟𝑖𝑔ℎ𝑡 ∶ (𝑏, 𝑙) ∈ 𝐿 ∧ (𝑏, 𝑟) ∈ 𝑅 then if 𝑏 = 𝑟 then (𝑏) ← 𝑙; return []; if 𝑏 = 𝑙 or 𝑙 = 𝑟 then (𝑏) ← 𝑟; return []; 𝑆 += (𝑏, 𝑙, 𝑟); 8 foreach child 𝑐 of node 𝑏 do 𝑆 ++= TopDownVisit(𝑐 ); 9 return 𝑆; 7 Algorithm 2: Shifted Code Merging 1 2 3 4 5 6 7 Function IssueShifted(𝑏: Node, 𝑙: Node, 𝑟: Node): let 𝑙′ , 𝑟′ be nodes s.t. (𝑏, 𝑙′ ) ∈ 𝐿 , (𝑏, 𝑟′ ) ∈ 𝑅 ; if 𝑙′ = 𝑙 ∧ 𝑟′ = 𝑟 then return (𝑏); if 𝑙′ ≠ 𝑙 ∧ 𝑟′ = 𝑟 then return 𝑙[(𝑏)∕𝑙′ ]; if 𝑟′ ≠ 𝑟 ∧ 𝑙′ = 𝑙 then return 𝑟[(𝑏)∕𝑟′ ]; if 𝑙[(𝑏)∕𝑙′ ] = 𝑟[(𝑏)∕𝑟′ ] then return 𝑙[(𝑏)∕𝑙′ ]; return 𝖢𝗈𝗇𝖿 𝗅𝗂𝖼𝗍(𝑙, 𝑟); (left-shifting) If 𝑏 matches 𝑟 but not 𝑙, then there exists a 𝑙′ such that it is shifted into 𝑙. To integrate this shifting, we first make a copy of 𝑙 and replace 𝑙′ with the merge result of (𝑏, 𝑙′ , 𝑟′ ) i.e., (𝑏) (line 4). The notation 𝑢[𝑤∕𝑣] gives an updated tree by replacing a subtree 𝑣 with 𝑤 on 𝑢. (right-shifting) Line 5 is symmetric to the above case. (consistent-shifting) If both variants involve shifted code, the only circumstance we can safely merge is when they yield the same result (line 6). (inconsistent-shifting) Otherwise, report a conflict (line 7). pass. We use a list 𝑆 to collect them. For short, we use two notations: 𝑆 += 𝑒 for appending an element 𝑒 to 𝑆, and 𝑆 ++= 𝑆 ′ for appending all elements in 𝑆 ′ to 𝑆. Upon traversing, if any trivial merge scenario is encountered, we immediately store the target AST in  and prune any further visit of its sub-scenarios by returning an empty list (lines 4– 6). Otherwise, we proceed to collect merge scenarios recursively (lines 8–9). Consider the example in Fig. 1. When merging the merge scenario consisting of the three Assignments, CastExpr from 𝑙𝑒𝑓 𝑡, InfixExpr from 𝑏𝑎𝑠𝑒 and InfixExpr from 𝑟𝑖𝑔ℎ𝑡 form the arguments 𝑏, 𝑙, 𝑟 in Algorithm 2. The result is computed by taking the subtree of CastExpr and replacing its subtree of InfixExpr with the target one (by line 4). In merging the top-most merge scenario, the result is computed from the subtree of ForStmt by replacing its child ExprStmt with the target of ExprStmts (by line 5). In this way, the shifted changes made by the two variants are integrated. 5.2. Two base cases in bottom-up merging The first case, all nodes are leaf nodes, is the base case of our merge algorithm. This case is straightforward by three-way merge principles: we either take the only-changed variant as the target (by row 1 of Table 1) or report a conflict due to the inconsistent changes (by row 2). The second case, merging constructor nodes of the same label and arity, gives a constructor node of that label and arity too, and each child node is recursively merged from the sub-scenarios formed by the children at the corresponding index. Merging list nodes gives list nodes too, and it contains elements recursively merged from certain sub-scenarios drawn from the elements in the input lists (see Section 5.4 and Section 5.5 for details). 5.4. Unordered merging Let 𝐵, 𝐿, and 𝑅 respectively be the set of elements of three unordered list nodes that form a merge scenario. The goal of unordered merging is to compute a set of elements 𝑇 – without worrying about the order – that should appear in the target list. These elements are classified as follows: 5.3. Shifted code merging (shifting) If an element 𝑏 ∈ 𝐵 satisfies 𝑏 ≃ 𝑙 ∧ 𝑏 ≃ 𝑟 for some 𝑙 ∈ 𝐿 and 𝑟 ∈ 𝑅, then the merge result is obtained by invoking IssueShifted(𝑏, 𝑙, 𝑟). (left/right-insertion) If an element of 𝐿 or 𝑅 is not related to any element of 𝐵, then by row 4 of Table 1 it is an insertion. (left/right-deletion-change conflict) If an element 𝑏 satisfies, for example (dual case is similar), 𝑏 ≃ 𝑟 for some 𝑟 ∈ 𝑅, then it is a left-deletion (thus not included in 𝑇 ) when 𝑏 = 𝑟; and a left-deletion-change conflict when 𝑏 ≠ 𝑟. Shifted code may exhibit in any type of node (except leaf node) in merge scenarios, which indicates the method of merging shifted codes is a really important utility. Algorithm 2 presents a unified algorithm for dealing with shifted code. It requires 𝑏 ≃ 𝑙 ∧ 𝑏 ≃ 𝑟. Merging is performed according to where the shifted code involves: (no shifting) If (𝑏, 𝑙, 𝑟) is proper, then we simply query  (line 3). 5 Journal of Systems Architecture 145 (2023) 103011 F. Zhu et al. Algorithm 3: Unordered merging 1 2 3 4 5 6 7 8 9 Function Unordered(𝐵: Set, 𝐿: Set, 𝑅: Set ): 𝑇 ← ∅; foreach 𝑏 ∈ 𝐵 do if ∃𝑙 ∈ 𝐿, 𝑟 ∈ 𝑅 ∶ 𝑏 ≃ 𝑙 ∧ 𝑏 ≃ 𝑟 then 𝑇 ← 𝑇 ∪ {IssueShifted(𝑏, 𝑙, 𝑟)}; mark 𝑙, 𝑟 as ‘‘visited’’; Fig. 6. A swap-change merge scenario of three statement-lists (𝐵, 𝐿, 𝑅) with the target 𝑇 . A dashed edge between two statements indicates they are matched. else if ∃𝑙 ∈ 𝐿 ∶ 𝑏 ≃ 𝑙 then // right case is symmetric if 𝑏 ≠ 𝑙 then 𝑇 ← 𝑇 ∪ {𝖢𝗈𝗇𝖿 𝗅𝗂𝖼𝗍(𝑙, 𝜀)}; mark 𝑙 as ‘‘visited’’; Algorithm 4: Ordered merging 1 10 11 𝑇 ← 𝑇 ∪ {𝑒 ∣ 𝑒 ∈ 𝐿 ∪ 𝑅, 𝑒 is not ‘‘visited’’}; return 𝑇 ; 2 3 4 Function Ordered(𝐵: List, 𝐿: List, 𝑅: List ): 𝛷 ← GenConstraints(𝐵, 𝐿, 𝑅); Represent 𝛷 as a directed graph 𝐺𝛷 = ⟨𝑉 , 𝐸⟩;  ← Tarjan(𝐺𝛷 ) ; /* a topological sort of SCCs */ 5 6 for 𝑖 ← 2..|| do if there is no edge (𝑢, 𝑣) ∈ 𝐸 s.t. 𝑢 ∈ [𝑖 − 1] ∧ 𝑣 ∈ [𝑖] then /* multiple topological sorts */ 7 8 9 10 return ‘‘conflict’’ ; 𝑇 ← []; for 𝑖 ← 1..|| do if 𝜋𝐵 |[𝑖] ≠ 𝜋𝐿 |[𝑖] ∧ 𝜋𝐵 |[𝑖] ≠ 𝜋𝑅 |[𝑖] ∧ 𝜋𝐿 |[𝑖] ≠ 𝜋𝑅 |[𝑖] then /* three unique constraints in a SCC */ Fig. 5. A conflicting merge scenario of three statement-lists (𝐵, 𝐿, 𝑅) with the target 𝑇 . A dashed edge between two statements indicates they are matched. 11 12 13 The above is realized as Algorithm 3: First, traverse the elements in 𝐵 and collect any shifting (line 4) or left-deletion-change conflict (line 7, right-deletion-change conflict is symmetric) in 𝑇 . Meanwhile, mark every relevant left/right element as ‘‘visited’’ (lines 6 and 9). Then, all elements yet not marked must be left/right-insertions: thus insert them into 𝑇 (line 10). 14 15 16 return ‘‘conflict’’ ; if 𝜋𝐵 |[𝑖] = 𝜋𝐿 |[𝑖] then 𝑋 ← 𝑅 ; else if 𝜋𝐵 |[𝑖] = 𝜋𝑅 |[𝑖] then 𝑋 ← 𝐿 ; else 𝑋 ← 𝐿 ; /* 𝜋𝐿 |[𝑖] = 𝜋𝑅 |[𝑖] foreach 𝑢 in the order 𝜋𝑋 |[𝑖] do 𝑇 += 𝑢 ; */ return 𝑇 ; relation is denoted by ⊲𝑋 (for 𝑋 ∈ {𝐵, 𝐿, 𝑅}), formally defined as 𝑋[𝑖] ⊲𝑋 𝑋[𝑗] ⟺ 𝑖 < 𝑗. Let 𝑆 be the set of elements that should appear in the target list 𝑇 , computed by Unordered(𝐵, 𝐿, 𝑅). In Section 5.4, we classify these elements into: shifting, left/right-insertion, and left/rightdeletion-change conflict. Each element is computed from certain elements in the input merge scenario. For example, if 𝑡∗ ∈ 𝑆 is a left-insertion, then 𝑡∗ is a copy of some 𝑙∗ ∈ 𝐿. We encode their relationship as partial functions 𝜋𝐵 ∶ 𝐵 ⇀ 𝑇 , 𝜋𝐿 ∶ 𝐿 ⇀ 𝑇 and 𝜋𝑅 ∶ 𝑅 ⇀ 𝑇 . For the above example, we let 𝜋𝐿 (𝑙∗ ) = 𝑡∗ . We encode the relationship as partial functions 𝜋𝐵 ∶ 𝐵 ⇀ 𝑇 , 𝜋𝐿 ∶ 𝐿 ⇀ 𝑇 and 𝜋𝑅 ∶ 𝑅 ⇀ 𝑇 , associating an element in the input merge scenario with the corresponding element in the target list. For example, if 𝑡∗ ∈ 𝑆 is a left-insertion, then 𝑡∗ is a copy of some 𝑙∗ ∈ 𝐿, thus we let 𝜋𝐿 (𝑙∗ ) = 𝑡∗ . 5.5. Ordered merging Merging ordered lists is more complex than merging unordered lists in that the elements of the target list should be in an order preserving the original occurrence order of associated elements in the merge scenario; and it is necessary to decide whether such an order uniquely exists—if not, to fit the merge algorithm into a conservative setting, conflicts shall be reported as well. For example, Fig. 5 depicts a merge scenario where the three ordered lists 𝐵, 𝐿, and 𝑅 each represent the statements inside a block. For clearance, we display the elements of each list in a linked chain rather than an AST. A dashed edge between two nodes indicates they are matched. By three-way merge principles, Stmt1 and Stmt5 should be included in the target list 𝑇 : since Stmt1 precedes Stmt4 in 𝐵, stmt1 must also precede Stmt5 in 𝑇 . Next, both Stmt2 and Stmt3 should be included in 𝑇 as they are insertions. However, their insertion order is ambiguous: no information can tell if Stmt2 should precede Stmt3 or the other way around; indeed, we just know either must come in between Stmt1 and Stmt5. Due to the ambiguous insertion order, a conflict has to be reported. In some extreme cases, strictly preserving the order is not desired, particularly, when the change introduced is just permutating the order. For example, in Fig. 6, Stmt1 and Stmt2 are swapped in 𝐿. By three-way merge principles, since only 𝐿 introduces a swap-change, the target 𝑇 should accept this swap-change. Definition 3 (Order-preserving). We say an ordered list 𝑇 is orderpreserving w.r.t. (𝐵, 𝐿, 𝑅) if 𝑇 is a permutation of 𝑆 = Unordered(𝐵, 𝐿, 𝑅) such that 𝜋𝐵 , 𝜋𝐿 , and 𝜋𝑅 are monotone. A partial function 𝑓 ∶ 𝑋 ⇀ 𝑌 is said monotone if for every 𝑥1 , 𝑥2 ∈ 𝑋 such that 𝑓 (𝑥1 ) and 𝑓 (𝑥2 ) are both defined, 𝑥1 ⊲𝑋 𝑥2 entails 𝑓 (𝑥1 ) ⊲𝑌 𝑓 (𝑥2 ). In the above, we use the monotonicity condition to formalize our requirement of ‘‘preserving the original occurrence order’’. Algorithm. The main goal of the algorithm (4) is to solve an orderpreserving list and to decide the uniqueness of such lists. We compute an order-preserving list via constraint-solving—the constraints encode the monotonicity condition for the target list 𝑇 by Definition 3. Technically each constraint has the form 𝑒1 ⊲𝑒2 , meaning ‘‘𝑒1 precedes 𝑒2 in 𝑇 ’’. We propose an algorithm GenConstraints (Algorithm 5) to produce them by traversing the elements of 𝐵, 𝐿, and 𝑅 in their occurrence order. We explain its correctness by the following lemma: Order-preserving. Before presenting the merge algorithm, we first need a formal interpretation of ‘‘preserving the original occurrence order’’. Since the occurrence order is a partial order relation, it is natural to regard the three ordered lists in the merge scenario (𝐵, 𝐿, 𝑅) as three ordered sets ⟨𝐵, ⊲𝐵 ⟩, ⟨𝐿, ⊲𝐿 ⟩ and ⟨𝑅, ⊲𝑅 ⟩. The occurrence order 6 Journal of Systems Architecture 145 (2023) 103011 F. Zhu et al. Now let us move back to Algorithm 4. Let 𝛷 be the set of constraints returned by Algorithm 5 (line 2). We represent the constraints as a directed graph (line 3) 𝐺𝛷 = ⟨𝑉 , 𝐸⟩, where: (1) the set of vertices are the elements of Unordered(𝐵, 𝐿, 𝑅), i.e., 𝑉 = 𝑆, and (2) for each constraint (𝑒1 ⊲ 𝑒2 ) ∈ 𝛷, let (𝑒1 , 𝑒2 ) ∈ 𝐸 be an edge of 𝐺𝛷 . It is well-known from graph theory that: there is a one-one correspondence between a topological sort of 𝐺𝛷 and a satisfying solution of 𝛷, which further implies that: there is a one-one correspondence between an order-preserving list and a topological sort of 𝐺𝛷 . The first algorithm that comes to mind is perhaps a topology sort algorithm such as Kahn’s algorithm [19]. However, it is too strict to handle cases like Fig. 6. A more flexible way is to compute strongly connected components (SCCs) first, try to merge each SCC on its own, then try to find a unique topological sort of the directed acyclic graph formed by the SCCs. Algorithm 4 shows the details. In line 4, we facilitate Tarjan’s strongly connected components algorithm to compute SCCs. Contracting each SCC as a supernode, we get a component graph, which is directed acyclic. As a byproduct, Tarjan’s algorithm also produces a topological sort of component graph. We store the computed topological sort of SCCs as a one-based list in . Lines 5 – 7 check if the topological sort is unique, which is equivalent to that there is an edge between each pair of adjacent SCCs in the component graph. And, this is equivalent to check that whether there are two connected nodes respectively from the adjacent SCCs (line 6, highlighted). Then we try to merge each SCC and concatenate the merged results sequentially in 𝑇 (lines 8–15). We use 𝜋𝐵 |[𝑖] to denote the constraints generated for SCC [𝑖] from 𝐵. The notations for 𝐿 and 𝑅 are similar. Lines 10 – 11 consider a case that is too complicated to deal with: the order constraints generated from the three versions are pairwise distinct. We conversely report a conflict for this case. If it is not the case, we can merge the SCC along the order constraints of one version. Lines 12 – 14 choose which version to obey: (1) if the constraints of 𝐵 and 𝐿 are equal, which means only 𝑅 possibly introduces changes, we choose 𝑅 (line 12); (2) symmetrically, if the constraints of 𝐵 and 𝑅 are equal, which means only 𝐿 possibly introduces changes, we choose 𝐿 (line 13); (3) if 𝐿 and 𝑅 introduce the same changes, we adopt the changes (line 14). The concatenating operation takes place in line 15. When the concatenation finishes and no conflict has been found, line 16 returns the target list. The ordered merge algorithm is linear because both Tarjan’s algorithm and the generation of constraints are linear. Moreover, the other algorithms mentioned before are also linear, thus: Algorithm 5: Constraints generation 1 2 3 4 5 6 Function GenConstraints(𝐵: List, 𝐿: List, 𝑅: List ): foreach 𝑏 ∈ 𝐵 do if ∃𝑙 ∈ 𝐿, 𝑟 ∈ 𝑅 ∶ 𝑏 ≃ 𝑙 ∧ 𝑏 ≃ 𝑟 then 𝑡 ← IssueShifted(𝑏, 𝑙, 𝑟); 𝜋𝐵 ← 𝜋𝐵 ∪ {𝑏 ↦ 𝑡}, 𝜋𝐿 ← 𝜋𝐿 ∪ {𝑙 ↦ 𝑡}, 𝜋𝑅 ← 𝜋𝑅 ∪ {𝑟 ↦ 𝑡}; mark 𝑙, 𝑟 as ‘‘visited’’; else if ∃𝑙 ∈ 𝐿 ∶ 𝑏 ≃ 𝑙 then 𝑡 ← IssueShifted(𝑏, 𝑙, 𝑟); 𝜋𝐵 ← 𝜋𝐵 ∪ {𝑏 ↦ 𝑡}, 𝜋𝐿 ← 𝜋𝐿 ∪ {𝑙 ↦ 𝑡}; mark 𝑙 as ‘‘visited’’; 7 8 9 10 else if ∃𝑟 ∈ 𝑅 ∶ 𝑏 ≃ 𝑟 then 𝑡 ← IssueShifted(𝑏, 𝑙, 𝑟); 𝜋𝐵 ← 𝜋𝐵 ∪ {𝑏 ↦ 𝑡}, 𝜋𝑅 ← 𝜋𝑅 ∪ {𝑟 ↦ 𝑡}; mark 𝑟 as ‘‘visited’’; 11 12 13 14 foreach 𝑙 ∈ 𝐿, 𝑙 is not ‘‘visited’’ do 𝜋𝐿 ← 𝜋𝐿 ∪ {𝑙 ↦ 𝑙}; 15 16 foreach 𝑟 ∈ 𝑅, 𝑟 is not ‘‘visited’’ do 𝜋𝑅 ← 𝜋𝑅 ∪ {𝑟 ↦ 𝑟}; 17 18 22 𝛷 ← ∅; for 𝑋 ∈ {𝐵, 𝐿, 𝑅} do for 𝑖 ∈ 2 to |𝑋| do 𝛷 ← 𝛷 ∪ {𝜋𝑋 (𝑋[𝑖 − 1]) ⊲ 𝜋𝑋 (𝑋[𝑖])}; 23 return 𝛷; 19 20 21 Lemma 1. Given a merge scenario (𝐵, 𝐿, 𝑅) of three ordered lists. If 𝛷 is the constraint set returned by Algorithm 5, then the set of satisfying solutions to 𝛷 is equivalent to the set of order-preserving lists, i.e., every satisfying solution of 𝛷 is an order-preserving list and vice versa. Proof. Let 𝑇 be an order-preserving list. The monotonicity requirement is translated into the following logical formula: 𝛩 = {𝜋𝐵 (𝐵[𝑖]) ⊲𝑇 𝜋𝐵 (𝐵[𝑗]) ∣ 𝑖 < 𝑗, 𝜋𝐵 (𝐵[𝑖]) and 𝜋𝐵 (𝐵[𝑗]) are defined} ∪ {𝜋𝐿 (𝐿[𝑖]) ⊲𝑇 𝜋𝐿 (𝐿[𝑗]) ∣ 𝑖 < 𝑗, 𝜋𝐿 (𝐿[𝑖]) and 𝜋𝐿 (𝐿[𝑗]) are defined} ∪ {𝜋𝑅 (𝑅[𝑖]) ⊲𝑇 𝜋𝑅 (𝑅[𝑗]) ∣ 𝑖 < 𝑗, 𝜋𝑅 (𝑅[𝑖]) and 𝜋𝑅 (𝑅[𝑗]) are defined} Therefore, it suffices to show: ⋀ ⋀ 𝜃 ⟺ 𝜑. 𝜃∈𝛩 Theorem 1. The time complexity of the entire structured merge algorithm is linear (to the size of the input merge scenario). (1) 𝜑∈𝛷 On the one hand, by comparing lines 20–22 of the algorithm (i.e., the definition of 𝛷) to the definition of 𝛩, we observe that 𝛷 ⊆ 𝛩, which implies ⋀ ⋀ 𝜃 ⟹ 𝜑. (2) 𝜃∈𝛩 6. Implementation We implemented the proposed approach as a structured merge framework, Mastery, written in Java. This framework consists of four modules: 𝜑∈𝛷 1. a parser that translates input source files into ASTs; 2. a tree matcher that generates mappings between different program versions, using an adapted GumTree [9] algorithm; 3. a tree merger that computes the target AST, following the algorithms presented in Section 5; 4. a pretty printer that outputs the formatted code from the merged AST. On the other hand, 𝛩 is the transitive closure of 𝛷 over the relation ⊲𝑇 , we thus have ⋀ ⋀ 𝜑 ⟹ 𝜇, 𝜑∈𝛷 𝜇∈𝛩⧵𝛷 which implies ⋀ ⋀ 𝜑 ⟹ 𝜃. 𝜑∈𝛷 (3) Mastery currently supports merging Java programs. We rely on JavaParser,4 a mature open-source parsing library for the Java language, to build ASTs from source code and pretty print Java source 𝜃∈𝛩 By Eq. (2) and Eq. (3), we see Eq. (1) holds. □ We propose an algorithm GenConstraints to produce them by traversing the elements of 𝐵, 𝐿, and 𝑅 in their occurrence order, e.g., we generate the constraint ‘‘𝜋𝐵 (𝑏1 ) ⊲ 𝜋𝐵 (𝑏2 )’’ for 𝑏1 ⊲𝐵 𝑏2 . 4 7 https://javaparser.org/ Journal of Systems Architecture 145 (2023) 103011 F. Zhu et al. code from ASTs. To integrate with our merge algorithm, we convert the ASTs generated by JavaParser into our AST data structure (defined in Section 2). 7.2. Frequency of shifted code (RQ1) In this evaluation, we aim to understand how often shifted code presents in practice. To calculate the frequency of shifted code, we use the state-of-the-art AST differencing tool, GumTree [9], to compute matchings among 𝑏𝑎𝑠𝑒, 𝑙𝑒𝑓 𝑡, and 𝑟𝑖𝑔ℎ𝑡. Note that we do not need any merge tool for this evaluation. Then, on the computed matchings, we count the number of shifted mappings (i.e. shifted code) according to Definition 1. Fig. 7 presents how many merge scenarios in each studied project involve shifted code, meaning at least one shifted mapping is detected. Among the 40,533 merge scenarios, we find 15,620 merge scenarios involve shifted code—the frequency is 38.54%. In those merge scenarios, we detect 90,982 shifted mappings—on average 2.24 shifted mappings per merge scenario. 7. Evaluation To measure the usability and practicality of our approach in realworld scenes, we extract 40,533 merge scenarios from 78 Java opensource projects hosted on GitHub, and then conduct a series of experimental evaluations to answer the following research questions: RQ1: How often does shifted code occur in real-world merge scenarios? RQ2: What is the merge accuracy of Mastery when compared to state-of-the-art merge tools? RQ3: How many merge conflicts are reported by these tools? RQ4: What is their performance from the perspective of runtime? 7.3. Taxonomy of results To understand the behavioral performance of the merge tools, we classify a merged result (for each merge scenario of each tool) into one of the following four categories: 7.1. Experimental setup To select realistic and representative merge scenarios as our evaluation dataset, we seek the top 100 most popular open-source Java projects hosted on GitHub5 ; and then exclude any non-software-project (e.g., tutorials). On the remaining 78 projects, we extract merge scenarios via an analysis of their commit histories: • expected: the merged file and the expected version (the ground truth) are syntactically equivalent, i.e., their ASTs are isomorphic to each other (allowing permutations of elements in an unordered list node); • unexpected: the merged file is conflict-free and is nonequivalent to the expected version; • conflicting : there is at least one conflicting block in the merged file; • failed: either the tool crashes or the execution exceeds the time limit 300 s. 1. Using standard git commands, we extract all merging commits with its two parents, and their base commit s. The corresponding Java source files in the three version commits are collected as a merge scenario. In terms of three-way merge, the one in the base commit is the base version, and the source files in the two parent commits are the variants (i.e. left and right) version. The corresponding source file in the merge commit itself is marked as the expected version and will be used as the ground truth. 2. Usually, not all source files are changed from one version to another—only a few are affected. Furthermore, given the base, left and right versions of a source file, if any of the two are equivalent, by three-way merge principles, we immediately know the target version without performing the merge. This kind of merge scenarios is not worth evaluating, as any merge approach should produce the correct output. To better examine the differences between the merge tools, we instead remove this kind of merge scenarios, that is, we only collect the three versions of a source file that are pairwise distinct. To achieve this, we use git diff to distinguish whether two files are distinct. 3. Some source files cannot be parsed correctly, e.g. they include unresolved conflicts. We remove them as structured approaches assume the input files must have valid syntax (checked by JavaParser). Table 2 shows the distribution of the results of the five tools. The percentage of each kind of result is visualized in Fig. 8. JDime and IntelliMerge failed on a considerable number of scenarios, mainly caused by their implementation bugs. IntelliMerge represents each program version as a program element graph, and adopts a radical merging algorithm to construct the merged graph, which may violate the three-way merge principles when a deletion happens. Thus, it tends to produce more unexpected results. For GitMerge, only 1.95% results are unexpected. To understand this phenomenon, we have to notice that all projects use GitMerge as default. If GitMerge does not report any conflict, the merged codes will usually become the ground truth in our evaluation, without being reviewed by the developers. Thus, the expected version is a kind of biased ground truth in favor of GitMerge. This finding is consistent with a previous work [20]. 7.4. Merge accuracy (RQ2) In this evaluation, we study the merge accuracy, calculated as the percentage of expected results, of the five tools on our dataset. As shown in Table 2, Mastery achieves the highest accuracy of 82.90% among all tools. Compared to JDime, Mastery gains 3.46% higher accuracy. Among the 2,028 scenarios where Mastery’s results are expected whereas JDime’s are not, we find 49.65% involves shifted code—11.11% higher than the overall frequency. To illustrate the capacity of Mastery on handling shifted codes, consider the merge scenario depicted in Fig. 9 (extracted from our dataset, slightly simplified for readability): the method call expression invocation.getAttachments(...) is shifted into a type cast expression (cast to String type) in the right version, and the left version modifies the argument of the method call (the prefix Constants is deleted). Mastery produces the desired merge, that is, to collaborate the above two changes, whereas JDime reports a conflict. Recall that IntelliMerge adopts a radical merging strategy, which makes its accuracy only 24.12%. As discussed in Section 7.3, the ground Further, we evaluate Mastery by comparing with four state-of-theart merge tools: • JDime [2], a state-of-the-art structured merge tool, • JFSTMerge [6], an improved version of FSTMerge, a wellknown tree-based semistructured merge tool, • IntelliMerge [20], a refactoring-aware graph-based semistructured merge tool, and • GitMerge, the default merging algorithm in Git. All experiments were conducted on a workstation with AMD EPYC 7H12 64-Core CPU and 1TB memory, running Ubuntu 20.04.3 LTS. 5 According to the following list, until July 12, 2021: https://github.com/ EvanLi/Github-Ranking/blob/master/Top100/Java.md. 8 Journal of Systems Architecture 145 (2023) 103011 F. Zhu et al. Fig. 7. Distribution of shifted code in each project. The projects are sorted by their number of merge scenarios in ascending order. We split them into two subfigures according to if the number of merge scenarios is less than 500 (left) or not (right). Table 2 Distribution of the merged results. Tool Mastery JDime jFSTMerge IntelliMerge GitMerge Expected Unexpected Conflicting Failed Number Accuracy Number Percentage Number Percentage Number Percentage 33602 32200 30062 9777 30643 82.90% 79.44% 74.17% 24.12% 75.60% 3148 2406 3837 24553 791 7.77% 5.94% 9.47% 60.58% 1.95% 3782 4538 6626 3442 9099 9.33% 11.20% 16.35% 8.49% 22.45% 1 1389 8 2761 0 0.00% 3.43% 0.02% 6.81% 0.00% Fig. 8. Distribution of the merging results. truth is biased in favor of GitMerge, causing the accuracy of GitMerge to be even larger than jFSTMerge. The scenarios where GitMerge produces unexpected or conflicting results are of special interest to us—in these merge scenarios, the expected versions (i.e., the merged versions in Git histories) must have been reviewed by the developers. If we consider only these 9,890 scenarios, the accuracy of the five tools except GitMerge are: Mastery 33.35% JDime 31.17% jFSTMerge 17.26% IntelliMerge 6.98% Fig. 9. A merge scenario from commit 9f5cc83 of project dubbo, where an expression gets shifted. Mastery’s result is expected, while JDime’s is conflicting. Mastery still achieves the highest accuracy. 7.5. Reported conflicts (RQ3) In this evaluation, we measure the number of conflicts reported by the five tools on all merge scenarios. The numbers and percentages of conflicting results are listed in Table 2. We also count the number of conflicting blocks (i.e. conflicting hunks) reported by the five tools, which are depicted in Fig. 10. Especially, IntelliMerge’s radical strategy ignores three-way merge principles, making it achieve the lowest in both metrics. The other four tools all follow the three-way merge principles. Among them, Mastery reports the fewest 7,057 conflicting blocks and the fewest 3,782 conflicting scenarios. Among the 1,556 Fig. 10. The numbers of conflicting blocks of the five tools. 9 Journal of Systems Architecture 145 (2023) 103011 F. Zhu et al. Fig. 11. Time cost of merging. Fig. 12. A non-monotone matching between 𝑇1 and 𝑇2 (dashed lines show mappings). scenarios where JDime’s results are conflicting whereas Mastery’s are not, we find 52.76% involves shifted code—14.23% higher than the overall frequency. 3. As discussed in Section 7.3, because GitMerge is the default merging tool that developers use, if GitMerge reports no conflicts, its merging results will usually become the ground truth without a careful review by developers, even if the merging results are actually wrong. 4. By empirical inspection, we found developers introduce additional changes in some merge scenarios, rather than only collaborating on the changes from 𝑙𝑒𝑓 𝑡 and 𝑟𝑖𝑔ℎ𝑡. 7.6. Runtime performance (RQ4) In this evaluation, we evaluate the performance from the perspective of runtime. As unstructured GitMerge is inherently particularly efficient, we only compare the runtime performance among semistructured and structured tools. Note that this comparison is fair as all these four tools are written in Java; they are executed on JVM under the same environment. Ignoring the failed runs, Fig. 11 shows the runtime on merge scenarios sorted by the size (i.e., total file size in unit of byte) of merge scenarios in ascending order. Since the runtime of each tool has considerable ups and downs even on the merge scenarios of a similar size, for clearer illustration, we plot each point as the average of adjacent 100 merge scenarios. The average times for the four tools are: Mastery 10.16 s JDime 25.11 s jFSTMerge 13.73 s The ideal ground truth is to only merge changes in 𝑙𝑒𝑓 𝑡 and 𝑟𝑖𝑔ℎ𝑡 correctly without introducing additional changes, and conflict blocks get reported if the changes are indeed semantically contradicted. Unfortunately, manual efforts seem inevitable approaching this ideality. Limitations. Among the 1,109 merge scenarios where Mastery produces unexpected results while JDime produces expected results, we manually studied 10 random samples. We found that: In 4 merge scenarios, the expected versions introduce additional changes by developers or break three-way merge principles in other ways. Mastery produces the desired merge results w.r.t. three-way merge principles. The other 6 scenarios failed due to our limited support for two-way merging, where a merge scenario consists of only the two variants but not the base version. JDime realizes some heuristic two-way merging strategies, which can handle these merge scenarios better. These strategies can be realized in Mastery in the future. IntelliMerge 4.30 s Mastery is about 2.5 x as fast as JDime, and about 1.4 x as fast as jFSTMerge, which shows Mastery, as a structured merging tool, has competitive efficiency to semistructured merging tools. 7.7. Discussions Another limitation: Non-monotone matching. Generally but practically rarely, there could be non-monotone matching that obeys the original intuition of what matching is: to align nodes that should be the same one in the developing. For example, a member could be moved from one method to another method in development, which means the ancestor-descendant relationship between the statement and the methods is changed. However, even without such correct matching, it is also possible to correctly merge, which we will show by the example of Fig. 12. Fig. 12 presents a non-monotone matching that can be produced by GumTree. The dashed lines plot the established mappings between 𝑇1 and 𝑇2 . We see two class definitions ClassA and ClassB are matched. However, not all members of ClassA are mapped to members of ClassB – M2 is mapped to M5, which is a member of ClassC. By definition, we see the matching is not monotone. Such a matching brings troubles to AST amalgamation: in which class should the merge result of integrating M2 and M5 be on the target AST? It could be a member of ClassA (ClassB), or ClassC, and without further information, we cannot tell which is expected. Suppose we let 𝑇1 be 𝑏𝑎𝑠𝑒 and 𝑇2 be a variant, one may interpret the mapping (𝙼𝟸, 𝙼𝟻) as M2 is removed from ClassA and then inserted into ClassC. In that situation, M5 Threats to validity. We use expected versions, i.e. source files in merge commits, as ground truth. However, there are three threats to this ground truth: 1. There is no positive ground truth in merge commits since developers do not manually produce conflicts, but always try to resolve every conflict. But the resolving of conflicting changes in left and right versions must rely on additional information, which is usually available to a merging tool. Because of the lacking of positive ground truth, we cannot automatically recognize true posi

On-the-methodology-of-three-way-structured-merge 2023.pdf

Document Details

Tags

Related

Full Transcript

Upgrade to continue