A Refinement Theory for Alloy


Universidade Federal de Pernambuco
Centro de Informática
Graduate Program in Computer Science

A Refinement Theory for Alloy
Rohit Gheyi
Doctoral Thesis

Supervisor: Prof. Paulo Borba
Recife, 2007

UNIVERSIDADE FEDERAL DE PERNAMBUCO
CENTRO DE INFORMÁTICA

Rohit Gheyi

A Refinement Theory for Alloy

This work was presented to the Graduate Program in Computer Science of the Centro de Informática, Universidade Federal de Pernambuco, as a requirement for the approval of the doctoral thesis.

SUPERVISOR: Prof. Paulo Henrique Monteiro Borba


I dedicate this work to my family.

Acknowledgements

After 100 thousand kilometers traveling between Campina Grande and Recife during my master's and doctorate, I have reached one more goal in my life. Without the help of some people, I certainly would not have made it this far. My thanks go:
• to God, for always being present in my life;
• to my family, who were of fundamental importance in my getting this far, for their help and for always providing all the conditions I needed to finish the doctorate;
• to Professor Paulo Borba, for his friendship, for what I learned from him, for his concern with my education, and for all the teachings that I will carry for the rest of my life;
• to Tiago Massoni, for his companionship at all times, and for the suggestions that helped enrich this work;
• to Professors Augusto Sampaio and Alexandre Mota, for the discussions that helped not only this work but also my education;
• to Professor Daniel Jackson, for his teachings and for the opportunity to spend a few months at the Massachusetts Institute of Technology;
• to Professor Edilson Ferneda, for his friendship and motivation during my undergraduate research at UFCG;
• to the members of the examination committee (Alexandre Mota, Ana C. V. Melo, Anamaria Moreira, Augusto Sampaio, and David Naumann), for the suggestions that improved this work;
• to my friends Adalberto Cajueiro, Alan Kelon, Alberto Neto, Ayla Dantas, Cleiton Silva, Luís Silvera, Márcio Cornélio, Marcos Dósea, Rodrigo Ramos, Sérgio Soares, and Thiago Santos;
• to the members of the Software Productivity Group (CIn/UFPE), the Software Reliability Group (CIn/UFPE), and the Software Design Group (SDG/MIT);
• to CNPq, for funding my research since my undergraduate research.

Resumo

Refactorings are generally proposed in an ad hoc manner, because it is hard to prove formally that they preserve behavior. In practice, developers, even when using refactoring tools, have to rely on compilation and tests to ensure that refactorings are correct. This scenario is undesirable, especially in the development of critical systems. In the case of object model refactoring, most transformations are based on informal argumentation. Another problem is that the equivalence notions for object models are too concrete, in the sense that they assume that the models must have operations, or the same names and structures. This is not adequate in several situations: during model refactoring, when we use auxiliary model elements, or when the compared models have distinct but related elements. In this work, our goal is to propose a set of semantics-preserving transformations for Alloy, a formal object-oriented modeling language. We specify in PVS a set of well-formedness rules and extend the semantics of Alloy, and we show in the PVS theorem prover that the proposed transformations are sound. We also show that this set of transformations is relatively complete, in the sense that, with it, we can derive a representative set of transformations. In addition, we propose a more abstract and flexible refinement notion for object models, on which our set of transformations is based. This notion was specified in PVS, where we proved some of its properties. Besides proving that it is compositional, we relate it to the data refinement notion for Z. These transformations are useful not only for formally deriving refactorings, but also for optimizations. Furthermore, we show that the transformations can be used to derive refactorings that formally introduce design patterns in Alloy.

Keywords: refactorings, refinements, object models, theorem proving.

Abstract

Refactorings are usually proposed in an ad hoc way because it is hard to guarantee their soundness with respect to a formal semantics. In practice, even developers using refactoring tools must rely on compilation and tests to guarantee semantics preservation, which may not be satisfactory for critical software development. In the case of object model refactorings, most proposed transformations rely on informal argumentation. An additional problem is that equivalence notions for object models are usually too concrete, in the sense that they assume that the compared models have operations or are formed of elements with the same names or structures. This is not adequate in several situations: during model refactoring, when using auxiliary model elements, or when the compared models comprise distinct but corresponding elements. In this work, we propose a set of semantics-preserving structural model transformations for Alloy, which is a formal object-oriented modeling language. We specify well-formedness rules in PVS, extend the semantics proposed for Alloy, and show in the PVS theorem prover that these transformations are sound. This set of transformations is proved relatively complete, in the sense that, from it, we can derive a representative set of model transformations. Moreover, we propose an abstract and flexible refinement notion for object models, on which our semantics-preserving transformations are based. We encoded this notion in PVS and proved some of its properties. Additionally, we show that it is compositional and relate it to the data refinement notion for Z. Such semantics-preserving transformations can be applied to derive object model refactorings and to optimize analysis. Finally, we show how our transformations can be used to derive refactorings that formally introduce design patterns in Alloy.

Keywords: refactorings, refinements, object models, theorem proving.

Contents

1 Introduction
  1.1 Problem
  1.2 Motivating Examples
  1.3 Solution
  1.4 Summary of Contributions
  1.5 Organization

2 State of the Art on Refactoring
  2.1 Software Maintenance
  2.2 Program Refactoring
    2.2.1 Definition
    2.2.2 Goal
    2.2.3 Example
    2.2.4 Activities
    2.2.5 Behavior Preservation
    2.2.6 Program Refactoring Overview
  2.3 Model Refactoring
    2.3.1 Definition
    2.3.2 Motivating Examples
    2.3.3 Object Model Refactoring Overview

3 Alloy and PVS Overview
  3.1 Alloy 4
    3.1.1 Graphical Notation
    3.1.2 Signatures
    3.1.3 Facts
    3.1.4 Functions and Predicates
    3.1.5 Analysis
    3.1.6 Alloy 4 Core Language
    3.1.7 Differences between Alloy 2, 3 and 4
  3.2 PVS
    3.2.1 Types
    3.2.2 Formula
    3.2.3 Theory
    3.2.4 Abstract Datatypes
    3.2.5 Prover

4 Alloy Semantics
  4.1 Our Subset of Alloy Language
  4.2 Abstract Syntax
    4.2.1 Signatures and Relations
    4.2.2 Expressions and Formulae
  4.3 Well-formedness Rules
    4.3.1 Type System
  4.4 Semantics
    4.4.1 Explicit Constraints
    4.4.2 Implicit Constraints
    4.4.3 Well-formed Instances

5 A Refinement Notion
  5.1 Example
  5.2 Formalization
    5.2.1 Equivalence Notion
    5.2.2 Valid View
    5.2.3 Refinement Notion
  5.3 Properties
    5.3.1 Basic Properties of the Refinement Notion
    5.3.2 Basic Properties of the Equivalence Notion
    5.3.3 Decreasing a View
    5.3.4 Increasing a View
    5.3.5 Decreasing an Alphabet
    5.3.6 Empty Alphabet
    5.3.7 Adding Formulae
  5.4 Compositionality
    5.4.1 Adding Signatures and Relations
    5.4.2 Adding Formulae
  5.5 Relationship with Data Refinement
    5.5.1 Data Refinement Overview
    5.5.2 Relationship
  5.6 Unsoundness
    5.6.1 The Z Backward Simulation Rule
    5.6.2 Applying Data Refinement in Alloy

6 Modeling Laws
  6.1 Primitive Laws for Alloy
    6.1.1 Laws for Signatures
    6.1.2 Laws for Relations
    6.1.3 Laws for Formulae
    6.1.4 Other Laws
  6.2 Soundness
    6.2.1 Theorems
    6.2.2 Laws
  6.3 Completeness
    6.3.1 Removing Syntactic Sugar Constructs
    6.3.2 Removing Top-level Signatures
    6.3.3 Replacing Formulae
    6.3.4 Generalization
  6.4 Case Studies
    6.4.1 Hotel Room Locking
    6.4.2 Java Types
  6.5 Applications
    6.5.1 Object Model Refactorings
    6.5.2 Model-driven Program Refactorings
    6.5.3 Compilation Process
    6.5.4 Optimizations
    6.5.5 Other Applications

7 Conclusions
  7.1 Related Work
    7.1.1 Alloy's Semantics
    7.1.2 Refinement Notion
    7.1.3 Model Refactoring
  7.2 Future Work

Appendices

A Alloy Semantics Formalization
  A.1 Lists
  A.2 Syntax
  A.3 Well-Formedness Rules and Type System
  A.4 Semantics

B Refinement Notion Formalization

C Introduce Signature Law Proof
  C.1 Syntax and Conditions
  C.2 Well-formedness Rules and Type System Preservation
    C.2.1 Introduce an Empty Signature
    C.2.2 Remove an Empty Signature
  C.3 View's Validity Preservation
    C.3.1 Introduce an Empty Signature
    C.3.2 Remove an Empty Signature
  C.4 Semantics Preservation
    C.4.1 Introduce an Empty Signature
    C.4.2 Remove an Empty Signature

Bibliography

List of Figures

1.1 Push Down Relation
1.2 Converting a Generalization into a Relation

2.1 Introduce Generalization
2.2 Introduce Relation
2.3 Part of the MDA Approach [92]

3.1 Bank System Object Model

4.1 Banking System Application
4.2 Banking System Instance

5.1 Two Object Models Representing Part of a Banking System
5.2 Instances
5.3 View Application
5.4 Instance
5.5 Equivalence Notion of an Extended Banking System
5.6 Decreasing a View
5.7 Decreasing an Alphabet
5.8 Decreasing a View in a Refinement Chain
5.9 Compositionality
5.10 Syntactically Different Formulae
5.11 Adding Signatures and Relations
5.12 Adding Formulae
5.13 Forward Simulation
5.14 Relationship of Our Refinement Notion and the Z Notion
5.15 Light Bulb Traces

6.1 Initial Model of a Banking System
6.2 Final Model of a Banking System
6.3 Java Types Object Model
6.4 Refactored Java Types Object Model
6.5 Model-driven Refactoring
6.6 Components' Specification Matching

7.1 Extending a Class Diagram

List of Tables

3.1 Summary of Signature Qualifiers
3.2 Summary of Relation Qualifiers
3.3 Summary of Alloy Quantifiers
3.4 Summary of Alloy Operators
3.5 Summary of Expression Qualifiers

4.1 Well-Typed Formulae
4.2 Well-Typed Expressions
4.3 Expression Types

5.1 Relating Alloy and Z Formulae and Expressions

Chapter 1

Introduction

Evolution is an important and demanding software development activity, as the originally designed software structure usually does not accommodate adaptations, demanding new ways to reorganize software. Modern development practices, such as program refactoring [45], improve programs while maintaining their original behavior in order, for instance, to prepare software for change.

1.1 Problem

In the context of refactoring, little has been done to ease the task of developers when refactoring programs and consistently propagating the resulting changes to the associated models. One approach is to reverse engineer the refactored program in order to recover its structure. However, the resulting model is cluttered by details inherent to the implementation, which usually restrains abstraction. It is usually difficult to extract structural design intent (for example, model invariants) from code. Another approach is to refactor the model and regenerate the code. This is usually ineffective, due to the representation gap between model and program elements. For instance, a model may have elements, such as constraints, which are not directly mapped to elements in the program. Since program statements usually refer to abstractions that have been changed by the model refactoring, this may not be useful. As a consequence, most projects abandon design information in the form of models early in the life cycle.

In current practice, despite refactoring tool support, programmers still rely on successive compilation to ensure the absence of type errors and on test suite executions to check behavior preservation [45]. As Dijkstra observed, testing can show the presence of bugs, but never their absence. Moreover, modifying the structure of a program may imply updating unit tests, which is a time-consuming and error-prone task. Therefore, relying on a test suite is insufficient to guarantee behavior preservation.

An object model [99] describes the structural part of a system. An object model refactoring, which is a structural model transformation, improves design structure while preserving semantics. It defines a transformation from one structural model to another that is semantically equivalent. If object model refactorings are synchronized with similar program refactorings, they may be useful for maintaining the consistency of structural models when refactoring programs [104]. In order to do that, we first need a set of sound object model refactorings. However, most structural model refactorings rely on informal argumentation. A soundness proof with respect to a formal semantics may require expertise, since each transformation may affect, for instance, a number of well-formedness and type rules. Thus, each refactoring may have a number of conditions that guarantee typing and behavior preservation, and the refactoring can only be applied if these conditions are satisfied. However, identifying all the conditions required for a transformation to be semantics-preserving is subtle. Even a number of object model transformations proposed in the literature may lead to models with type errors or subtle semantic changes in some situations, as we show in Section 1.2. Furthermore, we have to be careful to avoid introducing inconsistencies. Moreover, to our knowledge, there is no comprehensive set of object model refactorings to help designers improve their structural models.

An additional problem is that current refinement notions for object models, which define when one model is better than another, are usually too concrete. For example, some of them require the object models to have operations. However, since object models are usually abstract, they do not always have operations. As another example, a different notion assumes that the compared models are formed of elements with the same names and/or the same structure. This is not adequate in several situations. For example, object model refactoring changes the structure of models while maintaining the original semantics. Nevertheless, it is difficult to verify whether the resulting model preserves semantics, especially when model elements are replaced by alternative structures. Furthermore, auxiliary model elements may be used, which should be ignored when calculating equivalences.
Also, when the compared models comprise distinct but corresponding elements, we can find models that are intuitively equivalent but cannot be proved so using the usual equivalence notion.

Problem. In summary, there is no comprehensive, provably sound set of object model refactorings to help designers. Additionally, we need a more flexible and abstract refinement notion for object models.

1.2 Motivating Examples

In this section, we describe how even simple object model refactorings proposed in the literature may lead to undesired results, such as ill-typed or semantically distinct models. These examples show that, when proposing object model refactorings, we have to prove not only the preservation of semantics, but also the absence of type errors. So, each refactoring must transform a well-formed model into another well-formed model that is semantically equivalent.

In the context of a banking system application, Figure 1.1 presents two object models and shows a transformation that pushes down a relation. The transformation, which is proposed elsewhere [43, 14], can always be applied (there is no condition for applying this refactoring). When considering object models with constraints, however, this transformation may introduce type errors, as explained next.

Each box [99] in Figure 1.1, such as Account, represents a set of objects. The arrows, such as the card arrow from Account to Card, are relations and indicate how objects of a set are related to objects in other sets. An arrow with a closed head, such as the one from ChAcc to Account, denotes the subtype relationship. The left-hand side (LHS) diagram states that accounts may be checking or saving. Each account may have any number of bank cards, since card declares no multiplicity and therefore has the default, unconstrained multiplicity. Moreover, there is an invariant (no SavAcc.card) stating that saving accounts do not have a bank card. The join operator (.), in this case, denotes the standard relational composition. The no keyword, when applied to an expression, states that this expression has no elements. The name ps is a variable that stands for the rest of the model.

[Figure 1.1: Push Down Relation]

From the invariant and the fact that accounts can only be checking or saving, we conclude that only checking accounts can have bank cards. So, we may apply the Push Down Field refactoring [45] and push down the card relation to ChAcc, yielding the right-hand side (RHS) diagram. The refactorings in [45] are proposed for programs, but similar ones can be stated for object models. A deeper analysis shows that this transformation, although it preserves semantics, introduces a type error in the refactored diagram under the type system proposed for object models [38]. Since card is now declared in ChAcc, we cannot join SavAcc and card. So, the RHS diagram in Figure 1.1 becomes ill-typed. Similarly, the Pull Up Field refactoring [45] cannot always be applied: we have to make sure, for instance, that pulling up a relation does not introduce name conflicts. These kinds of errors can be easily detected by a type checker.

Another transformation [63] allows us to always convert a generalization into an injective function, as exemplified in Figure 1.2. This transformation corresponds to the Replace Inheritance with Delegation program refactoring [45]. For instance, the LHS diagram states that a saving account is a type of account.
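The type error in the Push Down Relation example of Figure 1.1 can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: it is not the type system of [38] nor the PVS encoding used in this thesis. The names `PARENT`, `is_subtype` and `join_is_well_typed` are hypothetical, and the rule it applies (a join `sig.rel` is well-typed only if `sig` and the signature declaring `rel` can share elements) is a simplification chosen for this example.

```python
# Toy signature hierarchy from Figure 1.1: child -> parent
# (None marks a top-level signature). Hypothetical encoding for illustration.
PARENT = {"Account": None, "ChAcc": "Account", "SavAcc": "Account", "Card": None}


def is_subtype(sub, sup):
    """Return True if `sub` equals `sup` or is a descendant of it."""
    while sub is not None:
        if sub == sup:
            return True
        sub = PARENT[sub]
    return False


def join_is_well_typed(sig, relation_domain):
    """Assumed, simplified rule: `sig.rel` is well-typed only if `sig` can
    overlap the signature in which `rel` is declared, that is, one of the
    two is a subtype of the other."""
    return is_subtype(sig, relation_domain) or is_subtype(relation_domain, sig)


# The invariant `no SavAcc.card`, checked before and after pushing `card` down.
before = join_is_well_typed("SavAcc", "Account")  # card declared in Account
after = join_is_well_typed("SavAcc", "ChAcc")     # card declared in ChAcc
print(before, after)  # prints: True False
```

Under these assumptions the invariant is accepted while `card` is declared in Account and rejected once `card` is declared in ChAcc, because SavAcc and ChAcc are sibling signatures that cannot share elements.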
Applying the generalization-to-relation transformation results in a diagram in which each saving account is related to exactly one account by the acc relation. The generalization is converted into acc and multiplicity constraints stating that acc is an injection. Suppose that the LHS model has a constraint stating that all accounts have a special credit. After this transformation, the constraint no longer gives saving accounts the special credit, since they are no longer accounts. So, the proposed transformation is not semantics-preserving. Moreover, since SavAcc is not a subtype of Account after the transformation, we have to make sure that the transformation does not introduce type errors.

[Figure 1.2: Converting a Generalization into a Relation]

Applying the transformation depicted in Figure 1.2 from right to left corresponds to the Replace Delegation with Inheritance program refactoring [45]. This model refactoring cannot always be applied. For instance, if there is an explicit global invariant stating that SavAcc has more elements than Account, applying the transformation yields an inconsistent model, since the subclass would have more elements than its superclass.

The previous transformations are proposed for UML class diagrams [19], and the errors shown above for object models also occur in UML. The examples presented in this section suggest the importance of formally specifying each object model refactoring and proving that it preserves well-typedness and semantics.

1.3 Solution

In this work, we propose a comprehensive set of semantics-preserving transformations for Alloy, which is a formal object-oriented modeling language [83] discussed in Section 3.1. Moreover, we show that this set of transformations is relatively complete, in the sense that it is sufficient to reduce an arbitrary Alloy model to a normal form. Thus, we can derive a comprehensive set of provably correct model transformations by composing more basic transformations. We follow an approach similar to the one used for imperative and object-oriented languages [75, 20]. We encode a type system, and specify well-formedness rules and an extended semantics for Alloy, in the Prototype Verification System (PVS) [118], which encompasses a formal specification language and a theorem prover. Our transformations are proven sound with respect to this formal semantics. In Section 6.2, we justify some of the proposed transformations.

In order to compare object models, we propose an abstract and flexible refinement notion, and encode it in PVS. Our transformations relate models that are equivalent (bidirectional refinement) according to this notion. Moreover, we prove a number of properties of this notion in PVS, such as compositionality. Additionally, we relate it to Z data refinement [149] in order to evaluate our notion. When relating both notions, we found that the backward simulation rule for Z is unsound (Section 5.6.1).
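Why soundness has to be established against a semantics, and not only against a type checker, can be illustrated with the generalization example of Figure 1.2 from Section 1.2. The sketch below is a minimal illustration under stated assumptions, not the semantics formalized in this thesis: instances are modeled as plain Python sets, `credited` is a hypothetical set standing for the objects that have the special credit, and `lhs_valid` and `rhs_valid` are hypothetical helper names.

```python
def lhs_valid(account, sav_acc, credited):
    """LHS of Figure 1.2: SavAcc is a subtype (subset) of Account, and
    every account has the special credit."""
    return sav_acc <= account and account <= credited


def rhs_valid(account, sav_acc, acc, credited):
    """RHS of Figure 1.2: SavAcc is no longer a subtype of Account; `acc`
    maps each saving account to exactly one account, injectively. The credit
    constraint is unchanged and still ranges over Account only."""
    total = set(acc) == sav_acc and set(acc.values()) <= account
    injective = len(set(acc.values())) == len(acc)
    return sav_acc.isdisjoint(account) and total and injective and account <= credited


# One RHS instance: saving account s1 delegates to account a1; only a1 is credited.
account, sav_acc, acc, credited = {"a1"}, {"s1"}, {"s1": "a1"}, {"a1"}

rhs_ok = rhs_valid(account, sav_acc, acc, credited)
uncredited_savings = sav_acc - credited
# The matching LHS reading treats s1 as an account, so the constraint rejects it.
lhs_ok = lhs_valid(account | sav_acc, sav_acc, credited)
print(rhs_ok, uncredited_savings, lhs_ok)  # prints: True {'s1'} False
```

In this toy encoding the transformed model accepts an instance in which a saving account has no special credit, while the original model rejects the corresponding instance. This is the kind of semantic change that a formal soundness proof is meant to rule out.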
We propose transformations for Alloy, rather than for UML [19] and the Object Constraint Language (OCL) [91], due to Alloy's simpler semantics. In spite of this, Alloy is expressive enough to model a broad variety of applications [81]. Moreover, we were able to specify some of our transformations in Alloy and check them using the Alloy Analyzer [55, 82]. In our prior work [103], we show how our transformations for Alloy can be applied to UML class diagrams annotated with OCL.

We prefer to propose primitive (core) transformations because it is much easier to prove them sound. Furthermore, since they are primitive, we can propose a relatively complete set of primitive transformations that is concise. Although they are primitive, we can compose them to derive interesting coarse-grained transformations. For instance, we can compose them to formally introduce Alloy design patterns (idioms). For example, an Alloy model that describes system behavior can represent state using two idioms [81]. By using our catalog, we can derive refactorings that allow us to formally change between those idioms. As another example, we formalize the compilation process performed by the Alloy Analyzer [82] by composing our transformations. Furthermore, besides clarifying Alloy's semantics and serving as a tool for reasoning about Alloy models, our model transformations can be used to improve the analysis performance of Alloy Analyzer 3.

Massoni et al. [104] synchronize our transformations for Alloy with similar behavior-preserving program transformations for sequential Java. This may be useful for maintaining consistency when refactoring programs. They show an example of how the Push Down Relation object model refactoring can be synchronized with a sequence of equivalent program refactorings, ensuring semantics preservation. If the invariants of the structural model are not available, the Push Down Field program refactoring cannot be performed in that example [104]. However, by synchronizing model and program refactorings, the program can be refactored by using the model invariants, so that the structural model and the program are consistently transformed.

Even popular program refactoring tools, such as Eclipse [37], may introduce some simple errors, such as making a program ill-typed or behaviorally different, as we show in Section 2.2.5. In the case of model refactoring, this scenario is even worse, since few model transformations have been proposed in the literature, most of them in an ad hoc way. Consequently, following our approach for proposing refactorings can help improve tool support, adding reliability to software refactoring.
Furthermore, our transformations can be useful in Model Driven Architecture (MDA) [92], in particular in refactorings between Platform Independent Models (PIM), which are platform-independent abstract models. More details about those applications are discussed in Section 6.5.

1.4. Summary of Contributions

The contributions of this work are summarized as follows:
• Propose an abstract refinement notion for object models [59], in Chapter 5. Moreover, we prove a number of properties [59] of our refinement notion, such as compositionality, and relate it to the data refinement notion for Z;
• Propose a comprehensive set of structural semantics-preserving model transformations for Alloy [57], as shown in Chapter 6;
• Show that our set of transformations is sound [58, 61] and relatively complete, as described in Sections 6.2 and 6.3, respectively;
• Encode a type system and specify well-formedness rules for Alloy [61], and specify in PVS an extended semantics [58] for Alloy, as shown in Chapter 4;
• Apply our transformations to a case study [56] in Section 6.4 and show a number of applications [102, 57, 103], such as deriving refactorings in order to formally introduce Alloy idioms in Section 6.5;
• Show that the backward simulation rule for Z is unsound (Section 5.6.1).

1.5. Organization

In the following chapter, we survey the state of the art on object model and program refactoring. We provide some background on Alloy and PVS in Chapter 3. In Chapter 4, we encode in PVS a type system for Alloy, and specify an extended semantics for Alloy. Chapter 5 presents a refinement notion for object models and its properties, and relates it to the data refinement notion for Z. Our transformations are presented in Chapter 6, where we show that they are sound and relatively complete. Additionally, we apply them to a case study and show some applications. Chapter 7 summarizes the contributions of the thesis and discusses related and future work. In Appendices A and B, we present the Alloy semantics and our refinement notion specifications in PVS, respectively. Finally, Appendix C proves that two transformations (introducing and removing a signature) are semantics-preserving.

Chapter 2
State of the Art on Refactoring

The IEEE Computer Society defines software engineering as [123]: “(1) The application of a systematic, disciplined, quantifiable approach to the development, operation, and maintenance of software; that is, the application of engineering to software. (2) The study of approaches as in (1).” Software development efforts result in the delivery of a software product which satisfies user requirements. Existing large software is never complete and continues to evolve. As it evolves, it grows more complex unless some action is taken to reduce this complexity. Usually, as it is adapted and evolved, its original design starts to decay, thereby lowering the quality of the software. Because of this, the major part of the total software development cost is devoted to software maintenance.

2.1. Software Maintenance

Software maintenance [69] is defined in the IEEE Standard for Software Maintenance [2] (IEEE 1219) as: “the modification of a software product after delivery to correct faults, to improve performance or other attributes, or to adapt the product to a modified environment.”

Goal

Software maintenance must be performed in order to [1]:
• Correct faults;
• Improve the design;
• Implement enhancements;
• Interface with other systems;
• Adapt programs so that different hardware, software, system features, and telecommunications facilities can be used;
• Migrate legacy software;
• Retire software.

Activities

Pfleeger [121] states that the maintainer’s activities comprise four key characteristics:
• Maintaining control over the software’s day-to-day functions;
• Maintaining control over software modification;
• Perfecting existing functions;
• Preventing software performance from degrading to unacceptable levels.

Categories

According to the standard for software engineering and maintenance [3], software maintenance can be divided into four categories: corrective, adaptive, perfective and preventive.
1. Corrective maintenance. Reactive modification of a software product performed to correct discovered problems and fix bugs. This activity allows correct implementation of the user needs.
2. Adaptive maintenance. Modification of a software product performed to keep a software product usable in a changed or changing environment. Reality shows that changing requirements demand adaptive evolution during most of the software life cycle.
3. Perfective maintenance. Modification of a software product to improve performance or maintainability. In the latter, modifications are made to improve understanding of the system in order to apply corrective and adaptive evolution. Software may be enhanced to better address the original requirements.
4. Preventive maintenance. Modification of a software product to detect and correct latent faults in the software product before they become effective faults.

Software evolution can be similarly categorized [7].

2.2. Program Refactoring

The term refactoring [45, 87, 147] was coined by Opdyke in his PhD thesis [117] in the early nineties. He proposes a number of low-level refactorings for C++ [40]. We can view refactoring as a kind of perfective maintenance.

2.2.1. Definition

The term refactoring has two definitions depending on the context used. When it is used as a noun, it has the following definition [45]. “It is a change made to the internal structure of software to make it easier to understand and cheaper to modify without changing its observable behavior.” When it is used as a verb (refactor), it has a different meaning [45]. “It is to restructure software by applying a series of refactorings without changing its observable behavior.” Its importance has been acknowledged by recent software development methods such as Extreme Programming [13], in which refactoring is a key practice.

2.2.2. Goal

Refactoring provides a technique for cleaning up code in an efficient and controlled manner. A developer can make many changes in software that make little or no change in the observable behavior. Only changes made to make the software easier to understand are refactorings. The idea is to clean the code by continuously removing duplication, and simplifying and clarifying code. So, among other things, we refactor a program for the following reasons [87, 45]:
• Make it easier to add new code;
• Improve the design of existing code;
• Gain a better understanding of code;
• Help to find bugs;
• Help to program faster.

A good contrast is performance optimization. Sometimes the term refactoring is misused. For instance, some approaches [142, 87] consider optimizations as refactorings. In fact, we can sometimes improve both the quality of the code and its performance. But we have to be careful, since most optimizations make the code less readable.

2.2.3. Example

In this section, we provide a refactoring example. Suppose a program implements part of a banking system application in Java [66]. It declares an account class and two subclasses: checking and savings. Moreover, each subclass has a balance and a method that yields it, as declared next. Additionally, each account has an identifier.
    public class Account {
      int id;
      ...
    }
    public class SavAccount extends Account {
      float balance;
      public float getBalance() { return balance; }
      public void debit(float amount) {
        balance = getBalance()-amount;
      }
      ...
    }
    public class ChAccount extends Account {
      float balance;
      public float getBalance() { return balance; }
      ...
    }

There are structures in the code that suggest the possibility of refactoring according to Beck and Fowler [45]. Inspecting the code, the program has code duplication, which is a code smell (bad smell). Long method and large class are other examples of bad smells. For each code smell, there is a list of refactorings that should be applied in order to remove it. Observe that both subclasses in the previous example have the same field (balance) and method (getBalance). If we see the same code structure in more than one place, it is better to find a way to unify them. In order to remove the field duplication, we can first apply the Pull Up Field refactoring [45] to pull up the balance field from each subclass to the parent class. Each refactoring presented by Fowler [45] has a mechanics stating how we can apply the refactoring. The mechanics for Pull Up Field is:
1. Inspect all uses of the candidate fields to ensure they are used in the same way;
2. If the fields do not have the same name, rename the fields so that they have the name you want to use for the superclass field;
3. Compile and test;
4. Create a new field in the superclass. If the fields are private, you will need to protect the superclass field so that the subclasses can refer to it;
5. Delete the subclass fields;
6. Compile and test.

Notice that we apply small steps followed by successive compilation and tests, in order to make sure that we preserve the static and dynamic semantics, respectively. Following the previous mechanics, the resulting code is declared next.

    public class Account {
      float balance;
      int id;
      ...
    }
    public class SavAccount extends Account {
      public float getBalance() { return balance; }
      public void debit(float amount) {
        balance = getBalance()-amount;
      }
      ...
    }
    public class ChAccount extends Account {
      public float getBalance() { return balance; }
      ...
    }

So far we have removed the field duplication. Now we can apply the Pull Up Method refactoring [45], following its mechanics, in order to pull up the method getBalance. We can pull up this method because both methods are equivalent (they have the same behavior). Therefore, the resulting program, which is declared in the following fragment, will have the same behavior.

    public class Account {
      float balance;
      int id;
      public float getBalance() { return balance; }
      ...
    }
    public class SavAccount extends Account {
      public void debit(float amount) {
        balance = getBalance()-amount;
      }
      ...
    }
    public class ChAccount extends Account {
      ...
    }
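The claim that the program before and after the two refactorings has the same behavior can be exercised, though not proved, with a small differential check. The following is a minimal sketch under stated assumptions: the class names with the suffixes Before and After, the helper methods runBefore, runAfter and sameBehavior, and the sample inputs are hypothetical and introduced only for illustration; the elided parts ("...") of the original classes are omitted.

```java
public class PullUpCheck {
    // Original hierarchy: balance and getBalance() are declared in the subclass.
    static class AccountBefore { int id; }
    static class SavAccountBefore extends AccountBefore {
        float balance;
        public float getBalance() { return balance; }
        public void debit(float amount) { balance = getBalance() - amount; }
    }

    // Hierarchy after Pull Up Field and Pull Up Method.
    static class AccountAfter {
        float balance;
        int id;
        public float getBalance() { return balance; }
    }
    static class SavAccountAfter extends AccountAfter {
        public void debit(float amount) { balance = getBalance() - amount; }
    }

    // Runs the same sequence of debits on the original savings account.
    static float runBefore(float initial, float[] debits) {
        SavAccountBefore acc = new SavAccountBefore();
        acc.balance = initial;
        for (float d : debits) acc.debit(d);
        return acc.getBalance();
    }

    // Runs the same sequence of debits on the refactored savings account.
    static float runAfter(float initial, float[] debits) {
        SavAccountAfter acc = new SavAccountAfter();
        acc.balance = initial;
        for (float d : debits) acc.debit(d);
        return acc.getBalance();
    }

    // Compares the observable result of both versions for one input.
    static boolean sameBehavior(float initial, float[] debits) {
        return Float.compare(runBefore(initial, debits), runAfter(initial, debits)) == 0;
    }

    public static void main(String[] args) {
        float[] debits = {10.0f, 2.5f, 0.0f};
        System.out.println(sameBehavior(100.0f, debits)); // prints true
    }
}
```

Such a check only samples inputs; it can reveal a difference between the two versions but cannot establish their equivalence.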

2.2.4. Activities

According to Mens and Tourwé [107], the refactoring process consists of the following activities:
1. Identify where the software should be refactored;
2. Determine which refactoring(s) should be applied to the identified places;
3. Guarantee that the applied refactoring preserves behavior;
4. Apply the refactoring;
5. Assess the effect of the refactoring on quality characteristics of the software (e.g., complexity, understandability, maintainability) or the process (e.g., productivity, cost, effort);
6. Maintain the consistency between the refactored program code and other software artifacts (such as documentation, design documents, requirements specifications, tests, etc.).

The first step in the refactoring process is to find the bad smells and suggest refactorings for them. Fowler [45] and Kerievsky [87] informally show a number of code smells and suggest some refactorings for them. Balazinska et al. [11] detect duplicated code by using a clone analysis tool that suggests candidates for refactoring. Ducasse et al. [36] propose an approach, based on an object-oriented meta model, to detect duplicated code in software and propose refactorings that can eliminate this duplication. Tourwé and Mens [146] use a semi-automated approach based on logic meta programming to formally specify and detect some bad smells and to propose refactoring opportunities that remove them. Emden and Moonen [41] have a visualization mechanism that allows us to detect bad smells. We can use some metrics on the original program in order to check whether there is some code smell. Similarly, we can verify after the refactoring whether the metrics of the resulting program have improved [140].

Observe that although the idea is to have small steps in the mechanics [45], applying them manually is error-prone, expensive and tedious. To avoid that, there are some tools that implement refactorings.
For example, Eclipse [37] and IntelliJ Idea [84] implement refactorings for Java, while Refactoring Browser [128] implements refactorings for Smalltalk [64]. The latter was the first refactoring tool. Roberts [127], in his PhD thesis, automates the basic refactorings proposed by Opdyke. We can derive coarse-grained refactorings by composing the basic ones.

2.2.5. Behavior Preservation

According to the refactoring definition presented in Section 2.2.1, two programs are equivalent if they have the same observable behavior. In Opdyke’s thesis [117], two C++ [40] programs have the same observable behavior if and only if seven properties are satisfied, as described next.
1. Unique superclass. Each class in the resulting program must have at most one superclass;
2. Distinct class names. All classes in the resulting program must have distinct names;
3. Distinct member names. Each class in the resulting program must have distinct variable and function names;
4. Inherited member variables not redefined. A member variable inherited from a superclass is not redefined in any of its subclasses;
5. Compatible signatures in member function redefinition. Redefinitions of methods have the same signatures as the redefined method;
6. Type-safe assignments. In the resulting program, every expression that is assigned to a variable must have the same type as the variable or a subtype of it;
7. Semantically equivalent references and operations. The resulting program must have the same output set as the original program for a given set of inputs.

Besides preserving the observable behavior, the idea of refactoring is to transform a well-formed program into another that does not have compilation errors. The first six properties, which can be statically checked, are related to the preservation of the static semantics. However, the last property, which assures behavior preservation, cannot be checked statically. This property holds when, running the main function of the initial and resulting programs with the same inputs, both of them yield the same outputs.

In order to make refactoring more useful in practice, Fowler proposes refactorings [45] for Java. He considers that two programs have the same observable behavior if the resulting program compiles and does not present failures in a test suite. The compilation is necessary in order to guarantee that the refactoring preserves the static semantics, and the test suite checks that the behavior is preserved. However, as is well known, a suite of tests is able only to reveal errors, not to prove their absence. Moreover, testing is a time-consuming task. Thus, we have to be careful when performing refactorings that rely on test suites to guarantee behavior preservation.
Problem

Each refactoring presents a number of enabling conditions for its application. For instance, suppose in the example presented in Section 2.2.3 that each subclass has the following method, which returns the identifier of an account.

    int getIdentifier() { return super.id; }

Since both classes have equivalent methods, we would like to apply the Pull Up Method refactoring again. Using Fowler’s mechanics, we observe that after applying some small steps the program does not compile. When we pull up this method to the Account class, the expression super.id refers to the parent class of Account (Object), which does not have the id field. So, we cannot always pull up methods that use super.
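The problem with super can be reproduced in a few lines. The following is a minimal sketch, assuming a simplified Account hierarchy; the method name getIdentifierViaSuper and the helpers viaSuper and viaThis are hypothetical and introduced only for illustration. The repair shown (replacing super.id by this.id) is one possible adjustment, valid under the assumption that no subclass declares its own field named id.

```java
public class SuperAccessSketch {
    static class Account {
        int id;
        // Pulling the subclass method up verbatim would give
        //     int getIdentifier() { return super.id; }
        // which does not compile here: super now denotes Object,
        // and Object declares no field named id.
        // One possible repair (an assumption, not Fowler's mechanics):
        int getIdentifier() { return this.id; }
    }

    static class SavAccount extends Account {
        // Original subclass method: super.id resolves to Account.id.
        int getIdentifierViaSuper() { return super.id; }
    }

    // Reads the identifier through the original subclass method.
    static int viaSuper(int id) {
        SavAccount acc = new SavAccount();
        acc.id = id;
        return acc.getIdentifierViaSuper();
    }

    // Reads the identifier through the repaired pulled-up method.
    static int viaThis(int id) {
        SavAccount acc = new SavAccount();
        acc.id = id;
        return acc.getIdentifier();
    }

    public static void main(String[] args) {
        System.out.println(viaSuper(7) == viaThis(7)); // prints true
    }
}
```

The enabling condition is therefore not only that the methods are equivalent, but also that every name they use still resolves to the same declaration in the target class.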

Most of the refactorings [45, 87, 147] proposed in the literature do not precisely state under which conditions we can apply a refactoring. In the previous example, it is easy to find out that the resulting program does not compile. However, errors related to the preservation of the dynamic semantics are more difficult to find in the absence of a good test suite. In Opdyke’s thesis, a refactoring preserves behavior if seven properties are satisfied. However, there is no guarantee that, by preserving these properties, the resulting program has the same observable behavior as the original one. In fact, Tokuda and Batory [144] showed, based on experiments, that Opdyke’s notion is not sufficient to guarantee that the behavior is preserved; hence they extended this notion with three other properties related to the preservation of the static semantics. However, as in Opdyke’s thesis, there is no guarantee that by preserving these invariants the program behavior will be preserved.

In fact, we have to know in which kind of application we are applying a refactoring. For example, if concurrency is important, additional conditions may be necessary. As another example, memory constraints and power consumption are important for embedded systems. We cannot always apply the Extract Subclass refactoring [45], because adding a new class may cause errors related to stack overflow. For real-time systems, we have to be sure that the resulting program satisfies some temporal constraints. Applying the Extract Method refactoring [45] may not preserve the observable behavior of real-time systems, because it may sometimes decrease the program’s performance. So, when refactoring programs, we have to precisely state the equivalence notion establishing when two programs have the same observable behavior. Without such a notion, it is difficult to formally prove that a program transformation preserves the observable behavior.
Motivating Examples

Since most refactorings are informally proposed in the literature, even popular program refactoring tools, such as Eclipse [37], may introduce some simple errors, such as transforming a well-typed program into an ill-typed one. Next we show two examples of refactorings implemented in Eclipse that do not preserve the static and dynamic semantics, respectively.

Static Semantics

A simple example describing part of a banking system in Java containing two kinds of accounts (savings and checking) is declared in the following fragment. Account declares a method getBalance. We would like to apply the Push Down Method refactoring [45] to push getBalance down to ChAccount.

    public class Account {
      float balance;
      public float getBalance() { return balance; }
      ...
    }
    public class SavAccount extends Account {
      void debit(float amount) {
        balance = getBalance()-amount;
      }
      ...
    }
    public class ChAccount extends Account {}

We choose in Eclipse that this refactoring should change the Account and ChAccount classes. After applying the Push Down Method refactoring, the resulting program is ill-typed, since the method debit uses the method getBalance, which is no longer available in SavAccount. In this particular case, the tool should at least warn the user, before applying the refactoring, that this transformation may introduce a compilation error.

Dynamic Semantics

Next we show a simple example describing part of a banking system in Java, in which each subclass declares a balance method. Observe that they implement different behaviors.

    public class Account {
      float balance;
      ...
    }
    public class SavAccount extends Account {
      public float balance() { return 1.2f*balance; }
      ...
    }
    public class ChAccount extends Account {
      public float balance() { return balance; }
      ...
    }

When we pull up the balance method from the SavAccount and ChAccount classes to the Account class by applying the Pull Up Method refactoring in Eclipse, the behavior of balance is changed. The current Eclipse implementation of this refactoring removes both balance methods from SavAccount and ChAccount and puts one of them in Account, hence not preserving the behavior, since the initial balance methods were different. The tool applies the refactoring and does not generate any warning to the user stating that the behavior is not preserved. In the previous example, the tool would have to check whether the methods in the subclasses have the same observable behavior, which may not even be feasible. In this case, checking only some conditions is a decision of the refactoring tool developers to make the implementation more efficient, hence more useful in practice. Cornélio [28] presents more examples of refactorings in Eclipse that do not preserve behavior. Ettinger and Verbaere [42] show other refactoring bugs in Eclipse, IntelliJ Idea and Visual Studio.
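The behavior change in the dynamic semantics example can be observed directly by running both versions on the same input. The following is a minimal sketch under stated assumptions: the text only says that the tool keeps one of the two balance methods, so the sketch assumes, for illustration, that the ChAccount version is the one kept; the class names with the suffixes Before and After and the helpers savBefore and savAfter are hypothetical.

```java
public class PullUpMethodBehavior {
    // Original hierarchy: the two subclasses implement balance() differently.
    static class AccountBefore { float balance; }
    static class SavAccountBefore extends AccountBefore {
        public float balance() { return 1.2f * balance; }
    }
    static class ChAccountBefore extends AccountBefore {
        public float balance() { return balance; }
    }

    // Assumed result of the unsafe Pull Up Method: only the ChAccount
    // version of balance() survives, now declared in the superclass.
    static class AccountAfter {
        float balance;
        public float balance() { return balance; }
    }
    static class SavAccountAfter extends AccountAfter {}
    static class ChAccountAfter extends AccountAfter {}

    // Observable result of balance() on a savings account before refactoring.
    static float savBefore(float stored) {
        SavAccountBefore acc = new SavAccountBefore();
        acc.balance = stored;
        return acc.balance();
    }

    // Observable result of balance() on a savings account after refactoring.
    static float savAfter(float stored) {
        SavAccountAfter acc = new SavAccountAfter();
        acc.balance = stored;
        return acc.balance();
    }

    public static void main(String[] args) {
        // The two versions disagree for any nonzero stored balance.
        System.out.println(savBefore(100.0f) == savAfter(100.0f)); // prints false
    }
}
```

A single input that distinguishes the two versions is enough to show that the transformation is not a refactoring, whereas no finite set of agreeing inputs would show that it is one.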

2.2.6. Program Refactoring Overview

In this section, we describe some related work on program refactorings. The idea of transforming a program into another with the same observable behavior is not new. There exist other approaches in non-object-oriented program restructuring that are very similar to refactorings. For instance, we can see refactoring as a special kind of refinement, a notion proposed by Hoare and Dijkstra in the early seventies [73, 33].

Data refinement was proposed by Tony Hoare in the early seventies [73]. Hoare suggested a technique for accomplishing the transition from programs operating on abstract data spaces to others operating on concrete spaces, and a method for proving its correctness. The idea is to formally verify whether a concrete data type represents an abstract data type, preserving the behavior of any program that uses the abstract type. A program is a sequence of operations, starting with an initialization, then a finite number of operations, then a finalization. As an example, Hoare showed how we can refine SIMULA 67 programs. Based on this notion, Jones proposed a refinement notion for VDM [86] and a proof method. In his notion, the relationship between the abstract and concrete data types must be functional. Moreover, there is a notion of adequacy (the concrete values must represent all abstract values) and of full abstraction (two distinct values of the abstract data type can be distinguished by some sequence of operations on the data). This notion was useful to prove many refinements in practice, but not all, because the relationship was always a function.

Those refinement notions were later revised in order to make them simpler and more general. Jifeng et al. [85] describe a theory of data refinement for relations. In order to prove refinement without having to reason over the space of all programs using the abstract and concrete data types, they presented two proof methods (rules): backward and forward simulations. These rules are also called upward and downward simulations, respectively. They expressed these rules in the relational calculus, with the constraint that all the relations are total, and proved that they were sound and jointly complete. Unlike VDM, there was no notion of adequacy or full abstraction. Moreover, the relation did not need to be functional.

Stepwise refinement [33] is a method for the systematic construction of sequential programs. An original high-level program specification is transformed, by a sequence of correctness-preserving refinements, into an efficient executable program. Stepwise refinement was formalized by Back in his PhD thesis [9] in a calculus of refinements, using weakest preconditions [34]. Morgan [111] puts a stronger emphasis on the calculation aspects of the refinement calculus.

A number of tools check refinements. We can check data refinement using theorem provers. For instance, Z-Eves [131] is a theorem prover that allows us to check data refinement in Z. As another example, Failures-Divergence Refinement (FDR) [130] is a model-checking tool for state machines, with foundations in the theory of concurrency based around Communicating Sequential Processes (CSP) [74]. FDR automatically checks refinements in CSP. As another example, Bolton [18] proposed an approach to automatically detect a Z retrieve relation by using the Alloy Analyzer, and hence to verify simulation and refinement in Z. The Alloy Analyzer specifies the retrieve relation by enumerating its values. However, this work does not scale.
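To make the idea of proving data refinement through a retrieve relation more concrete, the following is a minimal sketch of a bounded check, assuming a functional retrieve relation as in Jones's notion. The example itself is an assumption chosen for illustration and does not come from the cited works: the abstract data type is a set of integers, the concrete data type is a list that may contain duplicates, and all names (addAbs, addConc, retrieve, states, simulationHolds) are hypothetical. The check enumerates a small state space, so it can only reveal violations and is not a proof.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RefinementSketch {
    // Abstract data type: a set of integers.
    static Set<Integer> addAbs(Set<Integer> s, int x) {
        Set<Integer> r = new HashSet<Integer>(s);
        r.add(x);
        return r;
    }
    static boolean containsAbs(Set<Integer> s, int x) { return s.contains(x); }

    // Concrete data type: a list that may hold duplicates.
    static List<Integer> addConc(List<Integer> l, int x) {
        List<Integer> r = new ArrayList<Integer>(l);
        r.add(x);
        return r;
    }
    static boolean containsConc(List<Integer> l, int x) { return l.contains(x); }

    // Retrieve function: maps each concrete state to the abstract state it represents.
    static Set<Integer> retrieve(List<Integer> l) { return new HashSet<Integer>(l); }

    // Enumerates all lists of length <= maxLen over the values 0 .. universe-1.
    static List<List<Integer>> states(int universe, int maxLen) {
        List<List<Integer>> all = new ArrayList<List<Integer>>();
        List<List<Integer>> frontier = new ArrayList<List<Integer>>();
        frontier.add(new ArrayList<Integer>());
        all.addAll(frontier);
        for (int len = 0; len < maxLen; len++) {
            List<List<Integer>> next = new ArrayList<List<Integer>>();
            for (List<Integer> l : frontier)
                for (int x = 0; x < universe; x++) next.add(addConc(l, x));
            all.addAll(next);
            frontier = next;
        }
        return all;
    }

    // Checks initialization and, for each enumerated state and input, that each
    // concrete operation agrees with the abstract one through the retrieve function.
    static boolean simulationHolds(int universe, int maxLen) {
        if (!retrieve(new ArrayList<Integer>()).equals(new HashSet<Integer>())) return false;
        for (List<Integer> c : states(universe, maxLen)) {
            for (int x = 0; x < universe; x++) {
                if (!retrieve(addConc(c, x)).equals(addAbs(retrieve(c), x))) return false;
                if (containsConc(c, x) != containsAbs(retrieve(c), x)) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(simulationHolds(3, 3)); // prints true
    }
}
```

Because the retrieve relation here is a function, the sketch does not illustrate the more general relational setting of backward and forward simulations; it only shows the shape of the per-operation proof obligations.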

Laws of programming state properties about programming constructs and are useful for reasoning about programs [111]. They assist the software development process, and describe the properties of programs expressible in a suitable notation, just as the laws of arithmetic describe the properties of numeric operations. Algebraic laws provide an interface between the user of the language and the mathematical engineer who designs it. Several paradigms have benefited from algebraic programming laws. For example, the laws of Occam [129] provide useful properties of concurrency and communication. As another example, the laws of imperative programming [75] have been useful not only for providing algebraic semantic definitions but also for establishing a sound basis for formal software development methods. Hoare et al. [75] proposed the following law, which states a property of conditional statements:

    $P \lhd b \rhd Q = Q \lhd \neg b \rhd P$

where P and Q are programs, and b is a boolean expression. The operator $\lhd\, b\, \rhd$ is the if-then-else command. On the left-hand side of the law, if the boolean expression is true then the program P is executed. Otherwise, the program Q is executed. The right-hand side of the law says the same thing, except that we negate the boolean expression and swap the two programs. Thus, this law says that both conditional statements are equivalent for sequential programs.

The law proposed by Hoare et al. is similar to a program refactoring proposed by Fowler [45] that simplifies commands and expressions. For example, consider the following Java code fragment of a banking system application.

    if(!acc.isNotNegative()) { P } else { Q }

Here acc is an account and isNotNegative is a method that evaluates to true when the account’s balance is not negative. We can apply the Remove Double Negative refactoring [46] to simplify the previous fragment. Refactorings can be seen as a special kind of refinement.
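The conditional law and its use on the banking fragment can be sketched in Java as follows. This is a minimal illustration under stated assumptions: the programs P and Q are modeled as side-effect-free suppliers that return a label, the Account class with a balance field is a hypothetical stand-in for the class used in the fragment, and the simplified form shown (testing isNotNegative directly and swapping the branches) is the form obtained by applying the law from left to right and then cancelling the double negation.

```java
import java.util.function.Supplier;

public class ConditionalLaw {
    // Models the conditional "P <| b |> Q": run P if b holds, otherwise Q.
    static <T> T cond(Supplier<T> p, boolean b, Supplier<T> q) {
        return b ? p.get() : q.get();
    }

    // Checks the law for one value of b, with P and Q as labeled stand-ins.
    static boolean lawHolds(boolean b) {
        Supplier<String> p = () -> "P";
        Supplier<String> q = () -> "Q";
        return cond(p, b, q).equals(cond(q, !b, p));
    }

    // Hypothetical account class; only isNotNegative comes from the text.
    static class Account {
        final float balance;
        Account(float balance) { this.balance = balance; }
        boolean isNotNegative() { return balance >= 0; }
    }

    // The fragment from the text, with P and Q replaced by labels.
    static String original(Account acc) {
        if (!acc.isNotNegative()) { return "P"; } else { return "Q"; }
    }

    // The same fragment after swapping the branches and removing the negation.
    static String simplified(Account acc) {
        if (acc.isNotNegative()) { return "Q"; } else { return "P"; }
    }

    public static void main(String[] args) {
        System.out.println(lawHolds(true) && lawHolds(false)); // prints true
        Account overdrawn = new Account(-1.0f);
        System.out.println(original(overdrawn).equals(simplified(overdrawn))); // prints true
    }
}
```

The sketch relies on the condition having no side effects; for conditions that change the state, the two forms would need a separate argument.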
The Remove Double Negative refactoring can be formally proved using the previous law presented by Hoare et al. [75].

Griswold [67] presents a number of meaning-preserving transformations. Moreover, he shows how we can build a meaning-preserving restructuring tool for imperative programs. Many transformations are well-known compiler optimizations or their inverses, like extracting or inlining a function. In order to ensure that the transformations are meaning-preserving, Griswold uses Program Dependence Graphs to reason about the correctness of transformation rules. A simple example of restructuring consists in replacing inline commands by calls to functions that encapsulate those commands [68].

Object-oriented Refactorings

Borba et al. [20, 21, 28] propose a set of program refactorings for the Refinement Object-Oriented Language (ROOL), which is a language similar to sequential Java but with copy semantics. A set of primitive laws (bidirectional program refactorings) is defined and proved to be behavior preserving, based on a weakest precondition semantics for ROOL [20]. By composing their laws, they formalize several refactorings. For example, next we state a refactoring rule that formalizes the Pull Up Field and Push Down Field refactorings [45], when applied from left to right and from right to left, respectively. Each refactoring consists of two templates (patterns) of equivalent ROOL programs, on the left-hand side (LHS) and right-hand side (RHS). We can apply a refactoring whenever the template is matched by a ROOL program. A matching is an assignment of all variables occurring in the LHS/RHS templates to concrete values. Each refactoring may declare some meta-variables. The cds, ads and ops meta-variables denote a set of classes, attributes and operations, respectively. The c meta-variable represents the main program. The ≤ operator denotes the subtype relationship. We write (→), before a condition, to indicate that this condition is required when applying the law from left to right. Similarly, we use (←) to indicate what is required when applying the law in the opposite direction, and we use (↔) to indicate that the condition is necessary in both directions. For instance, in order to move an attribute to a superclass, we cannot introduce name conflicts.

ROOL Refactoring ⟨move attribute to superclass⟩

    class B extends A                      class B extends A
      ads                                    pub a : T; ads
      ops                                    ops
    end                                    end
                             =cds,c
    class C extends B                      class C extends B
      pub a : T; ads′                        ads′
      ops′                                   ops′
    end                                    end

provided
(→) The attribute name a is not declared by the subclasses of B in cds;
(←) D.a, for any D ≤ B and D ≰ C, does not appear in cds, c, ops, or ops′.

A closely related approach is developed by Tip et al. [143].
They realized that some enabling conditions and modifications to source code for refactorings involving generalization, for automation in Eclipse [37], depend on relationships between types of variables. These type constraints enable the tool to selectively perform transformations on source code, avoiding type errors that would otherwise prohibit the overall application of the refactoring. They proved that these refactorings preserve typing.

Mens et al. [106] use graph rewriting for formalizing program refactorings. They propose an equivalence notion for Java programs, based on three properties that can be statically checked, and formalize two refactorings for a subset of Java. They propose a static semantics for Java and prove using argumentation that both refactorings preserve it. There is no mention of whether more elaborate well-formedness properties can be specified using graphs. They are not concerned with giving a language semantics, but with describing a transformation. They recognize that some refactorings, such as the Move Method refactoring [45], which may deal with nested structures, require techniques to deal with complex graphs. These refactorings are difficult to manipulate by means of graph rewriting. Finally, they need to generate a graph, which may be time consuming, for applying each refactoring.

KABA [138], which is a system for refactoring Java class hierarchies, is developed following a different approach. The system regards refactoring for a target set of client programs accessing a class hierarchy, trying to automatically propose refactorings by investigating (using static and dynamic analysis) the use of such classes by the clients. The static approach requires static program analysis and guarantees behavior preservation for all analyzed client programs. The dynamic approach requires dynamic program analysis and guarantees behavior preservation for all client runs of a given test suite.

Tokuda and Batory [144] consider that refactorings are behavior preserving due to good engineering practices, without a mathematical guarantee, in order to justify a simpler approach for automation. In this approach, refactoring tools are compared to compilers transforming source code into assembly code. They argue that compilers do not lose their practical application even though it is not guaranteed that the resulting machine code behaves exactly as desired by the programmer.

Fuhrer et al. [52] propose a refactoring that assists programmers with the adoption of a generic version of an existing class library. They have implemented this refactoring in Eclipse, and evaluated the work by migrating a number of moderate-sized Java applications that use the Java collections framework to Java 1.5 generic collection classes.
Aspect-oriented Refactorings

Cole and Borba [25] propose a set of thirty laws for AspectJ [89], which is an aspect-oriented programming (AOP) language [90, 148]. Each law defines two behavior-preserving fine-grained transformations, similar to ROOL’s laws. They evaluate their laws by showing how they can derive refactorings proposed before [77, 70]. They build a proof sketch [26] for one of their laws (Add-Before Execution) based on a formal semantics of an AOP language [94] that is similar to AspectJ.

The following approaches are more practical than the previous work, which is more formal. Monteiro and Fernandes [110] present a collection of aspect-oriented refactorings covering both the extraction of aspects from object-oriented legacy code and the subsequent tidying up of the resulting aspects. They review the traditional object-oriented code smells in the light of aspect orientation and propose some new smells for the detection of crosscutting concerns. In addition, they propose a new code smell that is specific to aspects. It is important to mention that some object-oriented refactorings do not preserve behavior in the presence of aspects. For example, applying the Rename Method refactoring [45] may easily interfere with the pointcut definitions of the aspects.

Hannemann et al. [71] have introduced an approach for refactoring crosscutting concerns (CCC) based on roles. This approach helps programmers transform scattered implementations of CCC into a modular implementation in an AOP language. The roles allow the abstract description of CCC. They have applied their approach to refactor instances of three different design pattern CCCs in the JHotDraw graphical editing framework.

Binkley et al. [16] describe how to migrate an object-oriented program into an aspect-oriented program. They propose some refactorings and show how the Undo crosscutting concern of the JHotDraw tool can be modularized by applying them. Monteiro and Fernandes [109] describe a refactoring process that transforms Java code into an AspectJ equivalent.

Refactorings for API evolution

It is difficult for library developers to refactor their code: they may change the internal implementation or expand the application programming interface (API), but they cannot remove or change existing parts of the API without breaking client code (the interface must be preserved). In order to solve this problem, Henkel and Diwan [72] propose an approach and a tool for evolving APIs. When API developers refactor their API, all refactoring actions are captured. Users of the API can replay the refactorings in their components before using the new API. This approach can be useful for eliminating the deprecated methods of the Java API, for instance. However, this work has some limitations. Not all API modifications are refactorings. Moreover, each refactoring has enabling conditions, so successfully applying a refactoring to an API does not imply that we can always apply it to the client program.

Dig and Johnson [32] investigate the API changes of some frameworks, such as Eclipse. They have discovered that the changes that break the API user’s program are not random, but tend to fall into particular categories. More than 80% of these changes are refactorings. This suggests that refactoring-based migration tools should be used to update applications. They have shown what kinds of transformations can break APIs.
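The capture-and-replay idea described above for [72], and the limitation that enabling conditions may fail on the client side, can be illustrated with a minimal sketch. It is not the tool of Henkel and Diwan: the RenameMethod record, the enabled check, and the replay function are hypothetical names introduced only for illustration, and a component is reduced to a set of declared method names plus a list of call sites.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RenameMethod:
    """One captured refactoring action (hypothetical record format)."""
    old_name: str
    new_name: str

    def enabled(self, declared):
        # Assumed enabling condition: the new name must not be declared yet.
        return self.new_name not in declared

    def apply(self, declared, calls):
        # Rename the declaration and every call site that mentions it.
        new_declared = {self.new_name if n == self.old_name else n
                        for n in declared}
        new_calls = [self.new_name if n == self.old_name else n
                     for n in calls]
        return new_declared, new_calls


def replay(log, declared, calls):
    """Replay captured actions in order.

    Returns the resulting declarations and calls, plus the first action
    whose enabling condition failed (None if every action was applied).
    """
    for step in log:
        if not step.enabled(declared):
            return declared, calls, step
        declared, calls = step.apply(declared, calls)
    return declared, calls, None


if __name__ == "__main__":
    log = [RenameMethod("getSize", "size")]  # captured on the API side

    # A client that only uses the API: the replay succeeds.
    print(replay(log, {"getSize", "clear"}, ["getSize", "clear"]))

    # A client that already declares its own "size": the replay is blocked.
    print(replay(log, {"getSize", "clear", "size"}, ["getSize", "size"]))
```

In the second call the rename was valid in the API itself, yet it cannot be replayed because the client already declares a method with the new name. This is the situation the text describes: success on the API side does not guarantee applicability on the client side.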
The previous approaches show the important role that refactorings play in the evolution of components. However, they are based on refactorings that are not formally proven. Therefore, it is difficult to know whether a given transformation is indeed a refactoring.

In another related approach [10], a migration specification defines how uses of legacy library classes are mapped to uses of their replacement classes. The approach uses an analysis based on type constraints to determine where, for a given migration specification, it is possible to migrate uses of legacy classes without affecting the program’s type-correctness. The authors evaluated their approach on a number of moderate-sized Java applications and found that, in their benchmarks, over 90% of declarations, allocation sites, and call sites were migrated successfully.

Other Kinds of Refactorings

Some approaches show how to refactor software product lines (SPL) [24]. Alves et al. propose aspect-oriented refactorings for bootstrapping and evolving SPL [4]. They perform a qualitative and quantitative analysis of state-of-the-art SPL variability mechanisms, some of them in industrial case studies, and provide a set of refactorings for SPL.

Another work investigates a new definition of refactoring in the context of SPL [5], since the original definition applies only to a single program, or at most to application frameworks, but not to full-fledged SPL. For example, it is desirable that the SPL instances remain the same after applying a refactoring to an SPL. Each refactoring for SPL is composed of a program refactoring and a feature model refactoring. A feature model [31] represents the common and the variable features of concept instances and the dependencies between the variable features. So, the new definition of SPL refactoring guarantees that we are at least able to derive the same products that we could derive before applying the transformation.

Critchlow et al. [30] tackle the same problem. They have started addressing the problem of maintaining the structural quality of Product Line Architectures (PLA). They are building the ArchRefactor tool, which provides support for refactoring a PLA in response to problems identified by some metrics.

Refactorings for other paradigms have also been proposed. For example, Li and Thompson explore the formal specification and proof of behavior preservation of refactorings for Haskell programs [97]. Moreover, they have built a refactoring tool called HaRe. Schrijvers and Serebrenik propose a catalog of refactorings for Prolog programs [132]. Some of the refactorings have been adapted from the object-oriented paradigm, while others have been specifically designed for Prolog. They introduce their refactoring browser (ViPReSS), which implements most of their refactorings.

2.3. Model Refactoring

Models consist of sets of elements that describe some physical, abstract, or hypothetical reality. Good models serve as means of communication and simulation. The idea of program refactoring can be similarly applied to models describing the behavior and the structure of a system. In this section, we focus on structural model refactorings, also called object model refactorings.
Next we define object model refactoring, and then show some related work.

2.3.1. Definition

Design is defined [123] as both “the process of defining the architecture, components, interfaces, and other characteristics of a system or component” and “the result of [that] process.” Viewed as a process, software design is a software development life cycle activity in which software requirements are analyzed in order to produce a description of the software’s internal structure that will serve as the basis for its construction. Software design plays an important role in software development: it allows software engineers to produce various models that form a kind of blueprint of the solution to be implemented.

As with programs, we may wish to refactor models. Next we define an object model refactoring. This definition is used throughout this work.

“It is a transformation that improves design structure while preserving semantics. They might bring similar benefits of program refactorings but with a greater impact on cost and productivity, since they are mostly useful in early investigative software development tasks.”

2.3.2. Motivating Examples

In Section 1.2, we showed two model transformations proposed in the literature that are intended to be semantics preserving but are not. In this section, we present two other examples showing that small modifications to a structural model may introduce inconsistencies. Both examples specify part of a banking system application.

It is sometimes argued that creating a generalization between classes preserves semantics. However, introducing a generalization can make the constraints in the specification inconsistent (Figure 2.1). For instance, we cannot declare the ChAcc class to extend the Account class when an explicit constraint in the specification states that ChAcc has more elements than Account (#ChAcc > #Account), where # denotes the set cardinality operator. The introduction of a generalization in this case makes the specification inconsistent, since the generalization constrains Account to contain all elements of ChAcc. Therefore, we can deduce that Account has at least as many elements as ChAcc, which contradicts the explicit constraint in the specification. So, performing this small transformation makes the entire model inconsistent.

Figure 2.1: Introduce Generalization.

As another example, introducing a relation may make a model inconsistent. Figure 2.2 shows a transformation introducing the c relation, which states that each account has exactly one bank card. However, we cannot always perform this transformation. If an explicit constraint in the model states that the Account class has at least one element and the Card class is empty, then the transformation introduces an inconsistency. In the resulting diagram, the c relation must relate each account to exactly one bank card. There is at least one account, but there is no bank card to be related to it. So, this transformation introduces an inconsistency.

Figure 2.2: Introduce Relation.
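Both inconsistencies can be checked mechanically on a small scope. The following is a minimal sketch in Python that enumerates every instance over three atoms by brute force; it is not an Alloy analysis and not the formalization used in this work. The scope of three atoms, the function names, and the assumption that unrelated classes share no objects are all choices made only for illustration.

```python
from itertools import combinations, product

ATOMS = (0, 1, 2)  # assumed small scope for exhaustive enumeration


def subsets(atoms):
    """Yield every subset of atoms as a frozenset."""
    for size in range(len(atoms) + 1):
        for combo in combinations(atoms, size):
            yield frozenset(combo)


def generalization_instances(with_generalization):
    """Instances (Account, ChAcc) that satisfy #ChAcc > #Account."""
    found = []
    for account in subsets(ATOMS):
        for ch_acc in subsets(ATOMS):
            if with_generalization:
                # ChAcc extends Account: every ChAcc object is an Account.
                if not ch_acc <= account:
                    continue
            elif account & ch_acc:
                # Assumption: unrelated classes share no objects.
                continue
            if len(ch_acc) > len(account):
                found.append((account, ch_acc))
    return found


def relation_instances(with_relation):
    """Instances in which Account is non-empty and Card is empty."""
    found = []
    for account in subsets(ATOMS):
        for card in subsets(ATOMS):
            if account & card or not account or card:
                continue
            if with_relation:
                # c relates each account to exactly one card, so it is a
                # total function from Account to Card.
                owners = sorted(account)
                for images in product(sorted(card), repeat=len(owners)):
                    found.append((account, card, dict(zip(owners, images))))
            else:
                found.append((account, card, None))
    return found


if __name__ == "__main__":
    print("before generalization:", len(generalization_instances(False)))
    print("after generalization: ", len(generalization_instances(True)))
    print("before relation c:    ", len(relation_instances(False)))
    print("after relation c:     ", len(relation_instances(True)))
```

In both cases the model has instances before the transformation and none afterwards within the chosen scope. For the generalization, the subset requirement forces #ChAcc <= #Account. For the relation, no total function exists from a non-empty set to an empty one. These two arguments do not depend on the scope.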

2.3.3. Object Model Refactoring Overview

In this section, we show some related work in structural model refactoring. The Model Driven Architecture (MDA) [92, 115] is a framework for software development defined by the Object Management Group (OMG). Figure 2.3 depicts the general idea of MDA. Central to MDA is the notion of creating different models at different levels of abstraction and then linking them together to form an implementation. MDA defines Platform Independent Models (PIMs) [92], which have a high level of abstraction and are independent of any implementation technology. These models must be transformed into one or more Platform Specific Models (PSMs). A PSM is tailored to specify the system in terms of the implementation constructs that are available in one specific implementation technology.

Figure 2.3: Part of the MDA Approach [92].

Bergstein [14] proposes a set of five primitive object-preserving class reorganization transformations. This work formally states an object-preserving equivalence notion, which is important in this kind of work. However, his notion is restrictive: Bergstein compares only models that have the same base classes, which are classes that do not extend others. Moreover, he does not precisely state when the transformations can be applied. Some primitive transformations are very similar to some refactorings, such as the Pull Up Method refactoring. The set of transformations is proven to be complete and primitive. Coarse-grained transformations can be derived from the primitive ones.

Sunyé et al. [139] present a set of refactorings and explain how they can be designed so as to preserve the behavior of a UML model. Some refactorings for adding, removing, and moving features are presented. They propose neither a formal semantics for UML nor an equivalence notion. Moreover, in some situations, OCL constraints can become ill-typed after applying some transformations. Some enabling conditions are informally presented.
Some of these conditions may require a proof assistant to be checked. For instance, in order to introduce a parent class, the inserted element must not introduce new behavior. They also discuss some refactorings for statecharts.

Lano and Bicarregui [95] present semantics for some UML diagrams in Real-time Action Logic, and a set of transformations for structural and behavioral diagrams. They