This section describes the steps needed to reduce the number of refactoring sequences to be evaluated by the developer. To achieve that, two activities must be performed:
• Create the initial refactoring sequences: An initial representation of all the possible sequences of refactoring patterns is created, regardless of the semantics of each refactoring pattern. A representation of refactoring sequences is a notation that expresses the order in which refactoring patterns are applied. In this approach, it is used both to express the initial possible sequences, by adding sequences to the representation, and to reduce their number, by removing sequences from it. Refactoring sequences can be expressed using different representations, such as trees, graphs, finite state machines (e.g. DFAs), Petri nets and grammars. The chosen representation is manipulated to insert, remove and search for possible refactoring sequences.
• Simplify the set of sequences: The initial representation is simplified, considering the semantics of each transformation. In this step, the created representation is traversed in search of simplifications or equivalences between different sequences.
The possible types of simplifications are discussed in Section 8.3.2.
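The insert, remove and search operations on such a representation can be illustrated with a prefix tree, one of the tree-shaped options listed above. The following is a minimal Python sketch under that assumption, not the prototype used in this thesis; the class and method names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    # Children keyed by refactoring pattern name; `end` marks that the path
    # from the root to this node is a stored refactoring sequence.
    children: dict[str, Node] = field(default_factory=dict)
    end: bool = False


class SequenceTree:
    """Prefix tree storing refactoring sequences as root-to-node paths."""

    def __init__(self) -> None:
        self.root = Node()

    def insert(self, seq: tuple[str, ...]) -> None:
        node = self.root
        for pattern in seq:
            node = node.children.setdefault(pattern, Node())
        node.end = True

    def contains(self, seq: tuple[str, ...]) -> bool:
        node = self.root
        for pattern in seq:
            node = node.children.get(pattern)
            if node is None:
                return False
        return node.end

    def remove(self, seq: tuple[str, ...]) -> None:
        # Only unmarks the sequence; nodes shared with other sequences stay.
        node = self.root
        for pattern in seq:
            node = node.children.get(pattern)
            if node is None:
                return
        node.end = False

    def sequences(self) -> list[tuple[str, ...]]:
        # Depth-first walk collecting every marked path.
        out: list[tuple[str, ...]] = []

        def walk(node: Node, prefix: tuple[str, ...]) -> None:
            if node.end:
                out.append(prefix)
            for pattern, child in node.children.items():
                walk(child, prefix + (pattern,))

        walk(self.root, ())
        return out
```

Sequences sharing a prefix, such as Pull Up Method followed by different patterns, share nodes, which keeps the representation compact when many sequences start the same way.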
Figure 8.1 shows the roles and artefacts for these two activities. The tool provider creates the initial representation of the refactoring sequences and then simplifies it using a set of rules. A developer can then use this simplified representation to search for refactoring opportunities for these sequences (or for a subset of them).
Figure 8.1: Roles, artefacts and activities (diagram: the tool provider performs "Create the Initial Sequences", using the Refactoring Catalog and the Language Grammar to produce the Initial Sequences, and "Simplify the Sequences", using the Simplification Rules to produce the Simplified Sequences)
Section 8.3.1 describes the creation of this initial representation using a non-deterministic finite automaton and Section 8.3.2 describes the simplification rules and how each simplification can be done.
8.3.1 Creating the Initial Refactoring Sequences
A refactoring pattern is applicable to one or more symbols of a grammar (either terminals or non-terminals) and vice versa (in this case, each symbol can have $n$ applicable refactoring patterns associated with it). A grammar is usually represented by a 4-tuple $G = (N, \Sigma, P, S)$, in which $N$ is the finite set of non-terminal symbols, $\Sigma$ is the finite set of terminal symbols, $S \in N$ is the initial (start) non-terminal symbol and $P$ is the finite set of production rules.
Refactoring sequences are composed of two or more refactoring patterns that are applied one after the other. To create the initial sequences, it is necessary to know for which set of refactoring patterns the sequences will be created. For simplicity, it is suggested to group the refactoring patterns by the grammar symbols they affect. For instance, the sequences in the example are sequences for the manipulation of methods, which affect the non-terminal method of an object-oriented language.
To create the initial refactoring sequences, the set of refactoring patterns must be bound to the grammar of the language for which the sequences will be created. This thesis uses the term refactoring catalogue to denote a named set of refactoring patterns.
Figure 8.2 shows the encapsulation of the binding concern in a separate class. The advantage of separating the binding concern this way is that the catalogue does not need to deal with language-specific grammars, nor do the grammars need to be aware of the refactoring patterns that can affect programs written according to their production rules.
When computing the possible sequences, the maximum size of a sequence must be specified. This size is called the levels of a refactoring sequence. For example, the application of a Pull Up Method refactoring pattern followed by an application of the Inline Method refactoring pattern is a refactoring sequence with two levels: the first consists of the application of Pull Up Method and the second of the application of Inline Method.
Figure 8.2: Binding refactoring patterns to grammar symbols (class diagram: a Binder, with operations bind(RPsCatalog, Grammar) : void and addBind(Binding) : void, references one RPsCatalog and one Grammar and holds zero or more Binding objects as symbolRPsBindings; each Binding, created with Binding(RefactoringPattern, Symbol), links a RefactoringPattern to a grammar Symbol. An RPsCatalog (name: String) contains one or more RefactoringPatterns (name: String); a Grammar (name: String) has one or more nonTerminals, zero or more terminals, one startSymbol and one or more ProductionRules; Terminal and NonTerminal are kinds of Symbol (name: String).)
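The classes of Figure 8.2 can be sketched in Python as follows. This is a minimal illustration under assumptions, not the thesis prototype: attribute names follow the figure in snake_case (`addBind` becomes `add_bind`, `symbolRPsBindings` becomes `symbol_rps_bindings`), production rules are kept as plain strings, and the `patterns_for` helper is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RefactoringPattern:
    name: str


@dataclass(frozen=True)
class Symbol:
    name: str


class Terminal(Symbol):
    pass


class NonTerminal(Symbol):
    pass


@dataclass
class Grammar:
    name: str
    non_terminals: list[NonTerminal]
    terminals: list[Terminal]
    start_symbol: NonTerminal
    rules: list[str]  # production rules as plain strings (simplification)


@dataclass
class RPsCatalog:
    name: str
    patterns: list[RefactoringPattern]


@dataclass(frozen=True)
class Binding:
    pattern: RefactoringPattern
    symbol: Symbol


class Binder:
    """Owns the pattern-to-symbol bindings, so catalogue and grammar stay unaware of each other."""

    def __init__(self) -> None:
        self.catalog: RPsCatalog | None = None
        self.grammar: Grammar | None = None
        self.symbol_rps_bindings: list[Binding] = []

    def bind(self, catalog: RPsCatalog, grammar: Grammar) -> None:
        # Step 1: bind the catalogue as a whole to the grammar.
        self.catalog, self.grammar = catalog, grammar

    def add_bind(self, binding: Binding) -> None:
        # Step 2: bind one pattern of the catalogue to one grammar symbol.
        if self.catalog is None or self.grammar is None:
            raise ValueError("call bind() before add_bind()")
        if binding.pattern not in self.catalog.patterns:
            raise ValueError(f"{binding.pattern.name} is not in the catalogue")
        if binding.symbol not in self.grammar.non_terminals + self.grammar.terminals:
            raise ValueError(f"{binding.symbol.name} is not a grammar symbol")
        self.symbol_rps_bindings.append(binding)

    def patterns_for(self, symbol: Symbol) -> list[RefactoringPattern]:
        # Patterns applicable to one symbol: the input of step 3.
        return [b.pattern for b in self.symbol_rps_bindings if b.symbol == symbol]
```

For instance, binding Pull Up Method and Inline Method to the non-terminal method makes `patterns_for(method)` return both patterns, while the grammar and the catalogue objects themselves remain unchanged.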
Given a grammar and a refactoring catalogue, the steps for creating the initial refactoring sequences for $n$ levels are:
1. Create the binding between the grammar and the refactoring catalogue.
2. Create the individual bindings between the grammar symbols and the individual refactoring patterns in the catalogue.
3. For each grammar symbol which has applicable refactoring patterns proceed as follows:
(a) Generate all the ordered sequences (with repetitions) of the applicable refactoring patterns with 1 level
(b) ...
(c) Generate all the ordered sequences (with repetitions) of the applicable refactoring patterns with n levels
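Step 3 can be sketched in a few lines of Python. This is an illustration, not the prototype of Section 8.4: it assumes the bindings have already been reduced to a dictionary from symbol names to applicable pattern names (a hypothetical input format), and it generates ordered sequences with `itertools.product`, so that both $r_1 r_2$ and $r_2 r_1$ are present for the later commutative-path check.

```python
from __future__ import annotations

from itertools import product


def initial_sequences(
    applicable: dict[str, list[str]], n: int
) -> dict[str, list[tuple[str, ...]]]:
    """For each symbol, list every ordered sequence of 1..n applicable patterns."""
    if n < 1:
        raise ValueError("n must be at least 1")
    result: dict[str, list[tuple[str, ...]]] = {}
    for symbol, patterns in applicable.items():
        if not patterns:
            continue  # step 3 only handles symbols with applicable patterns
        seqs: list[tuple[str, ...]] = []
        for level in range(1, n + 1):
            # product(..., repeat=level) yields every ordered tuple of
            # length `level`, repetitions allowed (steps 3a to 3c).
            seqs.extend(product(patterns, repeat=level))
        result[symbol] = seqs
    return result
```

With two patterns bound to method and $n = 2$, the call returns two 1-level and four 2-level sequences, including both Pull Up Method followed by Inline Method and the reverse order. Impossible sequences are kept at this point, as the text prescribes.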
Section 8.4 illustrates how these steps are mapped to the source code of a prototype developed to generate these initial sequences. To simplify the creation process, impossible sequences are not filtered out at this stage but later, in the simplification phase.
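The size of the initial set explains why the simplification phase is needed. Assuming a symbol with $k > 1$ applicable refactoring patterns, sequences of 1 to $n$ levels, and that both orders of every pair are generated (the simplification phase compares $I\,r_1 r_2$ with $I\,r_2 r_1$), the number of initial sequences for that symbol, written here as $|S_n|$, is a geometric sum:

```latex
|S_n| \;=\; \sum_{i=1}^{n} k^{i} \;=\; \frac{k^{n+1} - k}{k - 1}, \qquad k > 1
```

For example, $k = 2$ and $n = 2$ give $2 + 4 = 6$ sequences, while $k = 5$ and $n = 3$ already give $5 + 25 + 125 = 155$.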
8.3.2 Simplifying the Sequences
The next steps reduce the number of sequences. Let $r_1$ and $r_2$ be refactoring patterns and $I$ be an initial program that will be manipulated by the refactoring patterns; applying $r_1$ and then $r_2$ to $I$ is written $I\,r_1 r_2$. The following cases can occur:
• Simplifications: Simplifications occur when there is a shorter path that leads from an initial program to the same end program. If, starting from $I$, the application of $r_1$ followed by $r_2$ results in the same program as the application of $r_2$ alone, the sequences are said to be equivalent. This equivalence is denoted as $I\,r_1 r_2 \equiv I\,r_2$.
• Commutative Path: Commutative paths occur when the order of application of a pair of refactoring patterns does not matter, that is, $I\,r_1 r_2 \equiv I\,r_2 r_1$.
• Inverse Path: Refactoring patterns usually have an inverse refactoring pattern (for example, Pull Up Method is the inverse of Push Down Method). If $r_2$ is the inverse of $r_1$, applying both returns the initial program, which can be expressed as $I\,r_1 r_2 \equiv I$.
• Independent Path: This kind of sequence occurs when two different refactoring patterns in a path have no influence on each other and can be applied in parallel. It is a special case of a commutative path: independent paths occur because the refactoring patterns manipulate distinct elements of the program, whereas in the general case of commutative paths the refactoring patterns may manipulate the same elements but still produce the same final result.
• Impossible Paths: Impossible paths are refactoring sequences that cannot be applied, because certain refactoring patterns disable the application of others. For example, after a method is inlined it can no longer be moved or renamed, because the method itself no longer exists.
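The cases above can be made concrete on a deliberately tiny program model. The sketch below assumes a program is just a set of (class, method) pairs in which class B is the only subclass of class A; the four refactoring functions are hypothetical, heavily simplified stand-ins for the real patterns, and a result of `None` marks a pattern that is not applicable. Checking the relations on one program only illustrates them; it does not prove them, which is the purpose of the formal techniques cited in this section.

```python
from __future__ import annotations

from typing import Callable, Optional

# Toy program model (assumption): a frozenset of (class, method) pairs,
# where class "B" is the only subclass of class "A".
Program = frozenset
Refactoring = Callable[[Program], Optional[Program]]  # None = not applicable


def pull_up(method: str) -> Refactoring:
    def apply_to(p: Program) -> Optional[Program]:
        if ("B", method) not in p:
            return None
        return (p - {("B", method)}) | {("A", method)}
    return apply_to


def push_down(method: str) -> Refactoring:
    def apply_to(p: Program) -> Optional[Program]:
        if ("A", method) not in p:
            return None
        return (p - {("A", method)}) | {("B", method)}
    return apply_to


def rename(old: str, new: str) -> Refactoring:
    def apply_to(p: Program) -> Optional[Program]:
        if not any(m == old for _, m in p):
            return None
        return frozenset((c, new if m == old else m) for c, m in p)
    return apply_to


def inline(method: str) -> Refactoring:
    def apply_to(p: Program) -> Optional[Program]:
        if not any(m == method for _, m in p):
            return None
        # Toy inlining: only the method declaration disappears.
        return frozenset((c, m) for c, m in p if m != method)
    return apply_to


def run(program: Optional[Program], *refactorings: Refactoring) -> Optional[Program]:
    """Apply a refactoring sequence; None means the path is impossible."""
    for r in refactorings:
        program = r(program)
        if program is None:
            return None
    return program


I = frozenset({("B", "m"), ("B", "k")})  # initial program
```

With this model, `run(I, pull_up("m"), push_down("m")) == I` (inverse path), `run(I, pull_up("m"), inline("m")) == run(I, inline("m"))` (simplification, matching the Pull Up Method and Inline Method example), renaming `m` and pulling up `k` commute because they touch distinct methods (independent path), and `run(I, inline("m"), rename("m", "n"))` is `None` (impossible path).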
These rules of behavioural preservation and equivalence can be proved using different techniques. The equivalences for simplifications, commutative and inverse paths can be proved using graph parallelism and confluence techniques (BALDAN et al., 1999; HECKEL; KUSTER; TAENTZER, 2002). The occurrence of independent and impossible paths can be detected using critical pair analysis (MENS; TAENTZER; RUNGE, 2005).
The following algorithm can be used to simplify the set of sequences for a given element of the grammar. First, all possible sequences are computed. Then, the impossible, independent and inverse sequences are removed, followed by the simplifications; finally, one sequence of each commutative pair is removed (it does not matter which one). The algorithm can be expressed as:
SIMPLIFY-REP(rep)
    seqs = GETSEQS(rep)
    seqs = seqs − IMPOSSIBLESEQ(seqs)
    seqs = seqs − INDEPENDENTSEQ(seqs)
    seqs = seqs − INVERSESEQ(seqs)
    seqs = seqs − SIMPLIFICATIONSEQ(seqs)
    seqs = seqs − COMMUTATIVESEQ(seqs)
    return seqs
Functions IMPOSSIBLESEQ, INDEPENDENTSEQ, INVERSESEQ, SIMPLIFICATIONSEQ and COMMUTATIVESEQ return, respectively, all the sequences that are impossible, independent, inverse or simplifications, and one sequence of each pair of commutative paths. An example using these rules is shown in Section 8.4.
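A minimal Python sketch of SIMPLIFY-REP follows, under explicit assumptions: the representation is simply an iterable of sequences, and the knowledge base is a hypothetical `RuleBase` in which every rule is an ordered pair of adjacent pattern names. A sequence is removed by the first four functions if any adjacent pair matches the corresponding rule set. For commutative pairs, the sketch drops the order whose first pattern name sorts last, but only when the swapped twin is still present; this choice is arbitrary, which the text allows.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RuleBase:
    """Hypothetical knowledge base: each rule is an ordered pair (r1, r2)."""
    impossible: set[tuple[str, str]] = field(default_factory=set)      # r2 cannot follow r1
    independent: set[tuple[str, str]] = field(default_factory=set)     # distinct elements
    inverse: set[tuple[str, str]] = field(default_factory=set)         # I r1 r2 == I
    simplification: set[tuple[str, str]] = field(default_factory=set)  # I r1 r2 == I r2
    commutative: set[tuple[str, str]] = field(default_factory=set)     # I r1 r2 == I r2 r1


def _adjacent(seq: tuple[str, ...]) -> list[tuple[str, str]]:
    return list(zip(seq, seq[1:]))


def _matching(seqs: set[tuple[str, ...]], pairs: set[tuple[str, str]]) -> set[tuple[str, ...]]:
    # Sequences containing at least one adjacent pair listed in `pairs`.
    return {s for s in seqs if any(p in pairs for p in _adjacent(s))}


def commutative_seq(seqs: set[tuple[str, ...]], rules: set[tuple[str, str]]) -> set[tuple[str, ...]]:
    # Drop a sequence if a commutative pair appears in descending name order
    # and the sequence with that pair swapped is still in the set.
    drop = set()
    for s in seqs:
        for i, (a, b) in enumerate(_adjacent(s)):
            if ((a, b) in rules or (b, a) in rules) and a > b:
                if s[:i] + (b, a) + s[i + 2:] in seqs:
                    drop.add(s)
                    break
    return drop


def simplify_rep(rep, rules: RuleBase) -> set[tuple[str, ...]]:
    seqs = set(rep)                                   # GETSEQS
    seqs -= _matching(seqs, rules.impossible)         # IMPOSSIBLESEQ
    seqs -= _matching(seqs, rules.independent)        # INDEPENDENTSEQ
    seqs -= _matching(seqs, rules.inverse)            # INVERSESEQ
    seqs -= _matching(seqs, rules.simplification)     # SIMPLIFICATIONSEQ
    seqs -= commutative_seq(seqs, rules.commutative)  # COMMUTATIVESEQ
    return seqs
```

Because every removed commutative sequence has a swapped twin that has fewer descending pairs, each equivalence class keeps at least one member. Pair-based rules only capture interactions between adjacent patterns; a real knowledge base may need longer patterns.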
These simplification rules can be created once and stored in a knowledge base. They are used when the developer wants to search for sequences of refactoring patterns (instead of the application of a single refactoring pattern). A tool provider can, for example, specify one set of simplification rules for refactoring patterns that manipulate methods and another set for refactoring patterns that manipulate classes.
The developer can then use these rules indirectly, by specifying the refactoring patterns for which refactoring opportunities should be searched, the number of levels of the sequences, and the modules, packages or classes in which the search will be conducted (scope reduction).
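The developer's side of this interaction can be sketched as a small request object. The sketch assumes the tool already holds a simplified set of sequences; `SearchRequest` and `search_plan` are hypothetical names, and the actual search for refactoring opportunities in each scope element is outside the sketch.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SearchRequest:
    """What the developer provides (hypothetical field names)."""
    patterns: frozenset[str]  # refactoring patterns of interest
    levels: int               # maximum number of levels per sequence
    scope: tuple[str, ...]    # modules, packages or classes (scope reduction)


def search_plan(
    simplified: set[tuple[str, ...]], request: SearchRequest
) -> list[tuple[str, tuple[str, ...]]]:
    # Keep the simplified sequences that use only the requested patterns
    # and do not exceed the requested number of levels.
    selected = sorted(
        s for s in simplified
        if len(s) <= request.levels and set(s) <= request.patterns
    )
    # Pair each selected sequence with each scope element; every pair is
    # one search for refactoring opportunities.
    return [(target, s) for target in request.scope for s in selected]
```

Restricting the patterns, the levels and the scope all shrink the plan before any program analysis runs, which is the point of letting the developer state them up front.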