
Formal verification of EVM bytecode

3.2 Semantics of the symbolic EVM

3.2.2 Small-step rules

The general structure of a rule changes to encompass multiple call stacks. Since the universe is now a list of call stacks, a step consists of the execution of one instruction in each call stack, namely the instruction at the position given by the program counter of the machine state in the top element of that call stack. We write the list of call stacks vertically for better readability. If there are n call stacks, the evolution of the call stack (µ_i, ι_i, σ_i, η_i) :: S_i is described by the rule that is applicable to the opcode OP_i under the conditions premises(S_i), where 1 ≤ i ≤ n. The general structure is presented in Figure 3.1.

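\[
\frac{\begin{array}{c}
\omega_{\mu_1,\iota_1} = \mathit{OP}_1 \quad \cdots \quad \omega_{\mu_n,\iota_n} = \mathit{OP}_n \\
\mathit{premises}(S_1) \quad \cdots \quad \mathit{premises}(S_n)
\end{array}}
{\Gamma \models
\begin{pmatrix}
(\mu_1, \iota_1, \sigma_1, \eta_1) :: S_1 \\
\vdots \\
(\mu_n, \iota_n, \sigma_n, \eta_n) :: S_n
\end{pmatrix}
\rightarrow
\begin{pmatrix}
(\mu'_1, \iota'_1, \sigma'_1, \eta'_1) :: S'_1 \\
\vdots \\
(\mu'_{n'}, \iota'_{n'}, \sigma'_{n'}, \eta'_{n'}) :: S'_{n'}
\end{pmatrix}}
\]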

Figure 3.1: General structure of a small-step rule in the symbolic machine

Since the executions of the n call stacks are independent from each other, in the following we will only write one call stack on the left-hand side and the call stacks that result from it on the right-hand side. The number of call stacks may increase after a step: a call stack may fork whenever an operation over elements of the execution stack does not return a numeric value due to the symbolic nature of one or more elements. The new list of call stacks is the union of the ones resulting from the individual execution of each call stack.

Initialisation is very similar to the numeric case. The only modification is in the initial value of the machine state µ, which is now (g, 0, ε, 0, ε, ε, cond), where cond ∈ L_Σ(S) is the initial condition that we wish to specify. Possible examples are statements about environmental variables that are known to be addresses, expressed through isAddress, or statements about the transaction value. If there are no assumptions, cond is true by default.

During the symbolic execution we maintain a list of call stacks S_1, ..., S_n. If there exists a call stack S_i, 1 ≤ i ≤ n, such that no rule can be further applied to it, we consider that its configuration in the next step is equal to the current configuration. We say that we reach a final state when there is no rule that can be applied to any of the call stacks S_1, ..., S_n. Therefore, we are looking for a fixed point of the function that reads the current configuration and updates each call stack according to the current context.
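The fixed-point search described above can be sketched in Python as follows. This is only an illustration under simplifying assumptions, not the implementation used in this work: the names `run_to_fixed_point` and `toy_step` are hypothetical, a call stack is reduced to a plain tuple, and `step` stands in for the whole set of semantic rules (it returns the successor call stacks, or an empty list when no rule applies).

```python
from typing import Callable, List, Tuple

CallStack = Tuple[int, ...]  # hypothetical stand-in for a real call stack


def run_to_fixed_point(
    stacks: List[CallStack],
    step: Callable[[CallStack], List[CallStack]],
) -> List[CallStack]:
    """Apply `step` to every call stack until no rule applies to any of them."""
    while True:
        next_stacks: List[CallStack] = []
        progressed = False
        for stack in stacks:
            successors = step(stack)
            if successors:
                # A rule applied; one call stack may fork into several.
                next_stacks.extend(successors)
                progressed = True
            else:
                # No rule applies: the configuration stays as it is.
                next_stacks.append(stack)
        if not progressed:
            return next_stacks  # final state: fixed point reached
        stacks = next_stacks


def toy_step(stack: CallStack) -> List[CallStack]:
    """Toy rule set (assumption): even counters fork, odd ones decrement."""
    (n,) = stack
    if n <= 0:
        return []
    if n % 2 == 0:
        return [(n - 1,), (n - 2,)]
    return [(n - 1,)]


final = run_to_fixed_point([(2,)], toy_step)
```

The loop terminates only if every path eventually gets stuck, which mirrors the definition of a final state given above.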

The finalisation should contain the payment of rewards to the miners and the destruction of the accounts that were signalled for elimination during the execution but, since we are interested in modelling smart contract execution and not blockchain usage, we did not implement these functionalities. As already described, when the symbolic execution finishes our aim is to determine whether a given property P holds.

In the following, we will present the most significant semantic rules of the symbolic Ethereum Virtual Machine. The rules should illustrate the main characteristic features of the EVM and the most important differences between the original and the symbolic machines.

Basic arithmetic instructions, comparison operators and stack operations. We start with the arithmetic operation ADD to ease the comparison with the semantics of the regular EVM. The structure is the same; the only modification is in the nature of the stack elements, which are now allowed to be symbols.

The expression a + b mod 2^256 is evaluated if a and b are numeric and is stored as the term a + b mod 2^256 otherwise. The rule is presented in Figure 3.2.

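\[
\frac{\begin{array}{c}
\omega_{\mu,\iota} = \mathtt{ADD} \quad \mathit{valid}(\mu.\mathit{gas}, 3, |s|) \quad \mu.s = a :: b :: s \\
\mu' = \mu[\mathit{gas} \mathrel{-}= 3][\mathit{pc} \mathrel{+}= 1][s \rightarrow (a + b) \bmod 2^{256} :: s]
\end{array}}
{\Gamma \models (\mu, \iota, \sigma, \eta) :: S \rightarrow (\mu', \iota, \sigma, \eta) :: S}
\]

Figure 3.2: Rule for the successful execution of the ADD opcode in the symbolic machine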
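As an illustration of the ADD rule, the following Python sketch evaluates the sum when both operands are numeric and builds a symbolic term otherwise. The names `Sym`, `sym_add` and `step_add` are hypothetical, and we assume here that valid(µ.gas, 3, |s|) reduces to having at least 3 units of gas and two operands on the stack; its precise definition is not reproduced in this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

WORD = 2 ** 256  # EVM words are 256-bit numbers


@dataclass(frozen=True)
class Sym:
    """A symbolic stack element, represented by its textual expression."""
    expr: str


Value = Union[int, Sym]


def show(v: Value) -> str:
    return str(v) if isinstance(v, int) else v.expr


def sym_add(a: Value, b: Value) -> Value:
    """Resolve a + b mod 2^256 when numeric, keep it as a term otherwise."""
    if isinstance(a, int) and isinstance(b, int):
        return (a + b) % WORD
    return Sym(f"({show(a)} + {show(b)}) mod 2^256")


def step_add(gas: int, pc: int, stack: List[Value]) -> Optional[Tuple[int, int, List[Value]]]:
    """Return the updated (gas, pc, stack), or None if the premises fail."""
    if gas < 3 or len(stack) < 2:  # assumed reading of valid(gas, 3, |s|)
        return None
    a, b, *rest = stack
    return gas - 3, pc + 1, [sym_add(a, b)] + rest
```

For instance, `step_add(10, 0, [Sym("v"), 1])` leaves the term `(v + 1) mod 2^256` on top of the stack instead of a number.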

The rules for comparison operators must be able to accommodate two possibilities. One call stack generates two call stacks, each equal to the previous one in every component except the machine state of the top element. Each machine state contains the conjunction of the previous condition with one of the new possible conditions. For example, the opcode LT with arguments a and b results in the new conditions a < b and a ≥ b.

Stack operations, like PUSHn, POP, DUPn and SWAPn, have the expected semantics: only the execution stack, the available gas and the program counter of the top element of the call stack are changed. Notice that it is not possible to PUSH a symbol to the stack, since the operand of the PUSHn opcode is the number formed by the n bytes that follow the push instruction in the currently executing code. However, the execution stack may contain symbols that were added to it through requests for the value of the transaction, the content of a position of the storage, or the balance of a given account, among other examples, so POP, DUPn and SWAPn can handle symbols.

Terminating opcodes. RETURN is used to successfully terminate an execution with the possibility of returning output. The output is taken from the memory of the execution. The two operands of this instruction are the initial position in the memory and the size of the portion that should be returned.

Both arguments must be numeric since they represent positions, but the memory contents that form the output can contain symbols. The condition is returned in the HALT element in order to be passed on to the outer execution or to be a part of the final state, as can be seen in Figure 3.3.

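\[
\frac{\begin{array}{c}
\omega_{\mu,\iota} = \mathtt{RETURN} \\
\mu.s = i_o :: i_s :: s \quad a_w = M(\mu.i, i_o, i_s) \quad c = C_{mem}(\mu.i, a_w) \quad \mathit{valid}(\mu.\mathit{gas}, c, |s|) \\
d = \mu.m[i_o, i_o + i_s - 1] \quad g = \mu.\mathit{gas} - c
\end{array}}
{\Gamma \models (\mu, \iota, \sigma, \eta) :: S \rightarrow \mathit{HALT}(\sigma, g, d, \eta, \mu.\mathit{cond}) :: S}
\]

Figure 3.3: Rule for the successful execution of the RETURN opcode in the symbolic machine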

Memory usage consumes gas. M : (N_256)^3 → N_256 is the memory expansion function, used to determine the new number of active words in memory given the current number of active words in memory µ.i, the offset i_o and the size i_s of the memory accessed by the current instruction. C_mem : (N_256)^2 → Z is an auxiliary function used to compute gas costs due to the variation of memory usage. These functions are defined in Appendix A.

When the RETURN opcode is run, if the execution stack has at least two elements i_o and i_s and a size no greater than 1024, and if there is enough gas, then the call stack is popped and a new terminating element is pushed to it. HALT(σ, g, d, η, µ.cond) contains the global state σ and the transaction effect η that were specified by the previous top element of the call stack, the available gas g (equal to the previous available gas minus the gas spent by this operation), the return data d, which is the content of the memory between positions i_o and i_o + i_s − 1, and the condition µ.cond of the previous machine state.

If the current execution was caused by a CALL-like opcode, in the next step the changes in the components present in the HALT element will become effective, including updating the return data to d and the available gas.

The REVERT opcode ends the current execution, reverting all changes made during it. It can return some output from the memory, like RETURN, and the unused gas is refunded to the account that triggered the execution, unlike what happens when an exception is thrown. The main rule for REVERT, shown in Figure 3.4, is very similar to the main rule for RETURN, but σ and η are discarded since the modifications to the global state and to the transaction effect should not become effective.

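\[
\frac{\begin{array}{c}
\omega_{\mu,\iota} = \mathtt{REVERT} \\
\mu.s = i_o :: i_s :: s \quad a_w = M(\mu.i, i_o, i_s) \quad c = C_{mem}(\mu.i, a_w) \quad \mathit{valid}(\mu.\mathit{gas}, c, |s|) \\
d = \mu.m[i_o, i_o + i_s - 1] \quad g = \mu.\mathit{gas} - c
\end{array}}
{\Gamma \models (\mu, \iota, \sigma, \eta) :: S \rightarrow \mathit{REV}(g, d, \mu.\mathit{cond}) :: S}
\]

Figure 3.4: Rule for the successful execution of the REVERT opcode in the symbolic machine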
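The difference between the two terminating elements can be made concrete with a small Python sketch. It is only an approximation of the rules in Figures 3.3 and 3.4: the names `Halt`, `Rev` and `terminate` are hypothetical, the memory is modelled as a dictionary from byte positions to bytes or symbols (strings), and the gas cost c is passed in as a parameter because M and C_mem are defined elsewhere.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple, Union

Byte = Union[int, str]  # a str entry stands for a symbolic byte


@dataclass(frozen=True)
class Halt:
    sigma: Any            # global state, kept on successful termination
    gas: int
    data: Tuple[Byte, ...]
    eta: Any              # transaction effect, kept on successful termination
    cond: Tuple[str, ...]


@dataclass(frozen=True)
class Rev:
    gas: int              # sigma and eta are discarded on REVERT
    data: Tuple[Byte, ...]
    cond: Tuple[str, ...]


def terminate(opcode: str, gas: int, cost: int, stack: List[int],
              memory: Dict[int, Byte], sigma: Any, eta: Any,
              cond: Tuple[str, ...]) -> Optional[Union[Halt, Rev]]:
    """Build the terminating element, or None if the premises fail."""
    if len(stack) < 2 or gas < cost:
        return None
    offset, size = stack[0], stack[1]  # both must be numeric
    # d = m[i_o, i_o + i_s - 1]; unwritten memory reads as 0
    data = tuple(memory.get(k, 0) for k in range(offset, offset + size))
    remaining = gas - cost
    if opcode == "RETURN":
        return Halt(sigma, remaining, data, eta, cond)
    return Rev(remaining, data, cond)
```

Both elements carry the remaining gas, the output and the path condition; only `Halt` carries the state changes.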

Jumps. Recall that the set of valid jump destinations is the set of positions that contain the opcode JUMPDEST. Let D : B → B be the function that assigns to each piece of code its set of valid jump destinations. The rules for the unconditional jump JUMP are similar to the slightly more complex rules for the conditional jump JUMPI. This opcode, when given a symbolic guard, causes the current call stack to split in two: one where the execution jumps to position i and another where it continues to the next position. The conditions are updated in order to reflect the assumptions made in each case. The first argument cannot be symbolic since it represents a position in the code. The complete rule is exhibited in Figure 3.5.

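\[
\frac{\begin{array}{c}
\omega_{\mu,\iota} = \mathtt{JUMPI} \quad \mathit{valid}(\mu.\mathit{gas}, 10, |s|) \quad \mu.s = i :: b :: s \quad i \in D(\iota.\mathit{code}) \\
\mu'_1 = \mu[\mathit{pc} \mathrel{+}= 1][s \rightarrow s][\mathit{gas} \mathrel{-}= 10][\mathit{cond} \rightarrow \mu.\mathit{cond} \wedge b = 0] \\
\mu'_2 = \mu[\mathit{pc} \rightarrow i][s \rightarrow s][\mathit{gas} \mathrel{-}= 10][\mathit{cond} \rightarrow \mu.\mathit{cond} \wedge b \neq 0]
\end{array}}
{\Gamma \models (\mu, \iota, \sigma, \eta) :: S \rightarrow
\begin{pmatrix}
(\mu'_1, \iota, \sigma, \eta) :: (\mu, \iota, \sigma, \eta) :: S \\
(\mu'_2, \iota, \sigma, \eta) :: (\mu, \iota, \sigma, \eta) :: S
\end{pmatrix}}
\]

Figure 3.5: Rule for the successful execution of the JUMPI opcode in the symbolic machine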
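The fork performed by JUMPI can be sketched in Python as below. The sketch is an assumption-laden illustration rather than the implementation of this work: `Machine`, `valid_jump_dests` and `step_jumpi` are hypothetical names, symbols are represented by strings, the path condition is a tuple of textual constraints, and only the machine state is modelled (not the whole call stack). Following the figure, both branches are always produced; pruning infeasible branches is left to whatever decides satisfiability of the conditions.

```python
from dataclasses import dataclass, replace
from typing import List, Set, Tuple, Union


@dataclass(frozen=True)
class Machine:
    gas: int
    pc: int
    stack: Tuple[Union[int, str], ...]  # str entries stand for symbols
    cond: Tuple[str, ...]               # conjunction of path constraints


def valid_jump_dests(code: bytes) -> Set[int]:
    """Positions holding JUMPDEST (0x5b), skipping the data bytes of PUSHn."""
    dests: Set[int] = set()
    i = 0
    while i < len(code):
        op = code[i]
        if op == 0x5B:
            dests.add(i)
        if 0x60 <= op <= 0x7F:   # PUSH1 .. PUSH32
            i += op - 0x5F       # skip the n operand bytes
        i += 1
    return dests


def step_jumpi(m: Machine, code: bytes) -> List[Machine]:
    """Return the two successor machine states, or [] if a premise fails."""
    if m.gas < 10 or len(m.stack) < 2:
        return []
    target, guard, *rest = m.stack
    # The destination must be numeric and a valid jump destination.
    if not isinstance(target, int) or target not in valid_jump_dests(code):
        return []
    base = replace(m, gas=m.gas - 10, stack=tuple(rest))
    fall_through = replace(base, pc=m.pc + 1, cond=m.cond + (f"{guard} = 0",))
    jump = replace(base, pc=target, cond=m.cond + (f"{guard} != 0",))
    return [fall_through, jump]


# PUSH1 0x5b, JUMPI, STOP, JUMPDEST: the 0x5b at position 1 is push data.
code = bytes([0x60, 0x5B, 0x57, 0x00, 0x5B])
branches = step_jumpi(Machine(gas=100, pc=2, stack=(4, "v"), cond=()), code)
```

Here `branches` holds one state that continues at position 3 under the assumption `v = 0` and one that jumps to position 4 under `v != 0`.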

Storage management. Saving information in the storage of the currently executing account is done through SSTORE. This operation depends on two arguments that may be symbolic, position and content, and the gas cost varies according to the values of the arguments. Storing element b in position a, whose current content is x, has a cost of 20000 gas if x = 0 and b ≠ 0, and of 5000 gas otherwise: it is more expensive to write a non-zero value in an empty position. Therefore, it is necessary to consider up to four cases due to the differences in the µ.cond field and in the remaining gas. In any successful case the function stor of the currently executing account should be modified so that stor(a) = b. We chose to present in Figure 3.6 the rule for the case where there is enough gas for the cheapest case but not for the most expensive.

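\[
\frac{\begin{array}{c}
\omega_{\mu,\iota} = \mathtt{SSTORE} \quad \neg\mathit{valid}(\mu.\mathit{gas}, 20000, |s|) \quad \mathit{valid}(\mu.\mathit{gas}, 5000, |s|) \\
\mu.s = a :: b :: s \quad x = (\sigma(\iota.\mathit{actor}).\mathit{stor})(a) \quad \mathit{cond} = \mu.\mathit{cond} \\
\mu'_1 = \mu[\mathit{gas} \mathrel{-}= 5000][\mathit{pc} \mathrel{+}= 1][s \rightarrow s][\mathit{cond} \rightarrow \mu.\mathit{cond} \wedge x = 0 \wedge b = 0] \\
\mu'_3 = \mu[\mathit{gas} \mathrel{-}= 5000][\mathit{pc} \mathrel{+}= 1][s \rightarrow s][\mathit{cond} \rightarrow \mu.\mathit{cond} \wedge x \neq 0 \wedge b = 0] \\
\mu'_4 = \mu[\mathit{gas} \mathrel{-}= 5000][\mathit{pc} \mathrel{+}= 1][s \rightarrow s][\mathit{cond} \rightarrow \mu.\mathit{cond} \wedge x \neq 0 \wedge b \neq 0] \\
\sigma' = \sigma\langle \iota.\mathit{actor} \rightarrow \iota.\mathit{actor}[\mathit{stor} \rightarrow \sigma(\iota.\mathit{actor}).\mathit{stor}[a \rightarrow b]] \rangle \quad \eta' = \eta[\mathit{bal}_r \mathrel{+}= 15000]
\end{array}}
{\Gamma \models (\mu, \iota, \sigma, \eta) :: S \rightarrow
\begin{pmatrix}
(\mu'_1, \iota, \sigma', \eta) :: S \\
\mathit{EXC}(\mathit{cond} \wedge x = 0 \wedge b \neq 0) :: S \\
(\mu'_3, \iota, \sigma', \eta') :: S \\
(\mu'_4, \iota, \sigma', \eta) :: S
\end{pmatrix}}
\]

Figure 3.6: Rule for the execution of the SSTORE opcode in the symbolic machine where the available gas is between 5000 and 20000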

Notice that the balance refund η.bal_r only increases in the case where a previously written position is cleared. One of the call stacks records the exception associated with the corresponding condition. In the next step, each call stack will perform a step independently.
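The case analysis behind Figure 3.6 can be enumerated with a short Python sketch. The function name `sstore_cases` and the textual encoding of conditions are hypothetical; the gas costs (20000 and 5000) and the refund (15000) are the ones stated above, and we assume that a case raises an exception exactly when the available gas does not cover its cost.

```python
from typing import List, Optional, Tuple

# (added condition, outcome, gas cost if successful, refund added to bal_r)
Case = Tuple[str, str, Optional[int], int]


def sstore_cases(gas: int) -> List[Case]:
    """Enumerate the four SSTORE cases for current content x and new content b."""
    cases: List[Case] = []
    for x_zero in (True, False):
        for b_zero in (True, False):
            # Writing a non-zero value in an empty position is the expensive case.
            cost = 20000 if (x_zero and not b_zero) else 5000
            # Clearing a previously written position earns a refund.
            refund = 15000 if (not x_zero and b_zero) else 0
            cond = (f"x {'=' if x_zero else '!='} 0 and "
                    f"b {'=' if b_zero else '!='} 0")
            if gas >= cost:
                cases.append((cond, "ok", cost, refund))
            else:
                cases.append((cond, "EXC", None, 0))
    return cases


between = sstore_cases(10000)  # enough for 5000 but not for 20000
```

With 10000 units of gas, `between` reproduces the four call stacks of Figure 3.6: three regular ones and one exception for the case x = 0 and b ≠ 0.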

SLOAD accepts a symbolic argument, which corresponds to the position whose content should be retrieved, and pushes to the execution stack whatever is in that symbolic position. Naturally, since the symbolic machine is an isolated and experimental environment, a symbolic position of the storage only exists if it was stored during the same transaction.

Memory management. The semantics of the operation MSTORE reflect the assumptions made about the structure of the memory. It receives two arguments from the execution stack, a and b, and saves b in the memory as a 32-byte word: b is written in the 32 bytes from a to a + 31. a must be a number; b may be a symbol. If b is a symbol, it is stored in a different way: 32 copies of the symbol are written in the desired portion of the memory.

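\[
\frac{\begin{array}{c}
\omega_{\mu,\iota} = \mathtt{MSTORE} \quad c = C_{mem}(\mu.i, a_w) + 3 \quad \mathit{valid}(\mu.\mathit{gas}, c, |s|) \quad a_w = M(\mu.i, a, 32) \quad \mu.s = a :: b :: s \\
\mu' = \mu[\mathit{gas} \mathrel{-}= c][\mathit{pc} \mathrel{+}= 1][m \rightarrow \mu.m[[a, a + 31] \rightarrow b]][i \rightarrow a_w][s \rightarrow s]
\end{array}}
{\Gamma \models (\mu, \iota, \sigma, \eta) :: S \rightarrow (\mu', \iota, \sigma, \eta) :: S}
\]

\[
\frac{\begin{array}{c}
\omega_{\mu,\iota} = \mathtt{MSTORE} \quad c = C_{mem}(a_w) - C_{mem}(\mu.i) + 3 \quad \mathit{valid}(\mu.\mathit{gas}, c, |s|) \\
a_w = M(\mu.i, a, 32) \quad \mu.s = a :: b :: s \quad b \notin N_{256} \\
\mu' = \mu[\mathit{gas} \mathrel{-}= c][\mathit{pc} \mathrel{+}= 1][m \rightarrow \mu.m[a \rightarrow b] \ldots [a + 31 \rightarrow b]][i \rightarrow a_w][s \rightarrow s]
\end{array}}
{\Gamma \models (\mu, \iota, \sigma, \eta) :: S \rightarrow (\mu', \iota, \sigma, \eta) :: S}
\]

Figure 3.7: Two rules that concern the execution of the MSTORE opcode in the symbolic machine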
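The two ways of writing a word to memory can be illustrated with the following Python sketch. It is a simplified model under explicit assumptions: the function name `mstore` is hypothetical, the memory is a dictionary from byte positions to bytes, symbols are represented by strings, and gas accounting and the update of the number of active words are omitted.

```python
from typing import Dict, Union

Byte = Union[int, str]  # a str entry stands for a symbol


def mstore(memory: Dict[int, Byte], a: int, b: Union[int, str]) -> Dict[int, Byte]:
    """Return a copy of `memory` with b written in positions a .. a + 31."""
    new_memory = dict(memory)
    if isinstance(b, int):
        # Numeric word: split it into its 32 big-endian bytes.
        for k, byte in enumerate(b.to_bytes(32, "big")):
            new_memory[a + k] = byte
    else:
        # Symbolic word: write 32 copies of the same symbol.
        for k in range(32):
            new_memory[a + k] = b
    return new_memory


numeric = mstore({}, 0, 1)
symbolic = mstore({}, 4, "v")
```

Storing 32 copies of the symbol is what later allows a 32-byte read to recognise that the whole word is one symbolic value.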

Environmental information. Many environmental opcodes are getter methods, so their semantics are trivial to derive. We detail the rules for the very important operation CALLDATALOAD. It receives a single argument, a position, and pushes to the execution stack the number formed by the 32 bytes starting at that position of the input ι.input. Since it is assumed that the first four bytes of the input represent the function to be called, they must be numeric. The arguments of the function follow this reference, starting in positions that are congruent with 4 modulo 32. Each argument has a size of 32 bytes. If they are symbolic, they must be stored using 32 copies of the same symbol. The original specification states that this operation receives a numerical position a and returns the contents of ι.input from position a to position a + 31. The semantics of this operation in the symbolic machine result from a modification of the original ones in the following way:

• If a = 0, it checks if the first 4 bytes of ι.input are all numeric. If they are not, it raises an exception. If they are, it returns the number corresponding to those 4 bytes followed by 28 zero bytes.

This option was motivated by the fact that the only common reason to execute CALLDATALOAD in position 0 is to get the hash of the function, so the last 28 bytes were going to be discarded in the next step anyway; no relevant information is lost;

• If a ≠ 0, we require that a ≡ 4 mod 32, otherwise an exception is thrown. If the bytes from a to a + 31 are all numeric, the instruction pushes the corresponding number to the stack. If every byte is the same symbol, it pushes that symbol to the stack. In any other case, an exception is thrown.

The rule in Figure 3.8 covers the case where a = 0 and d is a list of numbers. If the input is less than 4 bytes long, the element pushed to the stack is padded on the right with the necessary number of zeros so that it has 32 bytes.

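\[
\frac{\begin{array}{c}
\omega_{\mu,\iota} = \mathtt{CALLDATALOAD} \\
\mu.s = 0 :: s \quad \mathit{valid}(\mu.\mathit{gas}, 3, |s| + 1) \quad k = \min(|\iota.\mathit{input}|, 4) \quad d = \iota.\mathit{input}[0, k - 1] \\
d' = d \cdot 0^{224 + 32 - k} \quad \mu' = \mu[\mathit{gas} \mathrel{-}= 3][\mathit{pc} \mathrel{+}= 1][s \rightarrow d' :: s]
\end{array}}
{\Gamma \models (\mu, \iota, \sigma, \eta) :: S \rightarrow (\mu', \iota, \sigma, \eta) :: S}
\]

Figure 3.8: Rule for the successful execution of the CALLDATALOAD opcode in the symbolic machine with argument equal to 0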

If ι.input[0, k − 1] contains a symbol, an exception is thrown. The rule in Figure 3.9 illustrates exceptional termination, with the symbol EXC being pushed to the call stack.

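\[
\frac{\begin{array}{c}
\omega_{\mu,\iota} = \mathtt{CALLDATALOAD} \\
\mu.s = 0 :: s \quad \mathit{valid}(\mu.\mathit{gas}, 3, |s| + 1) \quad k = \min(|\iota.\mathit{input}|, 4) \\
d = \iota.\mathit{input}[0, k - 1] \quad d \notin B
\end{array}}
{\Gamma \models (\mu, \iota, \sigma, \eta) :: S \rightarrow \mathit{EXC}(\mu.\mathit{cond}) :: S}
\]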

Figure 3.9: Rule for the unsuccessful execution of the CALLDATALOAD opcode in the symbolic machine with argument equal to 0

If the position is different from 0, the conditions that were previously stated are checked, as well as the size of the input. We only present here the rule where the request is successfully completed. The cases where at least one byte is a symbol and the 32 bytes are not equal, where the input is too small, where a ≢ 4 mod 32, where there is not enough gas, or where the execution stack underflows raise exceptions, and the corresponding rules are presented in Appendix A.

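\[
\frac{\begin{array}{c}
\omega_{\mu,\iota} = \mathtt{CALLDATALOAD} \\
\mu.s = a :: s \quad a \equiv 4 \bmod 32 \quad \mathit{valid}(\mu.\mathit{gas}, 3, |s| + 1) \quad |\iota.\mathit{input}| \geq a + 31 \\
\iota.\mathit{input}[a] = \cdots = \iota.\mathit{input}[a + 31] = d \quad d \notin B \quad \mu' = \mu[\mathit{gas} \mathrel{-}= 3][\mathit{pc} \mathrel{+}= 1][s \rightarrow d :: s]
\end{array}}
{\Gamma \models (\mu, \iota, \sigma, \eta) :: S \rightarrow (\mu', \iota, \sigma, \eta) :: S}
\]

Figure 3.10: One of the rules for the successful execution of the CALLDATALOAD opcode in the symbolic machine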
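The modified semantics of CALLDATALOAD can be summarised by the Python sketch below. It is an illustration under stated assumptions, not the implementation of this work: `calldataload` and `CalldataError` are hypothetical names, the input is a list whose entries are either numeric bytes or symbols (strings), gas and stack checks are omitted, and for a ≠ 0 we assume that the input must contain all 32 requested bytes.

```python
from typing import List, Union

Byte = Union[int, str]  # a str entry stands for a symbolic byte


class CalldataError(Exception):
    """Stands for the exceptional termination EXC of the symbolic machine."""


def calldataload(inp: List[Byte], a: int) -> Union[int, str]:
    if a == 0:
        head = inp[:4]  # function reference: must be numeric
        if any(not isinstance(x, int) for x in head):
            raise CalldataError("symbolic function reference")
        # Pad on the right with zero bytes up to 32 bytes.
        padded = list(head) + [0] * (32 - len(head))
        return int.from_bytes(bytes(padded), "big")
    if a % 32 != 4:
        raise CalldataError("position not congruent with 4 modulo 32")
    if len(inp) < a + 32:  # assumption: the whole word must be available
        raise CalldataError("input too small")
    word = inp[a:a + 32]
    if all(isinstance(x, int) for x in word):
        return int.from_bytes(bytes(word), "big")
    if all(x == word[0] for x in word):
        return word[0]  # 32 copies of the same symbol
    raise CalldataError("mixed numeric and symbolic bytes")


# Four numeric selector bytes followed by one symbolic 32-byte argument.
example_input: List[Byte] = [0xA9, 0x05, 0x9C, 0xBB] + ["v"] * 32
```

With this input, reading at position 0 yields the four selector bytes followed by 28 zero bytes, and reading at position 4 yields the symbol `v`.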

Interactions between contracts. The previous examples illustrate the behaviour of the system during a simple execution. Naturally, only the top element of the call stack is changed. When a call to another account is performed, a new element is pushed to the call stack. CALL is one of the most complex opcodes: it starts an internal transaction. It takes seven values from the execution stack: gas for the inner execution, address to be called, value to be transferred, input offset, input size, output offset, and output size. The input of this transaction is taken from the memory of the caller and the output will be written in the memory of the caller, in the specified positions and with the specified sizes. Since a new element is pushed to the call stack, it is necessary to check that it will not overflow. The success of the operation also depends on the relationship between the balance of the sender and the transferred value, and on the existence of enough gas.

The gas cost depends on the value to be sent, on the existence of the called account, and on the gas specified by the sender. Since we do not allow the gas to be symbolic, we only need to care about the value to determine the possible execution paths. There are three possibilities: the value is 0; the value is greater than the balance of the sender; or the value is different from zero and less than or equal to the balance of the sender. The second case triggers an exception by lack of funds; the first and the third cases need to be considered separately because the gas costs are different. In the third case we consider that the value is 1 in the auxiliary functions of the gas calculations because they return the same results for every non-zero value. The input may contain symbolic elements. The address of the recipient can be symbolic, but it should be noted that, if the stack element that represents the account is to, the account whose existence will be checked is to mod 2^160. It is possible to call this symbolic account if to mod 2^160 is an account of the global state σ or if the term isAddress(to) is a part of the field µ.cond. The following rule, in Figure 3.11, models the case where the called account exists. If it did not exist, the gas cost would be higher.

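\[
\frac{\begin{array}{c}
\omega_{\mu,\iota} = \mathtt{CALL} \quad \mu.s = g :: to :: va :: i_o :: i_s :: o_o :: o_s :: s \quad |S| + 1 \leq 1024 \\
to_a = to \bmod 2^{160} \quad \sigma(to_a) \neq \bot \quad a_w = M(M(\mu.i, i_o, i_s), o_o, o_s) \quad d = \mu.m[i_o, i_o + i_s - 1] \\
c_{call1} = C_{gascap}(0, 1, g, \mu.\mathit{gas}) \quad c_{call2} = C_{gascap}(1, 1, g, \mu.\mathit{gas}) \\
c_1 = C_{base}(0, 1) + C_{mem}(\mu.i) + c_{call1} \quad c_2 = C_{base}(1, 1) + C_{mem}(\mu.i) + c_{call2} \\
\mathit{valid}(\mu.\mathit{gas}, c_1, |s| + 1) \quad \mathit{valid}(\mu.\mathit{gas}, c_2, |s| + 1) \\
\sigma' = \sigma\langle to_a \rightarrow \sigma(to_a)[b \mathrel{+}= va] \rangle \langle \iota.\mathit{actor} \rightarrow \sigma(\iota.\mathit{actor})[b \mathrel{-}= va] \rangle \\
\mu'_1 = (c_{call1}, 0, \lambda x.0, 0, \varepsilon, \mu.rd, \mu.\mathit{cond} \wedge va = 0) \\
\mu'_2 = (c_{call2}, 0, \lambda x.0, 0, \varepsilon, \mu.rd, \mu.\mathit{cond} \wedge 0 < va \leq \sigma(\iota.\mathit{actor}).b) \\
\iota'_1 = \iota[\mathit{actor} \rightarrow to_a][\mathit{input} \rightarrow d][\mathit{sender} \rightarrow \iota.\mathit{actor}][\mathit{value} \rightarrow 0][\mathit{code} \rightarrow \sigma(to_a).\mathit{code}] \\
\iota'_2 = \iota[\mathit{actor} \rightarrow to_a][\mathit{input} \rightarrow d][\mathit{sender} \rightarrow \iota.\mathit{actor}][\mathit{value} \rightarrow va][\mathit{code} \rightarrow \sigma(to_a).\mathit{code}]
\end{array}}
{\Gamma \models (\mu, \iota, \sigma, \eta) :: S \rightarrow
\begin{pmatrix}
(\mu'_1, \iota'_1, \sigma, \eta) :: (\mu, \iota, \sigma, \eta) :: S \\
(\mu'_2, \iota'_2, \sigma', \eta) :: (\mu, \iota, \sigma, \eta) :: S \\
\mathit{EXC}(\mu.\mathit{cond} \wedge va > \sigma(\iota.\mathit{actor}).b) :: (\mu, \iota, \sigma, \eta) :: S
\end{pmatrix}}
\]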

Figure 3.11: Rule for the successful execution of the CALL opcode in the symbolic machine where the called account exists

As a result, the current call stack is replaced by three, two of which correspond to regular configurations. The elements pushed to those call stacks are of the form (µ', ι', σ', η). σ' is the same in both and it is equal to σ after the transfer of the value va from the account ι.actor to the account to_a. The conversion to_a = to mod 2^160 is needed because addresses are 160-bit numbers and execution stack elements are 256-bit numbers. ι'_1 and ι'_2 are the new execution environments: actor is the called contract to_a and code is its code, sender is the caller contract, value is either 0 or the symbol va, and input is the content of the memory of the caller between positions i_o and i_o + i_s − 1. The new machine states µ'_1 and µ'_2 contain the available gas c_call_i, which depends on whether the value is null or not, a program counter and a number of active words in memory equal to 0, and empty memory and execution stack, differing only in the formulas va = 0 and va ≠ 0 that they connect to their conditions.
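The address conversion and the three-way split on the transferred value can be sketched as follows. The names `to_address` and `call_paths` are hypothetical, conditions are encoded as plain strings, and the gas functions are deliberately left out because their definitions are given in Appendix A.

```python
from typing import List, Tuple

ADDRESS_SPACE = 2 ** 160  # addresses are 160-bit numbers


def to_address(to: int) -> int:
    """Truncate a 256-bit stack element to a 160-bit address."""
    return to % ADDRESS_SPACE


def call_paths(cond: Tuple[str, ...], va: str = "va",
               balance: str = "bal") -> List[Tuple[str, Tuple[str, ...]]]:
    """Return the three paths of CALL with their extended conditions."""
    return [
        ("inner call with value 0", cond + (f"{va} = 0",)),
        ("inner call with value va", cond + (f"0 < {va} <= {balance}",)),
        ("EXC", cond + (f"{va} > {balance}",)),  # lack of funds
    ]


paths = call_paths(("isAddress(to)",))
```

The three added conditions are mutually exclusive and cover every possible value of va, so no execution path is lost by the split.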

C_base : N_256 × {0,1} → N and C_gascap : N_256 × {0,1} × N_256 × N_256 → N are functions that help to compute the gas cost of a call operation and the gas that should be sent to the inner execution. These functions consider the available gas, the value, and a boolean value (flag) that indicates if the called account exists in order to determine the possible additional costs of the operation. C_gascap depends on the function C_extra : N_256 × {0,1} → N_256. Their values are higher if there is ether to be transferred or if the called account does not exist. Having a fixed, known available gas µ.gas, c_1 and c_2 are fixed values, and if the second argument, flag, were 0 they would also be fixed, but different, values. C_base, C_gascap, and C_extra are formally defined in Appendix A.

c_i (i = 1, 2) is the maximum amount of gas that can be spent in the inner execution, so it is checked that the caller has at least c_i available gas. We only subtract the gas used in the inner execution from the machine state of the caller when it finishes. This is not a problem since the execution of a transaction is linear: the caller does not execute or trigger the execution of any code while the called contract is still running.

In the next step, the execution of the transaction continues according to the top element of each call stack. The regular ones contain instructions to execute the code of to_a with input d. The values of the components of µ', ι', σ', and η may change during this inner execution. If that ends successfully, the relevant changes (condition cond of the machine state µ, global state σ and transaction effect η) will be passed on to the second element of the call stack and the top element will be popped. Otherwise, the top
