Nelma Moreira, David Pereira Simão Melo de Sousa Deciding Kleene algebra terms equivalence in Coq


Deciding Kleene algebra terms equivalence in Coq

Nelma Moreira, David Pereira, Simão Melo de Sousa

Abstract

This paper presents a mechanically verified implementation of an algorithm for deciding the equivalence of Kleene algebra terms within the Coq proof assistant. The algorithm decides equivalence of two given regular expressions through an iterated process of testing the equivalence of their partial derivatives and does not require the construction of the corresponding automata. Recent theoretical and experimental research provides evidence that this method is, on average, more efficient than the classical methods based on automata. We present some performance tests, comparisons with similar approaches, and also introduce a generalization of the algorithm to decide the equivalence of terms of Kleene algebra with tests. The motivation for the work presented in this paper is that of using the libraries developed as trusted frameworks for carrying out certified program verification.

Keywords: Proof assistants; Regular expressions; Kleene algebra with tests; Program verification

1. Introduction

Formal languages are one of the pillars of computer science. Amongst the several computational models of formal languages, that of regular expressions is one of the most widely known and used. The notion of regular expressions has its origins in the seminal work of Kleene, where the author introduced them as a specification language for deterministic finite automata (DFA) [1]. Nowadays, regular expressions find applications in a wide variety of areas due to their capability of expressing patterns in a succinct and comprehensive way. They abound in technologies deriving from the World Wide Web, in text processors, in structured languages such as XML, and are a core element of programming languages like Perl [2] and Esterel [3]. More recently, regular expressions have been successfully applied in the runtime verification of programs [4,5].

In the past years, much attention has been given to the mechanization of Kleene algebra (KA), the algebra of regular expressions, within proof assistants. Formally, a KA is an idempotent semiring together with the Kleene star operator ·⋆, that is characterized axiomatically. J.-C. Filliâtre [6] provided a first formalization of the Kleene theorem for regular languages [1] within the Coq proof assistant [7]. Höfner and Struth [8] investigated the automated reasoning in variants of Kleene algebras with Prover9 and Mace4 [9]. Pereira and Moreira [10] implemented in Coq an abstract specification of Kleene algebra with tests (KAT) [11] and the proofs that propositional Hoare logic deduction rules are theorems of KAT. An obvious follow-up of that work was to implement a certified procedure for deciding equivalence of KA terms, i.e., regular expressions. A first step was the proof of the correctness of the partial derivative automaton construction from a regular expression [12]. In this paper we describe the mechanization of a decision procedure based on partial derivatives that was proposed by Almeida et al. [13], and that is a functional variant of the rewrite system introduced by Antimirov and Mosses in [14]. This procedure decides regular expression equivalence through an iterated process of testing the equivalence of their partial derivatives.

Similar approaches based on the computation of a bisimulation between the two regular expressions were used recently. In 1971, Hopcroft and Karp [15] presented an almost linear algorithm for equivalence of two DFAs. By transforming regular expressions into equivalent DFAs, Hopcroft and Karp's method can be used for regular expression equivalence. A comparison of that method with the method proposed here is discussed by Almeida et al. [16,17]. There it is conjectured that a direct method should perform better on average, and that is corroborated by theoretical studies based on analytic combinatorics [18]. Hopcroft and Karp's method was used by Braibant and Pous [19] to formally verify Kozen's proof of the completeness of Kleene algebra [20] in Coq.

Independently of the work presented here, Coquand and Siles [21] mechanically verified an algorithm for deciding regular expression equivalence based on Brzozowski's derivatives [22] and an inductive definition of finite sets called Kuratowski-finite sets. Based on the same notion of derivative, Krauss and Nipkow [23] provide an elegant and concise formalization of Rutten's co-algebraic approach to regular expression equivalence [24] in the Isabelle proof assistant [25], but they do not address the termination of the decision procedure. Komendantsky provides a novel functional construction of the partial derivative automaton [26], and also made contributions [27] to the mechanization of concepts related to Mirkin's construction [28] of that automaton. More recently, Andrea Asperti formalized a decision procedure for the equivalence of pointed regular expressions [29] that is both compact and efficient.

Besides avoiding the need for building DFAs, our use of partial derivatives also avoids the necessary normalization of regular expressions modulo ACI (i.e., the normalization modulo associativity, commutativity and idempotence of the union of regular expressions) in order to ensure the finiteness of Brzozowski's derivatives. As in other approaches [19], our method also includes a refutation step that improves the detection of inequivalent regular expressions.

Although the algorithm we have chosen to verify seems straightforward, the process of its mechanical verification in a theorem prover based on a type theory raises several issues which are quite different from a usual implementation in standard programming languages. The Coq proof assistant allows users to specify and implement programs, and also to prove that the implemented programs are compliant with their specification. In this sense, the first task is the effort of formalizing the underlying algebraic theory. Afterwards, and in order to encode the decision procedure, we have to provide a formal proof of its termination since our procedure is a general recursive one, whereas Coq's type system accepts only provably terminating functions. Finally, a formal proof must be provided in order to ensure that the functional behavior of the implemented procedure is correct w.r.t. regular expression equivalence. Moreover, the encoding effort must be conducted with care in order to obtain a solution that is able to compute inside Coq, or be extracted and compiled as an OCaml development, both with reasonable performance.

1.1. Paper organization

This paper is organized as follows. In Section 2 we provide a concise introduction to the Coq proof assistant. In Section 3 we review some of the concepts of formal languages that we need to formalize in order to implement the decision procedure; in Section 4 we describe the formalization of the decision procedure, its proofs of correctness and completeness, and comment on the procedure's computational efficiency; in Section 5 we describe the generalization of the decision procedure to decide KAT terms equivalence, and show how this procedure is useful in program verification; finally, in Section 6 we present our conclusions about the work presented in this paper, and point to future research directions. The work presented here is an extended version of the work previously presented in [30,31], and the corresponding development in Coq is available at [32].

2. An overview of the Coq proof assistant

The Coq proof assistant [7] is an implementation of Paulin-Mohring's Calculus of Inductive Constructions (CIC) [33]. The CIC is a rich typed λ-calculus that features polymorphism, dependent types, and that extends Coquand and Huet's Calculus of Constructions (CC) [34] with very expressive (co-)inductive types.

The CIC is built upon the Curry–Howard Isomorphism (CHI) programs-as-proofs principle [35], where a typing relation t : A is interpreted either as a term t that has the type A, or as t being a proof of the proposition A. Hence, the CIC is simultaneously a functional programming language with a very expressive type system and a higher-order logic, and so, users can define specifications of programs, and also build proofs concerning those specifications.
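The two readings of a typing relation t : A can be illustrated with a one-line term. This is only a sketch; the name `modus_ponens` is chosen here for illustration and does not come from the development in [32].

```coq
(* Read as a program: apply the function f to the argument a.
   Read as a proof: from a proof of A and a proof of A -> B, obtain a proof of B. *)
Definition modus_ponens (A B : Prop) (a : A) (f : A -> B) : B := f a.

(* The type of the term is the statement it proves. *)
Check modus_ponens.
```

The same term is accepted whether it is introduced with `Definition`, as here, or stated as a `Lemma` and proved interactively.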

In the CIC there exists no distinction between terms and types. Therefore, all types also have their own type, called a sort, and each sort belongs to the well-formed set S = {Prop, Set, Type(i) | i ∈ ℕ}, where Type(i) is the type of smaller sorts Type(j) with j < i, including the sorts Prop and Set which ensure a strict separation between logical types and informative types: the former is the type of propositions and proofs, whereas the latter accommodates data types and functions defined over those data types. An immediate effect of the non-existing distinction between types and terms in CIC is that computations occur both in programs and in proofs. A fundamental feature of Coq's underlying type system is the support for dependent product types Πx : A. B, which extend functional types A → B in the sense that Πx : A. B is the type of functions that map each x of type A to a term of type B, where x may occur in B. If x does not occur in B then the dependent product corresponds to the function type A → B.
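As a small sketch of a dependent product, consider a function whose result type is computed from its argument. The names `nat_or_bool` and `pick` are hypothetical and serve only as an illustration.

```coq
(* A type computed from a Boolean value. *)
Definition nat_or_bool (b : bool) : Set :=
  if b then nat else bool.

(* The result type mentions the argument b, so this is a genuine
   dependent product and not a plain function type bool -> T. *)
Definition pick : forall b : bool, nat_or_bool b :=
  fun b =>
    match b return nat_or_bool b with
    | true => 0
    | false => true
    end.

(* When the bound variable does not occur in the codomain, the product
   degenerates to an ordinary arrow type. *)
Definition negate : forall b : bool, bool := negb.

Compute pick true.
```

The `return` clause in `pick` tells the type-checker how the type of each branch depends on the matched value.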

Inductive definitions are a key ingredient of Coq. Inductive types are introduced by a collection of constructors, each with its own arity. A term of an inductive type is a composition of such constructors and if T is the type under consideration, then its constructors are functions whose final type is T, or an application of T to arguments. Using pattern matching, we can implement recursive functions by deconstructing the given term and producing new terms for each constructor. For instance, it is straightforward to define Peano natural numbers and a function plus that implements addition on these numbers:

    Inductive nat : Set :=
      | O : nat
      | S : nat -> nat.

    Fixpoint plus (n m : nat) : nat :=
      match n with
      | O => m
      | S p => S (p + m)
      end
    where "n + m" := (plus n m).

The definition of plus is accepted by Coq's type-checker because it exhaustively pattern-matches over all the constructors of nat, and because the recursive calls are performed on terms that are structurally smaller than the recursive argument. This is a strong requirement of CIC that forces all functions to be terminating.

We can define inductive types that are more complex than nat, namely, inductive types that depend on values. A classic example is the family of vectors of length n ∈ ℕ, whose elements have a type A:

    Inductive vect (A : Type) : nat -> Type :=
      | vnil : vect A 0
      | vcons : forall n : nat, A -> vect A n -> vect A (S n).

Given the definition of vect, we can define the concatenation of vectors, as follows:

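    (* A is passed explicitly and constructor parameters are matched with _,
       as required by recent versions of Coq. *)
    Fixpoint app (A : Type) (n : nat) (l1 : vect A n) (n' : nat) (l2 : vect A n')
        {struct l1} : vect A (n + n') :=
      match l1 in (vect _ m') return (vect A (m' + n')) with
      | vnil _ => l2
      | vcons _ n0 v l1' => vcons A (n0 + n') v (app A n0 l1' n' l2)
      end.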

Note that there is a difference between the pattern-matching construction used in the definition of plus and the one used to implement app: in the latter, the returning type depends on the sizes of the vectors given as arguments; therefore, the extended match construction in app has to bind the dependent argument m' to ensure that the final return type is a vector whose size is n + n'.

In Coq's environment, the primitive way to construct a proof is to explicitly build CIC terms. However, proofs can be built more conveniently, in an interactive and backward fashion, through the usage of high-level commands called tactics. The CIC terms built by tactics are always verified by Coq's type checker, which ensures that possible errors in the tactics do not interfere with the soundness of the proof construction process.
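The relation between tactics and the underlying CIC terms can be observed on a small lemma. The lemma `and_swap` below is a hypothetical example chosen for illustration; it is not part of the development in [32].

```coq
Lemma and_swap : forall A B : Prop, A /\ B -> B /\ A.
Proof.
  intros A B H.          (* move the hypotheses into the context *)
  destruct H as [a b].   (* split the conjunction into its two proofs *)
  split; assumption.     (* prove each half of the goal from the context *)
Qed.

(* Display the CIC term that the tactics assembled; it is this term,
   and not the tactic script, that the type checker validates at Qed. *)
Print and_swap.
```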

We finish our brief introduction to Coq by addressing the development of non-structurally recursive functions. Above we have seen pattern matching over (dependent) inductive types, whose decreasing criterion is structural recursion. However, this approach is not always possible and the way to deal with this problem is via an encoding of the original formulation into an equivalent function that is structurally recursive. There are several techniques available to address the development of non-structurally decreasing functions in Coq, which are described in detail in [7]; here we will consider the method for defining well-founded recursive functions.

A given binary relation R over a set S is said to be well-founded if for all elements x ∈ S, there exists no infinite sequence (x, x₀, x₁, x₂, ...) of elements of S such that (xᵢ₊₁, xᵢ) ∈ R, for all i ∈ ℕ. Well-founded relations are available in Coq through the definition of the inductive predicate Acc and the predicate well_founded:

    Inductive Acc (A : Type) (R : A -> A -> Prop) (x : A) : Prop :=
      | Acc_intro : (forall y : A, R y x -> Acc A R y) -> Acc A R x.

Since the type Acc is inductively defined, we can use it as the structurally recursive argument in the definition of a function. Thankfully, Coq provides a high-level command named Function [36] that eases the burden of manually constructing a recursive function over Acc predicates. The command Function allows users to explicitly state that the target function is going to be defined over a proof that asserts that the underlying recursive measure is well-founded.
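A minimal sketch of the Function command follows, under the assumption that the standard `Recdef` and `Lia` libraries are available. The function `half_up` is hypothetical: its recursive call is on `m - 1`, which is not a structural subterm of the argument, so a plain `Fixpoint` would be rejected. The `measure` annotation asks the user to prove that the argument strictly decreases.

```coq
Require Import Recdef Lia.

(* Counts how many times 2 can be subtracted before reaching 0,
   rounding up. The measure is the argument itself. *)
Function half_up (n : nat) {measure (fun k => k) n} : nat :=
  match n with
  | 0 => 0
  | S m => S (half_up (m - 1))
  end.
Proof.
  (* One obligation per recursive call: the measure of the new
     argument is smaller than the measure of the old one. *)
  intros; cbv beta; lia.
Defined.
```

The obligation generated here is expected to have the shape m - 1 < S m; for the decision procedure discussed in this paper the corresponding termination argument is considerably more involved.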

For further information about the details of the Coq proof assistant, we point the reader to the works of Bertot and Casterán [7], of Chlipala [37], and of Pierce et al. [38].

3. Preliminaries of formal languages

In this section we introduce some classic concepts of formal languages that we will need in the work we are about to describe. These concepts can be found in the introductory chapters of classical textbooks such as the one by Hopcroft and Ullman [39] or the one by Kozen [40]. The encoding in Coq of the several definitions that we are about to introduce can be seen in [31].

3.1. Alphabets, words and languages

An alphabet Σ is a non-empty finite set of objects usually called symbols (or letters). A word (or string) over an alphabet Σ is a finite sequence of symbols from Σ. A language is any finite or infinite set of words over an alphabet Σ. Given an alphabet Σ, the set of all words over Σ, denoted by Σ⋆, is inductively defined as follows: the empty word ϵ is an element of Σ⋆ and, if w ∈ Σ⋆ and a ∈ Σ, then aw is also a member of Σ⋆. The constant languages are the empty language, the language containing only ϵ, and the language containing only a symbol a ∈ Σ. The operations over languages include the usual Boolean set operations (union, intersection, and complement), plus concatenation, power and Kleene star. The concatenation of two languages L₁ and L₂ is defined by L₁L₂ = {wu | w ∈ L₁ ∧ u ∈ L₂}. The power of a language L, denoted by Lⁿ, with n ∈ ℕ, is inductively defined by L⁰ = {ϵ}, and Lⁿ⁺¹ = LLⁿ, for n ∈ ℕ. The Kleene star of a language L is the union of all the finite powers of L, that is,

$$L^{\star} = \bigcup_{i \geq 0} L^{i}. \tag{1}$$

We denote language equality by L₁ = L₂. Finally, we introduce the concept of the left-quotient of a language L with respect to a word w ∈ Σ⋆, which is defined as D_w(L) = {v | wv ∈ L}. In particular, if w = a, with a ∈ Σ, we say that D_a(L) is the left-quotient of L with respect to the symbol a.
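The definitions of this subsection can be sketched in Coq by representing a language as a predicate over words. This is one possible encoding, assumed here for illustration; the names `word`, `language`, `conc`, `power`, `star` and `left_quotient` are hypothetical, and the actual encoding used by the authors is the one in [31].

```coq
Require Import List.
Import ListNotations.

Module LanguageSketch.
Section Languages.

  Variable Sigma : Type.                 (* the alphabet *)

  Definition word := list Sigma.         (* finite sequences of symbols *)
  Definition language := word -> Prop.   (* a set of words, as a predicate *)

  (* L1 L2 = { w u | w in L1 and u in L2 } *)
  Definition conc (L1 L2 : language) : language :=
    fun w => exists u v : word, w = u ++ v /\ L1 u /\ L2 v.

  (* L^0 = { empty word },  L^(n+1) = L L^n *)
  Fixpoint power (L : language) (n : nat) : language :=
    match n with
    | 0 => fun w => w = []
    | S k => conc L (power L k)
    end.

  (* Equation (1): the union of all finite powers of L *)
  Definition star (L : language) : language :=
    fun w => exists n : nat, power L n w.

  (* D_w(L) = { v | w v in L } *)
  Definition left_quotient (w : word) (L : language) : language :=
    fun v => L (w ++ v).

  (* The empty word belongs to the Kleene star of any language. *)
  Lemma nil_in_star : forall L : language, star L [].
  Proof.
    intros L. unfold star. exists 0. simpl. reflexivity.
  Qed.

End Languages.
End LanguageSketch.
```

With this encoding, language equality L₁ = L₂ is most naturally stated as pointwise equivalence of the two predicates rather than as Coq's propositional equality.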

3.2. Regular expressions

Regular expressions are inductively defined over an alphabet Σ, as follows: the constants 0 and 1 are regular expressions; all the symbols a ∈ Σ are regular expressions; if α and β are regular expressions, then their union α + β and their concatenation αβ are regular expressions as well; finally, if α is a regular expression, then so is its Kleene star α⋆. The syntactic equality of two regular expressions α and β is denoted by α ≡ β. The set of all regular expressions over an alphabet Σ is the set RE_Σ. The length of a regular expression α is the total number of constants, symbols and operators of α; the alphabetic length of a regular expression α is the total number of occurrences of symbols of Σ in α. The previous two measures are denoted by |α| and by |α|_Σ, respectively.

Regular expressions denote regular languages. The language of a regular expression α, denoted ℒ(α), is inductively defined in the expected way: the languages of the constants 0 and 1 are, respectively, the sets ∅ and {ϵ}; the language of the regular expression a, with a ∈ Σ, is the set {a}; if α and β are regular expressions, then the languages denoted by the expressions α + β, αβ, and α⋆ are, respectively, the languages ℒ(α) ∪ ℒ(β), ℒ(α)ℒ(β), and ℒ(α)⋆. The language of a finite set of regular expressions S is defined by

$$\mathcal{L}(S) = \bigcup_{\alpha_i \in S} \mathcal{L}(\alpha_i).$$

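Two regular expressions α and β are said to be equivalent if they denote the same language, and we write α ∼ β whenever that is the case.¹ Naturally, two sets of regular expressions S₁ and S₂ are equivalent if ℒ(S₁) = ℒ(S₂), and we write S₁ ∼ S₂. Given a set of regular expressions S = {α₁, α₂, ..., αₙ} we define

$$\sum S = \alpha_1 + \alpha_2 + \dots + \alpha_n,$$

whose language is

$$\mathcal{L}\Bigl(\sum S\Bigr) = \mathcal{L}(\alpha_1) \cup \mathcal{L}(\alpha_2) \cup \dots \cup \mathcal{L}(\alpha_n).$$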

We say that a regular expression α is nullable if ϵ ∈ ℒ(α) and non-nullable otherwise. Moreover, we consider the Boolean function ε(·) such that ε(α) = true if and only if ϵ ∈ ℒ(α) holds. Nullability extends to sets of regular expressions in a straightforward way: a set S is nullable if ε(α) = true for at least one α ∈ S. We denote the nullability of a set of regular expressions S by ε(S). Two sets of regular expressions S₁ and S₂ are equi-nullable if ε(S₁) = ε(S₂). We also consider the right-concatenation S ⊙ α of a regular expression α with a set of regular expressions S, which is defined as follows: S ⊙ α = ∅ if α ≡ 0, S ⊙ α = S if α ≡ 1, and S ⊙ α = {βα | β ∈ S} otherwise. We usually omit the operator ⊙ and write Sα instead.
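The syntax of regular expressions, the nullability test and the right-concatenation admit a direct functional reading. The sketch below makes two assumptions that are not taken from the paper: symbols are natural numbers, and finite sets of regular expressions are represented by lists (possibly with repetitions). All names are hypothetical.

```coq
Require Import List.
Import ListNotations.

Module NullableSketch.

  (* Regular expressions over an alphabet of natural numbers. *)
  Inductive re : Type :=
  | Zero : re                 (* the constant 0 *)
  | One  : re                 (* the constant 1 *)
  | Sym  : nat -> re          (* a symbol of the alphabet *)
  | Alt  : re -> re -> re     (* union *)
  | Cat  : re -> re -> re     (* concatenation *)
  | Star : re -> re.          (* Kleene star *)

  (* The Boolean nullability function on a single expression. *)
  Fixpoint nullable (r : re) : bool :=
    match r with
    | Zero => false
    | One => true
    | Sym _ => false
    | Alt r1 r2 => orb (nullable r1) (nullable r2)
    | Cat r1 r2 => andb (nullable r1) (nullable r2)
    | Star _ => true
    end.

  (* A set is nullable when at least one of its members is. *)
  Definition nullable_set (s : list re) : bool := existsb nullable s.

  (* Right-concatenation of a set s with an expression a. *)
  Definition rconc (s : list re) (a : re) : list re :=
    match a with
    | Zero => []
    | One => s
    | _ => map (fun b => Cat b a) s
    end.

  Example star_is_nullable : nullable (Star (Sym 0)) = true.
  Proof. reflexivity. Qed.

  Example set_is_nullable : nullable_set [Sym 0; Alt One (Sym 1)] = true.
  Proof. reflexivity. Qed.

  Example rconc_by_symbol :
    rconc [Sym 0; One] (Sym 1) = [Cat (Sym 0) (Sym 1); Cat One (Sym 1)].
  Proof. reflexivity. Qed.

End NullableSketch.
```

Lists are used only for brevity; a verified development would need a set representation with decidable membership and no duplicates in order to reason about cardinalities.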

¹ As the reader will notice, we overload the notation "∼" whenever equivalence by means of language equality is considered.


3.3. Derivatives of regular expressions

The notion of derivative of a regular expression α was introduced by Brzozowski in the 1960s [22], and was motivated by the construction of sequential circuits directly from regular expressions extended with intersection and complement. In the same decade, Mirkin introduced the notion of prebase and base of a regular expression as a method to construct non-deterministic finite automata (NFA) that recognize the corresponding languages [28]. Mirkin's definition is a generalization of Brzozowski's derivatives for NFA and was independently re-discovered almost thirty years later by Antimirov [41], who coined it as the partial derivatives of a regular expression.

Let α be a regular expression and let a ∈ Σ. The set ∂_a(α) of partial derivatives of the regular expression α with respect to a is inductively defined as follows:

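$$
\begin{aligned}
\partial_a(0) &= \emptyset, \\
\partial_a(1) &= \emptyset, \\
\partial_a(b) &= \begin{cases} \{\varepsilon\} & \text{if } a \equiv b, \\ \emptyset & \text{otherwise,} \end{cases} \\
\partial_a(\alpha + \beta) &= \partial_a(\alpha) \cup \partial_a(\beta), \\
\partial_a(\alpha\beta) &= \begin{cases} \partial_a(\alpha)\beta \cup \partial_a(\beta) & \text{if } \varepsilon(\alpha) = \mathsf{true}, \\ \partial_a(\alpha)\beta & \text{otherwise,} \end{cases} \\
\partial_a(\alpha^{\star}) &= \partial_a(\alpha)\alpha^{\star}.
\end{aligned}
$$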

The operation of partial derivation naturally extends to a set of regular expressions S as follows:

$$\partial_a(S) = \bigcup_{\alpha \in S} \partial_a(\alpha).$$

The language of the set of partial derivatives ∂_a(α) is the left-quotient of ℒ(α), i.e., ℒ(∂_a(α)) = D_a(ℒ(α)). The set of partial derivatives is extended to words in the following way: given a regular expression α and a word w ∈ Σ⋆, the partial derivative ∂_w(α) of α with respect to w is defined inductively by ∂_ε(α) = {α}, and ∂_wa(α) = ∂_a(∂_w(α)). We can use partial derivatives and nullability of regular expressions to determine if a word w ∈ Σ⋆ is a member of some language ℒ(α). For that, it is enough to check the value computed by ε(∂_w(α)): if ε(∂_w(α)) = true then we have w ∈ ℒ(α); otherwise, w ∉ ℒ(α) holds.
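The membership test just described can be sketched as an executable function. The code below is an illustration under stated assumptions and is not the verified implementation of [32]: symbols are natural numbers, sets are lists that may contain repetitions, the singleton {ε} in the derivative of a symbol is read as the set containing the constant 1, and all names are hypothetical.

```coq
Require Import List.
Import ListNotations.

Module PartialDerivativeSketch.

  Inductive re : Type :=
  | Zero : re
  | One  : re
  | Sym  : nat -> re
  | Alt  : re -> re -> re
  | Cat  : re -> re -> re
  | Star : re -> re.

  Fixpoint nullable (r : re) : bool :=
    match r with
    | Zero => false
    | One => true
    | Sym _ => false
    | Alt r1 r2 => orb (nullable r1) (nullable r2)
    | Cat r1 r2 => andb (nullable r1) (nullable r2)
    | Star _ => true
    end.

  (* Right-concatenation of a set with an expression (Section 3.2). *)
  Definition rconc (s : list re) (b : re) : list re :=
    match b with
    | Zero => []
    | One => s
    | _ => map (fun x => Cat x b) s
    end.

  (* Partial derivatives with respect to a symbol a. *)
  Fixpoint pd (a : nat) (r : re) : list re :=
    match r with
    | Zero => []
    | One => []
    | Sym b => if Nat.eqb a b then [One] else []
    | Alt r1 r2 => pd a r1 ++ pd a r2
    | Cat r1 r2 =>
        if nullable r1
        then rconc (pd a r1) r2 ++ pd a r2
        else rconc (pd a r1) r2
    | Star r1 => rconc (pd a r1) (Star r1)
    end.

  (* Extension to sets, then to words read from left to right. *)
  Definition pd_set (a : nat) (s : list re) : list re := flat_map (pd a) s.

  Definition pd_word (w : list nat) (r : re) : list re :=
    fold_left (fun s a => pd_set a s) w [r].

  (* w is in the language of r iff some word derivative is nullable. *)
  Definition matches (r : re) (w : list nat) : bool :=
    existsb nullable (pd_word w r).

  (* With a = 0 and b = 1: abb belongs to the language of ab*, b does not. *)
  Example abb_in_ab_star :
    matches (Cat (Sym 0) (Star (Sym 1))) [0; 1; 1] = true.
  Proof. reflexivity. Qed.

  Example b_not_in_ab_star :
    matches (Cat (Sym 0) (Star (Sym 1))) [1] = false.
  Proof. reflexivity. Qed.

End PartialDerivativeSketch.
```

Because `rconc` applies no simplification beyond the cases for 0 and 1, this sketch may produce terms such as 1b⋆ where a hand computation would write b⋆; the two denote the same language, so the nullability test is unaffected.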

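Example 1. The word derivative of the regular expression α = ab⋆ with respect to abb is given by the following computation:

$$
\begin{aligned}
\partial_{abb}(\alpha) &= \partial_b(\partial_b(\partial_a(ab^{\star}))) \\
&= \partial_b(\partial_b(\partial_a(a)b^{\star})) \\
&= \partial_b(\partial_b(\{b^{\star}\})) \\
&= \partial_b(\partial_b(b)b^{\star}) \\
&= \partial_b(\{b^{\star}\}) \\
&= \{b^{\star}\}.
\end{aligned}
$$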

From the nullability of the resulting set of regular expressions {b⋆}, we easily conclude that abb ∈ ℒ(α) since ε(b⋆) = true. Finally, we present the set of partial derivatives of a given regular expression α, which is defined by

$$PD(\alpha) = \bigcup_{w \in \Sigma^{\star}} \partial_w(\alpha).$$

Antimirov proved in [41] that given a regular expression α, the set PD(α) is always finite and its cardinality has an upper bound of |α|_Σ + 1. Champarnaud and Ziadi [42] introduced an elegant recursive function for calculating the support of a given regular expression α, from which it is easy to calculate PD(α). The function, denoted by π(α), is recursively defined as follows:

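$$
\begin{aligned}
\pi(0) &= \emptyset, & \pi(\alpha + \beta) &= \pi(\alpha) \cup \pi(\beta), \\
\pi(1) &= \emptyset, & \pi(\alpha\beta) &= \pi(\alpha)\beta \cup \pi(\beta), \\
\pi(a) &= \{\varepsilon\}, & \pi(\alpha^{\star}) &= \pi(\alpha)\alpha^{\star}.
\end{aligned}
$$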

Champarnaud and Ziadi proved that PD(α) = {α} ∪ π(α) holds for all regular expressions α, and once again we conclude that |PD(α)| ≤ |α|_Σ + 1.