Deciding Kleene algebra terms equivalence in Coq
Nelma Moreira, David Pereira, Simão Melo de Sousa
Abstract
This paper presents a mechanically verified implementation of an algorithm for deciding the equivalence of Kleene algebra terms within the Coq proof assistant. The algorithm decides the equivalence of two given regular expressions through an iterated process of testing the equivalence of their partial derivatives, and does not require the construction of the corresponding automata. Recent theoretical and experimental research provides evidence that this method is, on average, more efficient than the classical methods based on automata. We present some performance tests and comparisons with similar approaches, and also introduce a generalization of the algorithm to decide the equivalence of terms of Kleene algebra with tests. The motivation for the work presented in this paper is to use the libraries developed as trusted frameworks for carrying out certified program verification.
Keywords: Proof assistants; Regular expressions; Kleene algebra with tests; Program verification
1. Introduction
Formal languages are one of the pillars of computer science. Amongst the several computational models of formal languages, that of regular expressions is one of the most widely known and used. The notion of regular expressions has its origins in the seminal work of Kleene, where the author introduced them as a specification language for deterministic finite automata (DFA) [1]. Nowadays, regular expressions find applications in a wide variety of areas due to their capability of expressing patterns in a succinct and comprehensive way. They abound in technologies deriving from the World Wide Web, in text processors, in structured languages such as XML, and are a core element of programming languages like Perl [2] and Esterel [3]. More recently, regular expressions have been successfully applied in the runtime verification of programs [4,5].
In the past years, much attention has been given to the mechanization of Kleene algebra (KA), the algebra of regular expressions, within proof assistants. Formally, a KA is an idempotent semiring together with the Kleene star operator ·⋆, which is characterized axiomatically. J.-C. Filliâtre [6] provided a first formalization of the Kleene theorem for regular languages [1] within the Coq proof assistant [7]. Höfner and Struth [8] investigated the automated reasoning in variants of Kleene algebras with Prover9 and Mace4 [9]. Pereira and Moreira [10] implemented in Coq an abstract specification of Kleene algebra with tests (KAT) [11] and the proofs that propositional Hoare logic deduction rules are theorems of KAT. An obvious follow-up of that work was to implement a certified procedure for deciding equivalence of KA terms, i.e., regular expressions. A first step was the proof of the correctness of the partial derivative automaton construction from a regular expression [12]. In this paper we describe the mechanization of a decision procedure based on partial derivatives that was proposed by Almeida et al. [13], and that is a functional variant of the rewrite system introduced by Antimirov and Mosses in [14]. This procedure decides regular expression equivalence through an iterated process of testing the equivalence of their partial derivatives.
Similar approaches based on the computation of a bisimulation between the two regular expressions were used recently. In 1971, Hopcroft and Karp [15] presented an almost linear algorithm for the equivalence of two DFAs. By transforming regular expressions into equivalent DFAs, Hopcroft and Karp's method can be used for regular expression equivalence. A comparison of that method with the method proposed here is discussed by Almeida et al. [16,17]. There it is conjectured that a direct method should perform better on average, and that is corroborated by theoretical studies based on analytic combinatorics [18]. Hopcroft and Karp's method was used by Braibant and Pous [19] to formally verify Kozen's proof of the completeness of Kleene algebra [20] in Coq.
Independently of the work presented here, Coquand and Siles [21] mechanically verified an algorithm for deciding regular expression equivalence based on Brzozowski's derivatives [22] and an inductive definition of finite sets called Kuratowski-finite sets. Based on the same notion of derivative, Krauss and Nipkow [23] provide an elegant and concise formalization of Rutten's co-algebraic approach to regular expression equivalence [24] in the Isabelle proof assistant [25], but they do not address the termination of the decision procedure. Komendantsky provides a novel functional construction of the partial derivative automaton [26], and also made contributions [27] to the mechanization of concepts related to Mirkin's construction [28] of that automaton. More recently, Andrea Asperti formalized a decision procedure for the equivalence of pointed regular expressions [29] that is both compact and efficient.
Besides avoiding the need for building DFAs, our use of partial derivatives also avoids the normalization of regular expressions modulo ACI (i.e., the normalization modulo associativity, commutativity and idempotence of the union of regular expressions) that is otherwise necessary to ensure the finiteness of Brzozowski's derivatives. As in other approaches [19], our method also includes a refutation step that improves the detection of inequivalent regular expressions.
Although the algorithm we have chosen to verify seems straightforward, the process of its mechanical verification in a theorem prover based on a type theory raises several issues which are quite different from a usual implementation in standard programming languages. The Coq proof assistant allows users to specify and implement programs, and also to prove that the implemented programs are compliant with their specification. In this sense, the first task is the effort of formalizing the underlying algebraic theory. Afterwards, and in order to encode the decision procedure, we have to provide a formal proof of its termination, since our procedure is a general recursive one, whereas Coq's type system accepts only provably terminating functions. Finally, a formal proof must be provided in order to ensure that the functional behavior of the implemented procedure is correct w.r.t. regular expression equivalence. Moreover, the encoding effort must be conducted with care in order to obtain a solution that can be computed inside Coq, or extracted and compiled as an OCaml development, in both cases with reasonable performance.
1.1. Paper organization
This paper is organized as follows. In Section 2 we provide a concise introduction to the Coq proof assistant. In Section 3 we review some of the concepts of formal languages that we need to formalize in order to implement the decision procedure; in Section 4 we describe the formalization of the decision procedure, its proofs of correctness and completeness, and comment on the procedure's computational efficiency; in Section 5 we describe the generalization of the decision procedure to decide KAT terms equivalence, and show how this procedure is useful in program verification; finally, in Section 6 we present our conclusions about the work presented in this paper, and point to future research directions. The work presented here is an extended version of the work previously presented in [30,31], and the corresponding development in Coq is available at [32].
2. An overview of the Coq proof assistant
The Coq proof assistant [7] is an implementation of Paulin-Mohring's Calculus of Inductive Constructions (CIC) [33]. The CIC is a rich typed λ-calculus that features polymorphism and dependent types, and that extends Coquand and Huet's Calculus of Constructions (CC) [34] with very expressive (co-)inductive types.
The CIC is built upon the Curry–Howard Isomorphism (CHI) programs-as-proofs principle [35], where a typing relation t : A is interpreted either as a term t that has the type A, or as t being a proof of the proposition A. Hence, the CIC is simultaneously a functional programming language with a very expressive type system and a higher-order logic, and so users can define specifications of programs, and also build proofs concerning those specifications.
In the CIC there exists no distinction between terms and types. Therefore, all types also have their own type, called a sort, and each sort belongs to the well-formed set S = {Prop, Set, Type(i) | i ∈ N}, where Type(i) is the type of smaller sorts Type(j) with j < i, including the sorts Prop and Set, which ensure a strict separation between logical types and informative types: the former is the type of propositions and proofs, whereas the latter accommodates data types and functions defined over those data types. An immediate effect of the non-existing distinction between types and terms in CIC is that computations occur both in programs and in proofs. A fundamental feature of Coq's underlying type system is the support for dependent product types Πx:A.B, which extend functional types A → B in the sense that Πx:A.B is the type of functions that map each instance x of type A to a type B in which x may occur. If x does not occur in B, then the dependent product corresponds to the function type A → B.
Inductive definitions are a key ingredient of Coq. Inductive types are introduced by a collection of constructors, each with its own arity. A term of an inductive type is a composition of such constructors and, if T is the type under consideration, then its constructors are functions whose final type is T, or an application of T to arguments. Using pattern matching, we can implement recursive functions by deconstructing the given term and producing new terms for each constructor. For instance, it is straightforward to define Peano natural numbers and a function plus that implements addition on these numbers:
    Inductive nat : Set :=
      | O : nat
      | S : nat -> nat.

    Fixpoint plus (n m : nat) : nat :=
      match n with
      | O => m
      | S p => S (p + m)
      end
    where "n + m" := (plus n m).
The definition of plus is accepted by Coq's type-checker because it exhaustively pattern-matches over all the constructors of nat, and because the recursive calls are performed on terms that are structurally smaller than the recursive argument. This is a strong requirement of CIC that forces all functions to be terminating.
We can define inductive types that are more complex than nat, namely, inductive types that depend on values. A classic example is the family of vectors of length n ∈ N, whose elements have a type A:
    Inductive vect (A : Type) : nat -> Type :=
      | vnil : vect A 0
      | vcons : forall n : nat, A -> vect A n -> vect A (S n).
Given the definition of vect, we can define the concatenation of vectors, as follows:
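    (* A is assumed to be in scope, e.g. declared as a section variable.
       Recent Coq versions expect a wildcard for the parameter A in patterns. *)
    Fixpoint app (n : nat) (l1 : vect A n) (n' : nat) (l2 : vect A n') {struct l1}
      : vect A (n + n') :=
      match l1 in (vect _ m') return (vect A (m' + n')) with
      | vnil _ => l2
      | vcons _ n0 v l1' => vcons A (n0 + n') v (app n0 l1' n' l2)
      end.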
Note that there is a difference between the pattern-matching construction used in the definition of plus and the one used to implement app: in the latter, the return type depends on the sizes of the vectors given as arguments; therefore, the extended match construction in app has to bind the dependent argument m′ to ensure that the final return type is a vector whose size is n + n′.
In Coq's environment, the primitive way to construct a proof is to explicitly build CIC terms. However, proofs can be built more conveniently, in an interactive and backward fashion, through the usage of high-level commands called tactics. The CIC terms built by tactics are always verified by Coq's type checker, which ensures that possible errors in the tactics do not interfere with the soundness of the proof construction process.
We finish our brief introduction to Coq by addressing the development of non-structurally recursive functions. Above we have seen pattern matching over (dependent) inductive types, where the decreasing criterion is structural recursion. However, this approach is not always possible, and the way to deal with this problem is via an encoding of the original formulation into an equivalent function that is structurally recursive. There are several techniques available to address the development of non-structurally decreasing functions in Coq, which are described in detail in [7]; here we will consider the method for defining well-founded recursive functions.
A given binary relation R over a set S is said to be well-founded if, for all elements x ∈ S, there exists no infinite sequence (x, x_0, x_1, x_2, ...) of elements of S such that (x_{i+1}, x_i) ∈ R, for all i ∈ N. Well-founded relations are available in Coq through the definition of the inductive predicate Acc and the predicate well_founded:
    Inductive Acc (A : Type) (R : A -> A -> Prop) (x : A) : Prop :=
      | Acc_intro : (forall y : A, R y x -> Acc A R y) -> Acc A R x.
Since the type Acc is inductively defined, we can use it as the structurally recursive argument in the definition of a function. Thankfully, Coq provides a high-level command named Function [36] that eases the burden of manually constructing a recursive function over Acc predicates. The command Function allows users to explicitly state that the target function is going to be defined over a proof that asserts that the underlying recursive measure is well-founded.
For further information about the details of the Coq proof assistant, we point the reader to the works of Bertot and Castéran [7], of Chlipala [37], and of Pierce et al. [38].
3. Preliminaries of formal languages
In this section we introduce some classic concepts of formal languages that we will need in the work we are about to describe. These concepts can be found in the introductory chapters of classical textbooks such as the one by Hopcroft and Ullman [39] or the one by Kozen [40]. The encoding in Coq of the several definitions that we are about to introduce can be seen in [31].
3.1. Alphabets, words and languages
An alphabet Σ is a non-empty finite set of objects usually called symbols (or letters). A word (or string) over an alphabet Σ is a finite sequence of symbols from Σ. A language is any finite or infinite set of words over an alphabet Σ. Given an alphabet Σ, the set of all words over Σ, denoted by Σ⋆, is inductively defined as follows: the empty word ϵ is an element of Σ⋆ and, if w ∈ Σ⋆ and a ∈ Σ, then aw is also a member of Σ⋆. The constant languages are the empty language, the language containing only ϵ, and the language containing only a symbol a ∈ Σ. The operations over languages include the usual Boolean set operations (union, intersection, and complement), plus concatenation, power and Kleene star. The concatenation of two languages L1 and L2 is defined by L1L2 = {wu | w ∈ L1 ∧ u ∈ L2}. The power of a language L, denoted by L^n, with n ∈ N, is inductively defined by L^0 = {ϵ}, and L^{n+1} = L L^n, for n ∈ N. The Kleene star of a language L is the union of all the finite powers of L, that is,

$$
L^\star = \bigcup_{i \geq 0} L^i. \tag{1}
$$
We denote language equality by L1 = L2. Finally, we introduce the concept of the left-quotient of a language L with respect to a word w ∈ Σ⋆, which is defined as D_w(L) = {v | wv ∈ L}. In particular, if w = a, with a ∈ Σ, we say that D_a(L) is the left-quotient of L with respect to the symbol a.
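The definitions of this subsection can be sketched in Coq by modelling languages as predicates over words. This is only an illustrative encoding, not necessarily the one used in the development [31,32]: symbols are assumed to be natural numbers, and all names (word, language, conc_l, power_l, star_l, left_quotient, and so on) are hypothetical.

```coq
Require Import List.
Import ListNotations.

(* Assumption of this sketch: symbols are natural numbers. *)
Definition word := list nat.
Definition language := word -> Prop.

(* The three constant languages. *)
Definition empty_l : language := fun _ => False.
Definition eps_l : language := fun w => w = [].
Definition sy_l (a : nat) : language := fun w => w = [a].

(* Union and concatenation. *)
Definition union_l (l1 l2 : language) : language :=
  fun w => l1 w \/ l2 w.
Definition conc_l (l1 l2 : language) : language :=
  fun w => exists u v, w = u ++ v /\ l1 u /\ l2 v.

(* Power: L^0 = {eps} and L^(n+1) = L L^n. *)
Fixpoint power_l (l : language) (n : nat) : language :=
  match n with
  | O => eps_l
  | S m => conc_l l (power_l l m)
  end.

(* Kleene star: the union of all finite powers, as in (1). *)
Definition star_l (l : language) : language :=
  fun w => exists n, power_l l n w.

(* Language equality and left-quotient with respect to a word. *)
Definition lang_eq (l1 l2 : language) : Prop :=
  forall w, l1 w <-> l2 w.
Definition left_quotient (l : language) (w : word) : language :=
  fun v => l (w ++ v).

(* The word [0; 1] belongs to the concatenation {0}{1}. *)
Example conc_example : conc_l (sy_l 0) (sy_l 1) [0; 1].
Proof.
  unfold conc_l, sy_l.
  exists [0], [1].
  split; [reflexivity | split; reflexivity].
Qed.
```

Representing a language as a predicate keeps infinite languages expressible, at the cost of membership being a proposition rather than a computable test.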
3.2. Regular expressions
Regular expressions are inductively defined over an alphabet Σ, as follows: the constants 0 and 1 are regular expressions; all the symbols a ∈ Σ are regular expressions; if α and β are regular expressions, then their union α + β and their concatenation αβ are regular expressions as well; finally, if α is a regular expression, then so is its Kleene star α⋆. The syntactic equality of two regular expressions α and β is denoted by α ≡ β. The set of all regular expressions over an alphabet Σ is the set RE_Σ. The length of a regular expression α is the total number of constants, symbols and operators of α; the alphabetic length of a regular expression α is the total number of occurrences of symbols of Σ in α. The previous two measures are denoted by |α| and by |α|_Σ, respectively.
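As an illustration, the syntax of regular expressions and the two length measures can be written as a Coq inductive type with two structurally recursive functions. The sketch below assumes that symbols are natural numbers; the names re, re_length and re_alen are hypothetical and chosen only for this example.

```coq
(* Assumption of this sketch: symbols are natural numbers. *)
Inductive re : Type :=
  | re0 : re                    (* the constant 0 *)
  | re1 : re                    (* the constant 1 *)
  | re_sy : nat -> re           (* a symbol of the alphabet *)
  | re_union : re -> re -> re   (* union *)
  | re_conc : re -> re -> re    (* concatenation *)
  | re_star : re -> re.         (* Kleene star *)

(* Length: number of constants, symbols and operators. *)
Fixpoint re_length (r : re) : nat :=
  match r with
  | re0 => 1
  | re1 => 1
  | re_sy _ => 1
  | re_union x y => S (re_length x + re_length y)
  | re_conc x y => S (re_length x + re_length y)
  | re_star x => S (re_length x)
  end.

(* Alphabetic length: number of occurrences of symbols. *)
Fixpoint re_alen (r : re) : nat :=
  match r with
  | re0 => 0
  | re1 => 0
  | re_sy _ => 1
  | re_union x y => re_alen x + re_alen y
  | re_conc x y => re_alen x + re_alen y
  | re_star x => re_alen x
  end.

(* The expression a b*, with a encoded as 0 and b as 1. *)
Definition ab_star : re := re_conc (re_sy 0) (re_star (re_sy 1)).

Example ab_star_length : re_length ab_star = 4.
Proof. reflexivity. Qed.

Example ab_star_alen : re_alen ab_star = 2.
Proof. reflexivity. Qed.
```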
Regular expressions denote regular languages. The language of a regular expression α, denoted L(α), is inductively defined in the expected way: the languages of the constants 0 and 1 are, respectively, the sets ∅ and {ϵ}; the language of the regular expression a, with a ∈ Σ, is the set {a}; if α and β are regular expressions, then the languages denoted by the expressions α + β, αβ, and α⋆ are, respectively, the languages L(α) ∪ L(β), L(α)L(β), and L(α)⋆. The language of a finite set of regular expressions S is defined by

$$
\mathcal{L}(S) = \bigcup_{\alpha_i \in S} \mathcal{L}(\alpha_i).
$$
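Two regular expressions α and β are said to be equivalent if they denote the same language, and we write α ∼ β whenever that is the case.¹ Naturally, two sets of regular expressions S1 and S2 are equivalent if L(S1) = L(S2), and we write S1 ∼ S2. Given a set of regular expressions S = {α1, α2, ..., αn} we define

$$
\sum S = \alpha_1 + \alpha_2 + \cdots + \alpha_n,
$$

whose language is

$$
\mathcal{L}\Big(\sum S\Big) = \mathcal{L}(\alpha_1) \cup \mathcal{L}(\alpha_2) \cup \cdots \cup \mathcal{L}(\alpha_n).
$$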
We say that a regular expression α is nullable if ϵ ∈ L(α), and non-nullable otherwise. Moreover, we consider the Boolean function ε(·) such that ε(α) = true if and only if ϵ ∈ L(α) holds. Nullability extends to sets of regular expressions in a straightforward way: a set S is nullable if ε(α) = true for at least one α ∈ S. We denote the nullability of a set of regular expressions S by ε(S). Two sets of regular expressions S1 and S2 are equi-nullable if ε(S1) = ε(S2). We also consider the right-concatenation S ⊙ α of a regular expression α with a set of regular expressions S, which is defined as follows: S ⊙ α = ∅ if α ≡ 0, S ⊙ α = S if α ≡ 1, and S ⊙ α = {βα | β ∈ S} otherwise. We usually omit the operator ⊙ and write Sα instead.
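A minimal Coq sketch of nullability and right-concatenation follows. It assumes symbols are natural numbers and represents finite sets of regular expressions as lists (possibly with duplicates); both choices, as well as the names nullable, nullable_set and rconc, are assumptions made for illustration only.

```coq
Require Import List.
Import ListNotations.

(* Assumption of this sketch: symbols are natural numbers. *)
Inductive re : Type :=
  | re0 : re
  | re1 : re
  | re_sy : nat -> re
  | re_union : re -> re -> re
  | re_conc : re -> re -> re
  | re_star : re -> re.

(* Nullability of a single expression, computed on the syntax. *)
Fixpoint nullable (r : re) : bool :=
  match r with
  | re0 => false
  | re1 => true
  | re_sy _ => false
  | re_union x y => orb (nullable x) (nullable y)
  | re_conc x y => andb (nullable x) (nullable y)
  | re_star _ => true
  end.

(* A set (here: a list) is nullable if at least one member is. *)
Definition nullable_set (s : list re) : bool := existsb nullable s.

(* Right-concatenation of a set with an expression. *)
Definition rconc (s : list re) (r : re) : list re :=
  match r with
  | re0 => []
  | re1 => s
  | _ => map (fun x => re_conc x r) s
  end.

Example star_nullable : nullable (re_star (re_sy 1)) = true.
Proof. reflexivity. Qed.

Example rconc_zero : rconc [re_sy 0; re_sy 1] re0 = [].
Proof. reflexivity. Qed.
```

Lists are used here only for brevity; a verified development would more likely rely on a dedicated finite-set structure so that duplicates and ordering do not matter.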
¹ As the reader will notice, we overload the notation "∼" whenever equivalence by means of language equality is considered.
3.3. Derivatives of regular expressions
The notion of derivative of a regular expression α was introduced by Brzozowski in the 1960s [22], and was motivated by the construction of sequential circuits directly from regular expressions extended with intersection and complement. In the same decade, Mirkin introduced the notion of prebase and base of a regular expression as a method to construct non-deterministic finite automata (NFA) that recognize the corresponding languages [28]. Mirkin's definition is a generalization of Brzozowski's derivatives for NFA and was independently rediscovered almost thirty years later by Antimirov [41], who coined the term partial derivatives of a regular expression.
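Let α be a regular expression and let a ∈ Σ. The set ∂a(α) of partial derivatives of the regular expression α with respect to a is inductively defined as follows:

$$
\begin{aligned}
\partial_a(0) &= \emptyset,\\
\partial_a(1) &= \emptyset,\\
\partial_a(b) &= \begin{cases} \{\varepsilon\} & \text{if } a \equiv b,\\ \emptyset & \text{otherwise,} \end{cases}\\
\partial_a(\alpha+\beta) &= \partial_a(\alpha) \cup \partial_a(\beta),\\
\partial_a(\alpha\beta) &= \begin{cases} \partial_a(\alpha)\beta \cup \partial_a(\beta) & \text{if } \varepsilon(\alpha) = \mathsf{true},\\ \partial_a(\alpha)\beta & \text{otherwise,} \end{cases}\\
\partial_a(\alpha^\star) &= \partial_a(\alpha)\alpha^\star.
\end{aligned}
$$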
The operation of partial derivation naturally extends to a set of regular expressions S as follows:

$$
\partial_a(S) = \bigcup_{\alpha \in S} \partial_a(\alpha).
$$
The language of the set of partial derivatives ∂a(α) is the left-quotient of L(α), i.e., L(∂a(α)) = D_a(L(α)). The set of partial derivatives is extended to words in the following way: given a regular expression α and a word w ∈ Σ⋆, the partial derivative ∂w(α) of α with respect to w is defined inductively by ∂ε(α) = {α}, and ∂wa(α) = ∂a(∂w(α)). We can use partial derivatives and nullability of regular expressions to determine if a word w ∈ Σ⋆ is a member of some language L(α). For that, it is enough to check the value computed by ε(∂w(α)): if ε(∂w(α)) = true then we have w ∈ L(α); otherwise, w ∉ L(α) holds.
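The membership test just described can be sketched in Coq as follows. This is an illustrative encoding under stated assumptions, not the verified implementation: symbols are natural numbers, sets are lists that may contain duplicates, the singleton written {ε} in the definition of ∂a(b) is represented by the constant 1, and no simplification of 1·β into β is performed. All names (pdrv, pdrv_set, wpdrv, matches) are hypothetical.

```coq
Require Import List.
Import ListNotations.

Inductive re : Type :=
  | re0 : re
  | re1 : re
  | re_sy : nat -> re
  | re_union : re -> re -> re
  | re_conc : re -> re -> re
  | re_star : re -> re.

Fixpoint nullable (r : re) : bool :=
  match r with
  | re0 => false
  | re1 => true
  | re_sy _ => false
  | re_union x y => orb (nullable x) (nullable y)
  | re_conc x y => andb (nullable x) (nullable y)
  | re_star _ => true
  end.

(* Right-concatenation of a set (list) with an expression. *)
Definition rconc (s : list re) (r : re) : list re :=
  match r with
  | re0 => []
  | re1 => s
  | _ => map (fun x => re_conc x r) s
  end.

(* Partial derivatives with respect to a symbol a. *)
Fixpoint pdrv (r : re) (a : nat) : list re :=
  match r with
  | re0 => []
  | re1 => []
  | re_sy b => if Nat.eqb a b then [re1] else []
  | re_union x y => pdrv x a ++ pdrv y a
  | re_conc x y =>
      if nullable x
      then rconc (pdrv x a) y ++ pdrv y a
      else rconc (pdrv x a) y
  | re_star x => rconc (pdrv x a) (re_star x)
  end.

(* Extension to sets, then to words (symbols consumed left to right). *)
Definition pdrv_set (s : list re) (a : nat) : list re :=
  flat_map (fun r => pdrv r a) s.
Definition wpdrv (r : re) (w : list nat) : list re :=
  fold_left pdrv_set w [r].

(* w is in L(r) iff the word derivative contains a nullable member. *)
Definition matches (r : re) (w : list nat) : bool :=
  existsb nullable (wpdrv r w).

(* a b*, with a encoded as 0 and b as 1. *)
Definition ab_star : re := re_conc (re_sy 0) (re_star (re_sy 1)).

Example abb_accepted : matches ab_star [0; 1; 1] = true.
Proof. reflexivity. Qed.

Example b_rejected : matches ab_star [1] = false.
Proof. reflexivity. Qed.
```

The first check corresponds to the word abb and the expression ab⋆; because the sketch keeps the factor 1 in front of b⋆, the intermediate sets differ syntactically from a hand computation that simplifies it away, although the nullability outcome is the same.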
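Example 1. The word derivative of the regular expression α = ab⋆ with respect to abb is given by the following computation:

$$
\begin{aligned}
\partial_{abb}(\alpha) &= \partial_b(\partial_b(\partial_a(ab^\star)))\\
&= \partial_b(\partial_b(\partial_a(a)b^\star))\\
&= \partial_b(\partial_b(\{b^\star\}))\\
&= \partial_b(\partial_b(b)b^\star)\\
&= \partial_b(\{b^\star\})\\
&= \{b^\star\}.
\end{aligned}
$$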
From the nullability of the resulting set of regular expressions {b⋆}, we easily conclude that abb ∈ L(α) since ε(b⋆) = true. Finally, we present the set of partial derivatives of a given regular expression α, which is defined by

$$
PD(\alpha) = \bigcup_{w \in \Sigma^\star} \partial_w(\alpha).
$$
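Antimirov proved in [41] that, given a regular expression α, the set PD(α) is always finite and its cardinality has an upper bound of |α|_Σ + 1. Champarnaud and Ziadi [42] introduced an elegant recursive function for calculating the support of a given regular expression α, from which it is easy to calculate PD(α). The function, denoted by π(α), is recursively defined as follows:

$$
\begin{aligned}
\pi(0) &= \emptyset, & \pi(\alpha+\beta) &= \pi(\alpha) \cup \pi(\beta),\\
\pi(1) &= \emptyset, & \pi(\alpha\beta) &= \pi(\alpha)\beta \cup \pi(\beta),\\
\pi(a) &= \{\varepsilon\}, & \pi(\alpha^\star) &= \pi(\alpha)\alpha^\star.
\end{aligned}
$$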
Champarnaud and Ziadi proved that PD(α) = {α} ∪ π(α) holds for all regular expressions α, and once again we conclude that |PD(α)| ≤ |α|_Σ + 1.
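The support function can be transcribed almost literally into Coq. The sketch below makes the same assumptions as before (symbols as natural numbers, sets as lists with possible duplicates, {ε} represented by the constant 1), and the names support and pd_list are hypothetical. It checks the bound |α|_Σ + 1 on the single example ab⋆ only; it does not prove it in general.

```coq
Require Import List.
Import ListNotations.

Inductive re : Type :=
  | re0 : re
  | re1 : re
  | re_sy : nat -> re
  | re_union : re -> re -> re
  | re_conc : re -> re -> re
  | re_star : re -> re.

(* Alphabetic length: number of occurrences of symbols. *)
Fixpoint re_alen (r : re) : nat :=
  match r with
  | re0 => 0
  | re1 => 0
  | re_sy _ => 1
  | re_union x y => re_alen x + re_alen y
  | re_conc x y => re_alen x + re_alen y
  | re_star x => re_alen x
  end.

(* Right-concatenation of a set (list) with an expression. *)
Definition rconc (s : list re) (r : re) : list re :=
  match r with
  | re0 => []
  | re1 => s
  | _ => map (fun x => re_conc x r) s
  end.

(* The support function, following the recursive equations for pi. *)
Fixpoint support (r : re) : list re :=
  match r with
  | re0 => []
  | re1 => []
  | re_sy _ => [re1]
  | re_union x y => support x ++ support y
  | re_conc x y => rconc (support x) y ++ support y
  | re_star x => rconc (support x) (re_star x)
  end.

(* PD(alpha) = {alpha} U pi(alpha), as a list. *)
Definition pd_list (r : re) : list re := r :: support r.

(* a b*, with a encoded as 0 and b as 1. *)
Definition ab_star : re := re_conc (re_sy 0) (re_star (re_sy 1)).

(* On this example the list has at most |alpha|_Sigma + 1 elements. *)
Example pd_bound_ab_star :
  Nat.leb (length (pd_list ab_star)) (S (re_alen ab_star)) = true.
Proof. reflexivity. Qed.
```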