
Error reporting in Parsing Expression Grammars


Contents lists available at ScienceDirect

Science of Computer Programming

www.elsevier.com/locate/scico


André Murbach Maidl a, Fabio Mascarenhas b,*, Sérgio Medeiros c, Roberto Ierusalimschy d

a Polytechnic School, PUCPR, Curitiba, Brazil
b Department of Computer Science, UFRJ, Rio de Janeiro, Brazil
c School of Science and Technology, UFRN, Natal, Brazil
d Department of Computer Science, PUC-Rio, Rio de Janeiro, Brazil

Article info

Article history:
Received 11 April 2014
Received in revised form 3 August 2016
Accepted 12 August 2016
Available online 20 August 2016

Keywords:
Parsing
Error reporting
Parsing expression grammars
Packrat parsing
Parser combinators

Abstract

Parsing Expression Grammars (PEGs) describe top-down parsers. Unfortunately, the error-reporting techniques used in conventional top-down parsers do not directly apply to parsers based on Parsing Expression Grammars (PEGs), so they have to be somehow simulated. While the PEG formalism has no account of semantic actions, actual PEG implementations add them, and we show how to simulate an error-reporting heuristic through these semantic actions.

We also propose a complementary error reporting strategy that may lead to better error messages: labeled failures. This approach is inspired by exception handling of programming languages, and lets a PEG define different kinds of failure, with each ordered choice operator specifying which kinds it catches. Labeled failures give a way to annotate grammars for better error reporting, to express some of the error reporting strategies used by deterministic parser combinators, and to encode predictive top-down parsing in a PEG.

©2016 Elsevier B.V. All rights reserved.

1. Introduction

When a parser receives an erroneous input, it should indicate the existence of syntax errors. Errors can be handled in various ways. The easiest is just to report that an error was found, where it was found, and what was expected at that point, and then abort. At the other end of the spectrum we find mechanisms that attempt to parse the complete input and report as many errors as possible.

The LL(k) and LR(k) methods detect syntax errors very efficiently because they have the viable prefix property, that is, these methods detect a syntax error as soon as k tokens are read and cannot be used to extend the thus far accepted part of the input into a viable prefix of the language [1]. LL(k) and LR(k) parsers can use this property to produce suitable, though generic, error messages.

Parsing Expression Grammars (PEGs) [2] are a formalism for describing the syntax of programming languages. We can view a PEG as a formal description of a top-down parser for the language it describes. PEGs have a concrete syntax based on the syntax of regexes, or extended regular expressions. Unlike Context-Free Grammars (CFGs), PEGs avoid ambiguities in the definition of the grammar’s language due to the use of an ordered choice operator.

* Corresponding author.
E-mail addresses: andre.murbach@pucpr.br (A.M. Maidl), mascarenhas@ufrj.br (F. Mascarenhas), sergiomedeiros@ect.ufrn.br (S. Medeiros), roberto@inf.puc-rio.br (R. Ierusalimschy).

http://dx.doi.org/10.1016/j.scico.2016.08.004
0167-6423/© 2016 Elsevier B.V. All rights reserved.

More specifically, a PEG can be interpreted as the specification of a recursive descent parser with restricted (or local) backtracking. This means that the alternatives of a choice are tried in order; as soon as an alternative recognizes an input prefix, no other alternative of this choice will be tried, but when an alternative fails to recognize an input prefix, the parser backtracks to try the next alternative.

On the one hand, PEGs can be interpreted as a formalization of a specific class of top-down parsers [2]; on the other hand, PEGs cannot use error handling techniques that are often applied to predictive top-down parsers, because these techniques assume the parser reads the input without backtracking [3]. In top-down parsers without backtracking, it is possible to signal a syntax error as soon as the next input symbol cannot be accepted. In PEGs, it is more complicated to identify the cause of an error and the position where it occurs, because failures during parsing are not necessarily errors, but just an indication that the parser cannot proceed and a different choice should be made elsewhere.

Ford [3] has already identified this limitation of error reporting in PEGs, and, in his parser generators for PEGs, included a heuristic for better error reporting. This heuristic simulates the error reporting technique that is implemented in top-down parsers without backtracking. The idea is to track the position in the input where the farthest failure occurred, as well as what the parser was expecting at that point, and report this to the user in case of errors.

Tracking the farthest failure position and context gives us PEGs that produce error messages similar to the automatically produced error messages of other top-down parsers; they tell the user the position where the error was encountered, what was found in the input at that position, and what the parser was expecting to find.

In this paper, we show how grammar writers can use this error reporting technique even in PEG implementations that do not implement it, by making use of semantic actions that expose the current position in the input and the possibility to access some form of mutable state associated with the parsing process.

We also propose a complementary approach for error reporting in PEGs, based on the concept of labeled failures, inspired by the standard exception handling mechanisms as found in programming languages. Instead of just failing, a labeled PEG can produce different kinds of failure labels using a throw operator. Each label can be tied to a more specific error message. PEGs can also catch such labeled failures, via a change to the ordered choice operator. We formalize labeled failures as an extension of the semantics of regular PEGs.

With labeled PEGs we can express some alternative error reporting techniques for top-down parsers with local backtracking. We can also encode predictive parsing in a PEG, and we show how to do that for LL(*) parsing, a powerful predictive parsing strategy.

The rest of this paper is organized as follows: Section 2 contextualizes the problem of error handling in PEGs, explains in detail the failure tracking heuristic, and shows how it can be realized in PEG implementations that do not support it directly; Section 3 discusses related work on error reporting for top-down parsers with backtracking; Section 4 introduces and formalizes the concept of labeled failures, and shows how to use it for error reporting; Section 5 compares the error messages generated by a parser based on the failure tracking heuristic with the ones generated by a parser based on labeled failures; Section 6 shows how labeled failures can encode some of the techniques of Section 3, as well as predictive parsing; finally, Section 7 gives some concluding remarks.

2. Handling syntax errors with PEGs

In this section, we use examples to present in more detail how a PEG behaves badly in the presence of syntax errors. After that, we present a heuristic proposed by Ford [3] to implement error reporting in PEGs. Rather than using the original notation and semantics of PEGs given by Ford [2], our examples use the equivalent and more concise notation and semantics proposed by Medeiros et al. [4–6]. We will extend both the notation and the semantics in Section 4 to present PEGs with labeled failures.

A PEG $G$ is a tuple $(V, T, P, p_S)$ where $V$ is a finite set of non-terminals, $T$ is a finite set of terminals, $P$ is a total function from non-terminals to parsing expressions and $p_S$ is the initial parsing expression. We describe the function $P$ as a set of rules of the form $A \leftarrow p$, where $A \in V$ and $p$ is a parsing expression. A parsing expression, when applied to an input string, either fails or consumes a prefix of the input and returns the remaining suffix. The abstract syntax of parsing expressions is given as follows, where $a$ is a terminal, $A$ is a non-terminal, and $p$, $p_1$ and $p_2$ are parsing expressions:

$$p = \varepsilon \mid a \mid A \mid p_1 p_2 \mid p_1 / p_2 \mid p^* \mid\ !p$$

Intuitively, $\varepsilon$ successfully matches the empty string, not changing the input; $a$ matches and consumes itself or fails otherwise; $A$ tries to match the expression $P(A)$; $p_1 p_2$ tries to match $p_1$ followed by $p_2$; $p_1 / p_2$ tries to match $p_1$; if $p_1$ fails, then it tries to match $p_2$; $p^*$ repeatedly matches $p$ until $p$ fails, that is, it consumes as much as it can from the input; the matching of $!p$ succeeds if the input does not match $p$ and fails when the input matches $p$, not consuming any input in either case; we call it the negative predicate or the lookahead predicate.
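The informal semantics above can be read almost directly as a recursive matcher. The following Python sketch is only an illustration under our own assumptions: the tuple encoding of parsing expressions, the function name `match`, and the use of string literals as terminals are not part of the formalism, and the guard that stops a repetition whose body matches the empty string is an addition that keeps the sketch terminating.

```python
# Parsing expressions are encoded as tagged tuples (an encoding assumed here):
#   ("eps",) ("lit", "a") ("var", "A") ("seq", p1, p2)
#   ("choice", p1, p2) ("star", p) ("not", p)

def match(grammar, p, s, i=0):
    """Return the position after the matched prefix, or None on failure."""
    kind = p[0]
    if kind == "eps":                       # matches the empty string
        return i
    if kind == "lit":                       # terminal: consume itself or fail
        return i + len(p[1]) if s.startswith(p[1], i) else None
    if kind == "var":                       # non-terminal: match P(A)
        return match(grammar, grammar[p[1]], s, i)
    if kind == "seq":                       # p1 p2
        j = match(grammar, p[1], s, i)
        return None if j is None else match(grammar, p[2], s, j)
    if kind == "choice":                    # p1 / p2: p2 only if p1 fails
        j = match(grammar, p[1], s, i)
        return j if j is not None else match(grammar, p[2], s, i)
    if kind == "star":                      # p*: consume as much as possible
        while True:
            j = match(grammar, p[1], s, i)
            if j is None or j == i:         # j == i: guard against empty loops
                return i
            i = j
    if kind == "not":                       # !p: never consumes input
        return i if match(grammar, p[1], s, i) is None else None
    raise ValueError(f"unknown parsing expression: {kind}")


grammar = {
    # S <- "a" / "ab": on "ab" the first alternative wins, "ab" is never tried
    "S": ("choice", ("lit", "a"), ("lit", "ab")),
    # N <- !"b" "a"
    "N": ("seq", ("not", ("lit", "b")), ("lit", "a")),
}
```

With this encoding, `match(grammar, ("var", "S"), "ab")` stops after the first character, which illustrates how ordered choice commits to the first alternative that succeeds.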

Fig. 1 presents a PEG for the Tiny language [7]. Tiny is a simple programming language with a syntax that resembles Pascal’s. We will use this PEG, which can be seen as the equivalent of an LL(1) CFG, to show how error reporting differs between top-down parsers without backtracking and PEGs.

    Tiny      ← CmdSeq
    CmdSeq    ← (Cmd SEMICOLON) (Cmd SEMICOLON)*
    Cmd       ← IfCmd / RepeatCmd / AssignCmd / ReadCmd / WriteCmd
    IfCmd     ← IF Exp THEN CmdSeq (ELSE CmdSeq / ε) END
    RepeatCmd ← REPEAT CmdSeq UNTIL Exp
    AssignCmd ← NAME ASSIGNMENT Exp
    ReadCmd   ← READ NAME
    WriteCmd  ← WRITE Exp
    Exp       ← SimpleExp ((LESS / EQUAL) SimpleExp / ε)
    SimpleExp ← Term ((ADD / SUB) Term)*
    Term      ← Factor ((MUL / DIV) Factor)*
    Factor    ← OPENPAR Exp CLOSEPAR / NUMBER / NAME

Fig. 1. A PEG for the Tiny language.

    01 n := 5;
    02 f := 1;
    03 repeat
    04   f := f * n;
    05   n := n - 1
    06 until (n < 1);
    07 write f;

Fig. 2. Program for the Tiny language with a syntax error.

PEGs usually express the language syntax at the character level, without the need of a separate lexer. For instance, we can write the lexical rule IF as follows, assuming we have non-terminals Skip, which consumes whitespace, and IDRest, which consumes any character that may be present on a proper suffix of an identifier:

    IF ← if !IDRest Skip

Now, we present an example of erroneous Tiny code so we can compare approaches for error reporting. The program in Fig. 2 is missing a semicolon (;) in the assignment in line 5. A predictive top-down parser that aborts on the first error presents an error message like:

factorial.tiny:6:1: syntax error, unexpected ’until’, expecting ’;’

The error is reported in line 6 because the parser cannot complete a valid prefix of the language, since it unexpectedly finds the token until when it was expecting a command terminator (;).

In PEGs, we can try to report errors using the remaining suffix, but this approach usually does not help the PEG produce an error message like the one shown above. In general, when a PEG finishes parsing the input, a remaining non-empty suffix means that parsing did not reach the end of file due to a syntax error. However, this suffix usually does not indicate the actual place of the error, as the error will have caused the PEG to backtrack to another place in the input.

In our example, the problem happens when the PEG tries to recognize the sequence of commands inside the repeat command. Even though the program has a missing semicolon (;) in the assignment in line 5, making the PEG fail to recognize the sequence of commands inside the repeat command, this failure is not treated as an error. Instead, this failure makes the recognition of the repeat command also fail. For this reason, the PEG backtracks the input to line 3 to try to parse other alternatives for CmdSeq, and since these do not exist, its ancestor Cmd. Since it is not possible to recognize a command other than repeat at line 3, the parsing finishes without consuming all the input. Hence, if the PEG uses the remaining suffix to produce an error message, the PEG reports line 3 instead of line 6 as the location where no further progress can be made.

There is no perfect method to identify which information is the most relevant to report an error. In our example it is easy for the parser to correctly report what the error is, but it is easy to construct examples where this is not the case. If we add the semicolon at the end of line 5 and remove line 3, a predictive top-down parser would complain about finding an until where it expected another statement, while the actual error is a missing repeat.

According to Ford [3], using the information of the farthest position that the PEG reached in the input is a heuristic that provides good results. PEGs define top-down parsers and try to recognize the input from left to right, so the position farthest to the right in the input that a PEG reaches during parsing usually is close to the real error [3]. The same idea for error reporting in top-down parsing with backtracking was also mentioned in Section 16.2 of [8].

Ford used this heuristic to add error reporting to his PEG implementation using packrat parsers [3]. A packrat parser generated by Pappy [9], Ford’s PEG parser generator, tracks the farthest position and uses this position to report an error. In other words, this heuristic helps packrat parsers to simulate the error reporting technique that is implemented in deterministic parsers.

Although Ford discussed his heuristic only in relation to packrat parsers, we can use the farthest position heuristic to add error reporting to any implementation of PEGs that provides semantic actions. The idea is to annotate the grammar with semantic actions that track this position. While this seems onerous, we just need to add annotations to all the lexical rules to implement error reporting.

For instance, in Leg [10], a PEG parser generator with Yacc-style semantic actions, we can annotate the rule for SEMICOLON as follows, where | is Leg’s ordered choice operator, and following it is a semantic action (in the notation used by Leg):

SEMICOLON = ";" Skip | &{ updateffp() }

The function updateffp that the semantic action calls updates the farthest failure position in a global variable if the current parsing position is greater than the position that is stored in this global, then makes the whole action fail so parsing continues as if the original failure had occurred.

However, storing just the farthest failure position does not give the parser all the information it needs to produce an informative error message. That is, the parser has the information about the position where the error happened, but it lacks the information about what terminals failed at that position. Thus, we extend our approach by including the terminals in the annotations so the parser can also track these names in order to compute the set of expected terminals at a certain position:

SEMICOLON = ";" Skip | &{ updateffp(";") }

The extended implementation of updateffp keeps, for a given failure position, the names of all the symbols expected there. If the current position is greater than the farthest failure, updateffp initializes this set with just the given name. If the current position equals the farthest failure, updateffp adds this name to the set.
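The bookkeeping performed by the extended updateffp can be sketched in a few lines of Python. This is a minimal illustration, not the Leg implementation: the class `FarthestFailure`, the combinators `symb`, `seq` and `star`, the helper `describe`, and the toy grammar `Cmd <- "1" ("+" "1")* ";"` are all hypothetical names introduced here.

```python
class FarthestFailure:
    """Hypothetical tracker mirroring the extended updateffp described above."""

    def __init__(self):
        self.pos = 0
        self.expected = set()

    def update(self, pos, name):
        if pos > self.pos:            # new farthest failure: restart the set
            self.pos = pos
            self.expected = {name}
        elif pos == self.pos:         # same position: accumulate the name
            self.expected.add(name)
        # failures before the farthest position are ignored


def symb(text, ffp):
    """Literal followed by optional whitespace; records the name on failure."""
    def parse(s, i):
        if s.startswith(text, i):
            j = i + len(text)
            while j < len(s) and s[j].isspace():
                j += 1
            return j
        ffp.update(i, text)
        return None                   # still an ordinary PEG failure
    return parse


def seq(*parsers):
    def parse(s, i):
        for p in parsers:
            i = p(s, i)
            if i is None:
                return None
        return i
    return parse


def star(p):
    def parse(s, i):
        while True:
            j = p(s, i)
            if j is None or j == i:
                return i
            i = j
    return parse


def make_parser():
    """Toy grammar (an assumption): Cmd <- "1" ("+" "1")* ";"."""
    ffp = FarthestFailure()
    one, plus, semi = symb("1", ffp), symb("+", ffp), symb(";", ffp)
    return seq(one, star(seq(plus, one)), semi), ffp


def describe(ffp):
    names = ", ".join(f"'{n}'" for n in sorted(ffp.expected))
    return f"syntax error at offset {ffp.pos}, expecting {names}"
```

On the input `1 + 1 until`, both the repetition and the final semicolon fail at the same offset, so the tracker ends up holding both names for that position, which is how the list of expected terminals in the error message is built.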

Parsers generated by Pappy also track the set of expected terminals, but with limitations. The error messages include only symbols and keywords that were defined in the grammar as literal strings. That is, the error messages do not include terminals that were defined through character classes.

Our approach of naming terminals in the semantic actions avoids the kind of limitation found in Pappy, though it increases the annotation burden because the implementor of the PEG is also responsible for adding one semantic action for each terminal and its respective name.

The annotation burden can be lessened in implementations of PEGs that treat parsing expressions as first-class objects, because this makes it possible to define functions that annotate the lexical parts of the grammar to track errors, record information about the expected terminals to produce good error messages, and enforce lexical conventions such as the presence of surrounding whitespace. For instance, in LPeg [11,12], a PEG implementation for Lua that defines patterns as first-class objects, we can annotate the rule CmdSeq as follows, where the patterns V"A", p1 * p2, and p^0 are respectively equivalent to parsing expressions $A$, $p_1 p_2$, and $p^*$ (in the notation used by LPeg):

CmdSeq = V"Cmd" * symb(";") * (V"Cmd" * symb(";"))^0;

The function symb receives a string as its only argument and returns a parser that is equivalent to the parsing expression that we used in the Leg example. That is, symb(";") is equivalent to ";" Skip | &{ updateffp(";") }.

We implemented error tracking and reporting using semantic actions as a set of parsing combinators on top of LPeg and used these combinators to implement a PEG parser for Tiny. It produces the following error message for the example we have been using in this section:

factorial.tiny:6:1: syntax error, unexpected ’until’,

expecting ’;’, ’=’, ’<’, ’-’, ’+’, ’/’, ’*’

We tested this PEG parser for Tiny with other erroneous inputs and in all cases the parser identified an error in the same place as a top-down parser without backtracking. In addition, the parser for Tiny produced error messages that are similar to the error messages produced by packrat parsers generated by Pappy. We annotated other grammars and successfully obtained similar results. However, the error messages are still generic; they are not as specific as the error messages of a hand-written top-down parser.

3. Error reporting in top-down parsers with backtracking

Mizushima et al. [13] proposed a cut operator (↑) to reduce the space consumption of packrat parsers; the authors claimed that the cut operator can also be used to implement error reporting in packrat parsers, but they did not give any details on how the cut operator could be used for this purpose. The cut operator is borrowed from Prolog to annotate pieces of a PEG where backtracking should be avoided. PEGs’ ordered choice works in a similar way to Prolog’s green cuts, that is, they limit backtracking to discard unnecessary solutions. The cut proposed for PEGs is a way to implement Prolog’s white cuts, that is, they prevent backtracking to rules that will certainly fail.

The semantics of cut is similar to the semantics of an if-then-else control structure and can be simulated through predicates. For instance, the PEG (with cut) $A \leftarrow B \uparrow C / D$ is functionally equivalent to the PEG (without cut) $A \leftarrow B C / !B D$, which is also functionally equivalent to the rule $A \leftarrow B[C, D]$ in the Generalized Top-Down Parsing Language (GTDPL), one of the parsing techniques that influenced the creation of PEGs [3,9,2]. In all three cases, the expression $D$ is tried only if the expression $B$ fails. Nevertheless, this translated PEG still backtracks whenever $B$ successfully matches and $C$ fails. Thus, it is not trivial to use this translation to implement error reporting in PEGs.
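The effect of simulating the cut with a predicate can be observed on a small instance. The Python sketch below uses hypothetical combinators (`lit`, `seq`, `choice`, `not_`) and an arbitrary choice of B = "a", C = "b" and D = "ac" / "c"; it compares the plain rule A ← B C / D with the translated rule A ← B C / !B D.

```python
def lit(text):
    def parse(s, i):
        return i + len(text) if s.startswith(text, i) else None
    return parse


def seq(p1, p2):
    def parse(s, i):
        j = p1(s, i)
        return None if j is None else p2(s, j)
    return parse


def choice(p1, p2):
    def parse(s, i):
        j = p1(s, i)
        return j if j is not None else p2(s, i)
    return parse


def not_(p):
    def parse(s, i):
        return i if p(s, i) is None else None
    return parse


# Illustrative expressions (assumptions): B = "a", C = "b", D = "ac" / "c"
B, C = lit("a"), lit("b")
D = choice(lit("ac"), lit("c"))

plain = choice(seq(B, C), D)                     # A <- B C / D
cut_simulated = choice(seq(B, C), seq(not_(B), D))  # A <- B C / !B D
```

On the input `ac`, the plain rule falls back to D and succeeds, while the translated rule fails because B has matched and !B therefore rejects the second alternative. The first alternative is nevertheless fully attempted and abandoned before !B is evaluated, which is the backtracking mentioned above.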

Rats! [14] is a popular packrat parser that implements error reporting with a strategy similar to Ford’s, with the change that it always reports error positions at the start of productions, and pretty-prints non-terminal names in the error message. For example, an error in a ReturnStatement non-terminal becomes return statement expected.

Even though error handling is an important task for parsers, we did not find any other research results about error handling in PEGs, beyond the heuristic proposed by Ford and the cut operator proposed by Mizushima et al. However, parser combinators [15] present some similarities with PEGs so we will briefly discuss them for the rest of this section.

In functional programming it is common to implement recursive descent parsers using parser combinators [15]. A parser is a function that we use to model symbols of the grammar. A parser combinator is a higher-order function that we use to implement grammar constructions such as sequencing and choice. One kind of parser combinator implements parsers that return a list of all possible results of a parse, effectively implementing a recursive descent parser with full backtracking. Despite being actually deterministic in behavior (parsing the same input always yields the same list of results), these combinators are called non-deterministic parser combinators due to their use of a non-deterministic choice operator. We get parser combinators that have the same semantics as PEGs by changing the return type from list of results to Maybe. That is, we use deterministic parser combinators that return Maybe to implement recursive descent parsers with limited backtracking [16]. In the rest of this paper, whenever we refer to parser combinators we intend to refer to these parser combinators with limited backtracking.

Like PEGs, most deterministic parser combinator libraries also use ordered choice, and thus suffer from the same problems as PEGs with erroneous inputs, where the point that the parser reached in the input is usually far away from the point of the error.

Hutton [15] introduced the nofail combinator to implement error reporting in a quite simple way: we just need to distinguish between failure and error during parsing. More specifically, we can use the nofail combinator to annotate the grammar’s terminals and non-terminals that should not fail; when they fail, the failure should be transformed into an error. The difference between an error and a failure is that an ordered choice just propagates an error in its first alternative instead of backtracking and trying its second alternative, so any error aborts the whole parser. This technique is also called the three-values technique [17] because the parser finishes with one of the following values: OK, Fail or Error.
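A minimal Python sketch of the three-values idea follows. It is not Hutton’s code: the result encoding as `(tag, position)` pairs, the combinator names other than nofail, and the two-command toy grammar are assumptions made for illustration.

```python
OK, FAIL, ERROR = "OK", "Fail", "Error"


def lit(text):
    def parse(s, i):
        if s.startswith(text, i):
            return (OK, i + len(text))
        return (FAIL, i)
    return parse


def seq(*parsers):
    def parse(s, i):
        for p in parsers:
            tag, pos = p(s, i)
            if tag != OK:
                return (tag, pos)     # Fail and Error both stop the sequence
            i = pos
        return (OK, i)
    return parse


def choice(p1, p2):
    def parse(s, i):
        result = p1(s, i)
        if result[0] == FAIL:         # only a plain failure backtracks
            return p2(s, i)
        return result                 # OK, or an Error that aborts the parse
    return parse


def nofail(p):
    """Turn a failure of p into an error, as described for nofail above."""
    def parse(s, i):
        tag, pos = p(s, i)
        return (ERROR, pos) if tag == FAIL else (tag, pos)
    return parse


# Toy grammar (an assumption): Cmd <- "read " nofail("x") / "write " nofail("x")
cmd = choice(
    seq(lit("read "), nofail(lit("x"))),
    seq(lit("write "), nofail(lit("x"))),
)
```

Here `cmd("read y", 0)` yields an Error at offset 5 without trying the second alternative, whereas `cmd("print", 0)` yields an ordinary Fail because neither alternative got past its first terminal.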

Röjemo [18] presented a cut combinator that we can also use to annotate the grammar pieces where parsing should be aborted on failure, on behalf of efficiency and error reporting. The cut combinator is different from the cut operator (↑) for PEGs because the combinator is abortive and unary while the operator is not abortive and nullary. The cut combinator introduced by Röjemo has the same semantics as the nofail combinator introduced by Hutton.

Partridge and Wright [17] showed that error detection can be automated in parser combinators when we assume that the grammar is LL(1). Their main idea is: if one alternative successfully consumes at least one symbol, no other alternative can successfully consume any symbols. Their technique is also known as the four-values technique because the parser finishes with one of the following values: Epsn, when the parser finishes with success without consuming any input; OK, when the parser finishes with success consuming some input; Fail, when the parser fails without consuming any input; and Error, when the parser fails consuming some input. Three values were inspired by Hutton’s work [15], but with new meanings.

In the four-values technique, we do not need to annotate the grammar because the authors changed the semantics of the sequence and choice combinators to automatically generate the Error value according to Table 1. In summary, the sequence combinator propagates an error when the second parse fails after consuming some input while the choice combinator does not try further alternatives if the current one consumed at least one symbol from the input. In case of error, the four-values technique detects the first symbol following the longest parse of the input and uses this symbol to report an error.

The four-values technique assumes that the input is composed of tokens which are provided by a separate lexer. However, being restricted to LL(1) grammars can be a limitation because parser combinators, like PEGs, usually operate on strings of characters to implement both lexer and parser together. For instance, a parser for Tiny that is implemented with Parsec [19] does not parse the following program: read x;. That is, the matching of read against repeat generates an error. Such behavior is confirmed in Table 1 by the third line from the bottom.

Table 1
Behavior of sequence and choice in the four-values technique.

    p1      p2      p1 p2   p1 | p2
    Error   Error   Error   Error
    Error   Fail    Error   Error
    Error   Epsn    Error   Error
    Error   OK(x)   Error   Error
    Fail    Error   Fail    Error
    Fail    Fail    Fail    Fail
    Fail    Epsn    Fail    Epsn
    Fail    OK(x)   Fail    OK(x)
    Epsn    Error   Error   Error
    Epsn    Fail    Fail    Epsn
    Epsn    Epsn    Epsn    Epsn
    Epsn    OK(x)   OK(x)   OK(x)
    OK(x)   Error   Error   OK(x)
    OK(x)   Fail    Error   OK(x)
    OK(x)   Epsn    OK(x)   OK(x)
    OK(x)   OK(y)   OK(y)   OK(x)
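Table 1 can be condensed into two small functions over outcome tags. The Python sketch below is our own reading of the table, not code from Partridge and Wright: it ignores the payloads x and y carried by OK, and the names `seq_result`, `choice_result`, `TABLE_1` and `check_table` are hypothetical.

```python
ERROR, FAIL, EPSN, OK = "Error", "Fail", "Epsn", "OK"


def seq_result(r1, r2):
    """Outcome of p1 p2, given the outcome of p1 and (if it runs) of p2."""
    if r1 in (ERROR, FAIL):
        return r1                     # p2 is never attempted
    if r1 == EPSN:
        return r2                     # nothing consumed yet: p2 decides
    # r1 == OK: input was consumed, so any failure of p2 is an error
    return ERROR if r2 in (ERROR, FAIL) else OK


def choice_result(r1, r2):
    """Outcome of p1 | p2, given the outcome of p1 and (if it runs) of p2."""
    if r1 in (ERROR, OK):
        return r1                     # committed: p2 is never attempted
    if r1 == FAIL:
        return r2
    # r1 == EPSN: keep the empty success unless p2 consumes input or errs
    return EPSN if r2 in (FAIL, EPSN) else r2


# Rows of Table 1 as (p1, p2, p1 p2, p1 | p2), with OK payloads dropped.
TABLE_1 = [
    (ERROR, ERROR, ERROR, ERROR), (ERROR, FAIL, ERROR, ERROR),
    (ERROR, EPSN, ERROR, ERROR), (ERROR, OK, ERROR, ERROR),
    (FAIL, ERROR, FAIL, ERROR), (FAIL, FAIL, FAIL, FAIL),
    (FAIL, EPSN, FAIL, EPSN), (FAIL, OK, FAIL, OK),
    (EPSN, ERROR, ERROR, ERROR), (EPSN, FAIL, FAIL, EPSN),
    (EPSN, EPSN, EPSN, EPSN), (EPSN, OK, OK, OK),
    (OK, ERROR, ERROR, OK), (OK, FAIL, ERROR, OK),
    (OK, EPSN, OK, OK), (OK, OK, OK, OK),
]


def check_table():
    """True if both functions agree with every row of TABLE_1."""
    return all(
        seq_result(r1, r2) == s and choice_result(r1, r2) == c
        for r1, r2, s, c in TABLE_1
    )
```

The row (OK, Fail) is the one behind the read/repeat example: once the first parser has consumed input, a failure of the second is reported as Error by the sequence, and a choice whose first alternative yields Error does not try the next one.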

Parsec is a parser combinator library for Haskell that employs a technique equivalent to the four-values technique for implementing LL(1) predictive parsers that automatically report errors [19]. To overcome the LL(1) limitation, Parsec introduced the try combinator, a dual of Hutton’s nofail combinator. The effect of try is to translate an error into a backtrackable failure. The idea is to use try to annotate the parts of the grammar where arbitrary lookahead is needed.

Parsec’s restriction to LL(1) grammars made it possible to implement an error reporting technique similar to the one used in top-down parsers. Parsec produces error messages that include the error position, the character at this position and the FIRST and FOLLOW sets of the productions that were expected at this position. Parsec also implements the error injection combinator (<?>) for naming productions. This combinator gets two arguments: a parser p and a string exp. The string exp replaces the FIRST set of a parser p when all the alternatives of p failed. This combinator is useful to name terminals and non-terminals to get better information about the context of a syntax error.

Swierstra and Duponcheel [20] showed an implementation of parser combinators for error recovery, although most libraries and parser generators that are based on parser combinators implement only error reporting. Their work relies on the fact that the grammar is LL(1) and shows an implementation of parser combinators that repairs erroneous inputs, produces an appropriate message, and continues parsing the rest of the input. This approach was later extended to also deal with grammars that are not LL(1), including ambiguous grammars [21]. The extended approach relies heavily on some features that the implementation language should have, such as lazy evaluation.

4. Labeled failures

Exceptions are a common mechanism for signaling and handling errors in programming languages. Exceptions let programmers classify the different errors their programs may signal by using distinct types for distinct errors, and decouple error handling from regular program logic.

In this section we add labeled failures to PEGs, a mechanism akin to exceptions and exception handling, with the goal of improving error reporting while preserving the composability of PEGs. In Section 6 we discuss how to use PEGs with labeled failures to implement some of the techniques that we have discussed in Section 3: the nofail combinator [15], the cut combinator [18], the four-values technique [17] and the try combinator [19].

A labeled PEG $G$ is a tuple $(V, T, P, L, p_S)$ where $L$ is a finite set of labels that must include the fail label, and the expressions in $P$ have been extended with the throw operator, explained below. The other parts use the same definitions from Section 2.

The abstract syntax of labeled parsing expressions adds the throw operator $\Uparrow l$, which generates a failure with label $l$, and adds an extra argument $S$ to the ordered choice operator, which is the set of labels that the ordered choice should catch. $S$ must be a subset of $L$.

$$p = \varepsilon \mid a \mid A \mid p_1 p_2 \mid p_1 /^{S} p_2 \mid p^* \mid\ !p \mid\ \Uparrow l$$

Fig. 3 presents the semantics of PEGs with labels as a set of inference rules. The semantics of PEGs with labels is defined by the relation $\stackrel{\mathrm{PEG}}{\leadsto}$ among a parsing expression, an input string and a result. The result is either a string or a label. The notation $G[p]\ xy \stackrel{\mathrm{PEG}}{\leadsto} y$ means that the expression $p$ matches the input $xy$, consumes the prefix $x$ and leaves the suffix $y$ as the output. The notation $G[p]\ xy \stackrel{\mathrm{PEG}}{\leadsto} l$ indicates that the matching of $p$ fails with label $l$ on the input $xy$.

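Now $a$ matches and consumes itself and fails with label fail otherwise; $p_1 p_2$ tries to match $p_1$, if $p_1$ matches an input prefix, then it tries to match $p_2$ with the suffix left by $p_1$, the label $l$ is propagated otherwise; $p_1 /^{S} p_2$ tries to match $p_1$ in the input and succeeds if $p_1$ matches, tries to match $p_2$ on the same input if $p_1$ fails with a label $l \in S$, and propagates $l$ if $p_1$ fails with a label $l \notin S$; $p^*$ repeatedly matches $p$ until the matching of $p$ silently fails with label fail, and propagates a label $l$ when $p$ fails with this label; $!p$ successfully matches if the input does not match $p$ with the label fail, fails producing the label fail when the input matches $p$, and propagates a label $l$ when $p$ fails with this label, not consuming the input in all cases; $\Uparrow l$ produces the label $l$.

Empty

$$G[\varepsilon]\ x \stackrel{\mathrm{PEG}}{\leadsto} x \quad \text{(empty.1)}$$

Terminal

$$G[a]\ ax \stackrel{\mathrm{PEG}}{\leadsto} x \quad \text{(char.1)} \qquad G[b]\ ax \stackrel{\mathrm{PEG}}{\leadsto} \texttt{fail},\ b \neq a \quad \text{(char.2)} \qquad G[a]\ \varepsilon \stackrel{\mathrm{PEG}}{\leadsto} \texttt{fail} \quad \text{(char.3)}$$

Non-terminal

$$\frac{G[P(A)]\ x \stackrel{\mathrm{PEG}}{\leadsto} X}{G[A]\ x \stackrel{\mathrm{PEG}}{\leadsto} X} \quad \text{(var.1)}$$

Concatenation

$$\frac{G[p_1]\ xy \stackrel{\mathrm{PEG}}{\leadsto} y \qquad G[p_2]\ y \stackrel{\mathrm{PEG}}{\leadsto} X}{G[p_1 p_2]\ xy \stackrel{\mathrm{PEG}}{\leadsto} X} \quad \text{(con.1)} \qquad \frac{G[p_1]\ x \stackrel{\mathrm{PEG}}{\leadsto} l}{G[p_1 p_2]\ x \stackrel{\mathrm{PEG}}{\leadsto} l} \quad \text{(con.2)}$$

Ordered Choice

$$\frac{G[p_1]\ xy \stackrel{\mathrm{PEG}}{\leadsto} y}{G[p_1 /^{S} p_2]\ xy \stackrel{\mathrm{PEG}}{\leadsto} y} \quad \text{(ord.1)} \qquad \frac{G[p_1]\ x \stackrel{\mathrm{PEG}}{\leadsto} l}{G[p_1 /^{S} p_2]\ x \stackrel{\mathrm{PEG}}{\leadsto} l},\ l \notin S \quad \text{(ord.2)}$$

$$\frac{G[p_1]\ x \stackrel{\mathrm{PEG}}{\leadsto} l \qquad G[p_2]\ x \stackrel{\mathrm{PEG}}{\leadsto} X}{G[p_1 /^{S} p_2]\ x \stackrel{\mathrm{PEG}}{\leadsto} X},\ l \in S \quad \text{(ord.3)}$$

Repetition

$$\frac{G[p]\ x \stackrel{\mathrm{PEG}}{\leadsto} \texttt{fail}}{G[p^*]\ x \stackrel{\mathrm{PEG}}{\leadsto} x} \quad \text{(rep.1)} \qquad \frac{G[p]\ xy \stackrel{\mathrm{PEG}}{\leadsto} y \qquad G[p^*]\ y \stackrel{\mathrm{PEG}}{\leadsto} X}{G[p^*]\ xy \stackrel{\mathrm{PEG}}{\leadsto} X} \quad \text{(rep.2)}$$

$$\frac{G[p]\ x \stackrel{\mathrm{PEG}}{\leadsto} l}{G[p^*]\ x \stackrel{\mathrm{PEG}}{\leadsto} l},\ l \neq \texttt{fail} \quad \text{(rep.3)}$$

Negative Predicate

$$\frac{G[p]\ x \stackrel{\mathrm{PEG}}{\leadsto} \texttt{fail}}{G[!p]\ x \stackrel{\mathrm{PEG}}{\leadsto} x} \quad \text{(not.1)} \qquad \frac{G[p]\ xy \stackrel{\mathrm{PEG}}{\leadsto} y}{G[!p]\ xy \stackrel{\mathrm{PEG}}{\leadsto} \texttt{fail}} \quad \text{(not.2)}$$

$$\frac{G[p]\ x \stackrel{\mathrm{PEG}}{\leadsto} l}{G[!p]\ x \stackrel{\mathrm{PEG}}{\leadsto} l},\ l \neq \texttt{fail} \quad \text{(not.3)}$$

Throw

$$G[\Uparrow l]\ x \stackrel{\mathrm{PEG}}{\leadsto} l \quad \text{(throw.1)}$$

Fig. 3. Semantics of PEGs with labels.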
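The rules for labeled PEGs translate into a small interpreter. The Python sketch below is an illustration under stated assumptions: the tuple encoding, the convention that success is an integer position and failure is a label string, the helper `expect` (standing for the pattern p / ⇑l), and the guard against repetitions of an empty match are ours, not part of the formal semantics.

```python
FAIL = "fail"

# Encoding (assumed): ("eps",) ("lit", "a") ("var", "A") ("seq", p1, p2)
#   ("choice", p1, S, p2) ("star", p) ("not", p) ("throw", label)


def match(g, p, s, i=0):
    """Return an int (new position) on success or a str (label) on failure."""
    kind = p[0]
    if kind == "eps":
        return i
    if kind == "lit":
        return i + len(p[1]) if s.startswith(p[1], i) else FAIL
    if kind == "var":
        return match(g, g[p[1]], s, i)
    if kind == "seq":
        r = match(g, p[1], s, i)
        return r if isinstance(r, str) else match(g, p[2], s, r)
    if kind == "choice":
        r = match(g, p[1], s, i)
        if isinstance(r, str) and r in p[2]:
            return match(g, p[3], s, i)     # label caught: try p2
        return r                            # success, or uncaught label
    if kind == "star":
        while True:
            r = match(g, p[1], s, i)
            if r == FAIL:
                return i                    # stop silently on fail only
            if isinstance(r, str) or r == i:
                return r                    # propagate label (or empty match)
            i = r
    if kind == "not":
        r = match(g, p[1], s, i)
        if r == FAIL:
            return i
        return r if isinstance(r, str) else FAIL
    if kind == "throw":
        return p[1]
    raise ValueError(f"unknown parsing expression: {kind}")


def expect(p, label):
    """Hypothetical helper: p /{fail} (throw label)."""
    return ("choice", p, {FAIL}, ("throw", label))


# Toy rule (an assumption): S <- ("a" [";"] with label sc)*
g = {"S": ("star", ("seq", ("lit", "a"), expect(("lit", ";"), "sc")))}
```

On `a;a` the repetition does not stop silently after the first item: the missing semicolon raises the label sc, which the repetition propagates because it is not fail. A surrounding choice catches it only if sc is in its label set.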

We faced some design decisions in our formulation that are worth discussing. First, we use a set of labels in the ordered choice as a convenience. We could have each ordered choice handling a single label, and it would just lead to duplication: an expression $p_1 /^{\{l_1, l_2, \ldots, l_n\}} p_2$ would become $(\ldots((p_1 /^{l_1} p_2) /^{l_2} p_2) \ldots /^{l_n} p_2)$.

Second, we require the presence of a fail label to maintain compatibility with the original semantics of PEGs, where we only have fail to signal both error and failure. For the same reason, we define the expression $p_1 / p_2$ as syntactic sugar for $p_1 /^{\{\texttt{fail}\}} p_2$.

Another choice was how to handle labels in a repetition. We chose to have a repetition stop silently only on the fail label to maintain the following identity, which holds for unlabeled PEGs: an expression $p^*$ is equivalent to a fresh non-terminal $A$ with the rule $A \leftarrow p A / \varepsilon$.

Finally, the negative predicate succeeds only on the fail label to allow the implementation of the positive predicate: the expression $\&p$ that implements the positive predicate in the original semantics of PEGs [3,9,2] is equivalent to the expression $!!p$. Both expressions successfully match if the input matches $p$, fail producing the label fail when the input does not match $p$, and propagate a label $l$ when $p$ fails with this label, not consuming the input in all cases.

Fig. 4 presents a PEG with labels for the Tiny language from Section 2. The expression $[p]^{l}$ is syntactic sugar for $(p / \Uparrow l)$. The strategy we used to annotate the grammar was the following: first, annotate every terminal that should not fail, that is, making the PEG backtrack on failure of that terminal would be useless, as the whole parse would either fail or not consume the whole input in that case. For an LL(1) grammar like the one in the example, that means all terminals in a production except the one in the very beginning of the production.

After annotating the terminals, we do the same for whole productions. We annotate productions where failing the whole production always implies an error in the input, adding a new alternative that throws an error label specific to that production.

For Tiny, we end up annotating just two productions, Factor and CmdSeq. Productions Exp, SimpleExp, and Term also should not fail, but after annotating Factor they always either succeed or throw the label exp. The Cmd production can fail, because it controls whether the repetition inside CmdSeq stops or continues.

    Tiny      ← CmdSeq
    CmdSeq    ← (Cmd [SEMICOLON]^sc) (Cmd [SEMICOLON]^sc)* / ⇑cmd
    Cmd       ← IfCmd / RepeatCmd / AssignCmd / ReadCmd / WriteCmd
    IfCmd     ← IF Exp [THEN]^then CmdSeq (ELSE CmdSeq / ε) [END]^end
    RepeatCmd ← REPEAT CmdSeq [UNTIL]^until Exp
    AssignCmd ← NAME [ASSIGNMENT]^bind Exp
    ReadCmd   ← READ [NAME]^read
    WriteCmd  ← WRITE Exp
    Exp       ← SimpleExp ((LESS / EQUAL) SimpleExp / ε)
    SimpleExp ← Term ((ADD / SUB) Term)*
    Term      ← Factor ((MUL / DIV) Factor)*
    Factor    ← OPENPAR Exp [CLOSEPAR]^cp / NUMBER / NAME / ⇑exp

Fig. 4. A PEG with labels for the Tiny language.

Notice that this is just an example of how a grammar can be annotated. More thorough analyses are possible: for example, we can deduce that Cmd is not allowed to fail unless the next token is one of ELSE, END, UNTIL, or the end of the input (the FOLLOW set of Cmd), and instead of ⇑cmd add !(ELSE / END / UNTIL / !.) ⇑cmd as a new alternative. This would remove the need for the ⇑cmd annotation of CmdSeq.

The PEG reports an error when parsing finishes with an uncaught label. Each label is associated with a meaningful error message. For instance, if we use this PEG for Tiny to parse the code example from Section 2, parsing finishes with the sc label and the PEG can use it to produce the following error message:

factorial.tiny:6:1: syntax error, there is a missing ’;’

Note how the semantics of the repetition works with the rule CmdSeq. Inside the repetition, the fail label means that there are no more commands to be matched and the repetition should stop, while the sc label means that a semicolon (;) failed to match. It would not be possible to write the rule CmdSeq using repetition if we had chosen to stop the repetition with any label, instead of stopping only with the fail label, because the repetition would accept the sc label as the end of the repetition whereas it should propagate this label.

Although the semantics of PEGs with labels presented in Fig. 3 allows us to generate specific error messages, it does not give us information about the location where the failure probably is, so it is necessary to use some extra mechanism (e.g., semantic actions) to get this information. To avoid this, we can adapt the semantics of PEGs with labels to give us a tuple $(l, y)$ in case of a failure, where $y$ is the suffix of the input that the PEG was trying to match when label $l$ was thrown. Updating the semantics of Fig. 3 to reflect this change is straightforward.
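One way to picture the adapted semantics is with combinators whose failures carry both the label and the position at which it was thrown, so that the suffix y is simply the input from that position onwards. The Python sketch below is hypothetical throughout: the combinator names, the toy grammar `Prog <- ("x" [";"] with label sc)*`, the `MESSAGES` table and the `report` helper are assumptions for illustration.

```python
FAIL = "fail"


def lit(text):
    """Literal followed by optional whitespace; fails with (fail, position)."""
    def parse(s, i):
        if not s.startswith(text, i):
            return (FAIL, i)
        j = i + len(text)
        while j < len(s) and s[j].isspace():
            j += 1
        return j
    return parse


def throw(label):
    return lambda s, i: (label, i)          # failure tuple (l, position of y)


def seq(*parsers):
    def parse(s, i):
        for p in parsers:
            i = p(s, i)
            if isinstance(i, tuple):
                return i
        return i
    return parse


def choice(p1, p2, catch=(FAIL,)):
    def parse(s, i):
        r = p1(s, i)
        if isinstance(r, tuple) and r[0] in catch:
            return p2(s, i)
        return r
    return parse


def star(p):
    def parse(s, i):
        while True:
            r = p(s, i)
            if isinstance(r, tuple):
                return i if r[0] == FAIL else r
            if r == i:
                return i
            i = r
    return parse


def expect(p, label):
    return choice(p, throw(label))          # p / throw(label)


MESSAGES = {"sc": "there is a missing ';'"}  # hypothetical label-to-message map


def report(s, failure):
    """Format a (label, position) failure as line:column plus its message."""
    label, pos = failure
    line = s.count("\n", 0, pos) + 1
    col = pos - (s.rfind("\n", 0, pos) + 1) + 1
    return f"{line}:{col}: syntax error, {MESSAGES[label]}"


prog = star(seq(lit("x"), expect(lit(";"), "sc")))
```

For the input `"x;\nx\ny"`, `prog` returns the tuple `("sc", 5)`, and `report` turns it into a message located at line 3, column 1, that is, at the token found where the semicolon was expected.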

In the next section, we try to establish a comparison between the farthest failure position heuristic and the labeled failure mechanism by contrasting two different implementations of a parser for a dialect of the Lua language.

5. Labeled failures versus farthest failure position

In this section we will compare two parser implementations for the Typed Lua language, one that uses the farthest failure position heuristic for error reporting, which was implemented first, and one based on labeled failures.

Typed Lua [22] is an optionally-typed extension of the Lua programming language [23]. The Typed Lua parser recognizes plain Lua programs, and also Lua programs with type annotations. The first version of the parser was implemented using Ford’s heuristic and the LPeg library.

As LPeg does not have a native error reporting mechanism based on Ford’s strategy, the failure tracking heuristic was implemented following the approach described in Section 2, which uses semantic actions.

Below we have the example of a Lua statement with a syntax error:

a = function (a,b,) end

In this case, the parser gives us the following error message, which is quite precise:

test.lua:1:19: syntax error, unexpected ’)’, expecting ’...’, ’Name’

In the previous case, the list of expected tokens had only two candidates, but this is not always the case. For example, let us consider the following Lua program, where there is no expression after the elseif in line 5:

01 if a then
02   return x
03 elseif b then
04   return y
05 elseif
06
07 end

The corresponding error message has a lengthy list of tokens, which does not help much to fix the error:

test.lua:7:1: syntax error, unexpected ’end’, expecting ’(’, ’Name’, ’{’, ’function’, ’...’, ’true’, [9 more tokens]

When using the Typed Lua parser based on Ford’s heuristic it is not uncommon to get a message like this. An analysis of the test cases available in the parser package shows us that around half of the expected error messages have a list of at least eight expected tokens (there were messages with a list of 39 expected tokens).

The second implementation of the Typed Lua parser was based on labeled failures and used the LPegLabel library [24], which is an extension of the LPeg library that supports labeled failures.

The use of labeled failures adds an annotation burden, as we have to specify when each label should be thrown. In the case of the Typed Lua grammar, we defined almost 50 different labels, using the same basic strategy that we used to annotate the Tiny grammar of Section 4.

Given the previous Lua program, the error message presented now is:

test.lua:7:1: expecting <exp> after ’elseif’

This error message is more informative than the previous one, which was generated automatically. We analyzed the error messages generated by the two parsers in 53 examples, and considered that in more than half of these examples the parser based on labeled failures produced a better error message. In about 20% of the cases we considered the error messages of both approaches similar, and in another 20% of the cases the parser based on Ford’s heuristic generated better error messages. The error location indicated by the two parsers in the examples analyzed was essentially the same. This seems to indicate that the main difference in practice between both approaches is related to the length of the error message generated. By using labeled failures we can probably get a simple error message at the cost of annotating the grammar, while by using the farthest failure tracking heuristic we can automatically generate error messages, which sometimes may contain a long list of expected tokens.

A point that is worth mentioning about the labeled failure approach is that it is not mandatory to annotate the entire grammar. The grammar can be annotated incrementally, at the points where the current error message is not good enough, and when no specific label is thrown, i.e., when the label fail is thrown, an error message can be generated automatically by using the position where the failure occurred. This means that combining labeled failures with the farthest failure position reduces the annotation burden, and helps to identify the places in the parser where a label is desirable.
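The combination of the two strategies can be sketched in Python as follows. This is only an illustration under assumptions: the function names, the label name ExpElseIf, the shape of the farthest-failure record, and the cut-off of six displayed tokens are hypothetical choices, not the Typed Lua parser’s actual code.

```python
def format_expected(tokens, limit=6):
    """Render a list of expected tokens, abbreviating long lists."""
    shown = ", ".join(f"'{t}'" for t in tokens[:limit])
    hidden = len(tokens) - limit
    if hidden > 0:
        shown += f", [{hidden} more tokens]"
    return shown

def error_message(label, label_messages, farthest):
    """Prefer the message tied to a specific label; otherwise fall back to
    the generic message built from the farthest failure record, given as
    a pair (unexpected_token, expected_tokens)."""
    if label != "fail" and label in label_messages:
        return label_messages[label]
    unexpected, expected = farthest
    return (f"syntax error, unexpected '{unexpected}', "
            f"expecting {format_expected(expected)}")

# Hypothetical label table for an incrementally annotated grammar.
LABEL_MESSAGES = {"ExpElseIf": "expecting <exp> after 'elseif'"}
```

Points of the grammar that still produce the generic fallback with a long token list are natural candidates for the next label to add.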

In the next section, we discuss some applications of labeled failures: we can use labeled PEGs to express the error reporting techniques that we have discussed in Section 3 [15,18,17,19], and also to efficiently parse context-free grammars that can use the LL(∗) parsing strategy [25].

6. Applications of labeled failures

This section shows that PEGs with labeled failures can express several error reporting techniques used in the realm of parsing combinators. They can also efficiently parse context-free grammars that are parseable by the LL(∗) top-down parsing strategy.

In Hutton’s deterministic parser combinators [15], the nofail combinator is used to distinguish between failure and error. We can express the nofail combinator using PEGs with labels as follows:

$$\mathit{nofail}\; p \;\equiv\; p \;/\; \Uparrow^{\mathit{error}}$$

That is, nofail is an expression that transforms the failure of p into an error to abort backtracking. Note that the error label should not be caught by any ordered choice. Instead, the ordered choice propagates this label and catches solely the fail label. The idea is that parsing should finish with one of the following values: success, fail or error.
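A small Python sketch may help to see how nofail aborts backtracking. The encoding of results and the combinator names other than nofail are assumptions made for this illustration; the ordered choice below catches only the fail label, as described above.

```python
# Sketch: a parser takes (text, pos) and returns ("ok", new_pos),
# or (label, pos) when `label` is thrown. Names are illustrative.
FAIL, ERROR = "fail", "error"

def lit(s):
    def parse(text, pos):
        if text.startswith(s, pos):
            return ("ok", pos + len(s))
        return (FAIL, pos)
    return parse

def seq(*parsers):
    def parse(text, pos):
        for p in parsers:
            tag, pos = p(text, pos)
            if tag != "ok":
                return (tag, pos)
        return ("ok", pos)
    return parse

def throw(label):
    return lambda text, pos: (label, pos)

def choice(p, q):
    """Ordered choice that catches only 'fail' and backtracks to pos."""
    def parse(text, pos):
        tag, new = p(text, pos)
        return q(text, pos) if tag == FAIL else (tag, new)
    return parse

def nofail(p):
    """nofail p = p / throw(error)."""
    return choice(p, throw(ERROR))

# A <- 'a' 'b' / 'a' 'c', with and without nofail on 'b'.
plain = choice(seq(lit("a"), lit("b")), seq(lit("a"), lit("c")))
strict = choice(seq(lit("a"), nofail(lit("b"))), seq(lit("a"), lit("c")))
```

On the input "ac", plain backtracks and succeeds through its second alternative, while strict finishes with the error label at position 1, because the ordered choice does not catch error.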

The annotation of the Tiny grammar to use nofail is similar to the annotation we have done using labeled failures. Basically, we just need to change the grammar to use nofail p instead of [p]^l. For instance, we can write the rule CmdSeq as follows:

CmdSeq ← (Cmd (nofail SEMICOLON)) (Cmd (nofail SEMICOLON))∗

If we are writing a grammar from scratch, there is no advantage in using nofail instead of more specific labels, as the annotation burden is the same and with nofail we lose more specific error messages.

The cut combinator [18] was introduced to reduce the space inefficiency of backtracking parsers, where the possibility of backtracking implies that any input that has already been processed must be kept in memory until the end of parsing. Semantically it is identical to nofail, differing only in the way the combinators are implemented: to implement cut the parser combinators use continuation-passing style, so cut can drop the failure continuation and consequently any pending backtrack frames. Hutton’s nofail is implemented in direct style, and is not able to drop pending backtrack frames. Expressing a cut operator with the same properties is not possible in our semantics of PEGs.

The four-values technique changed the semantics of parser combinators to implement predictive parsers for LL(1) grammars that automatically identify the longest input prefix in case of error, without needing annotations in the grammar. We can express this technique using labeled failures by transforming the original PEG with the following rules:

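$$
\begin{aligned}
\llbracket \varepsilon \rrbracket &\equiv \Uparrow^{\mathit{epsn}} && (1)\\
\llbracket a \rrbracket &\equiv a && (2)\\
\llbracket A \rrbracket &\equiv A && (3)\\
\llbracket p_1\, p_2 \rrbracket &\equiv \llbracket p_1 \rrbracket \, (\llbracket p_2 \rrbracket \;/\; \Uparrow^{\mathit{error}} \;/^{\{\mathit{epsn}\}}\; \varepsilon) \;/^{\{\mathit{epsn}\}}\; \llbracket p_2 \rrbracket && (4)\\
\llbracket p_1 \,/\, p_2 \rrbracket &\equiv \llbracket p_1 \rrbracket \;/^{\{\mathit{epsn}\}}\; (\llbracket p_2 \rrbracket \;/\; \Uparrow^{\mathit{epsn}}) \;/\; \llbracket p_2 \rrbracket && (5)
\end{aligned}
$$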

This translation is based on three labels: epsn means that the expression successfully finished without consuming any input, fail means that the expression failed without consuming any input, and error means that the expression failed after consuming some input. In our translation we do not have an ok label because a resulting suffix means that the expression successfully finished after consuming some input. It is straightforward to check that the translated expressions behave according to Table 1 from Section 3.

Parsec introduced the try combinator to annotate parts of the grammar where arbitrary lookahead is needed. We need arbitrary lookahead because PEGs and parser combinators usually operate at the character level. The authors of Parsec also showed a correspondence between the semantics of Parsec as implemented in their library and Partridge and Wright’s four-valued combinators, so we can emulate the behavior of Parsec using labeled failures by building on the five rules above and adding the following rule for try:

$$\llbracket \mathit{try}\; p \rrbracket \;\equiv\; \llbracket p \rrbracket \;/^{\{\mathit{error}\}}\; \Uparrow^{\mathit{fail}} \qquad (6)$$

If we take the Tiny grammar of Fig. 1 from Section 2, insert try in the necessary places, and pass this new grammar through the transformation ⟦ ⟧, then we get a PEG that automatically identifies errors in the input with the error label. For instance, we can write the rule RepeatCmd as follows:

RepeatCmd ← (try REPEAT) CmdSeq UNTIL Exp
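The transformation defined by rules (1) to (6) is purely syntactic, so it can be sketched in Python as a function over expression trees. The tuple encoding and the function names below are hypothetical, and the sketch assumes that chained labeled choices in rules (4) and (5) associate to the left, so that a label propagated by an inner choice can still be caught by the choice that follows it.

```python
# Labels used by the translation (names taken from the text).
FAIL, EPSN, ERROR = "fail", "epsn", "error"

# Hypothetical tuple encoding of expressions:
#   source PEG:  ("eps",)  ("lit", a)  ("var", A)
#                ("seq", p1, p2)  ("alt", p1, p2)  ("try", p)
#   labeled PEG: additionally ("throw", label)
#                and ("choice", labels, p1, p2) for p1 /{labels} p2

def choice(labels, p, q):
    """Build the labeled ordered choice p /{labels} q."""
    return ("choice", frozenset(labels), p, q)

def translate(p):
    """Apply rules (1) to (6), reading chained choices left to right."""
    kind = p[0]
    if kind == "eps":                        # rule (1)
        return ("throw", EPSN)
    if kind in ("lit", "var"):               # rules (2) and (3)
        return p
    if kind == "seq":                        # rule (4)
        t1, t2 = translate(p[1]), translate(p[2])
        rest = choice({EPSN}, choice({FAIL}, t2, ("throw", ERROR)), ("eps",))
        return choice({EPSN}, ("seq", t1, rest), t2)
    if kind == "alt":                        # rule (5)
        t1, t2 = translate(p[1]), translate(p[2])
        guarded = choice({FAIL}, t2, ("throw", EPSN))
        return choice({FAIL}, choice({EPSN}, t1, guarded), t2)
    if kind == "try":                        # rule (6)
        return choice({ERROR}, translate(p[1]), ("throw", FAIL))
    raise ValueError(f"unknown expression kind: {kind!r}")

# RepeatCmd <- (try REPEAT) CmdSeq UNTIL Exp, with sequences nested to the right.
repeat_cmd = ("seq", ("try", ("var", "REPEAT")),
              ("seq", ("var", "CmdSeq"),
               ("seq", ("var", "UNTIL"), ("var", "Exp"))))
translated_repeat_cmd = translate(repeat_cmd)
```

Representing the unlabeled choice of the target PEG as a choice that catches {fail} keeps the output uniform; an interpreter for the labeled PEG would then only need one choice construct.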

LL(∗) [25] is a parsing strategy used by the popular parsing tool ANTLR [26,27]. An LL(∗) parser is a top-down parser with arbitrary lookahead. The main idea of LL(∗) parsing is to build a deterministic finite automaton for each rule in the grammar, and use this automaton to predict which alternative of the rule the parser should follow, based on the rest of the input. Each final state of the DFA should correspond to a single alternative, or we have an LL(∗) parsing conflict.

Mascarenhas et al. [6] show how CFG classes that correspond to top-down predictive parsing strategies can be encoded with PEGs by using predicates to encode the lookahead necessary for each alternative. As translating a Deterministic Finite Automaton (DFA) to a PEG is straightforward [12,6], this gives us one way of encoding an LL(∗) parsing strategy in a PEG, at the cost of encoding a different copy of the lookahead DFA for each alternative.

Labeled PEGs provide a more straightforward encoding, where instead of a predicate for each alternative, we use a single encoding of the lookahead DFA, where each final state ends with a label corresponding to one of the alternatives. Each alternative is preceded by a choice operator that catches its label.

To make the translation clearer, let us consider the following example, from Parr and Fisher [25], where non-terminal S uses non-terminal Exp (omitted) to match arithmetic expressions:

S → ID | ID ‘=’ Exp | ‘unsigned’∗ ‘int’ ID | ‘unsigned’∗ ID ID

After analyzing this grammar, ANTLR produces the DFA of Fig. 5. When trying to match S, ANTLR runs this DFA on the input until it reaches a final state that indicates which alternative of the choice of rule S should be tried. For example, ANTLR chooses the second alternative if the DFA reaches state s4.

Fig. 6 gives a labeled PEG that encodes the LL(∗) parsing strategy for rule S. Rules S0, S1, and S2 encode the lookahead DFA of Fig. 5, and correspond to states s0, s1, and s2, respectively. The throw expressions correspond to the final states. As the throw expressions make the input backtrack to where it was prior to parsing S0, we do not need to use a predicate. We can also turn any uncaught failures into errors.
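The prediction step performed by rules S0, S1, and S2 can be mimicked with a short Python sketch. It is an illustration under assumptions: the input is taken to be a list of token kinds ("ID", "=", "unsigned", "int"), the function name predict is hypothetical, and the recursion of S2 is written as a loop. The returned value plays the role of the thrown label; the caller would then parse the chosen alternative from the start of the input, which corresponds to the backtracking caused by the throw.

```python
def predict(tokens):
    """Mirror S0, S1 and S2 of Fig. 6: return the label (1 to 4) of the
    alternative of S to try, or "error" when no alternative applies."""
    def tok(i):
        # None marks the end of the input (the !. test in S1).
        return tokens[i] if i < len(tokens) else None

    # S0 <- ID S1 / 'unsigned' S2 / 'int' throw(3)
    if tok(0) == "ID":
        # S1 <- '=' throw(2) / !. throw(1) / ID throw(4)
        if tok(1) == "=":
            return 2
        if tok(1) is None:
            return 1
        if tok(1) == "ID":
            return 4
        return "error"
    if tok(0) == "unsigned":
        # S2 <- 'unsigned' S2 / ID throw(4) / 'int' throw(3)
        i = 1
        while tok(i) == "unsigned":
            i += 1
        if tok(i) == "ID":
            return 4
        if tok(i) == "int":
            return 3
        return "error"
    if tok(0) == "int":
        return 3
    # Plain failures are turned into errors, as in the last alternative of S.
    return "error"
```

Because the alternatives of each state start with distinct tokens, returning early here is equivalent to the ordered choices of the PEG: once the first token of an alternative matches, no later alternative of the same rule could match at that position.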

Fig. 5. LL(∗) lookahead DFA for rule S.

S ← S0 /{1} ID /{2} ID ‘=’ Exp /{3} ‘unsigned’∗ ‘int’ ID /{4} ‘unsigned’∗ ID ID / ⇑error
S0 ← ID S1 / ‘unsigned’ S2 / ‘int’ ⇑3
S1 ← ‘=’ ⇑2 / !. ⇑1 / ID ⇑4
S2 ← ‘unsigned’ S2 / ID ⇑4 / ‘int’ ⇑3

Fig. 6. PEG with labels that simulates the LL(∗) algorithm.

7. Conclusions

In this paper, we discussed error reporting strategies for Parsing Expression Grammars. PEGs behave badly in the presence of syntax errors, because backtracking often makes the PEG lose track of the position where the error happened. This limitation was already known by Ford, and he tried to fix it in his PEG implementation by having the implementation track the farthest position in the input where a failure has happened [3].

We took Ford’s failure tracking heuristic and showed that it is not necessary to modify a PEG implementation to track failure positions as long as the implementation has mechanisms to execute semantic actions, and the current parsing position is exposed to these actions. In addition, we also showed how easy it is to extend the semantics of PEGs to incorporate failure tracking, including information that can indicate what the PEG was expecting when the failure happened.

Tracking the farthest failure position, either by changing the PEG implementation, using semantic actions, or redefining the semantics of PEGs, helps PEG parsers produce error messages that are close to error messages that predictive top-down parsers are able to produce, but these are generic error messages, sometimes with a long list of expected tokens.

As a way of generating more specific error messages, we introduced a mechanism of labeled failures to PEGs. This mechanism closely resembles standard exception handling in programming languages. Instead of a single kind of failure, we introduced a throw operator ⇑l that can throw different kinds of failures, identified by their labels, and extended the ordered choice operator to specify the set of labels that it catches. The implementation of these extensions in parser generator tools based on PEGs is straightforward.

We showed how labeled failures can be used as a way to annotate error points in a grammar, and tie them to more meaningful error messages. Labeled failures are orthogonal to the failure tracking approach we discussed earlier, so grammars can be annotated incrementally, at the points where better error messages are judged necessary.

We also showed that the labeled failures approach can express several techniques for error reporting used in parsers based on deterministic parser combinators, as presented in related work [15,18,17,19]. Labeled failures can also be used as a way of encoding the decisions made by a predictive top-down parser, as long as the decision procedure can be encoded as a PEG, and we showed an example of how to encode an LL(∗) grammar in this way.

Annotating a grammar with labeled failures demands care: if we mistakenly annotate expressions that should be able to fail, this modifies the behavior of the parser beyond error reporting. In any case, the use of labeled PEGs for error reporting introduces an annotation burden that is lesser than the annotation burden introduced by error productions in LR parsers, which also demand care, as their introduction usually leads to reduce–reduce conflicts [28].

We showed the error reporting strategies in the context of a small grammar for a toy language, and we also discussed the implementation of parsers for the Typed Lua language, an extension of the Lua language, based on these strategies. Moreover, we also implemented parsers for other languages, such as Céu [29], based on these approaches, improving the quality of error reporting either with generic error messages or with more specific error messages.

References

[1] A.V. Aho, M.S. Lam, R. Sethi, J.D. Ullman, Compilers: Principles, Techniques, and Tools, 2nd edition, Addison–Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2006.
[2] B. Ford, Parsing expression grammars: a recognition-based syntactic foundation, in: Proceedings of the 31st ACM SIGPLAN–SIGACT Symposium on Principles of Programming Languages, POPL ’04, ACM, New York, NY, USA, 2004, pp. 111–122.
[3] B. Ford, Packrat parsing: a practical linear-time algorithm with backtracking, Master’s thesis, Massachusetts Institute of Technology, September 2002.
[4] S. Medeiros, F. Mascarenhas, R. Ierusalimschy, From regular expressions to parsing expression grammars, in: Brazilian Symposium on Programming Languages, 2011.
[5] S. Medeiros, F. Mascarenhas, R. Ierusalimschy, Left recursion in parsing expression grammars, in: Programming Languages: Proceedings of the 16th Brazilian Symposium, SBLP 2012, Natal, Brazil, September 23–28, 2012, Springer, Berlin, Heidelberg, 2012, pp. 27–41.
[6] F. Mascarenhas, S. Medeiros, R. Ierusalimschy, On the relation between context-free grammars and parsing expression grammars, Sci. Comput. Program. 89 (C) (2014) 235–250, http://dx.doi.org/10.1016/j.scico.2014.01.012.
[7] K.C. Louden, Compiler Construction: Principles and Practice, PWS Publishing Co., Boston, MA, USA, 1997.
[8] D. Grune, C.J. Jacobs, Parsing Techniques: A Practical Guide, 2nd edition, Springer Publishing Company, Incorporated, 2010.
[9] B. Ford, Packrat parsing: simple, powerful, lazy, linear time, in: Proceedings of the Seventh ACM SIGPLAN International Conference on Functional Programming, ICFP ’02, ACM, New York, NY, USA, 2002, pp. 36–47.
[10] I. Piumarta, peg/leg – recursive-descent parser generators for C, http://piumarta.com/software/peg/, 2007 [visited on March 2013].
[11] R. Ierusalimschy, LPeg – parsing expression grammars for Lua, http://www.inf.puc-rio.br/~roberto/lpeg/lpeg.html, 2008 [visited on March 2013].
[12] R. Ierusalimschy, A text pattern-matching tool based on parsing expression grammars, Softw. Pract. Exp. 39 (3) (2009) 221–258.
[13] K. Mizushima, A. Maeda, Y. Yamaguchi, Packrat parsers can handle practical grammars in mostly constant space, in: Proceedings of the 9th ACM SIGPLAN–SIGSOFT Workshop on Program Analysis for Software Tools and Engineering, PASTE ’10, ACM, New York, NY, USA, 2010, pp. 29–36.
[14] R. Grimm, Better extensibility through modular syntax, in: Proceedings of the 2006 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’06, ACM, New York, NY, USA, 2006, pp. 38–51.
[15] G. Hutton, Higher-order functions for parsing, J. Funct. Program. 2 (3) (1992) 323–343.
[16] M. Spivey, When maybe is not good enough, J. Funct. Program. 22 (2012) 747–756, http://dx.doi.org/10.1017/S0956796812000329, http://journals.cambridge.org/article_S0956796812000329.
[17] A. Partridge, D. Wright, Predictive parser combinators need four values to report errors, J. Funct. Program. 6 (2) (1996) 355–364.
[18] N. Röjemo, Efficient parsing combinators, Tech. rep., Department of Computer Science, Chalmers University of Technology, 1995.
[19] D. Leijen, E. Meijer, Parsec: direct style monadic parser combinators for the real world, Tech. rep. UU-CS-2001-35, Department of Computer Science, Utrecht University, 2001.
[20] S.D. Swierstra, L. Duponcheel, Deterministic, error-correcting combinator parsers, in: Advanced Functional Programming, in: Lecture Notes in Computer Science, vol. 1129, Springer, 1996, pp. 184–207.
[21] S.D. Swierstra, Combinator parsers: a short tutorial, in: A. Bove, L. Barbosa, A. Pardo, J. Sousa Pinto (Eds.), Language Engineering and Rigorous Software Development, in: LNCS, vol. 5520, Springer-Verlag, 2009, pp. 252–300.
[22] A.M. Maidl, F. Mascarenhas, R. Ierusalimschy, Typed Lua: an optional type system for Lua, in: Proceedings of the Workshop on Dynamic Languages and Applications, Dyla ’14, ACM, New York, NY, USA, 2014, pp. 3:1–3:10.
[23] R. Ierusalimschy, Programming in Lua, 3rd edition, Lua.Org, 2013.
[24] S. Medeiros, LPegLabel – an extension of LPeg that supports labeled failures, https://github.com/sqmedeiros/lpeglabel, 2014 [visited on October 2015].
[25] T. Parr, K. Fisher, LL(*): the foundation of the ANTLR parser generator, in: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’11, ACM, New York, NY, USA, 2011, pp. 425–436, http://doi.acm.org/10.1145/1993498.1993548.
[26] T. Parr, The Definitive ANTLR 4 Reference, 2nd edition, Pragmatic Bookshelf, 2013.
[27] T. Parr, ANTLR, http://www.antlr.org [visited on March 2014].
[28] C.L. Jeffery, Generating LR syntax error messages from examples, ACM Trans. Program. Lang. Syst. 25 (5) (2003) 631–640.
[29] F. Sant’Anna, Safe system-level concurrency on resource-constrained nodes with CÉU, Ph.D. thesis, PUC–Rio, 2013.
