Science of Computer Programming
www.elsevier.com/locate/scico
Error reporting in Parsing Expression Grammars
André Murbach Maidl a, Fabio Mascarenhas b,∗, Sérgio Medeiros c, Roberto Ierusalimschy d
a Polytechnic School, PUCPR, Curitiba, Brazil
b Department of Computer Science, UFRJ, Rio de Janeiro, Brazil
c School of Science and Technology, UFRN, Natal, Brazil
d Department of Computer Science, PUC-Rio, Rio de Janeiro, Brazil
Article info
Article history:
Received 11 April 2014
Received in revised form 3 August 2016
Accepted 12 August 2016
Available online 20 August 2016
Keywords:
Parsing
Error reporting
Parsing expression grammars
Packrat parsing
Parser combinators
Abstract
Parsing Expression Grammars (PEGs) describe top-down parsers. Unfortunately, the error-reporting techniques used in conventional top-down parsers do not directly apply to parsers based on PEGs, so they have to be somehow simulated. While the PEG formalism has no account of semantic actions, actual PEG implementations add them, and we show how to simulate an error-reporting heuristic through these semantic actions.
We also propose a complementary error reporting strategy that may lead to better error messages: labeled failures. This approach is inspired by exception handling of programming languages, and lets a PEG define different kinds of failure, with each ordered choice operator specifying which kinds it catches. Labeled failures give a way to annotate grammars for better error reporting, to express some of the error reporting strategies used by deterministic parser combinators, and to encode predictive top-down parsing in a PEG.
©2016 Elsevier B.V. All rights reserved.
1. Introduction
When a parser receives an erroneous input, it should indicate the existence of syntax errors. Errors can be handled in various ways. The easiest is just to report that an error was found, where it was found, and what was expected at that point and then abort. At the other end of the spectrum we find mechanisms that attempt to parse the complete input, and report as many errors as possible.
The LL(k) and LR(k) methods detect syntax errors very efficiently because they have the viable prefix property, that is, these methods detect a syntax error as soon as k tokens are read and cannot be used to extend the thus far accepted part of the input into a viable prefix of the language [1]. LL(k) and LR(k) parsers can use this property to produce suitable, though generic, error messages.
Parsing Expression Grammars (PEGs) [2] are a formalism for describing the syntax of programming languages. We can view a PEG as a formal description of a top-down parser for the language it describes. PEGs have a concrete syntax based on the syntax of regexes, or extended regular expressions. Unlike Context-Free Grammars (CFGs), PEGs avoid ambiguities in the definition of the grammar’s language due to the use of an ordered choice operator.
∗ Corresponding author.
E-mail addresses: andre.murbach@pucpr.br (A.M. Maidl), mascarenhas@ufrj.br (F. Mascarenhas), sergiomedeiros@ect.ufrn.br (S. Medeiros), roberto@inf.puc-rio.br (R. Ierusalimschy).
http://dx.doi.org/10.1016/j.scico.2016.08.004
0167-6423/© 2016 Elsevier B.V. All rights reserved.
More specifically, a PEG can be interpreted as the specification of a recursive descent parser with restricted (or local) backtracking. This means that the alternatives of a choice are tried in order; as soon as an alternative recognizes an input prefix, no other alternative of this choice will be tried, but when an alternative fails to recognize an input prefix, the parser backtracks to try the next alternative.
On the one hand, PEGs can be interpreted as a formalization of a specific class of top-down parsers [2]; on the other hand, PEGs cannot use error handling techniques that are often applied to predictive top-down parsers, because these techniques assume the parser reads the input without backtracking [3]. In top-down parsers without backtracking, it is possible to signal a syntax error as soon as the next input symbol cannot be accepted. In PEGs, it is more complicated to identify the cause of an error and the position where it occurs, because failures during parsing are not necessarily errors, but just an indication that the parser cannot proceed and a different choice should be made elsewhere.
Ford [3] has already identified this limitation of error reporting in PEGs, and, in his parser generators for PEGs, included a heuristic for better error reporting. This heuristic simulates the error reporting technique that is implemented in top-down parsers without backtracking. The idea is to track the position in the input where the farthest failure occurred, as well as what the parser was expecting at that point, and report this to the user in case of errors.
Tracking the farthest failure position and context gives us PEGs that produce error messages similar to the automatically produced error messages of other top-down parsers; they tell the user the position where the error was encountered, what was found in the input at that position, and what the parser was expecting to find.
In this paper, we show how grammar writers can use this error reporting technique even in PEG implementations that do not implement it, by making use of semantic actions that expose the current position in the input and the possibility to access some form of mutable state associated with the parsing process.
We also propose a complementary approach for error reporting in PEGs, based on the concept of labeled failures, inspired by the standard exception handling mechanisms as found in programming languages. Instead of just failing, a labeled PEG can produce different kinds of failure labels using a throw operator. Each label can be tied to a more specific error message. PEGs can also catch such labeled failures, via a change to the ordered choice operator. We formalize labeled failures as an extension of the semantics of regular PEGs.
With labeled PEGs we can express some alternative error reporting techniques for top-down parsers with local backtracking. We can also encode predictive parsing in a PEG, and we show how to do that for LL(∗) parsing, a powerful predictive parsing strategy.
The rest of this paper is organized as follows: Section 2 contextualizes the problem of error handling in PEGs, explains in detail the failure tracking heuristic, and shows how it can be realized in PEG implementations that do not support it directly; Section 3 discusses related work on error reporting for top-down parsers with backtracking; Section 4 introduces and formalizes the concept of labeled failures, and shows how to use it for error reporting; Section 5 compares the error messages generated by a parser based on the failure tracking heuristic with the ones generated by a parser based on labeled failures; Section 6 shows how labeled failures can encode some of the techniques of Section 3, as well as predictive parsing; finally, Section 7 gives some concluding remarks.
2. Handling syntax errors with PEGs
In this section, we use examples to present in more detail how a PEG behaves badly in the presence of syntax errors. After that, we present a heuristic proposed by Ford [3] to implement error reporting in PEGs. Rather than using the original notation and semantics of PEGs given by Ford [2], our examples use the equivalent and more concise notation and semantics proposed by Medeiros et al. [4–6]. We will extend both the notation and the semantics in Section 4 to present PEGs with labeled failures.
A PEG $G$ is a tuple $(V, T, P, p_S)$ where $V$ is a finite set of non-terminals, $T$ is a finite set of terminals, $P$ is a total function from non-terminals to parsing expressions and $p_S$ is the initial parsing expression. We describe the function $P$ as a set of rules of the form $A \leftarrow p$, where $A \in V$ and $p$ is a parsing expression. A parsing expression, when applied to an input string, either fails or consumes a prefix of the input and returns the remaining suffix. The abstract syntax of parsing expressions is given as follows, where $a$ is a terminal, $A$ is a non-terminal, and $p$, $p_1$ and $p_2$ are parsing expressions:

$$p = \varepsilon \mid a \mid A \mid p_1 p_2 \mid p_1 / p_2 \mid p^{*} \mid\ !p$$

Intuitively, $\varepsilon$ successfully matches the empty string, not changing the input; $a$ matches and consumes itself or fails otherwise; $A$ tries to match the expression $P(A)$; $p_1 p_2$ tries to match $p_1$ followed by $p_2$; $p_1 / p_2$ tries to match $p_1$; if $p_1$ fails, then it tries to match $p_2$; $p^{*}$ repeatedly matches $p$ until $p$ fails, that is, it consumes as much as it can from the input; the matching of $!p$ succeeds if the input does not match $p$ and fails when the input matches $p$, not consuming any input in either case; we call it the negative predicate or the lookahead predicate.
Fig. 1 presents a PEG for the Tiny language [7]. Tiny is a simple programming language with a syntax that resembles Pascal’s. We will use this PEG, which can be seen as the equivalent of an LL(1) CFG, to show how error reporting differs between top-down parsers without backtracking and PEGs.
Tiny ← CmdSeq
CmdSeq ← (Cmd SEMICOLON) (Cmd SEMICOLON)∗
Cmd ← IfCmd / RepeatCmd / AssignCmd / ReadCmd / WriteCmd
IfCmd ← IF Exp THEN CmdSeq (ELSE CmdSeq / ε) END
RepeatCmd ← REPEAT CmdSeq UNTIL Exp
AssignCmd ← NAME ASSIGNMENT Exp
ReadCmd ← READ NAME
WriteCmd ← WRITE Exp
Exp ← SimpleExp ((LESS / EQUAL) SimpleExp / ε)
SimpleExp ← Term ((ADD / SUB) Term)∗
Term ← Factor ((MUL / DIV) Factor)∗
Factor ← OPENPAR Exp CLOSEPAR / NUMBER / NAME
Fig. 1. A PEG for the Tiny language.
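The intuitive semantics given before Fig. 1 can be sketched as a small interpreter. This is a minimal illustration in Python, not code from any of the PEG tools cited in this paper: the tuple encoding of expressions, the name `match`, and the use of `None` for failure are all assumptions made for the sketch.

```python
# Parsing expressions as tuples (an encoding assumed for this sketch):
#   ("eps",)  ("chr", a)  ("var", A)  ("seq", p1, p2)
#   ("ord", p1, p2)  ("star", p)  ("not", p)
FAIL = None  # result of a failed match


def match(grammar, p, s, i=0):
    """Match p against s from index i; return the index after the
    consumed prefix, or FAIL."""
    kind = p[0]
    if kind == "eps":                       # matches the empty string
        return i
    if kind == "chr":                       # consumes itself or fails
        return i + 1 if i < len(s) and s[i] == p[1] else FAIL
    if kind == "var":                       # A tries to match P(A)
        return match(grammar, grammar[p[1]], s, i)
    if kind == "seq":                       # p1 followed by p2
        j = match(grammar, p[1], s, i)
        return FAIL if j is FAIL else match(grammar, p[2], s, j)
    if kind == "ord":                       # p2 is tried only if p1 fails
        j = match(grammar, p[1], s, i)
        return j if j is not FAIL else match(grammar, p[2], s, i)
    if kind == "star":                      # consume as much as possible
        while True:
            j = match(grammar, p[1], s, i)
            if j is FAIL or j == i:         # j == i guards against looping
                return i                    # on an empty match (not in the formalism)
            i = j
    if kind == "not":                       # never consumes input
        return i if match(grammar, p[1], s, i) is FAIL else FAIL
    raise ValueError(f"unknown expression: {kind}")


# S <- a b / a
G = {"S": ("ord", ("seq", ("chr", "a"), ("chr", "b")), ("chr", "a"))}
```

With this grammar, `match(G, ("var", "S"), "ac")` first tries `a b`, fails on `c`, backtracks to the start, and succeeds with the second alternative, which is the local backtracking that makes error reporting harder in PEGs.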
01 n := 5;
02 f := 1;
03 repeat
04   f := f * n;
05   n := n - 1
06 until (n < 1);
07 write f;
Fig. 2. Program for the Tiny language with a syntax error.
PEGs usually express the language syntax at the character level, without the need of a separate lexer. For instance, we can write the lexical rule IF as follows, assuming we have non-terminals Skip, which consumes whitespace, and IDRest, which consumes any character that may be present on a proper suffix of an identifier¹:
IF ← if !IDRest Skip
Now, we present an example of erroneous Tiny code so we can compare approaches for error reporting. The program in Fig. 2 is missing a semicolon (;) in the assignment in line 5. A predictive top-down parser that aborts on the first error presents an error message like:
factorial.tiny:6:1: syntax error, unexpected ’until’, expecting ’;’
The error is reported in line 6 because the parser cannot complete a valid prefix of the language, since it unexpectedly finds the token until when it was expecting a command terminator (;).
In PEGs, we can try to report errors using the remaining suffix, but this approach usually does not help the PEG produce an error message like the one shown above. In general, when a PEG finishes parsing the input, a remaining non-empty suffix means that parsing did not reach the end of file due to a syntax error. However, this suffix usually does not indicate the actual place of the error, as the error will have caused the PEG to backtrack to another place in the input.
In our example, the problem happens when the PEG tries to recognize the sequence of commands inside the repeat command. Even though the program has a missing semicolon (;) in the assignment in line 5, making the PEG fail to recognize the sequence of commands inside the repeat command, this failure is not treated as an error. Instead, this failure makes the recognition of the repeat command also fail. For this reason, the PEG backtracks the input to line 3 to try to parse other alternatives for CmdSeq, and since these do not exist, its ancestor Cmd. Since it is not possible to recognize a command other than repeat at line 3, the parsing finishes without consuming all the input. Hence, if the PEG uses the remaining suffix to produce an error message, the PEG reports line 3 instead of line 6 as the location where no further progress can be made.
There is no perfect method to identify which information is the most relevant to report an error. In our example it is easy for the parser to correctly report what the error is, but it is easy to construct examples where this is not the case. If we add the semicolon in the end of line 5 and remove line 3, a predictive top-down parser would complain about finding an until where it expected another statement, while the actual error is a missing repeat.
According to Ford [3], using the information of the farthest position that the PEG reached in the input is a heuristic that provides good results. PEGs define top-down parsers and try to recognize the input from left to right, so the position farthest to the right in the input that a PEG reaches during parsing usually is close to the real error [3]. The same idea for error reporting in top-down parsing with backtracking was also mentioned in Section 16.2 of [8].
Ford used this heuristic to add error reporting to his PEG implementation using packrat parsers [3]. A packrat parser generated by Pappy [9], Ford’s PEG parser generator, tracks the farthest position and uses this position to report an error. In other words, this heuristic helps packrat parsers to simulate the error reporting technique that is implemented in deterministic parsers.
Although Ford has only discussed his heuristic in relation to packrat parsers, we can use the farthest position heuristic to add error reporting to any implementation of PEGs that provides semantic actions. The idea is to annotate the grammar with semantic actions that track this position. While this seems onerous, we just need to add annotations to all the lexical rules to implement error reporting.
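The difference between the remaining suffix and the farthest failure position can be reproduced with a few lines of code. The sketch below is an illustration in Python under stated assumptions: the combinators `token`, `seq`, `choice` and `star`, the `farthest` dictionary, and the cut-down fragment of Tiny (assignments of a single operand and `repeat`) are all made up for the example and do not come from any tool cited here. Only the lexical rules are annotated.

```python
import re

farthest = {"pos": -1}            # mutable state shared by all lexical rules

def token(pattern):
    """Lexical rule: match `pattern` plus trailing whitespace, or record the failure."""
    rx = re.compile(pattern + r"\s*")
    def parse(s, i):
        m = rx.match(s, i)
        if m:
            return m.end()
        farthest["pos"] = max(farthest["pos"], i)   # the only annotation needed
        return None                                 # then fail as usual
    return parse

def seq(*ps):
    def parse(s, i):
        for p in ps:
            i = p(s, i)
            if i is None:
                return None
        return i
    return parse

def choice(*ps):                  # ordered choice with local backtracking
    def parse(s, i):
        for p in ps:
            j = p(s, i)
            if j is not None:
                return j
        return None
    return parse

def star(p):
    def parse(s, i):
        while True:
            j = p(s, i)
            if j is None:
                return i
            i = j
    return parse

NAME = token(r"(?!(?:repeat|until)\b)[a-z]+")
NUMBER = token(r"[0-9]+")
SEMI, ASSIGN = token(";"), token(":=")
REPEAT, UNTIL = token(r"repeat\b"), token(r"until\b")

exp = choice(NUMBER, NAME)        # simplified expressions
def cmdseq(s, i):                 # CmdSeq <- Cmd ';' (Cmd ';')*
    return seq(cmd, SEMI, star(seq(cmd, SEMI)))(s, i)
repeat_cmd = seq(REPEAT, cmdseq, UNTIL, exp)
assign_cmd = seq(NAME, ASSIGN, exp)
cmd = choice(repeat_cmd, assign_cmd)

def line_of(s, pos):
    return s.count("\n", 0, pos) + 1

src = "n := 5;\nrepeat\n  n := 1\nuntil n;\n"   # ';' missing after "n := 1"
end = cmdseq(src, 0)              # where the remaining suffix starts
```

On this input the parse stops with the remaining suffix starting at the `repeat` in line 2, while `farthest["pos"]` points at the `until` in line 4, right after the assignment that lacks its semicolon.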
For instance, in Leg [10], a PEG parser generator with Yacc-style semantic actions, we can annotate the rule for SEMICOLON as follows, where | is Leg’s ordered choice operator, and following it is a semantic action (in the notation used by Leg):
SEMICOLON = ";" Skip | &{ updateffp() }
The function updateffp that the semantic action calls updates the farthest failure position in a global variable if the current parsing position is greater than the position that is stored in this global, then makes the whole action fail so parsing continues as if the original failure had occurred.
However, storing just the farthest failure position does not give the parser all the information it needs to produce an informative error message. That is, the parser has the information about the position where the error happened, but it lacks the information about what terminals failed at that position. Thus, we extend our approach by including the terminals in the annotations so the parser can also track these names in order to compute the set of expected terminals at a certain position:
SEMICOLON = ";" Skip | &{ updateffp(";") }
The extended implementation of updateffp keeps, for a given failure position, the names of all the symbols expected there. If the current position is greater than the farthest failure, updateffp initializes this set with just the given name. If the current position equals the farthest failure, updateffp adds this name to the set.
Parsers generated by Pappy also track the set of expected terminals, but with limitations. The error messages include only symbols and keywords that were defined in the grammar as literal strings. That is, the error messages do not include terminals that were defined through character classes.
Our approach of naming terminals in the semantic actions avoids the kind of limitation found in Pappy, though it increases the annotation burden because the implementor of the PEG is also responsible for adding one semantic action for each terminal and its respective name.
The annotation burden can be lessened in implementations of PEGs that treat parsing expressions as first-class objects, because this makes it possible to define functions that annotate the lexical parts of the grammar to track errors, record information about the expected terminals to produce good error messages, and enforce lexical conventions such as the presence of surrounding whitespace. For instance, in LPeg [11,12], a PEG implementation for Lua that defines patterns as first-class objects, we can annotate the rule CmdSeq as follows, where the patterns V"A", p1 * p2, and p^0 are respectively equivalent to parsing expressions $A$, $p_1 p_2$, and $p^{*}$ (in the notation used by LPeg):
CmdSeq = V"Cmd" * symb(";") * (V"Cmd" * symb(";"))^0;
The function symb receives a string as its only argument and returns a parser that is equivalent to the parsing expression that we used in the Leg example. That is, symb(";") is equivalent to ";" Skip | &{ updateffp(";") }.
We implemented error tracking and reporting using semantic actions as a set of parsing combinators on top of LPeg and used these combinators to implement a PEG parser for Tiny. It produces the following error message for the example we have been using in this section:
factorial.tiny:6:1: syntax error, unexpected ’until’,
expecting ’;’, ’=’, ’<’, ’-’, ’+’, ’/’, ’*’
We tested this PEG parser for Tiny with other erroneous inputs and in all cases the parser identified an error in the same place as a top-down parser without backtracking. In addition, the parser for Tiny produced error messages that are similar to the error messages produced by packrat parsers generated by Pappy. We annotated other grammars and successfully obtained similar results. However, the error messages are still generic; they are not as specific as the error messages of a hand-written top-down parser.
3. Error reporting in top-down parsers with backtracking
Mizushima et al. [13] proposed a cut operator ($\uparrow$) to reduce the space consumption of packrat parsers; the authors claimed that the cut operator can also be used to implement error reporting in packrat parsers, but they did not give any details on how the cut operator could be used for this purpose. The cut operator is borrowed from Prolog to annotate pieces of a PEG where backtracking should be avoided. PEGs’ ordered choice works in a similar way to Prolog’s green cuts, that is, they limit backtracking to discard unnecessary solutions. The cut proposed for PEGs is a way to implement Prolog’s white cuts, that is, they prevent backtracking to rules that will certainly fail.
The semantics of cut is similar to the semantics of an if-then-else control structure and can be simulated through predicates. For instance, the PEG (with cut) $A \leftarrow B \uparrow C / D$ is functionally equivalent to the PEG (without cut) $A \leftarrow B C / !B D$, which is also functionally equivalent to the rule $A \leftarrow B[C, D]$ in Generalized Top-Down Parsing Language (GTDPL), one of the parsing techniques that influenced the creation of PEGs [3,9,2]. In the three cases, the expression $D$ is tried only if the expression $B$ fails. Nevertheless, this translated PEG still backtracks whenever $B$ successfully matches and $C$ fails. Thus, it is not trivial to use this translation to implement error reporting in PEGs.
Rats! [14] is a popular packrat parser that implements error reporting with a strategy similar to Ford’s, with the change that it always reports error positions at the start of productions, and pretty-prints non-terminal names in the error message.
For example, an error in a ReturnStatement non-terminal becomes return statement expected.
Even though error handling is an important task for parsers, we did not find any other research results about error handling in PEGs, beyond the heuristic proposed by Ford and the cut operator proposed by Mizushima et al. However, parser combinators [15] present some similarities with PEGs so we will briefly discuss them for the rest of this section.
In functional programming it is common to implement recursive descent parsers using parser combinators [15]. A parser is a function that we use to model symbols of the grammar. A parser combinator is a higher-order function that we use to implement grammar constructions such as sequencing and choice. One kind of parser combinator implements parsers that return a list of all possible results of a parse, effectively implementing a recursive descent parser with full backtracking. Despite being actually deterministic in behavior (parsing the same input always yields the same list of results), these combinators are called non-deterministic parser combinators due to their use of a non-deterministic choice operator. We get parser combinators that have the same semantics as PEGs by changing the return type from list of results to Maybe. That is, we use deterministic parser combinators that return Maybe to implement recursive descent parsers with limited backtracking [16]. In the rest of this paper, whenever we refer to parser combinators we intend to refer to these parser combinators with limited backtracking.
Like PEGs, most deterministic parser combinator libraries also use ordered choice, and thus suffer from the same problems as PEGs with erroneous inputs, where the point that the parser reached in the input is usually far away from the point of the error.
Hutton [15] introduced the nofail combinator to implement error reporting in a quite simple way: we just need to distinguish between failure and error during parsing. More specifically, we can use the nofail combinator to annotate the grammar’s terminals and non-terminals that should not fail; when they fail, the failure should be transformed into an error. The difference between an error and a failure is that an ordered choice just propagates an error in its first alternative instead of backtracking and trying its second alternative, so any error aborts the whole parser. This technique is also called the three-values technique [17] because the parser finishes with one of the following values: OK, Fail or Error.
Röjemo [18] presented a cut combinator that we can also use to annotate the grammar pieces where parsing should be aborted on failure, on behalf of efficiency and error reporting. The cut combinator is different from the cut operator² (↑) for PEGs because the combinator is abortive and unary while the operator is not abortive and nullary. The cut combinator introduced by Röjemo has the same semantics as the nofail combinator introduced by Hutton.
Partridge and Wright [17] showed that error detection can be automated in parser combinators when we assume that the grammar is LL(1). Their main idea is: if one alternative successfully consumes at least one symbol, no other alternative can successfully consume any symbols. Their technique is also known as the four-values technique because the parser finishes with one of the following values: Epsn, when the parser finishes with success without consuming any input; OK, when the parser finishes with success consuming some input; Fail, when the parser fails without consuming any input; and Error, when the parser fails consuming some input. Three values were inspired by Hutton’s work [15], but with new meanings.
In the four-values technique, we do not need to annotate the grammar because the authors changed the semantics of the sequence and choice combinators to automatically generate the Error value according to Table 1. In summary, the sequence combinator propagates an error when the second parse fails after consuming some input while the choice combinator does not try further alternatives if the current one consumed at least one symbol from the input. In case of error, the four-values technique detects the first symbol following the longest parse of the input and uses this symbol to report an error.
The four-values technique assumes that the input is composed of tokens which are provided by a separate lexer. However, being restricted to LL(1) grammars can be a limitation because parser combinators, like PEGs, usually operate on strings of characters to implement both lexer and parser together. For instance, a parser for Tiny that is implemented with Parsec [19] does not parse the following program: read x;. That is, the matching of read against repeat generates an error. Such behavior is confirmed in Table 1 by the third line from the bottom.
Table 1
Behavior of sequence and choice in the four-values technique.
p1       p2       p1 p2    p1 | p2
Error    Error    Error    Error
Error    Fail     Error    Error
Error    Epsn     Error    Error
Error    OK(x)    Error    Error
Fail     Error    Fail     Error
Fail     Fail     Fail     Fail
Fail     Epsn     Fail     Epsn
Fail     OK(x)    Fail     OK(x)
Epsn     Error    Error    Error
Epsn     Fail     Fail     Epsn
Epsn     Epsn     Epsn     Epsn
Epsn     OK(x)    OK(x)    OK(x)
OK(x)    Error    Error    OK(x)
OK(x)    Fail     Error    OK(x)
OK(x)    Epsn     OK(x)    OK(x)
OK(x)    OK(y)    OK(y)    OK(x)
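Table 1 can also be read as two small functions over the four result values. The Python sketch below is an illustration only: it tracks the status of each parse and ignores the values x and y, and the names `seq_status` and `choice_status` are assumptions. The second argument is the status that p2 would produce; in the cases marked as committed, a real combinator would not run p2 at all.

```python
EPSN, OK, FAIL, ERROR = "Epsn", "OK", "Fail", "Error"


def seq_status(r1, r2):
    """Status of the sequence p1 p2, given the statuses of p1 and p2."""
    if r1 in (ERROR, FAIL):
        return r1                  # p2 is never run
    if r1 == EPSN:
        return r2                  # nothing consumed so far: p2 decides
    # r1 == OK: input was consumed, so any later failure is an error
    return ERROR if r2 in (FAIL, ERROR) else OK


def choice_status(r1, r2):
    """Status of the choice p1 | p2, given the statuses of p1 and p2."""
    if r1 in (ERROR, OK):
        return r1                  # committed: p2 is not tried
    if r1 == FAIL:
        return r2                  # nothing consumed: p2 decides
    # r1 == EPSN: keep the empty match unless p2 consumes input or errs
    return EPSN if r2 == FAIL else r2
```

The case `seq_status(OK, FAIL)`, which yields Error, is the one behind the read x; example: the prefix shared by read and repeat is consumed successfully, the next character fails, and the sequence reports an error instead of a failure that a choice could recover from.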
Parsec is a parser combinator library for Haskell that employs a technique equivalent to the four-values technique for implementing LL(1) predictive parsers that automatically report errors [19]. To overcome the LL(1) limitation, Parsec introduced the try combinator, a dual of Hutton’s nofail combinator. The effect of try is to translate an error into a backtrackable failure. The idea is to use try to annotate the parts of the grammar where arbitrary lookahead is needed.
Parsec’s restriction to LL(1) grammars made it possible to implement an error reporting technique similar to the one used in top-down parsers. Parsec produces error messages that include the error position, the character at this position and the FIRST and FOLLOW sets of the productions that were expected at this position. Parsec also implements the error injection combinator (<?>) for naming productions. This combinator gets two arguments: a parser p and a string exp. The string exp replaces the FIRST set of a parser p when all the alternatives of p failed. This combinator is useful to name terminals and non-terminals to get better information about the context of a syntax error.
Swierstra and Duponcheel [20] showed an implementation of parser combinators for error recovery, although most libraries and parser generators that are based on parser combinators implement only error reporting. Their work relies on the fact that the grammar is LL(1) and shows an implementation of parser combinators that repairs erroneous inputs, produces an appropriate message, and continues parsing the rest of the input. This approach was later extended to also deal with grammars that are not LL(1), including ambiguous grammars [21]. The extended approach relies heavily on some features that the implementation language should have, such as lazy evaluation.
4. Labeled failures
Exceptions are a common mechanism for signaling and handling errors in programming languages. Exceptions let programmers classify the different errors their programs may signal by using distinct types for distinct errors, and decouple error handling from regular program logic.
In this section we add labeled failures to PEGs, a mechanism akin to exceptions and exception handling, with the goal of improving error reporting while preserving the composability of PEGs. In Section 6 we discuss how to use PEGs with labeled failures to implement some of the techniques that we have discussed in Section 3: the nofail combinator [15], the cut combinator [18], the four-values technique [17] and the try combinator [19].
A labeled PEG $G$ is a tuple $(V, T, P, L, p_S)$ where $L$ is a finite set of labels that must include the fail label, and the expressions in $P$ have been extended with the throw operator, explained below. The other parts use the same definitions from Section 2.
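The abstract syntax of labeled parsing expressions adds the throw operator $\Uparrow^{l}$, which generates a failure with label $l$, and adds an extra argument $S$ to the ordered choice operator, which is the set of labels that the ordered choice should catch. $S$ must be a subset of $L$.

$$p = \varepsilon \mid a \mid A \mid p_1 p_2 \mid p_1 /^{S} p_2 \mid p^{*} \mid\ !p \mid\ \Uparrow^{l}$$

Fig. 3 presents the semantics of PEGs with labels as a set of inference rules. The semantics of PEGs with labels is defined by the relation $\stackrel{\mathrm{PEG}}{\leadsto}$ among a parsing expression, an input string and a result. The result is either a string or a label. The notation $G[p]\; xy \stackrel{\mathrm{PEG}}{\leadsto} y$ means that the expression $p$ matches the input $xy$, consumes the prefix $x$ and leaves the suffix $y$ as the output. The notation $G[p]\; xy \stackrel{\mathrm{PEG}}{\leadsto} l$ indicates that the matching of $p$ fails with label $l$ on the input $xy$.
Now $a$ matches and consumes itself and fails with label fail otherwise; $p_1 p_2$ tries to match $p_1$, if $p_1$ matches an input prefix, then it tries to match $p_2$ with the suffix left by $p_1$, the label $l$ is propagated otherwise; $p_1 /^{S} p_2$ tries to match $p_1$ in the input and succeeds if $p_1$ matches; if $p_1$ fails with a label $l \notin S$, this label is propagated, and if $l \in S$, it tries to match $p_2$ (rules ord.1 to ord.3); $p^{*}$ repeatedly matches $p$ until the matching of $p$ silently fails with label fail, and propagates a label $l$ when $p$ fails with this label; $!p$ successfully matches if the input does not match $p$ with the label fail, fails producing the label fail when the input matches $p$, and propagates a label $l$ when $p$ fails with this label, not consuming the input in all cases; $\Uparrow^{l}$ produces the label $l$.

$$
\newcommand{\peg}{\stackrel{\mathrm{PEG}}{\leadsto}}
\begin{array}{ll}
\textbf{Empty} & \dfrac{}{G[\varepsilon]\; x \peg x}\ (\mathbf{empty.1}) \\[3ex]
\textbf{Terminal} & \dfrac{}{G[a]\; ax \peg x}\ (\mathbf{char.1}) \qquad \dfrac{}{G[b]\; ax \peg \mathtt{fail}},\ b \neq a\ (\mathbf{char.2}) \qquad \dfrac{}{G[a]\; \varepsilon \peg \mathtt{fail}}\ (\mathbf{char.3}) \\[3ex]
\textbf{Non-terminal} & \dfrac{G[P(A)]\; x \peg X}{G[A]\; x \peg X}\ (\mathbf{var.1}) \\[3ex]
\textbf{Concatenation} & \dfrac{G[p_1]\; xy \peg y \qquad G[p_2]\; y \peg X}{G[p_1 p_2]\; xy \peg X}\ (\mathbf{con.1}) \qquad \dfrac{G[p_1]\; x \peg l}{G[p_1 p_2]\; x \peg l}\ (\mathbf{con.2}) \\[3ex]
\textbf{Ordered Choice} & \dfrac{G[p_1]\; xy \peg y}{G[p_1 /^{S} p_2]\; xy \peg y}\ (\mathbf{ord.1}) \qquad \dfrac{G[p_1]\; x \peg l}{G[p_1 /^{S} p_2]\; x \peg l},\ l \notin S\ (\mathbf{ord.2}) \\[3ex]
 & \dfrac{G[p_1]\; x \peg l \qquad G[p_2]\; x \peg X}{G[p_1 /^{S} p_2]\; x \peg X},\ l \in S\ (\mathbf{ord.3}) \\[3ex]
\textbf{Repetition} & \dfrac{G[p]\; x \peg \mathtt{fail}}{G[p^{*}]\; x \peg x}\ (\mathbf{rep.1}) \qquad \dfrac{G[p]\; xy \peg y \qquad G[p^{*}]\; y \peg X}{G[p^{*}]\; xy \peg X}\ (\mathbf{rep.2}) \\[3ex]
 & \dfrac{G[p]\; x \peg l}{G[p^{*}]\; x \peg l},\ l \neq \mathtt{fail}\ (\mathbf{rep.3}) \\[3ex]
\textbf{Negative Predicate} & \dfrac{G[p]\; x \peg \mathtt{fail}}{G[!p]\; x \peg x}\ (\mathbf{not.1}) \qquad \dfrac{G[p]\; xy \peg y}{G[!p]\; xy \peg \mathtt{fail}}\ (\mathbf{not.2}) \\[3ex]
 & \dfrac{G[p]\; x \peg l}{G[!p]\; x \peg l},\ l \neq \mathtt{fail}\ (\mathbf{not.3}) \\[3ex]
\textbf{Throw} & \dfrac{}{G[\Uparrow^{l}]\; x \peg l}\ (\mathbf{throw.1})
\end{array}
$$

Fig. 3. Semantics of PEGs with labels.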
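The rules of Fig. 3 translate almost line by line into an interpreter. The Python sketch below is one possible reading of them, not the authors’ implementation: the tuple encoding, the `Label` class and the helper `lab` (for the sugar [p]^l that annotates a pattern with a label) are assumptions. A successful match returns the index where the remaining suffix starts; a failed match returns a `Label`.

```python
FAIL = "fail"


class Label(str):
    """A failure label; any other result is the index of the remaining suffix."""


def match(g, p, s, i=0):
    kind = p[0]
    if kind == "eps":                                   # empty.1
        return i
    if kind == "chr":                                   # char.1 to char.3
        return i + 1 if i < len(s) and s[i] == p[1] else Label(FAIL)
    if kind == "var":                                   # var.1
        return match(g, g[p[1]], s, i)
    if kind == "seq":                                   # con.1, con.2
        r = match(g, p[1], s, i)
        return r if isinstance(r, Label) else match(g, p[2], s, r)
    if kind == "ord":                                   # p = ("ord", p1, S, p2)
        r = match(g, p[1], s, i)
        if isinstance(r, Label) and r in p[2]:          # ord.3: label is caught
            return match(g, p[3], s, i)
        return r                                        # ord.1, ord.2
    if kind == "star":                                  # rep.1 to rep.3
        while True:
            r = match(g, p[1], s, i)
            if isinstance(r, Label):
                return i if r == FAIL else r            # only fail stops silently
            if r == i:                                  # guard against empty matches
                return i                                # (not part of Fig. 3)
            i = r
    if kind == "not":                                   # not.1 to not.3
        r = match(g, p[1], s, i)
        if not isinstance(r, Label):
            return Label(FAIL)
        return i if r == FAIL else r
    if kind == "throw":                                 # throw.1
        return Label(p[1])
    raise ValueError(f"unknown expression: {kind}")


def lab(p, l):
    """[p]^l: p, or throw l if p fails with the label fail."""
    return ("ord", p, {FAIL}, ("throw", l))


# (a [;]^sc) (a [;]^sc)*, with the terminal a standing in for a command
item = ("seq", ("chr", "a"), lab(("chr", ";"), "sc"))
cmdseq = ("seq", item, ("star", item))
```

On the input "a;a" the second semicolon is missing, so the match ends with the label sc instead of silently stopping the repetition; on "a;b" the repetition stops on a plain fail and the match succeeds with the suffix "b" left over.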
We faced some design decisions in our formulation that are worth discussing. First, we use a set of labels in the ordered choice as a convenience. We could have each ordered choice handling a single label, and it would just lead to duplication: an expression $p_1 /^{\{l_1, l_2, \ldots, l_n\}} p_2$ would become $(\ldots((p_1 /^{l_1} p_2) /^{l_2} p_2) \ldots /^{l_n} p_2)$.
Second, we require the presence of a fail label to maintain compatibility with the original semantics of PEGs, where we only have fail to signal both error and failure. For the same reason, we define the expression $p_1 / p_2$ as syntactic sugar for $p_1 /^{\{\mathtt{fail}\}} p_2$.
Another choice was how to handle labels in a repetition. We chose to have a repetition stop silently only on the fail label to maintain the following identity, which holds for unlabeled PEGs: an expression $p^{*}$ is equivalent to a fresh non-terminal $A$ with the rule $A \leftarrow p A / \varepsilon$.
Finally, the negative predicate succeeds only on the fail label to allow the implementation of the positive predicate: the expression $\&p$ that implements the positive predicate in the original semantics of PEGs [3,9,2] is equivalent to the expression $!!p$. Both expressions successfully match if the input matches $p$, fail producing the label fail when the input does not match $p$, and propagate a label $l$ when $p$ fails with this label, not consuming the input in all cases.
Fig. 4 presents a PEG with labels for the Tiny language from Section 2. The expression $[p]^{l}$ is syntactic sugar for $(p / \Uparrow^{l})$. The strategy we used to annotate the grammar was the following: first, annotate every terminal that should not fail, that is, making the PEG backtrack on failure of that terminal would be useless, as the whole parse would either fail or not consume the whole input in that case. For an LL(1) grammar like the one in the example, that means all terminals in a production except the one in the very beginning of the production.
After annotating the terminals, we do the same for whole productions. We annotate productions where failing the whole production always implies an error in the input, adding a new alternative that throws an error label specific to that production.
For Tiny, we end up annotating just two productions, Factor and CmdSeq. Productions Exp, SimpleExp, and Term also should not fail, but after annotating Factor they always either succeed or throw the label exp. The Cmd production can fail, because it controls whether the repetition inside CmdSeq stops or continues.
Tiny ← CmdSeq
CmdSeq ← (Cmd [SEMICOLON]^sc) (Cmd [SEMICOLON]^sc)∗ / ⇑cmd
Cmd ← IfCmd / RepeatCmd / AssignCmd / ReadCmd / WriteCmd
IfCmd ← IF Exp [THEN]^then CmdSeq (ELSE CmdSeq / ε) [END]^end
RepeatCmd ← REPEAT CmdSeq [UNTIL]^until Exp
AssignCmd ← NAME [ASSIGNMENT]^bind Exp
ReadCmd ← READ [NAME]^read
WriteCmd ← WRITE Exp
Exp ← SimpleExp ((LESS / EQUAL) SimpleExp / ε)
SimpleExp ← Term ((ADD / SUB) Term)∗
Term ← Factor ((MUL / DIV) Factor)∗
Factor ← OPENPAR Exp [CLOSEPAR]^cp / NUMBER / NAME / ⇑exp
Fig. 4. A PEG with labels for the Tiny language.
Notice that this is just an example of how a grammar can be annotated. More thorough analyses are possible: for example, we can deduce that Cmd is not allowed to fail unless the next token is one of ELSE, END, UNTIL, or the end of the input (the FOLLOW set of Cmd), and instead of $\Uparrow^{cmd}$ add $!(\mathtt{ELSE} / \mathtt{END} / \mathtt{UNTIL} / !.)\ \Uparrow^{cmd}$ as a new alternative. This would remove the need for the $\Uparrow^{cmd}$ annotation of CmdSeq.
The PEG reports an error when parsing finishes with an uncaught label. Each label is associated with a meaningful error message. For instance, if we use this PEG for Tiny to parse the code example from Section 2, parsing finishes with the sc label and the PEG can use it to produce the following error message:
factorial.tiny:6:1: syntax error, there is a missing ’;’
Note how the semantics of the repetition works with the rule CmdSeq. Inside the repetition, the fail label means that there are no more commands to be matched and the repetition should stop while the sc label means that a semicolon (;) failed to match. It would not be possible to write the rule CmdSeq using repetition if we had chosen to stop the repetition with any label, instead of stopping only with the fail label, because the repetition would accept the sc label as the end of the repetition whereas it should propagate this label.
Although the semantics of PEGs with labels presented in Fig. 3 allows us to generate specific error messages, it does not give us information about the location where the failure probably is, so it is necessary to use some extra mechanism (e.g., semantic actions) to get this information. To avoid this, we can adapt the semantics of PEGs with labels to give us a tuple $(l, y)$ in case of a failure, where $y$ is the suffix of the input that the PEG was trying to match when label $l$ was thrown. Updating the semantics of Fig. 3 to reflect this change is straightforward.
In the next section, we try to establish a comparison between the farthest failure position heuristic and the labeled failure mechanism by contrasting two different implementations of a parser for a dialect of the Lua language.
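Given a pair made of a label and the remaining suffix, turning an uncaught label into a message takes a lookup table and some arithmetic on positions. The Python sketch below assumes a function `report` and a table `MESSAGES`; apart from the text for sc, which is the message quoted above (with plain quotes), the message texts are invented for illustration.

```python
MESSAGES = {                       # one message per label; texts other than
    "sc": "there is a missing ';'",          # the one for sc are made up
    "cp": "there is a missing ')'",
    "exp": "expected an expression",
}


def report(filename, source, label, suffix):
    """Build an error message from the pair (label, suffix) of a failed match."""
    pos = len(source) - len(suffix)              # offset where the label was thrown
    line = source.count("\n", 0, pos) + 1
    col = pos - (source.rfind("\n", 0, pos) + 1) + 1
    return f"{filename}:{line}:{col}: syntax error, {MESSAGES[label]}"


source = (
    "n := 5;\n"
    "f := 1;\n"
    "repeat\n"
    "  f := f * n;\n"
    "  n := n - 1\n"
    "until (n < 1);\n"
    "write f;\n"
)
suffix = source[source.index("until"):]          # input left when sc was thrown
message = report("factorial.tiny", source, "sc", suffix)
```

For the program of Fig. 2 the label sc is thrown when the parser is looking at until, so `message` locates the error at line 6, column 1, as in the message shown earlier in this section.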
5. Labeled failures versus farthest failure position
In this section we will compare two parser implementations for the Typed Lua language, one that uses the farthest failure position heuristic for error reporting, which was implemented first, and one based on labeled failures.
Typed Lua [22] is an optionally-typed extension of the Lua programming language [23]. The Typed Lua parser recognizes plain Lua programs, and also Lua programs with type annotations. The first version of the parser was implemented using Ford’s heuristic and the LPeg library.³
As LPeg does not have a native error reporting mechanism based on Ford’s strategy, the failure tracking heuristic was implemented following the approach described in Section 2, which uses semantic actions.
Below we have the example of a Lua statement with a syntax error:
a = function (a,b,) end
In this case, the parser gives us the following error message, which is quite precise:
test.lua:1:19: syntax error, unexpected ’)’, expecting ’...’, ’Name’
In the previous case, the list of expected tokens had only two candidates, but this is not always the case. For example, let us consider the following Lua program, where there is no expression after the elseif in line 5:
01 if a then
02
return x
03 elseif b then
04
return y
05 elseif
06
07 end
Thecorrespondingerrormessagehasalengthylistoftokens,whichdoesnothelpmuchtofixtheerror:
test.lua:7:1: syntax error, unexpected ’end’, expecting ’(’, ’Name’, ’{’,
’function’, ’...’, ’true’, [9 more tokens]
When using the Typed Lua parser based on Ford's heuristic it is not uncommon to get a message like this. An analysis of the test cases available in the parser package shows us that around half of the expected error messages have a list of at least eight expected tokens (there were messages with a list of 39 expected tokens).
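How such lists of expected tokens arise can be seen in a small Python sketch of farthest failure tracking. It is only an illustration under stated assumptions, not the Typed Lua parser: the `Tracker` class, the combinators, the token-level grammar for a parameter list, and the cut-off of six displayed tokens are all hypothetical choices made for this example.

```python
class Tracker:
    """Holds the token list plus the farthest failure seen so far."""
    def __init__(self, tokens):
        self.tokens = tokens
        self.farthest = 0
        self.expected = set()

    def fail(self, pos, name):
        # Keep only the expectations recorded at the farthest position.
        if pos > self.farthest:
            self.farthest, self.expected = pos, {name}
        elif pos == self.farthest:
            self.expected.add(name)
        return None

def tok(name):
    def p(t, pos):
        if pos < len(t.tokens) and t.tokens[pos] == name:
            return pos + 1
        return t.fail(pos, name)
    return p

def seq(*ps):
    def p(t, pos):
        for q in ps:
            pos = q(t, pos)
            if pos is None:
                return None
        return pos
    return p

def choice(*ps):
    def p(t, pos):
        for q in ps:
            r = q(t, pos)
            if r is not None:
                return r
        return None
    return p

def star(q):
    def p(t, pos):
        while True:
            r = q(t, pos)
            if r is None or r == pos:
                return pos
            pos = r
    return p

def opt(q):
    return choice(q, lambda t, pos: pos)

def message(t, limit=6):
    found = t.tokens[t.farthest] if t.farthest < len(t.tokens) else "EOF"
    names = sorted(t.expected)
    shown = ", ".join(f"'{n}'" for n in names[:limit])
    if len(names) > limit:
        shown += f", [{len(names) - limit} more tokens]"
    return f"syntax error, unexpected '{found}', expecting {shown}"

# Hypothetical grammar: '(' (Name (',' Name)* (',' '...')? / '...')? ')'
names = seq(tok("Name"), star(seq(tok(","), tok("Name"))),
            opt(seq(tok(","), tok("..."))))
parlist = seq(tok("("), opt(choice(names, tok("..."))), tok(")"))

t = Tracker(["(", "Name", ",", "Name", ",", ")"])  # tokens of "(a,b,)"
result = parlist(t, 0)
text = message(t)
```

Every alternative that fails at the farthest position contributes one entry to `expected`, so a point in the grammar where many alternatives may start (such as the beginning of an expression) yields a long list.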
The second implementation of the Typed Lua parser was based on labeled failures and used the LPegLabel library [24], which is an extension of the LPeg library that supports labeled failures.
The use of labeled failures adds an annotation burden, as we have to specify when each label should be thrown. In the case of the Typed Lua grammar, we defined almost 50 different labels, using the same basic strategy that we used to annotate the Tiny grammar of Section 4.
Given the previous Lua program, the error message presented now is:
test.lua:7:1: expecting <exp> after ’elseif’
This error message is more informative than the previous one, which was generated automatically. We analyzed the error messages generated by the two parsers in 53 examples, and considered that in more than half of these examples the parser based on labeled failures produced a better error message. In about 20% of the cases we considered the error messages of both approaches similar, and in another 20% of the cases the parser based on Ford's heuristic generated better error messages. The error location indicated by the two parsers in the examples analyzed was essentially the same. This seems to indicate that the main difference in practice between both approaches is related to the length of the error message generated. By using labeled failures we can probably get a simple error message at the cost of annotating the grammar, while by using the farthest failure tracking heuristic we can automatically generate error messages, which sometimes may contain a long list of expected tokens.
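One way a thrown label and the failure suffix might be turned into a message such as the one above is sketched below in Python. This is an assumption-laden illustration, not the Typed Lua implementation: the label name `ExpElseIf`, the `MESSAGES` table, and the functions `line_col` and `report` are hypothetical, and the suffix is supplied by hand instead of coming from a parser.

```python
# Hypothetical table tying labels to hand-written messages.
MESSAGES = {"ExpElseIf": "expecting <exp> after 'elseif'"}

def line_col(source, suffix):
    """Convert the unmatched suffix into a 1-based (line, column) pair."""
    offset = len(source) - len(suffix)
    consumed = source[:offset]
    line = consumed.count("\n") + 1
    col = offset - (consumed.rfind("\n") + 1) + 1
    return line, col

def report(filename, source, label, suffix):
    line, col = line_col(source, suffix)
    # Labels without a specific entry get a generic message.
    text = MESSAGES.get(label, "syntax error")
    return f"{filename}:{line}:{col}: {text}"

source = "if a then\n  return x\nelseif b then\n  return y\nelseif\n\nend\n"
# Assume the parser threw ExpElseIf while trying to match the suffix "end\n".
msg = report("test.lua", source, "ExpElseIf", "end\n")
```

The message text is fixed per label, so its length does not depend on how many alternatives the grammar allows at the failure point.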
A point that is worth mentioning about the labeled failure approach is that it is not mandatory to annotate the entire grammar. The grammar can be annotated incrementally, at the points where the current error message is not good enough, and when no specific label is thrown, i.e., when the label fail is thrown, an error message can be generated automatically by using the position where the failure occurred. This means that combining labeled failures with the farthest failure position reduces the annotation burden, and helps to identify the places in the parser where a label is desirable.
In the next section, we discuss some applications of labeled failures: we can use labeled PEGs to express the error reporting techniques that we have discussed in Section 3 [15,18,17,19], and also to efficiently parse context-free grammars that can use the LL(*) parsing strategy [25].
6. Applications of labeled failures
This section shows that PEGs with labeled failures can express several error reporting techniques used in the realm of parsing combinators. They can also efficiently parse context-free grammars that are parseable by the LL(*) top-down parsing strategy.
In Hutton's deterministic parser combinators [15], the nofail combinator is used to distinguish between failure and error. We can express the nofail combinator using PEGs with labels as follows:
$$\mathit{nofail}\ p \equiv p \;/\; \Uparrow^{\mathit{error}}$$
That is, nofail is an expression that transforms the failure of p into an error to abort backtracking. Note that the error label should not be caught by any ordered choice. Instead, the ordered choice propagates this label and catches solely the fail label. The idea is that parsing should finish with one of the following values: success, fail or error.
The annotation of the Tiny grammar to use nofail is similar to the annotation we have done using labeled failures. Basically, we just need to change the grammar to use nofail instead of $[p]^{l}$. For instance, we can write the rule CmdSeq as follows:
$$\mathit{CmdSeq} \leftarrow (\mathit{Cmd}\ (\mathit{nofail}\ \mathit{SEMICOLON}))\ (\mathit{Cmd}\ (\mathit{nofail}\ \mathit{SEMICOLON}))^{*}$$
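The effect of nofail on backtracking can be made concrete with a minimal Python sketch that models nofail p as p / ⇑error. The combinators and the two toy expressions `plain` and `strict` are hypothetical and serve only to contrast an ordinary failure with an error.

```python
FAIL, ERROR = "fail", "error"

# A parser maps input x to ("ok", rest) or ("err", label, rest_at_failure).

def lit(s):
    return lambda x: ("ok", x[len(s):]) if x.startswith(s) else ("err", FAIL, x)

def throw(label):
    return lambda x: ("err", label, x)

def seq(p1, p2):
    def p(x):
        r = p1(x)
        return p2(r[1]) if r[0] == "ok" else r
    return p

def choice(p1, p2):
    # Plain ordered choice: catches only `fail`, propagates any other label.
    def p(x):
        r = p1(x)
        return p2(x) if r[0] == "err" and r[1] == FAIL else r
    return p

def nofail(p):
    # nofail p  ==  p / throw(error)
    return choice(p, throw(ERROR))

# Without nofail the choice backtracks; with nofail the error aborts it.
plain = choice(seq(lit("a"), lit("b")), lit("ac"))
strict = choice(seq(lit("a"), nofail(lit("b"))), lit("ac"))

plain_result = plain("ac")    # second alternative matches after backtracking
strict_result = strict("ac")  # "b" was mandatory after "a": error, no backtracking
```

On the input "ac", `plain` succeeds through its second alternative, whereas `strict` finishes with the error label because the failure of "b" after "a" is no longer an ordinary failure.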
If we are writing a grammar from scratch, there is no advantage to use nofail instead of more specific labels, as the annotation burden is the same and with nofail we lose more specific error messages.
The cut combinator [18] was introduced to reduce the space inefficiency of backtracking parsers, where the possibility of backtracking implies that any input that has already been processed must be kept in memory until the end of parsing. Semantically it is identical to nofail, differing only in the way the combinators are implemented: to implement cut the parser combinators use continuation-passing style, so cut can drop the failure continuation and consequently any pending backtrack frames. Hutton's nofail is implemented in direct style, and is not able to drop pending backtrack frames. Expressing a cut operator with the same properties is not possible in our semantics of PEGs.
The four-values technique changed the semantics of parser combinators to implement predictive parsers for LL(1) grammars that automatically identify the longest input prefix in case of error, without needing annotations in the grammar. We can express this technique using labeled failures by transforming the original PEG with the following rules:
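\begin{align}
\llbracket \varepsilon \rrbracket &\equiv \Uparrow^{\mathit{epsn}} \tag{1} \\
\llbracket a \rrbracket &\equiv a \tag{2} \\
\llbracket A \rrbracket &\equiv A \tag{3} \\
\llbracket p_1\, p_2 \rrbracket &\equiv \llbracket p_1 \rrbracket\, (\llbracket p_2 \rrbracket \;/\; \Uparrow^{\mathit{error}} \;/^{\{\mathit{epsn}\}}\; \varepsilon) \;/^{\{\mathit{epsn}\}}\; \llbracket p_2 \rrbracket \tag{4} \\
\llbracket p_1 / p_2 \rrbracket &\equiv \llbracket p_1 \rrbracket \;/^{\{\mathit{epsn}\}}\; (\llbracket p_2 \rrbracket \;/\; \Uparrow^{\mathit{epsn}}) \;/\; \llbracket p_2 \rrbracket \tag{5}
\end{align}
This translation is based on three labels: epsn means that the expression successfully finished without consuming any input, fail means that the expression failed without consuming any input, and error means that the expression failed after consuming some input. In our translation we do not have an ok label because a resulting suffix means that the expression successfully finished after consuming some input. It is straightforward to check that the translated expressions behave according to Table 1 from Section 3.
Parsec introduced the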
try combinator to annotate parts of the grammar where arbitrary lookahead is needed. We need arbitrary lookahead because PEGs and parser combinators usually operate at the character level. The authors of Parsec also showed a correspondence between the semantics of Parsec as implemented in their library and Partridge and Wright's four-valued combinators, so we can emulate the behavior of Parsec using labeled failures by building on the five rules above and adding the following rule for try:
$$\llbracket \mathit{try}\ p \rrbracket \equiv \llbracket p \rrbracket \;/^{\{\mathit{error}\}}\; \Uparrow^{\mathit{fail}} \tag{6}$$
If we take the Tiny grammar of Fig. 1 from Section 2, insert try in the necessary places, and pass this new grammar through the transformation $\llbracket\ \rrbracket$, then we get a PEG that automatically identifies errors in the input with the error label. For instance, we can write the rule RepeatCmd as follows:
$$\mathit{RepeatCmd} \leftarrow (\mathit{try}\ \mathit{REPEAT})\ \mathit{CmdSeq}\ \mathit{UNTIL}\ \mathit{Exp}$$
LL(*) [25] is a parsing strategy used by the popular parsing tool ANTLR [26,27]. An LL(*) parser is a top-down parser with arbitrary lookahead. The main idea of LL(*) parsing is to build a deterministic finite automaton for each rule in the grammar, and use this automaton to predict which alternative of the rule the parser should follow, based on the rest of the input. Each final state of the DFA should correspond to a single alternative, or we have an LL(*) parsing conflict.
Mascarenhas et al. [6] show how CFG classes that correspond to top-down predictive parsing strategies can be encoded with PEGs by using predicates to encode the lookahead necessary for each alternative. As translating a Deterministic Finite Automaton (DFA) to a PEG is straightforward [12,6], this gives us one way of encoding an LL(*) parsing strategy in a PEG, at the cost of encoding a different copy of the lookahead DFA for each alternative.
Labeled PEGs provide a more straightforward encoding, where instead of a predicate for each alternative, we use a single encoding of the lookahead DFA, where each final state ends with a label corresponding to one of the alternatives. Each alternative is preceded by a choice operator that catches its label.
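To make the translation clearer, let us consider the following example, from Parr and Fisher [25], where non-terminal S uses non-terminal Exp (omitted) to match arithmetic expressions:
$$S \rightarrow \mathit{ID} \;|\; \mathit{ID}\ \texttt{'='}\ \mathit{Exp} \;|\; \texttt{'unsigned'}^{*}\ \texttt{'int'}\ \mathit{ID} \;|\; \texttt{'unsigned'}^{*}\ \mathit{ID}\ \mathit{ID}$$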
After analyzing this grammar, ANTLR produces the DFA of Fig. 5. When trying to match S, ANTLR runs this DFA on the input until it reaches a final state that indicates which alternative of the choice of rule S should be tried. For example, ANTLR chooses the second alternative if the DFA reaches state s4.
Fig. 6 gives a labeled PEG that encodes the LL(*) parsing strategy for rule S. Rules S0, S1, and S2 encode the lookahead DFA of Fig. 5, and correspond to states s0, s1, and s2, respectively. The throw expressions correspond to the final states. As the throw expressions make the input backtrack to where it was prior to parsing S0, we do not need to use a predicate. We can also turn any uncaught failures into errors.
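The encoding of Fig. 6 can be tried out with a token-level Python sketch. Several points are assumptions of this sketch and not statements of the paper: the combinator names are hypothetical, Exp is replaced by a single stand-in token `NUM`, the ∗ after 'unsigned' is read as repetition, `!.` is modeled by an end-of-input test, and the chain of choices in rule S is nested to the left so that a label thrown by S0 can reach the choice that catches it.

```python
FAIL = "fail"

# A parser takes (tokens, index) and returns ("ok", index) or ("err", label, index).

def tok(name):
    def p(ts, i):
        return ("ok", i + 1) if i < len(ts) and ts[i] == name else ("err", FAIL, i)
    return p

def throw(label):
    return lambda ts, i: ("err", label, i)

def eof(ts, i):  # stands in for the predicate !.
    return ("ok", i) if i == len(ts) else ("err", FAIL, i)

def seq(*ps):
    def p(ts, i):
        for q in ps:
            r = q(ts, i)
            if r[0] == "err":
                return r
            i = r[1]
        return ("ok", i)
    return p

def choice(p1, p2, catches=(FAIL,)):
    def p(ts, i):
        r = p1(ts, i)
        # p2 restarts at i, i.e. where the input was before p1 ran.
        return p2(ts, i) if r[0] == "err" and r[1] in catches else r
    return p

def star(q):  # q is assumed to consume input whenever it succeeds
    def p(ts, i):
        while True:
            r = q(ts, i)
            if r[0] == "err":
                return ("ok", i) if r[1] == FAIL else r
            i = r[1]
    return p

ID, EQ, UNSIGNED, INT = tok("ID"), tok("="), tok("unsigned"), tok("int")
EXP = tok("NUM")  # stand-in for the omitted Exp rule

# Lookahead DFA: every path ends by throwing the number of an alternative.
def S2(ts, i):
    return choice(seq(UNSIGNED, S2),
                  choice(seq(ID, throw(4)), seq(INT, throw(3))))(ts, i)

def S1(ts, i):
    return choice(seq(EQ, throw(2)),
                  choice(seq(eof, throw(1)), seq(ID, throw(4))))(ts, i)

def S0(ts, i):
    return choice(seq(ID, S1),
                  choice(seq(UNSIGNED, S2), seq(INT, throw(3))))(ts, i)

# Rule S: each alternative sits behind a choice that catches only its label.
S = S0
for label, alt in [(1, ID),
                   (2, seq(ID, EQ, EXP)),
                   (3, seq(star(UNSIGNED), INT, ID)),
                   (4, seq(star(UNSIGNED), ID, ID))]:
    S = choice(S, alt, catches=(label,))
S = choice(S, throw("error"))  # any uncaught failure becomes an error
```

For example, `S(["unsigned", "unsigned", "int", "ID"], 0)` first runs the lookahead rules, which throw label 3, and then parses the third alternative from the start of the input.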
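Fig. 5. LL(*) lookahead DFA for rule S.
\begin{align*}
\mathit{S} &\leftarrow \mathit{S0} \;/^{1}\; \mathit{ID} \;/^{2}\; \mathit{ID}\ \texttt{'='}\ \mathit{Exp} \;/^{3}\; \texttt{'unsigned'}^{*}\ \texttt{'int'}\ \mathit{ID} \;/^{4}\; \texttt{'unsigned'}^{*}\ \mathit{ID}\ \mathit{ID} \;/\; \Uparrow^{\mathit{error}} \\
\mathit{S0} &\leftarrow \mathit{ID}\ \mathit{S1} \;/\; \texttt{'unsigned'}\ \mathit{S2} \;/\; \texttt{'int'}\ \Uparrow^{3} \\
\mathit{S1} &\leftarrow \texttt{'='}\ \Uparrow^{2} \;/\; !.\ \Uparrow^{1} \;/\; \mathit{ID}\ \Uparrow^{4} \\
\mathit{S2} &\leftarrow \texttt{'unsigned'}\ \mathit{S2} \;/\; \mathit{ID}\ \Uparrow^{4} \;/\; \texttt{'int'}\ \Uparrow^{3}
\end{align*}
Fig. 6. PEG with labels that simulates the LL(*) algorithm.
7. Conclusions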
In this paper, we discussed error reporting strategies for Parsing Expression Grammars. PEGs behave badly in the presence of syntax errors, because backtracking often makes the PEG lose track of the position where the error happened. This limitation was already known by Ford, and he tried to fix it in his PEG implementation by having the implementation track the farthest position in the input where a failure has happened [3].
We took Ford's failure tracking heuristic and showed that it is not necessary to modify a PEG implementation to track failure positions as long as the implementation has mechanisms to execute semantic actions, and the current parsing position is exposed to these actions. In addition, we also showed how it is easy to extend the semantics of PEGs to incorporate failure tracking, including information that can indicate what the PEG was expecting when the failure happened.
Tracking the farthest failure position, either by changing the PEG implementation, using semantic actions, or redefining the semantics of PEGs, helps PEG parsers produce error messages that are close to error messages that predictive top-down parsers are able to produce, but these are generic error messages, sometimes with a long list of expected tokens.
As a way of generating more specific error messages, we introduced a mechanism of labeled failures to PEGs. This mechanism closely resembles standard exception handling in programming languages. Instead of a single kind of failure, we introduced a throw operator $\Uparrow^{l}$ that can throw different kinds of failures, identified by their labels, and extended the ordered choice operator to specify the set of labels that it catches. The implementation of these extensions in parser generator tools based on PEGs is straightforward.
We showed how labeled failures can be used as a way to annotate error points in a grammar, and tie them to more meaningful error messages. Labeled failures are orthogonal to the failure tracking approach we discussed earlier, so grammars can be annotated incrementally, at the points where better error messages are judged necessary.
We also showed that the labeled failures approach can express several techniques for error reporting used in parsers based on deterministic parser combinators, as presented in related work [15,18,17,19]. Labeled failures can also be used as a way of encoding the decisions made by a predictive top-down parser, as long as the decision procedure can be encoded as a PEG, and we showed an example of how to encode an LL(*) grammar in this way.
Annotating a grammar with labeled failures demands care: if we mistakenly annotate expressions that should be able to fail, this modifies the behavior of the parser beyond error reporting. In any case, the use of labeled PEGs for error reporting introduces an annotation burden that is smaller than the annotation burden introduced by error productions in LR parsers, which also demand care, as their introduction usually leads to reduce–reduce conflicts [28].
We showed the error reporting strategies in the context of a small grammar for a toy language, and we also discussed the implementation of parsers for the Typed Lua language, an extension of the Lua language, based on these strategies. Moreover, we also implemented parsers for other languages, such as Céu [29], based on these approaches, improving the quality of error reporting either with generic error messages or with more specific error messages.
References
[1] A.V. Aho, M.S. Lam, R. Sethi, J.D. Ullman, Compilers: Principles, Techniques, and Tools, 2nd edition, Addison–Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2006.
[2] B. Ford, Parsing expression grammars: a recognition-based syntactic foundation, in: Proceedings of the 31st ACM SIGPLAN–SIGACT Symposium on Principles of Programming Languages, POPL '04, ACM, New York, NY, USA, 2004, pp. 111–122.
[3] B. Ford, Packrat parsing: a practical linear-time algorithm with backtracking, Master's thesis, Massachusetts Institute of Technology, September 2002.
[4] S. Medeiros, F. Mascarenhas, R. Ierusalimschy, From regular expressions to parsing expression grammars, in: Brazilian Symposium on Programming Languages, 2011.
[5] S. Medeiros, F. Mascarenhas, R. Ierusalimschy, Left recursion in parsing expression grammars, in: Programming Languages: Proceedings of the 16th Brazilian Symposium, SBLP 2012, Natal, Brazil, September 23–28, 2012, Springer, Berlin, Heidelberg, 2012, pp. 27–41.
[6] F. Mascarenhas, S. Medeiros, R. Ierusalimschy, On the relation between context-free grammars and parsing expression grammars, Sci. Comput. Program. 89 (C) (2014) 235–250, http://dx.doi.org/10.1016/j.scico.2014.01.012.
[7] K.C. Louden, Compiler Construction: Principles and Practice, PWS Publishing Co., Boston, MA, USA, 1997.
[8] D. Grune, C.J. Jacobs, Parsing Techniques: A Practical Guide, 2nd edition, Springer Publishing Company, Incorporated, 2010.
[9] B. Ford, Packrat parsing: simple, powerful, lazy, linear time, in: Proceedings of the Seventh ACM SIGPLAN International Conference on Functional Programming, ICFP '02, ACM, New York, NY, USA, 2002, pp. 36–47.
[10] I. Piumarta, peg/leg – recursive-descent parser generators for C, http://piumarta.com/software/peg/, 2007 [visited on March 2013].
[11] R. Ierusalimschy, LPeg – parsing expression grammars for Lua, http://www.inf.puc-rio.br/~roberto/lpeg/lpeg.html, 2008 [visited on March 2013].
[12] R. Ierusalimschy, A text pattern-matching tool based on parsing expression grammars, Softw. Pract. Exp. 39 (3) (2009) 221–258.
[13] K. Mizushima, A. Maeda, Y. Yamaguchi, Packrat parsers can handle practical grammars in mostly constant space, in: Proceedings of the 9th ACM SIGPLAN–SIGSOFT Workshop on Program Analysis for Software Tools and Engineering, PASTE '10, ACM, New York, NY, USA, 2010, pp. 29–36.
[14] R. Grimm, Better extensibility through modular syntax, in: Proceedings of the 2006 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '06, ACM, New York, NY, USA, 2006, pp. 38–51.
[15] G. Hutton, Higher-order functions for parsing, J. Funct. Program. 2 (3) (1992) 323–343.
[16] M. Spivey, When maybe is not good enough, J. Funct. Program. 22 (2012) 747–756, http://dx.doi.org/10.1017/S0956796812000329, http://journals.cambridge.org/article_S0956796812000329.
[17] A. Partridge, D. Wright, Predictive parser combinators need four values to report errors, J. Funct. Program. 6 (2) (1996) 355–364.
[18] N. Röjemo, Efficient parsing combinators, Tech. rep., Department of Computer Science, Chalmers University of Technology, 1995.
[19] D. Leijen, E. Meijer, Parsec: direct style monadic parser combinators for the real world, Tech. rep. UU-CS-2001-35, Department of Computer Science, Utrecht University, 2001.
[20] S.D. Swierstra, L. Duponcheel, Deterministic, error-correcting combinator parsers, in: Advanced Functional Programming, in: Lecture Notes in Computer Science, vol. 1129, Springer, 1996, pp. 184–207.
[21] S.D. Swierstra, Combinator parsers: a short tutorial, in: A. Bove, L. Barbosa, A. Pardo, J. Sousa Pinto (Eds.), Language Engineering and Rigorous Software Development, in: LNCS, vol. 5520, Springer-Verlag, 2009, pp. 252–300.
[22] A.M. Maidl, F. Mascarenhas, R. Ierusalimschy, Typed Lua: an optional type system for Lua, in: Proceedings of the Workshop on Dynamic Languages and Applications, Dyla '14, ACM, New York, NY, USA, 2014, pp. 3:1–3:10.
[23] R. Ierusalimschy, Programming in Lua, 3rd edition, Lua.Org, 2013.
[24] S. Medeiros, LPegLabel – an extension of LPeg that supports labeled failures, https://github.com/sqmedeiros/lpeglabel, 2014 [visited on October 2015].
[25] T. Parr, K. Fisher, LL(*): the foundation of the ANTLR parser generator, in: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '11, ACM, New York, NY, USA, 2011, pp. 425–436, http://doi.acm.org/10.1145/1993498.1993548.
[26] T. Parr, The Definitive ANTLR 4 Reference, 2nd edition, Pragmatic Bookshelf, 2013.
[27] T. Parr, ANTLR, http://www.antlr.org [visited on March 2014].
[28] C.L. Jeffery, Generating LR syntax error messages from examples, ACM Trans. Program. Lang. Syst. 25 (5) (2003) 631–640.
[29] F. Sant'Anna, Safe system-level concurrency on resource-constrained nodes with CÉU, Ph.D. thesis, PUC–Rio, 2013.