parsing - Handling extra operators in Shunting-yard -
given input this: 3+4+
algorithm turns in 3 4 + +
i can find error when it's time execute postfix expression. but, possible spot during conversion?
(wikipedia article , internet articles i've read not handle case)
thank you
valid expressions can validated regular expression, aside parenthesis mismatching. (mismatched parentheses caught shunting-yard algorithm indicated in wikipedia page, i'm ignoring those.)
the regular expression follows:
pre* op post* (inf pre* op post*)*
where:
pre prefix operator or ( post postfix operator or ) inf infix operator op operand (a literal or variable name)
the wikipedia page linked includes function calls not array subscripting; if want handle these cases, then:
pre prefix operator or ( post postfix operator or ) or ] inf infix operator or ( or , op operand, including function names
one thing note in above pre
, inf
in disjoint contexts; is, if pre
valid, inf
not, , vice versa. implies using same symbol both prefix operator , infix operator unambiguous. also, pre
, post
in disjoint contexts, can use same symbol prefix , postfix operators. however, cannot use same symbol postfix , infix operators. examples, consider c/c++ operators:
- prefix or infix + prefix or infix -- prefix or postfix ++ prefix or postfix
i implicitly used feature above allow (
used either expression grouper (effectively prefix) , function call (infix, because comes between function name , argument list; operator "call".)
it's common implement regular expression state machine, because there 2 states:
+-----+ +-----+ |state|-----op---->|state| | 1 |<----inf----| 2 | | |---+ | |---+ +-----+ | +-----+ | ^ pre ^ post | | | | +------+ +------+
we call state 1 "want operand" , state 2 "have operand". simple implementation split shunting yard algorithm presented in wikipedia 2 loops, (if don't goto
, can eliminated, clearest way present state machine).
want_operand: read token. if there no more tokens, announce error. if token prefix operator or '(': mark prefix , push onto operator stack goto want_operand if token operand (identifier or variable): add output queue goto have_operand if token else, announce error , stop. (**) have_operand: read token if there no more tokens: pop operators off stack, adding each 1 output queue. if `(` found on stack, announce error , stop. if token postfix operator: mark postfix , add output queue goto have_operand. if token ')': while top of stack not '(': pop operator off stack , add output queue if stack becomes empty, announce error , stop. if '(' marked infix, add "call" operator output queue (*) pop '(' off top of stack goto have_operand if token ',': while top of stack not '(': pop operator off stack , add output queue if stack becomes empty, announce error goto want_operand if token infix operator: (see wikipeda entry precedence handling) (normally, prefix operators considered have higher precedence infix operators.) got want_operand otherwise, token operand. announce error (*) procedure described above not deal gracefully parameter lists; when postfix expression being evaluated , "call" operator found, it's not clear how many arguments need evaluated. might function names identifiable, evaluator can attach arguments "call" until function name found. cleaner solution "," handler increment argument count of "call" operator (that is, open parenthesis marked "infix"). ")" handler needs increment argument count. (**) state machine presented not correctly handle function calls 0 arguments. call show ")" read in want_operand state , "call" operator on top of stack. if "call" operator marked argument count, above, argument count must 0 when ")" read, , in case, unlike have_operand case, argument count must not incremented.
Comments
Post a Comment