parseLib Semantic Rules

 

Overview

This section contains initial working notes on the design of the semantic rules and features definition language. The semantic definition sublanguage is a combination of the lexical analysis ideas in [1.3.3] and the feature based grammar ideas in [2.7].

Semantic Rules

The parseLib supports compiled forward production rules similar to those of the rulesLib (see the rulesLib documentation for an explanation of forward production list morphing). The same rule syntax is used for semantic rules as for recognition rules. The rule names for each semantic pass are defined as follows:

#SemanticPasses#

OPTIMIZATION true

SYNTAX false

#End#

Each semantic rule name is defined as single pass (false) or multiple pass (true). Multiple rule passes operate in the same manner as with rulesLib. Each semantic rule name assumes it is receiving a directed acyclic graph expressed as a List. This DAG may contain nodes in direct mode or feature based nodes. With feature based nodes, the features may refer forward and backward (creating a general chart or graph) even though the List must be a DAG.
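As a rough illustration of the difference between single pass and multiple pass rule names, the Python sketch below applies a list of rewrite functions either once or until the result stops changing. Treating a multiple pass as "repeat until nothing changes" is an assumption made here for illustration (the rulesLib documentation is the authority on this), and run_pass and strip_double_negation are hypothetical names, not part of parseLib.

```python
def strip_double_negation(tree):
    """Hypothetical rewrite rule: turn ["-", ["-", x]] into x at the top level."""
    if (isinstance(tree, list) and len(tree) == 2 and tree[0] == "-"
            and isinstance(tree[1], list) and len(tree[1]) == 2
            and tree[1][0] == "-"):
        return tree[1][1]
    return tree


def run_pass(rules, tree, multiple):
    """Apply every rule once (multiple=False) or until nothing changes (multiple=True)."""
    while True:
        new_tree = tree
        for rule in rules:
            new_tree = rule(new_tree)
        if not multiple or new_tree == tree:
            return new_tree
        tree = new_tree


nested = ["-", ["-", ["-", ["-", "x"]]]]
single = run_pass([strip_double_negation], nested, multiple=False)
repeated = run_pass([strip_double_negation], nested, multiple=True)
```

In this model the single pass removes one pair of negations, while the multiple pass keeps rewriting until the tree is stable.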

Semantic Rule Definitions

Reference [1.3.3] builds syntax definitions from regular definitions of the form:

d1 : r1 || c1 || :: a1 ::

d2 : r2 || c2 || :: a2 ::

. . .

dn : rn || cn || :: an ::

Where each di is a rule name, each ri is a rule expression, each ci is a Lisp conditional expression, and each ai is a Lisp action expression. The syntax for rule definition is as follows:

#SyntaxRules#

REXPR: + Term || (= $2.Term true) || :: $2 ::

REXPR: - Term || (= $2.Number true) || :: (setq $2.Value (- 0 $2.Value)) ::

REXPR=: - Term :: (setq $2.Value (list |-|: 0 $2.Value)) ::

REXPR=: Term :: $1 ::

REXPR=: REXPR RelationalOperator REXPR :: (setq $0.Value (list $2.Lisp $1 $3)) ::

REXPR=: LeftParen REXPR RightParen :: $2 ::

 

SEXPR: + Term || (= $2.Term true) || :: $2 ::

SEXPR: - Term || (= $2.Number true) || :: (setq $2.Value (- 0 $2.Value)) ::

SEXPR: - Term :: (setq $2.Value (list |-|: 0 $2.Value)) ::

SEXPR: Term :: $1 ::

SEXPR: SEXPR Operator SEXPR :: (setq $0.Value (list $2.Lisp $1 $3)) ::

SEXPR: LeftParen SEXPR RightParen :: $2 ::

 

#End#

The Lisp condition rule is optional. If present, it must be enclosed by the || symbol. The Lisp action rule is mandatory. It must be enclosed by the :: symbol. The rule variable $0 is the default structure initialized by the rule. The $0 variable always has the attribute of the named rule set to true. The rule variables $1 through $9 correspond to the respective token expressions in the rule body.

Note1: All rule names must consist solely of uppercase letters. Lowercase letters, numerals, and underscores are not allowed.

Note2: The $ symbol must not be used in an argument phrase, action, or condition rule anywhere except as a rule variable identifier $0 through $9. If the condition or action rule requires a $ symbol, for instance inside a string constant, place the $ symbol in a user defined function which is called by the argument phrase, action, or condition rule.

BNF Notation

Syntax rule names and syntax feature names, but not constants, may have trailing BNF operators of "*" or "+" or "?". For example:

SEQUENCE: Number+ :: (setq $0.Value $1) ::

Any syntax rule name and any syntax feature name (other than the special Eof and Nop features) may have trailing BNF operators. The user is required to make sure that the resulting rule does not cause the new compiler to loop endlessly on the input string. The BNF operators have the following meanings:

•        The "*" operator signifies none or more (may cause endless looping if specified inappropriately).

•        The "+" operator signifies one or more.

•        The "?" operator signifies none or one.

Note1: For syntax features, the BNF operators return a vector of each repetition result.

Note2: For syntax rules, the BNF operators return a vector of each repetition result.
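The behaviour of the three BNF operators can be modelled with a small Python sketch. This is only an illustration of the "none or more", "one or more", and "none or one" meanings and of the vector of repetition results. The function match_repeat, the predicate is_number, and the token representation are assumptions, not parseLib internals.

```python
def is_number(token):
    """Hypothetical feature test standing in for the Number feature."""
    return isinstance(token, (int, float))


def match_repeat(tokens, pos, test, op):
    """Match `test` repeatedly from `pos` according to the BNF operator `op`.

    Returns (results, new_pos) on success, or None when "+" finds no match.
    """
    minimum = 1 if op == "+" else 0        # "*" and "?" accept zero matches
    maximum = 1 if op == "?" else None     # "?" stops after one match
    results = []
    while (pos < len(tokens)
           and (maximum is None or len(results) < maximum)
           and test(tokens[pos])):
        results.append(tokens[pos])        # one entry per repetition
        pos += 1                           # every match consumes a token
    if len(results) < minimum:
        return None
    return results, pos


plus_result = match_repeat([1, 2, "x"], 0, is_number, "+")
star_result = match_repeat(["x"], 0, is_number, "*")
optional_result = match_repeat([1, 2], 0, is_number, "?")
```

The loop terminates because each successful repetition consumes one token. A repeated test that succeeds without consuming input is the kind of rule that can loop endlessly.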

$this

The $this variable contains the current input token, at each invocation of a semantic rule with the + and * BNF command modifiers, during semantic analysis. The $this variable can be used in connection with user defined condition rules, for example:

MAIN| Any{(isNumber $this)}*

Argument Passing

User defined rules may be passed arguments. A Lisp argument phrase, enclosed with the ( ) symbol pair, will cause the user defined rule to receive the specified argument. Within a user defined rule definition, the %0 through %9 variables represent any arguments which may have been passed to the rule, as follows:

QUALIFY: DotOperator Name QUALIFY( (setq $0.Value (append |ref|: %0.Value $2.Value)) )

:: $3.Value ::

QUALIFY: DotOperator Name :: (setq $0.Value (append |ref|: %0.Value $2.Value)) ::

TERM: Name QUALIFY($1) :: $2 ::

TERM: Name :: $1 ::

The TERM rule will recognize all syntax of the form Name.Name.Name ... The rule returns when a Dot Operator no longer qualifies the name. The result is a structure with the attribute TERM = true, and the Value attribute containing the complete expression already reformed into a nested ref notation list.

Note: The % symbol must not be used in an argument phrase, action, or condition rule anywhere except as a rule variable identifier %0 through %9. If the argument phrase, action, or condition rule requires a % symbol, for instance inside a string constant, place the % symbol in a user defined function which is called by the argument phrase, action, or condition rule.
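The way the TERM and QUALIFY rules above thread an argument through a recursive rule can be sketched in Python. This is an illustrative model only: tokens are plain strings, the nested list shape ["ref", left, name] is an assumed rendering of the nested ref notation, and the function names are hypothetical.

```python
def qualify(tokens, pos, arg):
    """Model of QUALIFY: DotOperator Name, optionally followed by QUALIFY again.

    `arg` plays the role of %0, the expression built so far.
    Returns (value, new_pos), or None when no DotOperator Name follows.
    """
    if (pos + 1 < len(tokens) and tokens[pos] == "."
            and tokens[pos + 1].isidentifier()):
        value = ["ref", arg, tokens[pos + 1]]
        deeper = qualify(tokens, pos + 2, value)   # pass the new value down
        return deeper if deeper is not None else (value, pos + 2)
    return None


def term(tokens, pos=0):
    """Model of TERM: Name QUALIFY($1), falling back to TERM: Name."""
    if pos < len(tokens) and tokens[pos].isidentifier():
        qualified = qualify(tokens, pos + 1, tokens[pos])
        return qualified if qualified is not None else (tokens[pos], pos + 1)
    return None


nested_ref = term(["a", ".", "b", ".", "c"])
plain_name = term(["a"])
```

Here term(["a", ".", "b", ".", "c"]) yields the value ["ref", ["ref", "a", "b"], "c"], and recognition stops as soon as a dot no longer qualifies the name.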

Iterative Rules

User defined rules may be repeated iteratively. A Lisp action rule, enclosed with the << >> symbol pair, will cause the user defined rule to repeat. The contents of the $0 variable remain intact between repetitions. The builtin Eof attribute name allows a rule to test for End Of File, as in the following rule:

SEXPR: Term Operator Term

<< (setq $0.Value (appendList $0.Value (list (list $2.Value $1.Value $3.Value)))) >>

SEXPR: Operator Term << (setq $0.Value (list $1.Value $0.Value $2.Value)) >>

SEXPR: Eof :: $0 ::

The SEXPR rule will recognize all syntax of the form Term Operator Term Operator Term Operator ... The rule returns when the End Of File is reached. The result is a structure with the attribute SEXPR = true, and the Value attribute containing the complete expression already reformed into a prefix notation list.

Note: The $n symbol contains the repetition count for the rule. During the first iteration through the rule, the $n variable is set to 1.
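A simplified Python model of an iterative rule may help show how $0 persists and how $n counts repetitions. It is a sketch under stated assumptions: tokens are plain strings, the first repetition stores the prefix form directly rather than appending to a list, the choice between alternatives is made by checking whether a value already exists, and sexpr is a hypothetical name.

```python
def sexpr(tokens):
    """Fold `Term Operator Term Operator Term ...` into nested prefix lists.

    Returns (value, n), where n is the repetition count on which Eof matched.
    """
    value = None   # plays the role of $0.Value, kept across repetitions
    pos = 0
    n = 0          # plays the role of $n, which is 1 on the first iteration
    while True:
        n += 1
        rest = tokens[pos:]
        if not rest:                                   # SEXPR: Eof :: $0 ::
            return value, n
        if value is None and len(rest) >= 3:           # Term Operator Term
            value = [rest[1], rest[0], rest[2]]
            pos += 3
        elif value is not None and len(rest) >= 2:     # Operator Term
            value = [rest[0], value, rest[1]]
            pos += 2
        else:
            raise ValueError("input does not fit Term Operator Term ...")


prefix, repetitions = sexpr(["a", "+", "b", "*", "c"])
```

For the input a + b * c this model returns ["*", ["+", "a", "b"], "c"], with Eof recognized on the third repetition.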

Term Conditions

User defined rules may also have user defined conditions attached. A Lisp condition phrase, enclosed with the { } symbol pair, will cause the user defined rule to receive the specified condition. Within a user defined rule condition, the %0 through %9 variables represent any arguments which may have been passed to the rule, while the $0 through $9 variables represent any terms which may have been recognized by the rule.

STRING: Quote{(= $n 1)} << true >>

STRING: Any << (setq $0.Value (appendList $0.Value $1.Value)) >>

STRING: Quote{(> $n 1)} :: $0 ::

The STRING rule will recognize all syntax tokens enclosed within two quotes. The rule returns only when the second quote is recognized. User defined rules may have both argument passing and user defined conditions attached. The user defined condition is always last, as follows:

TERM: NAME(%0){(= $1.Term true)} :: $1 ::

Note: The % symbol must not be used in an argument phrase, action, or condition rule anywhere except as a rule variable identifier %0 through %9. If the argument phrase, action, or condition rule requires a % symbol, for instance inside a string constant, place the % symbol in a user defined function which is called by the argument phrase, action, or condition rule.
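The interplay between the $n repetition count and the term conditions in the STRING rule above can be sketched in Python. This is an illustrative model, assuming tokens are plain strings, that the Quote test takes priority over Any, and that string_rule is a hypothetical name.

```python
QUOTE = '"'


def string_rule(tokens):
    """Collect the tokens between an opening and a closing quote.

    Returns the collected tokens, or None if no closing quote is found.
    """
    value = []   # plays the role of $0.Value
    n = 0        # plays the role of $n
    for token in tokens:
        n += 1
        if token == QUOTE and n == 1:   # STRING: Quote{(= $n 1)} << true >>
            continue
        if token == QUOTE and n > 1:    # STRING: Quote{(> $n 1)} :: $0 ::
            return value
        value.append(token)             # STRING: Any << ... >>
    return None


quoted = string_rule([QUOTE, "hello", "world", QUOTE, "tail"])
```

The closing quote ends the rule, so the trailing token in the example is left unconsumed.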

Special Rule Syntax

Any

In direct mode, if a rule is to accept any token, use the Any attribute. This special test works because Any tests the token directly and does not assume that it is feature based. For example:

;; This rule recognizes a plus sign between anything

RULE: Any Any{(= $2 "+")} Any :: (setq $0.Value (list '+ $1 $3)) ::

Eof

If a rule is to test for end of file, use the Eof attribute. For example:

;; This rule recognizes an end of file condition

RULE: Eof :: $0 ::

Nop

The special Nop attribute always returns a constant token of #void. The Nop attribute is designed to provide a test which always is true, but does not promote the token pointer (i.e. a no-operation rule).

MAIN: STATEMENT Semicolon << (setq $0.Value (appendList $0.Value $1.Value)) >>

MAIN: Eof :: $LIST ::

MAIN: Nop :: (error "If we get here we have an invalid token") ::

This sample MAIN rule will recognize all syntax of the form statement; statement; statement; ... However, if the MAIN rule encounters anything else (other than statement;), then an error message will be returned. The rule returns when the End Of File is reached, or an error is generated.

Note: If the Nop test is used, user ordering of the specified rule is almost always required.
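Why ordering matters with Nop can be seen in a small Python model of the MAIN rule. The alternatives are tried strictly in the written order, and the last one always succeeds without consuming a token, so it only makes sense as a final fallback. The names main_rule and is_statement, and the string token representation, are assumptions for illustration.

```python
def is_statement(token):
    """Hypothetical stand-in for the STATEMENT rule: anything but a semicolon."""
    return token != ";"


def main_rule(tokens):
    """Recognize `statement; statement; ...` and return the list of statements."""
    value = []
    pos = 0
    while True:
        # MAIN: STATEMENT Semicolon << ... >>
        if (pos + 1 < len(tokens) and is_statement(tokens[pos])
                and tokens[pos + 1] == ";"):
            value.append(tokens[pos])
            pos += 2
        # MAIN: Eof
        elif pos == len(tokens):
            return value
        # MAIN: Nop (always true, consumes nothing, so it must come last)
        else:
            raise ValueError("If we get here we have an invalid token")


statements = main_rule(["a", ";", "b", ";"])
```

If the fallback branch were tried first, every input would be rejected, which is the situation user ordering is meant to prevent.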

$N

We may use a previously recognized parser variable to indicate a test for equality. For example:

;; These two rules are equivalent

RULE: Any $1 :: $0 ::

RULE: Any Any{(= $2 $1)} :: $0 ::

$X

We may use a user named parser variable to indicate a test for equality. For example:

;; These three rules are equivalent

RULE: Any $1 :: $0 ::

RULE: $X $X :: $0 ::

RULE: Any Any{(= $2 $1)} :: $0 ::
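The equality test implied by a repeated parser variable can be modelled as a tiny pattern matcher in Python. This is a sketch only: patterns and tokens are plain strings, a name starting with $ binds on first use and must compare equal afterwards, and match_with_vars is a hypothetical name.

```python
def match_with_vars(pattern, tokens):
    """Match tokens against a pattern of "Any", "$NAME" variables, and constants.

    Returns the variable bindings on success, or None on failure.
    """
    if len(pattern) != len(tokens):
        return None
    bindings = {}
    for item, token in zip(pattern, tokens):
        if item == "Any":
            continue                      # accepts any token
        if item.startswith("$"):
            if item in bindings:
                if bindings[item] != token:
                    return None           # later use acts as an equality test
            else:
                bindings[item] = token    # first use binds the variable
        elif item != token:
            return None                   # constants must match exactly
    return bindings


same = match_with_vars(["$X", "$X"], ["a", "a"])
different = match_with_vars(["$X", "$X"], ["a", "b"])
```

In this model the pattern $X $X accepts exactly the inputs that Any Any{(= $2 $1)} would accept.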

%N

We may use a previously passed parser argument variable to indicate a test for equality. For example:

;; These two rules are equivalent

RULE: Any %0 :: $0 ::

RULE: Any Any{(= $2 %0)} :: $0 ::

Direct Mode

Using the vertical bar symbol (|) after a rule name indicates direct mode. In direct mode the rule does not assume that the tokens are feature based. All constants result in Any tests. For example:

;; These two rules are equivalent

RULE| Any + Any :: (setq $0.Value (list '+ $1 $3)) ::

RULE: Any + Any :: (setq $0.Value (list '+ $1.Value $3.Value)) ::

[ & ]

During the semantic rule passes, we may test for a sublist, including pushing the recognition pointer down into the sublist, with the special [ test. To test for the sublist close and to pop the recognition pointer back out of the sublist, use the ] test. This works in both feature based and direct mode. For example:

;; This semantic rule recognizes an expression containing a sublist

;; like the following: x (+ x x ), and returns this substitution (* 3 x)

RULE| Any [ + $1 $1 ] Eof :: (list |*|: 3 $1) ::
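The effect of the sublist rule above can be sketched in Python, with nested Python lists standing in for sublists. This is an illustrative model of the matching logic only. The function name triple_rule is hypothetical, and descending into the nested list stands in for the [ and ] tests.

```python
def triple_rule(tokens):
    """Model of: RULE| Any [ + $1 $1 ] Eof :: (list |*|: 3 $1) ::

    Rewrites x (+ x x) to (* 3 x); returns None when the input does not match.
    """
    if len(tokens) != 2:              # Any, one sublist, then Eof
        return None
    first, sub = tokens
    if not isinstance(sub, list):     # the [ test: next token must be a sublist
        return None
    # Inside the sublist: + $1 $1, then the ] test (no further elements).
    if len(sub) == 3 and sub[0] == "+" and sub[1] == first and sub[2] == first:
        return ["*", 3, first]
    return None


rewritten = triple_rule(["x", ["+", "x", "x"]])
```

The input x (+ x y) is rejected because the second $1 test fails inside the sublist.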

Append List Function

The builtin appendList function allows multiple arguments to be appended together to form a list, as follows:

(define X '(5 6 7))

(define Y '(10 20))

(appendList '+ X Y) ==> (+ 5 6 7 10 20)
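Judging from the example above, appendList splices list arguments and appends non-list arguments as single elements. A Python equivalent of that behaviour, offered as a sketch inferred from this one example rather than from the builtin's implementation, is:

```python
def append_list(*args):
    """Concatenate arguments into one list, splicing any argument that is a list."""
    result = []
    for arg in args:
        if isinstance(arg, list):
            result.extend(arg)    # list arguments contribute their elements
        else:
            result.append(arg)    # other arguments contribute themselves
    return result


x = [5, 6, 7]
y = [10, 20]
combined = append_list("+", x, y)
```

With these inputs, combined is ["+", 5, 6, 7, 10, 20], mirroring the Lisp result.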

Rule Precedence

This Lambda supports multiple rule definitions up to the limits of available memory. The rule precedence, within a rule, is determined by the parseLib to maximize search speed, and cannot be predicted by the user. However, any rule may have automatic rule ordering turned off by specifying the following special rule statement as the first statement of the rule:

RULENAME: user ordering :: true ::

If automatic rule ordering is turned off, the parseLib will attempt to use the rule ordering supplied by the programmer in the DEFINITIONS file. If at all possible, the rule ordering, specified by the developer, will be closely followed by the parseLib in generating the compiler code.

Note: If the Eof and/or Nop special feature tests are used in a rule, user ordering is advisable, because these tend to make it hard for parseLib to guess the correct ordering on its own.