Introduction

 

Introduction

This Lambda is a compiler compiler tool in the same general class of tools as yacc; however, rather than support an implementation-independent language specification, the parseLib tool is designed to support an implementation based specification allowing complete construction of a new compiler from start to finish. This tool supports not one, but three separate rule specification sections: lexical rules, syntax rules, and semantic rules.

The lexical rules are specified in a BNF style syntax but with extensions for argument passing between rules and other implementation relevant extensions. The syntax rule definition language uses a feature based grammar specification with extensions for argument passing between rules and other implementation relevant extensions. The semantic rule definition language uses a unification style pattern matching specification supporting forward production substitution of matched patterns for use in code optimization, semantic reduction, etc. This Lambda is capable of supporting context sensitive compilers, such as those required to parse natural languages. It is also possible, but not required, to build a chart parser with this tool.

This Lambda converts a compiler definition into a generated Lambda, which is a compiler or translator for the language specified in the compiler definition. The compiler definition language may be described as a "specification language" with which to describe context sensitive compilers, and is capable of describing, not only the syntax of the language to be compiled, but also the lexical rules, semantic rules, code optimization rules, and code generation required from the compiler.

%Definition

The parseLib tool is designed to read the specification file and generate the entire source code for the finished new compiler Lambda. This work is accomplished in cooperation with the browseLib tool managing the central file cabinet where all Lambdas are stored. The %DEFINITION file is the specification file for the new compiler. For instance, we might wish parseLib to generate a new compiler named newCompiler. The specification file would be stored (using browseLib) in the Lambda file cabinet under the name ??newCompiler:%DEFINITION??. The specification file (containing the Compiler Definition Language) is broken into seven of sections and looks, in general, as follows:

;#text#
          ; ****************************************************
          ; Complete specification for the newCompiler compiler.
          ; ****************************************************
          #LexicalFeatures#
          #end#
          #LexicalRules#
          #end#
          #SyntaxFeatures#
          #end#
          #SyntaxRules#
          #end#
          #SemanticRules#
          #end#
          #UserFunctions#
          #end#

The generation of the newCompiler Lambda from its specification file is accomplished with the following simple invocation.

(parseLib cabinetName ??newCompiler??)

The parseLib tool will automatically read the ??newCompiler:%DEFINITION?? file and will generate the complete source code for the new compiler in the Lambda file cabinet, starting with the source code for the ??newCompiler?? and also generating any child Lambdas required (i.e. ??newCompiler:childOne??, ??newCompiler:childTwo??, etc.).

The generated compiler will normally support three passes: lexical, syntax, and semantic passes. If any of the features and rules sections are omitted from the specification, then the effected passes are omitted. For instance, a specification with no syntax and no semantic rules would produce a single pass lexical compiler, while a specification with only semantic rules would produce a pattern substituting forward production Lambda.

;#text# Header

The parseLib compiler specification file may optionally begin with the following header:

;#text#

If present, the text header must occupy the first positions in the compiler specification file. The header tells the browseLib not to compile the specification file.

#parseLib# Header

The parseLib compiler specification file may optionally begin with the following header:

#parseLib#

If present, this header must occupy the first twelve positions in the compiler specification file. The header tells the Lisp parser to send the specification file directly to the parseLib for compilation.

Comments

The compiler definition language supports comments anywhere in the compiler definition as follows:

; This is a comment

The comment begins with a semicolon and halts at the then of line. This is identical to the way comments in Lisp function.

User Defined Functions

The compiler definition language (CDL) supports user defined functions in the compiler definition as follows:

#UserFunctions#
;; Constant Folding Function
(defchild theparent:foldConstants(op x y)
vars:(f)
(setq f (getGlobalValue (symbol op)))
(f x y)) ; end foldConstants
#End#

The compiler definition language allows any number of valid Lisp lambda definitions between the #UserFunctions# header and the #End# terminator. User defined functions may define and initialize persistent variables for context sensitive operations.

Note: All user defined function names should begin with a lowercase character and should contain only alphanumeric characters.

The parseLib supports user defined functions in the generated compiler. Each user defined function is a child Lambda of the resulting compiler. Persistent variables for the generated compiler may be allocated in any child Lambda. Simple computations may be performed in user defined functions, as in the following example:

;; User Defined Functions
(defchild theparent:userFunction1(n)
(* n n)) ; end userRule1
;; User Defined Functions
(defriend theparent:userFunction2(n)
(* n n)) ; end userRule1

initRule

A special user defined function, known as initRule, will be executed once during the initialization of the compiler, as in the following example:

;; User Defined Initial Function.
(defchild theparent:initRule()
pvars:(factBase)
(setq factBase Repository[oldFacts:])
factBase) ; end initRule

Here the initRule initializes a persistence knowledge base of facts to be used to hold information learned during previous compilations. Any form of initialization may take place in the user defined initRule.

preLexRule

A special user defined function, known as preLexRule, will be executed once prior to the lexical analysis phase of the compiler, as in the following example:

;; User Defined pre lexical Function.
(defchild theparent:preLexRule(input)
vars:(output)
(setq output (browseLib.checkout input))
output) ; end preLexRule

Here the preLexRule treats the input to the compiler as a key to checking out the real source, to be compiled, from the browseLib. Any form of input substitution may take place in the user defined preLexRule.

startRule

A special user defined function, known as startRule, will be executed before the start of each compilation, as in the following example:

;; User Defined Start Function.
(defchild theparent:startRule()
pvars:(varTree)
(setq varTree (new Dictionary:))
varTree) ; end startRule

Here the startRule initializes a persistence Dictionary to be used to hold the variable tree. Any form of initialization may take place in the user defined startRule.

outputRule

The parseLib requires a user defined output rule to construct the final output from the generated compiler. The output rule is a user defined child Lambda, of the resulting compiler, which accepts one argument and always returns the expected output from the compiler, which must be a faithful compilation or translation of the original ascii input source, as in the following example:

;; Output Rule Function.
(defchild theparent:outputRule(result)
vars:(ret)
(setq ret (compile (morph result.Value)))
ret) ; end outputRule

If the output rule is not specified, the generated compiler will output the boolean false for every compilation.

_errorHandler

The _errorHandler rule is a user defined child Lambda, of the resulting compiler, which accepts any error messages which may occur during compilation. The _errorHandler rule expects one argument (which is a string containing the error message received.

If the _errorHandler rule is not specified, the generated compiler will pass thru all error messages which are encountered.

Left Recursion

All parseLib lexical, syntax, and semantic rules allow left recursive rule expressions. All left recursive rules are automatically converted into right recursive form Converting a set of rules from left to right recursion is a process which can best be described by the following examples.

Example 1

EXPR: EXPR Operator EXPR :: ... ::
EXPR: ( EXPR ) :: ... ::
EXPR: Number :: ... ::
EXPR: Name :: ... ::

The EXPR rule is converted into two rules EXPR and LEXPR in right recursion form:

EXPR: LEXPR Operator EXPR :: ... ::
EXPR: LEXPR :: $1 ::
LEXPR: ( EXPR ) :: ... ::
LEXPR: Number :: ... ::
LEXPR: Name :: ... ::

Example 2

EXPR: user ordering :: true ::
EXPR: ( EXPR ) :: ... ::
EXPR: Number :: ... ::
EXPR: EXPR Operator EXPR :: ... ::
EXPR: Name :: ...

The EXPR rule is converted into two rules EXPR and LEXPR in right recursion form:

EXPR: user ordering :: true :: EXPR: LEXPR Operator EXPR :: ... :: EXPR: LEXPR :: $1 ::

 

LEXPR: user ordering :: true :: LEXPR: ( EXPR ) :: ... :: LEXPR: Number :: ... :: LEXPR: Name :: ... ::

References

The technical reference list for this chapter is as follows:

  1. Aho, Sethi, Ullman, Compilers: Principles, Techniques, and Tools, Addison-Wesley, 1986.
  2. Gazdar, Mellish, Natural Language Processing in Lisp, Addison-Wesley, 1989.
  3. Cabral, Korns, Analytic Information Server Reference Guide, Version 2.0, Korns Associates, 1997.
  4. Korns, Analytic Information Server rulesLib, Version 2.0, Korns Associates, 1997.