SDFby Example

Sdf
SDF2 is a rich formalism for the definition of the syntax of all kinds of computer languages. This page explores the possibilities of the formalism by means of a number of fragments of syntax definitions from the SDF2 GrammarBase. In particular, the language is contrasted with traditional formalisms that use a separate scanner to deal with lexical syntax.

(When editing this page, uncheck the ConvertSpacesToTabs? checkbox to avoid spoiling the layout of the examples.)

Context-free Syntax Definition

expression grammar

Disambiguating Expressions

Lexical Syntax Definition

idenitifiers, layout

Regular Expressions

Overloading Delimiters

BibTeX? is a language for the describtion of bibliographical information such as articles and books. For example, the following entry describes a PhD? thesis.

 @PhdThesis{Vis97.thesis,
   author = {Visser, Eelco},
   title  = {Syntax Definition for Language Prototyping},
   year   = {1997},
   month  = {September},
   school = {University of Amsterdam},
   URL    = { http://www.cs.uu.nl/~visser/thesis/ }
 }

In the syntax of BibTeX? entries the symbol { has two meanings: (1) indicating the start of the list of fields and (2) indicating the start of a field body. The second kind of use requires a lexical treatment since the body of a field consists of an arbitrary list of characters until the closing } is found. In an approach where scanner and parser are separated it is not possible for the scanner to know which kind of { is encountered. Furthermore, a field body can contain nested occurences of { and }, which should only occur in matching pairs. In

Finally, the treatment of whitespace is different between entries, between the fields of an entry and inside the body of a field.

  context-free syntax
    C {Entry C}* C                  -> Entries 
    "@" EName "{" Key "," 
                  {Field ","}* ","? 
              "}"                   -> Entry 
    Name "=" Value                  -> Field
    "{" ValWords "}"                -> Value
    (ValWord | ("{" ValWords "}"))* -> ValWords
  lexical syntax
    ~[\{\}\ \t\n]+  -> ValWord
  lexical restrictions
    ValWord   -/- ~[\{\}\ \t\n]

The complete syntax definition for BibTeX? (that also treats double quotes in field bodies correctly) can be found at

http://www.cwi.nl/~mdejonge/grammar-base/bibtex.0/index.html

Solving Lexical Ambiguities

Longest Match

follow restrictions

Reserved Words

in a normal scanner generator like LEX

when combining languages we want to have separate sets of reserved words;

a COBOL reserved word should not be used as a COBOL identifier, but might be quite usable as a SQL identifier

Ignoring Whitespace in Lexicals

In Fortran whitespace inside lexicals is not significant. This can be accomodated in Trash.SDFII by using context-free syntax to define lexicals.

Dividing a Syntax Definition into Modules

reuse of pieces of syntax

renaming

Combining Languages

COBOL is a language for manipulating business information represented by means of lists of records. COBOL programs are often mixed with fragments from other languages. For example, SQL queries can be embedded to access a database and CICS programs are used for process control. It is desirable to describe the syntax of each of the language separately and combine these descriptions as needed.

In a traditional syntax definition formalism this is not possible: (1) The grammars restrictions such LL or LALR on which the context-free syntax is based are not closed under composition. (2) the regular grammars on which the definition of the lexical syntax are based are not closed under composition either.

In practice, this translates to the following: A scanner does not consider the context in determining the sort of a token. Therefore, normal scanners cannot deal with

(LEX provides a workaround by means of modes.)

In Trash.SDFII the syntax of the composing languages can be described in separate modules and combined at will. For example, consider the following fragments from a syntax definition for COBOL. (Note that the actual combined syntax definition for COBOL, CICS and SQL combined consists of 1600 LOC divided into 38 modules.)

Module ID defines the syntax of identifiers. The 

 module ID
   lexical syntax
     [0-9]* [A-Z]                                          -> Lex-Id
     [0-9]* [A-Z] [A-Za-z0-9\-]* [A-Za-z0-9]               -> Lex-Id
     [0-9]+ [\-] [0-9\-]* [A-Z] [A-Za-z0-9\-]* [A-Za-z0-9] -> Lex-Id
   context-free syntax
     Lex-Id -> Id 
   lexical restrictions
     Lex-Id -/- [A-Za-z0-9\-]

Module COBOL defines the syntax of COBOL programs. The actual syntax definition for cobol consists of 36 modules. Here only the productions relevant for the example are shown. Note that the syntax of Picture overlaps with the syntax for Id. This overlap is disambiguated by context.

 module COBOL
   imports ID %% ...
   lexical syntax
     [0-9XxAa\(\)pZzVvSszBCRD\/\,\$\+\-\*\:]+ -> Picture
   context-free syntax
     Ident-div Env-div Data-div Proc-div            -> Program
     "DATA" "DIVISION" "." File-sec Ws-sec Link-sec -> Data-div
     "FILE" "SECTION" "." File-desc*                -> File-sec
     "FD" Id Fd-item* "." Data-desc*                -> File-desc
     Dd-header Dd-body*                             -> Data-desc

Module SQL defines the syntax for SQL queries. Queries are embedded into COBOL programs by means of the keywords EXEC SQL ... END-EXEC.

 module SQL
   lexical syntax
     [A-Z0-9\-\_\.\:]+ -> Sql-id
   context-free syntax
     "SELECT" Distinct Select-list From-into Where Order-by -> Select
     Select                                                 -> Sql-item
     "EXEC" "SQL" Sql-item+ "END-EXEC" "."                  -> Data-desc
     "EXEC" "SQL" Sql-item+ "END-EXEC"                      -> Stat

Module CICS defines the syntax of CICS commands and their embedding in COBOL programs. Note that a command can have a reference to an A-exp, which is a COBOL expression.

 module CICS
   imports PROGRAM
   lexical syntax
     [A-Z]+ -> Cics-kw
   context-free syntax
     Stat* "EXEC" "CICS" Cics-command Cics-opt* "." -> Sentence
     "EXEC" "CICS" Cics-command Cics-opt* "END-EXEC" -> Stat
            
     Cics-kw                  -> Cics-opt
     Cics-kw "(" Cics-arg ")" -> Cics-opt
     A-exp                    -> Cics-arg
     Str                      -> Cics-arg
     "ADDRESS" "OF" A-exp     -> Cics-arg
     "LENGTH" "OF" A-exp      -> Cics-arg
     "ABEND"                  -> Cics-command
     %% etc.