2.3 Parsing

Translating from one language to another always requires some kind of paring.  Whether we translate from English to German or English to SQL, parsing plays one of the most important roles.  “Parsing is the process of identifying structure in data” [10 pg 5].  It determines whether the input data has some pre-determined structure and respond accordingly.

Parsing requires a set of grammars to be defined.  These grammars are set of rules, which define how the language is structured.  Rules are specific pattern of data, which appears in the input.  These rules can further reference to other rules, which are called subrules.  Rules can also be recursive if they somehow refer back to themselves.  The rules are consists of products, which are the different ways in which the rule can be satisfied.  The actual data is referred to as terminals.  Parser may also contain actions [10 pg 20].  These concepts are summarised below.

Figure 3: Terminology used in parser grammar

grammar    :  rule(s)         # grammar is one or more rule

rule       :  production      # there are number of ways
           |  production_2    # the rule can be satisfied
           |  production_3

production :  terminal_x  subrule  terminal_y 

subrule    :  terminal_x {action}

terminal_x :  'x'             # terminals are the actual data

terminal_y :  'y'

action     :  "print 123"     # actions can execute piece of code

Given a set of grammar, there are two ways of implementing a parser: bottom-up parser or top-down parser.

2.3.1 Bottom-up parsers

“Bottom-up parsers are usually implemented as a series of states, encoded in a lookup table” [9 pg 48].  Parser always starts from some pre-defined state and moves to another state (or stays in the same state) depending on the next available token.  The success of the parsing depends upon whether the parsing has ended up in a pre-defined successful state or not.  These ‘successful’ states may have various actions defined to respond to the input.  These bottom-up parsers are also known as table-driven parsers or LR parsers [9 pg 23].

Concepts of bottom up parsing can be explained using a simple parser in the following example.

Example – Figure 4 is a state diagram for a bottom-up parser to check the syntax of an ‘if statement’ in C.

Figure 4: Bottom-up parser for 'if statement'
[Bottom-up parser for 'if statement']

Notes:

  1. Words within quotes are the terminal to be matched for that particular state.

  2. Statements within curly brackets are the action to execute (for this example print these messages) if the matching is unsuccessful.

  3. yes{0,n} means input is successfully matched any number of times.

Depending on the input to the parser, it may ended up in either the successful state or failed state.  I have not shown the state diagram for parsing ‘valid C statement’ as this is quite complex.  This ‘valid C statement’ states itself may refer to this ‘valid if statement’ thus making it a recursive call.

2.3.2 Top-down parsers

Unlike bottom-up parsers, top-down parsers “starts at the highest level and attempt to match the most general rule first” [9 pg 48].  This most general rule then refers to further rules or subrules and the lowest level of these subrules refer to terminals.  Parser moves down the tree like structure until it matches the input and comes back up the tree to the most general rule.  If none of the tree branches (or productions) matched, then the parsing is failed.  Top-down parser are also know as recursive-descent parsers or LL parsers [9 pg 23].

The same 'if statement' grammar can be converted to a top-down parser as below.

Figure 5: Top-down parser grammar for 'if statement'
[Top-down parser grammar for 'if statement']

As shown above in the diagram, these top-down parsers have a tree like structure.  Parser will transverse from left to right through every branch until a successful branch is reached. (i.e. until the parser can successfully match the last rule or terminal of any tree branch.)

Both top-down and bottom-up parsers have their own advantages and disadvantages, which are summarised in the table below.

Table 3: Advantages and disadvantages of top-down and bottom-up parsers [10 pg 26]
  Top-down parsers Bottom-up parsers
Advantages
  • Easy to code
  • Easy to debug
  • Smaller code
  • Can tokenise quickly
  • Faster Handles left recursion
Disadvantages
  • Slow Backtracking and recursion can be expensive.
  • Can not handle left recursion
  • Fixed tokenisation
  • Hard to debug
  • Code can grow very large
  • Executing of actions can be unpredictable.
  • Tail recursion is handled poorly.

Design - Database Functions