^2006^

Assignment a3, 2006, CSE3322

Also see
Hints and feedback.
NB. The production
rules use '::='
in place of '->'.
Purposes: (i) Practice with scanners and parsers, and (ii) translation.
 
Assignment a3 is to implement a small calculator language, BIL, for BigInts. BIL has big integers, variables (identifiers), and `if', `while', `read', `print' and compound (`begin' ... `end') imperative commands. Write a parser for BIL, and a translator which will generate an SML program equivalent to the source-code of a valid BIL program.
 
Use the [sample-solution] to [a1] to manipulate any BigInts. Do not copy the source code of "a1.sml" into your "a3.sml". Include it, wherever a1.sml is needed, in this way:
use "../a1.sml";
i.e. place the a1-solution in the directory above the one containing your file "a3.sml". [Submit] directory a3 containing "a3.sml"; do not submit "a1.sml"!
 
Q1. Lexical analysis: Define a suitable datatype for the tokens to be returned by the lexer. Use ML-Lex to implement a suitable lexical scanner for BIL. (There are examples from the notes [implementation-2] in the [Code/] directory.)
 
Q2. Write an LL(1) grammar for BIL (There is a non-LL(1) grammar below.) Include your LL(1) grammar as a comment in "a3.sml" immediately after the submission declaration. [Correction, see [*] in Hints.]
 
Q3. In SML-NJ, implement a recursive descent parser for BIL. The parser must signal the first syntax error that it finds, and it may stop parsing at that point.
 
BIL's Concrete Syntax:
Start symbol is <Cmd>   --i.e. a BIL program is a Cmd   --which can be a begin ... end, of course
 
<Cmd> ::= <Identifier> := <Exp> | if <Exp> then <Cmd> | if <Exp> then <Cmd> else <Cmd> | while <Exp> do <Cmd> | begin <Cmds> end | read <Identifier> | print "<printable_Chars>" | print <Exp>
  --note dangling-else possibility   --use base 10 to print BigInts,
 
<Cmds> ::= <Cmd>; <Cmds> | <Cmd>   --seq. of 1 or more <Cmd> sep. by ;
 
<ArithExp> ::= <ArithExp> <PM> <Term> | <Term>
<Term> ::= <Term> * <Factor> | <Factor>
<Factor> ::= - <Factor> | ( <Exp> ) | <Identifier> | <BigInt>
 
<PM> ::= + | -
<BigInt> ::= as in [a1] but without any leading+|-.
 
<RelExp> ::= <ArithExp> <Comparison> <ArithExp> | <ArithExp>
<Comparison> ::= = | <> | < | <= | > | >=
 
<Exp> ::= <Exp> or <Conj> | <Conj>
<Conj> ::= <Conj> and <Bterm> | <Bterm>
<Bterm> ::= not <Bterm> | <RelExp>
 
Hence, operator binding:
unary -   -- tightest
*
+, binary -
= | <> | < | <= | > | >=
not
and
or   -- least tight
 
Note that BIL's abstract syntax (~datatype for parse trees) can be much simpler than the concrete syntax.
 
Variables. All variables have global lifetime and scope throughout a BIL program regardless of where they first appear in it.
 
Types, BIL has one type of value: BigInt.
Zero is used for `false'.
All non-zero values are taken to be `true'.
The result of, e.g., `7=4+3' is `1' (true).
Every new variable is initialised to zero by default.
 
Q4. Implement a translator from BIL to SML.
Translation, prog.bil -----[your a3.sml]-----> obj.sml :
sml
...
use "a3.sml";
...
translate();
type file-name of a BIL program, e.g., p.bil:
prog.bil
prog.bil is syntactically valid
creating obj.sml
use "obj.sml";
...
Always write the same file, obj.sml
Note that obj.sml may contain updatable variables (ref, :=, !) although they are not necessary. (They are very strongly discouraged in your program, a3.sml.)
e.g.,
BIL   SML, possible translation
123456789[16]-----> ... (fromString "123456789[16]")
x := e-----> x := ... e
e+e'-----> ... (plus (... e) (... e'))
e*e' -----> ... (times (... e) (... e'))
print e-----> ... (print(... (toString 10 (... e))))
where `...' indicates more code and/or function call(s) might be necessary.
 
Submit a directory called a3.
In directory a3 there must be exactly these files:
  1. a3.sml   --your parser and translator (it must load your lexer), your LL(1) grammar must be included after the declaration as a comment;
  2. bil-lex.lex   --your lexer specification;
  3. bil-lex.lex.sml   --the output of ML-LEX;
  4. demo.bil   --an interesting example of a BIL program;
  5. obj.sml   --optionally obj.sml from translating demo.bil;
  6. nothing else at all (certainly not a1.sml).
Assignment = a3.
Directory = a3.
Due: 4pm, Friday, 6 Oct.
The standard [submission declaration] must be included at the top of each program file (demo.bil and obj.sml excepted) otherwise the mark will be zero.
There are four parts, Q1 to Q4.
Max: 10.
 
Late submissions Oct.7-13, as per usual penalty, assessment=a3late (directory=a3, program=a3.sml still). If you submitted on time we will not mark a late submission.

Hints and Feedback

Note that BIL's abstract syntax (~datatype for parse trees) can be much simpler than the concrete syntax.
 
Pay careful attention to the [submission instructions]. This time you are submitting a directory with specific contents.   [--3/9/2006].
 
It may be a good idea to let the lexer accept tokens that are "close" to being BigInts but are not quite BigInts, e.g., "1A[45]"; the error (base too big) can be detected by later semantic processing. It is a matter of taste, but doing this often results in better error messages and better error recovery.   [--8/9/2006].
 
The [implementation slides] (and also [Code/]) contain examples on ML-Lex (p.45) and parsing in ML (p.74).
 
Note that abc must be an identifier (variable) in BIL and that ABC... must be the start of a BigInt, e.g., ABC[16].   [--11/9/2006].
 
NB.
`::=' is equivalent to `->' in production rules -- alternative meta-language notation (`::=', `->', `can be replaced by').
The BIL assignment symbol is `:='.
The BIL equality operator is `='.
 
Unless I am mistaken, the following BIL program calculates m2n=m**n fairly efficiently.
begin
  ... set, or read, m and n ...;
  nn  := n;
  m2n := 1;
  while nn > 0 do
    begin
      p := 1;
      m2p := m;
      while 2*p <= nn do
        begin
          m2p := m2p * m2p;
          p := 2 * p
        end;
      m2n := m2n * m2p;
      nn := nn - p
    end;
  print m2n
end
... just waiting on a BIL compiler to be certain. (E.g., if n=11, m11=m8*m2*m1. Also see [Mersenne primes] for some big (2n-1) numbers. For example, it seems that M4253 has 1,281 digits.)
 
Marking: The parts are worth approximately 2, 2, 2, 3 respectively with another 1 for overall quality. It is not possible to get >4 on the first two parts alone.
 
[*] NB. ... | if E then C | if E then C else C | ... is not LL(1); just factor the common prefix as, if E then C Elsepart, where ElsePart -> else Cmd | empty, and leave the grammar ambiguous, to be resolved by attaching the else to the most recent then, as is usually done. (You'll be struggling to make it LL(1) otherwise: It can't be done!-() Note that the disambiguating transformation (Impl. p.62), does remove the syntactic ambiguity, but does not help with LL(1).

B. Computer Science, B. Software Engineering, BSc (Computer Science), & related double degrees.
L. Allison, Faculty of Information Technology (Clayton School of IT), ('05 was School of Computer Science and Software Engineering), Monash University, Australia 3800.
Created with "vi (Linux & Solaris)",  charset=iso-8859-1,  www.csse.monash.edu.au/courseware/cse3322/2006/A3/index.shtml