Here is the grammar of the language from the lecture notes, as we developed it in class (or close to it); you can download the file here:
// A simple syntax-directed translator for a simple language
grammar MyLanguageV0NoCode;
// Root non-terminal symbol
// A program is a bunch of declarations followed by a bunch of statements
// The Java code outputs the necessary NASM code around these declarations
program :
declaration*
statement*
;
// Parse rule for variable declarations
declaration :
INT NAME SEMICOLON
;
// Parse rule for statements
statement :
ifstmt
| printstmt
| assignstmt
;
// Parse rule for if statements
ifstmt :
IF LPAREN identifier EQUAL integer RPAREN
statement*
ENDIF
;
// Parse rule for print statements
printstmt :
PRINT term SEMICOLON
;
// Parse rule for assignment statements
assignstmt :
NAME ASSIGN expression SEMICOLON
;
// Parse rule for expressions
expression :
term
| term PLUS term
;
// Parse rule for terms
term :
identifier
| integer
;
// Parse rule for identifiers
identifier : NAME ;
// Parse rule for numbers
integer : INTEGER ;
// Reserved Keywords
////////////////////////////////
IF: 'if';
ENDIF: 'endif';
PRINT: 'print';
INT: 'int';
// Operators
PLUS: '+';
EQUAL: '==';
ASSIGN: '=';
NOTEQUAL: '!=';
// Semicolon and parentheses
SEMICOLON: ';';
LPAREN: '(';
RPAREN: ')';
// Integers
INTEGER: [0-9][0-9]*;
// Variable names
NAME: [a-z]+;
// Ignore all white spaces
WS: [ \t\r\n]+ -> skip ;
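To see what these rules mean operationally, here is a minimal hand-written recursive-descent recognizer for the same language, with one Java method per non-terminal. It is an illustrative sketch, not the code ANTLR generates (ANTLR 4 produces different code driven by its adaptive ALL(*) prediction), and the class name MyLanguageRecognizer and its lex and check methods are hypothetical. Two deliberate simplifications: the sketch requires the whole input to be consumed (the program rule above does not mention EOF), and its lexer stops at the first unrecognized character instead of reporting it and continuing as ANTLR's lexer does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical hand-written recognizer for the MyLanguageV0NoCode grammar (sketch only).
class MyLanguageRecognizer {
    // Whole lowercase words equal to a keyword become that keyword; others are NAME.
    private static final Map<String, String> KEYWORDS =
            Map.of("if", "IF", "endif", "ENDIF", "print", "PRINT", "int", "INT");
    private static final Map<Character, String> SINGLE =
            Map.of('=', "ASSIGN", '+', "PLUS", ';', "SEMICOLON", '(', "LPAREN", ')', "RPAREN");

    private final List<String> tokens; // token kinds, named as in the grammar
    private int pos = 0;

    MyLanguageRecognizer(String source) { tokens = lex(source); }

    // Lexer: longest match ("==" before "=", "iffy" is one NAME), whitespace skipped like WS.
    static List<String> lex(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int start = i;
            if (" \t\r\n".indexOf(c) >= 0) {
                i++;
            } else if (c >= 'a' && c <= 'z') {
                while (i < s.length() && s.charAt(i) >= 'a' && s.charAt(i) <= 'z') i++;
                out.add(KEYWORDS.getOrDefault(s.substring(start, i), "NAME"));
            } else if (c >= '0' && c <= '9') {
                while (i < s.length() && s.charAt(i) >= '0' && s.charAt(i) <= '9') i++;
                out.add("INTEGER");
            } else if (s.startsWith("==", i) || s.startsWith("!=", i)) {
                out.add(c == '=' ? "EQUAL" : "NOTEQUAL");
                i += 2;
            } else if (SINGLE.containsKey(c)) {
                out.add(SINGLE.get(c));
                i++;
            } else {
                throw new IllegalArgumentException("token recognition error at: '" + c + "'");
            }
        }
        return out;
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "EOF"; }

    private void match(String kind) {
        if (!peek().equals(kind))
            throw new IllegalStateException("expecting " + kind + " but found " + peek());
        pos++;
    }

    private boolean atStatement() {
        String k = peek();
        return k.equals("IF") || k.equals("PRINT") || k.equals("NAME");
    }

    // program : declaration* statement* ;   (this sketch also demands end of input)
    void program() {
        while (peek().equals("INT")) declaration();
        while (atStatement()) statement();
        match("EOF");
    }

    // declaration : INT NAME SEMICOLON ;
    void declaration() { match("INT"); match("NAME"); match("SEMICOLON"); }

    // statement : ifstmt | printstmt | assignstmt ;   (the first token picks the branch)
    void statement() {
        if (peek().equals("IF")) ifstmt();
        else if (peek().equals("PRINT")) printstmt();
        else assignstmt();
    }

    // ifstmt : IF LPAREN identifier EQUAL integer RPAREN statement* ENDIF ;
    void ifstmt() {
        match("IF"); match("LPAREN"); match("NAME"); match("EQUAL"); match("INTEGER"); match("RPAREN");
        while (atStatement()) statement();
        match("ENDIF");
    }

    // printstmt : PRINT term SEMICOLON ;
    void printstmt() { match("PRINT"); term(); match("SEMICOLON"); }

    // assignstmt : NAME ASSIGN expression SEMICOLON ;
    void assignstmt() { match("NAME"); match("ASSIGN"); expression(); match("SEMICOLON"); }

    // expression : term | term PLUS term ;   (left-factored: term, then an optional PLUS term)
    void expression() {
        term();
        if (peek().equals("PLUS")) { match("PLUS"); term(); }
    }

    // term : identifier | integer ;   with identifier : NAME and integer : INTEGER
    void term() {
        if (peek().equals("NAME") || peek().equals("INTEGER")) pos++;
        else throw new IllegalStateException("expecting NAME or INTEGER but found " + peek());
    }

    // Returns null if the source is accepted, otherwise the first error message.
    static String check(String source) {
        try {
            new MyLanguageRecognizer(source).program();
            return null;
        } catch (IllegalArgumentException | IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(check("int a; int b; a = 3; b = a + 1; if (b == 4) a = 2; endif print a;")); // null
        System.out.println(check("int a; int b; b = a * 1;")); // token recognition error at: '*'
    }
}
```

Note that the two alternatives of expression share the prefix term, so the hand-written version parses term once and then looks for an optional PLUS term; the ANTLR grammar can keep both alternatives as written because ANTLR's lookahead resolves the common prefix for us.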
Consider a source file named sourcecode.txt with the following content:
int a;
int b;
a = 3;
b = a + 1;
if (b == 4) a = 2; endif
if (a == 3)
a = a + 1;
b = b + 6;
endif
print a;
print b;
Let us now run ANTLR and our parser:
% java -jar antlr-4.4-complete.jar MyLanguageV0NoCode.g4
% javac -cp .:antlr-4.4-complete.jar MyLanguageV0NoCode*.java
% java -cp .:antlr-4.4-complete.jar org.antlr.v4.runtime.misc.TestRig MyLanguageV0NoCode program sourcecode.txt -gui
The -gui option opens a window that displays the parse tree graphically.
If the program has a syntax error, our parser will detect it. For instance, for the source code:
int a;
int b;
b = a * 1;
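We get this output (the first message comes from the lexer, which has no rule matching '*'; the second comes from the parser):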
% java -cp .:antlr-4.4-complete.jar org.antlr.v4.runtime.misc.TestRig MyLanguageV0NoCode program sourcecode.txt
line 3:6 token recognition error at: '*'
line 3:8 extraneous input '1' expecting ';'
Note that our parser only checks syntactic correctness. For instance, the program below should arguably be rejected, since the variable foo is never declared:
int var;
var = foo + 1;
And yet, our parser happily accepts it without reporting any error.
This is because whether a variable has been declared before it is used is a question of semantics, not of syntax, something we will discuss a bit in class.
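One common way to perform such a check is a separate pass after parsing that keeps a symbol table of declared names. The sketch below is a minimal illustration under loudly stated assumptions: it re-tokenizes the source with a regular expression instead of walking ANTLR's parse tree (in practice one would typically use the listener or visitor classes ANTLR can generate), it assumes the input has already parsed successfully, and the class DeclarationChecker and its method undeclaredUses are hypothetical. Because the grammar places all declarations before all statements, a single left-to-right pass suffices.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical declared-before-use check for MyLanguageV0NoCode programs (sketch only).
// Assumes the source already parsed, i.e. "int NAME ;" declarations, then statements.
class DeclarationChecker {
    private static final Pattern TOKEN = Pattern.compile("[a-z]+|[0-9]+|==|!=|[=+;()]");
    private static final Set<String> KEYWORDS = Set.of("if", "endif", "print", "int");

    // Returns each undeclared name used in the program, in order of first use.
    static List<String> undeclaredUses(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) tokens.add(m.group());

        Set<String> declared = new HashSet<>(); // the symbol table
        List<String> undeclared = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String t = tokens.get(i);
            boolean isWord = t.charAt(0) >= 'a' && t.charAt(0) <= 'z';
            if (t.equals("int") && i + 1 < tokens.size()) {
                declared.add(tokens.get(i + 1)); // declaration: int NAME ;
                i++;                             // skip the name being declared
            } else if (isWord && !KEYWORDS.contains(t)
                    && !declared.contains(t) && !undeclared.contains(t)) {
                undeclared.add(t);               // a use of a NAME never declared
            }
        }
        return undeclared;
    }

    public static void main(String[] args) {
        System.out.println(undeclaredUses("int var;\nvar = foo + 1;")); // [foo]
    }
}
```

Reporting foo from a separate pass, rather than changing the grammar, keeps the parser purely syntactic and leaves room for better error messages in the semantic phase.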