A Simple ANTLR Parser

Here is the grammar of the language from the lecture notes, as we developed it in class or close to it (you can download the file here):

// A simple syntax-directed translator for a simple language

grammar MyLanguageV0NoCode;

// Root non-terminal symbol
// A program is a bunch of declarations followed by a bunch of statements
// The Java code outputs the necessary NASM code around these declarations

program       : 
              declaration*
              statement*
              ;

// Parse rule for variable declarations

declaration   : 
              INT NAME SEMICOLON 
              ;

// Parse rule for statements

statement      : 
               ifstmt 
             | printstmt 
             | assignstmt 
               ;

// Parse rule for if statements

ifstmt      : 
            IF LPAREN identifier EQUAL integer RPAREN
            statement*
            ENDIF
            ;

// Parse rule for print statements

printstmt      : 
               PRINT term SEMICOLON 
               ;

// Parse rule for assignment statements

assignstmt      : 
                NAME ASSIGN expression SEMICOLON 
                ;

// Parse rule for expressions

expression      : 
                term
              | 
                term PLUS term 
                ;

// Parse rule for terms

term          : 
              identifier
            | integer 
              ;

// Parse rule for identifiers

identifier   : NAME  ;

// Parse rule for numbers 

integer      : INTEGER  ;

// Reserved Keywords
////////////////////////////////

IF: 'if';
ENDIF: 'endif';
PRINT: 'print';
INT: 'int';

// Operators
PLUS: '+';
EQUAL: '==';
ASSIGN: '=';
NOTEQUAL: '!=';

// Semicolon and parentheses
SEMICOLON: ';';
LPAREN: '(';
RPAREN: ')';

// Integers
INTEGER: [0-9][0-9]*;

// Variable names
NAME: [a-z]+;   

// Ignore all white spaces 
WS: [ \t\r\n]+ -> skip ;

Let us try to run ANTLR on a source file named sourcecode.txt with content:

int a;
int b;
a = 3;
b = a + 1;
if (b == 4) a = 2; endif
if (a == 3)  
    a = a + 1; 
    b = b + 6;
endif 
print a;
print b;

Let us now run ANTLR and our parser:

% java -jar ~/ANTLR/antlr-4.4-complete.jar MyLanguageV0NoCode.g4

% javac  -cp .:antlr-4.4-complete.jar MyLanguageV0NoCode*.java

% java  -cp .:antlr-4.4-complete.jar org.antlr.v4.runtime.misc.TestRig MyLanguageV0NoCode program sourcecode.txt -gui

The -gui option displays the parse tree graphically:

[Figure: parse tree for sourcecode.txt]

If the program has a syntax error, our parser will detect it. For instance, for the source code:

int a;
int b;
b = a * 1;

We get this output:

% java  -cp .:antlr-4.4-complete.jar org.antlr.v4.runtime.misc.TestRig MyLanguageV0NoCode program sourcecode.txt  
line 3:6 token recognition error at: '*'
line 3:8 extraneous input '1' expecting ';'
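
The first message comes from the lexer rather than the parser: no lexer rule in the grammar matches '*', so the character is reported and dropped, and the parser is then left with "b = a 1 ;", which explains the second message. The sketch below is a hand-written approximation of the lexer rules above, not ANTLR's actual implementation. The class name ToyLexerSketch and the method scan are hypothetical names chosen for illustration. It only shows where the line and column in the first message come from (lines count from 1, columns from 0).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the generated lexer; it only reports the
// characters that no lexer rule of the grammar can match.
class ToyLexerSketch {
    // One alternative per group of lexer rules: WS, EQUAL, NOTEQUAL,
    // single-character tokens, INTEGER, and NAME. Keywords need no special
    // case here because [a-z]+ already covers them.
    private static final Pattern TOKEN =
        Pattern.compile("[ \\t\\r]+|==|!=|[+=;()]|[0-9]+|[a-z]+");

    static List<String> scan(String source) {
        List<String> errors = new ArrayList<>();
        String[] lines = source.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            Matcher m = TOKEN.matcher(line);
            int col = 0;
            while (col < line.length()) {
                m.region(col, line.length());
                if (m.lookingAt()) {
                    col = m.end();          // a token (or white space) matched
                } else {
                    // No rule matches: report the character and skip it.
                    errors.add("line " + (i + 1) + ":" + col
                        + " token recognition error at: '"
                        + line.charAt(col) + "'");
                    col++;
                }
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        String source = "int a;\nint b;\nb = a * 1;\n";
        for (String error : scan(source)) {
            System.out.println(error);
        }
    }
}
```

Run on the three-line program above, the sketch prints "line 3:6 token recognition error at: '*'", since '*' is the seventh character of the third line.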

Note that our parser only checks grammatical correctness. For instance, the program below should arguably be rejected because it uses the undeclared variable foo:

int var;
var = foo + 1;

And yet, our parser happily parses it.

This is because whether a variable has been declared is a matter of semantics, not syntax, something we will discuss a bit in class.
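
To make the distinction concrete, here is a minimal sketch of such a semantic check: it records every name introduced by "int NAME;" in a set and flags any other name that is used without having been declared. It assumes the source has already passed the parser, and it scans the raw text instead of walking the parse tree so that it runs without the ANTLR runtime. In an ANTLR-based tool the same check would more naturally live in a listener or visitor over the parse tree. The class name DeclarationCheckSketch and the method undeclared are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical declared-before-use check for the toy language.
// Assumption: the input is syntactically valid according to the grammar.
class DeclarationCheckSketch {
    private static final Set<String> KEYWORDS =
        new HashSet<>(Arrays.asList("if", "endif", "print", "int"));

    static List<String> undeclared(String source) {
        Set<String> declared = new HashSet<>();   // the symbol table
        List<String> problems = new ArrayList<>();
        // Every lowercase word is either a keyword or a NAME.
        Matcher m = Pattern.compile("[a-z]+").matcher(source);
        boolean previousWasInt = false;
        while (m.find()) {
            String word = m.group();
            if (previousWasInt) {
                declared.add(word);               // "int NAME ;" declares NAME
                previousWasInt = false;
            } else if (word.equals("int")) {
                previousWasInt = true;
            } else if (!KEYWORDS.contains(word) && !declared.contains(word)) {
                problems.add("undeclared variable: " + word);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        String source = "int var;\nvar = foo + 1;\n";
        for (String problem : undeclared(source)) {
            System.out.println(problem);
        }
    }
}
```

For the two-line program above, the sketch reports "undeclared variable: foo", while the first sourcecode.txt example produces no report.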