A Simple ANTLR Compiler

Here is the compiler we developed in class, or something close to it (you can download the file here):

// A simple syntax-directed translator for a simple language

grammar MyLanguageV0Code;

// Root non-terminal symbol
// A program is a bunch of declarations followed by a bunch of statements
// The Java code outputs the necessary NASM code around these declarations

program       : 
              {System.out.println("%include \"asm_io.inc\"");
               System.out.println("segment .data"); }
              declaration*
              {System.out.println("segment .text"); 
               System.out.println("\tglobal asm_main"); 
               System.out.println("asm_main:"); 
               System.out.println("\tenter 0,0"); 
               System.out.println("\tpusha"); }
              statement*
              {System.out.println("\tpopa"); 
               System.out.println("\tmov eax,0"); 
               System.out.println("\tleave"); 
               System.out.println("\tret"); } 
              ;

// Parse rule for variable declarations

declaration   : 
              // "a" labels the NAME token; ANTLR declares labels itself, so no Java local is needed
              INT a=NAME SEMICOLON 
              {System.out.println("\t"+$a.text + "  dd 0");} 
              ;

// Parse rule for statements

statement      : 
               ifstmt 
             | printstmt 
             | assignstmt 
               ;

// Parse rule for if statements

ifstmt      : 
            // "a" and "b" are rule labels declared by ANTLR; only "label" needs a Java local
            {String label;}
            IF LPAREN a=identifier EQUAL b=integer RPAREN
            {System.out.println("cmp dword "+$a.toString+","+$b.toString);
             label = "label_"+Integer.toString($IF.index);
             System.out.println("jnz "+label); }
            statement*
            { System.out.println(label+":"); }
            ENDIF
            ;

// Parse rule for print statements

printstmt      : 
               PRINT term SEMICOLON 
               {System.out.println("\tmov eax, "+$term.toString);
                System.out.println("\tcall print_int");
                System.out.println("\tcall print_nl");} 
               ;

// Parse rule for assignment statements

assignstmt      : 
                // "a" labels the NAME token being assigned to
                a=NAME ASSIGN expression SEMICOLON 
                {System.out.println("\tmov ["+$a.text+"], eax");} 
                ;

// Parse rule for expressions

expression      : 
                // "a" and "b" label the operands; the result is always left in eax
                a=term 
                {System.out.println("\tmov eax,"+$a.toString);}
              | 
                a=term PLUS b=term 
                {System.out.println("\tmov eax,"+$a.toString);}
                {System.out.println("\tadd eax,"+$b.toString);}
                ;

// Parse rule for terms

term returns [String toString]  : 
                                identifier {$toString = $identifier.toString;} 
                              | integer {$toString = $integer.toString;} 
                                ;

// Parse rule for identifiers

identifier returns [String toString]: NAME {$toString = "["+$NAME.text+"]";} ;

// Parse rule for numbers 

integer returns [String toString]: INTEGER {$toString = $INTEGER.text;} ;

// Reserved Keywords
////////////////////////////////

IF: 'if';
ENDIF: 'endif';
PRINT: 'print';
INT: 'int';

// Operators
PLUS: '+';
EQUAL: '==';
ASSIGN: '=';
NOTEQUAL: '!=';

// Semicolon and parentheses
SEMICOLON: ';';
LPAREN: '(';
RPAREN: ')';

// Integers
INTEGER: [0-9][0-9]*;

// Variable names
NAME: [a-z]+;   

// Ignore all white spaces 
WS: [ \t\r\n]+ -> skip ;
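
To make the actions in the ifstmt rule more concrete, here is a minimal plain-Java sketch of the same translation scheme, written without ANTLR. The class IfStmtSketch and the method translateIf are hypothetical names used only for illustration; the parameter ifTokenIndex plays the role of $IF.index, and body stands for the lines already produced by the nested statements.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of the grammar): mirrors the three actions of ifstmt.
class IfStmtSketch {
    // lhs is an operand such as "[b]", rhs a literal such as "4".
    // ifTokenIndex stands in for $IF.index and makes the label unique per "if".
    static List<String> translateIf(String lhs, String rhs, int ifTokenIndex, List<String> body) {
        List<String> out = new ArrayList<>();
        // Action 1: compare, then jump past the body when the operands differ.
        out.add("cmp dword " + lhs + "," + rhs);
        String label = "label_" + Integer.toString(ifTokenIndex);
        out.add("jnz " + label);
        // The nested statements emit their own code in between.
        out.addAll(body);
        // Action 2: the jump target, placed right after the body.
        out.add(label + ":");
        return out;
    }

    public static void main(String[] args) {
        List<String> body = Arrays.asList("\tmov eax,2", "\tmov [a], eax");
        for (String line : translateIf("[b]", "4", 16, body)) {
            System.out.println(line);
        }
    }
}
```

Because each if token sits at a different position in the input, deriving the label from that position keeps the labels distinct without any extra counter.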

Let us try to run ANTLR on a source file named sourcecode.txt with content:

int a;
int b;
a = 3;
b = a + 1;
if (b == 4) a = 2; endif
if (a == 3)  
    a = a + 1; 
    b = b + 6;
endif 
print a;
print b;
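
To know what the compiled program should print, it can help to transcribe the source file by hand. The following is a rough Java equivalent, assuming that variables start at 0 (the compiler emits "dd 0" for each declaration) and that each if body runs only when the comparison holds. The class name SourceCodeEquivalent and the method run are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand transcription of sourcecode.txt, used only to predict its output.
class SourceCodeEquivalent {
    // Returns the values printed by the two print statements, in order.
    static List<Integer> run() {
        List<Integer> printed = new ArrayList<>();
        int a = 0;              // int a;
        int b = 0;              // int b;
        a = 3;                  // a = 3;
        b = a + 1;              // b = a + 1;
        if (b == 4) {           // if (b == 4) a = 2; endif
            a = 2;
        }
        if (a == 3) {           // if (a == 3) ... endif
            a = a + 1;
            b = b + 6;
        }
        printed.add(a);         // print a;
        printed.add(b);         // print b;
        return printed;
    }

    public static void main(String[] args) {
        for (int value : run()) {
            System.out.println(value);  // prints 2, then 4
        }
    }
}
```

The first if changes a to 2, so the second if is skipped, and the program prints 2 followed by 4.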

Let us now run ANTLR and our parser:

% java -jar ~/ANTLR/antlr-4.4-complete.jar MyLanguageV0Code.g4

% javac -cp .:$HOME/ANTLR/antlr-4.4-complete.jar MyLanguageV0Code*.java

% java  -cp .:$HOME/ANTLR/antlr-4.4-complete.jar org.antlr.v4.runtime.misc.TestRig MyLanguageV0Code program sourcecode.txt

The output is:

%include "asm_io.inc"
segment .data
	a  dd 0
	b  dd 0
segment .text
	global asm_main
asm_main:
	enter 0,0
	pusha
	mov eax,3
	mov [a], eax
	mov eax,[a]
	add eax,1
	mov [b], eax
cmp dword [b],4
jnz label_16
	mov eax,2
	mov [a], eax
label_16:
cmp dword [a],3
jnz label_27
	mov eax,[a]
	add eax,1
	mov [a], eax
	mov eax,[b]
	add eax,6
	mov [b], eax
label_27:
	mov eax, [a]
	call print_int
	call print_nl
	mov eax, [b]
	call print_int
	call print_nl
	popa
	mov eax,0
	leave
	ret
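
The label names label_16 and label_27 come from $IF.index, the position of each if token in the token stream. The sketch below re-counts those positions with a small hand-written tokenizer. It assumes that whitespace is dropped before tokens are numbered, which matches the "-> skip" lexer command and the numbers seen above. The regular expression only approximates the lexer rules of the grammar (any unrecognized character is silently ignored), and the names TokenIndexSketch and ifTokenIndices are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the ANTLR lexer: numbers the tokens of sourcecode.txt.
class TokenIndexSketch {
    static final String SOURCE =
        "int a;\nint b;\na = 3;\nb = a + 1;\n"
      + "if (b == 4) a = 2; endif\n"
      + "if (a == 3)\n    a = a + 1;\n    b = b + 6;\nendif\n"
      + "print a;\nprint b;\n";

    // Whitespace, two-character operators, words, integers, one-character symbols.
    private static final Pattern TOKEN =
        Pattern.compile("\\s+|==|!=|[a-z]+|[0-9]+|[+=;()]");

    // Returns the zero-based token index of every "if" keyword in the source.
    static List<Integer> ifTokenIndices(String source) {
        List<Integer> indices = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        int index = 0;
        while (m.find()) {
            String text = m.group();
            if (text.trim().isEmpty()) {
                continue;           // skipped whitespace does not get an index
            }
            if (text.equals("if")) {
                indices.add(index);
            }
            index++;
        }
        return indices;
    }

    public static void main(String[] args) {
        for (int i : ifTokenIndices(SOURCE)) {
            System.out.println("label_" + i);   // prints label_16, then label_27
        }
    }
}
```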

You can now run this code using NASM, and check that it works! Our compiler has MANY weaknesses and is a far cry from a production compiler. For instance:

Addressing the above would be very interesting, but would unfortunately take us well beyond the scope of this course. Instead, in a programming assignment you will extend the language recognized by the compiler, and thus the compiler itself.