Here is the compiler we developed in class, or something close to it (you can download the file here):
// A simple syntax-directed translator for a simple language
grammar MyLanguageV0Code;
// Root non-terminal symbol
// A program is a bunch of declarations followed by a bunch of statements
// The Java code outputs the necessary NASM code around these declarations
program :
{System.out.println("%include \"asm_io.inc\"");
System.out.println("segment .data"); }
declaration*
{System.out.println("segment .text");
System.out.println("\tglobal asm_main");
System.out.println("asm_main:");
System.out.println("\tenter 0,0");
System.out.println("\tpusha"); }
statement*
{System.out.println("\tpopa");
System.out.println("\tmov eax,0");
System.out.println("\tleave");
System.out.println("\tret"); }
;
// Parse rule for variable declarations
declaration :
{int a; }
INT a=NAME SEMICOLON
{System.out.println("\t"+$a.text + " dd 0");}
;
// Parse rule for statements
statement :
ifstmt
| printstmt
| assignstmt
;
// Parse rule for if statements
ifstmt :
{int a,b;}
{String label;}
IF LPAREN a=identifier EQUAL b=integer RPAREN
{System.out.println("cmp dword "+$a.toString+","+$b.toString);
label = "label_"+Integer.toString($IF.index);
System.out.println("jnz "+label); }
statement*
{ System.out.println(label+":"); }
ENDIF
;
// Parse rule for print statements
printstmt :
PRINT term SEMICOLON
{System.out.println("\tmov eax, "+$term.toString);
System.out.println("\tcall print_int");
System.out.println("\tcall print_nl");}
;
// Parse rule for assignment statements
assignstmt :
{int a; }
a=NAME ASSIGN expression SEMICOLON
{System.out.println("\tmov ["+$a.text+"], eax");}
;
// Parse rule for expressions
expression :
{int a,b; }
a=term
{System.out.println("\tmov eax,"+$a.toString);}
|
a=term PLUS b=term
{System.out.println("\tmov eax,"+$a.toString);}
{System.out.println("\tadd eax,"+$b.toString);}
;
// Parse rule for terms
term returns [String toString] :
identifier {$toString = $identifier.toString;}
| integer {$toString = $integer.toString;}
;
// Parse rule for identifiers
identifier returns [String toString]: NAME {$toString = "["+$NAME.text+"]";} ;
// Parse rule for numbers
integer returns [String toString]: INTEGER {$toString = $INTEGER.text;} ;
// Reserved Keywords
////////////////////////////////
IF: 'if';
ENDIF: 'endif';
PRINT: 'print';
INT: 'int';
// Operators
PLUS: '+';
EQUAL: '==';
ASSIGN: '=';
NOTEQUAL: '!=';
// Semicolon and parentheses
SEMICOLON: ';';
LPAREN: '(';
RPAREN: ')';
// Integers
INTEGER: [0-9][0-9]*;
// Variable names
NAME: [a-z]+;
// Ignore all white spaces
WS: [ \t\r\n]+ -> skip ;
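The least obvious rule above is probably ifstmt: it emits a comparison, a conditional jump that skips the body when the operands differ, and a label whose name is made unique by the index of the IF token in the token stream. Below is a minimal plain-Java sketch of that translation scheme, outside of ANTLR. The class IfTranslationSketch and the method emitIf are hypothetical names used only for illustration, and the tokenIndex parameter stands in for $IF.index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the grammar: it mirrors what the ifstmt
// actions print, but collects the lines in a list instead of printing them.
class IfTranslationSketch {

    // lhs is a memory operand such as "[b]", rhs an integer literal such as "4".
    // tokenIndex stands in for $IF.index (assumed unique per if statement).
    static List<String> emitIf(String lhs, String rhs, int tokenIndex, List<String> body) {
        List<String> out = new ArrayList<>();
        String label = "label_" + tokenIndex;
        out.add("cmp dword " + lhs + "," + rhs); // sets the zero flag when both sides are equal
        out.add("jnz " + label);                 // sides differ: jump past the body
        out.addAll(body);                        // code generated for the enclosed statements
        out.add(label + ":");                    // jump target placed right after the body
        return out;
    }

    public static void main(String[] args) {
        // Body of "if (b == 4) a = 2; endif", as the assignstmt rule would emit it.
        List<String> body = Arrays.asList("mov eax,2", "mov [a], eax");
        for (String line : emitIf("[b]", "4", 16, body)) {
            System.out.println(line);
        }
    }
}
```

Because every if statement starts at a different token position, two if statements never share a label, which is all the scheme needs.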
Let us try to run ANTLR on a source file named sourcecode.txt with content:
int a;
int b;
a = 3;
b = a + 1;
if (b == 4) a = 2; endif
if (a == 3)
a = a + 1;
b = b + 6;
endif
print a;
print b;
Let us now run ANTLR and our parser:
% java -jar ~/ANTLR/antlr-4.4-complete.jar MyLanguageV0Code.g4
% javac -cp .:antlr-4.4-complete.jar MyLanguageV0Code*.java
% java -cp .:antlr-4.4-complete.jar org.antlr.v4.runtime.misc.TestRig MyLanguageV0Code program sourcecode.txt
The output is:
%include "asm_io.inc"
segment .data
a dd 0
b dd 0
segment .text
global asm_main
asm_main:
enter 0,0
pusha
mov eax,3
mov [a], eax
mov eax,[a]
add eax,1
mov [b], eax
cmp dword [b],4
jnz label_16
mov eax,2
mov [a], eax
label_16:
cmp dword [a],3
jnz label_27
mov eax,[a]
add eax,1
mov [a], eax
mov eax,[b]
add eax,6
mov [b], eax
label_27:
mov eax, [a]
call print_int
call print_nl
mov eax, [b]
call print_int
call print_nl
popa
mov eax,0
leave
ret
You can now run this code using NASM, and check that it works! Our compiler has MANY weaknesses and is a far cry from a production compiler. For instance:
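It doesn’t catch undeclared variables (the error only surfaces later, when the generated assembly is assembled)
It makes very poor use of the register set (every computation goes through eax)
It leads to assembly code with wasteful instructions (for example, storing eax into [a] and then immediately reloading it from [a])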
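To make the last point concrete: the output above contains mov [a], eax immediately followed by mov eax,[a], and the second instruction is redundant because eax already holds the value just stored. Here is a minimal sketch of a pass that removes such reloads from a list of generated lines. The class PeepholeSketch and the method dropRedundantLoads are hypothetical names, and the pass assumes the exact instruction shapes our compiler emits. It is a toy, not a real optimizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical post-processing pass over the generated assembly lines.
class PeepholeSketch {

    // Drops "mov eax,[x]" when the line right before it is "mov [x], eax".
    // Any other line in between (a label, for instance) disables the removal,
    // since control could then reach the load with a different value in eax.
    static List<String> dropRedundantLoads(List<String> asm) {
        List<String> out = new ArrayList<>();
        String prev = null; // previous kept line, with all whitespace removed
        for (String line : asm) {
            String cur = line.replaceAll("\\s+", "");
            if (prev != null && prev.startsWith("mov[") && prev.endsWith("],eax")) {
                // "mov[" is 4 characters long and "],eax" is 5 characters long
                String var = prev.substring(4, prev.length() - 5);
                if (cur.equals("moveax,[" + var + "]")) {
                    continue; // redundant reload: skip it, keep prev unchanged
                }
            }
            out.add(line);
            prev = cur;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> asm = Arrays.asList(
            "mov [a], eax", "mov eax,[a]", "add eax,1", "mov [b], eax");
        for (String line : dropRedundantLoads(asm)) {
            System.out.println(line);
        }
    }
}
```

On the four lines used in main, which come from the translation of a = 3; b = a + 1; the pass keeps the store to [a], the add, and the store to [b], and drops the reload.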
Addressing the above is very interesting, but would unfortunately take us well beyond the scope of this course. Instead, in a programming assignment you will add features to the language, and thus extend the compiler that recognizes it.
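Still, to give a flavor of what catching undeclared variables might involve, here is a minimal sketch of a symbol table in plain Java. The class SymbolTableSketch and its methods declare, use, and errors are hypothetical. One could imagine keeping such an object in the parser (for example through ANTLR's @members section), calling declare from the action of the declaration rule and use wherever a NAME is read or assigned, but that wiring is an assumption and is not shown here.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical symbol table: remembers declared names and records misuse.
class SymbolTableSketch {
    private final Set<String> declared = new HashSet<>();
    private final List<String> errors = new ArrayList<>();

    // Would be called once per "int name;" declaration.
    void declare(String name) {
        if (!declared.add(name)) {
            errors.add("variable '" + name + "' declared twice");
        }
    }

    // Would be called each time a variable name appears in a statement.
    void use(String name) {
        if (!declared.contains(name)) {
            errors.add("variable '" + name + "' used but not declared");
        }
    }

    List<String> errors() {
        return errors;
    }

    public static void main(String[] args) {
        SymbolTableSketch table = new SymbolTableSketch();
        table.declare("a");
        table.declare("b");
        table.use("a"); // fine: a was declared
        table.use("c"); // recorded as an error: c was never declared
        for (String error : table.errors()) {
            System.out.println(error);
        }
    }
}
```

A compiler using this idea would report the collected errors and refuse to emit assembly, rather than leaving the problem for the assembler to find.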