In this series of projects you will write a compiler for a small subset of Pascal. In this assignment, you will start writing the syntax analysis and code generation components of the compiler. Specifically, you will write the parts of the compiler needed to handle program statements, global variables, assignment, expressions, and writeln. Your compiler should generate MIPS assembly code that runs on SPIM.
Step 1. Lexical functions
The input to this stage of your compiler will be the String postlex that your lexical analyzer generated. A pointer variable will point to the current position (initially zero) in postlex. You will need to write the following methods to obtain tokens for your syntactic analyzer:
- void moveToNext(): moves the pointer to the beginning of the next token
- String getReservedWord(): returns the reserved word at the current postlex position. If the current token is not a reserved word, it returns "".
- String getSymbol(): returns the symbol at the current postlex position. If the current token is not a symbol, it returns "".
- int getNumber(): returns the integer value at the current postlex position. If the current token is not a value, it returns -1.
- int getIdentifier(): returns the integer index of the identifier at the current postlex position. If the current token is not an identifier, it returns -1.
- void syntaxerror(String): prints out a syntax error, and terminates the compiler.
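The methods above can be sketched roughly as follows. This is only an illustration: the token encoding used here (tokens separated by single spaces and tagged "R:" for reserved words, "S:" for symbols, "N:" for numbers, "I:" for identifier indices) and the class name TokenReader are assumptions made for the example, so adapt the scanning logic to whatever format your own lexical analyzer writes into postlex.

```java
// Sketch only: the "R:/S:/N:/I:" token encoding is an assumption for
// illustration, not necessarily the format your lexical analyzer produces.
class TokenReader {
    private final String postlex;
    private int pos = 0; // index of the first character of the current token

    TokenReader(String postlex) {
        this.postlex = postlex;
    }

    // Text of the current token, or "" once the end of postlex is reached.
    private String current() {
        if (pos >= postlex.length()) return "";
        int end = postlex.indexOf(' ', pos);
        return end < 0 ? postlex.substring(pos) : postlex.substring(pos, end);
    }

    // Part of the current token after the tag, or "" if the tag does not match.
    private String payload(String tag) {
        String token = current();
        return token.startsWith(tag) ? token.substring(tag.length()) : "";
    }

    void moveToNext() {
        int end = postlex.indexOf(' ', pos);
        pos = end < 0 ? postlex.length() : end + 1;
    }

    String getReservedWord() { return payload("R:"); }

    String getSymbol() { return payload("S:"); }

    int getNumber() {
        String digits = payload("N:");
        return digits.isEmpty() ? -1 : Integer.parseInt(digits);
    }

    int getIdentifier() {
        String index = payload("I:");
        return index.isEmpty() ? -1 : Integer.parseInt(index);
    }

    void syntaxerror(String message) {
        System.err.println("Syntax error: " + message);
        System.exit(1); // terminate the compiler
    }
}
```

Each getter only inspects the current token and never advances the pointer, so the parser can try several getters on the same token before calling moveToNext().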
You should also create an output String outputcode, originally "". If compiling is successful, you should print out outputcode at the end and write it to a text file. Your output text file should have the same name as your input .PAS file, but have the extension .ASM.
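One possible way to derive the output file name and write outputcode is sketched below. The class and method names (OutputWriter, asmFileName, writeOutput) are hypothetical, and the choice to append .ASM when the input name has no .PAS extension is an assumption the assignment does not specify.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Locale;

// Sketch only: names and the fallback behavior are illustrative assumptions.
class OutputWriter {
    // Replaces a trailing .PAS (in any letter case) with .ASM.
    // If there is no .PAS extension, .ASM is simply appended.
    static String asmFileName(String pasFileName) {
        if (pasFileName.toUpperCase(Locale.ROOT).endsWith(".PAS")) {
            return pasFileName.substring(0, pasFileName.length() - 4) + ".ASM";
        }
        return pasFileName + ".ASM";
    }

    // Prints the generated assembly and writes it next to the input file.
    static void writeOutput(String pasFileName, String outputcode) throws IOException {
        System.out.print(outputcode);
        Files.write(Paths.get(asmFileName(pasFileName)),
                outputcode.getBytes(StandardCharsets.UTF_8));
    }
}
```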
Step 2. Parsing PROGRAM, BEGIN, END.
The next step is to write a recursive descent parser to parse a basic miniPascal program consisting of nothing other than the following:
Your recursive descent parser should follow your BNF. Most likely, it will consist of a single method void parseProgram() which will look for PROGRAM, identifier, semicolon, BEGIN, END. in turn. If it does not find them, it will call syntaxerror.
Upon finding each component, parseProgram should write the appropriate assembly code to outputcode, with each directive or instruction on its own line. After PROGRAM, write the beginning of the MIPS file (".data"); after BEGIN, the beginning of the code (".text", ".globl main", and "main:"); and after END., the end of the code ("li $v0, 10" followed by "syscall").
Once you complete this step, you should verify your compiler by compiling the above program and running your output on SPIM. If your compiler output loads correctly into SPIM and runs, you are ready to proceed.
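A minimal sketch of parseProgram for Step 2 is shown below. To keep it self-contained it does not use the Step 1 methods directly: the pre-split token array, the "#<index>" notation for identifiers, the treatment of END. as the reserved word END followed by the symbol ".", and the expect helper are all assumptions made for this example. The syntaxerror here throws an exception so the sketch can be tested; in your compiler it would print the message and terminate.

```java
// Sketch only: the token array, "#<index>" identifiers, and expect() helper
// are hypothetical stand-ins for the Step 1 lexical functions.
class ProgramParser {
    private final String[] tokens;
    private int pos = 0;
    String outputcode = "";

    ProgramParser(String... tokens) {
        this.tokens = tokens;
    }

    private String current() {
        return pos < tokens.length ? tokens[pos] : "";
    }

    // Consumes the expected token or reports a syntax error.
    private void expect(String token, String message) {
        if (!current().equals(token)) syntaxerror(message);
        pos++;
    }

    private void syntaxerror(String message) {
        throw new IllegalStateException("Syntax error: " + message);
    }

    void parseProgram() {
        expect("PROGRAM", "PROGRAM expected");
        outputcode += ".data\n";                    // start of the MIPS file
        if (!current().startsWith("#")) syntaxerror("program name expected");
        pos++;
        expect(";", "; expected");
        expect("BEGIN", "BEGIN expected");
        outputcode += ".text\n.globl main\nmain:\n"; // start of the code
        expect("END", "no END. found");
        expect(".", "no END. found");
        outputcode += "li $v0, 10\nsyscall\n";       // exit syscall
    }
}
```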
Step 3. Parsing VAR and variables
You should now add variables. Write the part of the parser to handle VAR and global variable definitions. For each integer and boolean variable, assign it a name such as "V5" based on the index of its identifier. In your output code, every variable should be defined as a ".word 0" in the ".data" segment. You can write this output code as soon as you parse the variable's identifier (since both BOOLEANs and INTEGERs are defined as .word).
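The naming and declaration rule can be sketched as a pair of small helpers. The class and method names (VarEmitter, variableLabel, declareVariable) are hypothetical; only the "V" plus index naming and the ".word 0" definition come from the description above.

```java
// Sketch only: helper names are illustrative assumptions.
class VarEmitter {
    // Label for a variable, built from the index of its identifier.
    static String variableLabel(int identifierIndex) {
        return "V" + identifierIndex;
    }

    // One .data line for a variable; INTEGER and BOOLEAN are both a .word.
    static String declareVariable(int identifierIndex) {
        return variableLabel(identifierIndex) + ": .word 0\n";
    }

    // Declarations for a list of identifiers, in the order they were parsed.
    static String declareAll(int... identifierIndices) {
        StringBuilder lines = new StringBuilder();
        for (int index : identifierIndices) {
            lines.append(declareVariable(index));
        }
        return lines.toString();
    }
}
```

Because the type does not affect the generated line, declareVariable can be called as soon as getIdentifier() returns an index, before the colon and type name are parsed.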
When you complete Step 3, you should be able to parse:
APPLE, PEAR: INTEGER;
Step 4. Parsing assignment statements: identifier := expression;
Now you should parse expressions using recursive descent parsing. Your parser should parse a series of assignment statements such as APPLE := PEAR + 4;
The class parser example should be helpful for structuring your code.
The assembly code generation works as follows. When a terminal (either an identifier or a value) is reached, you should write assembly code to load that variable or value into register $v0 (for example, "lw $v0, V5" or "li $v0, 4"). To perform an addition, back up $v0 to $s0 ("move $s0, $v0"), parse the second term, and add the two, putting the result in $v0 ("add $v0, $s0, $v0"). At the end of any expression parsing, the result of the expression should be in register $v0. Assignment then simply becomes storing $v0 to a variable (for example, "sw $v0, V5").
The tricky part of this is handling the stack. More than one of your parsing functions will back up a value in $s0. The result is that $s0 might get overwritten when one parsing function calls another. Consequently, the first thing you should do in every parsing function that makes up your expression parsing is to back up $s0 on the stack. You can do this by writing the two instructions "sub $sp, $sp, 4" and "sw $s0, ($sp)" to outputcode. The last thing you should do in every parsing function, prior to returning, is to restore $s0. You can do this by writing "lw $s0, ($sp)" and "add $sp, $sp, 4" to outputcode.
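The scheme described above can be sketched for assignments whose expressions contain only terminals and "+". As before, the pre-split token array and the "#<index>" notation for identifiers are assumptions made to keep the example self-contained, and the class and helper names are hypothetical; a real parser would call the Step 1 methods and follow your own BNF.

```java
// Sketch only: token array and "#<index>" identifiers are hypothetical
// stand-ins for the Step 1 lexical functions.
class ExpressionEmitter {
    private final String[] tokens;
    private int pos = 0;
    String outputcode = "";

    ExpressionEmitter(String... tokens) {
        this.tokens = tokens;
    }

    private String current() {
        return pos < tokens.length ? tokens[pos] : "";
    }

    private void emit(String line) {
        outputcode += line + "\n";
    }

    private void expect(String token) {
        if (!current().equals(token)) {
            throw new IllegalStateException("Syntax error: " + token + " expected");
        }
        pos++;
    }

    // Terminal: loads an identifier or an integer value into $v0.
    void parseTerminal() {
        String t = current();
        if (t.startsWith("#")) {
            emit("lw $v0, V" + t.substring(1));
        } else if (t.matches("\\d+")) {
            emit("li $v0, " + t);
        } else {
            throw new IllegalStateException("Syntax error: identifier or value expected");
        }
        pos++;
    }

    // expression: terminal { "+" terminal }, result left in $v0.
    void parseExpression() {
        emit("sub $sp, $sp, 4");   // back up $s0 on the stack
        emit("sw $s0, ($sp)");
        parseTerminal();
        while (current().equals("+")) {
            pos++;
            emit("move $s0, $v0"); // keep the left operand in $s0
            parseTerminal();
            emit("add $v0, $s0, $v0");
        }
        emit("lw $s0, ($sp)");     // restore $s0 before returning
        emit("add $sp, $sp, 4");
    }

    // assignment: identifier ":=" expression ";"
    void parseAssignment() {
        String target = current();
        if (!target.startsWith("#")) {
            throw new IllegalStateException("Syntax error: identifier expected");
        }
        pos++;
        expect(":=");
        parseExpression();
        expect(";");
        emit("sw $v0, V" + target.substring(1));
    }
}
```

Note that this sketch saves and restores $s0 in every call to parseExpression, even when the expression is a single terminal and $s0 is never used.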
When you complete Step 4, you should be able to parse:
APPLE, PEAR: INTEGER;
APPLE := 15;
Your assembly should look something like:
V1: .word 0
V2: .word 0
li $v0, 15
sw $v0, V1
sub $sp, $sp, 4
sw $s0, ($sp)
lw $v0, V1
move $s0, $v0
li $v0, 3
add $v0, $s0, $v0
lw $s0, ($sp)
add $sp, $sp, 4
sw $v0, V2
For booleans, you should treat TRUE as having a value of 1 and FALSE as having a value of 0. When dealing with comparison operators such as =, <, >, and <>, write code that leaves 1 in $v0 if the comparison holds and 0 otherwise.
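One way to generate the 1-or-0 result is with SPIM's set-on-condition instructions (slt, plus the pseudo-instructions seq, sne, and sgt). The sketch below assumes the same register convention as addition, with the left operand backed up in $s0 and the right operand in $v0; the class and method names are hypothetical.

```java
// Sketch only: assumes left operand in $s0 and right operand in $v0,
// mirroring the addition scheme; names are illustrative.
class ComparisonEmitter {
    // Instruction that loads TRUE (1) or FALSE (0) into $v0.
    static String booleanLiteral(boolean value) {
        return "li $v0, " + (value ? 1 : 0);
    }

    // Instruction that sets $v0 to 1 if "$s0 op $v0" holds, else 0.
    static String comparison(String op) {
        switch (op) {
            case "=":  return "seq $v0, $s0, $v0";
            case "<>": return "sne $v0, $s0, $v0";
            case "<":  return "slt $v0, $s0, $v0";
            case ">":  return "sgt $v0, $s0, $v0";
            default:
                throw new IllegalArgumentException("unknown comparison operator: " + op);
        }
    }
}
```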
Step 5. Parsing WRITELN()
You should now implement WRITELN(expression) statements that will write assembly code to print out the result of an expression, followed by a newline.
First, immediately after you output ".data" in parseProgram, add the line "CRLF: .byte 0xd,0xa,0x0" to outputcode. You will use this for your newline.
Your WRITELN code should make a call to parse an expression. It should then write the appropriate syscall assembly code to outputcode to print the result of the expression. Recall that to output a number in SPIM the number should be in $a0 and you use syscall 1. So you should output "move $a0,$v0 li $v0,1 syscall". Next you should print out the newline, so you should output "la $a0, CRLF li $v0,4 syscall".
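The text that WRITELN adds to outputcode can be sketched as below, with one instruction per line. The class and method names are hypothetical; the instructions themselves are the ones listed above, and printResult assumes the expression has already been parsed so that its value is in $v0.

```java
// Sketch only: helper names are illustrative assumptions.
class WritelnEmitter {
    // Start of the data segment, including the newline string used by WRITELN.
    static String dataHeader() {
        return ".data\n"
             + "CRLF: .byte 0xd,0xa,0x0\n";
    }

    // Code emitted after the WRITELN expression has left its value in $v0.
    static String printResult() {
        return "move $a0,$v0\n"   // value to print
             + "li $v0,1\n"       // syscall 1: print integer
             + "syscall\n"
             + "la $a0, CRLF\n"   // address of the newline string
             + "li $v0,4\n"       // syscall 4: print string
             + "syscall\n";
    }
}
```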
If the parser detects a syntax error, it should call the syntaxerror method with a meaningful message, such as "; expected" or "no END. found".