In this phase, you'll build upon your custom lexical analyzer to create a parser that constructs an Abstract Syntax Tree (AST) from the token stream. The parser validates the syntactic structure of the input according to the language grammar and creates a tree representation suitable for further processing.
Parsing is the process of converting a flat sequence of tokens into a hierarchical tree structure that represents the syntactic organization of the program.
An Abstract Syntax Tree is a tree representation that:
- Captures the essential syntactic elements
- Removes unnecessary syntactic details
- Provides a clean, hierarchical view of the code
phase2/
├── include/
│ ├── tokens.h # Token definitions from Phase 1
│ ├── lexer.h # Lexer interface
│ └── parser.h # Parser definitions
├── src/
│ ├── lexer/
│ │ └── lexer.c # Lexer implementation from Phase 1
│ └── parser/
│ └── parser.c # Parser implementation
└── test/
├── input_valid.txt
└── input_invalid.txt
The starter code provides:
- Basic AST node structure
- Simple statement parsing (declarations and assignments)
- Error reporting framework
- AST printing utilities
Example of currently supported syntax:
int x; // Variable declaration
x = 42; // Assignment statement- Review your lexer's token generation
- Print out tokens to understand the input
- Verify token sequence before parsing
// Example of examining tokens
void print_token_stream(const char* input) {
int position = 0;
Token token;
do {
token = get_next_token(input, &position);
print_token(token);
} while (token.type != TOKEN_EOF);
}- Create a parser context
- Set up initial parsing state
- Prepare for token consumption
// Basic parser initialization
void parser_init(const char *input) {
source = input;
position = 0;
advance(); // Get first token
}// Check if current token matches expected type
static int match(TokenType type) {
return current_token.type == type;
}
// Consume current token and move to next
static void advance(void) {
current_token = get_next_token(source, &position);
}
// Expect specific token or report error
static void expect(TokenType type) {
if (match(type)) {
advance();
} else {
parse_error(PARSE_ERROR_UNEXPECTED_TOKEN, current_token);
}
}- Implement operator precedence
- Add support for:
- Binary operations (+, -, *, /)
- Comparison operators (<, >, ==, !=)
- Parenthesized expressions
- If statements:
if (condition) { statements } - While loops:
while (condition) { statements } - Repeat-Until:
repeat { statements } until (condition) - Print statements:
print expression; - Block statements:
{ statement1; statement2; }
- Factorial function support
- Block scoping
- Extend error types
- Improve error messages
- Add error recovery
- Track line and column information
// Create AST node with basic information
static ASTNode *create_node(ASTNodeType type) {
ASTNode *node = malloc(sizeof(ASTNode));
if (node) {
node->type = type;
node->token = current_token;
node->left = NULL;
node->right = NULL;
}
return node;
}// Parse variable declaration
static ASTNode *parse_declaration(void) {
ASTNode *node = create_node(AST_VARDECL);
advance(); // consume 'int'
if (!match(TOKEN_IDENTIFIER)) {
parse_error(PARSE_ERROR_MISSING_IDENTIFIER, current_token);
return NULL;
}
node->token = current_token;
advance();
expect(TOKEN_SEMICOLON);
return node;
}
// Parse assignment statement
static ASTNode *parse_assignment(void) {
ASTNode *node = create_node(AST_ASSIGN);
node->left = create_node(AST_IDENTIFIER);
node->left->token = current_token;
advance();
expect(TOKEN_EQUALS);
node->right = parse_expression();
expect(TOKEN_SEMICOLON);
return node;
}// Valid syntax
int x;
x = 42;
if (x > 0) {
print x;
}
// Error cases
if (x > ) { // Invalid expression
while x > 0 { // Missing parentheses
int ; // Missing identifiervoid print_ast(ASTNode *node, int level) {
if (!node) return;
// Indent and print node details
for (int i = 0; i < level; i++) printf(" ");
switch (node->type) {
case AST_PROGRAM:
printf("Program\n");
break;
case AST_VARDECL:
printf("VarDecl: %s\n", node->token.lexeme);
break;
// Add more node type printings
}
// Recursively print children
print_ast(node->left, level + 1);
print_ast(node->right, level + 1);
}- Start with the most straightforward possible implementation
- Add complexity incrementally
- Test each feature thoroughly
- Focus on error handling
- Document your changes
- Update AST printing for debugging
- Handling complex expressions
- Implementing error recovery
- Managing memory for AST nodes
- Supporting various language constructs
- Complete implementation of your custom parser.c
- Test files demonstrating your features
- Documentation of your:
- Grammar rules
- Error handling strategy
- AST structure
- Test cases
- Parse simple declarations
- Handle basic assignments
- Implement expression parsing
- Add error handling
- Support more complex statements
- Optimize AST generation
- Compiler design textbooks
- Online parsing tutorials
- Lecture notes on syntax analysis
Good luck! 🚀