In part 1 we discussed the design goals of Sarcasm and devised a grammar specification that covers most of Irony’s features.
Continuing from commit 15c9e6, in which I implemented the grammar specs by hand with Irony, we will discuss the following topics:
- Construction of abstract syntax tree
- Generator workflow
- MarkDown generator
- Having some fun with visualization
AST Overview
After grammar class is implemented, the first thing you are going to do is to create a parser instance:
1 |
|
With the Parser instance, we can parse source code by simply:
1 | var parseTree = parser.Parse(sourceCode); |
If something is wrong with the grammar or source code, parseRoot and astRoot will be null. For now I will not go into error handling.
Two kinds of trees are generated when Irony parses the source code: parsing tree and optional abstract syntax tree. To create an AST, you must do the following:
1. Set language flag in grammar class’s constructor
1 | LanguageFlags |= LanguageFlags.CreateAst; |
2. Create a bunch of AST node class deriving from Irony.Interpreter.Ast (also remember to add reference to assembly Irony.Interpreter.dll).
1 | // Make your own base class will make life eaiser |
3. Assign AST node class to each Terminal/NonTerminal instances.
1 | // For terminals |
4. Override AST node’s Init method to handle initialization.
1 | // In AST node class |
That’s almost all you need to know about how to construct an AST. However, if you mess it up, things can get ugly since the debug information is extremely not helpful. The most common exception on can get is:
System.NullReferenceException: Object reference not set to an instance of an object.
at Irony.Ast.AstBuilder.BuildAst(ParseTreeNode parseNode) in f:\Dev\Tool\Irony_2013_12_12\Irony\Ast\AstBuilder.cs:line 97
This will not help you at all. But I will tell you that this always has to do with forgetting to set AST node type to one of your Terminals/Non-Terminals.
Here are some tips I learned in the hard way (the only way mostly, since Irony’s documentation is poor):
- Assign AST node type to all Terminals/NonTerminals, including any intermediate/temporary ones.
- Except the ones marked as transient. They will not be created at all.
- CommentTerminals will NOT be mapped to AST at all. You will get the above error regardless the AST node type is set or not.
Generator Workflow
AST marks the watershed between compiler’s front end and back end. Although there’re still some work (e.g. type validation and semantic analysis) left to be done, we can already generate something with this AST now. The most commonly used method here is the visitor pattern:
1. Declare a interface for Visitor, one overload for each AST node
1 | public interface ISarcasmVisitor |
2. Add virtual/abstract method to AST base class and implement in all derived class
1 | public abstract class SarcasmNode : AstNode |
3. Then we can create a generator by implement specific ISarcasmVisitor for different workflow, not only for target code generation but also outlining, semantic analysis.
1 | public abstract class TargetGenerator |
MarkDown Generation
MarkDown generation for Sarcasm is very straight forward, since the syntax is in MarkDown. All I need to do is remove comment delimiters, add correct amount of line endings, format grammar rule into block and escape some special characters.
Again I won’t bother with the details here. Just see the code for yourself.
Something Fun with Visualization
The original plan was to start generate C# parser class from here. Then I found an interesting project arbor.js (especially its halfviz demo) and decided to do something fun with it. The idea is to make a better tool for debug. What debug information is better than a visualized one?
The halfviz demo converts a simple language called HalfTone to a node network. With the generator framework in place, it took me less than half an hour to generate node representation from Sarcasm grammar source file. This can be used to visualize references between terminals and non-terminals:
You can play with it live here. It looks more confusing in this form, for now. But with some interaction (filtering, folding, highlighting for example), it can help develops quickly navigate though the grammar.
Here’s another concept of how to visualize grammar related errors in this form (click to enlarge):
Imagine view build errors in Visual Studio with this graph and navigate to the line that is responsible by click on the node. I definitely will try to create something like that later when I begin to make tool chain for Sarcasm.