Feature image

The Making Of Sarcasm (2) - AST, Generators and Fun with Visualization

In part 1 we discussed the design goals of Sarcasm and devised a grammar specification that covers most of Irony’s features.

Continuing from commit 15c9e6, in which I implemented the grammar specs by hand with Irony, we will discuss the following topics:

  • Construction of abstract syntax tree
  • Generator workflow
  • MarkDown generator
  • Having some fun with visualization

AST Overview

After grammar class is implemented, the first thing you are going to do is to create a parser instance:

1
2
3

var language = new LanguageData(new SarcasmGrammar());
var parser = new Irony.Parsing.Parser(_language);

With the Parser instance, we can parse source code by simply:

1
2
3
var parseTree = parser.Parse(sourceCode);
var parseRoot = parseTree.Root;
var astRoot = ParseRoot.AstNode;

If something is wrong with the grammar or source code, parseRoot and astRoot will be null. For now I will not go into error handling.

Two kinds of trees are generated when Irony parses the source code: parsing tree and optional abstract syntax tree. To create an AST, you must do the following:

1. Set language flag in grammar class’s constructor

1
LanguageFlags |= LanguageFlags.CreateAst;

2. Create a bunch of AST node class deriving from Irony.Interpreter.Ast (also remember to add reference to assembly Irony.Interpreter.dll).

1
2
3
4
5
6
// Make your own base class will make life eaiser
public abstract class SarcasmNode : AstNode {/*...*/}
// For other nodes
public class Document : SarcasmNode {/*...*/}
public class IdNode : SarcasmNode {/*...*/}
/*...*/

3. Assign AST node class to each Terminal/NonTerminal instances.

1
2
3
4
5
// For terminals
var ID = new IdentifierTerminal("ID");
ID.AstConfig.NodeType = typeof (IdNode);
// For non-terminals
var Directive = new NonTerminal("Directive", typeof(DirectiveNode));

4. Override AST node’s Init method to handle initialization.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
// In AST node class
public AstNode ChildNode;

public override void Init(AstContext context, ParseTreeNode treeNode)
{

// Keep this
base.Init(context, treeNode);
// treeNode is the corresponding node in parse tree, contains information like:
// Token
var token = treeNode.Token;
// Term
var term = treeNode.Term;
// Child nodes
var nodes = treeNode.GetMappedChildNodes();

// Set AsString to a human readable format. It will be used to display AST in Irony.GrammarExplorer
AsString = "Id: " + token.Text;
// Use AddChild to build tree structure, it returns an AstNode instance of child's AST node
ChildNode = AddChild(string.Empty, nodes[0]);
}

That’s almost all you need to know about how to construct an AST. However, if you mess it up, things can get ugly since the debug information is extremely not helpful. The most common exception on can get is:

System.NullReferenceException: Object reference not set to an instance of an object.
at Irony.Ast.AstBuilder.BuildAst(ParseTreeNode parseNode) in f:\Dev\Tool\Irony_2013_12_12\Irony\Ast\AstBuilder.cs:line 97
This will not help you at all. But I will tell you that this always has to do with forgetting to set AST node type to one of your Terminals/Non-Terminals.

Here are some tips I learned in the hard way (the only way mostly, since Irony’s documentation is poor):

  • Assign AST node type to all Terminals/NonTerminals, including any intermediate/temporary ones.
  • Except the ones marked as transient. They will not be created at all.
  • CommentTerminals will NOT be mapped to AST at all. You will get the above error regardless the AST node type is set or not.

Generator Workflow

AST marks the watershed between compiler’s front end and back end. Although there’re still some work (e.g. type validation and semantic analysis) left to be done, we can already generate something with this AST now. The most commonly used method here is the visitor pattern:

1. Declare a interface for Visitor, one overload for each AST node

1
2
3
4
5
6
7
public interface ISarcasmVisitor
{
void Visit(IdNode node);
void Visit(Document node);
void Visit(StringValueNode node);
// others
}

2. Add virtual/abstract method to AST base class and implement in all derived class

1
2
3
4
5
6
7
8
9
10
11
12
public abstract class SarcasmNode : AstNode
{
public abstract void Accept(ISarcasmVisitor visitor);
}

public class Document : SarcasmNode
{
public override void Accept(ISarcasmVisitor visitor)
{

visitor.Visit(this);
}
}

3. Then we can create a generator by implement specific ISarcasmVisitor for different workflow, not only for target code generation but also outlining, semantic analysis.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
public abstract class TargetGenerator
{
public SarcasmParser Parser { get; set; }

protected TargetGenerator(SarcasmParser parser)
{

Parser = parser;
}

protected virtual void BeforeVisitor(StreamWriter writer, ISarcasmVisitor visitor) { }
protected abstract ISarcasmVisitor MakeVisitor(StreamWriter writer);
protected virtual void AfterVisitor(StreamWriter writer, ISarcasmVisitor visitor) { }
public bool Generate(StreamReader sourceStream, StreamWriter targetStream)
{

// Parse first
Parser.Parse(sourceStream);
if (Parser.IsValid)
{
var visitor = MakeVisitor(targetStream);
BeforeVisitor(targetStream, visitor);
// Visit AST
Parser.Document.Accept(visitor);
AfterVisitor(targetStream, visitor);
}
targetStream.Flush();
return Parser.IsValid;
}
}

MarkDown Generation

MarkDown generation for Sarcasm is very straight forward, since the syntax is in MarkDown. All I need to do is remove comment delimiters, add correct amount of line endings, format grammar rule into block and escape some special characters.

Again I won’t bother with the details here. Just see the code for yourself.

Something Fun with Visualization

The original plan was to start generate C# parser class from here. Then I found an interesting project arbor.js (especially its halfviz demo) and decided to do something fun with it. The idea is to make a better tool for debug. What debug information is better than a visualized one?

The halfviz demo converts a simple language called HalfTone to a node network. With the generator framework in place, it took me less than half an hour to generate node representation from Sarcasm grammar source file. This can be used to visualize references between terminals and non-terminals:

sarcasm-vis

You can play with it live here. It looks more confusing in this form, for now. But with some interaction (filtering, folding, highlighting for example), it can help develops quickly navigate though the grammar.
Here’s another concept of how to visualize grammar related errors in this form (click to enlarge):
sarcasm-concept

Imagine view build errors in Visual Studio with this graph and navigate to the line that is responsible by click on the node. I definitely will try to create something like that later when I begin to make tool chain for Sarcasm.