Tuesday, June 3, 2008

Using .NET as a target plaform in staged metaprogramming

Abstract
This article is a brief proof-of-concept which illustrates how to use .NET as a target platform in staged metaprogramming. Arithmetic expression DSL defined in previous article is used as an example. Process of building of simple compiler of our DSL is presented.

Definition of problem
Our arithmetical expression DSL allows to specify expressions like this:
(expression (sub (add 2 (mul 3 4)) 5))
Let's make compiler of our DSL to the CLR (runtime of the .NET platform).

Quick introduction to the CLR
AFAIK CLR core language is the Intermediate Language (IL). It can be treated as a stack-based assembler.
Here is a short introduction to the subset of IL we are interested in:

IL program text consists of instructions.
ldc.i8 value instruction is used to push 64-bit integer constant value to the stack.
call instruction is used to call functions: callee gets arguments from the stack and if need it lefts return value at the top at exit.
We can use add/sub/mul instructions to add, subtract and multiply values.
void[ mscorlib]System.Console::WriteLine(int64) function allows us to display result.
ret instruction is used to return from the function.

IL code can be compiled to an executable (called assembly) using ilasm.exe utility.

If you have experience with C# (or other .NET high-level language) then easiest way to get idea what IL looks like is to write some C# code and look what IL code is produced. This can be done using IL disassembler utility (ildasm.exe) or .NET Reflector.

Strategy of compiling
What is easiest way to deliver our expression DSL to CLR? In previous article we defined transformation of our DSL to the abstract stack machine AST. What we need is to transform it to the IL.

Let's do this in two steps: first step will be transformation of abstract stack machine AST to IL AST. Second step will be transformation of IL AST to IL text.
After that we could use ilasm.exe to generate executable file (.NET assembly) from IL text.

Of course one could implement all of this in single step. However different steps is better solution because it allows us to separate aspects of transforming and text-generation. These aspects could be implemented and tested separately. Moreover implementation of the second step has independent value: it can be reused in the future for the purposes of other tasks.

First step: transformation of stack machine AST to AST of IL:
We need transform from the abstract stack machine code like this:
(stack-machine-code
(push-integer 2)
(push-integer 3)
(push-integer 4)
(mul)
(add)
(push-integer 5)
(sub)
(print)
)
to the IL AST like this:
(assembly Example
(method 'main
(entrypoint)
(body
(il
(ldc.i8 2)
(ldc.i8 3)
(ldc.i8 4)
(mul)
(add)
(ldc.i8 5)
(sub)
(call "void [mscorlib]System.Console::WriteLine(int64)")
(ret)))))

Here we've specified that we need to create an executable assembly with method (function) main which is entry point and which contains specified IL code as a body. Note: this is not standard representation but just-invented AST of the IL which reflects our current thinking about a subset of the CLR platform.

We can peform this transformation using following code:
(define-ast-handler (stack-machine-to-il)
(define (stack-machine-code . instructions)
`(assembly Example
(method 'main
(entrypoint)
(body
(il ,@instructions (ret))))))
(define (push-integer value) `(ldc.i8 ,value))
(define (mul) '(mul))
(define (add) '(add))
(define (sub) '(sub))
(define (print) '(call "void [mscorlib]System.Console::WriteLine(int32)"))
)
Second step: generating IL text
Generating IL text is simply transforming fragments of IL AST to appropriate strings. With the input of the IL AST shown in above section we expect to got following text:
.assembly extern mscorlib {}
.assembly Example { .ver 1:0:0:0 }
.method static void main() {
.entrypoint
ldc.i8 2
ldc.i8 3
ldc.i8 4
mul
add
ldc.i8 5
sub
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
Here is a code that does all this stuff:
(require (lib "list.ss"))
(define (concat-list l) (foldr string-append "" l))

(define-ast-handler (to-il-text)
(define concat-list ,concat-list)
(define (assembly name . rest)
(format
".assembly extern mscorlib {}\n.assembly ~a { .ver 1:0:0:0 }\n~a"
name
(concat-list rest)))
(define (method name . rest)
(format ".method static void ~s() {\n~a\n}" name (concat-list rest)))
(define (body il) il)
(define (il . instructions) (concat-list instructions))
(define (ldc.i8 value) (format "ldc.i8 ~a\n" value))
(define (mul) (format "mul\n"))
(define (add) (format "add\n"))
(define (sub) (format "sub\n"))
(define (call name) (format "call ~a\n" name))
(define (ret) (format "ret\n"))
(define (entrypoint) ".entrypoint\n")
)
Compiling
Compiling is simply performed by invocation of ilasm.exe utility.
This can be done like this:
(define (compile il-text)
(if (file-exists? "test.il") (delete-file "test.il"))
(with-output-to-file example.il" (lambda () (write-string il-text)))
(system "ilasm.exe example.il"))
Now we have executable file example.exe.

Conclusion
We've just implemented proof-of-concept compiler of custom domain specific language to the .NET platform. Simple method of transformation of abstract syntax trees was used to map our language to IL. Speaking other words we've actively used staged metaprogramming.
Doing this in top-down way we haven't needed to know all details and all features of the target platform. For large set of real-word tasks we need just small subset of target platform.
Of course to build effective implementation we need deep understanding of target platform and underlying hardware. Top-down way can help us in this challenge:
  • It allows us to learn target platform in systematic way in the context of our task
  • When our knowledge about target platform is increased and we found new way to build more effective solution we can refactor or even completely rewrite low-level implementation without affecting the high level

No comments: