Outside the Box: June 2008

Sunday, June 22, 2008

Distributed systems design guidelines

Building high performance high available fault tolerant and scalable distributed system is a challenge problem.

Following are three primary design guidelines:

1. Avoid centralized components
2. Avoid centralized data
3. Avoid centralized processes

Sounds obviously, doesn't it?

Other design issues (from the "Distributed Systems: Principles and Paradigms" book):

1. No machine has complete information about the system state
2. Machines make decision based only on local information
3. Failure of one machines does not ruin the algorithm
4. There is no explicit assumption that a global clock exists (it's impossible to get all the clocks exactly synchronized)

Tuesday, June 3, 2008

Using .NET as a target plaform in staged metaprogramming

Abstract
This article is a brief proof-of-concept which illustrates how to use .NET as a target platform in staged metaprogramming. Arithmetic expression DSL defined in previous article is used as an example. Process of building of simple compiler of our DSL is presented.

Definition of problem
Our arithmetical expression DSL allows to specify expressions like this:

(expression (sub (add 2 (mul 3 4)) 5))

Let's make compiler of our DSL to the CLR (runtime of the .NET platform).

Quick introduction to the CLR
AFAIK CLR core language is the Intermediate Language (IL). It can be treated as a stack-based assembler.
Here is a short introduction to the subset of IL we are interested in:

IL program text consists of instructions.
ldc.i8 value instruction is used to push 64-bit integer constant value to the stack.
call instruction is used to call functions: callee gets arguments from the stack and if need it lefts return value at the top at exit.
We can use add/sub/mul instructions to add, subtract and multiply values.
void[ mscorlib]System.Console::WriteLine(int64) function allows us to display result.
ret instruction is used to return from the function.

IL code can be compiled to an executable (called assembly) using ilasm.exe utility.

If you have experience with C# (or other .NET high-level language) then easiest way to get idea what IL looks like is to write some C# code and look what IL code is produced. This can be done using IL disassembler utility (ildasm.exe) or .NET Reflector.

Strategy of compiling

What is easiest way to deliver our expression DSL to CLR? In previo us article we defined transformation of our DSL to the abstract stack machine AST. What we need is to transform it to the IL.

Let's do this in two steps: first step will be transformation of abstract stack machine AST to IL AST. Second step will be transformation of IL AST to IL text.
After that we could use ilasm.exe to generate executable file (.NET assembly) from IL text.

Of course one could implement all of this in single step. However different steps is better solution because it allows us to separate aspects of transforming and text-generation. These aspects could be implemented and tested separately. Moreover implementation of the second step has independent value: it can be reused in the future for the purposes of other tasks.

First step: transformation of stack machine AST to AST of IL:
We need transform from the abstract stack machine code like this:

(stack-machine-code
   (push-integer 2)
   (push-integer 3)
   (push-integer 4)
   (mul)
   (add)
   (push-integer 5)
   (sub)
   (print)
)

to the IL AST like this:

(assembly Example
   (method 'main
       (entrypoint)
       (body
            (il
               (ldc.i8 2)
               (ldc.i8 3)
               (ldc.i8 4)
               (mul)
               (add)
               (ldc.i8 5)
               (sub)
               (call "void [mscorlib]System.Console::WriteLine(int64)")
               (ret)))))

Here we've specified that we need to create an executable assembly with method (function) main which is entry point and which contains specified IL code as a body. Note: this is not standard representation but just-invented AST of the IL which reflects our current thinking about a subset of the CLR platform.

We can peform this transformation using following code:

(define-ast-handler (stack-machine-to-il)
    (define (stack-machine-code . instructions)
       `(assembly Example
            (method 'main
               (entrypoint)
               (body
                  (il ,@instructions (ret))))))
    (define (push-integer value) `(ldc.i8 ,value))
    (define (mul) '(mul))
    (define (add) '(add))
    (define (sub) '(sub))
    (define (print) '(call "void [mscorlib]System.Console::WriteLine(int32)"))
)

Second step: generating IL text
Generating IL text is simply transforming fragments of IL AST to appropriate strings. With the input of the IL AST shown in above section we expect to got following text:

.assembly extern mscorlib {}
.assembly Example { .ver 1:0:0:0 }
.method static void main() {
.entrypoint
ldc.i8 2
ldc.i8 3
ldc.i8 4
mul
add
ldc.i8 5
sub
call void [mscorlib]System.Console::WriteLine(int32)
ret
}

Here is a code that does all this stuff:

(require (lib "list.ss"))
(define (concat-list l) (foldr string-append "" l))

(define-ast-handler (to-il-text)
  (define concat-list ,concat-list)
  (define (assembly name . rest)
       (format
           ".assembly extern mscorlib {}\n.assembly ~a { .ver 1:0:0:0 }\n~a"
           name
          (concat-list rest)))
  (define (method  name . rest)
      (format ".method static void ~s() {\n~a\n}" name (concat-list rest)))
  (define (body il)  il)
  (define (il . instructions) (concat-list instructions))
  (define (ldc.i8 value) (format "ldc.i8 ~a\n" value))
  (define (mul) (format "mul\n"))
  (define (add) (format "add\n"))
  (define (sub) (format "sub\n"))
  (define (call name) (format "call ~a\n" name))
  (define (ret)  (format "ret\n"))
  (define (entrypoint) ".entrypoint\n")
)

Compiling
Compiling is simply performed by invocation of ilasm.exe utility.
This can be done like this:

(define (compile il-text)
    (if (file-exists? "test.il") (delete-file "test.il"))
    (with-output-to-file example.il" (lambda () (write-string il-text)))
    (system "ilasm.exe example.il"))

Now we have executable file example.exe.

Conclusion
We've just implemented proof-of-concept compiler of custom domain specific language to the .NET platform. Simple method of transformation of abstract syntax trees was used to map our language to IL. Speaking other words we've actively used staged metaprogramming.
Doing this in top-down way we haven't needed to know all details and all features of the target platform. For large set of real-word tasks we need just small subset of target platform.
Of course to build effective implementation we need deep understanding of target platform and underlying hardware. Top-down way can help us in this challenge:

It allows us to learn target platform in systematic way in the context of our task
When our knowledge about target platform is increased and we found new way to build more effective solution we can refactor or even completely rewrite low-level implementation without affecting the high level

Sunday, June 1, 2008

Simple way of AST transformation using Scheme

Abstract
This article demonstrates simple way of transforming abstract syntax trees represented in S-expressions using Scheme programming language [1, 2]. Arithmetical expression evaluation domain is used as an example.

Problem definition
Imagine that we have following problem:
We need to build simple arithmetical expression processing engine. Expressions in our domain consists of integer numbers and operations of addition, subtraction and multiplication. We required to represent expression in some unified (canonical) way. We need to allow direct evaluation to show value. We are are required to generate C code for evaluating and printing the expression. We are required to print formula in infix way. Additionally we required to generate code for some stack machine that allows operations of pushing integer numbers to the stack, adding subtracting and multiplying and printing pushed values. Moreover, we don't have specification of machine codes or assembler for this machine right now, but we are required to start solving the problem...

Representations of arithmetical expressions
There are different possible ways of representation of expressions. Let's take into attention classical infix notation of some expression is 2+3*4-5. To interpret this expression we need information about operation priority and order of evaluation. This expression could be represented by following equivalent abstract syntax tree:

    expression
     |
    sub
   /   \ 
  add   5
 /   \
2    mul
    /   \
   3     4

Where add/sub/mul is just selected shorthands for addition/substration/multiplication. To understand this tree we don't need information about order/priorities. This tree could be represented in alternative sequential form using S-expression:

(expression (sub (add 2 (mul 3 4)) 5))

For complex expressions indentation is commonly used to keep it human-readable:

(expression
 (sub
   (add
      2
     (mul 3 4))
   5))

Let this notation will be our DSL to represent expressions.

Calculating value
Now let's try to solve problem of calculating expression value. We will use Scheme programming language [1,2]. Scheme was selected by following reasons:

Scheme allows its syntax extension; directly supports homogeneous metaprogramming
Scheme is very simple to learn and use
Scheme is minimal dialect of LISP that is standardised and high-portable

Because Scheme programs are also written in S-expressions we could say that programming in Scheme is performing directly in abstract syntax tree.
Basic idea of method of handling AST is following: Let define mul/add/sub/expression as scheme procedures and evaluate our DSL as scheme program:

(define (expression v) v)
(define (add x y) (+ x y))
(define (sub x y) (- x y))
(define (mul x y) (* x y))

(define (calculate-value expr)
 (eval expr))

We've just got procedure calculate-value that calculates value of provided expression in run-time. Result 9 will be evaluated.if one call it like this:

(calculate-value '(expression (sub (add 2 (mul 3 4)) 5)) )

Same way we can solve other parts of our problem.

Generating printable infix representation
Here is code for generation of printable infix representation:

(define (expression v) v)
(define (add x y) (format "(~a+~a)" x y))
(define (sub x y) (format "(~a/~a)" x y))
(define (mul x y) (format "(~a*~a)" x y))

(define (to-infix expr)
 (eval expr))

(to-infix '(expression (sub (add 2 (mul 3 4)) 5))) produces string "((2+(3*4))-5)"

Generating C code
Here is generation of C code:

(define (expression v)
 (format
#include 
int main()
{
 printf(\"%d\",~a);
 return 0;
}" v))
(define (add x y) (format "(~a+~a)" x y))
(define (sub x y) (format "(~a/~a)" x y))
(define (mul x y) (format "(~a*~a)" x y))

(define (generate-c expr)
 (eval expr))

For our expression this code produces following code that could be directly passed to C-compiler:

#include 
int main()
{
 printf("%d",((2+(3*4))/5));
 return 0;
}

Note: in non-toys problems it may be better to consider of two-stage transformation: at first stage we can generate AST of C language. At second stage this AST could be transformed to plain C code. By implementing and testing two stages separately we are modularizing our meta-system and make it more maintainable.

Targeting to the stack machine
Target machine codes or assembler syntax not yet specified, so let's transform our expression to abstract stack machine AST. For our expression we expect to get something like this:

(stack-machine-code
 (push-integer 2)
 (push-integer 3)
 (push-integer 4)
 (mul)
 (add)
 (push-integer 5)
 (sub)
 (print)
)

When stack machine specification is available we can create transformation from this AST to representation required by specification.
Here is a code:

(define (expression v) `(stack-machine-code ,@(walk v) (print)))
(define (walk v)
 (if (number? v)
    `((push-integer ,v))
    v))
(define (add x y) `(,@(walk x) ,@(walk y) (add)))
(define (sub x y) `(,@(walk x) ,@(walk y) (sub)))
(define (mul x y) `(,@(walk x) ,@(walk y) (mul)))

(define (to-stack-machine expr)
 (eval expr))

alternative implementation without using splicing (@) is here:

(define (expression v) (append '((stack-machine-code)) (walk v) '((print))))
(define (walk v)
 (if (number? v)
    `((push-integer ,v))
    v))
(define (add x y) (append (walk x) (walk y) '((add))))
(define (sub x y) (append (walk x) (walk y) '((sub))))
(define (mul x y) (append (walk x) (walk y) '((mul))))

(define (to-stack-machine expr)
 (eval expr))

Making it practical and... more simple
Above four transformations are defined in obvious way by mapping AST nodes to Scheme procedures. However this solution is practically unusable: if we use these transformations simultaneously we get into trouble of overlapping of different definitions of add/sub/mul/expression.
Possible solution is to make definition local:

(define (calculate-value expr) ;; INVALID CODE
 (define (expression v) v))
 (define (add x y) (+ x y))
 (define (sub x y) (- x y))
 (define (mul x y) (* x y))
 (eval expr))

Unfortunately (or fortunately?) this doesn't work: by Scheme conventions eval "doesn't see" local definitions. That's not big deal: it is allowed to directly specify environment as a second argument of eval and create isolated environments. But doing such low-level things every time is not a good idea. So I created simple macro define-ast-handler that is intended to do all this stuff. Final code is here:

(load "define-ast-handler.scm")

(define-ast-handler (calculate-value)
 (define (expression v) v)
 (define (add x y) (+ x y))
 (define (sub x y) (- x y))
 (define (mul x y) (* x y))
)

(define-ast-handler (to-infix)
 (define (expression v) v)
 (define (add x y) (format "(~a+~a)" x y))
 (define (sub x y) (format "(~a/~a)" x y))
 (define (mul x y) (format "(~a*~a)" x y))
)
;; ... generate-c  and to-stack-machine are defined similar way

This code declares transformation procedures named calculate-value, to-infix, etc. There is no more overlapping of names. As you can see there is no more "eval" there. So final version became more simpler in comparison with original.

Substituting of variable
Now Imagine that our requirements was extended. Now we are required to handle 'x' variable in expression notation and allow its substitution during evaluation.
All what we need is to add argument x to calculate-value procedure:

(define-ast-handler (calculate-value x)
 (define (expression v) v))
 (define (add a b) (+ a b))
 (define (sub a b) (- a b))
 (define (mul a b) (* a b))
)

Now we can use it by passing value of x in runtime:

(calculate-value 3 '(add 2 x)) ;; --> evaluates to 5

For purposes of transforming AST to printable infix notation we might not want to substitute variable. We just want to print x symbol. All what we need is to define it's printable value:

(define-ast-handler (to-infix)
 (define x "x")
 (define (expression v) v)
 (define (add a b) (format "(~a+~a)" a b))
 (define (sub a b) (format "(~a/~a)" a b))
 (define (mul a b) (format "(~a*~a)" a b))
)

Limitations
Not any AST of arbitrary DSL could be transformed easily this way. Here is discussion of some limitations:

We might be required to walk tree other way but walking order of syntax tree is locked by Scheme.

Syntax is not directly mappable to procedure application. In this situation we may consider to map nodes not to procedures but macros using local define-syntax definitions.

Not all nodes can be allowed to be included to each other. In this case define-syntax is able to help define rules.

We have no access to the parent nodes while evaluating child nodes. In this case closures might be helpful. Method is following: for child node we create closure and when parent node is parsed child's closure is evaluated with passing necessary info via parameters.

We use scheme-report-environment to evaluate user's AST. That's mean we haven't clean DSL. For our implementation of arithmetic expression DSL folowing expression is sill valid: (expression (plus 3 (if (eq? x 1) 2 3))). In some cases this can be feature, not bug ;). But in some cases we might need to restrict syntax by some reason. We can try overcome this issue by modifying define-ast-handler macro and using null-environment with some additional clean-up instead of scheme-report-environment. However this is still non-ideal solution: in this case bodies of node-handling procedures don't have access to scheme primitives.

Conclusion
Method of transforming abstract syntax tree has just been presented. Direct reusing of Scheme's infrastructure allows error diagnostic and debugging for free. Method has significant limitations but primary its advantage is simplicity.

References
1. Wikipedia article: http://en.wikipedia.org/wiki/Scheme_(programming_language)
2. For this article PLT Scheme (mzscheme) was used: http://www.plt-scheme.org/

Outside the Box