Wednesday, December 20, 2006

More on Syntax

My last post appeared on reddit--thanks dons! It drew some comments I'd like to respond to. First of all, there already is a macro system for Haskell, called (somewhat misleadingly) Template Haskell. It provides the capability to generate arbitrary Haskell code. (More correctly, Haskell 98 code, since extensions like Generalised Algebraic Data Types are not supported.) It also provides the quasi-quotation mechanism I used in my last post's mock-ups: [| ... |]. However, it has two problems. Firstly, macro calls are marked specially using the $(macro ...) syntax, which is not as seamless as it could be--although there might be a good reason to keep it, namely making it easy to recognize when macros are involved. Secondly, its quasi-quotation syntax is very limited: you cannot introduce new bindings, and it is hard to modularize code--but I might be wrong here, since I may not have pushed it as far as possible. The problem is: when you cannot use the quasi-quotation syntax, you're left building up the quite complex Haskell parse tree yourself. Given the limited documentation and Haskell's syntax rules, you usually write your macros by first getting the AST of some sample code, e.g. using:
{-# LANGUAGE TemplateHaskell #-}
import Language.Haskell.TH

-- | Print a human-readable representation of a given AST.
printAST :: ExpQ -> IO ()
printAST ast = runQ ast >>= print

pp :: IO ()
pp = printAST [| let x = $([| (4+) |]) in x 5 |]
which then (reformatted) looks like this:
$ pp
LetE [ValD (VarP x_0)
           (NormalB (InfixE (Just (LitE (IntegerL 4)))
                            (VarE GHC.Num.+) Nothing))
                     []]
     (AppE (VarE x_0) (LitE (IntegerL 5)))
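To give a feel for the gap between quotation and raw construction, here's the same tiny function written both ways (a sketch; addFourQ and addFourRaw are names I made up):

{-# LANGUAGE TemplateHaskell #-}
import Language.Haskell.TH

-- Via quasi-quotation: short and readable.
addFourQ :: ExpQ
addFourQ = [| \x -> x + 4 |]

-- Via raw AST constructors: what you're left with when
-- quotation doesn't fit your use case.
addFourRaw :: ExpQ
addFourRaw = do
  x <- newName "x"
  return $ LamE [VarP x]
                (InfixE (Just (VarE x))
                        (VarE '(+))
                        (Just (LitE (IntegerL 4))))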
Then you try to customize such an AST for your purposes. Not pretty. My actual attempt was to take a type name as a parameter, inspect it, and then generate some boilerplate code. Well, I tried, but gave up after being unable to construct some type. Maybe I didn't try hard enough. Anyway, macro-writing shouldn't be that hard! My proposed solution certainly is just a sketch of an idea, essentially pointing to prior art. I don't claim that it will in fact work nicely, or even that it will work at all. I am pretty confident that it might, though, and I am planning to give it a shot later on. Maybe extending Template Haskell with features similar to Scheme's syntax-case would be enough, for a start. And yet, I don't consider this a high-priority project, since a lot of uses for macros in Lisp can be solved differently in Haskell, as has also been mentioned in the comments to my previous post:
  • Controlling the order of evaluation is not necessary in Haskell, thanks to laziness. And when we do need to control it, we mostly use monads.
  • The whole category of (with-something (locally bound vars) ...) macros can be implemented almost as conveniently as withFoo $ \locallyBoundVars -> do ... (see the sketch after this list).
  • A lot of cases of special syntax can be handled by clever operator and constructor naming. E.g., in wxHaskell: t <- timer f [interval := 20, on command := nextBalls vballs p]. Or, in an in-progress project of mine, I simulate a convenient assembler syntax with a notation like res <-- a `imul` c. However, I was not able to use the <- notation, since I have different scoping rules than Haskell and I'm not in a monad.
  • Many cases of boilerplate code generation can be covered using generic programming, e.g. using Scrap Your Boilerplate.
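To illustrate the with-pattern point above, here's a minimal sketch (withInputFile is a helper name I made up; bracket is the real Control.Exception function):

import System.IO (Handle, IOMode(ReadMode), openFile, hClose, hGetLine)
import Control.Exception (bracket)

-- The Lisp idiom (with-open-file (h "log.txt") ...) becomes a
-- higher-order function applied to a lambda:
withInputFile :: FilePath -> (Handle -> IO a) -> IO a
withInputFile path = bracket (openFile path ReadMode)  -- acquire
                             hClose                    -- release

firstLine :: IO String
firstLine = withInputFile "log.txt" $ \h -> hGetLine h

No macro is needed; the lambda plays the role of the binding list.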
So where would (more usable) macros still make sense?
  • Allowing more flexible syntax for domain-specific embedded languages (DSELs): an XML library or a parser library might profit from this right now. (Yes, I think Parsec could be more readable--see the small fragment after this list.) Also, DSLs like Happy would be even nicer if embedded directly into Haskell. Arrows and monads were considered general enough concepts to introduce new syntax for them, but I think there's more out there that deserves it. I also think that an upcoming project of mine might hit the limits of what's currently possible in Haskell. Some people seem to agree.
  • Speaking of Parsec, there's still one common use for Lisp macros: optimizing at compile time. You can get quite far by carefully designing the combinators of your DSELs, but combining nice syntax with good performance is very hard. In Lisp, the loop embedded language, for example, performs quite heavy transformations on the given code. Partial evaluation is probably the more general solution here, but it doesn't seem quite ready for prime time yet.
  • The fact that a powerful enough macro system would essentially turn syntactic sugar into a library can be seen as a positive side effect, too. But I don't think this has much practical significance.
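To make the Parsec remark above concrete, here's a typical fragment in ordinary Parsec (number and pair are names I made up for the example):

import Text.ParserCombinators.Parsec

-- Parse an unsigned integer literal.
number :: Parser Int
number = do
  ds <- many1 digit
  return (read ds)

-- Parse a pair like "(1,2)".
pair :: Parser (Int, Int)
pair = do
  char '('
  a <- number
  char ','
  b <- number
  char ')'
  return (a, b)

It reads reasonably well--try parse pair "" "(1,2)"--but the explicit char plumbing is exactly the kind of noise that more flexible surface syntax could remove.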
Bottom line: there certainly are fewer useful applications for macros in Haskell than in, e.g., Lisp, but the arguments are serious enough to at least consider them.

Thursday, December 14, 2006

Syntax

Coming to Haskell from Common Lisp, I soon started to miss macros, and tried out Template Haskell... and gave up. Haskell's parse tree is way too complex to be usefully manipulated by Haskell functions. You can get used to Lisp syntax, but you always have to justify it to outsiders, and there certainly lies a lot of value in resembling mathematical syntax. I certainly agree with dons' post on this issue. If only it weren't for the disadvantages! Sure, syntactic extensions have to be done carefully, and powerful abstraction mechanisms in the language always have to come first. But having macros as an integral part of the language definition would make desugaring a library instead of a part of the compiler. This is a very powerful form of syntactic abstraction. I know that ML, being designed as a meta-language, has some means of syntactic abstraction, though I haven't taken a closer look at these yet.

Let me, however, outline a way of achieving macros almost as powerful as general Lisp macros that also works for languages with less simple syntax. This system is (partly) implemented in the Dylan programming language and is called D-Expressions. It is essentially an extension of the Lisp syntax, which in its most primitive form looks like this:

Expr ::= Atom | ( Expr* )

This represents a simple tree. However, why should we restrict ourselves to representing nested structure with parens? We might as well use brackets, braces, or keywords:

Expr ::= Atom | ( Expr* ) | [ Expr* ] | { Expr* } | def Expr* end

This way we retain the unambiguous nesting structure, but get a more varied syntax (a small Haskell sketch of this tree type follows the list of macro kinds below). Reader macros in Common Lisp (and probably Scheme, too) do this and already perform some sort of desugaring, by translating { ... } into (some-macro ...). This, however, has one big problem: you can only have one macro for { ... }. Common Lisp "fixes" this with a prefix on the parens, e.g. #c(3 4) denotes a complex number. That isn't exactly beautiful and doesn't scale well, though. What we would really like is to use this more varied syntax inside our own macros, in a way that scales. Dylan solves this by allowing three kinds of macros:
  • Definition-style macros have the form: define modifiers* macro-name Expr* end
  • Function-style macros have the form: macro-name(Expr*)
  • Statement-style macros have the form: macro-name Expr* end
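As promised above, the extended grammar is still just one uniform tree. A minimal sketch of it as a Haskell data type (type and constructor names are mine):

-- One tree, several concrete bracketings: macros can pattern-match on
-- any of these shapes without the nesting ever becoming ambiguous.
data DExpr
  = Atom String              -- identifiers, literals, operators
  | Parens    [DExpr]        -- ( ... )
  | Brackets  [DExpr]        -- [ ... ]
  | Braces    [DExpr]        -- { ... }
  | Keyworded String [DExpr] -- e.g. def ... end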
Macros are defined using pattern matching rewrite rules, e.g.:
define macro when
  { when ?cond:expr ?body:body end }
    =>  { if ?cond ?body else #f end }
end
Here ?cond:expr states that the pattern variable cond matches a form of the syntactic category of an expression; similarly for ?body:body. Multiple patterns are possible, and extending this to Lisp-style abstract syntax tree transformations is possible too, as shown in this paper on D-Expressions. Adding this feature to Haskell would probably require some small modifications to Haskell's syntax, but I think we wouldn't have to drop whitespace sensitivity. This way, embedding DSLs in Haskell would be even more seamless, and rants about the do-notation would not be needed. Here's how an implementation of the do-notation's syntactic sugar could look:
macro do {
  [| do { ?expr:expr } |] => [| ?expr |]
  [| do { ?pat:pat <- ?expr:expr; ??rest:* } |]
    => [| ?expr >>= (\?pat -> do { ??rest ... }) |]
  [| do { let ?pat:pat = ?expr:expr; ??rest:* } |]
    => [| let ?pat = ?expr in do { ??rest ... } |]
  ...
}
where ??rest:* matches anything up to the closing "}" (resulting in the pattern variable rest being bound to a sequence of tokens) and ??rest ... expands all tokens in rest. Sure, there are a lot of open issues to be solved--e.g., type specifications form a separate sub-language of Haskell, and how do we actually achieve whitespace-sensitive macros?--but I think it would be a very useful extension. Comments welcome! :) Edit: The last example actually is just a mock-up of some possible syntax and doesn't even make much sense in Dylan. But you get the idea (I hope). I've been pointed to a similar idea, called Syntax Macros, which seems to be along the same lines.
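For reference, here's the standard desugaring that the do mock-up above is meant to reproduce, written out by hand in plain Haskell:

-- do { line <- getLine; let n = length line; print n }
-- desugars, rule by rule, into:
example :: IO ()
example = getLine >>= (\line -> let n = length line in print n)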

Saturday, November 18, 2006

Specification-based Testing

I just had the rare opportunity to hear a live talk by John Hughes, given to a small group of Chalmers students which I happened to be part of. He was re-giving the talk he held a week earlier at the Erlang User Conference about specification-based testing of Ericsson software (written in Erlang). In the following I'll try to summarize what I consider the most essential parts of his very interesting talk. For further information see John's (et al.) paper or the LtU discussion. Users of QuickCheck know what nice advantages specification-based testing has: instead of lots and lots of test cases, one writes properties of functions, for example (in Erlang):
  prop_reverse() ->
    ?FORALL(Xs, list(int()),
    ?FORALL(Ys, list(int()),
      reverse(Xs++Ys) ==
        reverse(Xs) ++ reverse(Ys))).
In fact this specification is wrong as a quick check (pun intended) shows us:
  Failed! After 12 tests.
  [-3,2]
  [3,0]
  Shrinking....(10 times)
  [0]
  [1]
QuickCheck provides two important features:
  • It provides an interface for the controlled generation of test cases (as opposed to simply random ones, which are usually of little help).
  • It generates and runs tests of the specified properties and--this might be new to users of the GHC package Test.QuickCheck--shrinks failing cases to smaller ones. This is important, as it is often hard to find the actual cause of a bug, especially when test cases are very large. (John mentioned that the Mozilla developers actually offer T-shirts to people who just reduce test cases, as they found this to be an extremely time-consuming task.)
In our example the error was in the specification and can be fixed by swapping Xs and Ys in the last line.
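For comparison, here's the corrected property in Haskell's Test.QuickCheck (an obvious transliteration):

import Test.QuickCheck

-- reverse distributes over (++) with the arguments swapped:
prop_reverse :: [Int] -> [Int] -> Bool
prop_reverse xs ys =
  reverse (xs ++ ys) == reverse ys ++ reverse xs

Running quickCheck prop_reverse then succeeds on all generated cases.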

Advantages of Specification-based Testing

The most obvious advantage of using QuickCheck over conventional test cases is that it dramatically reduces the number of test cases one has to write. It also does a great job of finding corner cases. In their field study at Ericsson, while spending only 6 days implementing all the test properties (and a library for state-machine-based testing), they found 5 bugs in an already well-tested, soon-to-be-released product--one of which would have been very unlikely to be triggered by any hand-written test case--and revealed 9 bugs in an older version of the project, of which only one had been documented at the time (see the paper). Neil Mitchell also recently posted about the merits of QuickCheck.

John added one caveat, though. The bugs they found were very small (due to shrinking) and in fact would never have occurred in the real system, because the command sequences that lead to the failures would never have been issued by the real controller. However, they tested against the documented (400+ pages) specification. This leads us to another important advantage of specification-based testing: QuickCheck forces (or allows) us to actually state the specification as executable properties. Therefore we might find errors not only in the program but, as seen in our example, also in the specification. This can be very useful, and he gave a nice demonstration of a buggy (or incomplete) specification for the Erlang functions register/2, unregister/1, and whereis/1, which provide a sort of name server for Erlang processes. To test them, he generated sequences of Erlang calls and checked whether their results corresponded to a state-machine model based on the specification. That is, the current state consisted of a simple map representing the (assumed) state of the name server. Preconditions controlled the allowable sequences of generated commands, and postconditions checked their outcomes. This small experiment revealed quite a few incompletenesses in the Erlang specification; e.g., it does state that register/2 will fail if you try to register the same process under different names, but it does not state that the process is not added to the list of registered processes (although that is sensible to assume--nevertheless, the specification is incomplete). (I think he had a more serious example, but I can't remember what exactly it was.)
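To make the state-machine idea concrete, here's a minimal sketch in Haskell (my own illustration, not Quviq's library; a real test would run each command against the live system and compare the results with the model):

import qualified Data.Map as Map
import Test.QuickCheck

data Cmd = Register String Int | Unregister String | WhereIs String
  deriving Show

instance Arbitrary Cmd where
  arbitrary = oneof
    [ Register   <$> name <*> choose (0, 9)
    , Unregister <$> name
    , WhereIs    <$> name ]
    where name = elements ["a", "b", "c"]

-- The model: the state the documentation says the name server is in.
step :: Map.Map String Int -> Cmd -> Map.Map String Int
step m (Register n p)
  | n `Map.member` m = m              -- register/2 fails; state unchanged
  | otherwise        = Map.insert n p m
step m (Unregister n) = Map.delete n m
step m (WhereIs _)    = m

-- A toy invariant standing in for the real postconditions:
prop_model :: [Cmd] -> Bool
prop_model cmds =
  Map.size (foldl step Map.empty cmds) <= 3  -- only three names exist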

Remarks

Having used it myself, I am very convinced of the advantages of QuickCheck. In reply to my question about the testability of non-deterministic faults, caused by the effects of concurrent execution of processes, John remarked that you can get quite deterministic behavior (and thus reproducible test results) by running the programs on a single CPU and relying on the deterministic scheduler. I am not sure how far this reaches, but then again, you should use Erlang's behaviors as much as possible.

Friday, November 10, 2006

Being Lazy

Lazy? Me? No. Noo. Never! But here's a Reddit discussion (warning: long!) that--even though blown up by an obvious troll--has some nice statements about performance, usability, and composability implications of lazy evaluation. Quite interesting (if you filter out the noise).

Friday, October 06, 2006

Uno

Everything has a beginning and an ending. So, this is the beginning of my humble blog--let's hope it won't see its end too soon. This blog will (presumably) be about programming languages, compilers, concurrency, maybe a bit about human-computer interaction, and about some general life-related stuff. With time I'll also try to adopt some blog usability...