Tim Jansen's blog


2004/05/30
Nullable Types in C#
I just read about nullable types in C# 2.0 and was quite surprised how similar their syntax is to Eek's. In C# you write
int? a = 1;
and in Eek you write
int a? = 1
to define a nullable variable 'a' with an initial value of 1. Eek's implementation will be different, though. C# differentiates between value types (like int) and reference types, and in C# 1.0 only reference types could be null. That's why they added the extra feature for value types (which is actually just a short notation for the generic wrapper struct 'System.Nullable<T>'). In Eek everything is an object: there are no value types, all variables are references, and references are not nullable unless the question mark modifier '?' is used. An interesting feature in C#'s syntax is the '??' operator. The statement
int x = a ?? 1;
takes 'a' if 'a' is not null, and 1 otherwise. This is a nice shortcut, and I wonder whether I should provide something like it. Right now Eek's spec contains only two ways of eliminating nulls: either the 'if/then/else' operator
int x = if a then a else 1
that allows default values, or the 'any' conversion
int x = any(a)
, which will fail at runtime if 'a' is null.
I am a little bit concerned about the number of Eek's operators. I want to keep it as low as possible, to avoid the Perl-style line-noise effect that occurs when people are exposed to operators they have never seen before. Right now Eek has all the operators that Java has except the pre- and post-increment and decrement operators ('++' and '--'). The ternary operator 'x ? y : z' will be replaced by 'if x then y else z', because '?' and ':' are used for too many other things and this syntax allows several 'if's with a single 'else'. Additionally, Eek has '..' and '.@' for accessing node descendants and attributes in XML trees, and the filter operator '.()'. These three are taken from E4X. Eek also has several extra literals; in particular '{someblock}' for closures and 'prefix:name' for XML QNames could have a line-noise effect. So I would really hate to add another operator. On the other hand, I expect null elimination to be needed frequently.
9 hours later... I think I found a solution: I use the '||' operator. This isn't too far from the original behaviour of '||', because every nullable reference in Eek is implicitly convertible to bool anyway. If the right-hand operand is a non-nullable reference other than bool, the result is the left-hand operand if it is not null, and the right-hand operand otherwise. Without this modification it would be forbidden to use any non-nullable right-hand operand except bool (and with bool the modification makes no difference). Both operands must be compatible with the expected type of the expression, if there is one. With the modified '||' operator the example can be written as
int x = a || 1
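Two days later... OK, bad idea. The precedence of '||' is too low for it to be usable without parentheses: because '||' binds more loosely than the arithmetic and comparison operators, 'a || 1 + b' would be parsed as 'a || (1 + b)' rather than '(a || 1) + b'.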
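For comparison, here is roughly how the three null-elimination forms discussed above map onto Java, which has neither '??' nor Eek's 'any'. This is only a sketch: the class and method names (NullElimination, orElse, orElseJdk, any) are made up for illustration, 'Integer' stands in for a nullable 'int', and 'Objects.requireNonNullElse' needs Java 9 or later.

```java
import java.util.Objects;

public class NullElimination {
    // C#: a ?? fallback    Eek: if a then a else fallback
    static int orElse(Integer a, int fallback) {
        return (a != null) ? a : fallback;
    }

    // The same default-value form using the JDK helper (Java 9+).
    static int orElseJdk(Integer a, int fallback) {
        return Objects.requireNonNullElse(a, fallback);
    }

    // Eek: any(a), a conversion that fails at runtime if 'a' is null.
    static int any(Integer a) {
        return Objects.requireNonNull(a, "a is null");
    }

    public static void main(String[] args) {
        Integer a = null;
        System.out.println(orElse(a, 1));
        System.out.println(orElseJdk(5, 1));
        try {
            any(a);
        } catch (NullPointerException e) {
            System.out.println("any(a) failed: " + e.getMessage());
        }
    }
}
```

The ternary form is what Eek's 'if a then a else 1' expresses; the helper method is the closest Java gets to a dedicated operator without adding new syntax.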



2004/05/22
Bootstrapping: Compiler and Runtime Engine
Bootstrapping: Compiler and Runtime Engine I will probably be busy with the language specification for another two months, but I am now pretty sure about the architecture of compiler and runtime and have plans for the bootstrapping phase. The transformation of a textual Eek program to binary code will have three steps, unlike Java and C# which have two. In step one the textual source code is parsed and converted into a XML format. This conversion does not lose any semantics and with the help of optional annotations the formatting can be kept as well. The relationship between the two should be similar to the one between XQuery and XQueryX. The intermediate format is intended as an easy way to extract information from the source (e.g. for API documentation) and to allow modifications (like refactoring). In the far future an IDE may also use it as primary storage format instead of text. Step two is the transformation from the lossless XML format to a language independent format that is intended for distribution, similar to Java's bytecode classes. Unlike Java it will not be a new binary format, but also XML. And it will not be instruction-oriented, but keep the program in a tree structure that's easier to work on. It may be somewhat comparable to gcc's RTL trees, just stack-based and stored on disk. Eek knows assemblies like C# and there should be one file for each assembly. Step three is the runtime engine that transforms the lossless XML format to executable code. Eventually there should be a compiler, using something like libjit, that converts the assembly's trees inside the XML file into binary code. I don't think that it will be a real JIT, rather a caching compiler that's invoked on demand. For the bootstrap process I will write the compilers/converters for the first two step in Java. I already started writing Eek's XML library for Java, to allow easy porting to Eek later. It's not very far yet though. 
As an interim solution for step three I will write an interpreter in Java, using the XML library, instead of a real compiler. When all three parts are working (and I can finally write my first Eek programs), I will port the first two parts to Eek. And then, finally, I will write the runtime engine/compiler that makes assemblies executable. If everything works out as I hope, that may be in a year or so :)
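The interim interpreter for step three boils down to a recursive walk over an XML tree. The following Java sketch uses only the JDK's DOM parser; the element names ('add', 'mul', 'int'), the 'value' attribute and the 'ws' formatting attribute are hypothetical placeholders, not Eek's actual formats. 'lower' stands in for step two by dropping formatting-only annotations, and 'eval' plays the role of the step-three interpreter.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class TreeInterpreter {
    // Parse an XML string into a DOM tree and return its root element.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // Child elements only; whitespace text nodes and comments are skipped.
    static List<Element> children(Element e) {
        List<Element> result = new ArrayList<>();
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                result.add((Element) n);
            }
        }
        return result;
    }

    // Stand-in for step two: strip formatting-only annotations (the hypothetical
    // 'ws' attribute) so that only the program's semantics remain.
    static void lower(Element e) {
        e.removeAttribute("ws");
        for (Element child : children(e)) {
            lower(child);
        }
    }

    // Stand-in for the step-three interpreter: evaluate the tree recursively.
    static long eval(Element e) {
        switch (e.getTagName()) {
            case "int":
                return Long.parseLong(e.getAttribute("value"));
            case "add": {
                long sum = 0;
                for (Element child : children(e)) sum += eval(child);
                return sum;
            }
            case "mul": {
                long product = 1;
                for (Element child : children(e)) product *= eval(child);
                return product;
            }
            default:
                throw new IllegalArgumentException("unknown node: " + e.getTagName());
        }
    }

    public static void main(String[] args) {
        // 1 + 2 * 3, with a formatting annotation carried over from step one.
        Element root = parse("<add ws=\" \"><int value=\"1\"/>"
                + "<mul><int value=\"2\"/><int value=\"3\"/></mul></add>");
        lower(root);
        System.out.println(eval(root)); // prints 7
    }
}
```

Keeping the program as a tree means the interpreter needs no separate instruction decoding; a later compiler could walk the same tree and emit code instead of computing values.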



2004/05/16
More Generics
The generics autocasting idea in yesterday's entry has a disadvantage that I do not want to leave unmentioned. In the following example code
List<String> list1 = ArrayList()
Object list2 = ArrayList()
the constructor invocation 'ArrayList()' would have two different meanings. The first one creates an 'ArrayList<String>', the second one an 'ArrayList<Object>'. I don't think that this kind of ambiguity is acceptable. Therefore I need to forbid generic constructor invocations without type arguments if the target type does not imply them. That's not too bad; an Object-typed collection is not needed that often anyway, and Java allows unparameterized (raw) types only for source backward-compatibility reasons. Besides the autocasting for constructors, my current concept for generics in Eek looks like this:
  • The generic object is aware of the types it has been created with. Or more precisely, there is a Class instance for every instantiated class/type combination.
  • The syntax for declaring a Map class as generic would be
    class MyMap<Object K, Object V?>
    end
    
    Generic types use the same declaration syntax as variables. Thus they can be nullable (but do not have to be) and the 'any' type is allowed. The syntax is simpler than Java's, but also less powerful. I am aware that reusing a declaration syntax (the one for variables) for a completely different purpose (giving a reference type a name) is not ideal, but it's better than having two different syntaxes. Note that 'Object' is used in this example, but any other class or interface would be possible.
  • The generic type aliases, 'K' and 'V' in the example, can be used almost like class names. Because 'K' is not nullable, it is allowed to create nullable 'K' references with the nullable modifier '?'. 'V' is already nullable, so it is forbidden to make 'V' references nullable again. Let's add some dummy methods to 'MyMap':
    abstract class MyMap<Object K, Object V?>
            abstract V get(K key)
    
            abstract K? findKey(V value)
    end
    
  • Instantiation is done like in Java, except that a type argument may be made nullable if the corresponding generic type is declared nullable; otherwise it must not be nullable. Here are a few legal instantiation examples (assuming the class is not abstract):
    Object o1 = MyMap<Object,Object?>()  // as generic as possible
    Object o2 = MyMap<Object,Object>()   // makes values non-nullable
    Object o3 = MyMap<Object,any>()      // Object? is compatible with any
    Object o4 = MyMap<String,Object>()   // maps strings on objects
    Object o5 = MyMap<String,String?>()  // maps strings on nullable strings
    MyMap<Object,Object?> map1 = MyMap() // generic types implied by target
    MyMap<String,String?> map2 = MyMap<String,String>()  // legal cast, generic types compatible
    MyMap<Object,Object> map3 = MyMap<String,String>()   // legal cast, generic types compatible
    MyMap<Object,Object?> map4()         // compact constructor syntax is allowed
    MyMap<String,any> map5()             // compact constructor syntax is allowed
    
  • The following instantiations would be illegal:
    Object o1 = MyMap<Object?,Object>()  // Error: K must not be nullable
    Object o2 = MyMap<any,Object>()      // Error: K must not be any (which is nullable)
    MyMap map1 = MyMap<Object,Object?>()                 // Error: variable map1 does not specify generic types
    MyMap<String,String> map1 = MyMap<Object,Object>()   // Error: generic types incompatible
    MyMap<Object,Object> map2 = MyMap<Object,Object?>()  // Error: generic types incompatible (nullable vs. non-nullable)
    
  • Extending a generic class is possible. When extending, the generic types may either be specified or kept generic:
    class StringMap extends MyMap<String, String>
    end
    
    class ExtendedMap<Object K, Object V> extends MyMap<K, V>
    end
    
    class StringKeyMap<Object T> extends MyMap<String, T>
    end
    
    It is not allowed to use the generic aliases of the super class(es) in the sub-class declaration or implementation.
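The legal and illegal examples above appear to follow a single rule, both for instantiation and for assignment: each actual type argument must be a subclass of the expected one, and a nullable argument is only accepted where the expected one is nullable. The Java sketch below (Java 16+, for the record) models a type argument as a hypothetical 'TypeArg' record, a Class plus a nullability flag, and checks that rule. 'any' is left out, and the rule is an inference from the examples rather than something the spec states.

```java
import java.util.List;

public class GenericCompat {
    // Hypothetical model of one Eek type argument: a class plus a nullability flag.
    record TypeArg(Class<?> type, boolean nullable) {}

    static TypeArg nonNull(Class<?> c) { return new TypeArg(c, false); }
    static TypeArg nullable(Class<?> c) { return new TypeArg(c, true); }

    // True if the 'actual' type arguments may be used where 'expected' ones are
    // required: every actual class must be a subclass of the expected class, and
    // a nullable actual argument is only accepted where the expected one is nullable.
    static boolean compatible(List<TypeArg> expected, List<TypeArg> actual) {
        if (expected.size() != actual.size()) {
            return false;
        }
        for (int i = 0; i < expected.size(); i++) {
            TypeArg e = expected.get(i);
            TypeArg a = actual.get(i);
            if (!e.type().isAssignableFrom(a.type())) {
                return false;
            }
            if (a.nullable() && !e.nullable()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Declared parameters of MyMap<Object K, Object V?>
        List<TypeArg> myMap = List.of(nonNull(Object.class), nullable(Object.class));

        // Instantiation checks
        System.out.println(compatible(myMap, List.of(nonNull(String.class), nullable(String.class)))); // o5: legal
        System.out.println(compatible(myMap, List.of(nullable(Object.class), nonNull(Object.class)))); // K nullable: illegal

        // Assignment checks (target first, source second)
        List<TypeArg> objObj = List.of(nonNull(Object.class), nonNull(Object.class));
        List<TypeArg> strStr = List.of(nonNull(String.class), nonNull(String.class));
        System.out.println(compatible(objObj, strStr)); // map3: legal
        System.out.println(compatible(strStr, objObj)); // illegal: Object is not a String
        System.out.println(compatible(objObj, List.of(nonNull(Object.class), nullable(Object.class)))); // illegal: nullable V
    }
}
```

Note that Java's own generics are invariant: 'Map<Object, Object> m = new HashMap<String, String>();' does not compile, whereas the Eek rules above accept the corresponding 'map3' assignment.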



2004/05/15
Generics vs. Autocasting in Eek
Generics vs. Autocasting in Eek Sam Pullara has written a blog entry in which he suggests to use autocasts instead of Java 1.5's generics. It caught my attention because I have a similar decision to make for Eek. I haven't started to specify generics and am not sure about the details yet, so just emitting that feature is an attractive option. What he proposes is to get rid of casts when the value is assigned to a typed local variable (note that this is different from Eek's 'any' type, because 'any' messages are dispatched at runtime and not at compile-time). If Eek would use Java's generics syntax, his Java example would look like this:
Map<String, String> map = HashMap<String, String>()
map.put("key", "value")
String value = map.get("key")
This is the example without generics and autocasts ('any()' is Eek's way of downcasting):
Map map = HashMap()
map.put("key", "value")
String value = any(map.get("key"))
And this example shows the syntax with autocasts:
Map map = HashMap()
map.put("key", "value")
String value = map.get("key")
The first two examples both have a redundancy: with generics the types of the Map need to be specified twice, and without generics and autocasts the values from the Map need to be downcast. The second and third examples lack compile-time type safety for the Map, and the code does not specify the allowed types for the Map. I probably hate unspecified types in collections even more than the casts in pre-1.5 Java. If you read code written by others, it can be quite difficult to find out the allowed types of a collection. This alone is reason enough for me not to give up generics in favour of autocasting. I think I have found a third way that is a limited kind of autocasting and makes the syntax more compact, but at the same time keeps the type safety of generics: do not require the constructor to specify types, but take the types that are expected by the surrounding expression. With that rule the generics example looks like this:
Map<String, String> map = HashMap()
map.put("key", "value")
String value = map.get("key")
It's shorter, type-safe and documents the allowed types. I think I will take it.
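Java later went a similar way: since Java 7 the diamond operator '<>' takes a constructor's type arguments from the declared type, and generic methods such as 'Collections.emptyList()' have taken theirs from the assignment target since Java 5. The sketch below contrasts that with an approximation of Pullara-style autocasting through a hypothetical unchecked helper, 'autocast', which compiles without complaint but only fails at runtime when the types don't match. Java still requires 'new', which Eek drops.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TargetTyping {
    // Approximation of an autocast: the result type T is inferred from the
    // assignment target, but the cast is unchecked, so a mismatch is only
    // detected at runtime (as a ClassCastException at the call site).
    @SuppressWarnings("unchecked")
    static <T> T autocast(Object o) {
        return (T) o;
    }

    public static void main(String[] args) {
        // Type arguments spelled out once; '<>' infers them from the declared type.
        Map<String, String> map = new HashMap<>();
        map.put("key", "value");
        String value = map.get("key");

        // A generic method whose type argument comes from the assignment target.
        List<String> names = Collections.emptyList();

        // Untyped map plus autocast: compiles, but the wrong value type only
        // surfaces when the program runs.
        Map<Object, Object> untyped = new HashMap<>();
        untyped.put("key", 42);
        try {
            String s = autocast(untyped.get("key"));
            System.out.println("unexpected: " + s);
        } catch (ClassCastException e) {
            System.out.println("autocast failed at runtime");
        }

        System.out.println(value + ", " + names.size());
    }
}
```

The typed map documents its allowed types and is checked by the compiler; the autocast version is just as short but moves the error to runtime, which is the trade-off described above.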



2004/05/10
Home
Home This is the beginning of my English blog on tjansen.de. It will replace my blog on kdedevelopers.org, because most of the things that I am working on today are not KDE-related anymore. The main purpose is to document the development of Eek, the programming environment that I am working on. I have reposted a short outline of the programming language(s) in the last entry, and I republished all the kdedeveloper blog entries that eventually made me begin this and convinced me to start another programming language. They show the way that I went from the early beginning, when I thought about adding non-GUI-specific and server-specific functionality to KDE. One of my major ideas was to base all I/O on XML-like hierarchical data structures (actually not a new idea, I played with that thought 5 years ago, but without XML). That made me realize how bad today's programming languages are for manipulating XML, which caused me to think about programming languages in general (again not really new, I planned a new programming language in 98/99 as well). Eventually the ideas came together and so I started writing Eek's specification in March. I did begin this project, but that does not mean that I will eventually finish it. It would not be the first time that I abandon a project at some point, for various reasons, and I am aware that the chances of completing this one are very low. Like almost everybody else I think that the state of IT is pretty bad, and this is my attempt at fixing it. So when I fail, at least I can claim that I have tried it...



 

This blog is my dumping ground for thoughts and ideas about Eek. Someday Eek will be a programming language and system, somewhat comparable to Java in scope. It is my attempt to bring sanity to the world of computing.
At least I hope so. Right now it is far from finished, and I can't guarantee that it ever will be. I am still working on the specification, but I won't release anything before I get my first prototype running. The world does not need more vapourware and unusable beta software. All publicly available information about Eek is contained in this blog. You can find the latest summary here.
This work is licensed under a Creative Commons License.