Tim Jansen's blog


2004/01/23
An Alternative Syntax for Multiple Return Values in C-based Languages
Most functions do not need more than one return value, but when you do need more there is no easy way around it. Java has this problem: functions are limited to one return value, and in my experience it is often complicated to get along with this limitation. In most cases you either write more functions than necessary or you need to create additional classes that need even more constructors, accessor methods and documentation.
C++ and C# allow multiple return values using pointers and references. This solves the problem, but I think it feels wrong, and especially the C++ syntax does not create very readable code. So I am going to show you an alternative, derived from Python’s tuples. But first the existing syntax variants and their disadvantages:
Classic C syntax
The classic C syntax for multiple return values is to use a pointer. Here is an example function that parses an integer from a string. It returns the integer itself and a boolean that shows whether it was parsed successfully:
// implementation:
int parseInt(const char *str, bool *success) {
	const char *s = str;
	int r = 0;
	while (*s) {
		char c = *s++;	// read the current character, then advance

		if ((c < '0') || (c > '9')) {
			*success = false;
			return 0;
		}
		r = r * 10 + (c - '0');
	}
	*success = true;
	return r;
}

// invocation:
bool s;
int v = parseInt("2004", &s);
Disadvantages:
  • Neither the declaration nor the invocation syntax indicates whether ’success’ is really a return value. It may also be just an optimization for an input value (admittedly unlikely in this example), or it may be both an input and an output value. Only the documentation and the implementation can help.
  • You cannot find out whether null is allowed for ’success’ without looking at the documentation or the implementation.
  • The compiler won’t catch a bug if ’success’ is not set before returning in some code paths, because it does not know the purpose of ’success’.
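Classic C syntax with null
This is the same as above, but it allows a null pointer (0) for ’success’ in order to make it optional. Note that the default argument used here needs a C++ compiler, because plain C has no default arguments: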
// implementation:
int parseInt(const char *str, bool *success = 0) {
	const char *s = str;
	int r = 0;
	while (*s) {
		char c = *s++;	// read the current character, then advance
		if ((c < '0') || (c > '9')) {
			if (success)
				*success = false;
			return 0;
		}
		r = r * 10 + (c - '0');
	}
	if (success)
		*success = true;
	return r;
}

// invocation
int v = parseInt("2004");
Disadvantages:
  • You still need to look at documentation/implementation to find out what success is good for
  • The compiler will still not notice when success has not been set before returning, and the check whether ’success’ is null adds another potential error
  • Two additional lines-of-code were needed in the implementation to make success optional
C++ syntax with references
// implementation:
int parseInt(const char *str, bool &success) {
	const char *s = str;
	int r = 0;
	while (*s) {
		char c = *s++;	// read the current character, then advance
		if ((c < '0') || (c > '9')) {
			success = false;
			return 0;
		}
		r = r * 10 + (c - '0');
	}
	success = true;
	return r;
}

// invocation:
bool s;
int v = parseInt("2004", s);
Advantages:
  • References do not have the ‘null’ issue or other pointer problems
Disadvantages:
  • The invocation does not have any hint that the second argument will be modified. This can make code very hard to read if you do not know the functions, because any function may modify any argument
  • You still do not know whether ’success’ is an input or an output value
  • Default values are not possible; you always need to have a bool even if you do not look at it
  • The compiler won’t notice the bug when ’success’ is not initialized in some code paths, because it does not know the purpose of ’success’
C# syntax
This is the same function in C#. IMHO the C# syntax is vastly superior to the C++ alternatives:
// implementation:
int parseInt(String str, out bool success) {
	char[] s = str.ToCharArray();
	int r = 0;
	foreach (char c in s) {
		if ((c < '0') || (c > '9')) {
			success = false;
			return 0;
		}
		r = r * 10 + (c - '0');
	}
	success = true;
	return r;
}

// invocation:
bool s;
int v = parseInt("2004", out s);
Advantages:
  • It’s obvious in declaration and invocation that ’success’ is an output argument (in/out arguments use the keyword ‘ref’)
  • The compiler can check whether ’success’ has been set by the function before returning
  • There are no pointer issues
Disadvantages:
  • Default arguments are not possible (a C# limitation)
  • You always need to declare the bool before invoking the function
Using Python-style tuples
An alternative to the C# syntax would be using Python-like tuples. Tuples are comma-separated values in parentheses that can be on the left and right side of an assignment statement. The syntax would look like this:
int x, y, z;
(x, y, z) = (1, 2, 3);

// The equivalent of the last line is:
x = 1;
y = 2;
z = 3;

// The source tuple can contain expressions as items:
(x, y) = (z-2, 5*5);

// the right side can have more items than the left (but not the other way round):
(x, y) = (1, 2, 3, 4, 5);

// the left side can be a regular value; then only the first item is taken:
x = (1, 2, 3);

// local variable declaration in a tuple
(int a, int b, int c) = (10, 20, 30);

// A tuple can combine several types, as long as the types of both sides match:
(x, bool f, double d) = (5, true, 3.14);

// Unlike other languages, the assignment is processed item-by-item:
(x, y) = (5, 10);
(x, y) = (y, x);
// now x and y are both 10! Swapping is not possible.

// When you embed the statement it returns the first item's value:
if ( (f, a) = (true, x) ) {
	// always executed
}
Note that tuples only exist as a helper construct for assignments. You cannot use operators on them, they are not represented by an object, and they cannot be used like arrays.
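For comparison, here is a small sketch of how Python itself, the inspiration for this syntax, handles the same assignments. It is only meant to show where the proposal deliberately differs: Python builds the complete right-hand tuple before assigning anything (so swapping works), and it rejects a length mismatch instead of silently dropping surplus items. The function name `tuple_assignment_demo` is made up for this illustration.

```python
def tuple_assignment_demo():
    # Plain unpacking: each name receives the item at the same position.
    x, y, z = 1, 2, 3

    # Items on the right may be arbitrary expressions.
    x, y = z - 2, 5 * 5          # x is 1, y is 25

    # Python evaluates the whole right-hand side first, so this really
    # swaps the values (the item-by-item proposal above would not).
    x, y = y, x                  # x is 25, y is 1

    # Surplus items are not dropped silently; a starred name collects them.
    first, second, *rest = (1, 2, 3, 4, 5)

    # A plain length mismatch raises ValueError instead of truncating.
    try:
        a, b = (1, 2, 3)
        mismatch_rejected = False
    except ValueError:
        mismatch_rejected = True

    return x, y, first, second, rest, mismatch_rejected
```

Whether truncation or an error is the better behaviour is a design choice; the proposal above picks truncation because it makes optional return values cheap.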

Now that there are tuples it becomes easy to extend the function syntax to have several return values - just return a tuple:
// implementation:
(int, bool) parseInt(String str) {
	char[] s = str.ToCharArray();
	int r = 0;
	foreach (char c in s) {
		if ((c < '0') || (c > '9'))
			return (0, false);
		r = r * 10 + (c - '0');
	}
	return (r, true);
}

// invocation:
(int v, bool s) = parseInt("2004");
What I like most about that syntax is that it makes the code more compact. In this example it removed 3 lines-of-code. It is also a nice solution for optional return values.
If you don’t need the second return value, just write
int v = parseInt("2004");
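The same function can be sketched in Python, where returning a tuple is already the normal way to get several values out of a function. This is a direct port of the parseInt() logic above, not a recommendation for parsing numbers in Python; the name `parse_int` and the variable names are chosen for this example only.

```python
from typing import Tuple


def parse_int(text: str) -> Tuple[int, bool]:
    """Return (value, success), mirroring the parseInt() examples above."""
    result = 0
    for ch in text:
        if not ("0" <= ch <= "9"):
            return 0, False          # not a digit: report failure
        result = result * 10 + (ord(ch) - ord("0"))
    return result, True


# Both results are unpacked at the call site ...
value, ok = parse_int("2004")

# ... or the flag is discarded explicitly when it is not needed.
value_only, _ = parse_int("12x")
```

Unlike the proposed syntax, Python does not let you write `v = parse_int("2004")` to get just the first item; that would bind the whole tuple, so the unused item has to be discarded explicitly.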
You can name the return value and then use it like a C# reference argument. The C# function
void inc2(ref int number1, ref int number2) {
	number1++;
	number2++;
}
could be written as
(int number1, int number2) inc2(int number1, int number2) {
	number1++;
	number2++;
}
Note that input and output values have the same name and no return statement is needed, since the return values are named and already set. When you name output variables you can also combine initialized return values with the return statement. Here’s an alternative implementation for parseInt():
(int r = 0, bool success = true) parseInt(String str) {
	char[] s = str.ToCharArray();
	foreach (char c in s) {
		if ((c < '0') || (c > '9'))
			return (0, false);
		r = r * 10 + (c - '0');
	}
}
That saves another two lines of code. Since a tuple may also consist of just a single item, it’s possible to use the same syntax for a function with only one return value:
(int r) add(int a, int b) {
	r = a+b;
}
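Named return values with defaults have a rough counterpart in Python's `typing.NamedTuple`. The sketch below is only an approximation of the idea: the outputs get names and default values, but they are fields of a small result type rather than local variables of the function. `ParseResult` and `parse_int_named` are hypothetical names.

```python
from typing import NamedTuple


class ParseResult(NamedTuple):
    # Named outputs with defaults, like "(int r = 0, bool success = true)".
    value: int = 0
    success: bool = True


def parse_int_named(text: str) -> ParseResult:
    value = 0
    for ch in text:
        if not ("0" <= ch <= "9"):
            return ParseResult(success=False)   # value keeps its default 0
        value = value * 10 + (ord(ch) - ord("0"))
    return ParseResult(value=value)             # success keeps its default


result = parse_int_named("2004")
number, ok = result            # still unpacks like a plain tuple
```

The caller can pick whichever style reads better: `result.value` and `result.success` by name, or positional unpacking as before.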
To summarize, I think that tuples are a better solution for the multiple-return-value problem than argument references. They feel more natural because they bundle input and output values in declaration and invocation. The concept is closer to Smalltalk’s concept of messages, which makes it easier to create bindings to message-based protocols like SOAP. And last but not least it would help you to write shorter code.



2004/01/17
Categorizing Classes without Namespaces or Packages
This time it started with a thread on kde-core-devel: I wrote about using classes for organizing functions and later wondered why I am using static class methods, since that’s what C++ has namespaces for. I think the answer lies somewhere between the way I am using namespaces and the way tools like Doxygen organize the documentation.
I am using namespaces in a Java-package-like way. They are a coarse categorization for classes, with 5-50 classes per namespace. That’s what I am used to from Java, and it makes a lot of sense for classes. It would be possible to collect functions in namespaces instead of classes, but then you would have to browse through both the classes and the namespaces in Doxygen-generated documentation. Having two kinds of categorization is just too much; it makes documentation too hard to read and code too hard to find.
The C++ syntax does not really help you with organizing your code or reading other people’s code, especially if you are not using an IDE. It’s hard to find out in which file a symbol is declared, and even more difficult to find out where it is implemented. Functions and global variables are the worst: unlike classes they usually do not have their own header. Avoiding these problems requires quite a lot of discipline, because you need to keep a consistent naming scheme for all files. This does not help when you work with someone else’s sources though.
Another part of C++ that I don’t like is the separation of declarations/headers and implementation. I hate typing more than necessary, syncing headers with the implementation can be annoying, and the stupid #ifdef protectors that you need to write in every header are just braindead.
Two features in Java that work really well are packages and the source file layout. When you have a class My.Library.Helper you are required to write both implementation and declaration into a file My/Library/Helper.java. This makes it easy to locate the implementation. As everything in Java is in a class, there are no problems with locating functions. They are static methods and can only be invoked by specifying the class name (e.g. Math.round()). This is annoying to type, but makes reading someone else’s source code much easier. Unfortunately Sun recommends using a reverse DNS name as the package name, and when you follow that advice you have to hide your source code somewhere in a deep directory hierarchy. Unless you are writing a library I would suggest ignoring Sun’s advice and using a short one-step package name.
Java 1.5 shows that Sun recently got hit by a healthy dose of reality, and one of the results is ’static imports’. If you declare import static java.lang.Math.*; you do not need to write Math.cos() any longer, you can just write cos(). This makes it a little bit more difficult to find the method’s declaration (you need to look out for static imports), but I can understand Sun’s decision. The lack of static imports caused people to do strange things: many people inherited from classes that contained useful static methods only to avoid writing the class prefix for every invocation. Static imports are a nice compromise between the ease of use of C++’s uncategorized functions and Java’s “everything is a class method” principle.
To make it short, I would choose Java’s package system over C++’s header/namespace system any time. But when I thought about it I came to an interesting question: why does Java differentiate between packages and classes? A class with the full path My.Library.Helper could be a class Helper in the package My.Library, or a class Library.Helper in the package My. Inner/sub classes and packages use the same naming scheme, and they are identical for the API’s user, so why should there be a difference in the declaration? Isn’t it possible to have only one? I think that with a few tweaks you can get rid of the package mechanism and make everything a class.
There are three small problems to be solved: The first problem is how to define a class My.Library.Helper. The traditional Java syntax would be
class My {
	class Library {
		class Helper {
		}
	}
}
But that would be annoying to type (even if not annoying enough to stop the C++ guys from using a similar syntax for namespaces). So just do the obvious and create an alternative syntax that lets the developer specify the full class name, as C# does for namespaces:
class My.Library.Helper {
}
Problem one solved.
The second issue is that Java allows only one “class Name {}” declaration to specify a class. You can’t do this when the class mechanism is also used for categorization purposes, as you don’t want to specify all sub-classes in a single file. Thus you need to allow more than one declaration for a class, and the compiler needs to merge them like C#’s ‘partial’ class modifier does. This requires a change to the source file naming scheme: a class with the name My.Library.Helper can be defined in all files that have either the name My/Library/Helper.java or My/Library/Helper/*.java. If a class is a regular class, a developer will choose the former scheme. When the class is used like a package, containing only sub-classes, the latter scheme is used. In some cases it may also make sense to have both. Both schemes allow developer and compiler to easily find all declarations of a class, and doing so is certainly not more difficult than with Java’s current package/inner-class mix. Problem two solved.
Problem three is imports. So far Java allowed you to import either a single class (’import My.Library.Helper’) or a whole package (’import My.Library.*’). Exactly the same syntax can be kept for sub-classes: a simple name imports a single class, an appended ‘*’ imports all sub-classes. Problem three solved.
To make it short, there should be no need for packages or namespaces. With a few small changes in the language the class is sufficient as a single way of organizing methods, variables and other classes.
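The three steps can be imitated, very loosely, in Python, where a class body may contain other classes and a class can be extended after its definition. This is only a sketch of the idea under those assumptions, not the proposed language feature: attaching `_Parser` afterwards stands in for a second, merged declaration, and a plain alias stands in for an import. All names except My.Library.Helper are invented for the example.

```python
class My:
    """Used purely as a category, the way a package would be."""

    class Library:
        class Helper:
            @staticmethod
            def describe() -> str:
                return "My.Library.Helper"


# A second "declaration" of My.Library, as if it came from another file:
# the new sub-class is merged into the existing class, not replacing it.
class _Parser:
    @staticmethod
    def describe() -> str:
        return "My.Library.Parser"


My.Library.Parser = _Parser

# "import My.Library.Helper" becomes a plain alias for a single class ...
Helper = My.Library.Helper

# ... and "import My.Library.*" collects every sub-class of the category.
imported = {name: member for name, member in vars(My.Library).items()
            if isinstance(member, type)}
```

For the user of the API, `My.Library.Helper` looks the same whether `My.Library` is a package or a class, which is the point of the argument above.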



2004/01/11
Combining the Advantages of Qt Signal/Slots and C# Delegates/Events
My favorite Qt feature is the Signal/Slots mechanism. Before working with Qt I only knew the horrors of Java’s event handling by implementing interfaces, and libraries that worked only with simple functions but not with class methods. Qt was the first library that allowed you to handle an event in a method of a specific object instance, which is what you actually want most of the time.
Unfortunately Qt Signal/Slots are not really well integrated into the language. This is mainly due to the preprocessor mechanism that works only on the declarations. Thus the declaration syntax is fine, but the use of signals and slots in the definition has some problems and limitations:
  • It lacks compile-time checks. You can misspell signal or slot names, you can connect a signal to a slot even when their signatures do not match, and so on. This happened to me far too often, and I only noticed the problem when the application was running. In most simple cases it is not so bad: when all signals/slots are connected at start-up you can see the errors on STDOUT (a bad solution IMHO, it should throw an exception or exit). But it gets worse as you write more complex code that connects signals/slots at a later point. I had this several times; for example, my Desktop Sharing client rewires signals/slots on certain events. Then a wrong connect() becomes really ugly, because it may only trigger a bug in rare situations and the error is only visible when you watch STDOUT.
  • You can’t use slots as targets for callbacks or invoke a slot by name. This was certainly not a design goal of Qt signal/slots, but it makes the mechanism less powerful than C#’s delegates and creates the need for a second mechanism.
  • The connect syntax is unnecessarily complicated because of the SIGNAL()/SLOT() macros. Not that bad, but the syntax could be easier if it were integrated into the language.
  • Another minor syntax problem is the slot keyword. It should not be required to declare a slot, it would be easier to be able to connect any method. It happened to me more than once that I needed to write a slot that did nothing but call a function.
Before I continue to show C#’s delegate feature, here is a piece of code that uses signal/slots in Qt. I will use it as a reference to show other syntaxes:
class Counter : public QObject {
	Q_OBJECT
private:
	int mValue;

public:
	Counter() : mValue(0) {}
	int get() const { return mValue; }
	void set(int v) { mValue = v; emit changed(v); }
	void inc() { emit changed(++mValue); }
	void dec() { emit changed(--mValue); }

signals:	// signals are declared without an access specifier
	void changed(int newValue);
};

class CounterTest : public QObject {
	Q_OBJECT
private:
	Counter mCounter;
public:
	CounterTest() {
		// signal and slot names are only checked at run-time
		connect(&mCounter, SIGNAL(changed(int)),
			this, SLOT(counterChanged(int)));
	}

	void start() {
		mCounter.set(5);
		mCounter.dec();
		mCounter.inc();
	}

public slots:
	void counterChanged(int newValue) {
		qDebug("The Counter changed, new value is %d.", newValue);
	}
};

int main() {
	CounterTest ct;
	ct.start();
	return 0;
}
As you can see, it creates a Counter class that uses a Qt signal to notify slots when the counter’s value has changed. CounterTest creates a Counter, connects a slot that notifies the user of changes and then modifies the counter a few times.
Delegates/Events in C#
Delegates are a C# mechanism for safe callbacks to object instance methods. Many people use the word delegate for two things, which confused me a lot at the beginning. It becomes easier to understand when you differentiate between these two: a delegate type describes the signature of a method. The delegate type is not restricted to a class, only the signature of the method matters. A delegate instance refers to a specific method of a specific object instance. Every delegate instance has a delegate type. The declaration of a delegate type looks like this:
delegate int IntModifierCallback(int newValue);
This line declares a delegate type called ‘IntModifierCallback’ that takes an integer as its argument and returns an int. The declaration looks as if delegates were another native C# type, comparable to C++ function pointers, but actually delegate type declarations are just syntactic sugar for a regular class declaration. This is a simplified version of the generated code:
sealed class IntModifierCallback : System.Delegate {
	public IntModifierCallback(object o, System.IntPtr functionPtr) { /* some code here */ }

	public virtual int Invoke(int newValue) { /* some code here */ }
}
The constructor of the generated class takes the object instance as the first argument and a function pointer as the second. ‘Invoke’ always matches the delegate type’s signature and is what runs when the delegate instance is invoked. In source code a delegate instance is created like any other object instance, using the new keyword; you name the method and the compiler fills in both constructor arguments. The instance can then be called like a regular function:
using System;

delegate int IntModifierCallback(int newValue);

class SomeClass {
	public int square(int v) { return v*v; }
}

class Test {
	static void Main() {
		SomeClass sc = new SomeClass();
		// the compiler passes 'sc' and a pointer to 'square' to the constructor
		IntModifierCallback cb = new IntModifierCallback(sc.square);

		Console.WriteLine("4*4 is " + cb(4));
	}
}

The C# equivalent of a Qt signal is an ‘event’. To use events you need to declare a multicast delegate type. It has exactly the same syntax as a regular delegate, and its return type should be void, because a single return value makes little sense when several methods are called. The generated class derives from System.MulticastDelegate (in the released .NET framework this is true of every delegate type, whatever its return type). Multicast delegates have an additional feature: you can add delegate instances to and remove them from a multicast delegate. This allows you to call several delegates with a single invocation, just like Qt allows you to connect several slots to a signal and then invoke them all by emitting the signal once:
// Multicast delegate: the return type is void
delegate void PrintANumber(int value);

class SomeClass {
	public void printDecimal(int v) { Console.WriteLine("Decimal: {0}", v); }
	public void printHex(int v) { Console.WriteLine("Hexadecimal: {0:x}", v); }
}

// ...
class Test {
	static void Main() {
		SomeClass sc = new SomeClass();

		PrintANumber cb = null;
		cb += new PrintANumber(sc.printDecimal);
		cb += new PrintANumber(sc.printHex);
		cb(10);	// calls printDecimal, then printHex
	}
}
A delegate instance is added using the ‘+=’ operator and removed using ‘-=’. Note that ‘cb’ is set to null and the first ‘+=’ will create an instance for ‘cb’.
The last element needed to get a Signal/Slot-like mechanism is the ‘event’ keyword. You could also expose a multicast delegate instance as a property and add your event listeners using ‘+=’, but then anyone could overwrite the delegate instance with ‘=’ and thereby delete the list of delegate instances. The ‘event’ keyword works almost like a property, but has two important differences: other classes can only use the ‘+=’ and ‘-=’ operators, but cannot invoke or replace the delegate. And you do not need to write accessor methods for the property, they are created automatically (you can write them yourself if you want though). So here is the Qt example rewritten in C#:
using System;

delegate void CounterEvent(int newValue);

class Counter {
	private int mValue;

	public event CounterEvent changed;

	public Counter() { mValue = 0; }
	public int get() { return mValue; }
	// note: raising an event that has no listeners fails, it is null then
	public void set(int v) { mValue = v; changed(v); }
	public void inc() { changed(++mValue); }
	public void dec() { changed(--mValue); }
};

class CounterTest {
	private Counter mCounter = new Counter();

	public CounterTest() {
		mCounter.changed += new CounterEvent(this.counterChanged);
	}

	public void start() {
		mCounter.set(5);
		mCounter.dec();
		mCounter.inc();
	}

	public void counterChanged(int newValue) {
		Console.WriteLine("The Counter changed, new value is {0}.", newValue);
	}

	static void Main() {
		new CounterTest().start();
	}
};
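To make the mechanics of a multicast delegate easier to see, here is a minimal Python sketch of the same Counter example. The `Event` class is an invention for this illustration, not part of any library: it keeps a list of handlers, supports ‘+=’ and ‘-=’, and calls every handler when invoked. Unlike a real C# event it does nothing to stop other classes from invoking or replacing it.

```python
class Event:
    """Minimal multicast callback list; a sketch, not a library API."""

    def __init__(self):
        self._handlers = []

    def __iadd__(self, handler):        # event += handler
        self._handlers.append(handler)
        return self

    def __isub__(self, handler):        # event -= handler
        self._handlers.remove(handler)
        return self

    def __call__(self, *args):          # invoke every handler in order
        for handler in list(self._handlers):
            handler(*args)


class Counter:
    def __init__(self):
        self._value = 0
        self.changed = Event()

    def set(self, value):
        self._value = value
        self.changed(value)             # "emit" the new value

    def inc(self):
        self.set(self._value + 1)

    def dec(self):
        self.set(self._value - 1)


class CounterTest:
    def __init__(self):
        self.seen = []                  # records every notification
        self.counter = Counter()
        self.counter.changed += self.counter_changed

    def counter_changed(self, new_value):
        self.seen.append(new_value)

    def start(self):
        self.counter.set(5)
        self.counter.dec()
        self.counter.inc()
```

After `CounterTest().start()` the `seen` list holds one entry per change, in the order the counter was modified.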
Improving the C# syntax
C# delegates are more powerful than Qt Signal/Slots, but I think that they can be improved:
  • All delegate types should be anonymous, and there should be only one type for each signature. Thus all delegates with identical signature always have the same type. This is a prerequisite for the following points.
  • Creating a delegate instance in C# is relatively complicated: it needs an explicit ‘new’ expression, and the generated constructor relies on function pointers. An easier solution is to get a delegate instance by accessing the method like a field/property. Thus if you have a reference to a “CounterTest” object called ‘ct’, you can get a delegate to its counterChanged() method using “ct.counterChanged”.
  • Instead of declaring delegate types you can define a delegate ‘prototype’ that defines the function signature. The prototype is just a moniker for the anonymous type, not a real type declaration, but more like a typedef. Thus several prototypes with the same function signature but different names are still compatible. The name of the prototype should not be exported or displayed as part of the API. It’s valid for the whole source file when defined at the top, or valid for the class if defined in the class body. The syntax is like the delegate type definition, but with a ‘prototype’ keyword instead of ‘delegate’.
  • Events can alternatively be declared with the signature written inline, in the same form as a delegate declaration, and thus without declaring a prototype. This is the most common use of delegates, and with this feature you rarely need to define a prototype.
  • The ‘delegate’ keyword is not needed anymore.
With these changes the Qt example looks like this:
class Counter {
	private int mValue;

	public event void changed(int newValue);

	public Counter() { mValue = 0; }
	public int get() { return mValue; }
	public void set(int v) { mValue = v; changed(v); }
	public void inc() { changed(++mValue); }
	public void dec() { changed(--mValue); }
};

class CounterTest {
	private Counter mCounter = new Counter();

	public CounterTest() {
		// the method name alone yields the delegate instance
		mCounter.changed += counterChanged;
	}

	public void start() {
		mCounter.set(5);
		mCounter.dec();
		mCounter.inc();
	}

	public void counterChanged(int newValue) {
		Console.WriteLine("The Counter changed, new value is {0}.", newValue);
	}

	static void Main() {
		new CounterTest().start();
	}
};
Note that there is no delegate or prototype declaration needed anymore. The delegate type does not need a name. Here is the first C# example modified to use the new syntax:
prototype int IntModifierCallback(int newValue);

class SomeClass {
	int square(int v) { return v*v; }
}

class Test {
	void Main() {
		SomeClass sc = new SomeClass();
		IntModifierCallback cb = sc.square;
		
		Console.WriteLine("4*4 is " + cb(4));
	}
}
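Python happens to treat methods this way already, which makes it a handy way to try the idea out. In this small sketch, accessing `sc.square` without calling it yields a bound method object that remembers both the instance and the function, so it behaves like the delegate instance in the example above. The attribute names `__self__` and `__func__` are standard Python; everything else mirrors the example.

```python
class SomeClass:
    def square(self, v: int) -> int:
        return v * v


sc = SomeClass()

# Accessing the method like a field yields a callable bound to 'sc'.
cb = sc.square

# It can be stored, passed around and invoked like a regular function.
result = cb(4)

# The bound method carries the target object and the underlying function.
target = cb.__self__
function = cb.__func__
```

No constructor call and no function pointer appear at the call site, which is what the friendlier instance-creation syntax is meant to achieve.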

I think that the new syntax has several advantages:
  • It frees the developer from naming delegate types. Finding good delegate type names turned out to be quite difficult, and people started to use schemes that just describe the prototype (like IntIntToBool), which is just plain stupid. Prototype names do not matter because they won’t be exposed as API and are not needed for events.
  • The keyword ‘delegate’ was confusing, because it suggested that you create a delegate (=instance), but you created a delegate type. It is like using an ‘object’ keyword instead of ‘class’. ‘prototype’ is clearer.
  • It works fine in languages like Java that do not have function pointers (which the generated C# delegate constructor takes as its second argument).
  • The syntax for the delegate instance creation is friendlier.
  • It cannot happen that the same function signature has two incompatible delegate types.
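The "one type per signature" rule can be illustrated with Python's type hints, where a callable type is described only by its signature and a name for it is just an alias. This is a sketch of the typing idea, under the assumption that a static type checker would enforce it; `IntToInt`, `apply_twice` and `square` are names made up for the example.

```python
from typing import Callable

# Two "prototypes" with different names but the same signature.
IntModifierCallback = Callable[[int], int]
IntToInt = Callable[[int], int]


def apply_twice(callback: IntModifierCallback, value: int) -> int:
    # Any callable with the matching signature is accepted.
    return callback(callback(value))


def square(v: int) -> int:
    return v * v


# Both aliases describe the very same type.
same_type = IntModifierCallback == IntToInt

result = apply_twice(square, 3)
```

Because the alias names never become part of the type, a function declared against `IntToInt` accepts everything written for `IntModifierCallback`, so two incompatible delegate types for one signature cannot arise.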



 

This blog is my dumping ground for thoughts and ideas about Eek. Someday Eek will be a programming language and system, somewhat comparable to Java in scope. It is my attempt to bring sanity to the world of computing.
At least I hope so. Right now it is far from being finished and I can't guarantee that it ever will be. I am still working on the specification, but I won't release anything before I got my first prototype running. The world does not need more vapourware and unusable beta-software. All publicly available information about Eek is contained in this blog. You can find the latest summary here.
This work is licensed under a Creative Commons License.