Memoization explained part 3: implementation


4 April 2012, by

previous article in series

Last time we looked at the memoization interface through some unit tests. Today, it’s time to start implementing the basic memoizer.

First of all, as we’ve seen in our tests, when nulls are encountered they should throw an ArgumentNullException. This would also appear in the GetMemoizer function’s XML documentation.
So the first line of every GetMemoizer overload will be pretty simple:

	if (func == null) {
		throw new ArgumentNullException("func");
	}

Now let’s start implementing the bulk of the method. We’ll start with the simple, parameter-less memoizer.

First, we need a place to put the result. Since the result type is not constrained to a class, we’ll need to use default(TResult) and not null.

However, this still doesn’t help us to ascertain whether we’ve calculated the result already. What if the function we’re memorizing returns the default value? In that case, we’d call it again and again even though we’ve already calculated the result.

TResult result = default(TResult);
bool calculated = false;

Now, the function we return will do something very simple: If we’ve calculated the value already we return it. Otherwise, we calculate it, save the result, and return it.

if (calculated) {
	return result;
} else {
	result = func();
	calculated = true;
	return result;
}

Now all we need to do is wrap this lovely method in something that would hold the values of result, calculated and func. We could create some object that we initialize with the correct values, something like this:

public static Func<TResult> GetMemoizer<TResult>(Func<TResult> func) {
	if (func == null) {
		throw new ArgumentNullException("func");
	}

	return new Memo<TResult>(func).Method;
}

public class Memo<TResult> {
	private Func<TResult> func;
	private TResult result;
	private bool calculated;

	public Memo(Func<TResult> func) {
		this.func = func;
		calculated = false;
	}

	public TResult Method() {
		if (calculated) {
			return result;
		} else {
			result = func();
			calculated = true;
			return result;
		}
	}
}

This passes all the tests – but it could be more readable. Happily for us, C# has anonymous methods, and with the lambda syntax (and automatic closures) it is much easier to follow the flow of this method:

public static Func<TResult> GetMemoizer<TResult>(Func<TResult> func) {
	if (func == null) {
		throw new ArgumentNullException("func");
	}

	TResult result = default(TResult); // compiler requires this line
	bool calculated = false;

	return () => {
		if (calculated) {
			return result;
		} else {
			result = func();
			calculated = true;
			return result;
		}
	};
}

Note: we need the line “result = default(TResult);” to stop the compiler complaining that result may not have a value assigned to it. Although a human can spot that this would never happen in practice, the compiler cannot*.

Now let’s move on to a memoizer for a parameterized function.

Basically, this would be the same thing; except this time, instead of having a single value per function, we’d have a value for any parameter value that has been calculated. This is normally referred to as a mapping, or in .net lingo, a dictionary. A nice feature of this mapping is that it also makes a distinction between null or default values and missing values, so we don’t need an extra mapping for the boolean (i.e. to parallel the calculated from the previous method).

This makes the function fairly straightforward – the difference between this one and the previous one are quite small.

public static Func<T, TResult> GetMemoizer<T, TResult>(Func<T, TResult> func) {
	if (func == null) {
		throw new ArgumentNullException("func");
	}

	Dictionary<T, TResult> results = new Dictionary<T, TResult>();

	return key => {
		TResult result;
		if (results.TryGetValue(key, out result)) {
			return result;
		} else {
			result = func(key);
			results.Add(key, result);
			return result;
		}
	};
}

The attentive reader would notice that the first GetMemoizer could be implemented in terms of the second one – by simply memorizing a function that ignores its parameter, and always checking the same parameter in the dictionary, e.g. always calling it with 0. This would definitely work – this seemingly duplicated code could be thought of as a mere optimization.

Much like we can implement the parameter-less GetMemoizer in terms of the single-parameter one, we can also implement multi-parameter memoizers the same. We’ll see this and more in our next post.

Reader exercise: Suppose we memoize a parameter-less function that may refer to a great deal of data via its target or closures. How would you optimize the memoized function’s memory footprint so that this data is not kept when it is no longer needed? Please put your thoughts in the comments, I’d be interested to hear them!

* Eric Lippert has discussed before why this is beyond the compiler’s capabilities with a different example; while it’s possible to detect this situation, “one of the goals of the standardization process is to make rules that are clear and unambiguous” and this feature would complicate the standard without much benefit.

Tags: , ,

Categories: Technical

«
»

Leave a Reply

* Mandatory fields


one × 1 =

Submit Comment