Memoization explained – part 1


2 September 2011, by

Memoization is a way to optimize your code by having complicated functions cache their return values and reuse them for subsequent calls using the same inputs.

I’ve built a small Memoization library and will explore its usage in a series of articles. These will cover:

  • The interface design
  • Unit Tests
  • Simple memoizers
  • Advanced memoizers
  • Tradeoffs of memoizing.

Today, I would like to introduce the interface design. There is only one public static overloaded method, named Memoizer.GetMemoizer(). The function to be memorized is passed to this method as a parameter.

The memoizer returns a reference to a new function wrapping a call to the original function (and with the same signature).  This new function checks if it has previously processed the original function for the supplied input parameters, returning the previous results if so. If not it calls the original function with these inputs and caches the results.

The GetMemoizer method is generic, and overloaded by a number of input parameters. Currently, there are up to three supported parameters for a function, but this can easily be increased.
The overloads are:

public static class Memoizer {
 public static Func<TResult> GetMemoizer<TResult>(Func<TResult> func);
 public static Func<T, TResult> GetMemoizer<T, TResult>(Func<T, TResult> func);
 public static Func<T1, T2, TResult> GetMemoizer<T1, T2, TResult>(Func<T1, T2, TResult> func);
 public static Func<T1, T2, T3, TResult> GetMemoizer<T1, T2, T3, TResult>(Func<T1, T2, T3, TResult> func);
}

Let’s start by looking at the first overload. The parameter is a function that accepts no arguments, and returns some TResult.

For example, say I need pi to a million decimal places. I’ll keep it as a string to keep things simple.

public static string GetPi() {
 // This is probably the fastest way to calculate pi to a million places, as long as we have an Internet connection
 byte[] data;
 using (System.Net.WebClient client = new System.Net.WebClient())
 // download the value from http://www.piday.org/
 data = client.DownloadData("http://www.piday.org/includes/pi_to_1million_digits_v2.html");
 // convert the data to characters
 char[] chars = System.Text.Encoding.UTF8.GetChars(data);
 // we skip all characters until the first '3' that marks the beginning of the number
 // then, we filter only digits and the decimal period.
 IEnumerable<char> filtered = chars
   .SkipWhile(c => c != '3')
   .Where(c => (c >= '0' && c <= '9') || c == '.');
 // make a string
 return new string(filtered.ToArray());
}

This is a simple method that returns the value for pi to a million places. This is, in fact, a terrible function, but it will serve its purpose. Lucky it’s not in production code!

When we run code that uses this function, we have a noticeable delay while the just-over-1MB file is downloaded from www.piday.org. Suppose we call GetPi multiple times? We would incur the overhead of downloading the file each time. That’s why we want to memoize the result, like so.

public static readonly Func<string> GetPi = Memoizer.GetMemoizer(GetPiInternal);
private static string GetPiInternal() {
 // The exact same code from our previous implementation of GetPi().
}

Now, the first time we call GetPi, it will download the file from www.piday.org. The second time, it will simply return the same precalculated string from the cache. The upside is that calling the function again will be quick. The downside – this 1000002 character string is kept in memory as long as the Func object returned from GetMemoizer(GetPiInternal) is; for the most common use case of assigning a member to this result this will be the lifetime of the containing object.

Next time, I’ll introduce my first test cases, preparing the road for the first implementation.

next article in series

Tags: ,

Categories: Technical

«
»

Leave a Reply

* Mandatory fields


four − = 2

Submit Comment