Don’t be too lazy – LINQ’s lazy evaluation gotchas


13 August 2012


Having discussed how to take advantage of lazy evaluation in LINQ, it only seems right to discuss some of the surprises this might cause you. I’ve picked out a few examples from my own experience of where understanding what’s going on behind the scenes is important to avoid falling into a trap.

Accidental evaluation

The laziness of LINQ is generally a good thing. But it’s quite easy to fall into traps. The simplest example is the following – a one-word tweak on the example in my previous post:

public Beer GetSomeNiceBeer()
{
  return GetBeerList().Where(BeerIsNice).ToList().First();
}

Console.Out.WriteLine("The first nice beer: " + GetSomeNiceBeer().Name);

 

Output
Old Tom's
Young Dan's
Now Beer
The first nice beer: Young Dan's

Observe the output: all three beers have now been tested. The only difference in the source code is the ToList(). We’ve taken the lazy IEnumerable<Beer> returned by Where and forced it into a List, and a List has to hold every one of its items, so every beer must be tested in order to build it. So although First stops as soon as it finds a nice beer, it’s too late for the Where, which has already been fully evaluated.

Depending on circumstances, this won’t necessarily be a bad thing. After all, you might want to check all that beer for some reason (although if your code is free from side-effects, it shouldn’t be necessary). But it’s sometimes important to know whether you’re forcing something to evaluate itself or not:

  • Maybe your Where clause is very processor-intensive – you gain some efficiency savings by avoiding executing the condition too often.
  • If you’re using LINQ-to-SQL (or something similar), you might find yourself downloading the entire database into memory by forcing evaluation too early. Leave it to the last possible minute, so the actual database query is as specific as possible. (Yes, LINQ-to-SQL is clever enough to turn your First into a SELECT TOP 1… in the SQL it executes on the database server.)

Understand when LINQ is being lazy and when it’s not, and you’ll probably be fine.
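
To make the difference measurable, here is a minimal sketch that counts how often the condition runs with and without the ToList(). The beer names are the ones from the output above; the IsNice predicate and the class and method names are hypothetical stand-ins, since BeerIsNice itself isn't shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AccidentalEvaluationDemo
{
    private static readonly string[] BeerNames = { "Old Tom's", "Young Dan's", "Now Beer" };

    // Hypothetical predicate (an assumption, not the article's BeerIsNice).
    private static bool IsNice(string name)
    {
        return name.StartsWith("Young", StringComparison.Ordinal);
    }

    // Where is lazy, so First pulls one item at a time and stops at the first match.
    public static int ChecksWhenLazy()
    {
        int checks = 0;
        BeerNames.Where(name => { checks++; return IsNice(name); }).First();
        return checks;
    }

    // ToList drains the whole filtered sequence before First sees anything.
    public static int ChecksWithToList()
    {
        int checks = 0;
        BeerNames.Where(name => { checks++; return IsNice(name); }).ToList().First();
        return checks;
    }

    public static void Main()
    {
        Console.WriteLine("Checks when lazy: " + ChecksWhenLazy());       // 2
        Console.WriteLine("Checks with ToList: " + ChecksWithToList());   // 3
    }
}
```

With only three beers the saving is trivial, but the same ratio applies when the condition is expensive or the source is a database table.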

Multiple evaluation

On the flip side, you may want to ensure that you force your expression to evaluate. In particular, some IEnumerables can only be enumerated once. Consider the following slightly contrived example:

private static IEnumerable<char> Keystrokes()
{
  while (true)
  {
    yield return Console.ReadKey(true).KeyChar;
  }
}

var firstFourKeystrokes = Keystrokes().Take(4);

if (firstFourKeystrokes.SequenceEqual(new[] { 'b', 'e', 'e', 'r' }))
{
  firstFourKeystrokes.ToList().ForEach(Console.Write);
}

If you type in “beer”, you probably want to see “beer” printed out to the screen. But you won’t – you will in fact see nothing, because the second use of firstFourKeystrokes is waiting for another four keystrokes to appear.

The solution is simple – just change the definition of firstFourKeystrokes to:

var firstFourKeystrokes = Keystrokes().Take(4).ToList();

Now that you’ve forced evaluation, your list of keystrokes will be secure.
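
The same effect can be reproduced without a keyboard. The sketch below is an illustration under assumptions: a Queue<char> stands in for Console.ReadKey, and the class and method names are hypothetical. Each enumeration consumes characters from the shared queue, so a second pass over the lazy sequence sees different input.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MultipleEnumerationDemo
{
    // Hypothetical stand-in for the Console.ReadKey loop: every enumeration
    // dequeues from the same queue, so the input can only be read once.
    private static IEnumerable<char> Keystrokes(Queue<char> typed)
    {
        while (typed.Count > 0)
        {
            yield return typed.Dequeue();
        }
    }

    // Enumerates the lazy sequence twice; the second pass reads fresh input.
    public static string ReadTwiceLazily(string input)
    {
        var typed = new Queue<char>(input);
        IEnumerable<char> firstFour = Keystrokes(typed).Take(4);
        string firstPass = new string(firstFour.ToArray());
        string secondPass = new string(firstFour.ToArray());
        return firstPass + "|" + secondPass;
    }

    // Forces evaluation once with ToList; both passes read the stored list.
    public static string ReadTwiceFromList(string input)
    {
        var typed = new Queue<char>(input);
        List<char> firstFour = Keystrokes(typed).Take(4).ToList();
        string firstPass = new string(firstFour.ToArray());
        string secondPass = new string(firstFour.ToArray());
        return firstPass + "|" + secondPass;
    }

    public static void Main()
    {
        Console.WriteLine(ReadTwiceLazily("beerwine"));    // beer|wine
        Console.WriteLine(ReadTwiceFromList("beerwine"));  // beer|beer
    }
}
```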

My favourite Visual Studio add-in, ReSharper, will conveniently warn you about this potential problem:

[Screenshot: Multiple Enumeration in ReSharper]

Pitfalls while debugging

There’s an even more insidious case you should be aware of, which even ReSharper won’t catch. Suppose you put a breakpoint on the if-test in the above example, and examine the contents of firstFourKeystrokes in the debugger. Doing this will force evaluation, leaving you with potentially unexpected behaviour! In this particular example it’s quite obvious what’s happening because you’ll have to press four more keys than you expected, but it’s easy to be caught out by this.

My other favourite debugging headache is simpler, and is a much more direct result of lazy evaluation. Put a breakpoint on the Console.ReadKey line in the above example – where will it get hit? On reflection it’s fairly obvious that this first happens when execution reaches the SequenceEqual line, because that is the first point at which the sequence is enumerated. In the heat of a debugging session, though, it’s easy to forget this and expect the breakpoint to be hit when the Keystrokes method is called. Few things throw you off your debugging stride like code that doesn’t run when you expect it to, leaving you with a result that looks “impossible”.
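
If you'd rather see the ordering than take it on trust, a log works as well as a breakpoint. This is a small sketch with a hypothetical Numbers iterator and Log list (neither is from the example above): calling the iterator method runs none of its body, and the first entry from inside it only appears once something asks for an element.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredIteratorDemo
{
    private static readonly List<string> Log = new List<string>();

    // Hypothetical iterator: the log entry marks when the body really starts.
    private static IEnumerable<int> Numbers()
    {
        Log.Add("iterator body started");
        yield return 1;
        yield return 2;
    }

    public static List<string> Run()
    {
        Log.Clear();
        Log.Add("before call");
        IEnumerable<int> numbers = Numbers();   // nothing inside Numbers runs yet
        Log.Add("after call");
        int first = numbers.First();            // the body runs here, up to the first yield
        Log.Add("after First: " + first);
        return new List<string>(Log);
    }

    public static void Main()
    {
        // Prints: before call, after call, iterator body started, after First: 1
        Console.WriteLine(string.Join(", ", Run()));
    }
}
```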

Modified closures

This one’s probably the most subtle. Consider the following example:

public List<Beer> GetNonMatchingBeer(List<string> namesToExclude)
{
  IEnumerable<Beer> beers = GetBeerList();

  foreach (string name in namesToExclude)
  {
    beers = beers.Where(beer => beer.Name != name);
  }

  return beers.ToList();
}

We’re building up a big list of where-clauses – GetBeerList().Where(…).Where(…).etc. It’s all lazily evaluated, so we don’t actually do any filtering by name until the ToList() at the end. So what will the following code return?

GetNonMatchingBeer(new List<string>() {"Old Tom's", "Young Dan's"});

We start with three beers, two of which are in the exclusion list. So what you want it to return is the third beer in the list.

But here’s what it actually returns:

Output
Old Tom's
Now Beer

Why?  It’s all because of lazy evaluation.

On the last line of GetNonMatchingBeer, you have an IEnumerable<Beer> called beers that hasn’t yet been evaluated. Then you call ToList() on it, which evaluates it. Now, and only now, does that beer => beer.Name != name function get invoked. The key to the problem is understanding what name represents at the time that you evaluate the function. The first Where method was called at a time when name == "Old Tom's". However, by the time it’s executed, name == "Young Dan's" because you’ve proceeded around the foreach-loop one more time. So effectively, you’re applying the same where-condition twice, rather than applying two different ones.

There’s a simple fix for this:

public List<Beer> GetNonMatchingBeer(List<string> namesToExclude)
{
  IEnumerable<Beer> beers = GetBeerList();

  foreach (string name in namesToExclude)
  {
    string temporaryName = name;
    beers = beers.Where(beer => beer.Name != temporaryName);
  }

  return beers.ToList();
}

The change is to declare a temporary variable inside the foreach-loop. The scope of this variable is the body of the loop – every time round the loop, you get a brand new temporaryName with no relationship at all to the other times round the loop. Hence each Where function uses a different variable, and you get the expected output:

Output
Now Beer

ReSharper again steps to the fore here, and will warn you about this mistake and automatically correct it for you:

[Screenshot: Modified Closure in ReSharper]

Lesson:  When you see squiggles in Visual Studio, pay attention to them.

There’s an interesting article on closing over the loop variable by Eric Lippert which examines this topic in more detail. Most interestingly, it seems that the plan for C# 5 is to change the behaviour of foreach-loops so that this problem does not occur – a brave decision, since this will break any existing code that wanted this behaviour. Not that I can think of a good example of where this might be the case!
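
On compilers that give foreach a fresh loop variable on every iteration (C# 5 and later), the first version of GetNonMatchingBeer no longer misbehaves, but the same trap is easy to reproduce with any variable that is shared across iterations. A minimal sketch, using plain strings instead of the Beer class and hypothetical method names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ClosureCaptureDemo
{
    private static readonly string[] Beers = { "Old Tom's", "Young Dan's", "Now Beer" };

    // One variable declared outside the loop is shared by every lambda, so all
    // the Where clauses see whatever value it holds when ToList finally runs.
    public static List<string> FilterWithSharedVariable(IEnumerable<string> namesToExclude)
    {
        IEnumerable<string> beers = Beers;
        string name;
        foreach (string current in namesToExclude)
        {
            name = current;
            beers = beers.Where(beer => beer != name);
        }
        return beers.ToList();
    }

    // A variable declared inside the loop body is new on each iteration, so
    // each lambda captures its own value.
    public static List<string> FilterWithPerIterationCopy(IEnumerable<string> namesToExclude)
    {
        IEnumerable<string> beers = Beers;
        foreach (string current in namesToExclude)
        {
            string copy = current;
            beers = beers.Where(beer => beer != copy);
        }
        return beers.ToList();
    }

    public static void Main()
    {
        var exclude = new List<string> { "Old Tom's", "Young Dan's" };
        Console.WriteLine(string.Join(", ", FilterWithSharedVariable(exclude)));   // Old Tom's, Now Beer
        Console.WriteLine(string.Join(", ", FilterWithPerIterationCopy(exclude))); // Now Beer
    }
}
```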

A ReSharper aside

As an aside, you might notice that there’s a second ReSharper squiggle in the above. It’s telling you that you can replace the foreach-loop with a LINQ expression.  If you let it do its stuff, the result looks like this:

beers = namesToExclude.Aggregate(beers, (beersSoFar, name) => beersSoFar.Where(beer => beer.Name != name));

Not bad (I only renamed one variable in the result to make it more readable) – a good demonstration of another LINQ method (Aggregate), but personally I find the foreach-loop clearer in this admittedly rather contrived case.
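
For anyone who wants to try the Aggregate form in isolation, here is a self-contained sketch. It uses plain strings instead of the Beer class, and the Exclude method name is hypothetical. Note that name is a lambda parameter here, so each call gets its own value and the modified-closure problem doesn't arise.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AggregateFilterDemo
{
    // Starts from the full sequence and wraps it in one Where per excluded name.
    // Nothing is filtered until ToList enumerates the finished chain.
    public static List<string> Exclude(IEnumerable<string> beers, IEnumerable<string> namesToExclude)
    {
        return namesToExclude
            .Aggregate(beers, (beersSoFar, name) => beersSoFar.Where(beer => beer != name))
            .ToList();
    }

    public static void Main()
    {
        var beers = new List<string> { "Old Tom's", "Young Dan's", "Now Beer" };
        var exclude = new List<string> { "Old Tom's", "Young Dan's" };
        Console.WriteLine(string.Join(", ", Exclude(beers, exclude)));   // Now Beer
    }
}
```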
