Friday, May 20, 2011

Lambda expressions, anonymous classes and memory leaks.

[Intermediate level]
[This blog post was originally published on an internal company blog on February 20th 2011]


Hi All,



A couple of days back I was writing code as usual and I decided to use a lambda expression as a function. The code I wrote looked something like this:

    public void Moo()
    {
      int x = 10;
      SomeEvent += (i) => { Console.Out.WriteLine(x); };
    }
Then, I stopped for a second. What is going on here? I just used a local variable in a method that is going to be called long after the stack frame where this local variable was defined is gone. This doesn't make any sense. Clearly, once the program counter (instruction pointer) leaves the method Moo, the variable x defined on the stack will be freed. I ran the code and everything worked perfectly, 10 was printed on the screen once SomeEvent was fired. Strange!
I decided to ask a friend... "What is going on here?" I asked. "Maybe it's like an anonymous class," he said. This actually made some sense, so I decided to dig deeper using my trusty (but no longer free) friend: Reflector.

Before I begin copy-pasting IL code, a word about anonymous classes. In .NET 3.5 Microsoft introduced LINQ, which allows easy queries on various types of collections. But they noticed a problem: you can have many different types of queries on the same collection, and every time a result record would look different. For example, say you have a DB table with the columns ID, Name and Title. Sometimes you want to retrieve only the name, sometimes only the title, and sometimes you want to count the number of rows for each name (so the result is (Count, Name)). Previously the user would have to define a new class or struct for each return type, but anonymous classes solve this with the help of the "var" keyword.
(Yes, this is the correct place to use "var" and not when you are too lazy to write the actual variable type!)
I will not give a concrete example of how to use LINQ, but if, for example, I want to define a new class that has two properties, Count and Name, I can do it like this:

var p = new { Name = "Boris", Count = 10 };

Now I can access p.Count or p.Name as usual. Anyway... I wrote a small class to see what IL gets created. The content of the file is:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace ConsoleApplication3
{
  public delegate void HowItWorksHandler(int number);
  class Class1
  {
    public event HowItWorksHandler SomeEvent;
    public Class1()
    {
    }
    public void Moo()
    {
      int x = 10;
      SomeEvent += (i) => { Console.Out.WriteLine(x); };
    }
    public void Honey()
    {
      var p = new { Name = "Boris", Count = 10 };
    }
    public void Sheep()
    {
      SomeEvent += (i) => { Console.Out.WriteLine(i + 1); };
    }
    public void Memory()
    {
      MemoryHolder mh = new MemoryHolder();
      SomeEvent += (x) => { mh.BigMemory = 10;};
    }
    void Class1_How(int number)
    {
      throw new NotImplementedException();
    } 
  }
  public class MemoryHolder
  {
    public int BigMemory;
    public MemoryHolder()
    {
      Console.Out.WriteLine("Memory holder created");
    }
    ~MemoryHolder()
    {
      Console.Out.WriteLine("Memory holder free");
    }
  }
}
Now let's start from the end... For the method Honey, where I used an anonymous class, a new class was created as expected. It was located in the DLL but with no namespace, and had quite a full definition:
[CompilerGenerated, DebuggerDisplay(@"\{ Name = {Name}, Count = {Count} }", Type="<Anonymous Type>")]
internal sealed class <>f__AnonymousType0<<Name>j__TPar, <Count>j__TPar>
{
    // Fields
    [DebuggerBrowsable(DebuggerBrowsableState.Never)]
    private readonly <Count>j__TPar <Count>i__Field;
    [DebuggerBrowsable(DebuggerBrowsableState.Never)]
    private readonly <Name>j__TPar <Name>i__Field;

    // Methods
    [DebuggerHidden]
    public <>f__AnonymousType0(<Name>j__TPar Name, <Count>j__TPar Count);
    [DebuggerHidden]
    public override bool Equals(object value);
    [DebuggerHidden]
    public override int GetHashCode();
    [DebuggerHidden]
    public override string ToString();

    // Properties
    public <Count>j__TPar Count { get; }
    public <Name>j__TPar Name { get; }
}

Nothing of much interest here, but note that the compiler overrides Equals, GetHashCode and ToString with implementations specific to this type. The really interesting method, though, is Moo. Let's look at the IL there:

.method public hidebysig instance void Moo() cil managed
{
    .maxstack 4
    .locals init (
        [0] class ConsoleApplication3.Class1/<>c__DisplayClass1 CS$<>8__locals2)
    L_0000: newobj instance void ConsoleApplication3.Class1/<>c__DisplayClass1::.ctor()
    L_0005: stloc.0 
    L_0006: nop 
    L_0007: ldloc.0 
    L_0008: ldc.i4.s 10
    L_000a: stfld int32 ConsoleApplication3.Class1/<>c__DisplayClass1::x
    L_000f: ldarg.0 
    L_0010: ldloc.0 
    L_0011: ldftn instance void ConsoleApplication3.Class1/<>c__DisplayClass1::<Moo>b__0(int32)
    L_0017: newobj instance void ConsoleApplication3.HowItWorksHandler::.ctor(object, native int)
    L_001c: call instance void ConsoleApplication3.Class1::add_SomeEvent(class ConsoleApplication3.HowItWorksHandler)
    L_0021: nop 
    L_0022: nop 
    L_0023: ret 
}
Look at that: a new class named <>c__DisplayClass1 was defined. It has a field x and a method called <Moo>b__0. When Moo runs, a new instance of this class is created (L_0000) and the value 10 is stored straight into its field x (L_000a), so x no longer lives on the stack at all. The method that is added to the event is <Moo>b__0 of this new instance (L_0011 - L_0017). Now let's look at the class code:

[CompilerGenerated]
private sealed class <>c__DisplayClass1
{
    // Fields
    public int x;

    // Methods
    public void <Moo>b__0(int i)
    {
        Console.Out.WriteLine(this.x);
    }
}
What a nice surprise: the compiler-generated method <Moo>b__0 is exactly the same as our code inside the lambda expression :)
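To make the rewrite concrete, here is a minimal sketch that puts a lambda next to a hand-written equivalent. The names ClosureSketch, DisplayClass and Body are made up for illustration (the real compiler-generated names are the ones shown in the Reflector output above), and Func<int, int> is used instead of an event only to keep the sketch short. Because the local lives in a field of a heap object, a change made after the delegate is created is visible when the delegate runs:

```csharp
using System;

public static class ClosureSketch
{
    // Hand-written stand-in for the compiler-generated <>c__DisplayClass1.
    // The class and method names here are hypothetical.
    private sealed class DisplayClass
    {
        public int x;                       // the "local" now lives on the heap
        public int Body(int i) { return x; }
    }

    // What we write: a lambda that captures the local variable x.
    public static int WithLambda()
    {
        int x = 10;
        Func<int, int> f = i => x;
        x = 42;                             // the lambda sees this change
        return f(0);
    }

    // Roughly what the compiler turns it into.
    public static int ByHand()
    {
        DisplayClass locals = new DisplayClass();
        locals.x = 10;
        Func<int, int> f = locals.Body;     // delegate bound to the helper instance
        locals.x = 42;
        return f(0);
    }
}
```

Both methods return 42, not 10: the lambda captures the variable itself, not a snapshot of its value.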
Now all the code makes sense! To make sure, I added a method named Sheep which does not use a local variable. In this case a new method was added to our existing class (Class1) and it looks like this:

[CompilerGenerated]
private static void <Sheep>b__3(int i)
{
    Console.Out.WriteLine((int) (i + 1));
}
No surprises here.
If you made it to this point then you should be wondering... if this is what happens when I use a local variable in a lambda expression isn't that a huge potential for a memory leak?
Well, the answer is YES! This is a huge potential for a memory leak.
What would happen in this code:

      using (StreamReader reader = new StreamReader("MyFile"))
      {
        SomeEvent += (x) => { string s = reader.ReadToEnd(); };
      }
The reference to the reader was captured in a compiler-generated class that stays alive for as long as the event subscription does, but the reader itself was disposed as soon as the using block ended, so the handler fails when SomeEvent is eventually fired. An even worse case is when you think the variable will get garbage collected but it is actually held by some anonymous method. See the method Memory in the code above, which simulates a memory-holding class: as long as the Class1 instance and its subscription stay alive, the finalizer of the MemoryHolder class is never called after the Memory method is called, no matter how many times you call GC.Collect().
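The disposed-reader case can be reproduced without touching the file system. The sketch below is an illustration under a few assumptions: a StringReader stands in for the StreamReader, a local Action<int> delegate stands in for SomeEvent, and the class and method names are made up.

```csharp
using System;
using System.IO;

public static class DisposedCaptureSketch
{
    public static string ReadFromHandler()
    {
        Action<int> someEvent = null;       // stand-in for the SomeEvent event
        string outcome = "handler not run";

        using (StringReader reader = new StringReader("file contents"))
        {
            // The lambda captures 'reader'; the closure outlives the using block.
            someEvent += (x) =>
            {
                try
                {
                    outcome = reader.ReadToEnd();
                }
                catch (ObjectDisposedException)
                {
                    outcome = "reader already disposed";
                }
            };
        }   // reader.Dispose() runs here, before the handler is ever invoked

        someEvent(0);                       // "fire the event" after the using block
        return outcome;
    }
}
```

ReadFromHandler returns "reader already disposed": the captured reference is still reachable, but the object behind it is no longer usable.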
Conclusion
1) Using anonymous classes and methods or lambda expressions has some overhead: the compiler generates extra classes and methods behind the scenes.
2) Using local variables in lambda expressions can cause memory leaks and other issues.
3) It seems that the functionality of anonymous classes and of local variables in lambda  expressions is not the same.
4) Using "var" instead of an actual type name doesn't make your code cool.
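Regarding point 2, one possible mitigation (a sketch, not the only option) is to keep the lambda in a variable so that it can be removed from the event later, which lets go of the helper instance and everything it captured. Publisher, HasSubscribers, Raise and UnsubscribeSketch are hypothetical names used only for this illustration:

```csharp
using System;

public class Publisher
{
    public event Action<int> SomeEvent;

    // True while at least one handler is still attached.
    public bool HasSubscribers { get { return SomeEvent != null; } }

    public void Raise(int n)
    {
        Action<int> handler = SomeEvent;
        if (handler != null) handler(n);
    }
}

public static class UnsubscribeSketch
{
    public static bool SubscribeThenUnsubscribe()
    {
        Publisher p = new Publisher();
        byte[] big = new byte[1024];        // stands in for a large captured object

        // Keep the delegate in a variable instead of writing "+= (x) => ..." inline.
        Action<int> handler = (x) => { big[0] = (byte)x; };

        p.SomeEvent += handler;
        p.Raise(7);                         // handler runs and writes into 'big'

        p.SomeEvent -= handler;             // the event no longer references the closure
        return !p.HasSubscribers && big[0] == 7;
    }
}
```

Once the handler is removed, the publisher no longer holds the closure, so the captured array can be collected as soon as nothing else refers to it.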
On a personal note, I have been using .NET since version 1.1 and I have the feeling that each version adds more and more "AutoMagical" (http://en.wiktionary.org/wiki/automagical) concepts which make your code shorter but not very maintainable. I think the best example of an "AutoMagical" feature is Binding in WPF. You write some string in a XAML file and some magic changes the value of the ViewModel. I personally try to avoid any "AutoMagical" behavior and no one has yet convinced me otherwise (maybe I am just used to Delphi, where you could debug the code down to the assembler commands :)). My honest advice for you is to consider the same.
Thanks for reading,
Boris.

8 comments:

  1. Loved your post.

    I especially liked the sentence "(Yes, this is the correct place to use "var" and not when you are too lazy to write the actual variable type!)" :)
    I used to be a var fan, but now I understand that in most cases it's wrong.

    You say that lambda expressions are a potential memory leak. One could argue that using delegates in the first place is a potential memory leak, what do you think?

    P.S. How did you learn IL?

  2. The problem here is not the use of a delegate but the fact that you may unintentionally trigger a whole process of creating background scope classes just because you used local variables in a lambda expression. These classes are generated and controlled by the framework but hold references to your own classes. This is how you usually get a .NET memory leak (when someone points to your class but you don't know who, and it never gets garbage collected).

    I learned IL mainly from books and a little of reading IL code. If you need a free alternative to Reflector I suggest using ILSpy (or see one of my blog posts on tools for the weary developer)

  3. Boris, you're wrong. When you release your references to Class1, the SomeEvent is released and that's the point when the framework will also collect the lambda helper class instances - and then the MemoryHolder class is also collected.

    There are no general issues with local variables in lambda expressions in .NET framework.

    Besides this, it's not a good idea to add lambdas to events because you cannot remove them from the events.

    Replies
    1. Hi Carsten,
      I have never claimed that MemoryHolder will not be cleared when Class1 is cleared. If you understood that from my text then I am sorry that I wasn't clear.
      The point I was trying to make is that in the Memory method you create a _local_ variable of the MemoryHolder class. When you leave the Memory method you expect this local variable "mh" to be garbage collected, but it will not be (assume that the instance of Class1 will stay alive forever) because behind the scenes the instance of Class1 still holds a reference to it.

      In managed code the term "Memory Leak" takes a different angle. I call any situation where there is referenced memory that was not referenced by your code but instead by the framework code a potential "Memory Leak".

      I hope this comment made my original point clearer and sorry for the confusion.

    2. Boris, no problem.

      I replied to your post after such a long time because there is a question on StackOverflow where somebody asks for alternatives to lambda expressions with local variables because they have memory leaks, with a link to your blog post.

      Usually, the local variable will be cleared when you leave the Memory method. When you use lambda expressions for List.Exists(x => x....) or something like this and have local variables there, the lambda expression and its anonymous type instance will be released when you leave the method. But this does not happen when you assign the lambda method to an event. The framework does not know when the event is triggered, so it must hold a hard reference to the lambda class and a copy of all "local" variables. That's by design.

  4. This comment has been removed by the author.

  5. Another question:

    using (StreamReader reader = new StreamReader("MyFile"))
    {
    SomeEvent += (x) => { string s = reader.ReadToEnd(); };
    }

    I think this code will always raise an ObjectDisposedException. The stream reader is disposed at the end of the using block. So you cannot read from the stream when the event is raised -> Exception.

    Replies
    1. Well, this is kind of the point I am making in this blog post :).

      Have you noticed that this post talks about various problems that may arise by using local variables in Lambda expressions?
