Performance-obsessive disorder, or generics and duck typing in C♯

Ducks (CC-BY-2.0 by gaetanlee)
Ducks (CC-BY-2.0 by gaetanlee)

POD (Performance Obsessive Disorder) is an intended pun for Plain Old Data. This entry imagines how IEnumerable<T> of .NET could be upgraded for better performance, featuring duck typing with generics. This entry is related to this StackOverflow question.

Copyright Notice. The hero image has been cropped to various sizes for different scenarios.

C♯ has many beautiful features. The three appreciated in this entry are (1) explicit interface implementation, (2) duck typing in foreach, and (3) generics. They are logically chained together to achieve efficient C♯.

I will first recall the basic background stories (duck typing foreach & generics for explicit interface members), which are both performance-related. It is thus natural to think what happens when foreach meets generics, which leads to a more efficient IEnumerable leveraging covariance.

TL; DR. The code is available at this GitHub gist under MIT license.

Duck typing foreach

For the uninitiated, explicit interface implementation means that interface members can be implemented by hidden (private) members of a class or structure, so that they are invisible via a reference or value statically typed in that class or structure. (But they are of course visible once you cast the object to the interface type.) This resolves many problems, notably name collision among members from unrelated interfaces. It also opens up a possibility for performance optimisation.

The most notable example is IEnumerable<T>, IEnumerator<T>, and foreach in C♯. The two interfaces provides enumeration facilities and are defined as follows:

// Over-simplified
interface IEnumerable<out T>
  IEnumerator<T> GetEnumerator();
interface IEnumerator<out T>
  T Current { get; }
  bool MoveNext();

The foreach loop looks like the following:

foreach (var item in enumerable)
  // Do something

// Over-simplified equivalent code
var enumerator = enumerable.GetEnumerator();
while (enumerator.MoveNext())
  var item = enumerator.Current;
  // Do something

Those familiar with C♯ might nit-pick me with the omission of IDisposable and IEnumerable, IEnumerator (non-generic interfaces, equivalent to ...<object>), and the condition under which the equivalence holds, but you get the point.

It is important to realise that code transformation for foreach duck-types, meaning that the call enumerable.GetEnumerator() does not necessarily call IEnumerable<T>.GetEnumerator(), and consequently enumerator might not be (statically or dynamically) IEnumerator<T> at all. Heck! Even enumerable doesn’t have to be IEnumerable<T>.

How is this useful? The BCL collections implement IEnumerable<T> explicitly and provide a public GetEnumerator returning a structure. Interfaces are reference types and an interface reference can only point to the managed heap, whereas structs are value types and can be allocated on stack. This saves a lot of heap allocations should enumeration happens frequently (which does).

Generics for explicit interface members

A classic quiz in C♯: How do you call an explicit interface member? Suppose bar is an object whose static type implements int IFoo.Foobar() explicitly. It may or may not have a visible void Foobar(). When it has a visible one, we still want to call the interface method, not the one visible from the static type.

A naïve attempt is to say ((IFoo)bar).Foobar(). This is fine if bar is statically reference-typed. However, if bar is statically value-typed, the line has many problems:

  • It boxes the object, hurting performance.
  • It boxes the object thus making a copy of it, so any mutation done by the interface method will not be reflected in bar. (Let’s forget for now that mutating structs are usually bad. Plus, the BCL enumerators are mutating structures so we have to address that.)

A clever solution is to use constrained generics:

static int CallFoobarFromIFoo<T>(ref T bar) where T : IFoo
  return bar.Foobar();

This works no matter what T is and will not copy bar even if it’s statically value-typed. C♯ generics are true generics and in particular, generics instantiated with value types get their own copy of instantiated code for the specific value type. It is crucial to understand that the previous piece of code is not equivalent to the following:

// NOT equivalent
static int CallFoobarFromIFoo(ref IFoo bar)
  return bar.Foobar();

When foreach meets generics

At this point, you might wonder what happens for the following:

public static void UseIEnumerable(IEnumerable enumerable)
  // The body are the same in the methods below
  foreach (var item in enumerable)
    // Do something
public static void UseIEnumerableT<T>(IEnumerable<T> enumerable);
public static void UseTEnumerableT<TEnumerable, T>(TEnumerable enumerable)
  where TEnumerable : IEnumerable<T>;
public static void UsePreciseType<T>(List<T> enumerable);

How do they compare to each other? Hint: List<T>.GetEnumerator() returns List<T>.Enumerator structure.

The first, second, and last ones should be clear. The third one is the most curious. The GetEnumerator that gets called here is always IEnumerable<T>.GetEnumerator, not the visible GetEnumerator in TEnumerable.

By looking at the source code of List, we realise that the List<T>.Enumerator will be boxed in the third version. Boxing is not the only loss — we also suffer an interface method call (virtual function call) for each invocation of MoveNext and Current since the generic method has no idea who is implementing IEnumerable<T>. In contrast, in the last version, the enumerator is statically typed as List<T>.Enumerator, no boxing happens and calls to MoveNext and Current are direct and non-virtual.

(For readers familiar with C++, you should now see that C♯ generics are very different from C++ templates even for this simple case.) Interestingly, the fact that the method being called is always the interface method (not the visible method on TEnumerable) is what enables the aforementioned trick of calling an explicitly implemented interface member. What was a blessing is now a curse.

A more efficient IEnumerable

What can we do to improve performance in this case? The reason we want to implement IEnumerable<T> explicitly is to hide its GetEnumerator so that we can return a structure (or a sealed class, with the advantage that MoveNext and Current can be non-virtual calls) in our ducky GetEnumerator. Typical enumerators implement IEnumerator<T> implicitly (unless they implement multiple unrelated enumerations, which is bad), and sane enumerators do not cause a difference even if the interface is implemented explicitly. Moreover, as far as typing is concerned, there is no harm to choose the explicit interface methods over the visible methods. The only need is to let the generic method know the precise static type of the IEnumerator<T>.

Therefore, a more performant and backward compatible IEnumerable could be defined as follows:

interface IEnumerable<out T, out TEnumerator> : IEnumerable<T>
  where TEnumerator : IEnumerator<T>
  new TEnumerator GetEnumerator();

static void UseIEnumerableTTEnumerator<T, TEnumerator>
  (IEnumerable<T, TEnumerator> enumerable)
  // C# compiler does not deduce the constraint posed by
  // IEnumerable<T, TEnumerator>, and we need to help it
  // so that the foreach loop in the body compiles
  where TEnumerator : IEnumerator<T>;

static void UseTEnumerableTTEnumerator<TEnumerable, T, TEnumerator>
  (TEnumerable enumerable)
  where TEnumerable : IEnumerable<T, TEnumerator>
  where TEnumerator : IEnumerator<T>;

Now everything previously implementing IEnumerable<T> should now implement an interface being-an IEnumerable<T, IEnumerator<T>>. For example, List<T> should implement IEnumerable<T, List<T>.Enumerator>. An observant reader should have realised that IEnumerable<T, IEnumerator<T>> is the new self of IEnumerable<T> in the same sense as IEnumerable<object> is the new self of IEnumerable, and that List<object> is the new self of ArrayList.

Now there is a catch: C♯ will not infer T (and TEnumerator) from TEnumerable in UseTEnumerableT or UseTEnumerableTTEnumerator, so the type parameters must be explicitly specified. This behaviour is by design. Well, the type T and TEnumerator cannot generically (pun intended!) be inferred from TEnumerable, as a class can choose to implement two IEnumerables, in which case there is no clear preference.

Check out the code in this GitHub gist. It uses explicit method implementation to discover which version of the method gets called.

To compare the extent of possible optimisation (N.B. boxing only applies when the relevant type is struct):

  • UsePreciseType has all the performance advantages.
  • All except UseIEnumerable avoid boxing T.
  • UseTEnumerable* (ugly) avoids boxing TEnumerable and might optimise interface call to GetEnumerator into direct call.
  • Use?EnumerableTTEnumerator avoids boxing TEnumerator and might optimise interface calls to MoveNext, Current into direct call.

I personally find UseIEnumerableTTEnumerator a sweet spot in balancing generality, syntax simplicity on call site, and performance.

One more thought. The avoidance of boxing TEnumerator should be meaningful as long as BCL collection returning a struct enumerator is meaningful. However, it is not clear to me whether the avoidance of interface calls to MoveNext and Current serves practical purpose. Potentially, the JIT compiler could realise that the IEnumerator<T> reference never changes, so it could cache the virtual function pointer instead of looking it up on every call.

Addendum. In the related StackOverflow question, caching of virtual function pointer is unlikely to be done because the method is a coroutine, which means its local variables are stored in an instance of a compiler-generated class (implementing the state machine of the iterator block). It is extremely unlikely that the JIT compiler will sneak in the cached virtual function pointer somewhere in the instance.

Please enable JavaScript to view the comments powered by Disqus.