POD (Performance Obsessive Disorder) is an intended pun for Plain Old Data. This entry imagines how
IEnumerable<T>
of .NET could be upgraded for better performance, featuring duck typing with generics. This entry is related to this StackOverflow question.Copyright Notice. The hero image has been cropped to various sizes for different scenarios.
C♯ has many beautiful features. The three appreciated in this entry are
(1) explicit interface implementation,
(2) duck typing in foreach
,
and (3) generics.
They are logically chained together to achieve efficient C♯.
I will first recall the basic background stories (duck typing foreach
& generics for explicit interface members), which are both performance-related.
It is thus natural to think what happens when foreach
meets generics, which leads to a more efficient IEnumerable
leveraging covariance.
TL; DR. The code is available at this GitHub gist under MIT license.
Duck typing foreach
For the uninitiated, explicit interface implementation means that interface members can be implemented by hidden (private) members of a class or structure, so that they are invisible via a reference or value statically typed in that class or structure. (But they are of course visible once you cast the object to the interface type.) This resolves many problems, notably name collision among members from unrelated interfaces. It also opens up a possibility for performance optimisation.
The most notable example is IEnumerable<T>
, IEnumerator<T>
, and foreach
in C♯.
The two interfaces provides enumeration facilities and are defined as follows:
// Over-simplified
interface IEnumerable<out T>
{
IEnumerator<T> GetEnumerator();
}
interface IEnumerator<out T>
{
T Current { get; }
bool MoveNext();
}
The foreach
loop looks like the following:
foreach (var item in enumerable)
{
// Do something
}
// Over-simplified equivalent code
var enumerator = enumerable.GetEnumerator();
while (enumerator.MoveNext())
{
var item = enumerator.Current;
// Do something
}
Those familiar with C♯ might nit-pick me with the omission of IDisposable
and IEnumerable
, IEnumerator
(non-generic interfaces, equivalent to ...<object>
), and the condition under which the equivalence holds, but you get the point.
It is important to realise that code transformation for foreach
duck-types, meaning that the call enumerable.GetEnumerator()
does not necessarily call IEnumerable<T>.GetEnumerator()
, and consequently enumerator
might not be (statically or dynamically) IEnumerator<T>
at all.
Heck! Even enumerable
doesn’t have to be IEnumerable<T>
.
How is this useful?
The BCL collections implement IEnumerable<T>
explicitly and provide a public GetEnumerator
returning a structure.
Interfaces are reference types and an interface reference can only point to the managed heap, whereas struct
s are value types and can be allocated on stack.
This saves a lot of heap allocations should enumeration happens frequently (which does).
Generics for explicit interface members
A classic quiz in C♯: How do you call an explicit interface member?
Suppose bar
is an object whose static type implements int IFoo.Foobar()
explicitly. It may or may not have a visible void Foobar()
. When it has a visible one, we still want to call the interface method, not the one visible from the static type.
A naïve attempt is to say ((IFoo)bar).Foobar()
.
This is fine if bar
is statically reference-typed.
However, if bar
is statically value-typed, the line has many problems:
- It boxes the object, hurting performance.
- It boxes the object thus making a copy of it, so any mutation done by the interface method will not be reflected in
bar
. (Let’s forget for now that mutatingstruct
s are usually bad. Plus, the BCL enumerators are mutating structures so we have to address that.)
A clever solution is to use constrained generics:
static int CallFoobarFromIFoo<T>(ref T bar) where T : IFoo
{
return bar.Foobar();
}
This works no matter what T
is and will not copy bar
even if it’s statically value-typed.
C♯ generics are true generics and in particular, generics instantiated with value types get their own copy of instantiated code for the specific value type.
It is crucial to understand that the previous piece of code is not equivalent to the following:
// NOT equivalent
static int CallFoobarFromIFoo(ref IFoo bar)
{
return bar.Foobar();
}
When foreach
meets generics
At this point, you might wonder what happens for the following:
public static void UseIEnumerable(IEnumerable enumerable)
{
// The body are the same in the methods below
foreach (var item in enumerable)
{
// Do something
}
}
public static void UseIEnumerableT<T>(IEnumerable<T> enumerable);
public static void UseTEnumerableT<TEnumerable, T>(TEnumerable enumerable)
where TEnumerable : IEnumerable<T>;
public static void UsePreciseType<T>(List<T> enumerable);
How do they compare to each other?
Hint: List<T>.GetEnumerator()
returns List<T>.Enumerator
structure.
The first, second, and last ones should be clear.
The third one is the most curious.
The GetEnumerator
that gets called here is always IEnumerable<T>.GetEnumerator
, not the visible GetEnumerator
in TEnumerable
.
By looking at the source code of List
, we realise that the List<T>.Enumerator
will be boxed in the third version.
Boxing is not the only loss — we also suffer an interface method call (virtual function call) for each invocation of MoveNext
and Current
since the generic method has no idea who is implementing IEnumerable<T>
.
In contrast, in the last version, the enumerator is statically typed as List<T>.Enumerator
, no boxing happens and calls to MoveNext
and Current
are direct and non-virtual.
(For readers familiar with C++, you should now see that C♯ generics are very different from C++ templates even for this simple case.)
Interestingly, the fact that the method being called is always the interface method (not the visible method on TEnumerable
) is what enables the aforementioned trick of calling an explicitly implemented interface member.
What was a blessing is now a curse.
A more efficient IEnumerable
What can we do to improve performance in this case?
The reason we want to implement IEnumerable<T>
explicitly is to hide its GetEnumerator
so that we can return a structure (or a sealed class, with the advantage that MoveNext
and Current
can be non-virtual calls) in our ducky GetEnumerator
.
Typical enumerators implement IEnumerator<T>
implicitly (unless they implement multiple unrelated enumerations, which is bad), and
sane enumerators do not cause a difference even if the interface is implemented explicitly.
Moreover, as far as typing is concerned, there is no harm to choose the explicit interface methods over the visible methods.
The only need is to let the generic method know the precise static type of the IEnumerator<T>
.
Therefore, a more performant and backward compatible IEnumerable
could be defined as follows:
interface IEnumerable<out T, out TEnumerator> : IEnumerable<T>
where TEnumerator : IEnumerator<T>
{
new TEnumerator GetEnumerator();
}
static void UseIEnumerableTTEnumerator<T, TEnumerator>
(IEnumerable<T, TEnumerator> enumerable)
// C# compiler does not deduce the constraint posed by
// IEnumerable<T, TEnumerator>, and we need to help it
// so that the foreach loop in the body compiles
where TEnumerator : IEnumerator<T>;
static void UseTEnumerableTTEnumerator<TEnumerable, T, TEnumerator>
(TEnumerable enumerable)
where TEnumerable : IEnumerable<T, TEnumerator>
where TEnumerator : IEnumerator<T>;
Now everything previously implementing IEnumerable<T>
should now implement an interface being-an IEnumerable<T, IEnumerator<T>>
.
For example, List<T>
should implement IEnumerable<T, List<T>.Enumerator>
.
An observant reader should have realised that
IEnumerable<T, IEnumerator<T>>
is the new self of IEnumerable<T>
in the same sense as IEnumerable<object>
is the new self of IEnumerable
, and that List<object>
is the new self of ArrayList
.
Now there is a catch: C♯ will not infer T
(and TEnumerator
) from TEnumerable
in UseTEnumerableT
or UseTEnumerableTTEnumerator
, so the type parameters must be explicitly specified. This behaviour is by design.
Well, the type T
and TEnumerator
cannot generically (pun intended!) be inferred from TEnumerable
, as a class can choose to implement two IEnumerable
s, in which case there is no clear preference.
Check out the code in this GitHub gist. It uses explicit method implementation to discover which version of the method gets called.
To compare the extent of possible optimisation (N.B. boxing only applies when the relevant type is struct
):
UsePreciseType
has all the performance advantages.- All except
UseIEnumerable
avoid boxingT
. UseTEnumerable*
(ugly) avoids boxingTEnumerable
and might optimise interface call toGetEnumerator
into direct call.Use?EnumerableTTEnumerator
avoids boxingTEnumerator
and might optimise interface calls toMoveNext
,Current
into direct call.
I personally find UseIEnumerableTTEnumerator
a sweet spot in balancing generality, syntax simplicity on call site, and performance.
One more thought.
The avoidance of boxing TEnumerator
should be meaningful as long as BCL collection returning a struct
enumerator is meaningful.
However, it is not clear to me whether the avoidance of interface calls to MoveNext
and Current
serves practical purpose.
Potentially, the JIT compiler could realise that the IEnumerator<T>
reference never changes, so it could cache the virtual function pointer instead of looking it up on every call.
Addendum. In the related StackOverflow question, caching of virtual function pointer is unlikely to be done because the method is a coroutine, which means its local variables are stored in an instance of a compiler-generated class (implementing the state machine of the iterator block). It is extremely unlikely that the JIT compiler will sneak in the cached virtual function pointer somewhere in the instance.
Please enable JavaScript to view the comments powered by Disqus.