LINQ Internals (Where, Select, SelectMany and Join)

LINQ (Language-Integrated Query) was introduced with C# 3.0 and .NET Framework 3.5. The main thing LINQ brings to the table is the ability to write queries that are type-checked at compile time.

In traditional query languages, queries are string-based, so they get no compile-time type checking and no IntelliSense. On top of that, each kind of data source has its own query language, such as SQL for relational databases or XPath for XML.

With LINQ, there is only one query construct to learn. Since LINQ is an integral part of .NET, its operators and constructs look and feel like the rest of the framework, which keeps the learning curve short and easy.

In this blog post, I am going to walk through the internals of some of the main LINQ operators. To start, I will create a new .NET Core console application and go from there.

Building blocks of LINQ

Internally, LINQ relies on three features available in .NET languages:

Extension Methods
Delegate
yield keyword

Extension Methods

As the name suggests, extension methods let us extend an existing type: we can add a method to a type without modifying the type itself.

Extension methods are static methods, and the extension syntax is essentially syntactic sugar around an ordinary static method call. The main difference is how we call them: an extension method is called on an object, as if it were an instance method, rather than on a type like a regular static method.

To demonstrate the concept and syntax of extension methods, let us consider a problem. Say we want to get the first character of a string. Instead of writing a plain static helper, we want the integrated experience: when we type a dot after a string, the method should show up alongside the string's own members.

To achieve that, we will create a new class StringExtension (NOTE: extension methods can only be declared inside a non-generic static class). Inside this class, we will create an extension method GiveFirstCharacter, which returns the first character of a string.

In the extension method syntax, the first parameter is always the object on which we call the extension method, and that parameter is preceded by the this keyword.

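namespace LinqInternals.Demo
{
    public static class StringExtension
    {
        // "this string item" lets us call it as text.GiveFirstCharacter()
        public static string GiveFirstCharacter(this string item)
        {
            // Substring(0, 1) would throw on a null or empty string, so guard first.
            if (string.IsNullOrEmpty(item))
            {
                return string.Empty;
            }

            return item.Substring(0, 1);
        }
    }
}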

Now, if I call this method inside the Main method on the string “Extension Methods”, I get “E” printed to the console.

using System;

namespace LinqInternals.Demo
{
    class Program
    {
        static void Main(string[] args)
        {
            var text = "Extension Methods";
            Console.WriteLine(text.GiveFirstCharacter());
        }
    }
}
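Because an extension method is an ordinary static method underneath, the instance-style call can also be written as a plain static call. The sketch below shows both forms side by side; FirstCharExtension, FirstChar and SugarDemo are hypothetical names chosen so the block compiles on its own, and FirstChar mirrors GiveFirstCharacter.

```csharp
// Hypothetical names; same idea as StringExtension.GiveFirstCharacter.
public static class FirstCharExtension
{
    public static string FirstChar(this string item)
    {
        if (string.IsNullOrEmpty(item))
        {
            return string.Empty;
        }
        return item.Substring(0, 1);
    }
}

public static class SugarDemo
{
    // Returns the result of both call styles so they can be compared.
    public static (string ViaExtension, string ViaStatic) BothForms(string text)
    {
        // Extension-method syntax: looks like an instance call on the string...
        string viaExtension = text.FirstChar();

        // ...but it resolves to this ordinary static call.
        string viaStatic = FirstCharExtension.FirstChar(text);

        return (viaExtension, viaStatic);
    }
}
```

For "Extension Methods", both forms return "E".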

Delegate

A delegate is a first-class type in .NET languages (like C#) that encapsulates a reference to a method with a particular parameter list and return type.

Since delegates are ordinary types, we can pass them around like any other value. That is why one of their main use cases is callbacks.

To demonstrate how delegates work, let us build a simple example in which we call back a method that prints a message to the console.

Firstly, we will create a class CustomerProcessor. This class will have a method PrintName, that is responsible for printing the name of the customer.

using System;

namespace LinqInternals.Demo
{
    public class CustomerProcessor
    {
        public void PrintName(string customerName)
        {
            Console.WriteLine(customerName);
        }
    }
}

Secondly, inside the Program class, we will declare a new delegate Print with the same signature as the PrintName method.

Thirdly, we will create an instance of the Print delegate, passing the PrintName method of CustomerProcessor instance.

Finally, we will execute the instance of Print delegate.

using System;

namespace LinqInternals.Demo
{
    class Program
    {
        delegate void Print(string name);

        static void Main(string[] args)
        {
            var customerProcessor = new CustomerProcessor();

            Print printCustomer = new Print(customerProcessor.PrintName);
            printCustomer("Jon Doe");
        }
    }
}
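In practice we rarely declare our own delegate types, because .NET ships generic ones: Action<T> for methods that return nothing and Func<T, TResult> for methods that return a value. LINQ methods such as Where take a Func as a parameter, which is exactly the callback pattern described above. In the sketch below, DelegateSketch and ProcessCustomer are hypothetical names; it collects output in a list instead of printing so the result is easy to inspect.

```csharp
using System;
using System.Collections.Generic;

public static class DelegateSketch
{
    // Same shape as the Print delegate declared in Program.
    public delegate void Print(string name);

    // A method that accepts a callback: the caller decides what "print" means.
    public static void ProcessCustomer(string customerName, Action<string> callback)
    {
        callback(customerName);
    }

    public static List<string> Run()
    {
        var output = new List<string>();

        // A lambda can be assigned to the custom delegate type...
        Print custom = name => output.Add(name);
        custom("Jon Doe");

        // ...or passed as the built-in Action<string>, which has the same signature.
        ProcessCustomer("Jane Doe", name => output.Add(name.ToUpperInvariant()));

        // Func<int, bool> is the kind of delegate Where takes as its predicate.
        Func<int, bool> isEven = x => x % 2 == 0;
        output.Add(isEven(4).ToString());

        return output;
    }
}
```

Run() returns "Jon Doe", "JANE DOE" and "True", in that order.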

Now, if we run the application, we will see “Jon Doe” printed in the console.

yield Keyword

If we had to define the yield keyword in one sentence, it would be: “yield holds the state of an enumeration.” I will explain how yield works later, when we dig deeper into the LINQ queries, since yield is arguably the most important feature behind them.

For now, let us start with an example of how to use the Where operator in LINQ, and then look at how it is implemented internally. Once we do that, the use of yield will become clear.

Where Query

The Where LINQ query filters a sequence based on a condition. Conceptually, it works like the WHERE clause in SQL.

Let us say we have an array of integers, and from the array we need to find all the even numbers. We can do it with a for/foreach loop, or we can achieve it in a single line with a LINQ query.

using System;
using System.Linq;

namespace LinqInternals.Demo
{
    class Program
    {
        static void Main(string[] args)
        {
            var items = new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
            var evenItems = items.Where(x => x % 2 == 0);
            foreach (var item in evenItems)
            {
                Console.WriteLine(item);
            }
        }
    }
}

Now, let us analyze the line items.Where(x => x % 2 == 0). Here we are calling the Where LINQ extension method on IEnumerable<int> (an int array implements IEnumerable<int>). Where takes a Func<T, bool> delegate, which receives an element of type T as input and returns a bool.

So, in the lambda we pass as the Func, we check whether dividing the integer by 2 leaves no remainder, a condition that is true only for even numbers.

Now, if we run this application, we will see all even numbers printed out.
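For comparison, here is the for/foreach-loop alternative mentioned earlier next to the LINQ version. EvenNumbersSketch, WithLoop and WithLinq are hypothetical names; ToList() is added only to materialize the LINQ result so the two can be compared directly.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class EvenNumbersSketch
{
    // The manual approach: loop and collect matches into a list.
    public static List<int> WithLoop(int[] items)
    {
        var result = new List<int>();
        foreach (var item in items)
        {
            if (item % 2 == 0)
            {
                result.Add(item);
            }
        }
        return result;
    }

    // The LINQ approach: the same condition, passed to Where as a Func<int, bool>.
    public static List<int> WithLinq(int[] items)
    {
        return items.Where(x => x % 2 == 0).ToList();
    }
}
```

Both return 2, 4, 6, 8 and 10 for the array 1 to 10; the loop's body is essentially what Where runs for us, with the condition supplied as a delegate.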

Our version of Where

Now let us create our own version of Where. As I mentioned above, the three main building blocks of LINQ are extension methods, delegates, and the yield keyword. So let us start by creating a new class for the extension method, which I will name IEnumerableExtension.

Inside this class, I will create a new extension method for IEnumerable<T> called NewWhere. Its signature is the same as that of the Where extension method:

public static IEnumerable<T> NewWhere<T>(this IEnumerable<T> items, Func<T, bool> predicate)
{
    // Empty for now; the body is filled in over the next steps.
}

The next obvious step is to loop through the items and find the matches using the predicate.

public static IEnumerable<T> NewWhere<T>(this IEnumerable<T> items, Func<T, bool> predicate)
{
    foreach (var item in items)
    {
        if (predicate(item))
        {
            // A match: this is where we need to hand the item back to the caller.
        }
    }
}

Once the predicate returns true, we want to return the item, but a plain return would end the method after the first match. We could add each match to a list and return the list at the end, but then the entire IEnumerable would be iterated immediately, instead of lazily (deferred execution), which is how Where works.
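To see the problem, here is a sketch of the list-based approach described above; EagerWhereSketch and EagerWhere are hypothetical names. It produces the right items, but the predicate runs over the whole input as soon as the method is called, before the caller has asked for a single item, and it would never return for an endless sequence.

```csharp
using System;
using System.Collections.Generic;

public static class EagerWhereSketch
{
    // Hypothetical eager version: collects every match before returning anything.
    public static IEnumerable<T> EagerWhere<T>(this IEnumerable<T> items, Func<T, bool> predicate)
    {
        var matches = new List<T>();
        foreach (var item in items)
        {
            if (predicate(item))
            {
                matches.Add(item);
            }
        }
        return matches;
    }
}
```

Calling new[] { 1, 2, 3, 4 }.EagerWhere(...) invokes the predicate four times immediately, even if the result is never enumerated.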

This is where the yield keyword comes into play. As I mentioned earlier, yield maintains the state of an enumeration: yield return hands the current item back to the caller, and the method resumes from that exact point when the caller asks for the next item.

I explain this behavior in my YouTube video here. It becomes much clearer in the video, as I debug through each iteration to show how yield return works.
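To make "holds the state" more concrete, the sketch below writes out by hand roughly what the compiler builds for an iterator method: an object whose MoveNext resumes where the previous call left off. ManualWhere is a hypothetical name, and the real compiler-generated class is more involved (for example, it tracks an explicit state number); this only illustrates the idea.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical hand-written equivalent of an iterator method like NewWhere.
public sealed class ManualWhere<T> : IEnumerable<T>
{
    private readonly IEnumerable<T> _source;
    private readonly Func<T, bool> _predicate;

    public ManualWhere(IEnumerable<T> source, Func<T, bool> predicate)
    {
        _source = source;
        _predicate = predicate;
    }

    // Each enumeration gets a fresh enumerator over the source.
    public IEnumerator<T> GetEnumerator() => new Enumerator(_source.GetEnumerator(), _predicate);
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    private sealed class Enumerator : IEnumerator<T>
    {
        private readonly IEnumerator<T> _inner; // remembers our position in the source
        private readonly Func<T, bool> _predicate;

        public Enumerator(IEnumerator<T> inner, Func<T, bool> predicate)
        {
            _inner = inner;
            _predicate = predicate;
        }

        public T Current { get; private set; }
        object IEnumerator.Current => Current;

        // Resumes from the saved position and runs until the next match,
        // which is what "yield return" does for us automatically.
        public bool MoveNext()
        {
            while (_inner.MoveNext())
            {
                if (_predicate(_inner.Current))
                {
                    Current = _inner.Current;
                    return true;
                }
            }
            return false;
        }

        // Compiler-generated iterators do not support Reset either.
        public void Reset() => throw new NotSupportedException();
        public void Dispose() => _inner.Dispose();
    }
}
```

new ManualWhere<int>(new[] { 1, 2, 3, 4, 5, 6 }, x => x % 2 == 0) enumerates to 2, 4 and 6, just like NewWhere.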

using System;
using System.Collections.Generic;

namespace LinqInternals.Demo
{
    public static class IEnumerableExtension
    {
        public static IEnumerable<T> NewWhere<T>(this IEnumerable<T> items,
            Func<T, bool> predicate)
        {
            foreach (var item in items)
            {
                if (predicate(item))
                {
                    yield return item;
                }
            }
        }
    }
}

Finally, if I update the Main method and replace the Where extension method with NewWhere, I will see the exact same output in the console.
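To observe the deferred execution, the sketch below logs every predicate call and every item the consumer receives. LazyWhereSketch and TraceExecution are hypothetical names, and NewWhere is repeated so the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;

public static class LazyWhereSketch
{
    // Same shape as NewWhere above.
    public static IEnumerable<T> NewWhere<T>(this IEnumerable<T> items, Func<T, bool> predicate)
    {
        foreach (var item in items)
        {
            if (predicate(item))
            {
                yield return item;
            }
        }
    }

    public static List<string> TraceExecution()
    {
        var log = new List<string>();
        var numbers = new[] { 1, 2, 3, 4 };

        var evens = numbers.NewWhere(x =>
        {
            log.Add($"check {x}");
            return x % 2 == 0;
        });

        // Nothing has run yet: calling NewWhere only creates the iterator.
        log.Add("query created");

        foreach (var n in evens)
        {
            log.Add($"got {n}");
        }
        return log;
    }
}
```

The log reads "query created", "check 1", "check 2", "got 2", "check 3", "check 4", "got 4": the predicate only runs when the foreach asks for the next item, and the consumer receives each match before the remaining items are checked.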

Select Query

A Select query is mainly used for mapping (projection). If we want a smaller set of attributes, or a derived set of attributes, from a data set, we use the Select LINQ query. To demonstrate this, let us say we have a list of customers, and from that list we want only the customer names.

Firstly, let us create the Customer type, along with the Phone type it uses.

namespace LinqInternals.Demo
{
    // Assumed definition: only the values used in this post.
    public enum PhoneType
    {
        Cell,
        Home
    }

    public class Phone
    {
        public string Number { get; set; }
        public PhoneType PhoneType { get; set; }
    }
}

namespace LinqInternals.Demo
{
    public class Customer
    {
        public int Id { get; set; }
        public string Name { get; set; }
        public Phone[] Phones { get; set; }
    }
}

Secondly, I will update the Main method to create a few customers and call Select on the customer array to get only the customer names.

using System;
using System.Linq;

namespace LinqInternals.Demo
{
    class Program
    {
        static void Main(string[] args)
        {
            var customers = new[] { 
                new Customer{
                    Id=1,
                    Name="Jon",
                    Phones=new []{
                        new Phone
                        { Number="123", PhoneType= PhoneType.Cell},
                        new Phone
                        { Number = "345", PhoneType= PhoneType.Home}
                    }
                },
                new Customer
                {
                    Id=2,
                    Name="Jane",
                    Phones=new[]
                    {
                        new Phone
                        { Number="345-345-3456", PhoneType= PhoneType.Cell},
                        new Phone
                        { Number="456-678-5678", PhoneType= PhoneType.Home}
                    }
                }
            };

            var customerNames = customers.Select(c => c.Name);
            foreach (var item in customerNames)
            {
                Console.WriteLine(item);
            }
        }
    }
}

Finally, if I run the application, I will see Jon and Jane printed out in the console.

Our version of Select

Now, let us create our own version of Select extension method. We will create a new extension method NewSelect inside of the existing IEnumerableExtension class.

The Select extension method is even simpler to implement than Where. Select takes a Func<T, TResult> selector, which receives a T as input and returns a TResult. All we have to do is loop through the collection and yield the result of calling the selector on each item.

public static IEnumerable<TResult> NewSelect<T, TResult>(this IEnumerable<T> items, Func<T, TResult> selector)
{
    foreach (var item in items)
    {
        yield return selector(item);
    }
}

Finally, I will replace the Select call inside the Main method with NewSelect. If I run the application now, I will see exactly the same output as with Select.
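Because TResult is independent of T, the selector does not have to pick an existing property; it can return a derived value of a different type. In the sketch below, SelectSketch and Describe are hypothetical names, and NewSelect is repeated so the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;

public static class SelectSketch
{
    // Same shape as NewSelect above.
    public static IEnumerable<TResult> NewSelect<T, TResult>(this IEnumerable<T> items, Func<T, TResult> selector)
    {
        foreach (var item in items)
        {
            yield return selector(item);
        }
    }

    // Hypothetical derived attribute: each name becomes "Name (length)".
    public static IEnumerable<string> Describe(IEnumerable<string> names)
    {
        return names.NewSelect(name => $"{name} ({name.Length})");
    }
}
```

Describe(new[] { "Jon", "Jane" }) yields "Jon (3)" and "Jane (4)".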

SelectMany Query

When we need to flatten a hierarchical data structure, we can use the SelectMany LINQ query.

With the Customer data structure above, if we want a single list of all the Phone objects across all customers, we use SelectMany.

using System;
using System.Linq;

namespace LinqInternals.Demo
{
    class Program
    {
        static void Main(string[] args)
        {
            var customers = new[] { 
                new Customer{
                    Id=1,
                    Name="Jon",
                    Phones=new []{
                        new Phone
                        { Number="123", PhoneType= PhoneType.Cell},
                        new Phone
                        { Number = "345", PhoneType= PhoneType.Home}
                    }
                },
                new Customer
                {
                    Id=2,
                    Name="Jane",
                    Phones=new[]
                    {
                        new Phone
                        { Number="345-345-3456", PhoneType= PhoneType.Cell},
                        new Phone
                        { Number="456-678-5678", PhoneType= PhoneType.Home}
                    }
                }
            };

            var customerPhones = customers.SelectMany(c => c.Phones);
            foreach (var item in customerPhones)
            {
                Console.WriteLine($"{item.Number} - {item.PhoneType}");
            }
        }
    }
}

Now, if I run the application, I will see the phone numbers of all the customers.

Our version of SelectMany

Now, let us create our own version of SelectMany. For SelectMany, the selector Func returns an IEnumerable<TResult> instead of a single TResult.

As you might have expected, the SelectMany implementation has two foreach loops instead of one, since we are flattening a data structure.

The first foreach loop will be on the top level IEnumerable. Whereas the inner foreach loop will be on the result of the Func selector.

public static IEnumerable<TResult> NewSelectMany<T, TResult>(this IEnumerable<T> items, Func<T, IEnumerable<TResult>> selector)
{
    foreach (var item in items)
    {
        foreach (var innerItem in selector(item))
        {
            yield return innerItem;
        } 
    }
}

Now if I replace SelectMany with NewSelectMany in the Main method, I will still see the same response on the console output.
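The difference from Select is easiest to see by counting elements. In the sketch below, phone numbers are plain strings instead of Phone objects to keep it short; SelectManySketch and Compare are hypothetical names, and NewSelectMany is repeated so the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SelectManySketch
{
    // Same shape as NewSelectMany above.
    public static IEnumerable<TResult> NewSelectMany<T, TResult>(this IEnumerable<T> items, Func<T, IEnumerable<TResult>> selector)
    {
        foreach (var item in items)
        {
            foreach (var innerItem in selector(item))
            {
                yield return innerItem;
            }
        }
    }

    public static (int NestedCount, int FlatCount) Compare(string[][] phonesPerCustomer)
    {
        // Select keeps the hierarchy: one inner array per customer.
        var nested = phonesPerCustomer.Select(p => p);

        // NewSelectMany flattens it: one element per phone number.
        var flat = phonesPerCustomer.NewSelectMany(p => p);

        return (nested.Count(), flat.Count());
    }
}
```

With two customers holding two numbers each, Compare returns (2, 4): Select produces one element per customer, NewSelectMany one per phone.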

You can watch me walk through both the Select and SelectMany implementations in my YouTube video here.

Join LINQ Query

The next and final topic I want to cover today is the Join LINQ query. Join is similar to an inner join in SQL: it combines two data sets based on matching keys.

To demonstrate Join, let us create another type called Address, which stores the addresses of the customers.

namespace LinqInternals.Demo
{
    public class Address
    {
        public int Id { get; set; }
        public int CustomerId { get; set; }
        public string Street { get; set; }
        public string City { get; set; }
    }
}

Now, let us update the Main method to add an Address array for the customer Jane. We deliberately add no address for Jon, so the result will contain only Jane's addresses: customers without a matching address are left out.

var addresses = new[]
{
    new Address{Id=1, CustomerId=2, Street="123 Street", City="City1"},
    new Address {Id=2, CustomerId=2, Street="457 Street", City="City2"}
};

Now, we can join the Customer array with the Address array based on Customer ID.

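Besides the outer collection it is called on, Join takes the inner IEnumerable, a Func that selects the key from each outer item, a Func that selects the key from each inner item, and finally a result selector Func that combines each matching pair.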

using System;
using System.Linq;

namespace LinqInternals.Demo
{
    class Program
    {
        static void Main(string[] args)
        {
            var customers = new[] { 
                new Customer{
                    Id=1,
                    Name="Jon",
                    Phones=new []{
                        new Phone
                        { Number="123", PhoneType= PhoneType.Cell},
                        new Phone
                        { Number = "345", PhoneType= PhoneType.Home}
                    }
                },
                new Customer
                {
                    Id=2,
                    Name="Jane",
                    Phones=new[]
                    {
                        new Phone
                        { Number="345-345-3456", PhoneType= PhoneType.Cell},
                        new Phone
                        { Number="456-678-5678", PhoneType= PhoneType.Home}
                    }
                }
            };

            var addresses = new[]
            {
                new Address{Id=1, CustomerId=2, Street="123 Street", City="City1"},
                new Address {Id=2, CustomerId=2, Street="457 Street", City="City2"}
            };

            var customerwithaddress = customers.Join(addresses
                , c => c.Id
                , a => a.CustomerId
                , (c, a) => new { c.Name, a.Street, a.City });

            foreach (var item in customerwithaddress)
            {
                Console.WriteLine($"{item.Name} - {item.Street} - {item.City}");
            }
        }
    }
}

Now, if I run the application, I will see customer Jane and her addresses.

Our version of Join

Now, we will implement our own version of Join. This is of course the most complex one compared to the others we have implemented so far.

Just as before, I will create a new method NewJoin inside the IEnumerableExtension class. Apart from the collection on which it is called, NewJoin has four parameters.

public static IEnumerable<TResult> NewJoin<T, TH, TKey, TResult>(
    this IEnumerable<T> items,
    IEnumerable<TH> innerItems,
    Func<T, TKey> outerKeySelector,
    Func<TH, TKey> innerKeySelector,
    Func<T, TH, TResult> resultSelector)
{
    // Empty for now; the body is filled in below.
}

In the parameters, innerItems is the second collection we want to join with. The outerKeySelector Func gets the key from each item of the outer collection, and innerKeySelector gets the key from each item of the second collection. Finally, the resultSelector Func builds the final result; in most cases, it creates a new (often anonymous) type combining items from both collections.

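public static IEnumerable<TResult> NewJoin<T, TH, TKey, TResult>(
    this IEnumerable<T> items,
    IEnumerable<TH> innerItems,
    Func<T, TKey> outerKeySelector,
    Func<TH, TKey> innerKeySelector,
    Func<T, TH, TResult> resultSelector)
{
    // Null-safe key comparison (EqualityComparer is in System.Collections.Generic);
    // calling .Equals on a null key would throw.
    var comparer = EqualityComparer<TKey>.Default;

    foreach (var item in items)
    {
        // The outer key only depends on the outer item, so compute it once.
        var outerKey = outerKeySelector(item);

        foreach (var innerItem in innerItems)
        {
            if (comparer.Equals(outerKey, innerKeySelector(innerItem)))
            {
                yield return resultSelector(item, innerItem);
            }
        }
    }
}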

As you can see in the above implementation:

Firstly, we are looping through the outer array.
Secondly, we are looping through the inner array.
Thirdly, we are checking the equality between the item keys.
Fourthly, if they are equal, we are executing the resultSelector Func passing both outer and inner items.
Finally, we are yielding the result.

Now, if we replace Join with NewJoin in the Main method, we will see the exact same response in the console.
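The nested loops compare every outer item with every inner item, so the work grows with the product of the two sizes, and innerItems is enumerated once per outer item. A common alternative, and as far as I know close in spirit to how the framework's own Join works, is to index the inner items by key once and then look each outer key up in that index. The sketch below is a simplified, hypothetical HashJoin, not the framework source; because Dictionary does not accept null keys, items whose key is null are skipped here, which is an assumption of this sketch.

```csharp
using System;
using System.Collections.Generic;

public static class HashJoinSketch
{
    // Hypothetical hash-based join with the same parameters as NewJoin.
    public static IEnumerable<TResult> HashJoin<T, TH, TKey, TResult>(
        this IEnumerable<T> items,
        IEnumerable<TH> innerItems,
        Func<T, TKey> outerKeySelector,
        Func<TH, TKey> innerKeySelector,
        Func<T, TH, TResult> resultSelector)
    {
        // Build the index once: key -> inner items with that key, in original order.
        var index = new Dictionary<TKey, List<TH>>();
        foreach (var innerItem in innerItems)
        {
            var key = innerKeySelector(innerItem);
            if (key == null)
            {
                continue; // Dictionary cannot store null keys, so such items never match.
            }
            if (!index.TryGetValue(key, out var bucket))
            {
                bucket = new List<TH>();
                index[key] = bucket;
            }
            bucket.Add(innerItem);
        }

        // Probe the index once per outer item instead of rescanning innerItems.
        foreach (var item in items)
        {
            var key = outerKeySelector(item);
            if (key != null && index.TryGetValue(key, out var matches))
            {
                foreach (var innerItem in matches)
                {
                    yield return resultSelector(item, innerItem);
                }
            }
        }
    }
}
```

With the customers and addresses from the Main method, HashJoin produces the same two rows for Jane as NewJoin, in the same order; the trade-off is the extra memory for the dictionary.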

I walk through the Join implementation in my YouTube video here.

Conclusion

LINQ has changed the landscape of .NET since it was introduced. It has made performing all kinds of operations on collections a breeze.

I use LINQ on a daily basis, and I am sure most of you do too. But knowing the internals of LINQ helps us carve out our own LINQ-style methods, which can be reused across multiple applications.

The source code for this blog is available here.