IEnumerable can become a performance bottleneck in your application if it is not used properly.
In this blog post, I will first show how consuming an IEnumerable the wrong way can cause a performance bottleneck.
After that, I will show how the performance issue can be fixed easily by using IEnumerable the proper way.
The use case to show the performance issue of IEnumerable
For the demonstration, I will read user information from a database through a data access class. The method of the data access class will return an IEnumerable<User>.
The database is named ECom, and I will use a single table, User, for this demonstration. The User table has three columns: Id, Name, and Address. I have already populated this table with four rows of data.
Data access class
The data access layer has an interface, IUserReader. This interface has one method, ReadUsers, which returns a generic IEnumerable of type User. The User class has three properties, Id, Name, and Address, to mirror the database table.
namespace CSharpBasics;

internal interface IUserReader
{
    IEnumerable<User> ReadUsers();
}

public class User
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Address { get; set; }
}
The UserReader class
Next, I will create a new class, UserReader, which implements the IUserReader interface. Inside ReadUsers, we will first create a database connection using the connection string passed as a dependency to the constructor.
Next, we will create a SqlCommand to select the user's Id, Name, and Address from the table.
After that, we will open the connection to the database.
Next, we will call ExecuteReader on the SqlCommand object, which returns a SqlDataReader object.
Then we will create a while loop that calls the Read method of the SqlDataReader. Inside the while loop, we will yield return a User object for each row.
The yield return turns this method into an iterator: the compiler generates the class that implements IEnumerable<User> for us.
You do not have to use yield return to produce an IEnumerable; returning a List<User> or an array would also satisfy the return type. What yield return adds is deferred execution: the body of the method does not run when the method is called, only when the result is enumerated.
Hence, in this method, the yield return is what causes the deferred execution.
using System.Data.SqlClient;

namespace CSharpBasics;

internal class UserReader : IUserReader
{
    private readonly string _connectionString;

    public UserReader(string connectionString)
    {
        _connectionString = connectionString;
    }

    public IEnumerable<User> ReadUsers()
    {
        // Nothing below runs until the caller starts enumerating the result.
        // The using declarations dispose the reader, command, and connection
        // when enumeration finishes or the enumerator is disposed.
        using var connection = new SqlConnection(_connectionString);
        using var command = new SqlCommand(
            "SELECT Id, Name, Address FROM dbo.[User]",
            connection);

        connection.Open();
        using var reader = command.ExecuteReader();

        while (reader.Read())
        {
            // Each row is handed to the caller one at a time.
            yield return new User
            {
                Id = reader.GetInt32(0),
                Name = reader.GetString(1),
                Address = reader.GetString(2)
            };
        }
    }
}
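To see the deferred execution on its own, without a database, here is a minimal sketch. ReadNumbers and the log list are hypothetical names that I made up only for this illustration; they are not part of the UserReader code. The sketch records the order in which things happen when an iterator method is called and then enumerated.

```csharp
using System;
using System.Collections.Generic;

// Records what happened, in order.
var log = new List<string>();

// Hypothetical stand-in for ReadUsers: an iterator with no database behind it.
IEnumerable<int> ReadNumbers()
{
    log.Add("iterator body started");
    yield return 1;
    yield return 2;
}

// Calling the method only creates the iterator object; the body does not run yet.
var numbers = ReadNumbers();
log.Add("ReadNumbers returned");

// The body starts running here, on the first step of the loop.
foreach (var n in numbers)
{
    log.Add($"got {n}");
}

Console.WriteLine(string.Join(Environment.NewLine, log));
```

The log shows "ReadNumbers returned" before "iterator body started", which is the same ordering the database code in ReadUsers follows.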
Executing the data access
Now that the data access class is ready to use, I will add code to the Program class to access the data. Inside the Program class, I will use the UserReader to show how this implementation can cause a performance bottleneck.
First, I will create an instance of the UserReader class. I will get the connection string from an environment variable that I added with the key ConnectionString.
Second, after creating the UserReader object, I will get the users by calling its ReadUsers method. This returns an IEnumerable<User>.
Now I will create a foreach loop over users and print the name of each user to the console.
Now, let's say I also want to print the Id of each user to the console, but in a separate location. For that, I will create another foreach loop, and inside it I will print the Id of the user to the console.
using CSharpBasics;

var userReader = new UserReader(
    Environment.GetEnvironmentVariable("ConnectionString"));

var users = userReader.ReadUsers();

foreach (var user in users)
{
    Console.WriteLine($"User Name: {user.Name}");
}

foreach (var user in users)
{
    Console.WriteLine($"User Id: {user.Id}");
}
The performance issue with IEnumerable
Now I will run the application to demonstrate the performance bottleneck.
When I run this application, what is our usual expectation? The expectation is that ReadUsers will execute once and return all the users from the database, and that both foreach loops will then go through the same users.
But this is not the actual behavior, and the reason is the deferred execution of the IEnumerable.
On the line where we call ReadUsers, the body of ReadUsers does not execute. Only when we start looping through the IEnumerable<User> does the code go inside the ReadUsers method and execute the database command. So the code execution does not happen during the ReadUsers call; instead, it happens during the enumeration of the IEnumerable.
After this first round of execution, the application prints the user names to the console.
The next logical expectation is that the users object already holds all the user values, so when we loop through it again, it should just print the Ids. But we will see that the code goes inside the ReadUsers method of the UserReader class again and runs through the database execution all over again. Only then does it print the Ids of the users to the console.
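The double execution can be reproduced without a database. The following is a minimal sketch, assuming a counter in place of the real query: ReadUserNames, queryCount, and the sample names are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Counts how often the iterator body runs; stands in for a database round trip.
var queryCount = 0;

// Hypothetical stand-in for ReadUsers.
IEnumerable<string> ReadUserNames()
{
    queryCount++; // in UserReader, this is where the query would be executed
    yield return "Alice";
    yield return "Bob";
}

var users = ReadUserNames();

// Each foreach asks for a fresh enumerator, so the body runs from the top each time.
foreach (var name in users)
{
    Console.WriteLine($"First loop: {name}");
}

foreach (var name in users)
{
    Console.WriteLine($"Second loop: {name}");
}

Console.WriteLine($"Query executed {queryCount} times"); // prints "Query executed 2 times"
```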
The analysis of performance issues with IEnumerable
So, as you can see, enumerating the returned IEnumerable twice causes us to connect to the database and run the query twice, which is a performance issue. Besides slowing down the service, every extra enumeration opens another connection and adds load to the database.
So how do we solve this problem? How can we use IEnumerable properly?
The solution to the performance issue with IEnumerable
Well, we can just call ToArray or ToList on the returned IEnumerable. This forces the iterator to execute as soon as the call is made and returns the result as an in-memory collection. As a result, users will now have all the users in memory. Hence both foreach loops will print from the in-memory collection instead of going to the database twice.
using CSharpBasics;

var userReader = new UserReader(
    Environment.GetEnvironmentVariable("ConnectionString"));

var users = userReader.ReadUsers().ToArray();

foreach (var user in users)
{
    Console.WriteLine($"User Name: {user.Name}");
}

foreach (var user in users)
{
    Console.WriteLine($"User Id: {user.Id}");
}
Now if I run the code again, I will see that the double execution problem is gone. The database call now happens once, when ToArray is called on the result of ReadUsers, instead of being deferred until each loop.
Both loops then get their data from the in-memory array.
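The effect of ToArray can be checked with a counter instead of a real database. This is only a sketch under that assumption: ReadUserNames, queryCount, countAfterToArray, and the sample names are hypothetical and not part of the UserReader code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Counts how often the iterator body runs; stands in for a database round trip.
var queryCount = 0;

// Hypothetical stand-in for ReadUsers.
IEnumerable<string> ReadUserNames()
{
    queryCount++;
    yield return "Alice";
    yield return "Bob";
}

// ToArray enumerates the iterator immediately and copies the items into an array.
var users = ReadUserNames().ToArray();
var countAfterToArray = queryCount; // already 1 before any loop has run

// Both loops now read from the array, not from the iterator.
foreach (var name in users)
{
    Console.WriteLine($"First loop: {name}");
}

foreach (var name in users)
{
    Console.WriteLine($"Second loop: {name}");
}

Console.WriteLine($"Query executed {queryCount} time(s)"); // prints "Query executed 1 time(s)"
```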
Conclusion
So this is the performance bottleneck we can face if we do not use IEnumerable properly. As a solution, we can call either ToArray or ToList on the IEnumerable so that it is enumerated only once.
A YouTube video for this implementation is available here.