
As I explained earlier, string is an immutable type. That means that once a string has been created, its contents cannot be modified; a string variable can only be pointed at a different string. This also means that any string operation, such as Trim(), Replace() or ToUpper(), actually creates a new string in memory to hold the resulting value. The original string is left untouched and, once nothing refers to it anymore, it is eventually cleaned up. The mechanics behind this are fairly complex, involving references and memory management, and the behavior has many advantages, but in some cases it can cause performance problems.
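To make the idea of immutability more concrete, here is a minimal sketch. The class name ImmutabilityDemo and the sample text are made up for illustration; the point is only that Trim() and ToUpper() hand back new strings and leave the original as it was.

```csharp
using System;

public static class ImmutabilityDemo
{
    public static void Main()
    {
        string original = "  hello  ";

        // Each call returns a NEW string; 'original' is never modified.
        string trimmed = original.Trim();
        string upper = trimmed.ToUpper();

        Console.WriteLine("[" + original + "]"); // [  hello  ]
        Console.WriteLine("[" + trimmed + "]");  // [hello]
        Console.WriteLine("[" + upper + "]");    // [HELLO]
    }
}
```

Calling original.Trim() on its own line, without storing the result, would have no visible effect at all, which is a common beginner mistake.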

The worst example of bad performance I can think of is the concatenation of strings inside a loop. NEVER do that! We haven’t learned about dynamic memory or the garbage collector yet, so I cannot fully explain why this performs so terribly, but let’s try to understand the basics anyway. To do that, we first need to look at what happens when we use the + or += operators on strings. Let’s consider the following code:

using System;
 
namespace HelloWorld
{
    class Program
    {
        static void Main(string[] args)
        {
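            string surname = "John ";
            string name = "Doe";
            // Concatenation creates a third string, "John Doe".
            string completeName = surname + name;
            // Reassignment points completeName at another string; "John Doe" itself is not changed.
            completeName = "Jane";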
            
            Console.ReadLine();
        }
    }
}

What happens in memory? The string values “John ” and “Doe” are stored in a special area of memory called the Heap, and the surname and name variables hold references to them. When we concatenate them, the result is a third string, which we assign to a third variable. So, now we have three values in memory and three variables pointing to them, and this is the expected result. However, when we assign a new value to the already existing variable completeName, the old string is not changed in place: the variable is simply pointed at another string, and the “John Doe” string is left behind with nothing referring to it, waiting to be cleaned up. Whenever the new value is the result of a concatenation, this also means allocating a new memory area and copying the characters into it. This process takes time, especially when repeated many, many times, like in a loop.
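The following sketch illustrates the same idea with one extra variable. The names ReferenceDemo and alias are hypothetical additions, and object.ReferenceEquals is used only to check whether two variables point at the same object in memory.

```csharp
using System;

public static class ReferenceDemo
{
    public static void Main()
    {
        string surname = "John ";
        string name = "Doe";
        string completeName = surname + name;

        // A second variable that refers to the very same "John Doe" object.
        string alias = completeName;
        Console.WriteLine(object.ReferenceEquals(completeName, alias)); // True

        // += builds a brand new string; the old "John Doe" object is untouched.
        completeName += " Jr.";

        Console.WriteLine(alias);        // John Doe
        Console.WriteLine(completeName); // John Doe Jr.
        Console.WriteLine(object.ReferenceEquals(completeName, alias)); // False
    }
}
```

If strings could be modified in place, alias would have changed too. It did not, because the += operator produced a new string and only completeName was redirected to it.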

In C#, we don’t have to worry about manually freeing the memory of values that we no longer need, as we would in other languages, such as C or C++. There is a special component called the Garbage Collector (GC) that automatically cleans up unused objects, but this comes at a price: whenever it performs the cleaning, it takes some time, and it can slow down the overall execution. So, not only do we force the GC to clean the memory again and again, we also make the program copy characters from one place in memory to another (every time a concatenation is executed), which is slow, especially if the strings are long.
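If you are curious how busy the Garbage Collector gets, the rough sketch below counts the collections that run while a concatenation loop executes. GcPressureDemo and CollectionsDuringConcatenation are names I made up for this illustration; GC.CollectionCount(0) is the standard .NET call that reports how many generation 0 collections have happened so far. The exact number depends on your machine and runtime, so treat it as an indication only.

```csharp
using System;

public static class GcPressureDemo
{
    // Concatenates the numbers 1..count with += and returns how many
    // garbage collections were counted while the loop was running.
    public static int CollectionsDuringConcatenation(int count)
    {
        int before = GC.CollectionCount(0);

        string collector = "Numbers: ";
        for (int index = 1; index <= count; index++)
            collector += index; // each iteration abandons the previous string

        int after = GC.CollectionCount(0);
        Console.WriteLine("Final length: " + collector.Length);
        return after - before;
    }

    public static void Main()
    {
        int collections = CollectionsDuringConcatenation(10000);
        Console.WriteLine("Collections during the loop: " + collections);
    }
}
```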

Let’s demonstrate this. Let’s concatenate the numbers from 1 to 200,000 in a string. The usual way of doing this would be like so:

using System;
 
namespace HelloWorld
{
    class Program
    {
        static void Main(string[] args)
        {
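            // Print the start time.
            Console.WriteLine(DateTime.Now);
            string collector = "Numbers: ";
            // Every += builds a brand new string from the old one plus the next number.
            for (int index = 1; index <= 200000; index++)
                collector += index;
            // Show only the first 1024 characters of the result.
            Console.WriteLine(collector.Substring(0, 1024));
            // Print the end time.
            Console.WriteLine(DateTime.Now);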
            
            Console.ReadLine();
        }
    }
}

We display the current time just before we start the concatenation (though we haven’t learned about the DateTime type yet), then we join the strings inside the loop, and finally we display the current time again, so that we can work out the elapsed time.

21/04/2017 08:53:00 AM
Numbers: 123456789101112131415161718192021222324252627282930313233343536373839404142
434445464748495051525354555657585960616263646566676869707172737475767778798081828384
858687888990919293949596979899100101102103104105106107108109110111112113114115116117
118119120121122123124125126127128129130131132133134135136137138139140141142143144145
146147148149150151152153154155156157158159160161162163164165166167168169170171172173
174175176177178179180181182183184185186187188189190191192193194195196197198199200201
202203204205206207208209210211212213214215216217218219220221222223224225226227228229
230231232233234235236237238239240241242243244245246247248249250251252253254255256257
258259260261262263264265266267268269270271272273274275276277278279280281282283284285
286287288289290291292293294295296297298299300301302303304305306307308309310311312313
314315316317318319320321322323324325326327328329330331332333334335336337338339340341
342343344345346347348349350351352353354355356357358359360361362363364365366367368369
3703713723733743
 
21/04/2017 08:54:55 AM
 

As you can see, on an Intel Quad Core i5 4590 CPU, running at 3.3 GHz, this took almost two minutes. Some of you might say, “yeah, but still, there are 200,000 operations to be performed! That has to take some time!”, and you would be wrong. Computers are VERY good at performing simple operations repeatedly and extremely fast, especially with modern CPUs.

But most importantly, in 2017, making your users wait 2 minutes for an operation is almost unacceptable, and many will close the program before the operation gets a chance to complete.

The problem with this time-consuming loop is related to the way strings work in memory. Each iteration creates a new object in the Heap and points the reference to it, as I explained. This process takes a certain amount of physical time.

Several things happen at each step:

1. A new area of memory is allocated to hold the result of the concatenation. This memory is used only temporarily while concatenating, and is called a buffer.
2. The old string is copied into the new buffer. If the string is long (say 500 KB, 5 MB or 50 MB), this can be quite slow!
3. The next number is appended to the buffer.
4. The buffer is converted to a string.
5. The old string and the temporary buffer become unused. Later, they are destroyed by the Garbage Collector. This may also be a slow operation.
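The cost of step 2 can be estimated with simple arithmetic. The sketch below does not concatenate anything; it only adds up how many characters would have to be written if every iteration produces a complete new string. CopyCostEstimate and CharsCopiedByConcatenation are hypothetical names, and the model is a simplification of what the runtime really does.

```csharp
using System;

public static class CopyCostEstimate
{
    // Estimates the total number of characters written when the numbers
    // 1..count are appended one by one to an immutable string.
    public static long CharsCopiedByConcatenation(int count)
    {
        long currentLength = "Numbers: ".Length;
        long totalCopied = 0;

        for (int index = 1; index <= count; index++)
        {
            currentLength += index.ToString().Length; // length of the new string
            totalCopied += currentLength;             // all of it has to be written
        }

        return totalCopied;
    }

    public static void Main()
    {
        Console.WriteLine(CharsCopiedByConcatenation(200000));
    }
}
```

For the 200,000 numbers used above, this estimate comes out at roughly a hundred billion characters written, while the final string is only about 1.09 million characters long. The work grows roughly with the square of the number of iterations, which is why the loop gets slower and slower as it goes.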

A much more elegant and appropriate way to concatenate strings in a loop is to use the StringBuilder class. I know, we haven’t talked about classes yet, but don’t worry about that. Let’s just see how it works.

StringBuilder is a class that serves to build and change strings. It overcomes the performance problems that arise when concatenating values of type string. Internally, it keeps its characters in a buffer, and what we need to know about it is that its contents can be freely changed. Changes made to a StringBuilder are carried out in that same buffer, which only grows when it runs out of room, and this saves time and resources. Changing the content does not create a new object; it simply changes the current one.

Let’s rewrite the code above, in which we concatenated strings in a loop. Notice that the StringBuilder type is declared in a namespace called System.Text, so you will need to add another using directive. If you remember, the operation previously took almost 2 minutes. Let’s measure how long the same operation takes if we use StringBuilder:

using System;
using System.Text;
 
namespace HelloWorld
{
    class Program
    {
        static void Main(string[] args)
        {
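            Console.WriteLine(DateTime.Now);
            StringBuilder sb = new StringBuilder();
            sb.Append("Numbers: ");
            // Append adds each number to the builder's internal buffer.
            for (int index = 1; index <= 200000; index++)
                sb.Append(index);
            // ToString() produces the final string; show its first 1024 characters.
            Console.WriteLine(sb.ToString().Substring(0, 1024));
            Console.WriteLine(DateTime.Now);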
            
            Console.ReadLine();
        }
    }
}

After running the code, we get this:

21/04/2017 08:59:59 AM
Numbers: 123456789101112131415161718192021222324252627282930313233343536373839404142
434445464748495051525354555657585960616263646566676869707172737475767778798081828384
858687888990919293949596979899100101102103104105106107108109110111112113114115116117
118119120121122123124125126127128129130131132133134135136137138139140141142143144145
146147148149150151152153154155156157158159160161162163164165166167168169170171172173
174175176177178179180181182183184185186187188189190191192193194195196197198199200201
202203204205206207208209210211212213214215216217218219220221222223224225226227228229
230231232233234235236237238239240241242243244245246247248249250251252253254255256257
258259260261262263264265266267268269270271272273274275276277278279280281282283284285
286287288289290291292293294295296297298299300301302303304305306307308309310311312313
314315316317318319320321322323324325326327328329330331332333334335336337338339340341
342343344345346347348349350351352353354355356357358359360361362363364365366367368369
3703713723733743
 
21/04/2017 08:59:59 AM
 

I don’t know about you, but 200,000 operations in less than a second, now, that’s what I call a performance increase! The required time is actually on the order of milliseconds!
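Printing DateTime.Now only shows differences down to the second. If you want a finer measurement, one option is the Stopwatch class from System.Diagnostics. The sketch below is my own variation on the two programs, not a replacement for them: the names ConcatenationTiming, BuildWithConcatenation and BuildWithStringBuilder are made up, and the count is reduced to 20,000 so that the slow version finishes quickly. The timings you get will depend on your machine.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class ConcatenationTiming
{
    // The slow way: every += creates a new string.
    public static string BuildWithConcatenation(int count)
    {
        string collector = "Numbers: ";
        for (int index = 1; index <= count; index++)
            collector += index;
        return collector;
    }

    // The fast way: all numbers are appended to one StringBuilder.
    public static string BuildWithStringBuilder(int count)
    {
        StringBuilder sb = new StringBuilder("Numbers: ");
        for (int index = 1; index <= count; index++)
            sb.Append(index);
        return sb.ToString();
    }

    public static void Main()
    {
        const int count = 20000; // smaller than in the article, to keep the demo short

        Stopwatch watch = Stopwatch.StartNew();
        string slow = BuildWithConcatenation(count);
        watch.Stop();
        Console.WriteLine("string +=     : " + watch.ElapsedMilliseconds + " ms");

        watch.Restart();
        string fast = BuildWithStringBuilder(count);
        watch.Stop();
        Console.WriteLine("StringBuilder : " + watch.ElapsedMilliseconds + " ms");

        // Both approaches must produce exactly the same text.
        Console.WriteLine("Same result: " + (slow == fast));
    }
}
```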

The way we use StringBuilder is by creating a new instance of it, and then using the Append() method to add strings to it. When we are done, ToString() gives us the final result as a regular string. You will understand this process better when you read the next chapter. For the time being, just remember that StringBuilder is a MUCH more efficient way of concatenating strings in a loop.
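Append() is not the only thing a StringBuilder can do. As a small, hedged illustration of changing the content in place, the sketch below also uses Insert() and Replace(), which are standard StringBuilder methods. The class name StringBuilderBasics, the method BuildGreeting and the sample text are invented for this example.

```csharp
using System;
using System.Text;

public static class StringBuilderBasics
{
    public static string BuildGreeting()
    {
        StringBuilder sb = new StringBuilder();

        sb.Append("Hello");
        // Append returns the same builder, so calls can be chained.
        sb.Append(", ").Append("World");
        sb.Append('!');

        // The existing content can be changed without creating a new builder.
        sb.Insert(0, ">> ");
        sb.Replace("World", "StringBuilder");

        // Only at the end do we ask for a regular string.
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(BuildGreeting()); // >> Hello, StringBuilder!
    }
}
```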