We should almost never compare strings with ToLower

Let’s look at an example that is very common in code. It’s the kind of solution many coders come up with themselves, or pick up from looking at others’ code.
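The original listing is not reproduced in this copy, so what follows is only a sketch of the pattern being discussed. The method names GetAllAddresses() and GetPersonsMatchingEmailAddress and the parameter emailAddressToMatch come from the text; the Person class, its properties and the sample data are assumptions made for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical model class; the real one is not shown in the post.
public class Person
{
    public string Name { get; set; }
    public string EmailAddress { get; set; }
}

public static class SlowSearch
{
    // Hypothetical stand-in for the data source mentioned in the text.
    public static List<Person> GetAllAddresses()
    {
        return new List<Person>
        {
            new Person { Name = "Ann", EmailAddress = "Ann@Example.com" },
            new Person { Name = "Bob", EmailAddress = "bob@example.com" },
        };
    }

    // Both ToLower() calls run once per person in the list.
    public static List<Person> GetPersonsMatchingEmailAddress(string emailAddressToMatch)
    {
        return GetAllAddresses()
            .Where(p => p.EmailAddress.ToLower() == emailAddressToMatch.ToLower())
            .ToList();
    }
}
```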

It works, what’s wrong with it? Is there another way?

First, I’ll tell you what’s wrong with it 🙂

The code above will be pretty slow and use plenty of memory, unnecessarily.

Calling ToLower() returns a new string with the string’s contents in lower case.
For each of the 10 000 fictional persons’ e-mail addresses returned by GetAllAddresses(), we allocate 2 new strings on the managed heap (more about the managed heap in another post). This means we allocate 20 000 new strings just to find the persons matching the e-mail address.

I’ve put together a little table with some calculations 🙂

What?                                    How many/how much?
Number of e-mail addresses               10 000
Average e-mail address length            25 characters
Avg. e-mail address length in bytes      50 (2 bytes per character)
Number of generated strings per compare  2
Total number of generated strings        20 000
Total size of allocated strings          1 000 000 bytes
                                         = 977 KB (1 000 000 / 1 024)
                                         = 0,95 MB (977 / 1 024)
Number of lookups/searches in example    1 000
Total memory used for searching          953,67 MB

To sum up the table above: each search creates 20 000 new strings with an average size of 50 bytes, which is 0,95 MB. Doing 1 000 searches will allocate about 950 MB of memory. That seems a little excessive, don’t you think?
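The arithmetic behind these figures can be written out as a small sketch. The class and member names are made up for illustration. Like the table, it counts character data only; each string object also carries some fixed overhead, so the result is closer to a lower bound.

```csharp
public static class AllocationEstimate
{
    public const int AddressCount = 10_000;
    public const int AverageLengthInChars = 25;
    public const int BytesPerChar = 2;       // .NET strings store UTF-16 code units
    public const int StringsPerCompare = 2;  // one ToLower() result per side
    public const int Searches = 1_000;

    // Bytes of character data allocated by a single search over all addresses.
    public static long BytesPerSearch()
    {
        return (long)AddressCount * StringsPerCompare * AverageLengthInChars * BytesPerChar;
    }

    // Total over all searches, converted to megabytes (1 MB = 1 024 * 1 024 bytes).
    public static double TotalMegabytes()
    {
        return BytesPerSearch() * (double)Searches / (1024 * 1024);
    }
}
```

BytesPerSearch() gives 1 000 000 and TotalMegabytes() gives roughly 953,67, which matches the table.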

.NET manages memory with a great feature called the garbage collector. Among other things, a garbage collection (where .NET reclaims available memory) is triggered by the number of memory allocations and the size of data allocated.

If we have plenty of memory and the garbage collector does not collect (free) the allocated strings, GetPersonsMatchingEmailAddress could make the process memory usage grow by 950 MB. (It’s more likely, though, that the garbage collector will collect memory a couple of times during the 1 000 calls to conserve memory.)

Allocating new memory in .NET is really fast, but not entirely free. Copying the data to the newly allocated memory takes some time, and garbage collections can be costly, depending on the number of objects in memory and a couple of other factors, such as how long the objects have lived.

Solution

This is the easiest solution (which this blog post is about), with minimal code changes.

Instead of 2x ToLower(), we use:

This compares the two strings case-insensitively without allocating memory: no memory allocation, no memory copying and no garbage collection. Have a look at String.Compare @ MSDN.
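The snippet that belonged here is missing from this copy. A minimal sketch of such a comparison follows, assuming the ordinal, case-insensitive overload of string.Compare was the one intended; the helper name EmailEquals is made up for illustration.

```csharp
using System;

public static class EmailComparison
{
    // string.Compare returns 0 when the two strings are considered equal,
    // so the result has to be compared against 0 to get a bool.
    public static bool EmailEquals(string a, string b)
    {
        return string.Compare(a, b, StringComparison.OrdinalIgnoreCase) == 0;
    }
}
```

In the search method, the same expression would replace the condition that calls ToLower() on both sides.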

Here is the full example, including both the slower and the faster version of GetPersonsMatchingEmailAddress:
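The full listing is not part of this copy either. The sketch below shows what the two versions might look like side by side. The Person class, the generated test data and the Slow/Fast suffixes are assumptions, and StringComparison.OrdinalIgnoreCase is assumed as the comparison mode.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model class; the real one is not shown in the post.
public class Person
{
    public string Name { get; set; }
    public string EmailAddress { get; set; }
}

public static class EmailSearch
{
    // Hypothetical data source: 10 000 generated persons with mixed-case addresses.
    private static readonly List<Person> People = Enumerable.Range(0, 10_000)
        .Select(i => new Person
        {
            Name = "Person" + i,
            EmailAddress = "Person" + i + "@Example.com",
        })
        .ToList();

    public static List<Person> GetAllAddresses()
    {
        return People;
    }

    // Slower version: lower-cases both sides for every person.
    public static List<Person> GetPersonsMatchingEmailAddressSlow(string emailAddressToMatch)
    {
        return GetAllAddresses()
            .Where(p => p.EmailAddress.ToLower() == emailAddressToMatch.ToLower())
            .ToList();
    }

    // Faster version: compares in place, without creating lower-cased copies.
    public static List<Person> GetPersonsMatchingEmailAddressFast(string emailAddressToMatch)
    {
        return GetAllAddresses()
            .Where(p => string.Compare(p.EmailAddress, emailAddressToMatch,
                                       StringComparison.OrdinalIgnoreCase) == 0)
            .ToList();
    }
}
```

Both methods return the same persons for a given address. To compare their cost, each could be called 1 000 times inside a System.Diagnostics.Stopwatch measurement.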

As a last note, I would like to take the opportunity to say that there are other ways to make this even faster:

  • If the addresses do not change very often, create a Lookup (with ToLookup(.., StringComparer.OrdinalIgnoreCase)) or a Dictionary<string, Person>(StringComparer.OrdinalIgnoreCase). This will of course use memory for keeping track of the indexed data, but the lookups are very fast.
  • If you can ensure that all e-mail addresses are lower case from the beginning, you only need to call ToLower() on emailAddressToMatch, which would be faster than string.Compare.
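The first option can be sketched as follows. The Person class and the helper names are assumptions for illustration; ToLookup, ToDictionary and StringComparer.OrdinalIgnoreCase are the standard .NET members.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model class; the real one is not shown in the post.
public class Person
{
    public string Name { get; set; }
    public string EmailAddress { get; set; }
}

public static class EmailIndex
{
    // A lookup allows several persons to share one e-mail address.
    public static ILookup<string, Person> BuildLookup(IEnumerable<Person> persons)
    {
        return persons.ToLookup(p => p.EmailAddress, StringComparer.OrdinalIgnoreCase);
    }

    // A dictionary needs each address to be unique when case is ignored;
    // ToDictionary throws an ArgumentException on duplicate keys.
    public static Dictionary<string, Person> BuildDictionary(IEnumerable<Person> persons)
    {
        return persons.ToDictionary(p => p.EmailAddress, StringComparer.OrdinalIgnoreCase);
    }
}
```

The index is built once, and each search then becomes a single keyed access such as lookup[emailAddressToMatch] or dictionary.TryGetValue(emailAddressToMatch, out var person). It has to be rebuilt or updated whenever the addresses change.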

There are many more, but the purpose of this blog post is to promote the use of String.Compare instead of ToLower().
Thanks,
Erik

6 thoughts on “We should almost never compare strings with ToLower”

  1. Pingback: Using String.Intern to save memory and increase performance – Erik Bergman's .NET blog

  2. Jeff LeBert

    Use string.Equals(s1, s2, StringComparison.OrdinalIgnoreCase) instead of string.Compare(…). String.Equals returns a Boolean so you don’t have to do the extra comparison. Much easier to read.

    The other big thing is the Turkish “I” problem. In Turkish, the uppercase of the letter “i” is not an uppercase “I”; it looks like an uppercase “I” with a dot over it. Likewise, the lowercase of “I” looks like a lowercase “i” without the dot over it. If you do
    "FILE".ToLower() == "file".ToLower()
    and your culture is Turkish, then the above will return false.

    I always recommend that people read
    https://msdn.microsoft.com/en-us/library/ms973919.aspx
    It talks about all the different string related issues and specifically talks about the Turkish “I” problem above.
