Using String.Intern to save memory and increase performance

[The post has been edited 2016-03-19 to include two more downsides to using string.Intern and a couple of code examples].

As you know, all strings in .NET are immutable. You can never change the contents of a string, only create a new one. I’ve written a post about allocating strings by accident in [Never compare with ToLower].

All unique literal strings in a .NET process are stored as a single instance. If we use the literal string “pounds” in 10 places in our program, it is stored in one single place in memory, the string pool, and all usages are references to that instance.

Consider the following code:
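A minimal sketch of the idea, with class and variable names made up for illustration: two identical literals share one instance, while a string with the same characters built at run time is a separate object.

```csharp
using System;

public static class LiteralPoolDemo
{
    // Returns true when both variables point at the very same string object.
    public static bool SameInstance(string a, string b)
    {
        return object.ReferenceEquals(a, b);
    }

    public static void Main()
    {
        string unit1 = "pounds";
        string unit2 = "pounds";

        // Built at run time, so it is a new object and not the pooled literal.
        string unit3 = new string(new[] { 'p', 'o', 'u', 'n', 'd', 's' });

        Console.WriteLine(SameInstance(unit1, unit2)); // True: both refer to the pooled literal
        Console.WriteLine(SameInstance(unit1, unit3)); // False: same characters, different object
        Console.WriteLine(unit1 == unit3);             // True: the contents are equal
    }
}
```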

Now, what if you were to load some serious amounts of data into your process’s memory, and some of that data were strings? It might be units for weights, first and last names, favorite colors etc.

How about loading a million instances of the following class into memory:
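A minimal sketch of such a class. Only ArticleNumber and Unit are mentioned in the text; the remaining properties are hypothetical filler to make it look like a typical order line.

```csharp
public class OrderLine
{
    // Hypothetical properties, not taken from the article.
    public int Id { get; set; }
    public int OrderId { get; set; }
    public decimal Quantity { get; set; }

    // The two string properties discussed in the text.
    public string ArticleNumber { get; set; }
    public string Unit { get; set; }
}
```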

Having a million order lines, we can almost guess that ArticleNumber and Unit will not be unique for every instance. If the data is stored in a database we can query the database and find out. If we have 400 different article numbers and 2 different units (Pieces and Kilograms), we can actually save quite some memory, plus speed up using them, by using the string pool. Let’s assume that every article number averages 10 characters and the unit averages 7.5.

Size matters….

Without using the string pool, the article number and unit for these 1 000 000 order lines use 17 500 000 characters = 35 000 000 bytes, and 35 000 000 / (1024*1024) ≈ 33 MB of memory. That does not include the object references themselves (4 bytes per string for 32-bit processes and 8 bytes for 64-bit processes) or the per-object overhead of each string.

If we were to use the string pool, we would have 400 article numbers * 10 characters * 2 (to get bytes) = 8000 bytes, plus 30 bytes for the two units, which makes 8030 bytes / 1024 ≈ 7.8 KB. WE HAVE SAVED ALMOST 33 MB OF MEMORY.
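The arithmetic above can be checked with a few lines of code. This is only a sketch of the estimate: the class and method names are made up, and like the text it counts character data only, ignoring references and per-object overhead.

```csharp
using System;

public static class PoolSavingsEstimate
{
    private const int BytesPerChar = 2; // .NET strings are UTF-16

    // Character data when every order line holds its own copies of both strings.
    public static long BytesWithoutPool(long orderLines, double avgArticleChars, double avgUnitChars)
    {
        return (long)(orderLines * (avgArticleChars + avgUnitChars) * BytesPerChar);
    }

    // Character data when each distinct string is stored only once.
    public static long BytesWithPool(int distinctArticles, int avgArticleChars, string[] units)
    {
        long bytes = (long)distinctArticles * avgArticleChars * BytesPerChar;
        foreach (string unit in units)
        {
            bytes += unit.Length * BytesPerChar;
        }
        return bytes;
    }

    public static void Main()
    {
        long unpooled = BytesWithoutPool(1000000, 10, 7.5);
        long pooled = BytesWithPool(400, 10, new[] { "Pieces", "Kilograms" });

        Console.WriteLine(unpooled); // 35000000
        Console.WriteLine(pooled);   // 8030
    }
}
```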

Now this is of course pretty cool. But you mentioned something about performance.

Yes I did, thank you for asking. If all article numbers are in memory, and equal values are the same instance, comparing two article numbers that both belong to the string pool is a matter of comparing two references (two pointer-sized integers), which takes roughly one processor instruction.

NOW TELL US HOW WE USE IT!

I thought you’d never ask…

Getting a string into the string pool, or getting a reference to it if it already exists there, is a matter of calling String.Intern(stringToHavePooled). You could also create your own string pool by using a HashSet or Dictionary if you like.

Check this example out, which is an extension of our first example using the .NET string pool:
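A minimal sketch of such an example. BuildAtRuntime is a hypothetical helper that stands in for data arriving at run time, for instance from a file or a database.

```csharp
using System;

public static class InternDemo
{
    // Simulates a value that arrives at run time: always returns a new string object.
    public static string BuildAtRuntime(string value)
    {
        return new string(value.ToCharArray());
    }

    public static void Main()
    {
        string literal = "pounds";
        string loaded = BuildAtRuntime("pounds");

        Console.WriteLine(object.ReferenceEquals(literal, loaded)); // False: a separate copy

        // Ask the string pool for its instance of the same contents.
        string pooled = string.Intern(loaded);

        Console.WriteLine(object.ReferenceEquals(literal, pooled)); // True: the pooled instance
    }
}
```

After the call to String.Intern, the copy held in `loaded` is no longer needed and can be garbage collected once nothing references it.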

If two strings are the same instance, the == operator and the .Equals method will actually do an Object.ReferenceEquals check first, so your existing code can run faster if the strings you provide it with share the same reference, like interned strings do.

This is an extract from the String class in the .NET framework:
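The framework listing itself is not reproduced here. The sketch below only paraphrases the idea of checking references before contents; it is not the actual source of System.String, and AreEqual is a hypothetical name.

```csharp
using System;

public static class EqualsSketch
{
    // Paraphrase of a reference-first, ordinal equality check.
    public static bool AreEqual(string a, string b)
    {
        // Same instance (or both null): no need to look at the characters.
        if (object.ReferenceEquals(a, b))
        {
            return true;
        }

        // Exactly one is null, or the lengths differ: they cannot be equal.
        if (object.ReferenceEquals(a, null) || object.ReferenceEquals(b, null) || a.Length != b.Length)
        {
            return false;
        }

        // Slow path: compare character by character.
        for (int i = 0; i < a.Length; i++)
        {
            if (a[i] != b[i])
            {
                return false;
            }
        }
        return true;
    }

    public static void Main()
    {
        string pooled = string.Intern(new string("pounds".ToCharArray()));
        Console.WriteLine(AreEqual(pooled, "pounds")); // True, decided by the reference check
        Console.WriteLine(AreEqual("pounds", "pieces")); // False, decided by comparing characters
    }
}
```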

Are there any drawbacks of using String.Intern and the string pool?

Yes. You cannot remove a string from the string pool. Once it is in there, it stays there for the lifetime of the process. Every string you intern therefore permanently increases the memory used by the process a little.

This is generally not a problem, but if you think it is, how about implementing your own string pool?
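One way to implement your own pool is sketched below, assuming a plain Dictionary keyed on the string itself. StringPool, GetOrAdd and Clear are hypothetical names, and the class is not thread-safe as written.

```csharp
using System;
using System.Collections.Generic;

public sealed class StringPool
{
    // Maps the contents of a string to the one instance we hand out for it.
    private readonly Dictionary<string, string> _pool =
        new Dictionary<string, string>(StringComparer.Ordinal);

    public int Count
    {
        get { return _pool.Count; }
    }

    // Returns the pooled instance with the same contents, adding it if missing.
    public string GetOrAdd(string value)
    {
        if (value == null)
        {
            return null;
        }

        string pooled;
        if (_pool.TryGetValue(value, out pooled))
        {
            return pooled;
        }

        _pool.Add(value, value);
        return value;
    }

    // Unlike the built-in pool, this one can be emptied so the garbage
    // collector can reclaim strings that are no longer referenced elsewhere.
    public void Clear()
    {
        _pool.Clear();
    }
}
```

Because the pool is an ordinary object, it can be scoped to a single load operation and thrown away afterwards, which sidesteps the lifetime issue of the built-in pool.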

[Edit 2016-03-19]

Joel W Kall pointed out two more drawbacks to using string.Intern which I should mention. Thanks Joel!

1. When interning, the runtime searches the string pool to look for a previously interned string that matches the one you are trying to intern. The documentation is not clear on how, but I would assume it uses some kind of hash table. This should generally be fast, but might have an impact in some scenarios.

Erik: Yes, it’s important to consider the possible performance impact of interning strings. Also, the hash table itself will add some overhead so it’s important to measure and make sure that you actually benefit from interning.

These two code blocks demonstrate the difference:

It takes about 1500 ms on my laptop to generate and add 10 million strings from integers to a list, consuming about 500 MB of memory. If we instead intern them, it takes about 7500 ms and consumes 1200 MB of memory:
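A sketch of how such a comparison could be set up, with both variants folded into one hypothetical Build method. The timings and memory figures quoted above are the author's; this sketch makes no claim to reproduce them, and GC.GetTotalMemory only reports the managed heap, which is an assumption about how memory is measured.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class UniqueStringsBenchmark
{
    // Fills a list with the strings "0", "1", ... and optionally interns each one.
    public static List<string> Build(int count, bool intern)
    {
        var list = new List<string>(count);
        for (int i = 0; i < count; i++)
        {
            string s = i.ToString();
            list.Add(intern ? string.Intern(s) : s);
        }
        return list;
    }

    public static void Main(string[] args)
    {
        // Pass "intern" as the first argument to measure the interning variant.
        // Running each variant in its own process keeps the measurements apart.
        bool intern = args.Length > 0 && args[0] == "intern";
        const int count = 10000000;

        Stopwatch watch = Stopwatch.StartNew();
        List<string> list = Build(count, intern);
        watch.Stop();

        long megabytes = GC.GetTotalMemory(false) / (1024 * 1024);
        Console.WriteLine("intern = {0}: {1} ms, about {2} MB on the managed heap",
            intern, watch.ElapsedMilliseconds, megabytes);
        GC.KeepAlive(list);
    }
}
```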

2. The string to intern is still allocated, along with any new string created in the string pool. This means that each unique string will be allocated twice, although one of them will be garbage collected later. That’s another reason to be careful not to intern when there are many unique strings.

Erik: Only the strings that already exist in the string pool are allocated “unnecessarily”. Interning a string that is not yet in the pool inserts that instance into the string pool. If it already exists in the string pool, string.Intern will return a reference to the interned string and the “old” string becomes eligible for garbage collection. Look at this example:

This writes “100000 IsInterned = false” and “Are the the same instance? true”.

string.IsInterned will return the string pool instance of the string if it exists, or null if it does not.
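The behaviour of string.IsInterned and string.Intern described above can be sketched as follows. A GUID is used here instead of the number from the example, purely as an assumption to make sure nothing else has already put the value in the pool; IsInPool is a hypothetical helper.

```csharp
using System;

public static class IsInternedDemo
{
    // True when the string pool already holds a string with these contents.
    public static bool IsInPool(string value)
    {
        return string.IsInterned(value) != null;
    }

    public static void Main()
    {
        // Built at run time, so it cannot be in the pool yet.
        string first = Guid.NewGuid().ToString();
        Console.WriteLine(IsInPool(first)); // False

        string pooled = string.Intern(first);
        Console.WriteLine(IsInPool(first)); // True

        // Equal contents, but a separate object on the heap.
        string second = new string(first.ToCharArray());
        Console.WriteLine(object.ReferenceEquals(pooled, second)); // False

        // Interning the copy hands back the instance that is already pooled.
        Console.WriteLine(object.ReferenceEquals(pooled, string.Intern(second))); // True
    }
}
```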

To give an example of where string.Intern will save us memory, I’ve made a small modification to the previous example. The code below will only consume 110 MB of memory.
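The original modification is not shown here, so the following is only a guess at its spirit: the list still holds many entries, but they are drawn from a small set of distinct values, which is where interning pays off. The `distinct` parameter and the modulo trick are assumptions, and no memory figure is claimed for this sketch.

```csharp
using System;
using System.Collections.Generic;

public static class RepeatedStringsDemo
{
    // Adds 'count' interned strings to a list, cycling through 'distinct' different values.
    public static List<string> Build(int count, int distinct)
    {
        var list = new List<string>(count);
        for (int i = 0; i < count; i++)
        {
            // The temporary string is short-lived; the list keeps the pooled instance.
            list.Add(string.Intern((i % distinct).ToString()));
        }
        return list;
    }

    public static void Main()
    {
        List<string> list = Build(10000000, 1000);

        // Entries that are 1000 positions apart hold the same value and the same instance.
        Console.WriteLine(object.ReferenceEquals(list[0], list[1000])); // True
    }
}
```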

[End of edit]

That’s all for now folks!

6 thoughts on “Using String.Intern to save memory and increase performance”

  1. Joel W Kall

    Great article! Although you omitted a couple of downsides:

    1. When interning, the runtime searches the string pool to look for a previously interned string that matches the one you are trying to intern. The documentation is not clear on how, but I would assume it uses some kind of hash table. This should generally be fast, but might have an impact in some scenarios.

    2. The string to intern is still allocated, along with any new string created in the string pool. This means that each unique string will be allocated twice, although one of them will be garbage collected later. That’s another reason to be careful not to intern when there are many unique strings.

  2. Mark Waterman

    I’ve been playing with string interning a lot lately and this is the cost/benefit analysis I’ve seen so far… My conclusion has been that, in most cases, the costs that you (and Joel) have identified will outweigh the benefits unless you’re confident that your app has a predictable workload of repetitive strings.
