Posts Tagged Hashtable

Another thoughts about hash tables.

As you probably understand from my previous posts, I working a lot with hash tables. During my work on the current project, I saw people doing so badly things with them, so I decided to write something on this topic.

In my atricle about hash table keys I discussed how is bad to use value types as keys (this is applied to non generic hash table). Today I saw another stupid thing, which is worth to mention. Consider the following code:

if(!_hashTable.ContainsKey(someKey))
{
  hashTable.Add(someKey, someValue);
}

You see, each time you calling to ContainsKey, it has to call GetHashCode(), find proper bucket in the hash table and then, in case of collisions, go over all entities of that bucket. After this you adding your value, and during this the same is doing again. Much cheaper is just to add value in try/catch block, and catch ArgumentException in case of duplication. Even more, you can use indexer – in this case the duplicate will be overwritten, without exception being thrown.

The same with removing values:

//Bad Code!!
if(_hashTable.ContainsKey(someKey))
{
  _hashTable.Remove(someKey);

In any case, no exception is thrown. Remove method returns true or false according to success of operation. The same with getting value from hash table. Instead of calling to ContainsKey method, much cheaper to do that in try/catch block, and catch KeyNotFoundException. You should use ContainsKey only when you have to do some action based on the result, but this action should not be done on hash table (or else you will pay twice).

Another point: are you sure you need to use hash table? Because if you adding and removing objects in high rate, probably you should consider to use another data structure. For many people I know, hash table looks as best data structure because of O(1) add/search/remove operations. However, this O(1) can take significant time. Each access to hashtable requires hash computation, which is relative slow operation (even if it done in constant number of steps). In case you have collisions, you will have to go over all entries of the bucket. And in case you reaching hash table capacity, you have to rebuild and rehash the whole table. Therefore, when the number of objects is not so big (I didn’t tested for exact limit) is better use other data structure, such as balanced tree (if n is not big, log(n) can be faster than O(1) – think about this). Anyway, if you going to use hashtable, make it BIG (at least twice bigger than you need) to avoid rebuilds and collisions as much as possible. By default, the size of hashtable (if I remember correctly) is 17.

Important remark: from people responses, I did a conclusion that I have to explain better the main point: you can actually improve performance by using try/catch block, only when GetHashCode is expensive operation. For example, GetHashCode from integer (in default implementation) is very fast – the hash code is the integer itself. In this case using try/catch block is bad option, it will perform slower. However, when you calling GetHashCode on reference type key (espessially on string) – it is better to use trycatch than call to ContainsKey. Today, when I tryed to explain that to some guy, I did test with string key (10-12 chars). Implementation with try/catch block performed 25% faster. This leads us to important conclusions: to achieve best performance you have to understand whats going on under the hood, and for each rule the is an exception.

Add comment March 4, 2008

Key’s Equality in Hashtables (or who equal to who?)

Consider the following simple class:

public class SomeClass
{
uint _id = 125;public SomeClass(uint id)
  {
  _id = id;
  }
public override int GetHashCode()
  {
return (int)_id;
  }
}

We want to add instance of this class to hashtable, where the key is object’s ID, and the value is object itself. If we doing the following, it works:

SomeClass someClass = new SomeClass(125);
//1.

//========
hashTable[someClass.GetHashCode()] = someClass;
object returnValue = hashTable[125];
 

The returnValue is exactly our someClass instance. However, using value types as keys as bad, due to boxing each time key is passed to the hashtable. How we can improve this? Well, as known, inside hashtable ‘GetHashCode’ method is being called to determine the key. Therefore we can pass as a key the object itself, and override the ‘GetHashCode’ method (which is exactly what we did). Let’s try this code:

//2.
//===========
hashTable[someClass] = someClass;
SomeClass key =
new SomeClass(125);
returnValue = hashTable[key];

If you will run this code, you will see the returnValue is null. So what is wrong? When you trying to retrieve your object from hashtable and passing the key, hashtable using ‘Equals’ method, to find the right key. By default, after confirming the hashcode is the same, it comparing references, so in the code above the result is false. Obvious, the solution is to override the ‘Equals’ method (here is the one possible solution):

public override bool Equals(object obj)
{
return (_id == obj.GetHashCode());
}

If we run this code, that works ok. Now, we want to retrieve the object by passing the ID directly, without creating wrapper around it (as I said before, this is bad because of boxing – don’t do that in production code):

hashTable[someClass] = someClass;
returnValue = hashTable[125];

We overrided the Equals method, so this will work? No, it will not. You see, the hashtable comparing the passed key to the stored keys, and not storing keys to passed one. As a result, the ‘Equals’ method is invoked on the passed key – in our example it’s an integer – and it never equal to SomeClass, ofc. The conlcusion is: if you using your custom classes as hashtable keys, always override ‘GetHashCode’ and ‘Equals’ methods. And remember to use only your classes for the lookup inside the hashtable.

2 comments January 15, 2008


Categories

Top Posts

Tags

.NET addin app.config ArrayList bug CAB Configuration ConfigurationManager ConfigurationSection ContentControl ContextMenu CTime; DateTime custom keys DataBinding DataContext Data templates debugging equals gethashcode GUI Hashtable interlocked Invoke lock lock free memcpy MFC multithreading multithreading; lock free override performance SCSF serialization Smart Client Software Factory Styles System.Configuration unsafe virtual functions Visual Studio wait free WinAPI WinForms WinForms\WPF Integration World of Warcraft World of Warcraft; Addon

Archives