I very rarely use dictionary keys other than built-in types such as strings or ints – there are not that many cases where a user defined class is necessary in the types of systems I work on (administrative and financial systems mostly). So I came upon a few fun challenges while implementing the NBEncode’s BDictionaries, which represent bencode’s dictionaries.

These dictionaries, or hash tables if you prefer, must use a BByteString object as key. BByteString objects are the library’s internal representation of bencode byte strings. They are really just arrays of bytes. A BByteString object may represent a text string, but this is entirely context dependent, and they should always be considered binary data when used as dictionary keys.

For more information on bencode, bencode dictionaries and byte string, check out the link to the
bencode specification at the end of this post.

Since the standard string implementation of GetHashCode goes out the window it is necessary to create a custom one for BByteStrings. I chose to go for a well-known hashing scheme based on a polynomial like this:

hash = c[b0] + c^2[b1] + c^3[b2] + ... + c^N[bN]

b0,b1 etc are the bytes at index zero, one and so on from the byte string used as dictionary key. The constant c is often chosen to be 37 or 31, because this gives a reasonably good spread of hash values.

The implementation of GetHashCode() for BByteString objects thus looks like this:

public class BByteString : BObject<byte[]>
{
	private const long maxBytesToUseInHash = 12;
	private const long hashCoeff = 37;
	// other code not shown in this example

	public override int GetHashCode()
	{
		long hashValue = 0;
		long maxTableSize = (long) UInt32.MaxValue;
		long bytesToHash = Value.LongLength < 12 ? Value.LongLength : maxBytesToUseInHash;

		for (long i = 0; i < bytesToHash; i++)
		{
			hashValue = (hashCoeff * hashValue + (long)Value[i]) % maxTableSize;
		}

		int returnValue = (int)(hashValue - (long)Int32.MaxValue) - 1;
		return returnValue;
	}
}

The loop that calculates the hash value uses Horner’s rule for calculating those kinds of polynomials. It is also known as Horner’s scheme. The number of bytes to hash (“bytesToHash”) is capped to prevent an unecessarily long calculation. Horner’s rule is often taught in undergraduate Algorithms and Data Structure courses.

Close, but no cigar..

The hashing scheme for BByteStrings worked well when inserting key-value pairs in the BDictionary objects. However finding and fetching objects from the BDictionaries would fail. Debugging the hashing algorithm did not reveal anything, as different instances of the same byte strings always produced the same hash value. So it seemed that I forgot something else.

After a little headscratching ILSpy was brought in to have a look at the .NET Dictionary<K,V> implementation. (ILSpy is available at http://wiki.sharpdevelop.net/ILSpy.ashx).
Beginning with the Dictionary’s ContainsKey-method, I ended up at the FindEntry-method shown below:

// System.Collections.Generic.Dictionary<TKey, TValue>
private int FindEntry(TKey key)
{
	if (key == null)
	{
		ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);
	}
	if (this.buckets != null)
	{
		int num = this.comparer.GetHashCode(key) & 2147483647;
		for (int i = this.buckets[num % this.buckets.Length]; i >= 0; i = this.entries[i].next)
		{
			if (this.entries[i].hashCode == num && this.comparer.Equals(this.entries[i].key, key))
			{
				return i;
			}
		}
	}
	return -1;
}

In the inner if-test it does a check of the keys’s hash code, but it also checks for object equality between the existing key and the passed in key. Since the default implementation of object.Equals just checks for referential equality this would not work. No keys other than the original key object would ever be found equal!

As often is the case the solution was simple enough. The BByteString Equals method ended up looking like this:


public class BByteString : BObject<byte[]>
{
	// other code omitted

	public override bool Equals(object obj)
	{
		bool equals = false;
		if (obj is BByteString)
		{
			var byteString = obj as BByteString;
			equals = Value.IsEqualWith(byteString.Value);
		}
		return equals;
	}
}

Since BByteStrings are really byte arrays, it is necessary to compare the both object’s byte arrays to each other to determine if they are equals. The method uses a byte array extension method called IsEqualWith, which is also included in the NBEncode library. It simply loops over both arrays, checking for differences. A possible future optimization on this could be to calculate and store a hash values of the byte arrays, and compare that instead – since those values could be stored and reused.

All in all solving little mysteries like this make it interesting and satisfying to code, and you learn something in the process. I think this does it for NBEncode for now, and I am looking forward to doing other hobby projects.

Resources

A concise overview of the “bencode” encoding can be found here:
The BEncode specification (and more)

NBEncode is available from this link:
NBencode, a bencode library for .NET

Advertisements