How to Hash String into GUID in C#
Eternal problem: how to turn Hello World! into 2ed51d39-b424-8f5d-bc12-4be4f3fbc6fc?
I made a Word Search game, and part of the code is saving the words which were guessed.
data:image/s3,"s3://crabby-images/e8a02/e8a020d26ab7752d7834ef60bf117201f71a713f" alt="Hello World"
To save memory and disk space (there are around 5000 words in 100 categories kept in memory during the game), I used a HashSet with word IDs.
The issue was that it was still marked as guessed whenever I fixed the word spelling, and when I changed its ID, it became unguessed.
I've decided to use GUIDs generated directly from the word (string).
I needed something which generates GUIDs fast and makes unique GUIDs for each word.
I came up with the following extension function, which is also available on GitHub.
public static class StringExtensions
{
private const ulong KnuthInitValue = 3074457345618258791ul;
private const ulong KnuthMultiValue = 3074457345618258799ul;
// Based on https://stackoverflow.com/questions/9545619/a-fast-hash-function-for-string-in-c-sharp
public static Guid ToGuid(this string str)
{
// Double Knuth hash
ulong hashedValueEven = KnuthInitValue;
ulong hashedValueOdd = KnuthInitValue;
for (int i = 0; i < str.Length; i++)
{
if (i % 2 == 0)
{
hashedValueEven += str[i];
hashedValueEven *= KnuthMultiValue;
}
else
{
hashedValueOdd += str[i];
hashedValueOdd *= KnuthMultiValue;
}
}
return new Guid(new byte[] {
(byte)(0xff & hashedValueEven),
(byte)(0xff & hashedValueEven >> 8),
(byte)(0xff & hashedValueEven >> 16),
(byte)(0xff & hashedValueEven >> 24),
(byte)(0xff & hashedValueEven >> 32),
(byte)(0xff & hashedValueEven >> 40),
(byte)(0xff & hashedValueEven >> 48),
(byte)(0xff & hashedValueEven >> 56),
(byte)(0xff & hashedValueOdd),
(byte)(0xff & hashedValueOdd >> 8),
(byte)(0xff & hashedValueOdd >> 16),
(byte)(0xff & hashedValueOdd >> 24),
(byte)(0xff & hashedValueOdd >> 32),
(byte)(0xff & hashedValueOdd >> 40),
(byte)(0xff & hashedValueOdd >> 48),
(byte)(0xff & hashedValueOdd >> 56),
});
}
}
It uses two primes to shuffle things around. It generates a 64-bit hash for odd letters, separate 64-bit-hash for even letters, and squeezes them into a GUID.
Console.WriteLine("Hello World!".ToGuid());
> 2ed51d39-b424-8f5d-bc12-4be4f3fbc6fc
To test uniqueness, I've generated GUIDs for over 28 million English words and word-like strings from the Google Books project, using Google Books Reducer data.
There were no collisions, which is not surprising, given that there are 3.4e+38 combinations in a 128-bit GUID.
I've also tested how fast it was. On my beefy, Windows-11-unworthy Xeon E3, it generated 12M (millions) hashes per second.
I compared it with MD5 and SHA256. It is 4.4 times faster than MD5 and 6.7 times faster than SHA256.
That is hardly a surprise; it trades security (which I didn't need) for speed, simplicity and a small hash size.