commandline screenshot (van een ander programma)

Static HASH algorithm implementation in C

C does not allow strings to be used in switch/case constructions. One logical solution is to use an if else if construction with string compares. But that are very slow and in my application (an embedded system that parse XML data) this overhead was a problem.

I searched for a static HASH algorithm, but could not found an algorithm that could be used in a switch case construction. Basically because C expect a constant value behind a case, not a function call. On one site I found an almost static HASH algorithm: http://lolengine.net/blog/2011/12/20/cpp-constant-string-hash, and the inspiration how to solve the issue.

The static HASH function defined here could only be used with character strings in its call. For example: HASH_S16(“test”) the value could not be a pointer to an array, or just a plain pointer. When the hash value from a generic C string is required, a separate function is available to calculate the HASH.

Example how to use:

swich (hash_string(*p))
{
case HASH_S16("test"):
    ...
    break;
case HASH_S256("An example with a long text!!!!!!!!!!!!!!!!!"):
    ...
    break;
}

Simple HASH functions like this one have a serious risk on collisions. I wrote a small C# test program to test if this will be a big issue. I load a couple of lists with words, 144928 different words in total, and found this collisions (I represent the words in the iso_8859_1 character set):

APPOINTEE – FORELEGS

HOOP – referents

REFLATION – sociable

Lists:

http://www.mieliestronk.com/corncob_caps.txt

http://www.mieliestronk.com/corncob_lowercase.txt

And 4 lists from (in English, Dutch, German and French):

http://www.streetsmartlanguagelearning.com/2008/12/top-10000-words-in-dutch-english-french.html

So collisions with normal text strings are possible but happen not too often. In my case, no problem at all because the switch case construction give compile time errors in case of equal values that could be fixed by doing a final string compare.

The HASH_S256 macro could work with strings up to 256 bytes, but my pre-processor work terrible slow with this macro. So I decide to also make a smaller one, HASH_S16 that take up to 16 byte. That work much, much faster.

Download the the C file and the header file (bsd license).