**Q:**
What is hashing?

**A:**
Hashing is the process of
mapping
strings
to integers,
usually in a relatively small range.
A ``hash function'' maps a string
(or
some
other data structure)
to
a bounded number
(the ``hash bucket'')
which
can more easily
be
used
as an index in an array,
or for performing repeated comparisons.
(Obviously,
a mapping from a potentially huge set of strings
to a small set of integers
will not be unique.
Any
algorithm using hashing
therefore has to deal with the possibility of
``collisions.'')

Many hashing functions and related algorithms have been developed; a full treatment is beyond the scope of this list. An extremely simple hash function for strings is simply to add up the values of all the characters:

unsigned hash(char *str) { unsigned int h = 0; while(*str != '\0') h += *str++; return h % NBUCKETS; }A somewhat better hash function is

unsigned hash(char *str) { unsigned int h = 0; while(*str != '\0') h = (256 * h + *str++) % NBUCKETS; return h; }which actually treats the input string as a large binary number (

When the set of strings is known in advance, it is also possible to devise ``perfect'' hashing functions which guarantee a collisionless, dense mapping.

References:
K&R2 Sec. 6.6

Knuth Sec. 6.4 pp. 506-549 Volume 3

Sedgewick Sec. 16 pp. 231-244

about this FAQ list about eskimo search feedback copyright