JMU
Hashmaps
An Introduction


Prof. David Bernstein
James Madison University

Computer Science Department
bernstdh@jmu.edu


Motivation
Objectives
A Definition of a Map

In addition, one could add an isEmpty() operation, an isFull() operation (if there is a size limit), a remove() operation, and a makeEmpty() operation.

Hashing
Categories of Hash Functions
Perfect Each key is converted into a unique integer.
Imperfect Multiple keys are converted into the same integer.
Minimal Perfect In addition to being perfect, the set of \(n\) keys is converted into the set of integers \(\{ 0, 1, 2, ... n-1 \}\)
Minimal Perfect Hash Functions

An Example: We need to "index" a collection of books written by Sue Grafton because we are developing an "instant printing" service that charges by the page

Title Pages
A Is for Alibi 214
B Is for Burglar 211
C Is for Corpse 212
D Is for Deadbeat 239
E Is for Evidence 199
F Is for Fugitive 308
G Is for Gumshoe 341
H Is for Homicide 287
I Is for Innocent 343
J Is for Judgment 376
K Is for Killer 307
L Is for Lawless 322
M Is for Malice 337
N Is for Noose 322
O Is for Outlaw 318
Minimal Perfect Hash Functions (cont.)

We would like a hash function that converts each of these titles into a unique integer in the closed interval [0, 15]. One such function is:

    int graftonHash(char *key)
    {
      int result;

      result = key[0] - 'A';
      return result;
    }
   
Minimal Perfect Hash Functions (cont.)

General Minimal Perfect Hash Functions

Cichelli, R.J. (1980) "Minimal Perfect Hash Functions Made Simple", Communications of the ACM, Vol. 23, pp. 17-19.

Czech, Z.J. and Majewski, B.S. (1993) "A Linear Time Algorithm for Finding Minimal Perfect Hash Functions", Computer Journal, Vol. 36, pp. 579-587.

Fox, E.A., Heath, L.S., Lenwood, S., Chen, Q.F. and Daoud, A.M. (1992) Practical Minimal Perfect Hash Functions for Large Databases, Communications of the ACM, Vol. 35, pp. 105-121.

Haggard, G. and Karplus, K. (1986) "Finding Minimal Perfect Hash Functions", SIGCSE Bulletin, Vol. 18, pp. 191-193.

Perfect Hash Functions

An Example: We have a collection of holidays and we want to "index them" by date (where each date is in the form MMDD). We could use MM*100 + DD but...

    // Hash a date into the closed interval [0, 365]
    int hashDate(char *mm, char *dd)
    {
      int   d, result, m;
      int   days[12] = {0,31,60,91,121,152,182,213,244,274,305,335};


      m = atoi(mm);   // Convert a string to an int
      d = atoi(dd);

      result = days[m-1] + d;
      return result;
    }
    
Imperfect Hash Functions

The Date Example (cont.)

    int hashDate2(char *mm, char *dd)
    {
      int   d, result, m;


      m = atoi(mm);   // Convert a string to an int
      d = atoi(dd);

      result = d * m - 1;
      return result;
    }
    

This is an imperfect hash function because more than one key can generate the same integer. For example, March 10 hashes to 29. and October 3 also hashes to 29.

Imperfect Hash Functions (cont.)
Imperfect Hash Functions (cont.)
Imperfect Hash Functions (cont.)
Imperfect Hash Functions (cont.)
Imperfect Hash Functions (cont.)

The ELFhash (Used by the UNIX Executable and Linking Format)

    unsigned long ELFhash(const char *key) {

      unsigned long  h, g;

      while (*key) {

        h = (h << 4) + *key++;
        if (g = h & 0xF0000000) h ^= g >> 24; // This is = not ==

        h &= ~g; // This is sometimes incorrectly listed as h&=g
      }

      return h;
    }
   

where & is the bitwise logical AND operator, ^ is the bitwise XOR opertor, ~ is the bitwise NOT operator, >> is the shift right operator, and << is the shift left operator.

Hashmaps with Known (a priori) Keys
Hashmaps with Known Keys (cont.)
Linear Probing
Chaining
Rehashing
Hashmaps with Unknown Keys
Efficiency/Performance
Efficiency/Performance (cont.)

Typical Results

Linear Probing Double Hashing
\(a\) Successful Unsuccessful Successful Unsuccessful
0.10 1.1 1.1 1.1 1.1
0.20 1.1 1.3 1.1 1.2
0.30 1.2 1.5 1.2 1.4
0.40 1.3 1.9 1.3 1.7
0.50 1.5 2.5 1.4 2.0
0.60 1.8 3.6 1.5 2.5
0.70 2.2 6.1 1.7 3.3
0.80 3.0 13.0 2.0 5.0
0.90 5.5 50.5 2.6 10.0
Choosing a Size