Hashmaps
An Introduction |
Prof. David Bernstein |
Computer Science Department |
bernstdh@jmu.edu |
int
indexint
and aren't
sequentialint
Values: | Homogeneous elements of any type | |||||||||||||||||||||
Operations: |
|
In addition, one could add an isEmpty()
operation, an
isFull()
operation (if there is a size limit), a
remove()
operation, and a
makeEmpty()
operation.
Perfect | Each key is converted into a unique integer. |
Imperfect | Multiple keys are converted into the same integer. |
Minimal Perfect | In addition to being perfect, the set of \(n\) keys is converted into the set of integers \(\{ 0, 1, 2, ... n-1 \}\) |
An Example: We need to "index" a collection of books written by Sue Grafton because we are developing an "instant printing" service that charges by the page
Title | Pages |
A Is for Alibi | 214 |
B Is for Burglar | 211 |
C Is for Corpse | 212 |
D Is for Deadbeat | 239 |
E Is for Evidence | 199 |
F Is for Fugitive | 308 |
G Is for Gumshoe | 341 |
H Is for Homicide | 287 |
I Is for Innocent | 343 |
J Is for Judgment | 376 |
K Is for Killer | 307 |
L Is for Lawless | 322 |
M Is for Malice | 337 |
N Is for Noose | 322 |
O Is for Outlaw | 318 |
We would like a hash function that converts each of these titles into a unique integer in the closed interval [0, 15]. One such function is:
int graftonHash(char *key) { int result; result = key[0] - 'A'; return result; }
Cichelli, R.J. (1980) "Minimal Perfect Hash Functions Made Simple", Communications of the ACM, Vol. 23, pp. 17-19.
Czech, Z.J. and Majewski, B.S. (1993) "A Linear Time Algorithm for Finding Minimal Perfect Hash Functions", Computer Journal, Vol. 36, pp. 579-587.
Fox, E.A., Heath, L.S., Lenwood, S., Chen, Q.F. and Daoud, A.M. (1992) Practical Minimal Perfect Hash Functions for Large Databases, Communications of the ACM, Vol. 35, pp. 105-121.
Haggard, G. and Karplus, K. (1986) "Finding Minimal Perfect Hash Functions", SIGCSE Bulletin, Vol. 18, pp. 191-193.
An Example: We have a collection of holidays and we want to "index them" by date (where each date is in the form MMDD). We could use MM*100 + DD but...
// Hash a date into the closed interval [0, 365] int hashDate(char *mm, char *dd) { int d, result, m; int days[12] = {0,31,60,91,121,152,182,213,244,274,305,335}; m = atoi(mm); // Convert a string to an int d = atoi(dd); result = days[m-1] + d; return result; }
The Date Example (cont.)
int hashDate2(char *mm, char *dd) { int d, result, m; m = atoi(mm); // Convert a string to an int d = atoi(dd); result = d * m - 1; return result; }
This is an imperfect hash function because more than one key can generate the same integer. For example, March 10 hashes to 29. and October 3 also hashes to 29.
unsigned long ELFhash(const char *key) { unsigned long h, g; while (*key) { h = (h << 4) + *key++; if (g = h & 0xF0000000) h ^= g >> 24; // This is = not == h &= ~g; // This is sometimes incorrectly listed as h&=g } return h; }
where &
is the bitwise logical AND operator,
^
is the bitwise XOR opertor, ~
is the bitwise NOT operator, >>
is the
shift right operator, and <<
is the
shift left operator.
%
operator) until an empty
slot is found\(A_2\) | \(A_3\) | \(A_5\) |
\(A_2\) | \(A_3\) | \(A_5\) | \(B_5\) |
\(A_2\) | \(A_3\) | \(B_2\) | \(A_5\) | \(B_5\) |
\(A_2\) | \(A_3\) | \(B_2\) | \(A_5\) | \(B_5\) | \(C_2\) |
Linear Probing | Double Hashing | |||
\(a\) | Successful | Unsuccessful | Successful | Unsuccessful |
0.10 | 1.1 | 1.1 | 1.1 | 1.1 |
0.20 | 1.1 | 1.3 | 1.1 | 1.2 |
0.30 | 1.2 | 1.5 | 1.2 | 1.4 |
0.40 | 1.3 | 1.9 | 1.3 | 1.7 |
0.50 | 1.5 | 2.5 | 1.4 | 2.0 |
0.60 | 1.8 | 3.6 | 1.5 | 2.5 |
0.70 | 2.2 | 6.1 | 1.7 | 3.3 |
0.80 | 3.0 | 13.0 | 2.0 | 5.0 |
0.90 | 5.5 | 50.5 | 2.6 | 10.0 |
Number | Closest Prime |
100 | 97 |
250 | 241 |
500 | 499 |
1000 | 997 |
1500 | 1499 |
2000 | 1999 |
5000 | 4999 |
7500 | 7499 |
10000 | 9973 |