|
Hashmaps
An Introduction |
|
Prof. David Bernstein |
| Computer Science Department |
| bernstdh@jmu.edu |
int indexint and aren't
sequentialint
| Values: | Homogeneous elements of any type | |||||||||||||||||||||
| Operations: |
|
In addition, one could add an isEmpty() operation, an
isFull() operation (if there is a size limit), a
remove() operation, and a
makeEmpty() operation.
| Perfect | Each key is converted into a unique integer. |
| Imperfect | Multiple keys are converted into the same integer. |
| Minimal Perfect | In addition to being perfect, the set of \(n\) keys is converted into the set of integers \(\{ 0, 1, 2, ... n-1 \}\) |
An Example: We need to "index" a collection of books written by Sue Grafton because we are developing an "instant printing" service that charges by the page
| Title | Pages |
| A Is for Alibi | 214 |
| B Is for Burglar | 211 |
| C Is for Corpse | 212 |
| D Is for Deadbeat | 239 |
| E Is for Evidence | 199 |
| F Is for Fugitive | 308 |
| G Is for Gumshoe | 341 |
| H Is for Homicide | 287 |
| I Is for Innocent | 343 |
| J Is for Judgment | 376 |
| K Is for Killer | 307 |
| L Is for Lawless | 322 |
| M Is for Malice | 337 |
| N Is for Noose | 322 |
| O Is for Outlaw | 318 |
We would like a hash function that converts each of these titles into a unique integer in the closed interval [0, 15]. One such function is:
int graftonHash(char *key)
{
int result;
result = key[0] - 'A';
return result;
}
Cichelli, R.J. (1980) "Minimal Perfect Hash Functions Made Simple", Communications of the ACM, Vol. 23, pp. 17-19.
Czech, Z.J. and Majewski, B.S. (1993) "A Linear Time Algorithm for Finding Minimal Perfect Hash Functions", Computer Journal, Vol. 36, pp. 579-587.
Fox, E.A., Heath, L.S., Lenwood, S., Chen, Q.F. and Daoud, A.M. (1992) Practical Minimal Perfect Hash Functions for Large Databases, Communications of the ACM, Vol. 35, pp. 105-121.
Haggard, G. and Karplus, K. (1986) "Finding Minimal Perfect Hash Functions", SIGCSE Bulletin, Vol. 18, pp. 191-193.
An Example: We have a collection of holidays and we want to "index them" by date (where each date is in the form MMDD). We could use MM*100 + DD but...
// Hash a date into the closed interval [0, 365]
int hashDate(char *mm, char *dd)
{
int d, result, m;
int days[12] = {0,31,60,91,121,152,182,213,244,274,305,335};
m = atoi(mm); // Convert a string to an int
d = atoi(dd);
result = days[m-1] + d;
return result;
}
The Date Example (cont.)
int hashDate2(char *mm, char *dd)
{
int d, result, m;
m = atoi(mm); // Convert a string to an int
d = atoi(dd);
result = d * m - 1;
return result;
}
This is an imperfect hash function because more than one key can generate the same integer. For example, March 10 hashes to 29. and October 3 also hashes to 29.
unsigned long ELFhash(const char *key) {
unsigned long h, g;
while (*key) {
h = (h << 4) + *key++;
if (g = h & 0xF0000000) h ^= g >> 24; // This is = not ==
h &= ~g; // This is sometimes incorrectly listed as h&=g
}
return h;
}
where & is the bitwise logical AND operator,
^ is the bitwise XOR opertor, ~
is the bitwise NOT operator, >> is the
shift right operator, and << is the
shift left operator.
% operator) until an empty
slot is found| \(A_2\) | \(A_3\) | \(A_5\) |
| \(A_2\) | \(A_3\) | \(A_5\) | \(B_5\) |
| \(A_2\) | \(A_3\) | \(B_2\) | \(A_5\) | \(B_5\) |
| \(A_2\) | \(A_3\) | \(B_2\) | \(A_5\) | \(B_5\) | \(C_2\) |
| Linear Probing | Double Hashing | |||
| \(a\) | Successful | Unsuccessful | Successful | Unsuccessful |
| 0.10 | 1.1 | 1.1 | 1.1 | 1.1 |
| 0.20 | 1.1 | 1.3 | 1.1 | 1.2 |
| 0.30 | 1.2 | 1.5 | 1.2 | 1.4 |
| 0.40 | 1.3 | 1.9 | 1.3 | 1.7 |
| 0.50 | 1.5 | 2.5 | 1.4 | 2.0 |
| 0.60 | 1.8 | 3.6 | 1.5 | 2.5 |
| 0.70 | 2.2 | 6.1 | 1.7 | 3.3 |
| 0.80 | 3.0 | 13.0 | 2.0 | 5.0 |
| 0.90 | 5.5 | 50.5 | 2.6 | 10.0 |
| Number | Closest Prime |
| 100 | 97 |
| 250 | 241 |
| 500 | 499 |
| 1000 | 997 |
| 1500 | 1499 |
| 2000 | 1999 |
| 5000 | 4999 |
| 7500 | 7499 |
| 10000 | 9973 |