Professor Abdul-Quader
Lesson 9 (Sets / Hashtable Intro)
The following question was posed to me on a phone interview.
Given a list of 2,000,000 unique 7-digit phone numbers, output them in order using no more than 4MB of additional memory.
Each integer requires 4 bytes of space. Can we store them all in memory? If not, how do we sort them if we can’t store all at once?
Discuss. Design your best solution. (5-10 minutes.)
Prove (by induction) that there are \(2^{h+1} - 1\) nodes in a complete binary tree of height \(h\), assuming the last level is completely filled.
To prove a statement for every number \(n\):
A set is a gathering together into a whole of distinct objects of our thought – which are called elements of the set.
(Georg Cantor)
Mathematically, a set is a collection of distinct objects (no repetitions). Two sets are considered equal if they have the same elements.
For our perspective, we consider the following operations on a set:
Sometimes “insert” / “remove” are boolean methods, to indicate whether you were able to successfully modify the set (returning false, for example, if you tried to insert a duplicate).
Exercise: Using an ArrayList, implement the Set interface. Analyze the time complexity of contains / insert / remove.
table
, size \(s\)table[f(item)]
table[f(item)] == null
Suppose we have a large array of size \(s\).
Simple idea: think of a word as a number in base 27 (“a” represents the digit 1, and “z” represents the digit 27), then convert it to an integer.
Let’s look at the built-in hash function for Strings in Java.
The math here is a little bit of a challenge, let’s do some examples!
Insert \(N\) items into a table of size \(s\). Suppose the hash function is uniform. What’s the probability of a collision?
Two main strategies for collision resolution:
As we learn about these, keep this question in mind: in what scenarios (in terms of the table size \(s\) and the number of insertions \(N\)) might we prefer one scheme over the other?
contains
: compute the hash function, check the relevant list.