Data Structures

Professor Abdul-Quader

Lesson 11 (Shortest Path Problems)

Exercise

Given a list of data, remove all duplicates from the data. That is, if your list is [1, 2, 1, 4, 3, 1, 3, 4, 1, 2], you should return [1, 2, 4, 3].

Notice that the list returned should have the elements listed in the order that they first appeared in the original list.

public List<Integer> removeDuplicates(List<Integer> list) {


}

Algorithms

  • Simple \(O(n^2)\) solution using nested for loops.
  • Pseudocode?
  • Improve it using a data structure?
  • Pseudocode and running time of improved solution?

Idea

  • Breadth-first search for the node (“Level order traversal”), using a queue.
  • Keep track of which node enqueued the previous node.

Pseudocode

void pathToNode(Node n):
  parents[root] = null
  queue.enqueue(root)
  Node current = queue.dequeue()
  while (current != n) {
    queue.enqueue(current.left)
    queue.enqueue(current.right)
    parents[current.left] = current
    parents[current.right] = current
    current = queue.dequeue()
  }

Path?

Now backtrack:

LinkedList<T> list;
Node current = n;
while (current != null) {
  list.addFirst(current.data)
  current = parents[current]
}
print(list)

Shortest Path Problems

The same algorithm can be used in more general “shortest path” problems. For example: given two positions on a chessboard, find the shortest path a knight can take to go from the starting to the ending position. Recall that knights can either move two steps horizontally and one step vertically, or two steps vertically and one horizontally.

  • Differences with the tree problem?
  • Implementation considerations?
  • Pseudocode?

Exercise

Let’s implement the shortest path for the knight problem! (Code)

Given two positions on a chessboard, output the shortest path a knight can take to get from one to the other.

DFS vs BFS?

  • Alternative: depth-first search?
  • What would that look like?
  • Doesn’t always provide the shortest path!

Video

  • Explains the BFS algorithm in another context.
  • Explains the issue with using DFS

Synonymous Search Queries

Suppose we have a list of synonymous words (each just a pair of Strings), and a list of pairs of search queries. Describe an algorithm which determines, for each pair of queries, if the queries are “synonymous”.

Example

If our synonyms list contains:

  • hello, hi
  • world, earth

Then the queries “hello world” and “hi earth” are synonymous. The queries “world hello” and “hi earth” are not synonymous (order matters). Your algorithm should return a list of booleans (true / false), which correspond to whether the given pair of queries is “synonymous”.

Questions

  • Running time of your algorithm? Assume you have 1 pair of queries: \(N\) words in each query, synonym list has \(M\) pairs of words.
  • Pseudocode for this problem?

Pseudocode

// checking two queries against each other
boolean queriesMatch = true;
String[] firstWords = query.first.split(" ");
String[] secondWords = query.second.split(" ");
if (firstWords.length != secondWords.length) {
  returns.add(false); // add to list we will be returning
}
for (int i = 0; i < firstWords.length; i++) {
    if (wordsAreNotEqual && wordsAreNotSynonymous) {
      queriesMatch = false;
      break;
    }
}
returns.add(queriesMatch);

Synonymous?

  • We are given a list of pairs of synonymous words.
  • How do we check if words are synonymous or not?
    • Check: if the pair (firstWords[i], secondWords[i]) is in the list of synonyms?
    • Also check if the pair (secondWOrds[i], firstWords[i]) is in the list?
    • Running time of checking if something is in a list?
    • Improvement?
  • Code / pseudocode

Project 2

Implement the Set interface in three ways: using an ArrayList, using an AVL tree (use the solution code, and using a regular BST (not a self balancing tree). Write a Benchmark class which has the following methods:

(continued)

Benchmark Class

  • insertRandoms(Set<Integer> set, int n) : this method should insert n random numbers into the set, and output the amount of time it takes to do this (use System.nanoTime() before and after you do the n insertions)
  • insertInOrder(Set<Integer> set, int n) : this method should insert all the numbers from \(0\) to \(n − 1\) into the set, and output the amount of time it takes to do this.

(continued)

Benchmark Class

  • containsRandom(Set<Integer> set, int size, int n, int numSearches): this method should first insert size many random numbers into the set. Then it should do numSearches many calls to set.contains(n), and output the amount of time that takes.
  • containsInOrder(Set<Integer> set, int size, int n, int numSearches) : similar to above, but first insert the numbers \(0\) to \(size − 1\) in order into the set.

(continued)

Benchmark Class

These methods just benchmark the calls to set.contains(n), after the inserts already happened.

Write up

After writing the Benchmark code, you must use that class to test out your different Set implementations. You then must do a write-up explaining your results. We assume that an AVL tree and a BST should be much better than an ArrayList. Does this actually happen for different types of insertions / different sizes of trees? Organize your results into tables and provide your analysis of what’s happening.

Questions to explore

You should explore the following questions:

  • Is the AVL tree or BST faster for random insertions? What about for in-order insertions?
  • Does the ArrayList ever perform well? For which sizes do we start to see its performance degrade?
  • Do you ever get a StackOverflowError for the BST? What about for the AVL tree? Which methods trigger this?

This project is due Friday, March 21 at 11:59 PM.

Upcoming

  • Now: Quiz 1
  • Next time: wrap up on hashtables.
  • Think about the problem on the next slide

OS Job Scheduling

Operating systems need to solve a scheduling problem:

  • Some number of processes come in at various times.
  • Each process runs for a certain length.
  • Only one process can run at a time.

Suppose we are running one process and it ends. We have some list of processes that have been waiting. How do we decide which process to run next?

Video

  • In this video: choose the shortest job on that list, not necessarily the one which has been waiting the longest (so not FIFO)!
  • How? Need a different data structure: not a stack / queue.
// reveal.js plugins