Data Structures Lesson 18

Professor Abdul-Quader

5 April 2021

Sorting

  • Bubblesort: repeatedly compare consecutive elements, moving the bigger one to the right.
  • Selection sort: for each \(i\) from \(0\) to \(n - 2\), find the minimum of the elements from \(i\) to \(n - 1\) and swap it with the element at position \(i\).
  • Insertion sort: keep the array sorted from \(0\) up to \(i\). Then put \(a_{i+1}\) in the correct place.

What are the running times of these algorithms?

MergeSort

Recursive algorithm:

procedure mergesort(array, tmpArray, left, right):
    if (left >= right) return
    mid = (left + right)  / 2
    mergesort(array, tmpArray, left, mid)
    mergesort(array, tmpArray, mid + 1, right)
    merge(array, tmpArray, left, mid + 1, right)

How do we implement the merge algorithm? (This is why we have a tmpArray)

Analysis

  1. How many comparisons does this algorithm make for an array of size \(4\)? How many would the insertion sort algorithm make?
  2. What is the worst case running time of this algorithm? Average case?
  3. A sorting algorithm is called stable if whenever multiple elements of the input array are equal, they are left in their original order. Is this algorithm stable?
  4. What is the space complexity of this algorithm?

Stability

Why is “stability” important? Suppose we have a list of Employees. Each employee has a startDate and a job title.

sortByStartDate(employees)
sortByJobTitle(employees)

What does the employees list look like after this?

Heap, Tree

Recall: we have seen other sorting algorithms which use data structures.

  • Heapsort. \(O(n \log(n))\) running time, \(O(1)\) space complexity (use an “in-place” heap).
  • “TreeSort”: \(O(n \log(n))\) running time. What is the space complexity?

Are these algorithms stable?

QuickSort

Pick a pivot. Partition the array: the left part should only contain elements smaller than the pivot. Then you know exactly where the pivot should be, and then recursively quicksort the left and right parts.

procedure quicksort(list, start, end):
    if (start >= end) return
    pivot = list[end]
    pivotIndex = partition(list, start, end, pivot)
    quicksort(list, start, pivotIndex)
    quicksort(list, pivotIndex + 1, end)

Let’s write pseudocode for partition together.

Analysis

  • Is this algorithm stable? (Think about what happens if there are multiple elements equal to the pivot).
  • What is the space complexity?
  • What is the worst case time complexity? What choice of pivot gives us this worst case?
  • What is the average case running time?

Interview

Suppose we have \(16\) GB of data to sort on disk. We can store, in memory, \(4\) GB of data at a time. How can we sort all of this data? In particular, we cannot possibly hope to keep it all in memory, so the goal is to store the data on disk in sorted order.

Project 3

Due April 18 (two weeks). Three parts:

  1. Given (as input) a list of course rosters (each line is list of student IDs), form a graph.

    • The nodes are the courses (use numbers \(0, 1, \ldots, numCourses - 1\))
      • \(i\) and \(j\) share an edge if there is at least one student in both courses
      • Output the edges as pairs of nodes.

Part 2

Output a schedule for the courses (assign each course a time slot).

  • Valid if no two courses which share a student have the same time slot
  • Output the list of courses scheduled in time slot 0, then the list of courses scheduled for time slot 1, etc.
  • Ideally: minimal number of time slots.
  • Should be able to run Parts 1 and 2 independently. Two main methods!

Part 3 (Write-up)

  • Try to break your algorithm.

    • Describe a graph for which your algorithm does not output the minimal coloring for that graph.
    • Hint: start with a coloring (ie, nodes 0, 2, 5 are red, 1, 3, 6 are green, 4 are blue) and work backwards.
  • Describe the coloring algorithm you implemented.

  • This write-up should be in a text file or a PDF. Save it in your src folder.

Presentation 3

// reveal.js plugins