Data Structures Lesson 18

Professor Abdul-Quader

Sorting

Presentations

Finish presentations today

Sorting

Sorting problem: given an array of integers, re-arrange the elements to be in increasing order.

CS2:

  • Bubble Sort
  • Merge Sort

Remember these?

Bubble Sort

  • Repeatedly compare consecutive elements, moving the bigger one to the right.
  • Pseudocode?
  • Running time?

Selection Sort

  • for each \(i\), repeatedly search for the smallest element in the list in between positions \(i\) and \(n - 1\), and swap it with the element at position \(i\).
  • Pseudocode?
  • Running time?

Insertion sort

  • Keep the beginning of the list (first \(i\) elements) sorted, and then figure out where the element \(a_{i+1}\) goes.
  • Pseudocode?
  • Running time?

Video

Merge Sort

Recursive algorithm:

procedure mergesort(array, tmpArray, left, right):
    if (left >= right) return
    mid = (left + right)  / 2
    mergesort(array, tmpArray, left, mid)
    mergesort(array, tmpArray, mid + 1, right)
    merge(array, tmpArray, left, mid + 1, right)

How do we implement the merge algorithm? (This is why we have a tmpArray)

Analysis

  1. How many comparisons does this algorithm make for an array of size \(4\)? How many would the insertion sort algorithm make?
  2. What is the worst case running time of this algorithm? Average case?
  3. A sorting algorithm is called stable if whenever multiple elements of the input array are equal, they are left in their original order. Is this algorithm stable?
  4. What is the space complexity of this algorithm?

Stability

A sorting algorithm is called stable if whenever multiple elements of the input array are equal, they are left in their original order. Is the merge sort algorithm stable?

Stability

Why is “stability” important? Suppose we have a list of Employees. Each employee has a startDate and a job title.

sortByStartDate(employees)
sortByJobTitle(employees)

What does the employees list look like after this?

Others

Recall: we have seen other sorting algorithms which use data structures.

  • Heapsort. \(O(n \log(n))\) running time, \(O(1)\) space complexity (use an “in-place” heap).
  • “TreeSort”: \(O(n \log(n))\) running time. What is the space complexity?

Are these algorithms stable?

Heapsort stability

Video:

QuickSort

Pick a pivot. Partition the array: the left part should only contain elements smaller than the pivot. Then you know exactly where the pivot should be, and then recursively quicksort the left and right parts.

procedure quicksort(list, start, end):
  if (start >= end) return
  pivot = list[end]
  pivotIndex = partition(list, start, end, pivot)
  quicksort(list, start, pivotIndex)
  quicksort(list, pivotIndex + 1, end)

Let’s write pseudocode for partition together.

Video

Analysis

  • Is this algorithm stable? (Think about what happens if there are multiple elements equal to the pivot).
  • What is the space complexity?
  • What is the worst case time complexity? What choice of pivot gives us this worst case?
  • What is the average case running time?

Video

(Stability / space / worst case)

Interview

Suppose we have \(16\) GB of data to sort on disk. We can store, in memory, \(4\) GB of data at a time. How can we sort all of this data? In particular, we cannot possibly hope to keep it all in memory, so the goal is to store the data on disk in sorted order.

Project 3

Due April 21 (two weeks). Three parts:

Given (as input) a list of course rosters (each line is list of student IDs), form a graph.

  • The nodes are the courses (use numbers \(0, 1, \ldots, numCourses - 1\))
  • \(i\) and \(j\) share an edge if there is at least one student in both courses
  • Output the edges as pairs of nodes.

Part 2

Output a schedule for the courses (assign each course a time slot).

  • Valid if no two courses which share a student have the same time slot
  • Output the list of courses scheduled in time slot 0, then the list of courses scheduled for time slot 1, etc.
  • Ideally: minimal number of time slots.
  • Should be able to run Parts 1 and 2 independently. Two main methods!

Part 3 (Write-up)

  • Try to break your algorithm.
    • Describe a graph for which your algorithm does not output the minimal coloring for that graph.
    • Hint: start with a coloring (ie, nodes 0, 2, 5 are red, 1, 3, 6 are green, 4 are blue) and work backwards.
  • Describe the coloring algorithm you implemented.
  • This write-up should be in a text file or a PDF. Add it to the replit space.

Video

Getting started:

Presentation 3

  • See description from presentation 1
  • Due Friday, April 25 on VoiceThread.

Final Project

  • Groups of 3
  • Topic of your choice, related to data structures.
  • Examples: “data structures used in operating systems”, “data structures used in databases”, other kinds of data structures not covered in our class (trie, red-black trees, etc), implementation issues that come up, etc.
  • Presentation on the last day of class
  • Paper due the following week.

Upcoming

  • Thursday: asynchronous (Quicksort analysis). No in-person meeting.
    • Will host office hours as usual.
  • Next week: quickselect, bucket/radix sorting, intro to concurrency
// reveal.js plugins