Data Structures Lesson 24

Professor Abdul-Quader

Java Miscellany

Last time

  • CopyOnWriteArrayList
  • ConcurrentHashMap

System Design

3 types of technical interview questions:

  1. Coding
  2. Algorithms
  3. System design: “How would you design Twitter?”

Producer-Consumers

Common asynchronous design pattern.

  • Producer produces data
  • Consumer consumes data
  • Data stored in some buffer

Example

Alerting system.

  • Something bad happens
  • Security alert message is “produced”.
  • Anyone in the “911 hotline” can “consume” that message
  • They then dispatch to the correct system.

Data structure?

Queues

  • Is there a max number of messages that can be stored at a time? Use ArrayBlockingQueue
  • If not, do the consumers need to wait until a message is on the queue? (Probably!) Use LinkedBlockingQueue.
  • ConcurrentLinkedQueue? What do consumers do if there’s nothing there?

Programming Paradigms

Quick poll:

Have you heard of “imperative” and/or “declarative” programming?

Paradigms

  • Imperative programming: what you’re used to
    • Procedural: C
    • Object-oriented: Java, C++
    • Everything: Python
  • Declarative programming: just say what needs to happen
    • SQL, HTML (I guess?)
    • Functional programming: Lisp, Haskell, F#, …

Functional Programming

  • Functions are first-class objects
  • Complex operations: apply / compose functions
  • Functions are pure

Pure functions

  • Think mathematically
  • Result of a function is the same every time it is called on the same input
  • No side effects
  • Don’t change any state (memory)
    • Objects are immutable (variables are constant?)
  • Loops? (Recursion!)

Why?

  • Getting more popular nowadays
  • Reason: parallelism
  • Functions are easy to run in parallel!
  • No concurrency issues!
  • Concurrency issues usually come from state

Can’t change mutable state / if you don’t have any variables

Computability Theory

  • Imperative programming: based on Turing machines
    • Several “states” in your program, arbitrary large memory space
    • Program: based on current state, memory, change your state / memory
  • Functional programming: based on lambda calculus
    • Formal model of computation based on functions

Java miscellany

  • Lots of features in “newer” versions of Java
  • Maybe we can talk about two: streams and enums
  • Enums: not so new!
  • Mandatory reading: Effective Java (2017)

Streams

  • Java 8: Streams, lambda expressions, method references
  • Making Java “functional”

Streams and pipelines

  • Stream: sequence of data elements
  • Pipeline: multistage computation on these elements
  • Apply function 1, then function 2, then function 3, etc.

Example

String list = "[1, 3, 5, 7, 9]";
int[] intList = Arrays.stream(list.split(",")).map(String::trim).mapToInt(Integer::valueOf).toArray();

Yikes!

What’s going on

list.split(",")

Part 2

Arrays.stream(array).map(String::trim)
  • Create a “pipeline” using the objects in the array.
  • map: apply a function to the elements of the stream
    • Pass in a “method reference” (String::trim) as a parameter
  • returns a new stream (pipeline)

Part 3

// have a stream of Strings now
stream.mapToInt(Integer::valueOf)
  • maps from Strings to Integers
  • Now we have an IntStream instead of Stream
  • Could use Stream instead (boxing / slower)

Parallelizing?

Array.stream(array).parallel().map(String::trim).mapToInt(Integer::valueOf).toArray();
  • Careful when parallelizing streams
  • Might not always be worth it.

MapReduce / Hadoop

MapReduce

  • “Big Data” on large, distributed servers
  • map: apply an operation to each element of the data set
  • reduce: summarize
  • Developed at Google, open sourced: Hadoop
  • Parallelized: data not all in one place!
  • maps can be parallel.
  • reductions can be parallel (somewhat)
  • Final project topic!

Example

Project 3:

  • Suppose you’ve colored your graph (with an int[] colors array)
  • How do we map from a color to the list of all courses with that color?

Traditional

Map<Integer, List<Integer>> colorToCourses = new HashMap<>();
for (int i = 0; i < colors.length; i++) {
  List<Integer> list = colorToCourses.get(color[i]);
  if (list == null) {
    list = new ArrayList<>();
    colorToCourses.put(list);
  }
  list.add(i);
}

Streamy

Map<Integer, List<Integer>> colorToCourseMap = 
      IntStream.range(0, colors.length).boxed()
      .collect(Collectors.groupingBy(i -> colors[i]));

Part 1

IntStream.range(0, colors.length).boxed()
  • Creates an IntStream of \([0, 1, \ldots, colors.length - 1]\).
  • Then turns that into a `Stream``

Part 2

stream.collect(Collectors.groupingBy(i -> colors[i]))
  • collect: “collects” the elements in the Stream according to some “Collector”
  • Collectors.groupingBy(mappingFunction)
    • Creates a Map<Integer, List<Integer>>
    • Puts all the \(i\) that share the same \(color[i]\) in the same list!
    • Exactly what we wanted!

Enums

  • Special class that represents a group of constants
  • In C++: really just integers
  • In Java: actual objects.

Java Enums

public enum Operator {
    PLUS, MINUS, TIMES, DIVIDES;
}

Why?

  • Pre-set list of objects
  • Might want to have variables / methods

More fancy

public enum Operator {
  PLUS(Integer::sum), MINUS((x, y) -> x - y),
  TIMES((x, y) -> x * y), DIVIDES ( (x, y) -> x / y);

  private IntBinaryOperator operator;

  Operator(IntBinaryOperator o) {
    operator = o;
  }

  public int operate(int num1, int num2) {
    return operator.applyAsInt(num1, num2);
  }
}

What?

(x, y) -> x - y
  • Lambda expressions
  • Functions-as-objects!

Lambdas

  • Lambda expressions are ubiquitous.
  • Can make code more expressive
  • Have to think about these in a different way.

Example

Project 3:

  • Given a list of rosters, pre-process it to form a map
  • Map from students to courses.
  • How?

forEach method

  • Each collection has a forEach method.
  • Does some work on each element
  • for each student in a course, add that course to their corresponding map
for (int i = 0; i < rosters.size(); i++) {
  rosters.get(i).forEach(student -> ???);
}

What’s the right lambda expression?

merge method

  • Map has a merge method
  • Takes in a key, value, and a method reference
  • (key, value) -> { }

Code

for (int i = 0; i < rosters.size(); i++) {
  final int course = i;
  rosters.get(i).forEach(student -> studentToCourseMap.putIfAbsent(student, new ArrayList<>()));
  rosters.get(i).forEach(student -> studentToCourseMap.merge(student, Collections.singletonList(course), 
  (oldCourses, newCourses) -> {
    oldCourses.addAll(newCourses);
    return oldCourses;
  }));
}

Simpler?

for (int i = 0; i < rosters.size(); i++) {
  final int course = i;
  for (student : rosters.get(i)) {
    List<Integer> courses = studentToCourseMap.getOrDefault(student, new ArrayList<>());
    courses.add(course);
    studentToCourseMap.put(student, courses);
  }
}

Upcoming

  • Groups: pick a topic ASAP!
  • Presentations Monday.
  • Papers due the following Monday.
  • I owe project 3 + HW 4 grades.
  • Make up work?
// reveal.js plugins