Topics in Advanced Computing Lesson 0: Welcome!

Incomplete map of CS
This course
Functional Programming
1. Shared Mutable State
Haskell
Readings

Incomplete map of CS

After the “fundamentals”, computer science research can be in any of a number of topics. The below are very incomplete, but give an idea of how things might break down. These categories are not rigid: research in one of these areas may affect all the other categories, for example.

Foundations

Computability
Complexity
Analysis of algorithms
Graph theory
Learning theory
Logic
…

Software systems

Programming Languages
Compilers
Operating Systems
Networks
Databases
…

Intelligent Systems

Artificial Intelligence
Machine Learning
Natural Language Processing
Computer Vision

Application Development

Graphics
UI Design
Software architecture
…

Computer Engineering

Computer architecture
Digital Logic
…

All of these assume the basic fundamentals: programming, algorithms, the basics of running time and Big Oh analysis, data structures, etc. As mentioned before, these are all interconnected. This is incomplete, but it’s a decent approximation. You can, for example, see some of the “tracks” available in the Columbia University CS curriculum or the Cornell University CS curriculum.

This course

Interleaved lectures (by me) and student presentations
3 presentations per student (roughly 15-20 minutes per student)
2 short papers (2-5 pages and 5-8 pages)

Lecture portion

Functional Programming.
Haskell programming language
Won’t cover everything, but get a broad introduction to the topic and major research trends.

Student presentations and papers

Pick any CS topic. We’ve seen the possible list above and on the syllabus, but there are many other things that one can talk about.

Talks and papers should be pedagogical. We should learn from them. Give us a glimpse of what that field or topic is about, why it’s important, etc.
Let me know what topic you are interested in via email by Friday, September 9. If you have no idea, come to office hours this week or schedule a Zoom call with me next week.
Come to office hours at least one week before your talks! That will give me time to evaluate whether you have the appropriate resources for the talk, and to suggest other resources if you need them.

Note for the seniors: your research topic here may overlap with your senior project, but the focus is different. Here, we are not focusing on producing original work, but on learning and teaching something deeply. For example, if you are working on an idea about quantum computing, your senior project might be a new quantum algorithm, while your papers / talks here might focus on “What is quantum computing?” from a big picture perspective.

Tentative schedule for workload:

Pick a topic: weeks 1-2
Problem set 1: week 2-3
Presentation 1: weeks 4-5
Problem set 2: weeks 6-7
Presentation 2: weeks 8-9
Midterm paper: week 10
Problem set 3: weeks 11-12
Presentation 3: last week of the semester
Problem set 4: last week of the semester
Final paper: due during finals week

Functional Programming

Alternative to imperative-style programming.
Imperative programming: do this, then do this, then do this.
- Object-oriented programming (Java) is often imperative
- Java has added functional features more recently.
- Many languages (Python, C++) can be used with multiple paradigms.
Functional programming:
- Functions are first-class objects
- Complex operations: apply/compose functions
- View your program as a series of operations from the input to the output.
- Functions are pure: no side effects!
- Examples: Haskell, OCaml (Jane Street), Scala (on the JVM, Morgan Stanley), F# (.NET platform), …

Pure functions?

Think mathematically
Result of a function is the same every time it is called on the same input
No side effects
Don’t change any state (memory)
- Objects are immutable (variables are constant?)
Loops? (Recursion!)

Why would anyone use this paradigm?

Shared Mutable State

In software, I often heard the proverb “Shared mutable state is the root of all evil.”

State? Think data.
Mutable? Data that can be changed.
Shared? Data that can be accessed in parallel, by multiple threads / processes!

Example: Imagine that you have 100 in the bank. Payday comes and you will receive a 100 deposit. You also wish to withdraw 100. What happens if you do these things at the same time?

In reality: one of those transactions will be forced to finish before the other. But in a program?

Bank b = new Bank(); // initializes with $100
ExecutorService pool = Executors.newCachedThreadPool();
for (int i = 0; i < 10000; i++) {
  if (i % 2 == 0) {
    // withdraw $100 in one thread
    pool.execute(b::withdraw); 
  } else {
    // deposit $100 in another
      pool.execute(b::payday); 
  }
}

This program simulates doing 5000 withdrawals and 5000 deposits in parallel (across different threads). Running this a few times, I get 100 (correct answer), -400, -1600, 400, …

Why does this give different answers when I run it different times? What is actually happening? Imagine that two threads are trying to access the bank at the same time. If they are literally executing at the same time, you could encounter a scenario where both see $100 in the bank, and then it just depends on the randomness of which thread finishes last. Example:

Thread 1: Sees $100 in the bank.
Thread 2: Sees $100 in the bank.
Thread 1: Add 100, now there is $200 in the bank.
Thread 2: Subtract 100, now there is $0 in the bank.

Now after both threads finish, there is nothing left in the bank. (How would we fix this? We would make sure that no two threads could access the bank at the same time. Or…?)

Note: this is a very simplified example of an issue with shared mutable state. These issues can be very unpredictable. Many have taken to functional programming as a way of avoiding these issues: after all, you cannot have shared mutable state if you do not have any mutable state!

Haskell

We will be using the Haskell language in this course. We will need GHCup and Visual Studio Code.

(We may need to get set up on GitHub Classroom as well.)

Set-up

GHCup installer Follow the instructions. Make sure to install HLS.
VSCode
After installing both: Haskell extension for VSCode Or: open VSCode, go to Extensions (View -> Extensions), and search for Haskell.

(May also need to update directions here for GitHub Classroom.)

Starting out

We will mostly use the “interactive” Haskell compiler. This allows us to input in Haskell expressions and the console immediately outputs the result. To do this, we can open a terminal (or use a terminal in VSCode) and type in the command “ghci”. (Eventually we might start using “stack ghci”).

GHC stands for “Glasgow Haskell Compiler”; ghci is the interactive shell mode for this compiler. This is a common environment called a REPL: read-evaluate-print-loop.

Now let’s follow along some of the examples from Chapter 2.

Arithmetic

ghci> 2 + 15
17

ghci> 8 + 3*(-2)
2

(What happens if you forget the parentheses around -2?)

Boolean algebra

ghci> True && False
False

ghci> True || not True
True

ghci> not (True || not True)
False

ghci> 5 == 2
False

ghci> 5 /= 2
True

Function application

Invoke functions by writing the function name, and then the list of parameters, separated by spaces (no commas). (succ function is the successor function, which just adds one).

ghci> succ 2
3

Precedence?

ghci> succ 3 * 5
20

(It’s $4 \times 5$, not $15 + 1$.)

More than one parameter: again, just use spaces to separate

ghci> max 5 2
5

(What happens if you use parentheses and commas? It thinks the parameter is a tuple!)

Defining functions

Follow along the examples from Chapter 2.

We should be able to:

Define functions in a haskell file (.hs extension) and load them into ghci
Define functions which use other functions we have defined
Use the if-else expression

Type checking

Haskell is statically typed. This means that the type of every expression is known at compile-time.
Haskell has type inference. This means that the compiler can infer the type of an expression, so we do not always need to declare these.
- Good practice to do so though!
To check the type of an expression, use :t.

ghci> :t True
True :: Bool

This says that the type of the expression True is Bool.

More complicated example: consider the “anonymous function” \x -> x + 3. This is the function which takes in a parameter x, and returns x + 3.

ghci> :t \x -> x + 3
\x -> x + 3 :: Num a => a -> a

This is harder to parse. First, it introduces a type variable $a$ and states that $a$ has the Num typeclass. That is, “Num a” puts a class constraint on the type variable $a$. We read the above line as: “The function takes in a parameter of one type and returns a value of the same type. That type must be a member of the Num class.”

We will say more about type variables and classes later. Two quick notes:

Type classes are kind of like interfaces in Java.
Type variables are kind of like generics in Java (if you don’t know what generics are, think about the type in angled brackets in ArrayList<Integer>).

Readings

Read Chapters 1 and 2 of the textbook. Get set up with Haskell / VSCode / (GitHub Classroom?).