Algorithms and Data Structures

Algorithmic complexity / Big-O / Asymptotic analysis
  nothing to implement
  Harvard CS50 – Asymptotic Notation (video)
  Big O Notations (general quick tutorial) (video)
  Big O Notation (and Omega and Theta) – best mathematical explanation (video)
  Skiena: video, slides
  A Gentle Introduction to Algorithm Complexity Analysis
  Orders of Growth (video)
  Asymptotics (video)
  UC Berkeley Big O (video)
  UC Berkeley Big Omega (video)
  Amortized Analysis (video)
  Illustrating “Big O” (video)
  TopCoder (includes recurrence relations and master theorem):
    Computational Complexity: Section 1
    Computational Complexity: Section 2
  Cheat sheet
  If some of the lectures are too mathy, you can jump down to the bottom and watch the discrete mathematics videos to get the background knowledge.

Data Structures

Arrays
  Implement an automatically resizing vector.
  Description:
    Arrays (video)
    UC Berkeley CS61B – Linear and Multi-Dim Arrays (video)
    Basic Arrays (video)
    Multi-dim (video)
    Dynamic Arrays (video)
    Jagged Arrays (video)
    Jagged Arrays (video)
    Resizing arrays (video)
  Implement a vector (mutable array with automatic resizing):
    Practice coding using arrays and pointers, and pointer math to jump to an index instead of using indexing.
    new raw data array with allocated memory
      can allocate int array under the hood, just not use its features
      start with 16, or if starting number is greater, use power of 2 – 16, 32, 64, 128
    size() – number of items
    capacity() – number of items it can hold
    is_empty()
    at(index) – returns item at given index, blows up if index out of bounds
    push(item)
    insert(index, item) – inserts item at index, shifts that index’s value and trailing elements to the right
    prepend(item) – can use insert above at index 0
    pop() – remove from end, return value
    delete(index) – delete item at index, shifting all trailing elements left
    remove(item) – looks for value and removes index holding it (even if in multiple places)
    find(item) – looks for value and returns first index with that value, -1 if not found
    resize(new_capacity) // private function
      when you reach capacity, resize to double the size
      when popping an item, if size is 1/4 of capacity, resize to half
  Time
    O(1) to add/remove at end (amortized for allocations for more space), index, or update
    O(n) to insert/remove elsewhere
  Space
    contiguous in memory, so proximity helps performance
    space needed = (array capacity, which is >= n) * size of item, but even if 2n, still O(n)

Linked Lists
  Description:
    Singly Linked Lists (video)
    CS 61B – Linked Lists (video)
  C Code (video) – not the whole video, just portions about Node struct and memory allocation.
  Linked List vs Arrays:
    Core Linked Lists Vs Arrays (video)
    In The Real World Linked Lists Vs Arrays (video)
  why you should avoid linked lists (video)
  Gotcha: you need pointer to pointer knowledge (for when you pass a pointer to a function that may change the address where that pointer points). This page is just to get a grasp on ptr to ptr. I don’t recommend this list traversal style. Readability and maintainability suffer due to cleverness.
    Pointers to Pointers
  implement (I did with tail pointer & without):
    size() – returns number of data elements in list
    empty() – bool returns true if empty
    value_at(index) – returns the value of the nth item (starting at 0 for first)
    push_front(value) – adds an item to the front of the list
    pop_front() – remove front item and return its value
    push_back(value) – adds an item at the end
    pop_back() – removes end item and returns its value
    front() – get value of front item
    back() – get value of end item
    insert(index, value) – insert value at index, so current item at that index is pointed to by new item at index
    erase(index) – removes node at given index
    value_n_from_end(n) – returns the value of the node at nth position from the end of the list
    reverse() – reverses the list
    remove_value(value) – removes the first item in the list with this value
  Doubly-linked List
    Description (video)
    No need to implement

Stack
  Stacks (video)
  Using Stacks Last-In First-Out (video)
  Will not implement. Implementing with array is trivial.
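
For the singly linked list operations listed above (the variant with a tail pointer), a minimal Python sketch might look like the following. It covers only part of the interface. The names Node, SinglyLinkedList, and to_list are illustrative, and treating n = 1 as the last node in value_n_from_end is an assumption, since the list does not say whether n starts at 0 or 1.

```python
class Node:
    """One list node: a value plus a reference to the next node."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


class SinglyLinkedList:
    """Singly linked list that keeps both head and tail references."""

    def __init__(self):
        self.head = None
        self.tail = None
        self._size = 0

    def size(self):
        return self._size

    def empty(self):
        return self._size == 0

    def push_front(self, value):
        self.head = Node(value, self.head)
        if self.tail is None:  # list was empty
            self.tail = self.head
        self._size += 1

    def push_back(self, value):
        node = Node(value)
        if self.tail is None:  # list was empty
            self.head = self.tail = node
        else:
            self.tail.next = node
            self.tail = node
        self._size += 1

    def pop_front(self):
        if self.head is None:
            raise IndexError("pop_front from empty list")
        value = self.head.value
        self.head = self.head.next
        if self.head is None:  # removed the only node
            self.tail = None
        self._size -= 1
        return value

    def value_n_from_end(self, n):
        # Assumption: n = 1 means the last node.
        if n < 1 or n > self._size:
            raise IndexError("n out of range")
        lead = self.head
        for _ in range(n):  # move lead n nodes ahead of trail
            lead = lead.next
        trail = self.head
        while lead is not None:  # when lead runs off the end, trail is the answer
            lead = lead.next
            trail = trail.next
        return trail.value

    def reverse(self):
        prev = None
        node = self.head
        self.tail = self.head  # old head becomes the new tail
        while node is not None:
            following = node.next
            node.next = prev  # flip the link
            prev = node
            node = following
        self.head = prev

    def to_list(self):
        # Helper for inspection only; not part of the interface above.
        values = []
        node = self.head
        while node is not None:
            values.append(node.value)
            node = node.next
        return values
```

With the tail reference, push_back is O(1); pop_back (not shown) would still be O(n) in a singly linked list because it needs the next-to-last node.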
Queue
  Using Queues First-In First-Out (video)
  Queue (video)
  Circular buffer/FIFO
  Priority Queues (video)
  Implement using linked-list, with tail pointer:
    enqueue(value) – adds value at position at tail
    dequeue() – returns value and removes least recently added element (front)
    empty()
  Implement using fixed-sized array:
    enqueue(value) – adds item at end of available storage
    dequeue() – returns value and removes least recently added element
    empty()
    full()
  Cost:
    a bad implementation using linked list where you enqueue at head and dequeue at tail would be O(n) because you’d need the next to last element, causing a full traversal each dequeue
    enqueue: O(1) (amortized, linked list and array [probing])
    dequeue: O(1) (linked list and array)
    empty: O(1) (linked list and array)

Hash table
  Videos:
    Hashing with Chaining (video)
    Table Doubling, Karp-Rabin (video)
    Open Addressing, Cryptographic Hashing (video)
    PyCon 2010: The Mighty Dictionary (video)
    (Advanced) Randomization: Universal & Perfect Hashing (video)
    (Advanced) Perfect hashing (video)
  Online Courses:
    Understanding Hash Functions (video)
    Using Hash Tables (video)
    Supporting Hashing (video)
    Language Support Hash Tables (video)
    Core Hash Tables (video)
    Data Structures (video)
    Phone Book Problem (video)
    distributed hash tables:
      Instant Uploads And Storage Optimization In Dropbox (video)
      Distributed Hash Tables (video)
  implement with array using linear probing
    hash(k, m) – m is size of hash table
    add(key, value) – if key already exists, update value
    exists(key)
    get(key)
    remove(key)

More Knowledge

Binary search
  Binary Search (video)
  Binary Search (video)
  detail
  Implement:
    binary search (on sorted array of integers)
    binary search using recursion

Bitwise operations
  Bits cheat sheet – you should know many of the powers of 2 (from 2^1 to 2^16, and 2^32)
  Get a really good understanding of manipulating bits with: &, |, ^, ~, >>, <<
    words
    Good intro: Bit Manipulation (video)
    C Programming Tutorial 2-10: Bitwise Operators (video)
    Bit Manipulation
    Bitwise Operation
    Bithacks
    The Bit Twiddler
    The Bit Twiddler Interactive
  2s and 1s complement
    Binary: Plusses & Minuses (Why We Use Two’s Complement) (video)
    1s Complement
    2s Complement
  count set bits
    4 ways to count bits in a byte (video)
    Count Bits
    How To Count The Number Of Set Bits In a 32 Bit Integer
  round to next power of 2:
    Round Up To Next Power Of Two
  swap values:
    Swap
  absolute value:
    Absolute Integer

Trees
  Trees – Notes & Background
    Series: Core Trees (video)
    Series: Trees (video)
    basic tree construction
    traversal
    manipulation algorithms
    BFS (breadth-first search)
      MIT (video)
      level order (BFS, using queue)
        time complexity: O(n)
        space complexity: best: O(1), worst: O(n/2)=O(n)
    DFS (depth-first search)
      MIT (video)
      notes:
        time complexity: O(n)
        space complexity: best: O(log n) – avg. height of tree, worst: O(n)
      inorder (DFS: left, self, right)
      postorder (DFS: left, right, self)
      preorder (DFS: self, left, right)
  Binary search trees: BSTs
    Binary Search Tree Review (video)
    Series (video) – starts with symbol table and goes through BST applications
    Introduction (video)
    MIT (video)
    C/C++:
      Binary search tree – Implementation in C/C++ (video)
      BST implementation – memory allocation in stack and heap (video)
      Find min and max element in a binary search tree (video)
      Find height of a binary tree (video)
      Binary tree traversal – breadth-first and depth-first strategies (video)
      Binary tree: Level Order Traversal (video)
      Binary tree traversal: Preorder, Inorder, Postorder (video)
      Check if a binary tree is binary search tree or not (video)
      Delete a node from Binary Search Tree (video)
      Inorder Successor in a binary search tree (video)
    Implement:
      insert // insert value into tree
      get_node_count // get count of values stored
      print_values // prints the values in the tree, from min to max
      delete_tree
      is_in_tree // returns true if given value exists in the tree
      get_height // returns the height in nodes (single node’s height is 1)
      get_min // returns the minimum value stored in the tree
      get_max // returns the maximum value stored in the tree
      is_binary_search_tree
      delete_value
      get_successor // returns next-highest value in tree after given value, -1 if none
  Heap / Priority Queue / Binary Heap
    visualized as a tree, but is usually linear in storage (array, linked list)
    Heap
    Introduction (video)
    Naive Implementations (video)
    Binary Trees (video)
    Tree Height Remark (video)
    Basic Operations (video)
    Complete Binary Trees (video)
    Pseudocode (video)
    Heap Sort – jumps to start (video)
    Heap Sort (video)
    Building a heap (video)
    MIT: Heaps and Heap Sort (video)
    CS 61B Lecture 24: Priority Queues (video)
    Linear Time BuildHeap (max-heap)
    Implement a max-heap:
      insert
      sift_up – needed for insert
      get_max – returns the max item, without removing it
      get_size() – return number of elements stored
      is_empty() – returns true if heap contains no elements
      extract_max – returns the max item, removing it
      sift_down – needed for extract_max
      remove(i) – removes item at index i
      heapify – create a heap from an array of elements, needed for heap_sort
      heap_sort() – take an unsorted array and turn it into a sorted array in-place using a max heap
        note: using a min heap instead would save operations, but double the space needed (cannot do in-place).

Sorting
  Notes:
    Implement sorts & know best case/worst case, average complexity of each:
      no bubble sort – it’s terrible – O(n^2), except when n <= 16
    stability in sorting algorithms (“Is Quicksort stable?”)
      Sorting Algorithm Stability
      Stability In Sorting Algorithms
      Stability In Sorting Algorithms
      Sorting Algorithms – Stability
    Which algorithms can be used on linked lists? Which on arrays? Which on both?
      I wouldn’t recommend sorting a linked list, but merge sort is doable.
      Merge Sort For Linked List
  For heapsort, see Heap data structure above. Heap sort is great, but not stable.
  Sedgewick – Mergesort (5 videos)
    1. Mergesort
    2. Bottom up Mergesort
    3. Sorting Complexity
    4. Comparators
    5. Stability
  Sedgewick – Quicksort (4 videos)
    1. Quicksort
    2. Selection
    3. Duplicate Keys
    4. System Sorts
  UC Berkeley:
    CS 61B Lecture 29: Sorting I (video)
    CS 61B Lecture 30: Sorting II (video)
    CS 61B Lecture 32: Sorting III (video)
    CS 61B Lecture 33: Sorting V (video)
  Bubble Sort (video)
  Analyzing Bubble Sort (video)
  Insertion Sort, Merge Sort (video)
  Insertion Sort (video)
  Merge Sort (video)
  Quicksort (video)
  Selection Sort (video)
  Merge sort code:
    Using output array (C)
    Using output array (Python)
    In-place (C++)
  Quick sort code:
    Implementation (C)
    Implementation (C)
    Implementation (Python)
  Implement:
    Mergesort: O(n log n) average and worst case
    Quicksort: O(n log n) average case
    Selection sort and insertion sort are both O(n^2) average and worst case
    For heapsort, see Heap data structure above.
  Not required, but I recommend them:
    Sedgewick – Radix Sorts (6 videos)
      1. Strings in Java
      2. Key Indexed Counting
      3. Least Significant Digit First String Radix Sort
      4. Most Significant Digit First String Radix Sort
      5. 3 Way Radix Quicksort
      6. Suffix Arrays
    Radix Sort
    Radix Sort (video)
    Radix Sort, Counting Sort (linear time given constraints) (video)
    Randomization: Matrix Multiply, Quicksort, Freivalds’ algorithm (video)
    Sorting in Linear Time (video)
  If you need more detail on this subject, see “Sorting” section in Additional Detail on Some Subjects

Graphs
  Graphs can be used to represent many problems in computer science, so this section is long, like trees and sorting were.
  Notes from Yegge:
    There are three basic ways to represent a graph in memory:
      objects and pointers
      matrix
      adjacency list
    Familiarize yourself with each representation and its pros & cons
    BFS and DFS – know their computational complexity, their tradeoffs, and how to implement them in real code
    When asked a question, look for a graph-based solution first, then move on if none.
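
To make the adjacency list and matrix representations and the two traversals above concrete, here is a small Python sketch. It assumes a directed graph whose vertices are labeled 0..n-1 and stored as a dict of neighbor lists; the function names bfs, dfs_iterative, and to_matrix and the sample graph are illustrative only.

```python
from collections import deque


def bfs(adj, start):
    """Return the vertices reachable from start in breadth-first order."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        v = queue.popleft()  # FIFO: oldest discovered vertex first
        order.append(v)
        for w in adj[v]:
            if w not in visited:
                visited.add(w)
                queue.append(w)
    return order


def dfs_iterative(adj, start):
    """Return the vertices reachable from start in depth-first (preorder) order."""
    visited = set()
    order = []
    stack = [start]
    while stack:
        v = stack.pop()  # LIFO: most recently discovered vertex first
        if v in visited:
            continue
        visited.add(v)
        order.append(v)
        # Push neighbors in reverse so they are explored in listed order.
        for w in reversed(adj[v]):
            if w not in visited:
                stack.append(w)
    return order


def to_matrix(adj):
    """Convert an adjacency list (vertices 0..n-1) to an adjacency matrix."""
    n = len(adj)
    matrix = [[0] * n for _ in range(n)]
    for v, neighbors in adj.items():
        for w in neighbors:
            matrix[v][w] = 1
    return matrix


# Hypothetical sample graph: edges 0->1, 0->2, 1->3, 2->3.
graph = {0: [1, 2], 1: [3], 2: [3], 3: []}
bfs_order = bfs(graph, 0)            # [0, 1, 2, 3]
dfs_order = dfs_iterative(graph, 0)  # [0, 1, 3, 2]
matrix = to_matrix(graph)
```

With an adjacency list both traversals run in O(V + E) time; with a matrix, finding a vertex's neighbors means scanning a full row, so traversal costs O(V^2), while an edge lookup is O(1).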
  Skiena Lectures – great intro:
    CSE373 2012 – Lecture 11 – Graph Data Structures (video)
    CSE373 2012 – Lecture 12 – Breadth-First Search (video)
    CSE373 2012 – Lecture 13 – Graph Algorithms (video)
    CSE373 2012 – Lecture 14 – Graph Algorithms (con’t) (video)
    CSE373 2012 – Lecture 15 – Graph Algorithms (con’t 2) (video)
    CSE373 2012 – Lecture 16 – Graph Algorithms (con’t 3) (video)
  Graphs (review and more):
    6.006 Single-Source Shortest Paths Problem (video)
    6.006 Dijkstra (video)
    6.006 Bellman-Ford (video)
    6.006 Speeding Up Dijkstra (video)
    Aduni: Graph Algorithms I – Topological Sorting, Minimum Spanning Trees, Prim’s Algorithm – Lecture 6 (video)
    Aduni: Graph Algorithms II – DFS, BFS, Kruskal’s Algorithm, Union Find Data Structure – Lecture 7 (video)
    Aduni: Graph Algorithms III: Shortest Path – Lecture 8 (video)
    Aduni: Graph Alg. IV: Intro to geometric algorithms – Lecture 9 (video)
    CS 61B 2014 (starting at 58:09) (video)
    CS 61B 2014: Weighted graphs (video)
    Greedy Algorithms: Minimum Spanning Tree (video)
    Strongly Connected Components Kosaraju’s Algorithm Graph Algorithm (video)
  Full Coursera Course:
    Algorithms on Graphs (video)
  Yegge: If you get a chance, try to study up on fancier algorithms:
    Dijkstra’s algorithm – see above – 6.006
    A*
      A* Search Algorithm
      A* Pathfinding Tutorial (video)
      A* Pathfinding (E01: algorithm explanation) (video)
  I’ll implement:
    DFS with adjacency list (recursive)
    DFS with adjacency list (iterative with stack)
    DFS with adjacency matrix (recursive)
    DFS with adjacency matrix (iterative with stack)
    BFS with adjacency list
    BFS with adjacency matrix
    single-source shortest path (Dijkstra)
    minimum spanning tree
  DFS-based algorithms (see Aduni videos above):
    check for cycle (needed for topological sort, since we’ll check for cycle before starting)
    topological sort
    count connected components in a graph
    list strongly connected components
    check for bipartite graph
  You’ll get more graph practice in Skiena’s book (see Books section below) and the interview books

Even More Knowledge

Recursion
  Stanford lectures on recursion & backtracking:
    Lecture 8 | Programming Abstractions (video)
    Lecture 9 | Programming Abstractions (video)
    Lecture 10 | Programming Abstractions (video)
    Lecture 11 | Programming Abstractions (video)
  when it is appropriate to use it
  how is tail recursion better than not?
    What Is Tail Recursion Why Is It So Bad?
    Tail Recursion (video)

Dynamic Programming
  This subject can be pretty difficult, as each problem solvable with DP must be defined as a recurrence relation, and coming up with it can be tricky.
  I suggest looking at many examples of DP problems until you have a solid understanding of the pattern involved.
  Videos:
    the Skiena videos can be hard to follow since he sometimes uses the whiteboard, which is too small to see
    Skiena: CSE373 2012 – Lecture 19 – Introduction to Dynamic Programming (video)
    Skiena: CSE373 2012 – Lecture 20 – Edit Distance (video)
    Skiena: CSE373 2012 – Lecture 21 – Dynamic Programming Examples (video)
    Skiena: CSE373 2012 – Lecture 22 – Applications of Dynamic Programming (video)
    Simonson: Dynamic Programming 0 (starts at 59:18) (video)
    Simonson: Dynamic Programming I – Lecture 11 (video)
    Simonson: Dynamic programming II – Lecture 12 (video)
    List of individual DP problems (each is short): Dynamic Programming (video)
  Yale Lecture notes:
    Dynamic Programming
  Coursera:
    The RNA secondary structure problem (video)
    A dynamic programming algorithm (video)
    Illustrating the DP algorithm (video)
    Running time of the DP algorithm (video)
    DP vs. recursive implementation (video)
    Global pairwise sequence alignment (video)
    Local pairwise sequence alignment (video)

Object-Oriented Programming
  Optional: UML 2.0 Series (video)
  Object-Oriented Software Engineering: Software Dev Using UML and Java (21 videos):
    Can skip this if you have a great grasp of OO and OO design practices.
    OOSE: Software Dev Using UML and Java
  SOLID OOP Principles:
    Bob Martin SOLID Principles of Object Oriented and Agile Design (video)
    SOLID Design Patterns in C# (video)
    SOLID Principles (video)
    S – Single Responsibility Principle | Single responsibility to each Object
      more flavor
    O – Open/Closed Principle | On production level Objects are ready for extension but not for modification
      more flavor
    L – Liskov Substitution Principle | Base Class and Derived class follow ‘IS A’ principle
      more flavor
    I – Interface Segregation Principle | clients should not be forced to implement interfaces they don’t use
      Interface Segregation Principle in 5 minutes (video)
      more flavor
    D – Dependency Inversion Principle | Reduce the dependency in composition of objects.
      What Is The Dependency Inversion Principle And Why Is It Important
      more flavor

Design patterns
  Quick UML review (video)
  Learn these patterns:
    strategy
    singleton
    adapter
    prototype
    decorator
    visitor
    factory, abstract factory
    facade
    observer
    proxy
    delegate
    command
    state
    memento
    iterator
    composite
    flyweight
  Chapter 6 (Part 1) – Patterns (video)
  Chapter 6 (Part 2) – Abstraction-Occurrence, General Hierarchy, Player-Role, Singleton, Observer, Delegation (video)
  Chapter 6 (Part 3) – Adapter, Facade, Immutable, Read-Only Interface, Proxy (video)
  Series of videos (27 videos)
  Head First Design Patterns
    I know the canonical book is “Design Patterns: Elements of Reusable Object-Oriented Software”, but Head First is great for beginners to OO.
  Handy reference: 101 Design Patterns & Tips for Developers

Combinatorics (n choose k) & Probability
  Math Skills: How to find Factorial, Permutation and Combination (Choose) (video)
  Make School: Probability (video)
  Make School: More Probability and Markov Chains (video)
  Khan Academy:
    Course layout:
      Basic Theoretical Probability
    Just the videos – 41 (each is simple and short):
      Probability Explained (video)

NP, NP-Complete and Approximation Algorithms
  Know about the most famous classes of NP-complete problems, such as traveling salesman and the knapsack problem, and be able to recognize them when an interviewer asks you them in disguise.
  Know what NP-complete means.
  Computational Complexity (video)
  Simonson:
    Greedy Algs. II & Intro to NP Completeness (video)
    NP Completeness II & Reductions (video)
    NP Completeness III (video)
    NP Completeness IV (video)
  Skiena:
    CSE373 2012 – Lecture 23 – Introduction to NP-Completeness (video)
    CSE373 2012 – Lecture 24 – NP-Completeness Proofs (video)
    CSE373 2012 – Lecture 25 – NP-Completeness Challenge (video)
  Complexity: P, NP, NP-completeness, Reductions (video)
  Complexity: Approximation Algorithms (video)
  Complexity: Fixed-Parameter Algorithms (video)
  Peter Norvig discusses near-optimal solutions to traveling salesman problem:
    Jupyter Notebook
  Pages 1048 – 1140 in CLRS if you have it.

Caches
  LRU cache:
    The Magic of LRU Cache (100 Days of Google Dev) (video)
    Implementing LRU (video)
    LeetCode – 146 LRU Cache (C++) (video)
  CPU cache:
    MIT 6.004 L15: The Memory Hierarchy (video)
    MIT 6.004 L16: Cache Issues (video)

Processes and Threads
  Computer Science 162 – Operating Systems (25 videos):
    for processes and threads see videos 1-11
    Operating Systems and System Programming (video)
  What Is The Difference Between A Process And A Thread?
  Covers:
    Processes, Threads, Concurrency issues
      difference between processes and threads
      processes
      threads
      locks
      mutexes
      semaphores
      monitors
      how they work
      deadlock
      livelock
    CPU activity, interrupts, context switching
    Modern concurrency constructs with multicore processors
    Process resource needs (memory: code, static storage, stack, heap, and also file descriptors, i/o)
    Thread resource needs (shares above (minus stack) with other threads in the same process but each has its own pc, stack counter, registers, and stack)
    Forking is really copy on write (read-only) until the new process writes to memory, then it does a full copy.
    Context switching
      How context switching is initiated by the operating system and underlying hardware
  threads in C++ (series – 10 videos)
  concurrency in Python (videos):
    Short series on threads
    Python Threads
    Understanding the Python GIL (2010)
      reference
    David Beazley – Python Concurrency From the Ground Up: LIVE! – PyCon 2015
    Keynote David Beazley – Topics of Interest (Python Asyncio)
    Mutex in Python

Papers
  These are Google papers and well-known papers.
  Reading all from end to end with full comprehension will likely take more time than you have. I recommend being selective on papers and their sections.
  1978: Communicating Sequential Processes
    implemented in Go
  Love classic papers?
  2003: The Google File System
    replaced by Colossus in 2012
  2004: MapReduce: Simplified Data Processing on Large Clusters
    mostly replaced by Cloud Dataflow?
  2007: What Every Programmer Should Know About Memory (very long, and the author encourages skipping of some sections)
  2012: Google’s Colossus
    paper not available
  2012: AddressSanitizer: A Fast Address Sanity Checker:
    paper
    video
  2013: Spanner: Google’s Globally-Distributed Database:
    paper
    video
  2014: Machine Learning: The High-Interest Credit Card of Technical Debt
  2015: Continuous Pipelines at Google
  2015: High-Availability at Massive Scale: Building Google’s Data Infrastructure for Ads
  2015: TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
  2015: How Developers Search for Code: A Case Study
  2016: Borg, Omega, and Kubernetes

Testing
  To cover:
    how unit testing works
    what are mock objects
    what is integration testing
    what is dependency injection
  Agile Software Testing with James Bach (video)
  Open Lecture by James Bach on Software Testing (video)
  Steve Freeman – Test-Driven Development (that’s not what we meant) (video)
    slides
  TDD is dead. Long live testing.
  Is TDD dead? (video)
  Video series (152 videos) – not all are needed (video)
  Test-Driven Web Development with Python
  Dependency injection:
    video
    Tao Of Testing
  How to write tests

Scheduling
  in an OS, how it works
  can be gleaned from Open