A more efficient version of a symbol table for when the keys are strings.

# 1. R-way Tries

Two implementations of symbol tables that we've seen:

when keys are strings:
(`L`=string length, `N`=number of strings, `R`=radix)

For string keys ⇒ we can do better by avoiding examining the entire key.

goal: faster than a hash table ...
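A minimal R-way trie sketch, to show how a search miss can stop after reading only part of the key. Assumptions: radix `R = 256` (extended ASCII), and the names `Node`, `TrieST`, `put` and `get` are hypothetical; the value for a key is stored at the node reached by its last character.

```python
from typing import Any, List, Optional

R = 256  # assumed radix: extended ASCII (8-bit chars)


class Node:
    """One trie node: an optional value plus R child links."""

    def __init__(self) -> None:
        self.value: Optional[Any] = None
        self.next: List[Optional["Node"]] = [None] * R


class TrieST:
    """Sketch of an R-way trie symbol table with string keys."""

    def __init__(self) -> None:
        self.root = Node()

    @staticmethod
    def _index(ch: str) -> int:
        c = ord(ch)
        if c >= R:
            raise ValueError(f"character {ch!r} outside radix {R}")
        return c

    def put(self, key: str, value: Any) -> None:
        node = self.root
        for ch in key:
            c = self._index(ch)
            if node.next[c] is None:
                node.next[c] = Node()  # create missing links on the way down
            node = node.next[c]
        node.value = value  # value lives at the node of the last character

    def get(self, key: str) -> Optional[Any]:
        node = self.root
        for ch in key:
            node = node.next[self._index(ch)]
            if node is None:
                return None  # search miss: the rest of the key is never read
        return node.value  # None if key is only a prefix of stored keys


st = TrieST()
for i, word in enumerate(["she", "sells", "sea", "shells"]):
    st.put(word, i)
print(st.get("shells"))  # 3
print(st.get("shore"))   # None: miss detected after reading "sho"
print(st.get("sh"))      # None: "sh" is only a prefix, no value stored
```

A search hit still reads all `L` characters, but a miss typically stops after a few; the cost is memory, since every node carries `R` links.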

This week: string sort.

# 1. Strings in Java

### char data type

• char in C

8-bit integer, 256 characters, 7-bit ASCII code

• char in Java

16-bit Unicode

### String data type

`String`: immutable sequence of characters
operations: length, ith char, substring, concatenate

implementation: uses a `char[]` and maintains a `length` and an `offset ...`
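To see why an `offset` is useful, here is a hedged Python sketch of the `char[]` + `offset` + `length` layout. `SharedString` and its methods are hypothetical names, not Java's real `String` class: `substring` shares the underlying array and only adjusts offset/length (constant time, as in older JDK versions), while `concat` has to copy characters into a new array (linear time).

```python
from typing import List


class SharedString:
    """Hypothetical immutable string: a shared char array plus offset and length."""

    __slots__ = ("_value", "_offset", "_length")

    def __init__(self, value: List[str], offset: int, length: int) -> None:
        self._value = value    # shared, never mutated after construction
        self._offset = offset  # index of this string's first char in _value
        self._length = length  # number of chars in this string

    @classmethod
    def of(cls, s: str) -> "SharedString":
        return cls(list(s), 0, len(s))

    def length(self) -> int:
        return self._length

    def char_at(self, i: int) -> str:
        if not 0 <= i < self._length:
            raise IndexError(i)
        return self._value[self._offset + i]

    def substring(self, start: int, end: int) -> "SharedString":
        # Constant time: reuse the same array, only shift offset and length.
        if not 0 <= start <= end <= self._length:
            raise IndexError((start, end))
        return SharedString(self._value, self._offset + start, end - start)

    def concat(self, other: "SharedString") -> "SharedString":
        # Linear time: both character sequences are copied into a new array.
        chars = self._value[self._offset:self._offset + self._length]
        chars += other._value[other._offset:other._offset + other._length]
        return SharedString(chars, 0, len(chars))

    def __str__(self) -> str:
        return "".join(self._value[self._offset:self._offset + self._length])
```

For example, `SharedString.of("attack at dawn").substring(7, 9)` is `"at"` and points into the same array as the original.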

# 1. Introduction to Maxflow

### Min-cut problem

• input: an edge-weighted digraph G in which each edge `e` has a weight ("capacity") `c[e] >= 0`, a source vertex `s`, and a target vertex `t`.
• def. an st-cut `(A,B)` is a partition of vertices into 2 disjoint sets A and B, with `s` in set `A` and ...
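Using the standard definition that the capacity of an st-cut `(A, B)` is the sum of the capacities of the edges pointing from `A` to `B` (edges from `B` to `A` do not count), here is a brute-force sketch of the min-cut problem. It tries every st-cut, so it is exponential and only meant for tiny graphs; the edge format `(from, to, capacity)` and the function names are assumptions.

```python
from itertools import combinations
from typing import AbstractSet, FrozenSet, List, Tuple

Edge = Tuple[int, int, float]  # (from, to, capacity)


def cut_capacity(edges: List[Edge], A: AbstractSet[int]) -> float:
    """Sum of capacities of edges from A to B, where B is the complement of A."""
    return sum(c for v, w, c in edges if v in A and w not in A)


def brute_force_min_cut(n: int, edges: List[Edge], s: int, t: int) -> Tuple[float, FrozenSet[int]]:
    """Enumerate every partition with s in A and t in B; keep the cheapest."""
    others = [v for v in range(n) if v not in (s, t)]
    best: Tuple[float, FrozenSet[int]] = (float("inf"), frozenset())
    for k in range(len(others) + 1):
        for extra in combinations(others, k):
            A = frozenset((s, *extra))
            cap = cut_capacity(edges, A)
            if cap < best[0]:
                best = (cap, A)
    return best


edges = [(0, 1, 10), (0, 2, 5), (1, 3, 5), (2, 3, 10), (1, 2, 15)]
print(brute_force_min_cut(4, edges, 0, 3))  # (15, frozenset({0}))
```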

# 1. Shortest Paths APIs

context: directed, weighted graphs.

### shortest path variants

in terms of vertices:

• source-sink: from one vertex to another
• single source: from one vertex to all others (considered in this lecture)
• all pairs

constraints on edge weights:

• nonnegative weights
• arbitrary weights
• Euclidean

cycles:

• no directed cycles
• no negative ...
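For the single-source variant with nonnegative weights, the classic algorithm is Dijkstra's. Below is a minimal sketch (not necessarily the API used in the lecture), assuming the digraph is given as a dict mapping each vertex to a list of `(to, weight)` pairs; it uses `heapq` with lazy deletion of stale queue entries, and `dist_to`/`edge_from` record distances and predecessors.

```python
import heapq
from typing import Dict, List, Set, Tuple


def dijkstra(adj: Dict[int, List[Tuple[int, float]]], s: int) -> Tuple[Dict[int, float], Dict[int, int]]:
    """Single-source shortest paths; requires all edge weights >= 0."""
    dist_to: Dict[int, float] = {s: 0.0}
    edge_from: Dict[int, int] = {}  # edge_from[w] = predecessor of w on a shortest path
    pq: List[Tuple[float, int]] = [(0.0, s)]
    done: Set[int] = set()
    while pq:
        d, v = heapq.heappop(pq)
        if v in done:
            continue  # stale entry: v was already settled with a smaller distance
        done.add(v)
        for w, weight in adj.get(v, []):
            # relax edge v->w
            if w not in dist_to or d + weight < dist_to[w]:
                dist_to[w] = d + weight
                edge_from[w] = v
                heapq.heappush(pq, (dist_to[w], w))
    return dist_to, edge_from


def path_to(edge_from: Dict[int, int], s: int, t: int) -> List[int]:
    if t != s and t not in edge_from:
        return []  # t is not reachable from s
    path = [t]
    while path[-1] != s:
        path.append(edge_from[path[-1]])
    return path[::-1]


adj = {0: [(1, 5), (2, 1)], 1: [(3, 1)], 2: [(1, 2), (3, 7)]}
dist, prev = dijkstra(adj, 0)
print(dist[3], path_to(prev, 0, 3))  # 4.0 [0, 2, 1, 3]
```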

# 1. Introduction to MSTs

Given: an undirected connected graph `G` with positive edge weights.
def. A spanning tree `T` of `G` is a subgraph of `G` that is both a tree (connected, acyclic) and spanning (includes all vertices).

⇒ Goal: find a spanning tree with minimum weight sum.
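As one concrete instance of a greedy MST algorithm, here is a sketch of Kruskal's algorithm (a standard algorithm swapped in for illustration, not taken from these notes): consider edges in increasing order of weight and keep an edge unless it would create a cycle, which a union-find structure detects. Assumes vertices `0..n-1`, edges given as `(weight, v, w)` tuples, and a connected graph as stated above.

```python
from typing import List, Tuple

Edge = Tuple[float, int, int]  # (weight, v, w)


class UnionFind:
    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, v: int) -> int:
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]  # path halving
            v = self.parent[v]
        return v

    def union(self, v: int, w: int) -> bool:
        """Merge the components of v and w; False if already connected."""
        rv, rw = self.find(v), self.find(w)
        if rv == rw:
            return False
        self.parent[rv] = rw
        return True


def kruskal_mst(n: int, edges: List[Edge]) -> List[Edge]:
    uf = UnionFind(n)
    mst: List[Edge] = []
    for e in sorted(edges):  # increasing weight
        _, v, w = e
        if uf.union(v, w):   # skip edges whose endpoints are already connected
            mst.append(e)
            if len(mst) == n - 1:
                break        # a spanning tree on n vertices has n - 1 edges
    return mst


edges = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3), (5, 1, 3)]
print(kruskal_mst(4, edges))  # [(1, 0, 1), (2, 1, 2), (4, 2, 3)]
```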

# 2. Greedy Algorithm

assumptions for simplification:

• edge ...

# 1. Intro to digraphs

Digraphs differ profoundly from undirected graphs.

def: digraph
edges: have directions
vertex: distinguish indeg and outdeg

digraph problems:

• path/shortest path
• topological sort: Can you draw a digraph so that all edges point upwards?
• strong connectivity: Is there a directed path between all pairs of vertices ...
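The "all edges point upwards" question is topological sort: if the vertices can be ordered so that every edge `v->w` has `v` before `w`, drawing them bottom-to-top in that order makes every edge point up, and such an order exists iff the digraph has no directed cycle. A sketch using DFS reverse postorder, assuming vertices `0..n-1` and an adjacency dict (`topological_order` is a hypothetical name):

```python
from typing import Dict, List, Optional


def topological_order(n: int, adj: Dict[int, List[int]]) -> Optional[List[int]]:
    """Order with v before w for every edge v->w, or None if a directed cycle exists."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / on current DFS path / finished
    color = [WHITE] * n
    postorder: List[int] = []

    def dfs(v: int) -> bool:
        color[v] = GRAY
        for w in adj.get(v, []):
            if color[w] == GRAY:
                return False  # back edge to the current path => directed cycle
            if color[w] == WHITE and not dfs(w):
                return False
        color[v] = BLACK
        postorder.append(v)  # v finishes after everything reachable from it
        return True

    for v in range(n):
        if color[v] == WHITE and not dfs(v):
            return None
    return postorder[::-1]  # reverse postorder is a topological order


print(topological_order(4, {0: [1, 2], 1: [3], 2: [3]}))  # [0, 2, 1, 3]
print(topological_order(2, {0: [1], 1: [0]}))             # None
```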

# 1. Intro to graphs

Graph: vertices connected by edges.

terminology:

• path: sequence of vertices connected by edges
• cycle: path with same starting and ending vertex
• two vertices are connected if there is a path between them
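The connectivity definition can be checked with a graph search from one vertex. A minimal sketch using breadth-first search, assuming an undirected graph stored as an adjacency dict; `build_graph` and `connected` are hypothetical helper names.

```python
from collections import defaultdict, deque
from typing import Dict, List, Tuple


def build_graph(edges: List[Tuple[int, int]]) -> Dict[int, List[int]]:
    adj: Dict[int, List[int]] = defaultdict(list)
    for v, w in edges:
        adj[v].append(w)  # undirected: store the edge in both directions
        adj[w].append(v)
    return adj


def connected(adj: Dict[int, List[int]], s: int, t: int) -> bool:
    """True if there is a path between s and t."""
    seen = {s}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        if v == t:
            return True
        for w in adj.get(v, []):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return False


adj = build_graph([(0, 1), (1, 2), (3, 4)])
print(connected(adj, 0, 2), connected(adj, 0, 4))  # True False
```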

ex of graph problems:

• path, or connectivity
• shortest path
• cycle
• Euler tour (ouii..)
• Hamilton tour ...
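For the Euler tour, Euler's classic result gives a simple test: an undirected graph has a cycle that uses each edge exactly once iff every vertex has even degree and all vertices that have edges lie in one connected component. A sketch of that test, assuming the graph is given as an edge list (multi-edges allowed); `has_euler_cycle` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def has_euler_cycle(edges: List[Tuple[int, int]]) -> bool:
    """All degrees even and all edges in one connected component."""
    if not edges:
        return True  # vacuously true for a graph with no edges
    adj: Dict[int, List[int]] = defaultdict(list)
    degree: Dict[int, int] = defaultdict(int)
    for v, w in edges:
        adj[v].append(w)
        adj[w].append(v)
        degree[v] += 1
        degree[w] += 1
    if any(d % 2 for d in degree.values()):
        return False  # an odd-degree vertex can't be entered and left equally often
    # every vertex with nonzero degree must be reachable from one of them
    start = edges[0][0]
    seen = {start}
    stack = [start]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(degree)


# Königsberg bridges (0 = island, 1..3 = banks/shore): degrees 5, 3, 3, 3
bridges = [(0, 1), (0, 1), (0, 2), (0, 2), (0, 3), (1, 3), (2, 3)]
print(has_euler_cycle(bridges))  # False
```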

## simHash

The simHash method sounds even simpler than minHash:

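1. For each term t (n-gram, shingle) in a document d, compute its hash code hash(t) (e.g., with Java's built-in `Object.hashCode()` method).
2. Combine the hash(t) of all terms in d into a single hash code simHash(d) for d: simHash(d) has the same length as hash(t), and the k-th bit of simHash(d) is the majority value (mode) of the k-th bits of all the hash(t).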
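A minimal Python sketch of the two steps above. Assumptions: `java_string_hash` reimplements the documented formula of Java's `String.hashCode()` (`h = 31*h + c`, 32 bits) because Python's built-in `hash()` on strings is randomized per process; ties in the majority vote resolve to 0; `shingles` (word n-grams) and `hamming` are hypothetical helpers added for illustration.

```python
from typing import Iterable, List

BITS = 32  # Java's hashCode() returns a 32-bit int


def java_string_hash(s: str) -> int:
    """Java's String.hashCode() as an unsigned 32-bit value (exact for BMP chars)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h


def simhash(terms: Iterable[str]) -> int:
    """Bit k of the result = majority of bit k over all term hashes (ties -> 0)."""
    ones = [0] * BITS  # ones[k] = how many term hashes have bit k set
    n = 0
    for t in terms:
        h = java_string_hash(t)
        n += 1
        for k in range(BITS):
            ones[k] += (h >> k) & 1
    result = 0
    for k in range(BITS):
        if 2 * ones[k] > n:  # strict majority of ones
            result |= 1 << k
    return result


def hamming(a: int, b: int) -> int:
    """Number of differing bits: small distance suggests similar documents."""
    return bin(a ^ b).count("1")


def shingles(text: str, n: int = 3) -> List[str]:
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


d1 = "the quick brown fox jumps over the lazy dog"
d2 = "the quick brown fox leaps over the lazy dog"
print(hamming(simhash(shingles(d1)), simhash(shingles(d2))))
```

The appeal for approximate retrieval is that each document shrinks to one fixed-size fingerprint, and similar documents are expected to have fingerprints at small Hamming distance.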

Approximate retrieval (similarity search) is a problem I ran into often during a previous internship: how to quickly find similar items in a large amount of data.

## assumptions in Bayes viewpoint

(btw, when modeling a random variable, generally speaking, a continuous random variable is modeled with a Gaussian ...