1. Introduction to Maxflow

Min-cut pb

  • input: edge-weighted digraph G, each edge e has weight("capacity") c[e]>=0, a source vertex s, a target vertex t.
  • def. an st-cut (A,B) is a partition of vertices into 2 disjoint sets A and B, with s in set A and ...

1. Shortest Paths APIs

context: directe, weighted graphs.

shortest path variants

in terms of vertices:

  • source-sink: form one vertex to another
  • single source: from one vertex to all others (considered in this lecture)
  • all pairs

constraints on edge weights:

  • nonnegative weights
  • arbitary weights
  • eculidean

cycles:

  • no directed cycles
  • no negative ...

1. Introduction to MSTs

Given: undirected connecte graph G with positive edge weights.
def. Spanning tree T
is a subgraph of G, that is both tree (connected, acyclic) and spanning(all vertices are included).

⇒ Goal: find a spanning tree with minimum weight sum.

2. Greedy Algorithm

assumptions for simplification:

  • edge ...

1. Intro to digraphs

Has profound differences wrt undirected graphs.

def: digraph
edges: have directions
vertex: distinguish indeg and outdeg

digraph pbs:

  • path/shortest path
  • topological sort: Can you draw a digraph so that all edges point upwards?
  • strong connectivity: Is there a directed path between all pairs of vertices ...

1. Intro to graphs

Graph: vertices connected by edges.

terminology:

  • path: sequence of vertices connected by edges
  • cycle: path with same starting and ending vertex
  • two vertices are connected: if there is a path between

ex of graph problems:

  • path: or connectivity
  • shortest path
  • cycle
  • Euler tour (ouii..)
  • Hamilton tour ...

weighted graph的最短路径问题有三个非常有名的算法, 分别以三个大牛的名字命名, 今天用尽量简洁的篇幅一一介绍.

简单起见(这回只写伪代码好了), 对于图的定义如下:

  • node index = {1,2,...,n}
  • e[u,v] = distance of edge(u,v); if (u,v) is not an edge, e[u,v]=INF
  • 令N=点的数量, M=边的数量

任意两点最短路径: Floyd-Warshall

Floyd算法解决的问题是: 对于任意两个节点s(source)和t(target), 求s到达t的最短路径.

Floyd算法的思想是动态规划:

  • 定义d ...

今天总结一下也许是搜索问题里最重要的算法: DFS !

由于树可以看成是一个graph, 这里还是只写对于graph的DFS算法. Graph类的定义还是用每一个节点保存邻居信息:

public class GraphNode{      
    int val;      
    List<GraphNode> neighbors;      
}

为了防止重复, 仍然用一个HaseSet记录走过的节点:

HasheSet<GraphNode> visited = new HasheSet<GraphNode>();

Recursive DFS

首先写递归版本的DFS, DFS就是一条路走到底, 不撞南墙不回头, 所以递归写起来很自然: 每到一个节点, 标记其已经访问过了, 然后对于邻居里面没有访问的节点继续递归进行DFS.

递归的DFS代码非常简洁:

public void DFS(GraphNode nd){      
    System.out.println(nd.val);    
    visited.add(nd);   
    for(GraphNode next: nd.neighbors){   
        if ...

除了上次介绍的minhash方法以外, 还有一种常见的hash方法, 叫做simHash. 这里做简要介绍.
这个hash函数的背景和上次一样, 还是考虑把文本抽象为ngram的集合:

然后相似度依旧是Jaccard similarity:

simHash

simHash的方法听上去比minHash还要简单:

  1. 对一个文档d中的每一个term(ngram, shingle) t, 计算其hashcode(比如用java内建的Object.hashCode()函数) hash(t).
  2. 把d中所有term的hash(t)合成为一个hashcode作为d的hashcode simHash(d): simHash(d)的长度与hash(t)相同, simHash(d)的第k个bit的取值为所有hash(t)第k个bit的众数.

写成数学表达式很吓人, 其实只不过不断在{0,1}和{-1 ...

今天总结一下广度优先搜索(BFS). BFS是树/图的遍历的常用算法之一, 对于没有边权重的图来说可以计算最短路径.
由于树的BFS只是图的BFS的一种特殊情况, 而且比较简单不需要visited标记, 这里只写一下图的BFS好了.
先定义一个Graph类, 这里在每一个节点保存邻居信息:

public class GraphNode{   
    int val;   
    List<GraphNode> neighbors;   
}

BFS for trees/graphs

图的遍历需要注意不走重复节点, 所以需要一个HashSet(名字叫visited)来保存哪些节点已经访问过了. 需要注意的是, 在把一个节点放进队列queue的时刻就要把它放进visited, 而不是在队列里取出来的时刻再放.

public void BFS(GraphNode start){   
    LinkedList<GraphNode> q = new LinkedList<GraphNode>();   
    HasheSet<GraphNode> visited = new HasheSet<GraphNode>();   
    q.push(start);   
    visited ...

Here is a mindmap of the common algorithms and data structures, it can give an overview of the algorithmic terms.

I shall update its content later on. And maybe write some blog entries on some of the items.

This mindmap is drawn using xmind.