(maybe best algorithm for sorting.)
1. Quicksort
Idea:
- shuffle the array
- Partition the array into two subarrays to left and right of pivot (*now pivot is *in its final position)
no larger entry to the left of pivot
no smaller entry to the right of pivot
- sort each subarray recursively
Implemetation
The partition process:
这个方法也比较巧妙.
Use 2 pointers i and j (个人觉得用hi, lo, pivot更好...) :
→ a[i]>=a[lo], a[j]<=a[lo] (注意是大于等于/小于等于)
⇒ exchange i and j
→ Scan until i and j cross (ie. j<=i)
⇒ finally exchange lo with j
函数的签名定义的好: 把lo到hi部分的数组分成两部分, 并返回分割点的index.
private static int partition(Comparable[] a, int lo, int hi){
int i=lo+1, j=hi;
while(i<j){
while( i<=hi && less(a[i],a[lo]) ) i++; //a[i]>=a[lo]
while( j>=lo && less(a[lo],a[j]) ) j--; //a[j]<=a[lo]
if(i<j) exch(a,i++,j--);
}
exch(a,lo,j); //exchange pivot with j
return j; //j in its final position
}
这个函数其实并不好写对:
- test for cross pointers is not trival (ex. edge case: the pivot is the smallest/largest entry in the range)
- i<=hi is necessary !
- for keys equal to a[lo]: better to stop at them
invariance:
Quicksort:
使用partition函数和辅助sort函数(recursive). 注意在整个流程开始以前先shuffle一下.
private static void sort(Comparable[] a, int lo, int hi){
if(hi<=lo) return;
int pivot = partition(a, lo, hi);
sort(a,lo,pivot-1);
sort(a,pivot+1,hi);
return;
}
public static void sort(Comparable[] a){
StdRandom.shuffle(a); // don't forget to shuffle the array
sort(a,0,a.length-1);
}
The randomness is preserved: the subarrays after partitionning is still randomly ordered.
Analysis
Performance: ~40% faster than mergesort.
Best case
compares = NlgN
(each partition will divide the array in half)
Worst case
compares = 1/2*N^2
N+(N-1)+...+1
if the array is already in order, each partition will have one subarray of length=0
Average case
proposition
On average, for array with N distinct keys, the #compares = ~2NlnN, #exchanges = ~1/2NlnN.
Proof.*
C(N) := # compares for N entries
pivot 在N个数离的排名是uniform的
接下来的数学推到很漂亮(不过可能没啥用..)
(上面最后一行写错了... 是2NlnN...orz) random shuffle: probalistic guarantee against worst case.
Pitfalls
implementations will get quadratic performance if array:
- is sorted or reverse sorted
- has many duplicates (even if randomized)
Staility
Quicksort is NOT stable.
partitionning can make long range exchanges
Practical improvements
- cutoff to insertion sort for <10 items
→ ~20% improvement
Or we can leave the small subarrays unsorted and sort them at last using insertion sort
- estimate median by sampling 3 items
→ 10% improvement
2. Selection
Goal: given un array, find the kth largest item.
- Upper bound for this problem: NlgN (just sort the array)
- for small k (ex k=1,2,3), the upper bound is N (one-pass/two-pass)
- Lower bound is N: at least have to look at everything
Quick select
Algo proposed also by Hoare:
- partition the array into two arrays left of pivot and right of pivot.
- if pivot==k: return
- continue the partition for just one of the subarrays
类似于二分查找的过程....
注意这里是不用递归的! 因为partition函数返回的直接就是pivot在整个数组里的位置!
Implementation
privater static int partition(Comparable[] a, int lo, int hi){...}
public static Comparable select(Comparable[] a, int k){
StdRandom.shuffle(a);
int lo=0,hi=a.length-1;
while(true){
int j = partition(a,lo,hi);
if(j<k) lo=j+1;
if(j==k) return a[j];
else hi=j-1;
if(hi<=lo) break;
}
return a[k];//这里不太理解为什么会在hi<lo的时候直接返回a[k]
}
Analysis
Proposition
Quick selection takes linear time on average.
*proof *
intuitively, each partition will ct the subarray size in half:
N+N/2+N/4+... = 2N
formal analysis 略...
worst case: quadratic (but very rare to happen)
Theoretical results
3. Duplicate keys
if array contains many duplicate keys.
- huge array
- small number of distinct keys
for mergesort
insensitive... always ~NlgN compares.
for quicksort
Will get quadratic time if not stop on equal keys. (found in 1990s)
mistake: put all items equal to pivot *to just one side *
→ N^2 compares if all keys are equal from lo to hi.
correct: put all items equal to pivot in their final place.
3-way partitionning
(by Dijkstra)
partition the array into 3 parts:
Dijkstra's 3-way partition algo:
使用3个指针: lt指向中间部分的左边界, gt指向右边界; i指针从左向右扫描, 算法很subtle:
- lt=lo, gt=hi, i=lo
- if a[i]==v : i++
- if a[i]<v: exch(i,lt), i++, lt++
- if a[i]>v: exch(i,gt), gt--
- repeat until i and gt cross (i>gt)
invariance:
[lo, lt)
all < v[lt, i)
all == v(gt, hi]
all >v
Implementation: 3-way quick sort
不必再写partition函数, 直接在sort递归函数里面.
private static void sort(Comparable[] a, int lo, int hi){
if(hi<=lo) return; // 递归函数别忘了先写终止条件...
int lt=lo, gt = hi;
Comparable v = a[lo];
for(int i=lo;i<=gt;){ //不能写 i++
if( less(a[i],v) )
exch(a, i++, lt++);
else if ( less(v,a[i]) )
exch(a,i,gt--);
else // v==a[i]
i++;
}
sort(a, lo, lt-1);
sort(a, gt+1, hi);
}
当N个数有很多重复的时候, lower bound可以变小于NlgN:
And Sedgewick proved that the 3-wy partition is propotional to the lower bound....
4. System Sorts
Arrays.sort() in java:
import java.util.Arrays;
quicksort for primitive arrays, mergesort for objects: java设计者认为如果用obj array表示空间不是问题...
Pb in java's system sort: killer input exsit (havn't shuffle)...
总结一下学过的5/6种排序:
Part 7 of series «Algorithms Princeton MOOC I»:
- [Algorithms I] Week 1-1 Union-Find
- [Algorithms I] Week 1-2 Analysis of Algorithms
- [Algorithms I] Week1-Lab: Percolation
- [Algorithms I] Week 2-1 Stacks and Queues
- [Algorithms I] Week 2-2 Elementary Sorts
- [Algorithms I] Week 3-1 Mergesort
- [Algorithms I] Week 3-2 Quicksort
- [Algorithms I] Week 4-1 Priority Queue
- [Algorithms I] Week 4-2a Elementry Symbol Tables
- [Algorithms I] Week 4-2b Binary Search Trees
- [Algorithms I] Week 5-1 Balanced Search Trees
- [Algorithms I] Week 5-2 Geometric Applications of BSTs
- [Algorithms I] Week 6 Hash Tables
Disqus 留言