I-Introduction to ML Strategy
Why ML Strategy
There are many ideas for improving ML performance; ML strategy is about how to choose among them.
 
→ how to figure out which ones to pursue and which ones to discard?
Orthogonalization
How to tune hyperparams & what to expect.
TV tuning example: each knob does only one thing.
 
Chain of assumptions in ML:
training set performance → dev set → test set → real world  
- "one knob for each link in the chain"
 - Will go through these "knobs" in this course.
 - Don't use early stopping: it's not orthogonalized enough (it affects training-set fit and dev-set performance at the same time).
 
 
II-Setting up your goal
Single number evaluation metric
Progress is faster with one single real-number evaluation metric → more efficient in making decisions.
example:
Using both precision and recall as metrics is not good → difficult to pick the best model to keep iterating from.
→ Use F1 score instead.  
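For reference, the F1 score is the harmonic mean of precision $P$ and recall $R$:

$$
F_1 = \frac{2 \cdot P \cdot R}{P + R}
$$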
Satisficing and Optimizing metric
When it's difficult to pick a single real number eval metric → set up satisficing and optimizing metrics.
example: accuracy & running time trade-off
Instead of doing a linear combination of the two, use this:
    maximize accuracy
    subject to running time <= 100 ms
In this case: accuracy is optimizing metric, running time is satisficing metric.  
In general:
with N metrics → pick 1 as the optimizing metric, the remaining N-1 as satisficing metrics.  
example: voice assistant wake-word detection accuracy vs. false-positive rate.
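A minimal sketch of this selection rule, with a hypothetical list of candidate models and field names:

```python
# Pick the best model: maximize the optimizing metric (accuracy)
# subject to the satisficing constraint (running time <= 100 ms).
models = [
    {"name": "A", "accuracy": 0.90, "runtime_ms": 80},
    {"name": "B", "accuracy": 0.92, "runtime_ms": 95},
    {"name": "C", "accuracy": 0.95, "runtime_ms": 1500},  # best accuracy, but too slow
]

feasible = [m for m in models if m["runtime_ms"] <= 100]  # satisficing metric
best = max(feasible, key=lambda m: m["accuracy"])         # optimizing metric
print(best["name"])  # -> "B"
```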
 
Train-dev-test distributions
How to set up dev/test sets.
Idea: dev and test sets should come from the same distribution.  
example: cat classification in different regions.
A bad dev/test setup: dev set from some regions, test set from other regions → different distributions.

Good practice: randomly shuffle all the data, then split it into dev/test sets.  
 
Size of the dev and test sets
Pre-DL era, old way of splitting data: 70/30 or 60/20/20 split.
→ reasonable when datasets are small (100~10k examples).  
In the DL era: far more examples (~1M).
⇒ a 98/1/1 split is more reasonable.  
- Size of test/dev set: big enough to give high confidence in system's performance.
 - OK to not have a test set, but not recommended.
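A minimal sketch of the shuffle-then-split idea with a 98/1/1 split (assumes NumPy arrays; names are hypothetical):

```python
import numpy as np

def split_dataset(X, y, seed=0):
    """Shuffle the pooled data, then carve out a 98/1/1 train/dev/test split.
    Dev and test both come from the same shuffled distribution."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))        # random shuffle
    X, y = X[idx], y[idx]

    n_train = int(0.98 * len(X))
    n_dev = int(0.01 * len(X))
    return ((X[:n_train], y[:n_train]),                                # train
            (X[n_train:n_train + n_dev], y[n_train:n_train + n_dev]),  # dev
            (X[n_train + n_dev:], y[n_train + n_dev:]))                # test
```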
 
When to change dev/test sets and metrics
example 1
cat classification: algo A has pornographic false positives.

→ change the metric to heavily penalize pornographic FPs.

to implement this weighting, you need to go through the dev/test sets and label the pornographic images.  
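One way to write the weighted error (the weight 10 is illustrative, not prescribed):

$$
\text{Error} = \frac{1}{\sum_i w^{(i)}} \sum_i w^{(i)} \, \mathbb{1}\{\hat{y}^{(i)} \neq y^{(i)}\},
\qquad
w^{(i)} = \begin{cases} 1 & \text{if } x^{(i)} \text{ is non-pornographic} \\ 10 & \text{if } x^{(i)} \text{ is pornographic} \end{cases}
$$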
example 2
Cat classification: users' uploads are blurry while the model was trained/evaluated on high-quality images → change the dev/test sets (and metric if needed) to reflect the data the system actually needs to do well on.
 
III-Comparing to human-level performance
Why human-level performance?
The ML workflow can be more efficient when the goal is to match human-level performance.
Bayes optimal error: the best possible error, the theoretical optimum.

ML progress usually slows down after surpassing human-level performance:  
- usually human-level is not far from Bayes optimal
 - as long as ML performance < human-level, there are tools to improve performance.
 
 
Avoidable bias
Using human-level performance as a reference helps decide how well you should fit the training set (and avoid over-fitting it).
example: compare training set error with human-level error.  
- err_train > err_human ⇒ focus on reducing bias (e.g. bigger NN)
 - err_train ~= err_human (but err_dev is higher) ⇒ focus on reducing variance (e.g. regularize, more training data)
 

→ Use human-level error as a proxy for Bayes error.  
Terminology:
- Avoidable bias: the gap between training error and Bayes error.
  (interpretation: some error is unavoidable because Bayes error is not 0.)
 - Variance: the gap between training error and dev error.
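In symbols, using human-level error as the Bayes-error proxy:

$$
\text{avoidable bias} = \text{err}_{\text{train}} - \text{err}_{\text{Bayes}},
\qquad
\text{variance} = \text{err}_{\text{dev}} - \text{err}_{\text{train}}
$$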
 
Understanding human-level performance
"Human level error as proxy for Bayes error"
example: Medical image classification.

⇒ Pick the lowest human-level error as the estimate (tightest upper bound) of Bayes error.  
Error analysis example (which human-err to pick to estimate avoidable bias):
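A hedged numeric illustration (the error rates below are made up, not taken from the lecture slides):

```python
# Hypothetical human-level error rates for the same task.
human_errors = {"typical human": 0.03, "expert": 0.01, "team of experts": 0.005}

bayes_proxy = min(human_errors.values())  # tightest upper bound on Bayes error -> 0.005
train_err, dev_err = 0.02, 0.025

avoidable_bias = train_err - bayes_proxy  # 0.015 -> bias is the bigger problem here
variance = dev_err - train_err            # 0.005
```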
 
Surpassing human-level performance
What is the avoidable bias when err_train and err_dev are smaller than err_human?
→ it becomes less clear which direction to push next.  
examples of tasks where ML >> human performance: online advertising, product recommendations, logistics (predicting transit time), loan approvals.

⇒ all these tasks:  
- learn from structured data
 - are not natural perception tasks
 - have huge amounts of data available
 
Improving your model performance
Recall: two fundamental assumptions of supervised learning:
- You can fit the training set well (achieve low avoidable bias)
 - Training set performance generalizes well to the dev/test sets (achieve low variance)
 
The big roadmap:
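A minimal sketch of that diagnosis loop, assuming human-level error as the Bayes proxy; the suggested tactics paraphrase the lecture's typical recommendations:

```python
def suggest_next_step(train_err, dev_err, bayes_proxy):
    """Compare avoidable bias vs. variance and suggest where to focus next."""
    avoidable_bias = train_err - bayes_proxy
    variance = dev_err - train_err

    if avoidable_bias >= variance:
        # Fit the training set better (reduce avoidable bias).
        return ["bigger network",
                "train longer / better optimizer (momentum, RMSprop, Adam)",
                "different architecture / hyperparameter search"]
    # Generalize better to the dev set (reduce variance).
    return ["more training data",
            "regularization (L2, dropout, data augmentation)",
            "different architecture / hyperparameter search"]

print(suggest_next_step(train_err=0.02, dev_err=0.025, bayes_proxy=0.005))
```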
 
Part 8 of series «Andrew Ng Deep Learning MOOC»:
- [Neural Networks and Deep Learning] week1. Introduction to deep learning
 - [Neural Networks and Deep Learning] week2. Neural Networks Basics
 - [Neural Networks and Deep Learning] week3. Shallow Neural Network
 - [Neural Networks and Deep Learning] week4. Deep Neural Network
 - [Improving Deep Neural Networks] week1. Practical aspects of Deep Learning
 - [Improving Deep Neural Networks] week2. Optimization algorithms
 - [Improving Deep Neural Networks] week3. Hyperparameter tuning, Batch Normalization and Programming Frameworks
 - [Structuring Machine Learning Projects] week1. ML Strategy (1)
 - [Structuring Machine Learning Projects] week2. ML Strategy (2)
 - [Convolutional Neural Networks] week1. Foundations of Convolutional Neural Networks
 - [Convolutional Neural Networks] week2. Deep convolutional models: case studies
 - [Convolutional Neural Networks] week3. Object detection
 - [Convolutional Neural Networks] week4. Special applications: Face recognition & Neural style transfer
 - [Sequential Models] week1. Recurrent Neural Networks
 - [Sequential Models] week2. Natural Language Processing & Word Embeddings
 - [Sequential Models] week3. Sequence models & Attention mechanism
 