week1

Created Friday 02 February 2018

Why sequence models

examples of seq data (either input or output):

  • speech recognition
  • music generation
  • sentiment classification
  • DNA seq analysis
  • Machine translation
  • video activity recognition
  • named entity recognition (NER)

→ in this course: learn models applicable to these different settings.

Notation

motivating example: NER (Each ...

This week: two special applications of ConvNets.

I-Face Recognition

What is face recognition

Face verification & face recognition

  • verification: input = image + claimed ID → output whether the image matches that ID.
  • recognition: database of K persons, input = image → output = ID of the image among the K persons, or "not recognized" (see the sketch below).

→ the ...
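One common route is to build recognition on top of verification-style similarity between face embeddings. A minimal sketch under that assumption (the embeddings, threshold value, and database layout are all hypothetical here):

    import numpy as np

    def recognize(image_emb, database, threshold=0.7):
        """database: dict mapping person ID -> embedding vector (np.ndarray)."""
        best_id, best_dist = None, np.inf
        for person_id, emb in database.items():
            dist = np.linalg.norm(image_emb - emb)  # small distance = similar faces
            if dist < best_dist:
                best_id, best_dist = person_id, dist
        return best_id if best_dist < threshold else "not recognized"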

Object Localization

Classification VS. Localization VS. Detection

classification with localization
Apart from the softmax output (for classification), add 4 more outputs for the bounding box: b_x, b_y, b_h, b_w.

Defining target label y in localization
label format (a concrete sketch follows):
  • P_c: indicates whether there's any object
  • bounding box: b_x, b_y, b_h, b_w
  • class probabilities ...
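A minimal sketch of this target vector, assuming 3 classes (the numbers below are illustrative, not from the notes):

    import numpy as np

    # y = [P_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3]
    y_object = np.array([1, 0.5, 0.7, 0.3, 0.4, 0, 1, 0])  # object of class 2 present
    y_background = np.array([0, 0, 0, 0, 0, 0, 0, 0])      # P_c = 0: the remaining
                                                           # entries are "don't care"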



I-Case studies

Why look at case studies?

Good way to get intuition about the different components of CNNs: case studies & reading papers.

Outline

  • classic networks:
    • LeNet-5
    • AlexNet
    • VGG
  • ResNet (152-layer NN)
  • Inception

Classic Networks

LeNet-5 (1998)

Goal: recognize hand-written digits.
image → 2 CONV-MEANPOOL layers, all CONVs are valid (no padding) → 2 ...
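A rough LeNet-5-style sketch in Keras (layer sizes follow the 1998 paper; the original used tanh-like activations and average pooling, which modern variants often replace with ReLU and max pooling):

    from tensorflow.keras import layers, models

    model = models.Sequential([
        layers.Input((32, 32, 1)),                 # grayscale digit image
        layers.Conv2D(6, 5, activation='tanh'),    # CONV1, valid padding -> 28x28x6
        layers.AveragePooling2D(2),                # MEANPOOL1 -> 14x14x6
        layers.Conv2D(16, 5, activation='tanh'),   # CONV2 -> 10x10x16
        layers.AveragePooling2D(2),                # MEANPOOL2 -> 5x5x16
        layers.Flatten(),
        layers.Dense(120, activation='tanh'),      # FC1
        layers.Dense(84, activation='tanh'),       # FC2
        layers.Dense(10, activation='softmax'),    # one unit per digit
    ])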


I-Error Analysis

Carrying out error analysis

"Error analysis": manually examine the mistakes → get insight of what's next.

"ceiling on performance"

example:
cat classification: some false positives turn out to be dog pictures. → should you try to make the ML system better on dogs or not?
→ error analysis:

  • get ~100 false positive examples
  • count ...
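To make the "ceiling" concrete (numbers are illustrative, not from the notes):

    # If overall dev error is 10% and 5 of the 100 examined errors are dogs,
    # fixing the dog problem entirely reduces error to at best 9.5%.
    overall_error = 0.10
    dog_fraction = 5 / 100
    ceiling = overall_error * (1 - dog_fraction)
    print(f"best achievable error: {ceiling:.1%}")  # 9.5%

If instead 50 of the 100 errors were dogs, the ceiling would be 5%, and working on dogs would be well worth it.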

I-Introduction to ML Strategy

Why ML Strategy

A lot of ideas for improving ML performance → need a strategy for choosing among them.

→ how to figure out which ones to pursue and which ones to discard?

Orthogonalization

How to tune hyperparams & what to expect.

TV tuning example: each knob does only one thing ...

This week: optimization algorithms to train NNs faster on large datasets.

Mini-batch gradient descent

batch vs. mini-batch GD

Compute J on all m examples with vectorization, i.e. stacking the x(i) and y(i) horizontally:
X = [x(1), ..., x(m)]
Y = [y(1), ..., y(m)]
→ still slow or impossible with large m ...
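A minimal sketch of the mini-batch split (shapes follow the course convention, X of shape (n_x, m) with one example per column; the batch size of 64 is an assumption):

    import numpy as np

    def make_minibatches(X, Y, batch_size=64, seed=0):
        """Shuffle the columns, then cut into consecutive mini-batches."""
        rng = np.random.default_rng(seed)
        perm = rng.permutation(X.shape[1])
        X_shuf, Y_shuf = X[:, perm], Y[:, perm]
        return [(X_shuf[:, t:t + batch_size], Y_shuf[:, t:t + batch_size])
                for t in range(0, X.shape[1], batch_size)]  # last batch may be smaller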


Hyperparameter tuning

Tips for hyperparam-tuning.

Tuning process

Many hyperparams to tune; the lecture marks their importance by color (red > yellow > purple).

How to select the set of values to explore?

  • Do NOT use grid search (a grid of n × n values); this was OK in the pre-DL era.

  • try random values instead (see the sketch below).

reason: difficult to know which hyperparam ...
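A sketch of random sampling (the ranges are assumptions, and train_and_evaluate is a hypothetical helper; sampling the learning rate on a log scale is covered in a later video):

    import numpy as np

    rng = np.random.default_rng(0)
    for _ in range(25):                         # 25 random configurations
        alpha = 10 ** rng.uniform(-4, 0)        # learning rate, log scale
        n_hidden = int(rng.integers(50, 101))   # hidden units, uniform scale is fine
        # train_and_evaluate(alpha, n_hidden)   # hypothetical training/eval helper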

Setting up your Machine Learning Application

Train / Dev / Test sets

Applied ML: a highly iterative process, i.e. an idea-code-experiment loop.

splitting data
splitting data in order to speed up the idea-code-exp loop:
training set / dev (hold-out / cross-validation) set / test set

split ratio (a split sketch follows the list):

  • with 100~10000 examples: 70/30 or 60/20/20
  • with ...
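A minimal 60/20/20 split sketch (sklearn and the synthetic data are assumptions; a manual shuffle-and-slice works just as well):

    import numpy as np
    from sklearn.model_selection import train_test_split

    X = np.random.rand(1000, 20)            # 1000 examples, 20 features (illustrative)
    y = np.random.randint(0, 2, size=1000)
    X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
    X_dev, X_test, y_dev, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)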