This week: two special application of ConvNet.

I-Face Recognition

What is face recognition

Face verification & face recognition

  • verification: input = image and ID → output whether the image and ID are the same.
  • recognition: database = K persons, input = image → output = ID of the image among the K person or "not recognized …


Object Localization

Classification VS. Localization VS. Detection

classification with localization
Apart from softmax output (for classification), add 4 more outputs of bounding box: b_x, b_y, b_h, b_w.

Defining target label y in localization
label format:
P_c indicating if there's any object
bounding box: b_x, b_y, b_h, b_w
class proba …


I-Case studies

Why look at case studies?

Good way to get intuition of different component of CNN: case study & reading paper.

  • classic networks:
    • LeNet-5
    • AlexNet
    • VGG
  • ResNet (152-layer NN)
  • Inception

Classic Networks


Goal: recognize hand-written digits.
image → 2 CONV-MEANPOOL layers, all CONV are valid (without padding …


I-Error Analysis

Carrying out error analysis

"Error analysis": manually examine the mistakes → get insight of what's next.

"ceiling on performance"

cat classification, found some false-positives of dog pictures. → should you try to make ML system better on dog or not ?
→ error analysis:

  • get ~100 false positive examples
  • count …


I-Introduction to ML Strategy

Why ML Strategy

A lot of ideas of improving ML performance: strategy on how to choose.

→ how to figure out which ones to pursue and which ones to discard ?


How to tune hyperparams & what to expect.

TV tuning example: each knob does only one …


This week: optimization algos to faster train NN, on large dataset.

Mini-batch gradient descent

batch v.s. mini-batch GD

Compute J on m examples: vectorization, i.e. stacking x(i) y(i) horizontally.
X = [x(1), ..., x(m)]
Y = [y(1), ..., y(m)]
→ still slow or impossible with large …


Hyperparameter parameters

Tips for hyperparam-tuning.

Tuning process

Many hyperparams to tune, mark importance by colors (red > yellow > purple):

How to select set of values to explore ?

  • Do NOT use grid search (grid of n * n)

— this was OK in pre-DL era.

  • try random values.

reason: difficule to know which …


Setting up your Maching Learning Application

Train / Dev / Test sets

Applied ML: highly iterative process. idea-code-exp loop

splitting data
splitting data in order to speed up the idea-code-exp loop:
*training set / dev(hold-out/cross-validataion) set / test set *

split ratio:

  • with 100~10000 examples: 70/30 or 60/20/20 …


Deep L-layer neural network

Layer counting:

  • input layer is not counted as a layer, "layer 0"
  • last layer (layer L, output layer) is counted.

notation: layer 0 = input layer L = number of layers n^[l] = size of layer l a^[l] = activation of layer l = g[l]( z[l …