This week: seq2seq.

I-Various sequence to sequence architectures

Basic Models

e.g. Machine translation
encoder network: many-to-one RNN
decoder network: one-to-many RNN
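Here is a minimal NumPy sketch of the encoder/decoder data flow. It assumes plain tanh RNN cells, one-hot word inputs, tiny made-up vocabulary sizes and untrained random weights. All of these are hypothetical, so the output ids mean nothing. The encoder reads the whole source sentence and keeps only its last hidden state. The decoder starts from that state and feeds each predicted word back in. It stops at a hypothetical end-of-sentence id or at a length cap.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: source vocab, target vocab, hidden size.
V_IN, V_OUT, H = 10, 8, 16

# Untrained random weights, only to show how data flows through the two RNNs.
W_xh_enc = rng.normal(scale=0.1, size=(H, V_IN))
W_hh_enc = rng.normal(scale=0.1, size=(H, H))
W_yh_dec = rng.normal(scale=0.1, size=(H, V_OUT))
W_hh_dec = rng.normal(scale=0.1, size=(H, H))
W_hy_dec = rng.normal(scale=0.1, size=(V_OUT, H))

def one_hot(index, size):
    v = np.zeros(size)
    v[index] = 1.0
    return v

def encode(src_ids):
    """Many-to-one RNN: read the whole source sentence, keep the last hidden state."""
    h = np.zeros(H)
    for t in src_ids:
        h = np.tanh(W_xh_enc @ one_hot(t, V_IN) + W_hh_enc @ h)
    return h

def decode(h, start_id=0, end_id=1, max_len=5):
    """One-to-many RNN: start from the encoding, feed each predicted word back in."""
    prev, out = start_id, []
    for _ in range(max_len):
        h = np.tanh(W_yh_dec @ one_hot(prev, V_OUT) + W_hh_dec @ h)
        prev = int(np.argmax(W_hy_dec @ h))  # pick the highest-scoring next word
        out.append(prev)
        if prev == end_id:  # hypothetical end-of-sentence id
            break
    return out

translation = decode(encode([3, 5, 2, 7]))
print(translation)
```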

This architecture also works for image captioning: use a ConvNet as the encoder.

Difference between seq2seq and generating new text with a language model: seq2seq doesn't randomly choose a translation, but …
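Below is a toy contrast that uses a made-up next-word distribution. A language model generating text samples from the distribution, so repeated runs give different text. Translation should not pick words at random. This sketch uses the argmax at each step (greedy choice) only because it is the simplest non-random option; that choice is an assumption here. Choosing the best word one step at a time is not guaranteed to give the most likely sentence overall.

```python
import numpy as np

# Hypothetical next-word distribution over a tiny made-up vocabulary.
vocab = ["Jane", "is", "visiting", "going"]
probs = np.array([0.1, 0.2, 0.45, 0.25])

rng = np.random.default_rng(42)

# Language-model generation: sample, so the output varies from run to run.
sampled = [vocab[rng.choice(len(vocab), p=probs)] for _ in range(5)]

# Translation: don't sample; take the most likely option (greedy, for simplicity).
picked = vocab[int(np.argmax(probs))]
print(sampled, picked)
```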

I-Introduction to Word Embeddings

Word representation
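So far: words represented with one-hot encoding → relationships between words aren't captured, so what the model learns about one word doesn't generalize to related words (any two distinct one-hot vectors have inner product 0).
⇒ want to learn a featurized representation for each word as a high-dimensional vector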
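This small illustration shows why one-hot vectors don't generalize. Any two distinct one-hot vectors have cosine similarity 0, so "apple" is no closer to "orange" than to "man". The featurized vectors are hand-picked, hypothetical 3-dim values. Their features (gender, royal, food) are named only for illustration. Learned embeddings are higher-dimensional, and their dimensions are usually not this easy to interpret.

```python
import numpy as np

vocab = ["man", "woman", "king", "queen", "apple", "orange"]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# One-hot: each word gets its own axis, so any two distinct words are orthogonal.
one_hot = np.eye(len(vocab))
print(cosine(one_hot[4], one_hot[5]))  # apple vs orange: 0.0
print(cosine(one_hot[4], one_hot[0]))  # apple vs man:    0.0

# Featurized: made-up vectors with hypothetical features [gender, royal, food].
features = np.array([
    [-1.00, 0.01, 0.04],  # man
    [ 1.00, 0.02, 0.01],  # woman
    [-0.95, 0.93, 0.02],  # king
    [ 0.97, 0.95, 0.01],  # queen
    [ 0.00, 0.01, 0.95],  # apple
    [ 0.01, 0.00, 0.97],  # orange
])
print(round(cosine(features[4], features[5]), 3))  # apple vs orange: close to 1
print(round(cosine(features[4], features[0]), 3))  # apple vs man: close to 0
```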

→ visualize word embeddings in 2-dim space, e.g. via t-SNE

Using word embeddings

example: NER
transfer learning: using …



Created Friday 02 February 2018

Why sequence models

examples of seq data (either input or output):

  • speech recognition
  • music generation
  • sentiment classification
  • DNA seq analysis
  • Machine translation
  • video activity recognition
  • named entity recognition (NER)

→ in this course: learn models applicable to these different settings.


motivating example: NER …


This week: two special applications of ConvNets.

I-Face Recognition

What is face recognition

Face verification & face recognition

  • verification: input = image and ID → output whether the image is of the person with that ID (1:1 check).
  • recognition: database = K persons, input = image → output = ID of the image among the K persons or "not recognized …
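This sketch shows the difference between the two tasks. It assumes each face has already been mapped to an encoding vector by some network not specified here. Two faces count as the same person when their Euclidean distance is below a hypothetical threshold `THRESHOLD`. Verification is a single 1:1 comparison. Recognition compares the input against all K entries and can return "not recognized". The names, vectors and threshold are all made up.

```python
import numpy as np

THRESHOLD = 0.7  # hypothetical distance threshold

def verify(encoding, claimed_id, database, threshold=THRESHOLD):
    """1:1 check: is this face the claimed person?"""
    return float(np.linalg.norm(encoding - database[claimed_id])) < threshold

def recognize(encoding, database, threshold=THRESHOLD):
    """1:K lookup: which of the K people is this, if any?"""
    best_id, best_dist = None, float("inf")
    for person_id, ref in database.items():
        d = float(np.linalg.norm(encoding - ref))
        if d < best_dist:
            best_id, best_dist = person_id, d
    return best_id if best_dist < threshold else "not recognized"

# Made-up encodings for a K = 2 database.
database = {"alice": np.array([0.1, 0.9, 0.0]), "bob": np.array([0.8, 0.1, 0.3])}
query = np.array([0.15, 0.85, 0.05])
print(verify(query, "alice", database))               # True
print(verify(query, "bob", database))                 # False
print(recognize(query, database))                     # alice
print(recognize(np.array([5.0, 5.0, 5.0]), database)) # not recognized
```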


Object Localization

Classification vs. Localization vs. Detection

classification with localization
Apart from softmax output (for classification), add 4 more outputs of bounding box: b_x, b_y, b_h, b_w.

Defining target label y in localization
label format:
  • P_c: whether there is any object (1) or not (0)
  • bounding box: b_x, b_y, b_h, b_w
  • class proba …
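Here is a minimal sketch of building the target y and a loss. It assumes 3 classes and box coordinates given as fractions of the image size; both are assumptions. For simplicity it uses squared error on every component. When P_c = 0, the other components are "don't care", so only P_c enters the loss.

```python
import numpy as np

N_CLASSES = 3  # assumption: 3 object classes (c_1, c_2, c_3)

def make_target(has_object, box=None, class_index=None):
    """Build y = [P_c, b_x, b_y, b_h, b_w, c_1, ..., c_n]."""
    y = np.zeros(5 + N_CLASSES)
    if has_object:
        y[0] = 1.0                # P_c: there is an object
        y[1:5] = box              # b_x, b_y, b_h, b_w (fractions of image size, assumed)
        y[5 + class_index] = 1.0  # one-hot class
    # With no object, only P_c = 0 is meaningful; the rest are "don't care".
    return y

def squared_error_loss(y_hat, y):
    """Squared error over all components if P_c = 1, over P_c alone if P_c = 0."""
    if y[0] == 1:
        return float(np.sum((y_hat - y) ** 2))
    return float((y_hat[0] - y[0]) ** 2)

y_obj = make_target(True, box=(0.5, 0.7, 0.3, 0.4), class_index=1)
y_background = make_target(False)
y_hat = np.full(5 + N_CLASSES, 0.5)  # a made-up network output
print(y_obj)
print(squared_error_loss(y_hat, y_background))  # 0.25: only P_c is compared
```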


I-Case studies

Why look at case studies?

A good way to build intuition for the different components of CNNs: case studies & reading papers.

  • classic networks:
    • LeNet-5
    • AlexNet
    • VGG
  • ResNet (152-layer NN)
  • Inception

Classic Networks


Goal: recognize hand-written digits.
image → 2 CONV-MEANPOOL layers, all CONV are valid (without padding …
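The layer sizes follow from the output-size formula floor((n + 2p - f)/s) + 1, with p = 0 for valid convolutions. This sketch assumes the 32x32x1 input commonly quoted for LeNet-5. It also assumes 5x5 filters (6, then 16 of them) and 2x2 stride-2 average pooling. It traces shapes only and has no weights.

```python
def conv_out(n, f, p=0, s=1):
    """Output size of a conv/pool layer: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1

# (description, filter size f, stride s, number of filters or None for pooling)
layers = [
    ("CONV 5x5, 6 filters, valid",  5, 1, 6),
    ("AVG POOL 2x2, stride 2",      2, 2, None),
    ("CONV 5x5, 16 filters, valid", 5, 1, 16),
    ("AVG POOL 2x2, stride 2",      2, 2, None),
]

n, c = 32, 1  # assumed 32x32 greyscale input
shapes = []
for name, f, s, filters in layers:
    n = conv_out(n, f, p=0, s=s)          # valid: no padding
    c = filters if filters is not None else c  # pooling keeps the channel count
    shapes.append((n, n, c))
    print(f"{name:28s} -> {n}x{n}x{c}")
```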


I-Error Analysis

Carrying out error analysis

"Error analysis": manually examine the mistakes → get insight into what to do next.

"ceiling on performance": fixing one category of errors can at best remove that category's share of the total error.

cat classifier: found some false positives that are dog pictures → should you try to make the ML system better on dogs or not?
→ error analysis:

  • get ~100 false positive examples
  • count …
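This sketch shows the counting step and the "ceiling" it implies. The category labels for 100 misclassified dev examples are hypothetical, and the 10% overall dev error is assumed. For simplicity each example gets one category here; in practice one example can fall into several. The ceiling is the best-case error if every mistake in that category were fixed: overall error x (1 - category share).

```python
from collections import Counter

# Hypothetical labels for 100 misclassified dev-set images (one category each).
labels = ["dog"] * 5 + ["great cat"] * 40 + ["blurry"] * 45 + ["other"] * 10
overall_error = 0.10  # assumed dev-set error of the cat classifier

counts = Counter(labels)
n = len(labels)
ceilings = {}
for category, k in counts.most_common():
    share = k / n
    # Best case: every error in this category is fixed, the rest remain.
    ceilings[category] = overall_error * (1 - share)
    print(f"{category:10s} {share:4.0%} of errors -> error could drop to {ceilings[category]:.1%} at best")
```

With only 5% of the mistakes being dogs, even a perfect fix for dogs would take the error from 10% to 9.5% at best.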


I-Introduction to ML Strategy

Why ML Strategy

There are many ideas for improving ML performance; ML strategy is about how to choose among them.

→ how to figure out which ones to pursue and which ones to discard?


How to tune hyperparams & what to expect.

TV tuning example: each knob does only one …


This week: optimization algorithms to train NNs faster on large datasets.

Mini-batch gradient descent

batch vs. mini-batch GD

Compute J on all m examples at once with vectorization, i.e. stacking the x(i) and y(i) horizontally (one example per column).
X = [x(1), ..., x(m)]
Y = [y(1), ..., y(m)]
→ still slow or impossible with large …
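This sketch splits the column-stacked X and Y into mini-batches. It assumes a batch size of 64 and shuffles the columns first; shuffling is a common choice but an assumption here. X and Y use the same permutation, so each x(i) stays paired with its y(i). `random_mini_batches` is a hypothetical helper name, and the data are made up.

```python
import numpy as np

def random_mini_batches(X, Y, batch_size=64, seed=0):
    """Shuffle the m columns of X and Y together, then cut them into mini-batches.
    X: (n_x, m), Y: (1, m), examples stacked as columns as in X = [x(1), ..., x(m)]."""
    m = X.shape[1]
    perm = np.random.default_rng(seed).permutation(m)
    X_shuf, Y_shuf = X[:, perm], Y[:, perm]  # same permutation keeps pairs aligned
    batches = []
    for start in range(0, m, batch_size):
        end = start + batch_size  # the last batch may be smaller
        batches.append((X_shuf[:, start:end], Y_shuf[:, start:end]))
    return batches

# 150 made-up examples with n_x = 2 features; label = parity of the example index.
X = np.arange(2 * 150, dtype=float).reshape(2, 150)
Y = (np.arange(150) % 2).reshape(1, 150)
batches = random_mini_batches(X, Y, batch_size=64)
print([xb.shape for xb, _ in batches])  # [(2, 64), (2, 64), (2, 22)]
```

Each (X_batch, Y_batch) pair then feeds one forward/backward pass and one gradient step. One pass over all the batches is one epoch.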