problems with text:

  1. rare words are often important, e.g. retinopathy
  2. ambiguity: e.g. "cat" and "kitty" mean almost the same thing but are different words

→ supervised learning would need a lot of labeled data ⇒ not realistic
⇒ unsupervised learning

similar words appear in similar contexts.
embedding: map words to small vectors

measure the closeness of two vectors by cosine distance: 1 - cos(u, v), where cos(u, v) = (u · v) / (‖u‖ ‖v‖)
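For example, a minimal numpy sketch (the two 4-d vectors are made-up stand-ins for learned embeddings):

```python
import numpy as np

def cosine_similarity(u, v):
    # cos(u, v) = (u . v) / (|u| |v|); near 1 for similar directions
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# hypothetical embeddings; real ones come out of training
cat   = np.array([0.9, 0.1, 0.3, 0.0])
kitty = np.array([0.8, 0.2, 0.4, 0.1])
print(1 - cosine_similarity(cat, kitty))  # cosine distance, small for similar words
```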

word2vec

initialization: each word starts as a random vector, which is then trained to predict the words around it ...
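A rough sketch of the skip-gram flavor of word2vec in numpy, just to make the idea concrete — the toy corpus, embedding size, learning rate, and plain-softmax training are all my own simplifications (real word2vec uses tricks like negative sampling):

```python
import numpy as np

corpus = "the cat sat on the mat the kitty sat on the mat".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 8                  # vocab size, embedding dimension

rng = np.random.default_rng(0)
W_in = rng.normal(0, 0.1, (V, D))     # initial: random vectors (the embeddings)
W_out = rng.normal(0, 0.1, (D, V))    # projection back to vocabulary scores

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

lr = 0.1
for epoch in range(200):
    for t, center in enumerate(corpus):
        for c in (t - 1, t + 1):      # context window: one word on each side
            if 0 <= c < len(corpus):
                h = W_in[idx[center]]             # embedding of the center word
                p = softmax(h @ W_out)            # predicted context distribution
                g = p.copy()
                g[idx[corpus[c]]] -= 1.0          # grad of cross entropy wrt scores
                grad_h = W_out @ g
                W_out -= lr * np.outer(h, g)
                W_in[idx[center]] -= lr * grad_h  # similar words drift together
```

After training, the rows of W_in for "cat" and "kitty" should end up with high cosine similarity, since the two words appear in identical contexts.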

statistical invariance → weight sharing
e.g. image colors, translation invariance...

convnet

is a NN that shares its weights across space.

convolution: slide a small patch of the NN over the image to produce a new "image" (a feature map)

a convnet forms a pyramid: each "stack of pancakes" (feature map) gets larger depth and smaller spatial area.
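A naive numpy convolution to make the pyramid concrete — the image size, filter counts, and stride are arbitrary choices for illustration:

```python
import numpy as np

def conv2d(image, filters, stride=2):
    # image: (H, W, C_in); filters: (k, k, C_in, C_out)
    # the same small filter weights are reused at every position: weight sharing
    k, _, _, c_out = filters.shape
    H, W, _ = image.shape
    out_h, out_w = (H - k) // stride + 1, (W - k) // stride + 1
    out = np.zeros((out_h, out_w, c_out))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride+k, j*stride:j*stride+k, :]
            out[i, j] = np.tensordot(patch, filters, axes=3)
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 32, 3))                 # RGB image: wide and shallow
x = conv2d(x, rng.normal(size=(3, 3, 3, 16)))    # -> (15, 15, 16)
x = conv2d(x, rng.normal(size=(3, 3, 16, 64)))   # -> (7, 7, 64): smaller area, larger depth
print(x.shape)
```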

convolutional lingo ...

Linear models

matrix multiplication: fast with GPU
numerically stable
no point stacking linear units → the composition collapses to one big matrix ...

⇒ add non-linear units in between

rectified linear units (ReLU): relu(x) = max(0, x), the simplest non-linearity
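A quick numpy check of both claims — the matrices here are hand-picked toy values:

```python
import numpy as np

W1 = np.array([[1.0, -1.0],
               [-1.0, 1.0]])
W2 = np.array([[1.0, 1.0]])
x = np.array([2.0, 0.5])

# two stacked linear layers collapse into one big matrix W2 @ W1
print(np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x))   # True

relu = lambda z: np.maximum(0, z)                  # relu(z) = max(0, z)
# with a ReLU in between, the composition is no longer a single matrix
print(W2 @ relu(W1 @ x), (W2 @ W1) @ x)            # [1.5] vs [0.]
```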

chain rule: computationally efficient, since each block only needs its local derivative

backpropagation

the gradient is easy to compute as long as the function Y(X) is made of simple blocks ...
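A hand-worked example, with y(x) = exp(x²) as an arbitrary choice of two simple blocks:

```python
import numpy as np

# forward pass through simple blocks: y(x) = exp(square(x))
x = 1.5
a = x * x              # block 1: square
y = np.exp(a)          # block 2: exp

# backward pass: chain rule multiplies the blocks' local derivatives
dy_da = np.exp(a)      # derivative of exp at a
da_dx = 2 * x          # derivative of square at x
dy_dx = dy_da * da_dx

# sanity check against a numerical gradient
eps = 1e-6
numeric = (np.exp((x + eps)**2) - np.exp((x - eps)**2)) / (2 * eps)
print(dy_dx, numeric)  # the two should agree closely
```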

These are my notes for the Deep Learning course on Udacity; they are quite rough, and the course itself is only introductory... https://www.udacity.com/course/deep-learning--ud730

Softmax function

scores y_i ⇒ probabilities p_i = exp(y_i) / Σ_j exp(y_j)

property: scaling all scores down ⇒ probabilities approach uniform ⇒ less certain about the result (scaling up ⇒ probabilities approach 0/1 ⇒ more certain)
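A small sketch; the scores [3.0, 1.0, 0.2] are just example values:

```python
import numpy as np

def softmax(y):
    e = np.exp(y - y.max())    # subtract max for numerical stability
    return e / e.sum()

scores = np.array([3.0, 1.0, 0.2])
print(softmax(scores))         # ~[0.84, 0.11, 0.05]
print(softmax(scores / 10))    # close to uniform: less certain
print(softmax(scores * 10))    # close to one-hot: very certain
```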

One-hot encoding

the label vector L has a 1.0 at the correct class and 0.0 everywhere else
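For example (the class count and label index are arbitrary):

```python
import numpy as np

num_classes = 4
label = 2
onehot = np.eye(num_classes)[label]   # 1.0 at the correct class, 0.0 elsewhere
print(onehot)                         # [0. 0. 1. 0.]
```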

Cross entropy

measures how well the probability vector S corresponds to the label vector L ⇒ cross entropy D(S, L) = -Σ_i L_i log(S_i) (D >= 0, the smaller the better ...)
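A sketch matching the definition above; S and L are made-up values:

```python
import numpy as np

def cross_entropy(S, L):
    # D(S, L) = -sum_i L_i * log(S_i); only the correct class term survives
    return -np.sum(L * np.log(S))

S = np.array([0.7, 0.2, 0.1])     # softmax output
L = np.array([1.0, 0.0, 0.0])     # one-hot label
print(cross_entropy(S, L))        # -log(0.7) ≈ 0.357, small when S matches L
```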