Linear models
matrix multiplication: fast on GPUs
numerically stable
cannot gain anything by concatenating linear units → the stack is equivalent to one big matrix...
⇒ add non-linear units in between
rectified linear units (ReLU): max(0, x)
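
A minimal sketch of the point above, in plain Python so it runs without any library. The matrices X, W1, W2 and the helpers matmul and relu are made up for illustration; they are not from the notes. It shows that two linear layers give the same output as the single matrix W1·W2, and that putting a ReLU in between breaks that equivalence.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def relu(M):
    """Apply max(0, v) to every entry of a matrix."""
    return [[max(0.0, v) for v in row] for row in M]

# Illustrative values only (assumed, not from the notes).
X = [[1.0, -2.0], [3.0, 1.0]]    # two inputs, two features each
W1 = [[1.0, -1.0], [0.0, 2.0]]   # first linear unit
W2 = [[1.0], [1.0]]              # second linear unit

# Two linear units in a row ...
two_layers = matmul(matmul(X, W1), W2)
# ... equal one linear unit whose matrix is the product W1 W2.
one_layer = matmul(X, matmul(W1, W2))

# With a ReLU in between, the result is no longer a single matrix product.
with_relu = matmul(relu(matmul(X, W1)), W2)

print(two_layers)  # [[-4.0], [2.0]]
print(one_layer)   # [[-4.0], [2.0]]
print(with_relu)   # [[1.0], [3.0]]
```

In practice the products would be done by an array library on the GPU; the pure-Python matmul here is only to keep the sketch self-contained.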
chain rule: computationally efficient
backpropagation
easy to compute the gradient as long as the function Y(X) is made of simple blocks ...
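
As a hedged illustration of the chain rule at work, here is a small sketch for a scalar function built from three simple blocks: linear, ReLU, linear. The function y = w2 * relu(w1 * x + b1), the names forward and backward, and the numeric values are assumptions chosen for the example, not something the notes specify. Each block only needs its own local derivative; the backward pass multiplies them from the output back to the inputs.

```python
def forward(x, w1, b1, w2):
    """Forward pass through three simple blocks; keep intermediates for reuse."""
    z = w1 * x + b1              # linear block
    h = z if z > 0.0 else 0.0    # ReLU block
    y = w2 * h                   # linear block
    return z, h, y

def backward(x, w1, b1, w2):
    """Gradients of y with respect to x, w1, b1, w2 via the chain rule."""
    z, h, y = forward(x, w1, b1, w2)
    dy_dh = w2                          # local derivative of the last block
    dh_dz = 1.0 if z > 0.0 else 0.0     # local derivative of ReLU
    dy_dz = dy_dh * dh_dz               # chain rule: multiply along the path
    return {
        "x": dy_dz * w1,
        "w1": dy_dz * x,
        "b1": dy_dz,
        "w2": h,
    }

# Illustrative values only (assumed).
params = dict(x=2.0, w1=1.5, b1=-1.0, w2=3.0)
grads = backward(**params)
print(grads)  # {'x': 4.5, 'w1': 6.0, 'b1': 3.0, 'w2': 2.0}

# Sanity check against a central finite difference in x.
eps = 1e-6
y_plus = forward(params["x"] + eps, params["w1"], params["b1"], params["w2"])[2]
y_minus = forward(params["x"] - eps, params["w1"], params["b1"], params["w2"])[2]
numeric_dx = (y_plus - y_minus) / (2 * eps)
```

The backward pass reuses the intermediates z and h from the forward pass, which is where the computational efficiency comes from: one forward and one backward sweep give all the gradients at once.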