ref: http://rnduja.github.io/2015/10/05/deep_learning_with_torch_step_3_nn_criterions/ 
doc: https://github.com/torch/nn/blob/master/doc/criterion.md 
Criterion: an abstract class; given an input and a target (the true label), a Criterion can compute the loss and its gradient according to a certain loss function.
Criterion class
important methods:
- `forward(input, target)`: computes the loss function; the `input` is usually the prediction (or log-probability prediction) of the network, and `target` is the true label of the training data.
- `backward(input, target)`: computes the gradient of the loss function with respect to the input.
subclasses of Criterion:  
- classification criterions: cross-entropy, negative log-likelihood, ...
- regression criterions: MSE, Abs, KL divergence, ...
- embedding criterions
- misc criterions
 
Classification criterion examples
ClassNLLCriterion
negative log likelihood criterion
https://github.com/torch/nn/blob/master/doc/criterion.md#nn.ClassNLLCriterion
```lua
crt = nn.ClassNLLCriterion([weights])
```
The optional argument `weights` assigns per-class weights (a 1D tensor), which is useful for an unbalanced dataset.
For the NLL criterion, the input given through a forward(input, target) is expected to be the log-probabilities of each class; the target is expected to be a class index (1 to n).
The probabilities of each class can be computed by applying softmax to the logits; the log-probabilities are just the log of those probabilities. You can use a logsoftmax layer to achieve this directly (e.g. add nn.LogSoftMax as the last layer of a sequential container).
If the input x contains the log-probabilities of each class, the loss is just:
loss = forward(x, target) = -x[target_class] 
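A minimal sketch of this pairing (the layer sizes and data here are made up for illustration):

```lua
require 'nn'

-- hypothetical toy setup: 10 input features, 3 classes
net = nn.Sequential()
net:add(nn.Linear(10, 3))
net:add(nn.LogSoftMax())   -- so the network outputs log-probabilities

crt = nn.ClassNLLCriterion()

x = torch.rand(10)         -- one sample
target = 2                 -- class index in 1..3
logprobs = net:forward(x)
loss = crt:forward(logprobs, target)
print(loss, -logprobs[target])   -- the two values should match
```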
CrossEntropyCriterion
https://github.com/torch/nn/blob/master/doc/criterion.md#nn.CrossEntropyCriterion
This combines nn.LogSoftMax and nn.ClassNLLCriterion in one class, so the input is expected to be raw logits (unnormalized scores).
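As a quick sanity check (values made up), feeding raw scores to CrossEntropyCriterion should give the same loss as LogSoftMax followed by ClassNLLCriterion:

```lua
require 'nn'

logits = torch.Tensor({1.0, -0.5, 2.0})   -- raw scores for 3 classes
target = 3

ce = nn.CrossEntropyCriterion()
print(ce:forward(logits, target))

-- equivalent two-step version
lsm = nn.LogSoftMax()
nll = nn.ClassNLLCriterion()
print(nll:forward(lsm:forward(logits), target))
```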
MarginCriterion
https://github.com/torch/nn/blob/master/doc/criterion.md#margincriterion
computes the hinge loss for a binary classification problem.
The input x is expected to be SVM-style scores, and the target y is expected to be ±1 labels. For n scores the loss is:
loss = forward(x, y) = sum[ max(0, margin - yi*xi) ] / n
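A small illustration (the score is made up):

```lua
require 'nn'

crt = nn.MarginCriterion(1)    -- margin = 1 (also the default)
score = torch.Tensor({0.3})    -- an SVM-style score
y = torch.Tensor({1})          -- label is +1 or -1
print(crt:forward(score, y))   -- max(0, 1 - 1*0.3) = 0.7
```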
Regression criterion examples
MSECriterion
https://github.com/torch/nn/blob/master/doc/criterion.md#nn.MSECriterion
```lua
criterion = nn.MSECriterion()
```
the loss is just the MSE; input and target both have n elements:
loss = forward(x, y) = sum[ (xi - yi)^2 ] / n
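For instance (made-up numbers):

```lua
require 'nn'

crt = nn.MSECriterion()
x = torch.Tensor({1, 2, 3})
y = torch.Tensor({1, 2, 5})
print(crt:forward(x, y))   -- ((1-1)^2 + (2-2)^2 + (3-5)^2) / 3 = 4/3
```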
AbsCriterion
the L1 distance between x and y; with the same notation as above:
loss = forward(x, y) = sum[ |xi - yi| ] / n
DistKLDivCriterion
the KL divergence between class probability distributions. As with ClassNLLCriterion, the input is expected to contain log-probabilities, while the target should be plain probabilities.
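A small sketch (distributions made up; note most nn criterions average over elements by default, controlled by `sizeAverage`):

```lua
require 'nn'

crt = nn.DistKLDivCriterion()
p = torch.Tensor({0.25, 0.75})                 -- target: probabilities
logq = torch.log(torch.Tensor({0.5, 0.5}))     -- input: log-probabilities
-- based on sum_i p[i] * (log(p[i]) - logq[i])
print(crt:forward(logq, p))
```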
A Complete Example
updating function
First, write a function that performs one gradient-descent update for a model, where the input to the model is x and the true label is y:
```lua
function gradientUpdate(model, x, y, criterion, learningRate)
    local pred = model:forward(x)   -- assumes pred is what the criterion expects as input
    local loss = criterion:forward(pred, y)
    model:zeroGradParameters()
    local grad_cri = criterion:backward(pred, y)
    model:backward(x, grad_cri)
    model:updateParameters(learningRate)
end
```
This function implements one update step, given a training sample (x, y):
- the model computes its output with model:forward(x)
- the criterion takes the model's output and computes the loss with criterion:forward(pred, y); note that the output of the model must be what the criterion expects, e.g. pred = log class probabilities for the NLL criterion
- the criterion gives the gradient of the loss function w.r.t. the model output with criterion:backward(pred, y)
- the model computes the gradients of its parameters using the gradient from the criterion with model:backward(x, grad_cri)
- the model does a gradient-descent step on its parameters with model:updateParameters(learningRate)
This is the function that we should pass to an optimizer.
model, criterion and data
- the model is just a linear layer (5 inputs, 1 output), output = Ax + b:

```lua
model = nn.Sequential()
model:add(nn.Linear(5, 1))
```

- the criterion is just the hinge loss:

```lua
criterion = nn.MarginCriterion(1)
```

- for the data, just use two data points:

```lua
x1 = torch.rand(5)
y1 = torch.Tensor({1})
x2 = torch.rand(5)
y2 = torch.Tensor({-1})
```
training
To train the model, we run the update function on both data points for 1000 iterations (epochs):

```lua
for i = 1, 1000 do
    gradientUpdate(model, x1, y1, criterion, 0.01)
    gradientUpdate(model, x2, y2, criterion, 0.01)
end
```
evaluating
to see the prediction, just use model:forward(x):

```lua
print('prediction for x1='..model:forward(x1)[1]..' expected value='..y1[1])
print('prediction for x2='..model:forward(x2)[1]..' expected value='..y2[1])
```
to see the loss, use criterion:forward(model_out, y); after training, the scores should satisfy y*score ≥ 1 (the margin), so both losses should be close to 0:

```lua
print('loss after training for x1 = ' .. criterion:forward(model:forward(x1), y1))
print('loss after training for x2 = ' .. criterion:forward(model:forward(x2), y2))
```