Why is there inconsistency in backward propagation (PyTorch)?
As long as one uses PyTorch data types and operations, the framework can automatically perform backward propagation. All you need to do is register the parameters you want to train.
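For example, here is a minimal sketch with a single trainable weight; the toy loss and values are made up for illustration:

```python
import torch
import torch.nn as nn

# Minimal sketch, assuming a toy squared-error loss on one trainable weight.
w = nn.Parameter(torch.tensor(2.0))   # registered as a trainable parameter
x = torch.tensor(3.0)                 # plain input tensor, no gradient needed
loss = (w * x - 1.0) ** 2             # loss built only from PyTorch ops
loss.backward()                       # autograd computes d(loss)/dw automatically
print(w.grad)                         # 2 * (w*x - 1) * x = 30.0
```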
In most cases, as long as every operation and data type comes from PyTorch, loss.backward() can be used to compute the gradients. However, there are situations where the gradient vanishes or does not behave as expected. For instance, torch.round() has a default gradient of 0: rounding is piecewise constant, so its true derivative is zero almost everywhere and undefined at the jumps, which is why PyTorch defines its backward pass this way rather than trying to give quantization a proper derivative. Similarly, torch.clamp(), which constrains its input to a range, has zero gradient outside that range.
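A quick way to see this, using small made-up tensors:

```python
import torch

x = torch.tensor([0.4, 1.6, 3.0], requires_grad=True)
torch.round(x).sum().backward()
print(x.grad)   # tensor([0., 0., 0.]) -- round is piecewise constant

y = torch.tensor([-2.0, 0.5, 2.0], requires_grad=True)
torch.clamp(y, -1.0, 1.0).sum().backward()
print(y.grad)   # tensor([0., 1., 0.]) -- zero gradient outside the clamped range
```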
In such cases, we need to override the default backward function, as sketched below.
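One common workaround (not the only one) is a straight-through estimator written as a custom torch.autograd.Function; the class name RoundSTE below is just an illustrative choice:

```python
import torch

class RoundSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        # Forward: ordinary rounding (quantization).
        return torch.round(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Backward: pretend round() was the identity, so gradients keep flowing.
        return grad_output

x = torch.tensor([0.4, 1.6], requires_grad=True)
RoundSTE.apply(x).sum().backward()
print(x.grad)   # tensor([1., 1.]) instead of zeros
```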
We can easily check the gradients by printing them after backward propagation. Note that .grad is only populated for leaf tensors that require gradients, such as nn.Parameter, not for arbitrary intermediate tensors; PyTorch is designed this way to save memory, although you can call retain_grad() on an intermediate tensor if you need its gradient.
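For example, in this toy computation the intermediate tensor only keeps its gradient because we opt in with retain_grad():

```python
import torch
import torch.nn as nn

w = nn.Parameter(torch.tensor(1.0))
hidden = w * 2.0                 # intermediate (non-leaf) tensor
hidden.retain_grad()             # opt in to keeping its gradient
loss = hidden ** 2
loss.backward()

print(w.grad)       # leaf Parameter: gradient is stored (8.0 here)
print(hidden.grad)  # only available because of retain_grad() (4.0 here)
```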
There is another type that appears to offer the same power as torch.nn.Parameter( ): torch.autograd.Variable( ). In practice, Variable has been merged into Tensor since PyTorch 0.4 and is effectively deprecated, whereas nn.Parameter is a Tensor subclass that an nn.Module automatically registers as trainable. As a safe choice, torch.nn.Parameter( ) always works.
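A small sketch of why nn.Parameter( ) is the safe choice inside a module (the Scale module here is hypothetical):

```python
import torch
import torch.nn as nn

class Scale(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.Parameter is auto-registered: it shows up in .parameters()
        self.scale = nn.Parameter(torch.tensor(1.0))
        # a plain tensor attribute is NOT registered as trainable
        self.offset = torch.tensor(0.0)

    def forward(self, x):
        return self.scale * x + self.offset

print(list(Scale().parameters()))  # only `scale` appears
```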