How to support parallel computing with a constant tensor in a self-defined layer?
In a self-defined layer, we sometimes want to keep a constant local tensor. An intuitive way to do this is to store it as a plain CUDA tensor attribute.
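For concreteness, here is a minimal sketch of that naive approach. The `ColorConvert` layer name and the matrix values are illustrative assumptions; only the `self.rgb2ycbcr` attribute comes from the original setup.

```python
import torch
import torch.nn as nn

class ColorConvert(nn.Module):
    def __init__(self):
        super().__init__()
        # Plain CUDA tensor attribute: it is NOT registered with the module,
        # so data-parallel replication will not copy it to other devices.
        self.rgb2ycbcr = torch.tensor(
            [[ 0.299,  0.587,  0.114],
             [-0.169, -0.331,  0.500],
             [ 0.500, -0.419, -0.081]]).cuda()

    def forward(self, x):
        # x: (N, 3, H, W); apply the 3x3 color matrix per pixel
        return torch.einsum('ij,njhw->nihw', self.rgb2ycbcr, x)
```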
However, this makes self.rgb2ycbcr automatically allocated on GPU 0 only. Since every replicated copy of the layer needs this tensor, the program gets stuck under data parallelism.
Now take a look at the official tutorial, where we find how the layers get replicated.
The behavior of nn.parallel.replicate(network, devices, detach=False) is to replicate every parameter inside the network onto each device. Therefore we can register the local constant tensor as a parameter with requires_grad set to False.
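A minimal sketch of this fix, reusing the hypothetical `ColorConvert` layer from above:

```python
import torch
import torch.nn as nn

class ColorConvert(nn.Module):
    def __init__(self):
        super().__init__()
        mat = torch.tensor(
            [[ 0.299,  0.587,  0.114],
             [-0.169, -0.331,  0.500],
             [ 0.500, -0.419, -0.081]])
        # Registered as a parameter but excluded from gradient computation,
        # so replication copies it to every device along with the real weights.
        self.rgb2ycbcr = nn.Parameter(mat, requires_grad=False)

    def forward(self, x):
        return torch.einsum('ij,njhw->nihw', self.rgb2ycbcr, x)

# Each replica now gets its own copy of self.rgb2ycbcr on its own device.
model = nn.DataParallel(ColorConvert().cuda())
```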
Finally, when creating the optimizer, we can filter the parameters down to only those that require gradient calculation.
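A short sketch of that filtering step, assuming `model` is the full network and also contains trainable parameters (the optimizer choice and learning rate are placeholders):

```python
# Exclude the frozen constant parameters by filtering on requires_grad.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad),
    lr=1e-3)
```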