How to support parallel computing with a constant tensor in a self-defined layer?
In a self-defined layer, we sometimes want to keep a constant local tensor. An intuitive way to do this is to add it as a CUDA tensor:
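```python
import torch

class MyLayer(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Plain tensor attribute, allocated on the current CUDA device (GPU 0 by default)
        self.rgb2ycbcr = torch.cuda.FloatTensor([[.299, .587, .114],
                                                 [-0.168735892, -0.331264108, 0.5],
                                                 [.5, -0.418687589, -0.081312411]])
```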
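However, this allocates self.rgb2ycbcr on GPU 0. A plain tensor attribute is not registered with the module, so when the layer is replicated onto other GPUs, every replica still refers to the same tensor on GPU 0. In our case, the program got stuck.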
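To see why the plain attribute is left behind, the CPU-only sketch below shows that such a tensor is invisible to the module's bookkeeping: it appears in neither `parameters()`, `buffers()`, nor `state_dict()`, and `Module.to()` does not convert it. The class name `PlainAttrLayer` is hypothetical, and a CPU tensor stands in for `torch.cuda.FloatTensor` so the sketch runs without a GPU. `replicate()` works from the same registered parameters and buffers, so it has nothing to copy here.

```python
import torch


class PlainAttrLayer(torch.nn.Module):
    """Hypothetical layer storing the matrix as a plain (unregistered) tensor attribute."""

    def __init__(self):
        super().__init__()
        # Plain attribute: not a Parameter and not a buffer.
        self.rgb2ycbcr = torch.tensor([[0.299, 0.587, 0.114],
                                       [-0.168735892, -0.331264108, 0.5],
                                       [0.5, -0.418687589, -0.081312411]])


layer = PlainAttrLayer()
print(len(list(layer.parameters())))  # 0
print(len(list(layer.buffers())))     # 0
print(list(layer.state_dict().keys()))  # []

# Module.to() only converts registered parameters and buffers,
# so the plain attribute keeps its original dtype.
layer = layer.to(torch.float64)
print(layer.rgb2ycbcr.dtype)  # torch.float32
```

The same reasoning applies to devices: moving or replicating the module does not touch an attribute that the module does not know about.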
Now take a look at the official tutorial. There we find that the layer gets replicated onto every device:
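```python
import torch.nn as nn

def data_parallel(module, input, device_ids, output_device=None):
    if not device_ids:
        return module(input)
    if output_device is None:
        output_device = device_ids[0]
    # Copy the module onto each device
    replicas = nn.parallel.replicate(module, device_ids)
    # Split the input batch across the devices
    inputs = nn.parallel.scatter(input, device_ids)
    replicas = replicas[:len(inputs)]
    # Run each replica on its own chunk in parallel
    outputs = nn.parallel.parallel_apply(replicas, inputs)
    # Collect the results on the output device
    return nn.parallel.gather(outputs, output_device)
```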
nn.parallel.replicate(network, devices, detach=False) copies every parameter (and buffer) registered in the network onto each device. Therefore we can register the local constant tensor as a parameter of the layer, with requires_grad set to False:
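```python
import torch

class MyLayer(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Registered as a parameter so replicate() copies it to every GPU;
        # requires_grad=False keeps it constant (nn.Parameter defaults to True)
        self.rgb2ycbcr = torch.nn.Parameter(torch.FloatTensor([[.299, .587, .114],
                                                               [-0.168735892, -0.331264108, 0.5],
                                                               [.5, -0.418687589, -0.081312411]]),
                                            requires_grad=False)
```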
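A minimal, self-contained version of such a layer, including a forward pass, might look like the sketch below. The class name `RGBToYCbCr`, the `forward` method, and the `(N, 3, H, W)` input layout are assumptions for illustration. The key detail is that `requires_grad=False` has to be passed explicitly, because `nn.Parameter` defaults to `requires_grad=True`.

```python
import torch


class RGBToYCbCr(torch.nn.Module):
    """Hypothetical layer applying the 3x3 RGB -> YCbCr matrix (no chroma offset)."""

    def __init__(self):
        super().__init__()
        # Registered as a Parameter so replicate() copies it to each device,
        # frozen with requires_grad=False so it is never trained.
        self.rgb2ycbcr = torch.nn.Parameter(
            torch.tensor([[0.299, 0.587, 0.114],
                          [-0.168735892, -0.331264108, 0.5],
                          [0.5, -0.418687589, -0.081312411]]),
            requires_grad=False,
        )

    def forward(self, x):
        # x: (N, 3, H, W), assumed RGB layout.
        # Contract the channel dimension j with the matrix rows i.
        return torch.einsum("ij,njhw->nihw", self.rgb2ycbcr, x)


layer = RGBToYCbCr()
white = torch.ones(1, 3, 2, 2)
out = layer(white)
print(out[0, :, 0, 0])  # approximately tensor([1., 0., 0.])
```

A white pixel maps to Y close to 1 with Cb and Cr close to 0, because each matrix row sums to 1 or 0.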
Finally, when creating the optimizer, we can filter the parameters down to those that require gradients:
```python
param_require_grad = filter(lambda p: p.requires_grad, module.parameters())
```
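As a quick check of the filtering step, the sketch below builds a small model containing one frozen constant parameter and passes only the trainable ones to an optimizer. The model, the parameter name `const`, and the choice of `torch.optim.SGD` with `lr=0.1` are assumptions for illustration.

```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(3, 3))
# Hypothetical frozen constant, registered as a parameter like rgb2ycbcr above.
model.register_parameter(
    "const", torch.nn.Parameter(torch.eye(3), requires_grad=False)
)

# Hand only the parameters that require gradients to the optimizer.
param_require_grad = filter(lambda p: p.requires_grad, model.parameters())
optimizer = torch.optim.SGD(param_require_grad, lr=0.1)

n_opt = sum(len(group["params"]) for group in optimizer.param_groups)
print(n_opt)                          # 2 (Linear weight and bias)
print(len(list(model.parameters())))  # 3
```

As an alternative to a frozen parameter, `register_buffer` also registers a tensor that `replicate()` copies to each device; buffers never appear in `parameters()`, so no filtering would be needed.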