How to support parallel computing with a constant tensor in a self-defined layer?


In a self-defined layer, we sometimes want to keep some constant local tensor. An intuitive way to do this is to add it as a cuda tensor.

import torch

class MyLayer(torch.nn.Module):
    def __init__(self):
        super(MyLayer, self).__init__()
        # RGB -> YCbCr conversion matrix kept as a plain cuda tensor.
        self.rgb2ycbcr = torch.cuda.FloatTensor([[ 0.299,         0.587,         0.114],
                                                 [-0.168735892,  -0.331264108,   0.5],
                                                 [ 0.5,          -0.418687589,  -0.081312411]])

However, this allocates self.rgb2ycbcr on GPU 0 only. Since every replicated copy of the layer needs this tensor during multi-GPU training, the program gets stuck.
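
To see what goes wrong, here is a minimal probe (a sketch, not from the original note, assuming a machine with at least two visible GPUs): nn.parallel.replicate broadcasts registered parameters and buffers to every device, but a plain tensor attribute is only shallow-copied, so each replica keeps pointing at the single copy on cuda:0.

import torch.nn as nn

layer = MyLayer()
replicas = nn.parallel.replicate(layer, [0, 1])
for replica in replicas:
    # Every replica still reports cuda:0 -- the constant was never moved.
    print(replica.rgb2ycbcr.device)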

Now take a look at the official tutorial. We find that the layer gets replicated:

import torch.nn as nn

def data_parallel(module, input, device_ids, output_device=None):
    if not device_ids:
        return module(input)

    if output_device is None:
        output_device = device_ids[0]

    # replicate() copies the module (one replica per device), scatter() splits
    # the input batch across devices, and parallel_apply() runs the replicas.
    replicas = nn.parallel.replicate(module, device_ids)
    inputs = nn.parallel.scatter(input, device_ids)
    replicas = replicas[:len(inputs)]
    outputs = nn.parallel.parallel_apply(replicas, inputs)
    return nn.parallel.gather(outputs, output_device)

Since nn.parallel.replicate copies every registered parameter to each device, we can register the constant tensor as an nn.Parameter (with requires_grad=False) instead:

class MyLayer(torch.nn.Module):
    def __init__(self):
        super(MyLayer, self).__init__()
        # Registering the matrix as a parameter lets replicate() broadcast it
        # to every device, while requires_grad=False keeps it constant.
        self.rgb2ycbcr = torch.nn.Parameter(torch.FloatTensor([[ 0.299,         0.587,         0.114],
                                                               [-0.168735892,  -0.331264108,   0.5],
                                                               [ 0.5,          -0.418687589,  -0.081312411]]),
                                            requires_grad=False)
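
As a quick check (same sketch setup as above, assuming two GPUs), replicating the parameter-based layer now yields a per-device copy of rgb2ycbcr:

layer = MyLayer().cuda()
replicas = nn.parallel.replicate(layer, [0, 1])
for replica in replicas:
    # The broadcast parameter lives on each replica's own device: cuda:0, then cuda:1.
    print(replica.rgb2ycbcr.device)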

Finally, when creating the optimizer, we can filter the parameters down to those that require gradient calculation.

param_require_grad = filter(lambda p: p.requires_grad, module.parameters())
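
For example (a sketch; the surrounding model and learning rate are made up for illustration), the filtered iterator is what gets passed to the optimizer, so the constant matrix is never updated:

model = torch.nn.Sequential(MyLayer(), torch.nn.Conv2d(3, 3, kernel_size=1))
# Only the convolution's weight and bias survive the filter;
# rgb2ycbcr has requires_grad=False and is skipped.
param_require_grad = filter(lambda p: p.requires_grad, model.parameters())
optimizer = torch.optim.SGD(param_require_grad, lr=0.01)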

The behavior of nn.parallel.replicate(network, devices, detach=False) is to replicate each parameter inside the network. Therefore we can register the local constant tensor as a parameter with requires_grad set to False.
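
A quick way to confirm the registration (a sketch): the constant shows up in named_parameters() with requires_grad reported as False.

for name, p in MyLayer().named_parameters():
    print(name, p.requires_grad)   # prints: rgb2ycbcr False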
