utils.init

Functions to calculate values for different initialization strategies

borch.utils.init.calculate_gain(nonlinearity, param=None)

Return the recommended gain value for the given nonlinearity function. The values are as follows:

nonlinearity         gain
Linear / Identity    \(1\)
Conv{1,2,3}D         \(1\)
Sigmoid              \(1\)
Tanh                 \(\frac{5}{3}\)
ReLU                 \(\sqrt{2}\)
Leaky ReLU           \(\sqrt{\frac{2}{1 + \text{negative_slope}^2}}\)

Parameters
  • nonlinearity – the non-linear function (nn.functional name)

  • param – optional parameter for the non-linear function

Examples

>>> from borch.utils.init import calculate_gain
>>> gain = calculate_gain('leaky_relu')
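
As a sanity check against the table above, the Leaky ReLU gain can be recomputed from its formula; a minimal sketch, where the slope 0.2 is an arbitrary illustrative value:

>>> import math
>>> gain = calculate_gain('leaky_relu', param=0.2)
>>> expected = math.sqrt(2 / (1 + 0.2 ** 2))  # table formula with negative_slope=0.2
>>> abs(gain - expected) < 1e-12
True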
borch.utils.init.kaiming_normal_std(shape, slope=None, mode='fan_in', nonlinearity='linear')

Calculates the standard deviation of a Gaussian distribution according to the method described in “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification” - He, K. et al. (2015). The std can be used to construct a \(\mathcal{N}(0, \text{std})\) distribution, where

\[\text{std} = \sqrt{\frac{2}{(1 + a^2) \times \text{fan_in}}}\]

Also known as He initialization.

Parameters
  • shape (tuple) – a tuple of ints giving the shape of the tensor to be initialized.

  • slope – the negative slope of the rectifier used after this layer (defaults to sqrt(5) for leaky_relu)

  • mode – either ‘fan_in’ (default) or ‘fan_out’. Choosing ‘fan_in’ preserves the magnitude of the variance of the weights in the forward pass; choosing ‘fan_out’ preserves the magnitudes in the backward pass.

  • nonlinearity – the non-linear function (nn.functional name), recommended to use only with ‘relu’ or ‘leaky_relu’ (default).

Returns

float, the std to be used in a Gaussian distribution to achieve a Kaiming initialization.

Examples

>>> from borch.utils.init import kaiming_normal_std
>>> kaiming_normal_std((100, 100), nonlinearity="leaky_relu")
0.057735026918962574...
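
The doctest value can be reproduced directly from the formula above; a sketch assuming the default slope sqrt(5) (see Parameters) and that the fan_in of a (100, 100) shape is 100:

>>> import math
>>> a = math.sqrt(5)  # default negative slope
>>> math.sqrt(2 / ((1 + a ** 2) * 100))  # fan_in = 100
0.057735026918962...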
borch.utils.init.xavier_normal_std(shape, gain=1)

Calculates the standard deviation of a Gaussian distribution according to the method described in “Understanding the difficulty of training deep feedforward neural networks” - Glorot, X. & Bengio, Y. (2010). The std can be used to construct a \(\mathcal{N}(0, \text{std})\) distribution, where

\[\text{std} = \text{gain} \times \sqrt{\frac{2}{\text{fan_in} + \text{fan_out}}}\]

Also known as Glorot initialization.

Parameters
  • shape (tuple) – a tuple of ints giving the shape of the tensor to be initialized.

  • gain (float) – an optional scaling factor

Returns

float, the std to be used in a Gaussian distribution to achieve a Xavier initialization.

Examples

>>> from borch.utils.init import xavier_normal_std
>>> xavier_normal_std((1000,))
0.044699...
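
A common next step is to use the returned std to draw the actual weights; a minimal usage sketch, assuming PyTorch is available alongside borch (torch.empty and Tensor.normal_ are plain PyTorch, not part of this module), with an arbitrary (50, 100) weight shape:

>>> import torch
>>> from borch.utils.init import xavier_normal_std
>>> std = xavier_normal_std((50, 100))
>>> weight = torch.empty(50, 100).normal_(0.0, std)  # samples from N(0, std)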