utils.init

Functions to calculate values for different initialization strategies

borch.utils.init.calculate_gain(nonlinearity, param=None)

Return the recommended gain value for the given nonlinearity function. The values are as follows:

nonlinearity         gain
Linear / Identity    \(1\)
Conv{1,2,3}D         \(1\)
Sigmoid              \(1\)
Tanh                 \(\frac{5}{3}\)
ReLU                 \(\sqrt{2}\)
Leaky ReLU           \(\sqrt{\frac{2}{1 + \text{negative_slope}^2}}\)

Parameters
  • nonlinearity – the non-linear function (nn.functional name)

  • param – optional parameter for the non-linear function

Examples

>>> from borch.utils.init import calculate_gain
>>> gain = calculate_gain('leaky_relu')
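
As a sanity check against the table above, the Leaky ReLU gain can be recomputed from its formula; a minimal sketch, where the slope 0.2 is an arbitrary illustrative value:

>>> import math
>>> gain = calculate_gain('leaky_relu', param=0.2)
>>> expected = math.sqrt(2 / (1 + 0.2 ** 2))  # table formula with negative_slope=0.2
>>> abs(gain - expected) < 1e-12
True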
borch.utils.init.kaiming_normal_std(shape, slope=None, mode='fan_in', nonlinearity='linear')

Calculates the standard deviation of a Gaussian distribution according to the method described in “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification” - He, K. et al. (2015). The std can be used to construct a \(\mathcal{N}(0, \text{std})\) distribution, where

\[\text{std} = \sqrt{\frac{2}{(1 + a^2) \times \text{fan_in}}}\]

Also known as He initialization.

Parameters
  • shape (tuple) – a tuple of ints giving the shape of the tensor to be initialized.

  • slope – the negative slope of the rectifier used after this layer (defaults to sqrt(5) for leaky_relu)

  • mode – either ‘fan_in’ (default) or ‘fan_out’. Choosing ‘fan_in’ preserves the magnitude of the variance of the weights in the forward pass; choosing ‘fan_out’ preserves the magnitudes in the backward pass.

  • nonlinearity – the non-linear function (nn.functional name), recommended to use only with ‘relu’ or ‘leaky_relu’ (default).

Returns

float, the std to be used in a Gaussian distribution to achieve a Kaiming initialization.

Examples

>>> from borch.utils.init import kaiming_normal_std
>>> kaiming_normal_std((100, 100), nonlinearity="leaky_relu")
0.057735026918962574...
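
The doctest value can be reproduced directly from the formula above; a sketch assuming the default slope sqrt(5) (see Parameters) and that the fan_in of a (100, 100) shape is 100:

>>> import math
>>> a = math.sqrt(5)  # default negative slope
>>> math.sqrt(2 / ((1 + a ** 2) * 100))  # fan_in = 100
0.057735026918962...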
borch.utils.init.xavier_normal_std(shape, gain=1)

Calculates the standard deviation of a Gaussian distribution according to the method described in “Understanding the difficulty of training deep feedforward neural networks” - Glorot, X. & Bengio, Y. (2010). The std can be used to construct a \(\mathcal{N}(0, \text{std})\) distribution, where

\[\text{std} = \text{gain} \times \sqrt{\frac{2}{\text{fan_in} + \text{fan_out}}}\]

Also known as Glorot initialization.

Parameters
  • shape (tuple) – a tuple of ints giving the shape of the tensor to be initialized.

  • gain (float) – an optional scaling factor

Returns

float, the std to be used in a Gaussian distribution to achieve a Xavier initialization.

Examples

>>> from borch.utils.init import xavier_normal_std
>>> xavier_normal_std((1000,))
0.044699...
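
A common next step is to use the returned std to draw the actual weights; a minimal usage sketch, assuming PyTorch is available alongside borch (torch.empty and Tensor.normal_ are plain PyTorch, not part of this module), with an arbitrary (50, 100) weight shape:

>>> import torch
>>> from borch.utils.init import xavier_normal_std
>>> std = xavier_normal_std((50, 100))
>>> weight = torch.empty(50, 100).normal_(0.0, std)  # samples from N(0, std)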