utils.init
Functions to calculate values for different initialization strategies.

borch.utils.init.calculate_gain(nonlinearity, param=None)

Return the recommended gain value for the given nonlinearity function. The values are as follows:
nonlinearity          gain
-------------------   --------------------------------
Linear / Identity     1
Conv{1,2,3}D          1
Sigmoid               1
Tanh                  5/3
ReLU                  sqrt(2)
Leaky ReLU            sqrt(2 / (1 + negative_slope^2))
Parameters
- nonlinearity – the non-linear function (nn.functional name)
- param – optional parameter for the non-linear function
Examples
>>> gain = calculate_gain('leaky_relu')
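A quick numerical cross-check of the Leaky ReLU row in the table above. This is a sketch, assuming (as in the common torch.nn.init convention) that param is interpreted as the negative slope of the leaky ReLU; the slope value 0.01 is purely illustrative:

>>> import math
>>> slope = 0.01                                # assumed negative slope, for illustration only
>>> gain = calculate_gain('leaky_relu', slope)
>>> expected = math.sqrt(2 / (1 + slope ** 2))  # table formula: sqrt(2 / (1 + negative_slope^2))
>>> # gain and expected should agree under this assumption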
borch.utils.init.kaiming_normal_std(shape, slope=None, mode='fan_in', nonlinearity='linear')

Calculates the standard deviation of a Gaussian distribution according to the method described in “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification” - He, K. et al. (2015). The resulting tensor will have values sampled from \(\mathcal{N}(0, \text{std})\), where

\[\text{std} = \sqrt{\frac{2}{(1 + a^2) \times \text{fan_in}}}\]

Also known as He initialization.
Parameters
- shape (tuple) – a tuple of ints giving the shape of the tensor where the kaiming init method will be used.
- slope – the negative slope of the rectifier used after this layer (sqrt(5) for leaky_relu by default)
- mode – either ‘fan_in’ (default) or ‘fan_out’. Choosing ‘fan_in’ preserves the magnitude of the variance of the weights in the forward pass. Choosing ‘fan_out’ preserves the magnitudes in the backwards pass.
- nonlinearity – the non-linear function (nn.functional name), recommended to use only with ‘relu’ or ‘leaky_relu’ (default).

Returns
- float, the std to be used in a Gaussian distribution to achieve a kaiming (He) initialization.
Examples
>>> kaiming_normal_std((100, 100), nonlinearity="leaky_relu")
0.057735026918962574...
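A minimal sketch of one way to use the returned std to draw an actual weight tensor, assuming borch is used alongside PyTorch (the function itself only returns a float):

>>> import torch
>>> shape = (100, 100)
>>> std = kaiming_normal_std(shape, nonlinearity="leaky_relu")
>>> weight = torch.randn(shape) * std  # weight now has entries drawn from N(0, std)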
borch.utils.init.xavier_normal_std(shape, gain=1)

Calculates the standard deviation of a Gaussian distribution according to the method described in “Understanding the difficulty of training deep feedforward neural networks” - Glorot, X. & Bengio, Y. (2010). The std can be used to construct a \(\mathcal{N}(0, \text{std})\) distribution, where

\[\text{std} = \text{gain} \times \sqrt{\frac{2}{\text{fan_in} + \text{fan_out}}}\]

Also known as Glorot initialization.
Parameters
- shape (tuple) – a tuple of ints giving the shape of the tensor where the xavier init method will be used.
- gain (float) – an optional scaling factor

Returns
- float, the std to be used in a Gaussian distribution to achieve a xavier initialization.
Examples
>>> xavier_normal_std((1000,))
0.044699...
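As a rough cross-check of the formula, the sketch below recomputes the std by hand for a hypothetical 2-D weight shape; the mapping of the shape's dimensions to fan_in and fan_out is assumed here and should be verified against borch's own fan calculation:

>>> import math
>>> fan_out, fan_in = 50, 200                           # hypothetical weight shape (fan_out, fan_in)
>>> std_manual = 1 * math.sqrt(2 / (fan_in + fan_out))  # gain * sqrt(2 / (fan_in + fan_out))
>>> std_fn = xavier_normal_std((fan_out, fan_in))
>>> # std_fn should match std_manual if the assumed fan convention holds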