Infer

In Bayesian inference, one commonly computes an approximate posterior distribution over the parameters of the model. There are two common approaches: Markov chain Monte Carlo sampling and variational inference.
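
As a brief reminder, in generic notation (with theta denoting the model parameters and D the observed data, symbols not used elsewhere on this page), the posterior follows from Bayes' rule:

\[p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{\int p(D \mid \theta)\, p(\theta)\, d\theta}\]

The integral in the denominator is usually intractable, which is what motivates the approximate methods below.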

Infer is a collection of inference algorithms and loss functions used to train models. It supports variational inference through loss functions, and Monte Carlo estimation using classical algorithms like HMC and NUTS.

It also extends to more exotic training schemes like weight smoothing of networks (mean teacher).

The training loop for standard VI looks like:
>>> import torch
>>> import torch.distributions as dist
>>> from borch.infer import vi_loss
>>> # variational parameters: a location and a scale for each of the 5 sites
>>> params = [torch.ones(1, requires_grad=True) for _ in range(10)]
>>> p_dists = [dist.Cauchy(0, 1) for _ in range(5)]
>>> observed = [False for _ in range(5)]
>>> opt = torch.optim.Adam(params)
>>> for _ in range(2):
...     opt.zero_grad()
...     q_dists = [dist.Normal(params[2 * ii], params[2 * ii + 1]) for ii in range(5)]
...     values = [q_dist.rsample() for q_dist in q_dists]
...     loss = vi_loss(p_dists, q_dists, values, observed, 1)
...     loss.backward()
...     opt.step()
The No-U-Turn Sampler (NUTS) algorithm can be used like:
>>> import torch
>>> from borch.infer import dual_averaging, find_reasonable_epsilon, nuts_step
>>> parameters = [torch.ones(1, requires_grad=True) for ii in range(3)]
>>> def closure():
...     # the negative (unnormalized) log joint of a simple Gaussian-like target
...     return -sum([-(pp).pow(2).sum() for pp in parameters])
>>> param_values = []
>>> initial_epsilon, epsilon_bar, h_bar = find_reasonable_epsilon(
...                                         parameters, closure)
>>> epsilon = initial_epsilon
>>> for i in range(1, 10):
...     accept_prob = nuts_step(epsilon, parameters, closure)
...     if i < 5:
...         # adapt the step size during the first few (warm-up) iterations
...         epsilon, epsilon_bar, h_bar = dual_averaging(accept_prob, i,
...                                                      initial_epsilon,
...                                                      epsilon_bar, h_bar)
...     else:
...         epsilon = epsilon_bar
...     param_values.append([par.detach().clone() for par in parameters])
borch.infer.analytical_kl_divergence_loss(p_dist, q_dist, value=None, backup_loss_fn=None)

Calculates the analytical KL divergence if it is available, and falls back to backup_loss_fn. If no backup_loss_fn is provided, elbo_loss will be used.

\[\mathbb{KL}(q(\textbf{x}|\textbf{z})|| p(\textbf{x}))\]
Parameters
  • p_dist – torch.distributions.Distribution

  • q_dist – torch.distributions.Distribution

  • value – torch.tensor

  • backup_loss_fn – callable, with the args p_dist, q_dist, value

Returns

torch.tensor

Example

>>> import torch
>>> import torch.distributions as dist
>>> analytical_kl_divergence_loss(
...     dist.Normal(0,1),
...     dist.Normal(1,1),
...     torch.ones(1))
tensor(0.5000)
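
When no analytical KL divergence is available for the pair of distributions, the call falls back to backup_loss_fn (or elbo_loss when none is given). A minimal sketch, assuming the Cauchy/Normal pair below has no analytical KL divergence registered:

>>> loss = analytical_kl_divergence_loss(
...     dist.Cauchy(0, 1),
...     dist.Normal(1, 1),
...     torch.ones(1),
...     backup_loss_fn=elbo_loss)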
borch.infer.dual_averaging(accept_prob, m, initial_epsilon, epsilon_bar, h_bar, delta=0.8, gamma=0.05, t0=10, kappa=0.75)

Dual averaging is a scheme for solving optimization problems; this implementation is adapted to be used with nuts_step and hmc_step.

It implements the dual averaging updates outlined in algorithm 6 in [Hoffman2011].

Parameters
  • accept_prob (float) – the acceptance probability of the MCMC step

  • m (int) – what dual averaging step this is

  • initial_epsilon (float) – the epsilon the scheme is initialized with.

  • epsilon_bar (float) – the epsilon_bar from the previous step, usually initialized to 1.

  • h_bar (float) – parameter in the scheme used to do the updating; a good initial value is 0.

  • delta (float, default .8) – parameter that specifies the desired accept_prob

  • gamma (float, default 0.05) –

  • t0 (float, default 10) – parameter that stabilizes the initial steps of the scheme.

  • kappa (float, default .75) – parameter that controls the weights of steps of the scheme.

Returns

a tuple with the new epsilon, epsilon_bar, h_bar
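
Example

A minimal sketch of a single adaptation step, assuming the acceptance probability has already been obtained from hmc_step or nuts_step (the numbers here are only illustrative):

>>> epsilon, epsilon_bar, h_bar = dual_averaging(
...     accept_prob=0.9, m=1, initial_epsilon=1.0, epsilon_bar=1.0, h_bar=0.0)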

borch.infer.elbo_loss(p_dist, q_dist, value)

Calculates the elbo:

\[-(\log p(\textbf{x}) - \log q(\textbf{x}|\textbf{z}))\]

where p is p_dist, q is q_dist and x is value.

Parameters
  • p_dist – torch.distributions.Distribution, the prior

  • q_dist – torch.distributions.Distribution, the approximating distribution

  • value – torch.tensor, the value at which the elbo is evaluated.

Returns

torch.tensor, the elbo

Example

>>> import torch
>>> import torch.distributions as dist
>>> elbo_loss(dist.Cauchy(0,1), dist.Normal(1,1), torch.ones(1))
tensor(0.9189)
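
The same value can be reproduced directly from the formula above using the log_prob methods of the two distributions; p, q and x below are just local names for this sketch:

>>> p, q, x = dist.Cauchy(0, 1), dist.Normal(1, 1), torch.ones(1)
>>> -(p.log_prob(x) - q.log_prob(x))
tensor([0.9189])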
borch.infer.elbo_path_derivative_loss(p_dist, q_dist, value)

Calculates the elbo with path derivative:

\[-(\log p(\textbf{x}) - \log q(\textbf{x.detach()}|\textbf{z}))\]

where p is p_dist, q is q_dist and x is value.

Parameters
  • p_dist – torch.distributions.Distribution, the prior

  • q_dist – torch.distributions.Distribution, the approximating distribution

  • value – torch.tensor, the value at which the elbo is evaluated.

Returns

torch.tensor, the elbo

Example

>>> import torch
>>> import borch.distributions as dist
>>> elbo_path_derivative_loss(
...     dist.Cauchy(0,1),
...     dist.Normal(1,1),
...     torch.ones(1))
tensor(0.9189)
borch.infer.find_reasonable_epsilon(parameters, closure)

Implements a heuristic for choosing an initial value of epsilon for HMC and NUTS.

It implements algorithm 4 in [Hoffman2011].

Parameters
  • parameters – list with torch.tensors; the parameters to use in the HMC step

  • closure – Python callable that takes no arguments; usually the calculation of the log joint of the model

Returns

a tuple with epsilon: the suggested epsilon, epsilon_bar: the initial value of epsilon_bar for dual_averaging, and h_bar: the initial value of h_bar for dual_averaging

Return type

(float, float, float)

Examples

>>> import torch
>>> torch.manual_seed(7)
<...>
>>> parameters = [torch.ones(1, requires_grad=True) for ii in range(3)]
>>> def closure():
...     return sum([-(pp).pow(2).sum() for pp in parameters])
>>> epsilon, epsilon_bar, h_bar = find_reasonable_epsilon(parameters, closure)
>>> epsilon
0.5
borch.infer.hard_negative_mining(losses, labels, neg_pos_ratio)

Suppresses the presence of a large number of negative predictions.

For each example, it keeps all the positive predictions and cuts the number of negative predictions so that the ratio between negative and positive examples is no more than the given ratio.

Parameters
  • losses (N, M) – the loss computed for each prediction.

  • labels (N, M) – The class labels as one-hot encodings.

  • neg_pos_ratio – The maximum ratio between negative and positive examples.

Returns

Mask for applying to the loss.

Example

>>> import torch
>>> from borch.utils.torch_utils import one_hot
>>>
>>> losses = torch.rand(10, 5)
>>> labels = one_hot(torch.randint(0, 5, (10,)), n_classes=5)
>>> mask = hard_negative_mining(losses, labels, 3)
>>> loss = losses[mask].sum()
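
Given the description above, the mask keeps every positive prediction and at most neg_pos_ratio times as many negatives; a small sanity check under that assumption:

>>> bool(mask.sum() <= labels.sum() * (1 + 3))
True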
borch.infer.hmc_step(epsilon, L, parameters, closure)

Performs one Hamiltonian Monte Carlo step; the .data of the parameters will be updated with the result of the step.

Notes

For some random number generation numpy is used; this means that in order to make seeded runs, both torch and numpy need to be seeded.

hmc_step needs to be given the negative log likelihood and not the log likelihood as outlined in the paper. This is done to be consistent with the optimizer interface.

The HMC implementation has not gone through rigorous use and is considered experimental.

Parameters
  • epsilon (float) – the step size

  • L (int) – the number of leapfrog steps

  • parameters (iterable) – iterable with torch.tensors; the parameters to use in the HMC step

  • closure (callable) – A closure that reevaluates the model and returns the loss.

Returns

float, The acceptance probability of the step.

Example

>>> import torch
>>> import numpy
>>> torch.manual_seed(7)
<...>
>>> numpy.random.seed(7)
>>> parameters = [torch.ones(1, requires_grad=True) for ii in range(3)]
>>> def closure():
...     return -sum([-(pp).pow(2).sum() for pp in parameters])
>>> hmc_step(.1, 10, parameters, closure)
1.0
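
A short sketch of how the step is typically used in a loop, collecting one copy of the parameters per iteration (the step size and number of leapfrog steps here are only illustrative):

>>> samples = []
>>> for _ in range(5):
...     accept_prob = hmc_step(.1, 10, parameters, closure)
...     samples.append([pp.detach().clone() for pp in parameters])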
borch.infer.negative_log_prob(p_dists, values, **kwargs)

Calculates the negative log_prob of the provided distributions at the supplied values.

Parameters
  • p_dists – list with torch.distributions.Distribution

  • values – list with torch.tensors where the log_prob is evaluated.

Returns

torch.tensor

Examples

>>> import borch.distributions as dist
>>> import torch
>>> p_dists = [dist.Cauchy(2,2) for _ in range(5)]
>>> values = [torch.tensor(float(ii)) for ii in range(1,6)]
>>> negative_log_prob(p_dists, values)
tensor(11.5075)
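
The result is the sum of the pointwise negative log_probs; a small check that mirrors the definition, assuming the distributions expose the usual log_prob method:

>>> sum(-p.log_prob(v) for p, v in zip(p_dists, values))
tensor(11.5075)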
borch.infer.negative_log_prob_loss(p_dist, value)

Returns the negative log_prob of the p_dist evaluated at value

\[-\log p(\textbf{x})\]
Parameters
  • p_dist – a torch.distributions.Distribution

  • value – torch.tensor

Returns

torch.tensor

Example

>>> import torch
>>> import borch.distributions as dist
>>> negative_log_prob_loss(
...     dist.Normal(loc = torch.ones(1), scale = torch.ones(1)),
...     torch.zeros(1))
tensor(1.4189)
borch.infer.nuts_step(epsilon, parameters, closure, max_tree_depth=10, delta_energy_max=1000)

Performs one step using the No-U-Turn Sampler. The No-U-Turn Sampler adaptively sets the path lengths in Hamiltonian Monte Carlo, typically resulting in lower autocorrelation between the samples.

It implements algorithm 3 in [Hoffman2011].

Notes

For some random number generation numpy is used; this means that in order to make seeded runs, both torch and numpy need to be seeded.

nuts_step needs to be given the negative log likelihood and not the log likelihood as outlined in the paper. This is done to be consistent with the optimizer interface.

The NUTS implementation has not gone through rigorous use and is considered experimental.

Parameters
  • epsilon (float) – the step size

  • parameters (iterable) – list with torch.tensors

  • closure (callable) – A closure that reevaluates the model and returns the loss.

  • max_tree_depth (int) – The maximum allowed tree depth.

  • delta_energy_max (float) – largest allowed energy change.

Returns

a float, the acceptance probability of the step

Example

>>> import torch
>>> import numpy
>>> torch.manual_seed(7)
<...>
>>> numpy.random.seed(7)
>>> parameters = [torch.ones(1, requires_grad=True) for ii in range(3)]
>>> def closure():
...     return -sum([-(pp).pow(2).sum() for pp in parameters])
>>> nuts_step(.1, parameters, closure)
1.0
borch.infer.vi_loss(p_dists, q_dists, values, observed, kl_scaling=1, div_fn=<function elbo_loss>)

Calculates a regularization term for VI. It checks whether rsample is available for all q_dists and applies div_fn; if not, it uses elbo_score_function as a backup.

Note that in cases where distributions that support neither analytical KL divergence nor rsample are used, it can be better to use the function elbo_rb_score_function, which utilizes sub-samples to reduce variance.

Parameters
  • p_dists (iterable) – list with torch.distributions.Distribution, the prior distributions

  • q_dists (iterable) – list with torch.distributions.Distribution, the approximating distributions

  • values (iterable) – list with torch.tensors, where the distributions are evaluated.

  • observed (iterable) – list with booleans, if True that index in all of the lists will be treated as observed.

  • kl_scaling (float) – sets the scale of the ELBO term.

  • div_fn (callable) – function to be used to calculate the divergence term

Returns

torch.tensor

Example

>>> import torch
>>> import torch.distributions as dist
>>> p_dists = [dist.Cauchy(0,1) for _ in range(5)]
>>> q_dist  = [dist.Normal(0,1) for _ in range(5)]
>>> values = [torch.tensor(float(ii)) for ii in range(5)]
>>> observed = [False for _ in range(5)]
>>> vi_loss(p_dists, q_dist, values, observed, 1)
tensor(-6.4327)
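
The divergence term can be swapped out through div_fn; a sketch, assuming elbo_path_derivative_loss (documented above) is the desired estimator:

>>> loss = vi_loss(p_dists, q_dist, values, observed, 1,
...                div_fn=elbo_path_derivative_loss)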
borch.infer.vi_regularization(p_dists, q_dists, values, div_fn=<function elbo_loss>)

Calculates a regularization term, given the provided div_fn

Parameters
  • p_dists – list with torch.distributions.Distribution, the prior distributions

  • q_dists – list with torch.distributions.Distribution, the approximating distributions

  • values – list with torch.tensors

  • div_fn – Python callable (Default value = elbo_loss)

Returns

torch.tensor

Example

>>> import torch
>>> import torch.distributions as dist
>>> import borch.infer as infer
>>> p_dists = [dist.Cauchy(0,1) for _ in range(5)]
>>> q_dist  = [dist.Normal(0,1) for _ in range(5)]
>>> values = [torch.tensor(float(ii)) for ii in range(5)]
>>> vi_regularization(p_dists, q_dist, values, infer.elbo_loss)
tensor(-6.4327)