Infer
Bayesian inference commonly involves computing an approximate posterior distribution for the parameters of the model. There are two common approaches: Markov chain Monte Carlo (MCMC) sampling and variational inference (VI).
Infer is a collection of inference algorithms and loss functions for training models. It supports variational inference through loss functions and Monte Carlo estimation using classical algorithms such as HMC and NUTS.
It also extends to more exotic training schemes, such as weight smoothing of networks (Mean Teacher).
- The training loop for normal VI looks like:
>>> import torch
>>> import torch.distributions as dist
>>> from borch.infer import vi_loss
>>> params = [torch.ones(1, requires_grad=True) for _ in range(10)]
>>> p_dists = [dist.Cauchy(0, 1) for _ in range(5)]
>>> observed = [False for _ in range(5)]
>>> opt = torch.optim.Adam(params)
>>> for step in range(2):
...     opt.zero_grad()
...     # consecutive parameters form the (loc, scale) of each approximating Normal
...     q_dists = [dist.Normal(params[2 * j], params[2 * j + 1]) for j in range(5)]
...     values = [q_dist.rsample() for q_dist in q_dists]
...     loss = vi_loss(p_dists, q_dists, values, observed, 1)
...     loss.backward()
...     opt.step()
- The No-U-Turn Sampler (NUTS) algorithm can be used like:
>>> import torch
>>> from borch.infer import dual_averaging, find_reasonable_epsilon, nuts_step
>>> parameters = [torch.ones(1, requires_grad=True) for ii in range(3)]
>>> def closure():
...     # nuts_step expects the negative log likelihood
...     return -sum([-(pp).pow(2).sum() for pp in parameters])
>>> param_value = []
>>> initial_epsilon, epsilon_bar, h_bar = find_reasonable_epsilon(
...     parameters, closure)
>>> epsilon = initial_epsilon
>>> for i in range(1, 10):
...     accept_prob = nuts_step(epsilon, parameters, closure)
...     if i < 5:
...         # adapt the step size during the first steps
...         epsilon, epsilon_bar, h_bar = dual_averaging(
...             accept_prob, i, initial_epsilon, epsilon_bar, h_bar)
...     else:
...         # afterwards, keep the averaged step size fixed
...         epsilon = epsilon_bar
...     param_value.append([par.detach().clone() for par in parameters])

borch.infer.analytical_kl_divergence_loss(p_dist, q_dist, value=None, backup_loss_fn=None)
Calculates the analytical KL divergence if it is available, and otherwise falls back to backup_loss_fn. If no backup_loss_fn is provided, elbo_loss will be used.
\[\mathbb{KL}(q(\textbf{x}|\textbf{z})|| p(\textbf{x}))\]
- Parameters
p_dist – torch.distributions.Distribution
q_dist – torch.distributions.Distribution
value – torch.tensor
backup_loss_fn – callable, with the args p_dist, q_dist, value
- Returns
torch.tensor
Example
>>> import torch
>>> import torch.distributions as dist
>>> from borch.infer import analytical_kl_divergence_loss
>>> analytical_kl_divergence_loss(
...     dist.Normal(0, 1),
...     dist.Normal(1, 1),
...     torch.ones(1))
tensor(0.5000)
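The value in the example above can be checked by hand. The following is a minimal sketch using only the standard library, assuming the well-known closed form of the KL divergence between two univariate normal distributions; `normal_kl` is a hypothetical helper and not part of borch.

```python
import math


def normal_kl(q_loc, q_scale, p_loc, p_scale):
    """Closed-form KL(q || p) for two univariate normal distributions."""
    var_ratio = (q_scale / p_scale) ** 2
    mean_term = ((q_loc - p_loc) / p_scale) ** 2
    return 0.5 * (var_ratio + mean_term - 1.0 - math.log(var_ratio))


# Same setting as the example: p = Normal(0, 1), q = Normal(1, 1)
kl = normal_kl(q_loc=1.0, q_scale=1.0, p_loc=0.0, p_scale=1.0)
print(round(kl, 4))  # 0.5
```

Note that the closed form does not depend on a sampled value, which is consistent with value being optional in the signature.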

borch.infer.dual_averaging(accept_prob, m, initial_epsilon, epsilon_bar, h_bar, delta=0.8, gamma=0.05, t0=10, kappa=0.75)
Dual averaging is a scheme for solving optimization problems. This implementation is adapted to be used with nuts_step and hmc_step.
It implements the dual averaging parts outlined in Algorithm 6 in [Hoffman2011].
- Parameters
accept_prob (float) – the acceptance probability of the MCMC step
m (int) – what dual averaging step this is
initial_epsilon (float) – the epsilon the scheme is initialized with.
epsilon_bar (float) – the epsilon_bar from the previous step, usually initialized to 1.
h_bar (float) – parameter in the scheme to do the updating, good initial value is 0.
delta (float, default .8) – parameter that specifies the desired accept_prob
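gamma (float, default 0.05) – parameter that controls the amount of shrinkage towards the reference step size, as described in [Hoffman2011].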
t0 (float, default 10) – parameter that stabilizes the initial steps of the scheme.
kappa (float, default .75) – parameter that controls the weights of steps of the scheme.
- Returns
a tuple with the new epsilon, epsilon_bar, h_bar
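For orientation, one dual averaging update can be sketched as follows. This is a minimal sketch of the update equations as they appear in [Hoffman2011], not the borch source; `dual_averaging_sketch` is a hypothetical name, and the choice of the reference point `log(10 * initial_epsilon)` is taken from the paper and is an assumption about what the library does.

```python
import math


def dual_averaging_sketch(accept_prob, m, initial_epsilon, epsilon_bar, h_bar,
                          delta=0.8, gamma=0.05, t0=10, kappa=0.75):
    """One dual averaging update of the step size (hypothetical helper)."""
    # reference point that the step size is shrunk towards
    mu = math.log(10 * initial_epsilon)
    # running average of the gap between desired and observed acceptance
    eta = 1.0 / (m + t0)
    h_bar = (1 - eta) * h_bar + eta * (delta - accept_prob)
    # step size to use in the next MCMC step
    log_epsilon = mu - math.sqrt(m) / gamma * h_bar
    # weighted average of the iterates, used once adaptation stops
    weight = m ** (-kappa)
    log_epsilon_bar = weight * log_epsilon + (1 - weight) * math.log(epsilon_bar)
    return math.exp(log_epsilon), math.exp(log_epsilon_bar), h_bar


# accepting less often than desired (0.2 < delta) yields a smaller step size
eps_on_target, _, _ = dual_averaging_sketch(0.8, 1, 0.1, 1.0, 0.0)
eps_too_low, _, _ = dual_averaging_sketch(0.2, 1, 0.1, 1.0, 0.0)
print(eps_on_target, eps_too_low)
```

In this sketch, t0 damps the influence of the first few steps and kappa controls how quickly older iterates are forgotten in the averaged step size.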

borch.infer.elbo_loss(p_dist, q_dist, value)
Calculates the ELBO:
\[-(\log p(\textbf{x}) - \log q(\textbf{x}|\textbf{z}))\]
where \(p\) is p_dist, \(q\) is q_dist and \(\textbf{x}\) is value.
- Parameters
p_dist – torch.distributions.Distribution, the prior
q_dist – torch.distributions.Distribution, the approximating distribution
value – torch.tensor, the value at which the ELBO is evaluated.
- Returns
torch.tensor, the ELBO
Example
>>> import torch
>>> import torch.distributions as dist
>>> from borch.infer import elbo_loss
>>> elbo_loss(dist.Cauchy(0, 1), dist.Normal(1, 1), torch.ones(1))
tensor(0.9189)
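The example value follows directly from the formula. The sketch below recomputes it with the standard library only; the log-density helpers and `elbo_loss_sketch` are hypothetical names introduced for illustration, not borch functions.

```python
import math


def cauchy_log_prob(x, loc, scale):
    """Log density of a Cauchy(loc, scale) distribution at x."""
    z = (x - loc) / scale
    return -math.log(math.pi * scale * (1.0 + z * z))


def normal_log_prob(x, loc, scale):
    """Log density of a Normal(loc, scale) distribution at x."""
    z = (x - loc) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2.0 * math.pi)


def elbo_loss_sketch(log_p, log_q):
    """-(log p(x) - log q(x|z)), as in the formula above."""
    return -(log_p - log_q)


value = 1.0
loss = elbo_loss_sketch(
    cauchy_log_prob(value, 0.0, 1.0),  # p_dist = Cauchy(0, 1)
    normal_log_prob(value, 1.0, 1.0),  # q_dist = Normal(1, 1)
)
print(round(loss, 4))  # 0.9189
```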

borch.infer.elbo_path_derivative_loss(p_dist, q_dist, value)
Calculates the ELBO with path derivative:
\[-(\log p(\textbf{x}) - \log q(\textbf{x.detach()}|\textbf{z}))\]
where \(p\) is p_dist, \(q\) is q_dist and \(\textbf{x}\) is value.
- Parameters
p_dist – torch.distributions.Distribution, the prior
q_dist – torch.distributions.Distribution, the approximating distribution
value – torch.tensor, the value at which the ELBO is evaluated.
- Returns
torch.tensor, the ELBO
Example
>>> import torch
>>> import borch.distributions as dist
>>> from borch.infer import elbo_path_derivative_loss
>>> elbo_path_derivative_loss(
...     dist.Cauchy(0, 1),
...     dist.Normal(1, 1),
...     torch.ones(1))
tensor(0.9189)

borch.infer.find_reasonable_epsilon(parameters, closure)
Implements a heuristic for choosing an initial value of epsilon for HMC and NUTS.
It implements Algorithm 4 in [Hoffman2011].
- Parameters
parameters – list, with torch.tensors. The parameters to use in the hmc step
closure – python callable, that takes no arguments. Usually the calculation of the log joint of the model
- Returns
a tuple with
epsilon (float) – the suggested epsilon
epsilon_bar (float) – the initial value of epsilon_bar for dual_averaging
h_bar (float) – the initial value of h_bar for dual_averaging
Examples
>>> import torch
>>> from borch.infer import find_reasonable_epsilon
>>> torch.manual_seed(7)
<...>
>>> parameters = [torch.ones(1, requires_grad=True) for ii in range(3)]
>>> def closure():
...     return sum([-(pp).pow(2).sum() for pp in parameters])
>>> epsilon, epsilon_bar, h_bar = find_reasonable_epsilon(parameters, closure)
>>> epsilon
0.5
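The idea behind the heuristic can be sketched for a one-dimensional target: starting from epsilon = 1, the step size is doubled or halved until the acceptance probability of a single leapfrog step crosses 0.5. The code below is a minimal sketch following the description in [Hoffman2011], not the borch implementation; all function names are hypothetical, and it works with a scalar log density and its gradient rather than a closure over torch tensors.

```python
import math
import random


def leapfrog(theta, momentum, grad_log_p, epsilon):
    """One leapfrog step for a scalar position and momentum."""
    momentum = momentum + 0.5 * epsilon * grad_log_p(theta)
    theta = theta + epsilon * momentum
    momentum = momentum + 0.5 * epsilon * grad_log_p(theta)
    return theta, momentum


def find_reasonable_epsilon_sketch(theta, log_p, grad_log_p, rng):
    """Double or halve epsilon until the acceptance probability crosses 0.5."""

    def log_joint(position, momentum):
        return log_p(position) - 0.5 * momentum * momentum

    epsilon = 1.0
    momentum = rng.gauss(0.0, 1.0)
    start = log_joint(theta, momentum)
    log_ratio = log_joint(*leapfrog(theta, momentum, grad_log_p, epsilon)) - start
    # a = +1: the step is accepted too easily, so grow epsilon; a = -1: shrink it
    a = 1.0 if log_ratio > math.log(0.5) else -1.0
    while a * log_ratio > -a * math.log(2.0):
        epsilon *= 2.0 ** a
        log_ratio = log_joint(*leapfrog(theta, momentum, grad_log_p, epsilon)) - start
    return epsilon


# standard normal target, started at theta = 1
epsilon = find_reasonable_epsilon_sketch(
    1.0,
    log_p=lambda x: -0.5 * x * x,
    grad_log_p=lambda x: -x,
    rng=random.Random(7),
)
print(epsilon)
```

Because epsilon only changes by factors of two, the result is always a power of two, which matches the form of the value returned in the example above.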

borch.infer.hard_negative_mining(losses, labels, neg_pos_ratio)
Suppresses the presence of a large number of negative predictions.
For any example, it keeps all the positive predictions and cuts the number of negative predictions so that the ratio between negative and positive examples is no more than the given ratio.
- Parameters
losses (N, M) – Predicted class probabilities for each example.
labels (N, M) – The class labels as one-hot encodings.
neg_pos_ratio – The maximum ratio between negative and positive examples.
- Returns
Mask for applying to the loss.
Example
>>> import torch
>>> from borch.infer import hard_negative_mining
>>> from borch.utils.torch_utils import one_hot
>>>
>>> losses = torch.rand(10, 5)
>>> labels = one_hot(torch.randint(0, 5, (10,)), n_classes=5)
>>> mask = hard_negative_mining(losses, labels, 3)
>>> loss = losses[mask].sum()
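The masking logic can be illustrated on plain Python lists. This is a sketch under an assumption that the text above does not state explicitly: among the negatives of each example, the ones with the largest loss (the "hard" ones) are the ones kept. `hard_negative_mining_sketch` is a hypothetical helper, not the borch function.

```python
def hard_negative_mining_sketch(losses, labels, neg_pos_ratio):
    """Return a boolean mask that keeps all positives and the hardest negatives."""
    mask = []
    for row_losses, row_labels in zip(losses, labels):
        num_pos = sum(1 for label in row_labels if label)
        num_neg = int(num_pos * neg_pos_ratio)
        # indices of the negatives, hardest (largest loss) first
        negatives = [j for j, label in enumerate(row_labels) if not label]
        negatives.sort(key=lambda j: row_losses[j], reverse=True)
        kept = set(negatives[:num_neg])
        mask.append([bool(label) or j in kept for j, label in enumerate(row_labels)])
    return mask


losses = [[0.1, 0.9, 0.5, 0.3]]
labels = [[1, 0, 0, 0]]  # one-hot: the first class is the positive one
mask = hard_negative_mining_sketch(losses, labels, neg_pos_ratio=2)
print(mask)  # [[True, True, True, False]]
```

With one positive and a ratio of 2, two of the three negatives survive, and the negative with the smallest loss is masked out.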

borch.infer.hmc_step(epsilon, L, parameters, closure)
Performs one Hamiltonian Monte Carlo step. The .data of the parameters will be updated with the result of the step.
Notes
For some random number generation numpy is used. This means that, in order to make seeded runs reproducible, both torch and numpy need to be seeded.
hmc_step needs to be given the negative log likelihood, and not the log likelihood as outlined in the paper. This is done to be consistent with the optimizer interface.
The HMC implementation has not gone through rigorous use and is considered experimental.
- Parameters
epsilon (float) – the step size
L (int) – the number of leapfrog steps
parameters (iterable) – iterable of torch.tensors, the parameters to use in the HMC step
closure (callable) – A closure that reevaluates the model and returns the loss.
- Returns
float, The acceptance probability of the step.
Example
>>> import torch
>>> import numpy
>>> from borch.infer import hmc_step
>>> torch.manual_seed(7)
<...>
>>> numpy.random.seed(7)
>>> parameters = [torch.ones(1, requires_grad=True) for ii in range(3)]
>>> def closure():
...     return -sum([-(pp).pow(2).sum() for pp in parameters])
>>> hmc_step(.1, 10, parameters, closure)
1.0
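To make the moving parts concrete, a single HMC step for a scalar parameter can be sketched as below. This is a textbook-style sketch and not the borch implementation; `hmc_step_sketch` is a hypothetical name. Like hmc_step, it works with the negative log likelihood and reports the acceptance probability, but it returns the new position instead of updating tensors in place.

```python
import math
import random


def hmc_step_sketch(theta, neg_log_p, grad_neg_log_p, epsilon, n_leapfrog, rng):
    """One HMC step for a scalar parameter; returns (new theta, accept_prob)."""
    momentum = rng.gauss(0.0, 1.0)
    energy_start = neg_log_p(theta) + 0.5 * momentum * momentum

    # simulate the Hamiltonian dynamics with n_leapfrog leapfrog steps
    theta_new = theta
    for _ in range(n_leapfrog):
        momentum -= 0.5 * epsilon * grad_neg_log_p(theta_new)
        theta_new += epsilon * momentum
        momentum -= 0.5 * epsilon * grad_neg_log_p(theta_new)

    # Metropolis correction based on the change in total energy
    energy_end = neg_log_p(theta_new) + 0.5 * momentum * momentum
    accept_prob = math.exp(min(0.0, energy_start - energy_end))
    if rng.random() < accept_prob:
        theta = theta_new
    return theta, accept_prob


# standard normal target: negative log density x^2 / 2 (up to a constant)
rng = random.Random(7)
theta, samples = 1.0, []
for _ in range(200):
    theta, accept_prob = hmc_step_sketch(
        theta, lambda x: 0.5 * x * x, lambda x: x, 0.1, 10, rng)
    samples.append(theta)
print(len(samples), accept_prob)
```

A small step size keeps the energy error, and therefore the rejection rate, low, at the cost of shorter trajectories per gradient evaluation.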

borch.infer.negative_log_prob(p_dists, values, **kwargs)
Calculates the negative log_prob of the provided distributions at the supplied values.
- Parameters
p_dists – list with torch.distributions.Distribution
values – list with torch.tensor where the log_prob volume adjustments is evaluated.
- Returns
torch.tensor
Examples
>>> import borch.distributions as dist
>>> import torch
>>> from borch.infer import negative_log_prob
>>> p_dists = [dist.Cauchy(2, 2) for _ in range(5)]
>>> values = [torch.tensor(float(ii)) for ii in range(1, 6)]
>>> negative_log_prob(p_dists, values)
tensor(11.5075)

borch.infer.negative_log_prob_loss(p_dist, value)
Returns the negative log_prob of p_dist evaluated at value:
\[-\log \text{dist}(\text{value} \mid **)\]
- Parameters
p_dist – a torch.distributions.Distribution
value – torch.tensor
- Returns
torch.tensor
Example
>>> import torch
>>> import borch.distributions as dist
>>> from borch.infer import negative_log_prob_loss
>>> negative_log_prob_loss(
...     dist.Normal(loc=torch.ones(1), scale=torch.ones(1)),
...     torch.zeros(1))
tensor(1.4189)

borch.infer.nuts_step(epsilon, parameters, closure, max_tree_depth=10, delta_energy_max=1000)
Performs one step using the No-U-Turn Sampler (NUTS). The No-U-Turn Sampler adaptively sets the path lengths in Hamiltonian Monte Carlo, typically resulting in lower autocorrelation between the samples.
It implements Algorithm 3 in [Hoffman2011].
Notes
For some random number generation numpy is used. This means that, in order to make seeded runs reproducible, both torch and numpy need to be seeded.
nuts_step needs to be given the negative log likelihood, and not the log likelihood as outlined in the paper. This is done to be consistent with the optimizer interface.
The NUTS implementation has not gone through rigorous use and is considered experimental.
- Parameters
epsilon (float) – the step size
parameters (iterable) – list with torch.tensors
closure (callable) – A closure that reevaluates the model and returns the loss.
max_tree_depth (int) – The maximum allowed tree depth.
delta_energy_max (float) – the largest allowed energy change.
- Returns
a float, the acceptance probability of the step
Example
>>> import torch
>>> import numpy
>>> from borch.infer import nuts_step
>>> torch.manual_seed(7)
<...>
>>> numpy.random.seed(7)
>>> parameters = [torch.ones(1, requires_grad=True) for ii in range(3)]
>>> def closure():
...     return -sum([-(pp).pow(2).sum() for pp in parameters])
>>> nuts_step(.1, parameters, closure)
1.0

borch.infer.vi_loss(p_dists, q_dists, values, observed, kl_scaling=1, div_fn=<function elbo_loss>)
Calculates a regularization term for VI. It checks whether rsample is available for all q_dists and, if so, applies div_fn. If not, it uses elbo_score_function as a backup.
Note: in some cases, when distributions that support neither an analytical KL divergence nor rsample are used, it can be better to use the function elbo_rb_score_function, which utilizes subsamples to reduce variance.
- Parameters
p_dists (iterable) – list with torch.distributions.Distribution, the prior distributions
q_dists (iterable) – list with torch.distributions.Distribution, the approximating distributions
values (iterable) – list with torch.tensors, where the distributions are evaluated.
observed (iterable) – list with booleans, if True that index in all of the lists will be treated as observed.
kl_scaling (float) – sets the scale of the ELBO term.
div_fn (callable) – function to be used to calculate the divergence term
- Returns
torch.tensor
Example
>>> import torch
>>> import torch.distributions as dist
>>> from borch.infer import vi_loss
>>> p_dists = [dist.Cauchy(0, 1) for _ in range(5)]
>>> q_dist = [dist.Normal(0, 1) for _ in range(5)]
>>> values = [torch.tensor(float(ii)) for ii in range(5)]
>>> observed = [False for _ in range(5)]
>>> vi_loss(p_dists, q_dist, values, observed, 1)
tensor(-6.4327)

borch.infer.vi_regularization(p_dists, q_dists, values, div_fn=<function elbo_loss>)
Calculates a regularization term, given the provided div_fn.
- Parameters
p_dists – list with torch.distributions.Distribution, the prior distributions
q_dists – list with torch.distributions.Distribution, the approximating distributions
values – list with torch.tensors
div_fn – Python callable (Default value = elbo_loss)
- Returns
torch.tensor
Example
>>> import torch
>>> import torch.distributions as dist
>>> import borch.infer as infer
>>> p_dists = [dist.Cauchy(0, 1) for _ in range(5)]
>>> q_dist = [dist.Normal(0, 1) for _ in range(5)]
>>> values = [torch.tensor(float(ii)) for ii in range(5)]
>>> infer.vi_regularization(p_dists, q_dist, values, infer.elbo_loss)
tensor(-6.4327)
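The example output is consistent with summing div_fn over the paired lists. The sketch below reproduces the number with the standard library only, under that assumption; the actual implementation may differ in details, and `vi_regularization_sketch` and the density helpers are hypothetical names.

```python
import math


def std_cauchy_log_prob(x):
    """Log density of Cauchy(0, 1) at x."""
    return -math.log(math.pi * (1.0 + x * x))


def std_normal_log_prob(x):
    """Log density of Normal(0, 1) at x."""
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)


def elbo_term(log_p, log_q, value):
    """Per-element divergence term: -(log p(x) - log q(x|z))."""
    return -(log_p(value) - log_q(value))


def vi_regularization_sketch(log_ps, log_qs, values, div_fn=elbo_term):
    """Sum the divergence term over all (p, q, value) triples (assumed behaviour)."""
    return sum(div_fn(log_p, log_q, value)
               for log_p, log_q, value in zip(log_ps, log_qs, values))


values = [float(ii) for ii in range(5)]
total = vi_regularization_sketch(
    [std_cauchy_log_prob] * 5, [std_normal_log_prob] * 5, values)
print(round(total, 4))  # -6.4327
```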