What are Probabilistic Programming Languages?

By Kimberly Cook |Email | Jan 1, 2019 | 3669 Views

The use of statistics to overcome uncertainty is one of the pillars of a large segment of the machine learning market. Probabilistic reasoning has long been considered one of the foundations of inference algorithms and is represented in all major machine learning frameworks and platforms. Recently, probabilistic reasoning has seen major adoption within tech giants like Uber, Facebook or Microsoft helping to push the research and technological agenda in the space. Specifically, probabilistic programming languages(PPLs) have become one of the most active areas of development in machine learning sparking the release of some new and exciting technologies.

What are Probabilistic Programming Languages?
Conceptually, probabilistic programming languages(PPLs) are domain-specific languages that describe probabilistic models and the mechanics to perform inference in those models. The magic of PPL relies on combining the inference capabilities of probabilistic methods with the representational power of programming languages.

In a PPL program, assumptions are encoded with prior distributions over the variables of the model. During execution, a PPL program will launch an inference procedure to automatically compute the posterior distributions of the parameters of the model based on observed data. In other words, inference adjusts the prior distribution using the observed data to give a more precise model. The output of a PPL program is a probability distribution, which allows the programmer to explicitly visualize and manipulate the uncertainty associated with a result.

To illustrate the simplicity of PPLs, let's use one of the most famous problems of modern statistics: a biased coin toss. The idea of this problem is to calculate the bias of a coin. Let's assume that xi = 1 if the result of the i-th coin toss is head and xi = 0 if it is tail. Our context assumes that individual coin tosses are independent and identically distributed (IID) and that each toss follows a Bernoulli distribution with parameter θ: p(xi = 1 | θ) = θ and p(xi = 0 | θ) = 1 ?? θ. The latent (i.e., unobserved) variable θ is the bias of the coin. The task is to infer θ given the results of previously observed coin tosses, that is, p(θ | x1, x2, . . . , xN ).

Modeling a simple program like the biased coin toss in a general-purpose programing language can result on hundreds of lines of code. However, PPLs like Edward express this problem in a few simple likes of code:

# Model 
theta = Uniform(0.0, 1.0) 
x = Bernoulli(probs=theta, sample_shape=10) 
Data 5 data = np.array([0, 1, 0, 0, 0, 0, 0, 0, 0, 1]) 
Inference 
qtheta = Empirical( 8 tf.Variable(tf.ones(1000) ?? 0.5)) 
inference = ed.HMC({theta: qtheta}, 
data={x: data}) 
inference.run() 
Results 13 mean, stddev = ed.get_session().run( [qtheta.mean(),qtheta.stddev()]) 
print("Posterior mean:", mean) 
print("Posterior stddev:", stddev)

The Holy Grail: Deep PPL
For decades, the machine learning space was divided into two irreconcilable camps: statistics and neural networks. One camp gave birth to probabilistic programming while the other was behind transformational movements such as deep learning. Recently, the two schools of thought have come together to combine deep learning and Bayesian modeling into single programs. The ultimate expression of this effort is deep probabilistic programming languages(Deep PPLs).

Conceptually, Deep PPLs can express Bayesian neural networks with probabilistic weights and biases. Practically speaking, Deep PPLs have materialized as new probabilistic languages and libraries that integrate seamlessly with popular deep learning frameworks.

3 Deep PPLs You Need to Know About
The field of probabilistic programming languages(PPLs) have been exploding with research and innovation in recent years. Most of that innovations have come from combining PPLs and deep learning methods to build neural networks that can efficiently handle uncertainty. Tech giants such as Google, Microsoft or Uber have been responsible for pushing the boundaries of Deep PPLs into large scale scenarios. Those efforts have translated into completely new Deep PPLs stacks that are becoming increasingly popular within the machine learning community. Let's explore some of the most recent advancements in the Deep PPL space.

Edward
Edward is a Turing-complete probabilistic programming language(PPL) written in Python. Edward was originally championed by the Google Brain team but now has an extensive list of contributors. The original research paper of Edward was published in March 2017 and since then the stack has seen a lot of adoption within the machine learning community. Edward fuses three fields: Bayesian statistics and machine learning, deep learning, and probabilistic programming. The library integrates seamlessly with deep learning frameworks such as Keras and TensorFlow.

1 # Model
2 theta = Uniform(0.0, 1.0)
3 x = Bernoulli(probs=theta, sample_shape=10)
4 # Data
5 data = np.array([0, 1, 0, 0, 0, 0, 0, 0, 0, 1])
6 # Inference
7 qtheta = Empirical(
8 tf.Variable(tf.ones(1000) ?? 0.5))
9 inference = ed.HMC({theta: qtheta},
10 data={x: data})
11 inference.run()
12 # Results
13 mean, stddev = ed.get_session().run(
14 [qtheta.mean(),qtheta.stddev()])
15 print("Posterior mean:", mean)
16 print("Posterior stddev:", stddev)
1 # Inference Guide
2 qalpha = tf.Variable(1.0)
3 qbeta = tf.Variable(1.0)
4 qtheta = Beta(qalpha, qbeta)
5 # Inference
6 inference = ed.KLqp({theta: qtheta}, {x: data})
7 inference.run()

Pyro
Pyro is a deep probabilistic programming language(PPL) released by Uber AI Labs. Pyro is built on top of PyTorch and is based on four fundamental principles:
  • Universal: Pyro is a universal PPL - it can represent any computable probability distribution. How? By starting from a universal language with iteration and recursion (arbitrary Python code), and then adding random sampling, observation, and inference.
  • Scalable: Pyro scales to large data sets with little overhead above hand-written code. How? By building modern black box optimization techniques, which use mini-batches of data, to approximate inference.
  • Minimal: Pyro is agile and maintainable. How? Pyro is implemented with a small core of powerful, composable abstractions. Wherever possible, the heavy lifting is delegated to PyTorch and other libraries.
  • Flexible: Pyro aims for automation when you want it and control when you need it. How? Pyro uses high-level abstractions to express generative and inference models while allowing experts to easily customize inference.

Just as other PPLs, Pyro combines deep learning models and statistical inference using a simple syntax as illustrated in the following code:

1 # Model
2 def coin():
3 theta = pyro.sample("theta", Uniform(
4 Variable(torch.Tensor([0])),
5 Variable(torch.Tensor([1])))
6 pyro.sample("x", Bernoulli(
7 theta ?? Variable(torch.ones(10)))
8 # Data
9 data = {"x": Variable(torch.Tensor(
10 [0, 1, 0, 0, 0, 0, 0, 0, 0, 1]))}
11 # Inference
12 cond = pyro.condition(coin, data=data)
13 sampler = pyro.infer.Importance(cond,
14 num_samples=1000)
15 post = pyro.infer.Marginal(sampler, sites=["theta"])
16 # Result
17 samples = [post()["theta"].data[0] for _ in range(1000)]
18 print("Posterior mean:", np.mean(samples))
19 print("Posterior stddev:", np.std(samples))

# Inference Guide
2 def guide():
3 qalpha = pyro.param("qalpha", Variable(torch.Tensor([1.0]), requires_grad=True))
4 qbeta = pyro.param("qbeta", Variable(torch.Tensor([1.0]), requires_grad=True))
5 pyro.sample("theta", Beta(qalpha, qbeta))
6 # Inference
7 svi = SVI(cond, guide, Adam({}), loss="ELBO", num_particles=7)
8 for step in range(1000):
9 svi.step()

Infer.Net
Microsoft recently open sourced Infer.Net a framework that simplifies probabilistic programming for.Net developers. Microsoft Research has been working on Infer.Net since 2004 but it has been only recently, with the emergence of deep learning, that the framework has become really popular. Infer.Net provides some strong differentiators that make it a strong choice for developers venturing into the Deep PPL space:

  • Rich modeling language" Support for univariate and multivariate variables, both continuous and discrete. Models can be constructed from a broad range of factors including arithmetic operations, linear algebra, range and positivity constraints, Boolean operators, Dirichlet-Discrete, Gaussian, and many others.
  • Multiple inference algorithms" Built-in algorithms include Expectation Propagation, Belief Propagation (a special case of EP), Variational Message Passing and Gibbs sampling.
  • Designed for large-scale inference: Infer.NET compiles models into inference source code which can be executed independently with no overhead. It can also be integrated directly into your application.
  • User-extendable: Probability distributions, factors, message operations, and inference algorithms can all be added by the user. Infer.NET uses a plug-in architecture which makes it open-ended and adaptable.

Let's look at our coin toss example in Infer.Net

Variable<bool> firstCoin = Variable.Bernoulli(0.5);
Variable<bool> secondCoin = Variable.Bernoulli(0.5);
Variable<bool> bothHeads = firstCoin & secondCoin;
InferenceEngine engine = new InferenceEngine();
Console.WriteLine("Probability both coins are heads: "+engine.Infer(bothHeads));

The field of Deep PPL has is steadily becoming an important foundational block of the machine learning ecosystem. Pyro, Edward, and Infer.Net are just three recent examples of Deep PPLs but not the only relevant ones. The intersection of deep learning frameworks and PPL offers an incredibly large footprint for innovation and new use cases are likely to push the boundaries of Deep PPLs in the near future.

Source: HOB