Stories and Cheat-Sheets

Quadratic Approximation for a Proportion

This notebook demonstrates the advantage of working with a transformed parameter when making a quadratic approximation (i.e. an approximation to the probability density function by a Gaussian distribution). If, for example, the parameter to estimate is a proportion and therefore bound to the interval $[0,1]$, it might be difficult to fit a Gaussian, where a significant amount of probability mass might lie in the tails and outside of $[0,1]$. To overcome this we can consider a transformed parameter which is supported on the whole real line, make our approximation, find whatever summary statistic we are interested in, and then transform it back to the original scale.

begin
	using CairoMakie
	using Distributions
	using KernelDensity
	using Optim
	using Random
	using Zygote
end
"Visualise parameter values with confidence intervals."
function coef_table!(ax, intervals, names; grouping = nothing)
	mean(v) = sum(v) / length(v)
	function spacing(d)
		if length(d) == 1
			return [1]
		end
		steps = map(i -> .2 + (d[i] != d[i+1]), 1:(length(d) - 1))
		prepend!(steps, [1]) |> cumsum
	end
	ypos = grouping != nothing ? spacing(grouping) : 1:length(intervals) 
	rangebars!(ax, ypos, intervals, direction = :x, color = :black, 
	whiskerwidth = 10)
	scatter!(ax, map(mean, intervals), ypos, color = :black)
	vlines!(ax, [0], linestyle = :dash, color = :black, alpha = .5)
	ax.yticks = ypos
 	ax.ytickformat = _ -> names
	ax.yticklabelsize = 15
	ax.xgridvisible = false
	ax
end;

Model

We assume that the data follow a $Bernoulli(γ)$ distribution.

begin
	Random.seed!(402)
	p = 0.95                                      # True proportion.
	sample_data() = rand(Bernoulli(p), 30)        # Reusable sampler. 
end; 

and use the prior $$\gamma \sim Beta(1/2, 1/2)$$

Analytic Posterior

Using this prior, the posterior density given $n$ trials and $s$ successes is $$\gamma \sim Beta(1/2 + s, 1/2 + n - s)$$

begin 
	y = sample_data()
	n = length(y)
	s = sum(y)
	posterior = Beta(1/2 + s, 1/2 + n - s)
	max_a_post = mode(posterior) 
end;

Quadratic Approximation

Firstly fitted to the proportion γ directly.

begin
	γ̂ = mode(posterior)
	S = inv(-hessian(x -> logpdf(posterior, x), γ̂))
	quap = Normal(γ̂, S^.5) 
end;

Reparameterizing

If the goal is an approximation by a normal distribution, we might prefer a parameter defined on the whole real line. Let $θ = logit(γ)$, i.e. $γ = \sigma(\theta)$ where $\sigma$ is the standard logistic function. Then the prior for θ is $$p_{\theta}(\theta) = p_\gamma(\gamma) \bigg | \frac{d\gamma}{d\theta} \bigg | = Beta \big (\sigma(\theta) \ | \ 1/2, 1/2 \big )\sigma(\theta)(1 - \sigma(\theta))$$ and the likelihood is the standard product of Bernoulli densities.

begin
	σ(θ) = 1 / (1 + exp(-θ))
	logit(γ) = log(γ/(1 - γ))
	# Derivatives of above for change of variables.
	dσ(θ) = σ(θ) * (1 - σ(θ))
	dlogit(γ) = (1 /* (1 - γ)))
end;
begin
	# Log pdf of the prior distribution.
	prior(θ) = logpdf(Beta(1/2, 1/2), σ(θ)) + log(dσ(θ))
	
	# Log likelihood of the data given the parameter θ
	likelihood(θ) = sum(logpdf(Bernoulli(σ(θ)), yᵢ) for yᵢ in y)
	# Ustandardised log posterior
	posterior2(θ) = prior(θ) + likelihood(θ)
	# Maximise posterior over whole real line (approximated by (-10, 10))
	res = optimize(θ -> -posterior2(θ), -10, 10)
	θ̂ = Optim.minimizer(res)
= -inv(hessian(posterior2, θ̂))
	quap2 = Normal(θ̂, Sθ^.5)
end;

Better PDF

In the plot below it is clear that the approximation made on the logit scale is much closer to the true posterior.

begin
	fig1 = Figure()
	ax = Axis(fig1[1,1], xlabel = "γ", ylabel = "probability") 
	xlims!(ax, 0.7, 1.05)
	γ_grid = 0.7:.001:1.05
	
	lines!(ax, γ_grid, posterior, label = "analytic posterior", linewidth = 2)
	lines!(ax, γ_grid, quap, 
		label = "quadratic approximation on γ", linewidth = 2)
	lines!(ax, γ_grid, γ -> if γ < 1 pdf(quap2, logit(γ)) * dlogit(γ) else 0 end, 
		label = "quadratic approximation on θ = logit(γ)"
	) 
	
	axislegend(ax, position = :lt)
	save("images/fig1.svg", fig1)
end;
:(

Better Summary Statistics

In the following plot we have the maximum a posteriori (MAP) and 89% prediction interval for $\gamma$, for each version (true, transformed, untransformed). The MAP for the transformed approximation is closer to the true value, but more importantly the 89% prediction interval is much more realistic.

begin
	# 89% confidence intervals
	α = .89
	q = [(1 - α) / 2, (1 + α) / 2]
	intervals = [quantile(posterior, q),
		quantile(quap, q),
		map(σ, quantile(quap2, q))]
	names = ["analytic", "quap γ", "quap θ"]
	fig2 = Figure()
	ax2 = Axis(fig2[1, 1], title = "89% confidence intervals")
	xlims!(ax2, .75, 1.1)
	coef_table!(ax2, intervals, names)
 	save("images/fig2.svg", fig2)
end;
:(