The Beta distribution of the third kind (or generalised Beta prime)
- Preliminaries: the (scaled) Beta prime distribution
- Beta distribution of the third kind
- Update 2019-09-05: generalised Beta distribution
- Cumulative distribution function
- Sampling the Beta distribution of the third kind
- Application to the Bayesian binomial model
- Application to the Bayesian “two Poisson samples” model
We present the family of so-called Beta distributions of the third kind. In the context of Bayesian statistics, it is a conjugate family of prior distributions on the odds parameter of the binomial model. This distribution is known, but to my knowledge nobody has provided a way to sample from it. We show how one can sample from this distribution in R.
Preliminaries: the (scaled) Beta prime distribution
The Beta distribution of the third kind generalizes the Beta distribution of the second kind, also known as the Beta prime distribution.
The Beta prime distribution $\mathcal{B}'(c,d,\lambda)$ is the distribution of the random variable $\lambda \frac{U}{1-U}$ where $U \sim \mathcal{B}(c,d)$.
Its density function at $x \geqslant 0$ is $$\frac{1}{\lambda^c B(c,d)} \frac{x^{c-1}}{\left(1+\frac{x}{\lambda}\right)^{c+d}}.$$ Usually the definition does not include the scale parameter $\lambda$ (that is, it is usually defined for $\lambda = 1$ only).
It is easy to implement a sampler for this distribution, as well as its density function and its cumulative distribution function:
rbetaprime <- function(n, c, d, lambda = 1){
  # sample from B'(c,d,lambda) by transforming Beta(c,d) draws
  stopifnot(lambda > 0)
  u <- rbeta(n, c, d)
  lambda * u / (1 - u)
}
dbetaprime <- function(x, c, d, lambda = 1){
  # density of B'(c,d,lambda), via the change of variables u = x/(lambda+x)
  stopifnot(lambda > 0)
  lambda/(lambda+x)^2 * dbeta(x/lambda/(1+x/lambda), c, d)
}
pbetaprime <- function(q, c, d, lambda = 1){
  # cumulative distribution function of B'(c,d,lambda)
  stopifnot(lambda > 0)
  pbeta(q/lambda/(1+q/lambda), c, d)
}
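As a quick sanity check, with arbitrarily chosen parameter values, pbetaprime should agree with a numerical integration of dbetaprime:
q0 <- 2.5
# the CDF should match the integral of the density from 0 to q0
integrate(dbetaprime, lower = 0, upper = q0, c = 2, d = 3, lambda = 5)$value
pbetaprime(q0, 2, 3, 5)  # should be close to the value above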
Beta distribution of the third kind
The Beta distribution of the third kind $\mathcal{B}_3$ was first introduced (as far as I know) in the paper Some Poisson mixtures with a hyperscale parameter, written by myself.
For parameters $c > 0$, $d > 0$, $\kappa \in \mathbb{R}$, $\tau > 0$, the density function of $\mathcal{B}_3(c,d,\kappa,\tau)$ is $$f(\phi) = \frac{1}{B(c,d)\,{}_2F_1\!\left(c, c+d-\kappa; c+d; 1-\frac{1}{\tau}\right)} \frac{\phi^{c-1}(1+\phi)^{-\kappa}}{\left(1+\frac{\phi}{\tau}\right)^{c+d-\kappa}}, \quad \phi \geqslant 0.$$ Thus, for $\kappa = 0$, the $\mathcal{B}_3(c,d,\kappa,\tau)$ distribution equals $\mathcal{B}'(c,d,\tau)$, and for $\kappa = c+d$ or $\tau = 1$, it equals $\mathcal{B}'(c,d,1)$. Note that in general, $\tau$ is not a scale parameter.
Let's write an R function computing this density:
library(gsl)
Gauss2F1 <- function(a, b, c, x){
  if(x >= 0 && x < 1){ # hyperg_2F1 works fine in this situation
    hyperg_2F1(a, b, c, x)
  }else{ # Pfaff transformation, to come down to the first situation
    hyperg_2F1(c-a, b, c, 1-1/(1-x)) / (1-x)^b
  }
}
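The else branch uses Pfaff's transformation; as a quick check at a negative argument, we can compare with the classical closed form ${}_2F_1(1,1;2;z) = -\log(1-z)/z$:
# check the negative-argument branch against the closed form
z <- -3
Gauss2F1(1, 1, 2, z)
-log(1 - z) / z  # should give the same value
With this helper at hand, here is the density function: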
dB3 <- function(x, c, d, kappa, tau){
  stopifnot(c > 0, d > 0, tau > 0)
  if(kappa == 0){ # B3(c,d,0,tau) = B'(c,d,tau)
    dbetaprime(x, c, d, tau)
  }else if(kappa == c+d){ # B3(c,d,c+d,tau) = B'(c,d,1)
    dbetaprime(x, c, d, 1)
  }else{
    1/beta(c,d) * x^(c-1)*(1+x)^(-kappa)/(1+x/tau)^(c+d-kappa) /
      Gauss2F1(c, c+d-kappa, c+d, 1-1/tau)
  }
}
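As a sanity check, the density should integrate to one; for instance, with arbitrary parameter values:
# the density should integrate to one
integrate(dB3, lower = 0, upper = Inf, c = 2, d = 3, kappa = 4, tau = 5)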
This distribution is related to the four-parameter generalized Beta distribution introduced by Chen & Novick in the paper Bayesian analysis for binomial models with generalized beta prior distributions (1984); this distribution takes its values in $[0,1]$. They are related by an elementary transformation: $$\Theta \sim \mathrm{GB}_4(c,d,\kappa,\tau) \iff \frac{\Theta}{1-\Theta} \sim \mathcal{B}_3\!\left(c, d, c+d-\kappa, \frac{1}{\tau}\right).$$
Update 2019-09-05: generalised Beta distribution
I've just discovered that the GB4 distribution appears in the paper On Kummer's distributions of type two and generalized Beta distributions written by Hamza & Vallois. It is named the generalised Beta distribution in this paper; it is denoted by $\beta_\delta(a,b,c)$ and its density function at $u \in [0,1]$ is given by $$\frac{1}{B(a,b)\,{}_2F_1(-c, a; a+b; 1-\delta)}\, u^{a-1}(1-u)^{b-1}\bigl(1+(\delta-1)u\bigr)^{c}$$ for $a, b, \delta > 0$ and $c \in \mathbb{R}$.
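Here is a small R implementation of this density (the name dGBeta is just a convenient choice here); it will be used for a numerical check below:
# density of the generalised Beta distribution of Hamza & Vallois
# (the name dGBeta is chosen here for convenience)
dGBeta <- function(u, a, b, c, delta){
  stopifnot(a > 0, b > 0, delta > 0)
  u^(a-1) * (1-u)^(b-1) * (1+(delta-1)*u)^c /
    (beta(a, b) * Gauss2F1(-c, a, a+b, 1-delta))
}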
We have the following relation: if $\Phi \sim \mathcal{B}_3(c,d,\kappa,\tau)$, then $\frac{\Phi}{1+\Phi} \sim \beta_{1/\tau}(c, d, \kappa-c-d)$.
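As a quick numerical check of this relation (using the dGBeta function above and arbitrary parameter values), the density of $\Phi/(1+\Phi)$ obtained from dB3 by a change of variables should coincide with the $\beta_{1/\tau}$ density:
# check: if Phi ~ B3(c,d,kappa,tau), then Phi/(1+Phi) ~ beta_{1/tau}(c, d, kappa-c-d)
c <- 2; d <- 3; kappa <- 4; tau <- 5
u <- seq(0.05, 0.95, by = 0.15)
phi <- u / (1 - u)                        # inverse transformation
dB3(phi, c, d, kappa, tau) / (1 - u)^2    # density of Phi/(1+Phi) at u
dGBeta(u, c, d, kappa - c - d, 1/tau)     # should give the same values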
So, maybe a better name for $\mathcal{B}_3$ would be generalised Beta prime distribution.
Cumulative distribution function
The cumulative distribution function of $\mathcal{B}_3$ involves the Appell hypergeometric function $F_1$. A Fortran implementation of this function is available in the R package appell. This package has been removed from CRAN, but you can still install it. If $\Phi \sim \mathcal{B}_3(c,d,\kappa,\tau)$, then, for $q \geqslant 0$, $$\Pr(\Phi \leqslant q) = \frac{q^c\, F_1\!\left(c; \kappa, c+d-\kappa; c+1; -q, -\frac{q}{\tau}\right)}{c\, B(c,d)\, {}_2F_1\!\left(c, c+d-\kappa; c+d; 1-\frac{1}{\tau}\right)}.$$ Here is an R implementation. I found that it works well with the option userflag = 0 of the appellf1 function.
pB3 <- function(q, c, d, kappa, tau, userflag = 0){
  stopifnot(c > 0, d > 0, tau > 0)
  if(kappa == 0){ # B3(c,d,0,tau) = B'(c,d,tau)
    pbetaprime(q, c, d, tau)
  }else if(kappa == c+d){ # B3(c,d,c+d,tau) = B'(c,d,1)
    pbetaprime(q, c, d, 1)
  }else{
    C <- beta(c,d) * Gauss2F1(c, c+d-kappa, c+d, 1-1/tau)
    Appell <-
      appell::appellf1(c, kappa, c+d-kappa, c+1, -q, -q/tau,
                       userflag = userflag)
    q^c/c * Re(Appell$val) / C
  }
}
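A quick way to check this implementation is to compare it with a numerical integration of the density (arbitrary parameter values again):
# the CDF should match the integral of the density from 0 to q0
q0 <- 1.4
integrate(dB3, lower = 0, upper = q0, c = 2, d = 3, kappa = 4, tau = 5)$value
pB3(q0, 2, 3, 4, 5)  # should be close to the value above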
Sampling the Beta distribution of the third kind
It is not very easy to sample from the $\mathcal{B}_3$ distribution. In her master's thesis, Myriam Chabot proved that it can be represented as a discrete mixture of $\mathcal{B}_2$ distributions (that is, Beta prime distributions), and we will use this result. It is the following one.
For $\tau < 1$, let $K$ be a random variable on $\mathbb{N}$ whose probability mass at $k \in \mathbb{N}$ is given by $$\frac{1}{{}_2F_1(d, c+d-\kappa; c+d; 1-\tau)} \frac{(1-\tau)^k}{k!} \frac{(c+d-\kappa)_k\,(d)_k}{(c+d)_k}$$ and let $\Phi$ be a random variable such that $(\Phi \mid K = k) \sim \mathcal{B}'(c, d+k, 1)$. Then $\Phi \sim \mathcal{B}_3(c,d,\kappa,\tau)$.
For $\tau > 1$, let $K$ be a random variable on $\mathbb{N}$ whose probability mass at $k \in \mathbb{N}$ is given by $$\frac{1}{{}_2F_1\!\left(c, c+d-\kappa; c+d; 1-\frac{1}{\tau}\right)} \frac{\left(1-\frac{1}{\tau}\right)^k}{k!} \frac{(c+d-\kappa)_k\,(c)_k}{(c+d)_k}$$ and let $\Phi$ be a random variable such that $(\Phi \mid K = k) \sim \mathcal{B}'(c+k, d, 1)$. Then $\Phi \sim \mathcal{B}_3(c,d,\kappa,\tau)$.
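Before building a sampler, we can check this representation numerically (here for $\tau > 1$) by truncating the mixture at a large value of $k$; the truncated sum should be close to the $\mathcal{B}_3$ density:
# numerical check of the mixture representation for tau > 1:
# sum_k P(K = k) * dbetaprime(phi, c + k, d, 1) should equal dB3(phi, c, d, kappa, tau)
c <- 2; d <- 3; kappa <- 4; tau <- 5
phi <- 1.7   # an arbitrary point
kmax <- 500  # truncation of the series
k <- 0:kmax
probs <- exp(k*log(1-1/tau) - lfactorial(k) +
               lnpoch(c+d-kappa, k) + lnpoch(c, k) - lnpoch(c+d, k)) /
  hyperg_2F1(c, c+d-kappa, c+d, 1-1/tau)
sum(probs * dbetaprime(phi, c + k, d, 1))
dB3(phi, c, d, kappa, tau)  # should be close to the truncated sum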
So we can sample $\mathcal{B}_3(c,d,\kappa,\tau)$ if we are able to sample these discrete distributions. To do so, we use the Runuran package.
library(Runuran)
pmf_unnormalized <- function(k, c, d, kappa, tau){
  # unnormalized probability mass function of K (zero for k < 0)
  out <- numeric(length(k))
  positive <- k >= 0
  k <- k[positive]
  out[positive] <-
    if(tau < 1){
      exp(k*log(1-tau) - lfactorial(k) +
            lnpoch(c+d-kappa,k) + lnpoch(d,k) - lnpoch(c+d,k))
    }else{
      exp(k*log(1-1/tau) - lfactorial(k) +
            lnpoch(c+d-kappa,k) + lnpoch(c,k) - lnpoch(c+d,k))
    }
  out
}
NormalizingConstant <- function(c, d, kappa, tau){
  # normalizing constant of the probability mass function of K
  if(tau < 1){
    hyperg_2F1(d, c+d-kappa, c+d, 1-tau)
  }else{
    hyperg_2F1(c, c+d-kappa, c+d, 1-1/tau)
  }
}
Ksampler <- function(n, c, d, kappa, tau){
  # sample the discrete distribution of K with Runuran
  dist <- unuran.discr.new(
    pmf = function(k) pmf_unnormalized(k, c, d, kappa, tau),
    lb = 0, ub = Inf, sum = NormalizingConstant(c, d, kappa, tau)
  )
  unuran <- unuran.new(dist, method = "dgt")
  ur(unuran, n)
}
rB3 <- function(n, c, d, kappa, tau){
  stopifnot(c > 0, d > 0, tau > 0)
  if(tau == 1 || kappa == c+d){ # B'(c,d,1) case
    rbetaprime(n, c, d, 1)
  }else if(kappa == 0){ # B'(c,d,tau) case
    rbetaprime(n, c, d, tau)
  }else{ # discrete mixture of Beta prime distributions
    K <- Ksampler(n, c, d, kappa, tau)
    if(tau < 1){
      rbetaprime(n, c, d + K, 1)
    }else{
      rbetaprime(n, c + K, d, 1)
    }
  }
}
Let’s check. The density:
c <- 2; d <- 3; kappa <- 4; tau <- 5
nsims <- 1000000
sims <- rB3(nsims, c, d, kappa, tau)
plot(density(sims, from = 0, to = quantile(sims, 0.95)))
curve(dB3(x, c, d, kappa, tau), add = TRUE, col = "red",
lty = "dashed", lwd = 2)
The cumulative distribution function:
q <- seq(0.1, 4, length.out = 10)[-6]
cdfValues <- sapply(q, function(x) pB3(x, c, d, kappa, tau))
empirical_cdf <- ecdf(sims)
plot(empirical_cdf, xlim = c(0,4))
points(q, cdfValues, pch=19)
I've removed the sixth value of the vector q because appellf1 crashes for this value:
q <- seq(0.1, 4, length.out = 10)
pB3(q[6], c, d, kappa, tau)
## Error in appell::appellf1(c, kappa, c + d - kappa, c + 1, -q, -q/tau, : f1conv: Computation not possible
It works with the option userflag = 1. So perhaps userflag = 1 would be a better default value:
cdfValues <- sapply(q, function(x){
pB3(x, c, d, kappa, tau, userflag = 1)
})
plot(empirical_cdf, xlim = c(0,4))
points(q, cdfValues, pch=19)
Application to the Bayesian binomial model
Consider the binomial statistical model parameterized by the odds parameter $\phi$: $$L(\phi \mid x) \propto \frac{\phi^x}{(1+\phi)^n}.$$ Take a $\mathcal{B}_3(c,d,\kappa,\tau)$ prior distribution on $\phi$: $$\pi(\phi) \propto \frac{\phi^{c-1}(1+\phi)^{-\kappa}}{\left(1+\frac{\phi}{\tau}\right)^{c+d-\kappa}}.$$ Then the posterior distribution on $\phi$ is $\mathcal{B}_3(c+x, d+n-x, \kappa+n, \tau)$.
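We can check this conjugacy numerically: the product of the prior density and the likelihood, divided by the claimed posterior density, should be constant in $\phi$ (parameter and data values below are arbitrary):
# numerical check of the conjugacy: prior x likelihood / posterior should be constant
c <- 2; d <- 3; kappa <- 4; tau <- 5
n <- 10; x <- 3  # arbitrary binomial data
phi <- seq(0.2, 3, length.out = 8)
prior <- dB3(phi, c, d, kappa, tau)
likelihood <- phi^x / (1+phi)^n
posterior <- dB3(phi, c+x, d+n-x, kappa+n, tau)
prior * likelihood / posterior  # should be a constant vector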
Application to the Bayesian “two Poisson samples” model
Consider the statistical model given by two independent observations $$x \sim \mathcal{P}(\lambda S), \quad y \sim \mathcal{P}(\mu T)$$ where $S$ and $T$ are known “sample sizes”. We parametrize the model by $\mu$ and $\phi := \frac{\lambda}{\mu}$.
Assigning a Gamma prior distribution $\mathcal{G}(a,b)$ on $\mu$, it is not difficult to get $$(\mu \mid \phi, x, y) \sim \mathcal{G}(a+x+y,\, b+\phi S+T).$$ In their paper A Bayesian framework for the ratio of two Poisson rates in the context of vaccine efficacy trials (2011), Laurent & Legrand derived the marginal likelihood on $\phi$ in the case of the semi-conjugate family of prior distributions. Their result holds as long as $\mu$ and $\phi$ are independent under the prior distribution, and this result is $$\tilde{L}(\phi \mid x, y) \propto \frac{\phi^x}{(1+\rho\phi)^{a+x+y}} \quad \text{where } \rho = \frac{S}{T+b}.$$
Now, let's assign a $\mathcal{B}'(c,d,\tau)$ prior distribution on $\phi$. Then the posterior distribution on $\phi$ is given by $$\pi(\phi \mid x, y) \propto \pi(\phi)\,\tilde{L}(\phi \mid x, y) \propto \phi^{c+x-1}(1+\rho\phi)^{-(a+x+y)}\left(1+\frac{\phi}{\tau}\right)^{-(c+d)},$$ and by noting that $\frac{\phi}{\tau} = \frac{\rho\phi}{\rho\tau}$, we recognize the scaled $\mathcal{B}_3$ distribution $$(\phi \mid x, y) \sim \frac{1}{\rho} \times \mathcal{B}_3(c+x,\, a+d+y,\, a+x+y,\, \rho\tau).$$ In particular, if $\tau = \frac{1}{\rho} = \frac{T+b}{S}$, then we find $(\phi \mid x, y) \sim \mathcal{B}_2(c+x, a+d+y, \tau)$, and this is the situation of the semi-conjugate family studied by Laurent & Legrand.
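As before, we can check this numerically: the product of the $\mathcal{B}'(c,d,\tau)$ prior density and the marginal likelihood, divided by the density of $\frac{1}{\rho}\times\mathcal{B}_3(c+x, a+d+y, a+x+y, \rho\tau)$, should be constant in $\phi$ (all values below are arbitrary):
# numerical check of the posterior for the two Poisson samples model
c <- 2; d <- 3; tau <- 2.5         # prior parameters of B'(c,d,tau)
a <- 1.5; b <- 2                   # Gamma prior parameters on mu
S <- 10; T <- 15; x <- 7; y <- 12  # "sample sizes" and observed counts
rho <- S / (T + b)
phi <- seq(0.2, 3, length.out = 8)
prior <- dbetaprime(phi, c, d, tau)
marginal_likelihood <- phi^x / (1 + rho*phi)^(a+x+y)
# density of (1/rho) * B3(c+x, a+d+y, a+x+y, rho*tau) at phi:
posterior <- rho * dB3(rho*phi, c+x, a+d+y, a+x+y, rho*tau)
prior * marginal_likelihood / posterior  # should be a constant vector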