Standalone
How can we model a random variable that represents a proportion or a fraction of a quantity between 0 and 1?
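Beta distribution. Its probability density function is
( f(x; \alpha, \beta) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)} )
where ( x ) is in the interval [0,1], ( \alpha > 0 ) and ( \beta > 0 ) are shape parameters, and ( B(\alpha, \beta) ) is the Beta function, a normalization constant that ensures the total probability is 1.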
Simple Explanation: The Beta distribution is a continuous probability distribution defined between 0 and 1. It’s often used to model random variables that represent proportions or probabilities, since its support is the interval [0,1]. The shape of the distribution is determined by two parameters, ( \alpha ) and ( \beta ). By tweaking these parameters, the Beta distribution can take a variety of shapes: uniform, U-shaped, bell-shaped, or skewed left or right.
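To see how the two shape parameters change the curve, here is a minimal sketch that evaluates the density at a few points using only the Python standard library. The helper name `beta_pdf` and the specific parameter pairs are illustrative choices, not part of the original note.

```python
import math

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, for 0 < x < 1 (hypothetical helper)."""
    # log B(a, b) = lgamma(a) + lgamma(b) - lgamma(a + b)
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    log_density = (a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_norm
    return math.exp(log_density)

# Illustrative (alpha, beta) pairs, one per shape mentioned in the text.
shapes = {
    "uniform (1, 1)": (1, 1),
    "U-shaped (0.5, 0.5)": (0.5, 0.5),
    "bell-shaped (5, 5)": (5, 5),
    "mass near 0 (2, 8)": (2, 8),
    "mass near 1 (8, 2)": (8, 2),
}

grid = [0.1, 0.3, 0.5, 0.7, 0.9]
for label, (a, b) in shapes.items():
    row = "  ".join(f"{beta_pdf(x, a, b):5.2f}" for x in grid)
    print(f"{label:22s} {row}")
```

Reading each printed row left to right gives a rough picture of the curve: the U-shaped row is highest at the ends of the grid, the bell-shaped row is highest in the middle, and the last two rows pile up on one side.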
Example in Machine Learning/Data Science:
Since the Beta distribution has a domain from 0 to 1, it’s perfect for modelling quantities that also range over [0, 1]. Can you think of any? (Bayesian prior distributions)
- Probabilities
- Proportions
This is a bit of an involved example, so hold on. Your senior manager Ted comes to your desk and asks you for your prediction of the proportion of customers who will click through on a new ad. Instead of simply answering with a best guess like ‘27%’ (a point estimate), you can specify a distribution which shows how likely you think each plausible proportion is. For example: let’s say you think it will be close to either 10% or 90% (because of SEO factors which push the ad to one extreme or the other), and is unlikely to be anywhere in between.
You hand this back to Ted on a napkin*:
(diagram)
*Ted is furious, because he is a frequentist, and you get fired.
Warning: when you’re estimating probabilities instead of proportions, this becomes probability on top of probability - the probability of each probability being the true probability!
This is related to Bayesian statistics which, as you can see, involves making educated guesses about parameters (the prior) and updating our beliefs in light of new data (the posterior).
In essence, the Beta distribution, in this context, provides a flexible way to model your beliefs about a proportion (the CTR) and update them in light of new data.
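As a minimal sketch of that update step: if each ad impression is treated as an independent click / no-click trial (a binomial likelihood, which is an assumption added here), a Beta prior stays a Beta distribution after seeing data, with the clicks added to ( \alpha ) and the non-clicks added to ( \beta ). The function names, the prior parameters, and the click counts below are made up for illustration.

```python
def update_beta(alpha, beta, clicks, impressions):
    """Posterior Beta parameters after observing click data.

    Assumes a binomial likelihood: each impression is an independent
    click / no-click trial with the same click-through rate.
    """
    return alpha + clicks, beta + (impressions - clicks)

def beta_mean(alpha, beta):
    """Mean of Beta(alpha, beta)."""
    return alpha / (alpha + beta)

# Hypothetical U-shaped prior: most belief near the extremes.
prior = (0.5, 0.5)

# Made-up data: 27 clicks out of 100 impressions.
posterior = update_beta(*prior, clicks=27, impressions=100)

print("prior mean:    ", beta_mean(*prior))
print("posterior:     ", posterior)
print("posterior mean:", beta_mean(*posterior))
```

With this much data the posterior mean ends up close to the observed click rate, so a weak prior like this one is quickly dominated by the evidence.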
The Beta distribution models a random variable that can take values between 0 and 1, representing proportions or percentages. Its parameters and key moments are:
- Shape parameters ( \alpha ) and ( \beta )
- Expected value (mean) = ( \frac{\alpha}{\alpha + \beta} )
- Variance = ( \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)} )
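The mean and variance formulas above can be sanity-checked against random draws. This is a small sketch using the standard library's `random.betavariate`; the parameter values ( \alpha = 2 ), ( \beta = 5 ), the seed, and the sample size are arbitrary choices for illustration.

```python
import random
import statistics

def beta_mean(a, b):
    """Closed-form mean of Beta(a, b)."""
    return a / (a + b)

def beta_variance(a, b):
    """Closed-form variance of Beta(a, b)."""
    return (a * b) / ((a + b) ** 2 * (a + b + 1))

a, b = 2.0, 5.0          # arbitrary example parameters
rng = random.Random(0)   # fixed seed so the run is repeatable
samples = [rng.betavariate(a, b) for _ in range(100_000)]

print("mean:     formula", beta_mean(a, b), "sample", statistics.fmean(samples))
print("variance: formula", beta_variance(a, b), "sample", statistics.pvariance(samples))
```

The sample statistics should agree with the formulas to about two decimal places; the remaining gap is sampling noise.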
The Beta distribution is commonly used in Bayesian statistics as the prior for an unknown proportion or probability.