Data is the foundation of data analytics, machine learning, and artificial intelligence. Without data, we cannot train any model, and modern research and automation efforts would come to nothing. Large enterprises spend a great deal of money just to gather as much data as possible. Before moving on to the different types of probability distributions, let’s first see what a probability distribution is.
What is a Probability Distribution?
A probability distribution is a statistical function that describes all the possible values and likelihoods that a random variable can take within a given range. This range is bounded between the minimum and maximum possible values, but precisely where the possible value is likely to be plotted on the probability distribution depends on a number of factors. These factors include the distribution’s mean (average), standard deviation, skewness, and kurtosis.
Probability distributions are an integral part of machine learning, as they help analyze and visualize data.
Discrete and Continuous Variables
Before moving on to the different types of probability distributions, let’s understand some basic terms associated with the topic. The type of probability distribution depends upon whether the variable takes discrete values or continuous values.
A discrete variable can only take a countable set of distinct values from a given range. For example, the number of students in a class, the number of questions in a test paper, the number of children in a family, etc., are all discrete variables.
A continuous variable can take any value within a given range. For example, the height of a person, the weight of a person, temperature, etc., are all continuous variables.
Types of Probability Distribution
Following are the most common probability distributions used in different applications.
1. Bernoulli Distribution
Bernoulli distribution is a discrete probability distribution, meaning it’s concerned with discrete random variables. Bernoulli distribution applies to events that have one trial and two possible outcomes. Following are some of the examples of such experiments (known as the Bernoulli experiment).
- Will a coin, when tossed, land on the head? Here, since the coin is tossed only once, the number of trials is one, and there are two possible outcomes, viz. ‘Head’ and ‘Tail’ (getting ‘Head’ is a success and getting ‘Tail’ is a failure).
- Will I roll a six with a die? Here, the die is rolled once, so the number of trials is one, and there are two possible outcomes, viz. 6 or ‘not a 6’ (getting 6 is a success and getting ‘not a 6’ is a failure).
- Will a student pass an exam? As the exam is taken only once, here also the number of trials is one and has two possible outcomes viz. Pass or Fail.
In all Bernoulli trials, the two possible outcomes can be thought of in terms of “success” or “failure”.
The Bernoulli distribution is essentially a model for the set of possible outcomes of a Bernoulli trial. So, whenever you have an event that has only two possible outcomes, the Bernoulli distribution lets you calculate the probability of each outcome.
Let’s now understand how the probability of an event is calculated. A Bernoulli distribution has only two possible outcomes, namely 1 (success) and 0 (failure), and a single trial. Let us consider that the random variable X can take the value 1 with probability p and the value 0 with probability q (where q = 1 – p).
The probability mass function is given by: P(X = x) = p^x(1 – p)^(1 – x), where x can take the value 0 or 1.
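The formula above can be sketched directly in code. The helper name `bernoulli_pmf` below is illustrative, not from any particular library; the function only uses the Python standard library.

```python
def bernoulli_pmf(x: int, p: float) -> float:
    """Probability that a Bernoulli(p) variable takes the value x (0 or 1):
    p^x * (1 - p)^(1 - x)."""
    return p**x * (1 - p) ** (1 - x)

# A fair coin toss: success ("heads") and failure are equally likely.
print(bernoulli_pmf(1, 0.5))  # 0.5
print(bernoulli_pmf(0, 0.5))  # 0.5
```

Note that for a biased trial the two outcomes simply split the probability: `bernoulli_pmf(1, 0.3)` returns 0.3 and `bernoulli_pmf(0, 0.3)` returns 0.7.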
Bernoulli distribution has a crucial role to play in data analytics, data science, and machine learning. Some of the examples are
- A spam filter that detects whether an email should be classified as “spam” or “not spam”.
- A model that can predict whether a customer will take a certain action or not.
2. Uniform Distribution
Uniform distribution is a term used to describe a form of probability distribution where every possible outcome has an equal likelihood of happening. The probability is constant since each variable has equal chances of being the outcome.
For example, if you stand on a street corner and randomly hand a $100 bill to any lucky person who walks by, then every passerby has an equal chance of being handed the money. The probability is 1 divided by the total number of outcomes (the number of passersby). However, if you favoured short people or women, they would have a higher chance of being given the $100 bill than the other passersby. That would not be a uniform probability.
The probability density function of the uniform distribution is given by f(x) = 1/(b – a) for a ≤ x ≤ b (and 0 otherwise).
Consider the following example.
The number of bouquets sold daily at a flower shop is uniformly distributed with a maximum of 40 and a minimum of 10.
Let’s try calculating the probability that the daily sales will fall between 15 and 30.
The probability that daily sales will fall between 15 and 30 is (30 – 15)/(40 – 10) = 0.5.
Probability of event lying between x1 and x2 is (x2 – x1)/(b – a). In the above example x1 = 15, x2 = 30 and a = 10, b = 40.
Similarly, the probability that daily sales are greater than 20 is (40 – 20)/(40 – 10) = 0.667.
And the probability that daily sales are less than 25 is (25 – 10)/(40 – 10) = 0.5.
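The three calculations above follow the same pattern, so they can be wrapped in one small function. The helper name `uniform_prob` is an illustrative sketch, assuming the interval is clipped to [a, b] first.

```python
def uniform_prob(x1: float, x2: float, a: float, b: float) -> float:
    """P(x1 <= X <= x2) for X uniformly distributed on [a, b]."""
    lo, hi = max(x1, a), min(x2, b)  # clip the query interval to [a, b]
    return max(hi - lo, 0) / (b - a)

# Daily bouquet sales, uniform between a = 10 and b = 40:
print(uniform_prob(15, 30, 10, 40))  # 0.5   (sales between 15 and 30)
print(uniform_prob(20, 40, 10, 40))  # ~0.667 (sales greater than 20)
print(uniform_prob(10, 25, 10, 40))  # 0.5   (sales less than 25)
```

Clipping means "greater than 20" can be asked as the interval (20, b) and "less than 25" as (a, 25), matching the hand calculations above.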
3. Binomial Distribution
Let’s consider the case of cricket. Suppose that winning today’s toss counts as a success. You toss again tomorrow but lose this time. Winning the toss today does not guarantee that you will win it tomorrow; each toss is an independent trial.
The binomial distribution is a discrete probability distribution, meaning it’s concerned with discrete random variables. It can be thought of as simply the probability of “Success” or “Failure” outcome in an experiment that is repeated multiple times. In general, a binomial experiment is a Bernoulli experiment repeated ‘n’ number of times.
Let’s now understand how the probability of an event is calculated. A binomial distribution has only two possible outcomes, namely 1 (success) and 0 (failure), and ‘n’ number of trials. Let us consider that the random variable X can take the value 1 with probability p and the value 0 with probability q (where q = 1 – p).
The probability mass function is given by: P(X = x) = nCx p^x(1 – p)^(n – x), where nCx = n!/(x!(n – x)!).
Consider the following example.
The probability of a team winning a match is 0.8 (80%). If it plays 5 matches, what is the probability that it will win exactly 3 of them?
Here, p = 0.8, q = 1 – 0.8 = 0.2, n = 5 and x = 3. Therefore, the probability of winning 3 matches out of 5 is (5!/(3!(5 – 3)!))(0.8)^3(0.2)^(5 – 3) = 10 × 0.512 × 0.04 = 0.2048 (20.48%).
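The same calculation can be checked with a short script. The helper name `binomial_pmf` is illustrative; `math.comb` from the standard library computes nCx.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p): nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Probability of winning exactly 3 of 5 matches when p = 0.8:
print(binomial_pmf(3, 5, 0.8))  # ~0.2048 (20.48%)
```

Note that with n = 1 this reduces to the Bernoulli case, since 1C0 = 1C1 = 1.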
4. Normal Distribution
A normal distribution, sometimes called the bell curve, is a continuous distribution that occurs naturally in many situations. For example, the bell curve is seen in standardized tests like the SAT and GRE. The bulk of students will score around the average (a C), while smaller numbers of students will score a B or a D.
An even smaller percentage of students score an F or an A. This creates a distribution that resembles a bell. The bell curve is symmetrical, i.e., the line at the centre of the curve (which represents the mean, median, and mode of the distribution) divides it into two equal halves. Half of the data falls to the left of the mean; half falls to the right.
The empirical rule tells you what percentage of your data falls within a certain number of standard deviations from the mean:
- 68.3% of the data falls within one standard deviation of the mean.
- 95.5% of the data falls within two standard deviations of the mean.
- 99.7% of the data falls within three standard deviations of the mean.
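The empirical rule can be verified numerically: for a normal distribution, the probability of landing within k standard deviations of the mean is erf(k/√2), which the standard library exposes as `math.erf`. The helper name `within_k_sigma` is illustrative.

```python
from math import erf, sqrt

def within_k_sigma(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations
    of the mean: P(|Z| <= k) = erf(k / sqrt(2)) for a standard normal Z."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"{k} sigma: {within_k_sigma(k):.2%}")
# 1 sigma: 68.27%
# 2 sigma: 95.45%
# 3 sigma: 99.73%
```

The more precise values 68.27%, 95.45%, and 99.73% round to the 68.3%, 95.5%, and 99.7% quoted above.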
5. Poisson Distribution
The Poisson distribution is a discrete distribution that measures the probability of a given number of events happening in a specified time period. In finance, the Poisson distribution could be used to model the arrival of new buy or sell orders entered into the market or the expected arrival of orders at specified trading venues. Poisson distributions are very useful for smart order routers and algorithmic trading.
The probability mass function of random variable X is given by: P(X = x) = e^(–μ)(μ^x/x!),
where μ is the mean number of events and x is the number of events in that interval.
Let’s consider the following example to understand it better. A customer service centre receives ten emails every hour and you are interested in finding the probability that it receives six emails in the next hour. Here, μ = 10 and x = 6, so the required probability is P(X = 6) = e^(–10)(10^6/6!) = 0.0631 = 6.31%.
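The email example can be reproduced with a few lines of standard-library Python. The helper name `poisson_pmf` is illustrative.

```python
from math import exp, factorial

def poisson_pmf(x: int, mu: float) -> float:
    """P(X = x) for X ~ Poisson(mu): e^(-mu) * mu^x / x!"""
    return exp(-mu) * mu**x / factorial(x)

# Probability of receiving 6 emails in an hour when the mean is 10 per hour:
print(round(poisson_pmf(6, 10), 4))  # 0.0631
```

Summing `poisson_pmf(k, 10)` over all non-negative k approaches 1, as a probability mass function should.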