Probability Distribution :-

In this article, our main focus is to understand the importance of probability in AI, ML, and data analysis. We will explore the probability distribution functions in detail with day-to-day examples.

Why Learn Probability?

Probability gives us the mathematical calculations and distributions that help us visualize what is happening underneath the data.

Many algorithms are designed using probability.

  1. These range from individual algorithms like the Naive Bayes algorithm, which is constructed using Bayes' Theorem.
  2. It also extends to whole fields of study, such as Probabilistic Graphical Models, often called Graphical Models, which are designed around Bayes' Theorem.
  3. Probability is also used in models such as "Bayesian Belief Networks" or Bayes Nets, which are capable of capturing the conditional dependencies between variables.
  4. Models are trained using iterative algorithms designed under a probabilistic framework, like "Maximum Likelihood Estimation". This is widely used in unsupervised data clustering, e.g., estimating the k cluster centers in the "k-means clustering algorithm".
  5. Models can be tuned with probabilistic "Bayesian Optimization".
  6. Probabilistic measures, such as ROC curves, are used to evaluate models.

Introduction To Probability Distribution:-

Let us suppose you are a professor at a university. After checking assignments for a week, you graded all the students. You gave these graded papers to your assistant and told him to create a spreadsheet containing the grades of all the students. But he only stored the grades, not the corresponding students' names.

Scores without Student names

He made another blunder: he missed a couple of entries in a hurry, and we have no idea whose grades are missing. Let's find a way to solve this.

One way is that you visualize the grades and see if you can find a trend in the data.

Frequency Distribution of the data.

The graph that we have plotted is called the frequency distribution of the data. You can see that there is a smooth curve-like structure that defines our data, but do you notice an anomaly? There is an abnormally low frequency at a particular score range. So the best guess would be that the missing values are the ones that would remove the dent in the distribution.

This is how we would try to solve a real-life problem using data analysis. For any data scientist, student, or practitioner, distributions are a must-know concept. They provide the basis for analytics and inferential statistics.
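The frequency-distribution idea above can be sketched in a few lines of Python. The scores below are hypothetical (the article does not give the actual data); we simply bucket them into ranges and count how often each range occurs:

```python
from collections import Counter

# Hypothetical graded scores (the student names were not recorded)
scores = [62, 65, 67, 70, 71, 72, 73, 74, 75, 75, 76, 78, 80, 81, 85, 88]

# Bucket the scores into ranges of width 10 and count the frequency of each bucket
buckets = Counter((s // 10) * 10 for s in scores)

for low in sorted(buckets):
    print(f"{low}-{low + 9}: {buckets[low]}")
```

Plotting these counts as a bar chart gives exactly the kind of frequency distribution described above, and a bucket with an unusually low count is where the missing entries most likely belong.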

Common Data Types :-

Discrete Data or Variables :- As the name suggests, this variable can take only specified values. For example, when you roll a die, the possible outcomes are 1, 2, 3, 4, 5, or 6, and not 1.5 or 2.45.

Continuous Data or Variables :- These can take any value within a given range. The range may be finite or infinite. For example: weight, height, or the length of a road. The weight can be any value such as 54 kg, 54.5 kg, or 54.5436 kg, and the length can likewise take any value.

Types of Distributions :-

1) Bernoulli Distribution :-

For example, in a cricket match, we decide who is going to bat or bowl by a toss, as it has only two outcomes: heads or tails. There is no midway.

Bernoulli distribution has only two possible outcomes, namely 1 (success) and 0 (failure), and a single trial. So the random variable X which has a Bernoulli distribution can take value 1 with the probability of success, say p, and the value 0 with the probability of failure, say q or 1-p.

Here, the occurrence of a head denotes success, and the occurrence of a tail denotes failure.
Probability of getting a head = 0.5 = Probability of getting a tail since there are only two possible outcomes.

The probability mass function is given by: P(X = x) = p^x (1 − p)^(1 − x), where x ∈ {0, 1}.
It can also be written as

Bernoullis Function in Maths

The probabilities of success and failure need not be equally likely, like a coding competition between me and "Guido van Rossum". He is pretty much certain to win, as he is far more experienced. So in this case the probability of my success is 0.15 while that of my failure is 0.85; his winning is 0.85 and his loss is 0.15.

Here, the probability of success (p) is not the same as the probability of failure. The chart below shows the Bernoulli distribution of our contest.

Here, the probability of success = 0.15 and the probability of failure = 0.85. The expected value of any distribution is the mean of the distribution. The expected value of a random variable X from a Bernoulli distribution is found as follows:

E(X) = 1*p + 0*(1-p) = p

The variance of a random variable from a Bernoulli distribution is:

V(X) = E(X²) – [E(X)]² = p – p² = p(1-p)

There are many examples of the Bernoulli distribution, such as whether it is going to rain tomorrow or not, where rain denotes success and no rain denotes failure, or winning (success) versus losing (failure) a game.
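The PMF, mean, and variance above can be sketched directly in Python, using the p = 0.15 success probability from the coding-competition example:

```python
def bernoulli_pmf(x, p):
    """PMF of a Bernoulli(p) variable: p^x * (1 - p)^(1 - x), for x in {0, 1}."""
    return p ** x * (1 - p) ** (1 - x)

p = 0.15                   # probability of success from the example above
mean = 1 * p + 0 * (1 - p)  # E(X) = p
variance = p * (1 - p)      # V(X) = p(1 - p)

print(bernoulli_pmf(1, p))  # probability of success
print(bernoulli_pmf(0, p))  # probability of failure
print(mean, variance)
```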

2) Uniform Distribution :-

When you roll a fair die, the outcomes are 1 to 6. The probabilities of these outcomes are equally likely, and that is the basis of a uniform distribution. Unlike the Bernoulli distribution, all n possible outcomes of a uniform distribution are equally likely.

A variable X is said to be uniformly distributed if its density function is:

f(x) = 1 / (b − a) for a ≤ x ≤ b (and 0 otherwise)

The graph of a uniform distribution curve looks like

Uniform Distributions Curve.

You can see that the shape of the uniform distribution curve is rectangular, which is why the uniform distribution is often called the rectangular distribution.

For a Uniform Distribution, a and b are the parameters. 

Suppose daily sales are uniformly distributed between a = 10 and b = 40. Let's try calculating the probability that the daily sales will fall between 15 and 30.

The probability that daily sales will fall between 15 and 30 is (30-15)*(1/(40-10)) = 0.5

Similarly, the probability that daily sales are greater than 20 is (40 − 20)*(1/(40 − 10)) ≈ 0.667.
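Both calculations follow the same pattern (interval width times the constant density), which is easy to capture in a small helper. This is a minimal sketch assuming the a = 10, b = 40 sales example above:

```python
def uniform_prob(x1, x2, a, b):
    """P(x1 < X <= x2) for X ~ Uniform(a, b): interval width times the density 1/(b - a)."""
    lo, hi = max(x1, a), min(x2, b)
    return max(hi - lo, 0) / (b - a)

# Daily sales assumed uniform between a = 10 and b = 40
print(uniform_prob(15, 30, 10, 40))  # 0.5
print(uniform_prob(20, 40, 10, 40))  # ≈ 0.667
```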

The mean and variance of X following a uniform distribution is:

Mean -> E(X) = (a+b)/2

Variance -> V(X) =  (b-a)²/12

The standard uniform density has parameters a = 0 and b = 1, so the PDF for the standard uniform density is given by: f(x) = 1 for 0 ≤ x ≤ 1 (and 0 otherwise).

Standard Uniform Density

3) Binomial Distribution :-

Suppose that you won the toss today, and this indicates a successful event. You toss again, but you lose this time. If you win a toss today, this does not mean that you will win the toss tomorrow. Let's assign a random variable, say X, to the number of times you won the toss. What can be the possible values of X? It can be any number up to the number of times you tossed the coin.

There are only two possible outcomes: a head denoting success and a tail denoting failure. Therefore, the probability of getting a head = 0.5, and the probability of failure can be easily computed as q = 1 − p = 0.5.

A distribution where only two outcomes are possible, such as success or failure, gain or loss, or win or lose, and where the probability of success and failure is the same for all trials, is called a binomial distribution.

The outcomes need not be equally likely. Remember the example of a game of chess between me and Viswanathan Anand? So, if the probability of success in an experiment is 0.2, then the probability of failure can be easily computed as q = 1 − 0.2 = 0.8.

Each trial is independent, since the outcome of the previous toss doesn't determine or affect the outcome of the current toss. An experiment with only two possible outcomes repeated n times is called a binomial experiment. The parameters of a binomial distribution are n and p, where n is the total number of trials and p is the probability of success in each trial.

On the basis of the above explanation, the properties of a binomial distribution are:

  1. Each trial is independent.
  2. There are only two possible outcomes in a trial- either a success or a failure.
  3. A total number of n identical trials are conducted.
  4. The probability of success and failure is same for all trials. (Trials are identical.)

The probability mass function of the binomial distribution is given by:

P(X = x) = C(n, x) * p^x * (1 − p)^(n − x), for x = 0, 1, ..., n

Binomial Distribution

A binomial distribution graph where the probability of success does not equal the probability of failure looks like the one below.

Probability of Success!=Probability of Failure

Now, when the probability of success equals the probability of failure, the graph of the binomial distribution looks like this:

Binomial Distribution, when Probability of Success=Probability of Failure

The mean and variance of a binomial distribution are given by:

Mean -> µ = n*p

Variance -> Var(X) = n*p*q
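The PMF, mean, and variance above can be checked with a short Python sketch. The n = 10 fair-coin setting is an illustrative assumption, not from the article:

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p): C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.5                # e.g. 10 tosses of a fair coin
mean = n * p                  # µ = n*p
variance = n * p * (1 - p)    # Var(X) = n*p*q

print(binomial_pmf(5, n, p))  # probability of exactly 5 heads out of 10
print(mean, variance)
```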

4) Normal Distribution :-

The normal distribution represents the behavior of most situations in the universe (that is why it is called a "normal" distribution, I guess!). The sum of many (small) random variables often turns out to be normally distributed, which contributes to its widespread application. A distribution is known as a normal distribution if it has the following characteristics:

  1. The mean, median, and mode of the distribution coincide.
  2. The curve of the distribution is bell-shaped and symmetrical about the line x=μ.
  3. The total area under the curve is 1.
  4. Exactly half of the values are to the left of the center and the other half to the right.

A normal distribution is very different from a binomial distribution. However, as the number of trials approaches infinity, the two shapes become quite similar.

The PDF of a random variable X following a normal distribution is given by :-

f(x) = (1 / (σ√(2π))) * e^(−(x − µ)² / (2σ²))

PDF of a Normal Distribution.

The mean and variance of a random variable X which is normally distributed are given by:

Mean -> E(X) = µ

Variance -> Var(X) = σ²

Here, µ (mean) and σ (standard deviation) are the parameters.
The graph of a random variable X ~ N (µ, σ) is shown below.

Normal Distribution with Different Mean and SD.

Standard Normal Distribution :-

A standard normal distribution is defined as the distribution with mean 0 and standard deviation 1. For such a case, the PDF becomes:

f(x) = (1 / √(2π)) * e^(−x² / 2)

A Standard Normal Distribution Equation.
A Standard Normal Distribution Graph with Mean=0, SD=1.
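The normal PDF above is straightforward to evaluate with the standard library; this sketch defaults to the standard normal (µ = 0, σ = 1):

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu=0.0, sigma=1.0):
    """PDF of N(mu, sigma): (1 / (sigma * sqrt(2*pi))) * exp(-(x - mu)**2 / (2 * sigma**2))."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

# The standard normal peaks at x = 0 and is symmetric about it
print(normal_pdf(0))   # ≈ 0.3989
print(normal_pdf(1), normal_pdf(-1))  # equal, by symmetry
```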

5) Poisson Distribution :- My favourite, so I will extend it a bit.

Suppose you have an instrument with computer vision technology fixed to a traffic camera, counting the number of cars that pass at different times of the day. The count can be any random number, and it is actually observed to follow a "Poisson distribution".

Suppose you are a customer care representative (who will surely be replaced by chatbots in the near future): approximately how many calls or enquiries do you get in a day? It can be any number. The total number of calls at a call center in a day is modeled by a Poisson distribution. Some more examples are:

  1. The number of people visiting a website daily.
  2. The number of emergency calls recorded at a hospital in a day.
  3. The number of customers arriving at a shop in an hour.
  4. The number of suicides reported at IITs or NITs (as I recently observed), etc.

Note :- The Poisson distribution is applicable in situations where events occur at random points of time and space, and our interest lies only in the number of occurrences of the event.

A distribution is called a Poisson distribution when the following assumptions are valid :-

1. Any successful event should not influence the outcome of another successful event.
2. The average rate of success is constant: the expected number of events is proportional to the length of the interval.

Note :- (For example, the expected number of cars in 1 hour is one-twelfth of the expected number of cars in 12 hours; the events are spread evenly over time.)
3. The probability of success in an interval approaches zero as the interval becomes smaller.

Note :- (Here we are shrinking the interval only; e.g., the number of cars in 1 second is almost always 0, so the probability of a success in that tiny interval is also close to 0.)

Now, if any distribution satisfies the above assumptions, it is a Poisson distribution. Some notations used in the Poisson distribution are:

  • “λ” is the rate at which an event occurs,
  • “t” is the length of a time interval,
  • And “X” is the number of events in that time interval.

Here, "X" is called a "Poisson random variable" and the probability distribution of "X" is called the "Poisson distribution".

Let µ denote the mean number of events in an interval of length t. Then,

µ = λ*t.

The PMF of X following a Poisson distribution is given by:

P(X = x) = (e^(−µ) * µ^x) / x!, for x = 0, 1, 2, ...

Poisson Distribution

The mean µ is the parameter of this distribution; µ is λ times the length of the interval. The graph of a Poisson distribution is shown below:

Mean is 5.5

The graph shown below illustrates the shift in the curve due to increase in mean.

Mean Shifted to 8.5

It can be understood that as the mean increases, the curve shifts to the right.

The mean and variance of X following a Poisson distribution:

Mean -> E(X) = µ
Variance -> Var(X) = µ

Poisson Distributions with different Means but same Area under the Curve.
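The Poisson PMF and the µ = λ*t relation above can be sketched as follows; the rate of 4 calls per hour over 2 hours is an illustrative assumption:

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(X = x) for X ~ Poisson(mu): e^(-mu) * mu^x / x!."""
    return exp(-mu) * mu ** x / factorial(x)

lam, t = 4, 2      # e.g. 4 calls per hour, observed over 2 hours
mu = lam * t       # µ = λ*t = 8

print(poisson_pmf(8, mu))  # probability of exactly 8 calls in 2 hours
print(poisson_pmf(0, mu))  # probability of no calls at all
```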

6) Exponential Distribution :-

Let us consider that "Grouply" has made it to the top educational products and well-known educational sites; many people are visiting the site, and enquiries are coming in randomly (more are definitely going to arrive at time t+1). What is the time interval between enquiries? How can we estimate it, so that we can arrange our workforce accordingly? The solution to this is the "exponential distribution". Let us see some more examples.

The exponential distribution is widely used for survival analysis. From the expected life of a machine to the expected life of a human, the exponential distribution successfully helps us estimate such results.

A random variable X is said to have an Exponential Distribution with PDF:

f(x) = λe^(−λx) for x ≥ 0 (and 0 otherwise)

with parameter λ > 0, which is also called the rate.

Exponential Distribution

For survival analysis, λ is called the failure rate of a device at any time t, given that it has survived up to time t.

Mean and Variance of a random variable X following an exponential distribution:

Mean -> E(X) = 1/λ

Variance -> Var(X) = (1/λ)²

Note :- The greater the rate, the faster the curve drops; the lower the rate, the flatter the curve. This is explained better with the graph shown below.

Exponential Distribution with Different Failure rate

To ease the computation, there are some formulas given below.

1) P{X ≤ x} = 1 − e^(−λx), which corresponds to the area under the density curve to the left of x.

2) P{X > x} = e^(−λx), which corresponds to the area under the density curve to the right of x.

3) P{x1 < X ≤ x2} = e^(−λx1) − e^(−λx2), which corresponds to the area under the density curve between x1 and x2.
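The three formulas above all follow from the exponential CDF, as this sketch shows (the rate λ = 0.5 is an illustrative assumption):

```python
from math import exp

def exp_cdf(x, lam):
    """P(X <= x) for X ~ Exponential(lam): 1 - e^(-lam * x)."""
    return 1 - exp(-lam * x)

lam = 0.5  # hypothetical rate
print(exp_cdf(2, lam))                     # 1) P(X <= 2) = 1 - e^(-lam*2)
print(1 - exp_cdf(2, lam))                 # 2) P(X > 2)  = e^(-lam*2)
print(exp_cdf(4, lam) - exp_cdf(1, lam))   # 3) P(1 < X <= 4) = e^(-lam*1) - e^(-lam*4)
```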

Relations between the Distributions :-

Actually, all these distributions are interrelated if we look closely. Let us explore the relations between them.

a) Relation between Bernoulli and Binomial Distribution

Bernoulli Distribution is a special case of Binomial Distribution with a single trial.

  1. There are only two possible outcomes of a Bernoulli and a binomial distribution, namely success and failure.
  2. Both Bernoulli and binomial distributions have independent trials.

b) Relation between Poisson and Binomial Distribution

The "Poisson distribution" is a limiting case of the "binomial distribution" under the following conditions:

  1. The number of trials is indefinitely large, or n → ∞.
  2. The probability of success for each trial is the same and indefinitely small, or p → 0.
  3. np = λ is finite.
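This limiting behaviour is easy to verify numerically. The sketch below, with an assumed fixed λ = 3, holds np constant while n grows and compares the two PMFs at x = 2:

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, mu):
    """P(X = x) for X ~ Poisson(mu)."""
    return exp(-mu) * mu ** x / factorial(x)

# As n grows with n*p = 3 fixed, the binomial PMF approaches Poisson(3)
lam = 3
for n in (10, 100, 10000):
    p = lam / n
    print(n, binomial_pmf(2, n, p), poisson_pmf(2, lam))
```

Each successive row gets closer to the Poisson value, illustrating the limit described above.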

c) Relation between Normal and Binomial Distribution & Normal and Poisson Distribution:

“Normal distribution” is another limiting form of “Binomial Distribution” under the following conditions:

  1. The number of trials is indefinitely large, n → ∞.
  2. Neither p nor q is indefinitely small.

Note :- The normal distribution is also a limiting case of Poisson distribution with the parameter λ →∞.

d) Relation between Exponential and Poisson Distribution:

If the times between random events follow an exponential distribution with rate λ, then the total number of events in a time period of length t follows a Poisson distribution with parameter λt.

All the best !

” An Investment in knowledge pays the best Interest. ” – Benjamin Franklin

“Keep Learning and Sharing” with


Satyajit Das
