Photo by Burak Kebapci from Pexels

Estimating profit and having fun with normal distributions in Python

Jonathan Serrano
6 min read · Nov 27, 2021

Suppose you have gathered evidence showing that buying asset Crypto X and holding it for one month yields a profit (in %) that is accurately modeled by a normal distribution with mean -1.5 and standard deviation 3, like the one below.

Figure 1. Probability density function (PDF) of buying and holding asset Crypto X for 1 month. Mean is -1.5 and standard deviation is 3.

The figure shows the probability density function (PDF) of the expected returns of buying and holding asset Crypto X for 1 month.

Yes, I know, it would be a miracle if modeling returns were this easy, but hey! This is about playing with normal distributions in Python, so let’s get back to the example.

Estimating the probability of a single profit

As investors, we strive to estimate how valuable an investment is, and the PDF above can help us do just that. Suppose we want to know how likely a profit of 2.5% is. First we find 2.5 on the x-axis of Figure 1, then we move up until we hit the curve, and finally we read off the corresponding y value, which is roughly 0.05. Note the word roughly: as savvy investors, knowing something roughly is not good enough; we would rather know the exact value. (Strictly speaking, this y value is a probability density rather than a probability: for a continuous distribution, probabilities come from areas under the curve, as we will see later.)

We can calculate it using the PDF function f(x) of a normal distribution.
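For reference, this is the standard textbook form of the normal density with mean μ and standard deviation σ, followed by the substitution for our example (μ = -1.5, σ = 3, x = 2.5). The symbol names are the usual convention, not something specific to this post.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)

f(2.5) = \frac{1}{3\sqrt{2\pi}} \exp\!\left(-\frac{\bigl(2.5-(-1.5)\bigr)^2}{2\cdot 3^2}\right)
       = \frac{1}{3\sqrt{2\pi}}\, e^{-16/18} \approx 0.0547
```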

Perhaps this is NOT a good idea… Too many calculations and substitutions, so let us allow Python to do the calculations with this code.

from scipy import stats

u = -1.5   # mean of the monthly profit (%)
sigma = 3  # standard deviation of the monthly profit (%)
ep_dist = stats.norm(u, sigma)

The stats module from scipy does the heavy lifting here. At this point we have an instance of a normal distribution with mean -1.5 and standard deviation 3 stored in the variable ep_dist, which stands for expected profit distribution.

Now, to answer the question we type.

prob_of_x = ep_dist.pdf(2.5)
# prob_of_x = 0.05467002489199788
# our visual guess was 0.05, not bad!

And that’s it, we made our first PDF calculation in just a few lines of code, sparing ourselves the substitutions in the PDF equation.
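As a sanity check, the same number can be reproduced by hand. The sketch below assumes the standard normal density formula; the helper name normal_pdf is made up for this illustration. It also shows how a density turns into an actual probability: over a narrow interval, the probability is approximately the density times the interval width.

```python
import math
from scipy import stats

mu, sigma = -1.5, 3
ep_dist = stats.norm(mu, sigma)

def normal_pdf(x, mu, sigma):
    """Normal density written out by hand (hypothetical helper)."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

manual = normal_pdf(2.5, mu, sigma)   # formula by hand
library = ep_dist.pdf(2.5)            # scipy's result

# Probability of a profit between 2.4% and 2.6% (an interval of width 0.2)
interval_prob = ep_dist.cdf(2.6) - ep_dist.cdf(2.4)
# Density times width gives nearly the same number
approx_prob = library * 0.2

print(manual, library)
print(interval_prob, approx_prob)
```

The two densities agree, and the interval probability is close to one percent, which is a more honest answer to "how likely is a profit of about 2.5%?" than the raw density value.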

Plotting a normal distribution

The code to generate the first plot is this one.

import numpy as np
import matplotlib.pyplot as plt

# x values (profits)
# and corresponding y values (PDF evaluated at each x)
x = np.linspace(-10, 7.5, 100)
y = ep_dist.pdf(x)
plt.figure(figsize=(7.5, 2.5))
plt.plot(x, y)
plt.title('Profit distribution for asset Crypto X', fontsize=15)
plt.xlabel('Expected profit (%)', fontsize=15)
plt.ylabel('Probability', fontsize=15)
plt.show()

If you are familiar with Matplotlib there is nothing new here. The code generates an array of x values and calculates the PDF y values, then plots the whole thing and adds some labels.

Is this investment opportunity worthwhile?

The answer is: it depends. On what? To answer this question we will throw in some other assumptions.

  1. We have $100 to invest (amount chosen to make calculations easier).
  2. We can afford at most a 30% chance of losing more than 2.5%.
  3. We expect to earn a return of at least 2%, or $2 given assumption (1).

Quite eclectic investors we are. Now let’s convert these assumptions into numbers.

Assumption (2) implies calculating the area below the PDF and to the left of a vertical line crossing the x-axis at -2.5, which corresponds to the interval (-infinity, -2.5], as shown in blue in Figure 2.

Figure 2. The probability of achieving a profit smaller than -2.5%, i.e. of losing more than 2.5%.
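A figure like this one can be sketched with Matplotlib's fill_between. The snippet below is a minimal illustration, not the exact code behind the original figure; the colors, transparency and grid size are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

ep_dist = stats.norm(-1.5, 3)
loss_limit = -2.5  # losing more than 2.5% means a profit below -2.5

x = np.linspace(-10, 7.5, 500)
y = ep_dist.pdf(x)

fig, ax = plt.subplots(figsize=(7.5, 2.5))
ax.plot(x, y)
# Shade the area under the curve to the left of the loss limit
ax.fill_between(x, y, where=(x <= loss_limit), alpha=0.4)
# Mark the loss limit with a dashed vertical line
ax.axvline(loss_limit, linestyle='--')
ax.set_xlabel('Expected profit (%)')
ax.set_ylabel('Probability density')
ax.set_title('Probability of losing more than 2.5%')
# Call plt.show() or fig.savefig('figure2.png') to see the result
```

The shaded area is exactly the quantity that assumption (2) constrains.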

Assumption (3) requires calculating an area too, but this time to the right of the desired profit.

In both cases the task reduces to evaluating the cumulative distribution function (CDF) at a certain value of the expected profit on the x-axis. And the good news is that Python makes this a breeze.

Estimating the CDF

The cumulative distribution function (CDF) is the integral of the PDF, i.e. the area under the PDF curve to the left of a given point. Figure 3 shows the profit PDF in blue and its corresponding CDF in orange.

Figure 3. PDF and CDF (area under the curve created by the PDF) of a normal distribution.

Let’s use Python to calculate the CDF at x=-2.5 like this.

ep_dist.cdf(-2.5)
# returns 0.36944134018176367

And that is all. A stats.norm object has a cdf() method that returns the cumulative area to the left of a given point. In this case the number means that there is a 36.9% probability of losing 2.5% or more.
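To convince ourselves that the CDF really is the area under the PDF, we can integrate the PDF numerically and compare. This is only a cross-check sketch using scipy.integrate.quad; in practice cdf() is the right tool.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

ep_dist = stats.norm(-1.5, 3)

# Numerically integrate the PDF from -infinity up to -2.5.
# quad returns the integral and an estimate of its absolute error.
area, abs_error = quad(ep_dist.pdf, -np.inf, -2.5)

# The closed-form answer provided by scipy
cdf_value = ep_dist.cdf(-2.5)

print(area, cdf_value)
```

Both numbers should agree to many decimal places.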

To better understand what is going on, let us convert the x-axis profits from percentages to monetary profits, using the fact that we have $100 to invest. The normal distribution now looks like Figure 4. Note that everything is the same except for the x-axis units, which are now $ instead of %. This is why we chose $100 to invest ;).

Figure 4. The same distribution using $ in x-axis instead of a percentage.
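With any other amount, the conversion is a simple rescaling: both the mean and the standard deviation are multiplied by capital / 100. The sketch below uses a hypothetical capital of $250 (not a figure from this post) to show that probabilities do not change when we switch units.

```python
from scipy import stats

mu_pct, sigma_pct = -1.5, 3
capital = 250.0  # hypothetical amount, chosen only for illustration

# Distribution of the profit in percent
pct_dist = stats.norm(mu_pct, sigma_pct)
# Distribution of the profit in dollars: scale mean and std by capital / 100
usd_dist = stats.norm(mu_pct / 100 * capital, sigma_pct / 100 * capital)

loss_pct = -2.5
loss_usd = loss_pct / 100 * capital  # -6.25 dollars

p_pct = pct_dist.cdf(loss_pct)  # P(profit <= -2.5%)
p_usd = usd_dist.cdf(loss_usd)  # P(profit <= -$6.25)

print(p_pct, p_usd)
```

The two probabilities match, so working in % or in $ is purely a matter of convenience.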

The probability of earning at least $2 can be calculated like this.

# probability of earning $2 or more
1 - ep_dist.cdf(2)
# returns 0.12167250457438117

# probability of earning a profit larger than zero
1 - ep_dist.cdf(0)
# returns 0.3085375387259869

This means that there is only a 12.17% probability of getting a profit of $2 or more. Quite low to be honest.
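As a side note, scipy distributions also expose a survival function, sf(), defined as 1 - cdf. A minimal sketch of the same calculation written that way:

```python
from scipy import stats

ep_dist = stats.norm(-1.5, 3)

# Right-tail probability computed as the complement of the CDF
p_complement = 1 - ep_dist.cdf(2)
# Same quantity via the survival function
p_sf = ep_dist.sf(2)

print(p_complement, p_sf)
```

For this example the two are interchangeable; sf() tends to matter more far out in the tail, where 1 - cdf loses precision.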

Calculating the overall expected profit

What is the expected profit value of investing $100 in an investment opportunity that follows this distribution?

An expected value is the sum of every possible outcome (a point of what is called the support) times the probability of it occurring. If we bet on the flip of a fair coin, winning $100 on heads and losing $50 on tails, our expected value is 0.5 * 100 + 0.5 * (-50) = 25.
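The coin example translates directly into code: pair each outcome with its probability, multiply, and add. The variable names are just illustrative.

```python
# Possible outcomes in dollars and their probabilities (fair coin)
outcomes = [100.0, -50.0]      # win $100 on heads, lose $50 on tails
probabilities = [0.5, 0.5]

# Expected value: sum of outcome times probability
expected_value = sum(p * v for p, v in zip(probabilities, outcomes))

print(expected_value)  # 25.0
```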

Following this example, to estimate our expected profit we need to divide the PDF into n slices and calculate each slice’s area by multiplying its width by its height. Then we multiply each of these areas by the slice’s support value in $. Finally we sum the n products, and the result is the expected profit. It sounds hard, but again Python does the trick in one line.

np.trapz(y * x, x)
# returns -1.4861696910157425

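We should remember that x is the support (the slice edges) and y is the PDF value at each point of the support. NumPy’s trapz function integrates a function using the trapezoidal rule, given its sampled values and the points where they were taken (in NumPy 2.0 and later the same function is available as np.trapezoid, and trapz is deprecated). Since we invested $100, the expected profit of this investment is about -$1.49 (note the negative number). The exact expected value of a normal distribution is simply its mean, -1.5; the small difference arises because x only covers the interval from -10 to 7.5.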
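The effect of the grid is easy to see in a small experiment. The sketch below uses scipy.integrate.trapezoid (the same trapezoidal rule) on the grid from this post and on a wider, finer grid, and compares both with the mean reported by scipy. The wider grid limits (ten standard deviations on each side) are an arbitrary choice for illustration.

```python
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid

mu, sigma = -1.5, 3
ep_dist = stats.norm(mu, sigma)

# Same grid as in the post: 100 points between -10 and 7.5
x = np.linspace(-10, 7.5, 100)
narrow = trapezoid(x * ep_dist.pdf(x), x)

# Wider and finer grid, so that almost no tail area is cut off
x_wide = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 2001)
wide = trapezoid(x_wide * ep_dist.pdf(x_wide), x_wide)

# Exact expected value of the distribution
exact = ep_dist.mean()

print(narrow, wide, exact)
```

The narrow grid lands near -1.486, while the wide grid is practically indistinguishable from the exact mean of -1.5.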

Investment decision

The probability of losing more than 2.5% is 36.9%. Since 36.9% is larger than 30%, this is the first red flag. The probability of earning more than 2% ($2) is only 12.17%. Is it good enough? Most certainly not. Second red flag.

And finally, the expected profit of investing $100 of our money for one month in this investment is about -$1.49.

No good business!

Unless we like to lose money we should pass on this investment, and Python helped us realize this.
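The whole decision can be wrapped into a few lines. This is one possible reading of our three assumptions, not the only one: here assumption (3) is interpreted as "the expected profit must be at least 2%", and all variable names are illustrative.

```python
from scipy import stats

ep_dist = stats.norm(-1.5, 3)

max_loss_probability = 0.30  # assumption (2): at most a 30% chance...
loss_threshold = -2.5        # ...of losing more than 2.5%
min_expected_profit = 2.0    # assumption (3): at least 2% (or $2 on $100)

p_big_loss = ep_dist.cdf(loss_threshold)         # about 0.369
p_target_gain = ep_dist.sf(min_expected_profit)  # about 0.122
expected_profit = ep_dist.mean()                 # exact mean of the distribution

checks = {
    'loss risk acceptable': p_big_loss <= max_loss_probability,
    'expected profit high enough': expected_profit >= min_expected_profit,
}
invest = all(checks.values())

print(checks)
print('Invest?', invest)
```

Both checks fail for Crypto X, which matches the red flags above.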

Conclusions

Python and scipy.stats are perfect for working with distributions and the calculations around them. Add some NumPy and Matplotlib magic on top and tons of work are gone.

If you found this post interesting please follow me.

You can check a notebook with the code here.

Jonathan Serrano

Tech advocate, developer, ML enthusiast and PhD in Computer Science.