How to build and visualise a Monte Carlo simulation with Python and Plotly
Some of the links in this article are affiliate links. This means if you click on the link and purchase or subscribe, I will receive a commission. All opinions and recommendations remain objective. You can also read the affiliates disclosure for more information.
This article does not constitute financial advice and is for educational purposes only.
What are Monte Carlo simulations?
Monte Carlo simulations are used to model the probabilities of different outcomes where those outcomes are hard to predict due to random variables. The Law of large numbers states that as a sample size grows, its mean gets closer to the average of the whole population. This is due to the sample being more representative of the population as the sample become larger. In other words, with a Monte Carlo simulation the goal is to simulate the collection of all or many possible paths (using random sampling) in order to find the possibilities and the most likely or theoretical solution.
In summary:
- A Monte Carlo simulation is a model used to predict the probability of a variety of outcomes when the potential for random variables is present.
- Monte Carlo simulations help to explain the impact of risk and uncertainty in prediction and forecasting models.
- A Monte Carlo simulation requires assigning multiple values to an uncertain variable to achieve multiple results and then averaging the results to obtain an estimate.
- A Monte Carlo model is a stochastic model, meaning that due to randomness the results may differ each time, as opposed to a deterministic model where given the same inputs you'll get the same result every time.
A quick example to illustrate
Monte Carlo simulations are named after the Monte Carlo casino in Monaco, so let's ask a casino based question.
"If we always pick red at roulette, how often would we win?"
The roulette wheel has 18 red slots, 18 black slots, and 1 green slot for a total of 37 slots.
import random
def play_roulette():
total_slots = 37
red_probability = (18 / total_slots) * 100
black_probability = (18 / total_slots) * 100
green_probability = (1 / total_slots) * 100
possible_outcomes = ["red", "black", "green"]
probabilities = [red_probability, black_probability, green_probability]
outcome = random.choices(
possible_outcomes,
weights=probabilities,
k=1
)[0]
return outcome
def perform_simulation(n_times=1000, choice="red"):
results = { "red": 0, "black": 0, "green": 0 }
for i in range(n_times):
outcome = play_roulette()
results[outcome] += 1
win_percentage = results[choice] / n_times
return results, win_percentage
if __name__ == "__main__":
results, win_percentage = perform_simulation(n_times=1000000,
choice="red")
print(results)
print(win_percentage)
So after 1 million simulations, we can say we win just less than half of the time with a 48.71% probability. We've proven that the extra green pocket gives the house an edge over the long run.
Building the Monte Carlo model with Python
Now we have an idea of what a Monte Carlo simulation is and have seen a short example, we can build a more complex model. The challenge I have set here is to recreate an awesome Monte Carlo retirement simulation from engaging-data.com using Python and Plotly. After playing around with this calculator I wondered how this could be re-created in Python with a few individual touches. I got quite close and there's lots to learn from the code.
The question this time is "If I invest a set amount for a number of years, how much might I have?"
All of the code for this model can be found on GitHub.
"""
Monte Carlo model to simulate the growth
of an investment portfolio over time.
"""
import numpy as np
from helpers import (
get_random_returns,
get_confidence_levels,
get_yearly_percentiles)
from plots import (
plot_histogram,
plot_yearly_percentiles)
def perform_simulation(inputs: dict):
"""
Performs a simulation to find out how much
the pot is worth in £ after years of growth.
Returns:
pot (float) - the final amount at the end
history (list) - the yearly history of results
[10000, 11000, 12000, ...]
"""
years = inputs['end_age'] - inputs['start_age']
pot = inputs['starting_pot']
returns = get_random_returns(years=years)
mean_return = (np.mean(returns) - 1) * 100
history = []
for i in range(years):
annual_return = returns[i]
pot *= annual_return
pot += inputs['annual_contributions']
history.append(int(pot))
return pot, history, mean_return
def perform_monte_carlo(inputs: dict, n: int = 1000):
pot_sizes = []
results = []
mean_returns = []
for i in range(n):
final_amount, history, mean_return = perform_simulation(inputs)
pot_sizes.append(final_amount)
results.append(history)
mean_returns.append(mean_return)
lower_confidence, upper_confidence = get_confidence_levels(pot_sizes)
print('Monte carlo model done :)', end='\n')
print('Plots saved to /outputs folder')
print('Mean return across all simulations: ', end='')
print(f'{round(np.mean(mean_returns), 1)}%')
return {
'pot_sizes': pot_sizes,
'results': results,
'yearly_percentiles': get_yearly_percentiles(results, inputs),
'lower_confidence': lower_confidence,
'upper_confidence': upper_confidence,
'mean_returns': mean_returns
}
if __name__ == "__main__":
inputs = {
'start_age': 20,
'end_age': 65,
'starting_pot': 5000,
'annual_contributions': 500 * 12,
'target_amount': 300000,
'n_simulations': 10000
}
mc = perform_monte_carlo(inputs,
n=inputs['n_simulations'])
plot_histogram(mc['pot_sizes'],
mc['upper_confidence'],
mc['lower_confidence'])
plot_yearly_percentiles(inputs=inputs,
df=mc['yearly_percentiles'])
This model takes a dictionary 'inputs' which you can change to adapt the simulation. The 'perform_monte_carlo' function carries out a given number of simulations and returns the final 'pot_sizes' with other useful information like the history and mean returns of each simulation, the yearly percentiles, alongside upper and lower confidence intervals.
For this example our starting age is 20 and end age is 65. We start with £5,000 and our annual contributions are £6,000 (or £500 per month) and we're aiming for a £300,000 pot! We will run this simulation 10,000 times. You might be thinking, how do we simulate the randomness of what our returns might be each year?
A quick Google search tells us that the historic annual average return of the S&P500 is 10% per year. Sorry but I'm much more pessimistic and expect lower. I have modelled a range of returns and assigned them probability weights in the file helpers.py below. This means I've assumed low returns are more likely, but there is also a chance of higher returns, or negative returns. Nobody knows what the markets will do, and that's why randomness will help us with this uncertainty and view the outcomes of many simulations.
import random
import numpy as np
import pandas as pd
def get_random_returns(years: int):
"""
Generates a list of random return percentages
for the length of years required.
"""
random_returns = []
for i in range(years):
high_negative_returns = (random.randint(-20, -8) / 1000) + 1
low_negative_returns = (random.randint(-7, -1) / 1000) + 1
low_returns = (random.randint(0, 4) / 100) + 1
medium_returns = (random.randint(5, 9) / 100) + 1
high_returns = (random.randint(10, 20) / 100) + 1
possible_returns = [ # Weights
high_negative_returns, # 5 % chance
low_negative_returns, # 25 % chance
low_returns, # 40 % chance
medium_returns, # 25 % chance
high_returns # 5 % chance
]
random_return = random.choices(
possible_returns,
weights=(5, 25, 40, 25, 5),
k=1
)[0]
random_returns.append(
random_return
)
return random_returns
def get_confidence_levels(pot_sizes):
upper_confidence = round(np.quantile(pot_sizes, 0.975), 2)
lower_confidence = round(np.quantile(pot_sizes, 0.025), 2)
return lower_confidence, upper_confidence
def get_yearly_percentiles(results, inputs) -> pd.DataFrame:
"""
Finds the percentiles for each year.
"""
results_rotated = list(zip(*results[::-1]))
year = []
age = []
ninetieth_percentile = []
seventy_fifth_percentile = []
median = []
twenty_fifth_percentile = []
tenth_percentile = []
for i, year_results in enumerate(results_rotated):
new_age = (inputs['start_age'] + 1) + i
ninetieth_percentile_value = np.percentile(year_results, 90)
seventy_fifth_percentile_value = np.percentile(year_results, 75)
median_value = np.median(year_results)
twenty_fifth_percentile_value = np.percentile(year_results, 25)
tenth_percentile_value = np.percentile(year_results, 10)
year.append(i + 1)
age.append(new_age)
ninetieth_percentile.append(ninetieth_percentile_value)
seventy_fifth_percentile.append(seventy_fifth_percentile_value)
median.append(median_value)
twenty_fifth_percentile.append(twenty_fifth_percentile_value)
tenth_percentile.append(tenth_percentile_value)
return pd.DataFrame(
list(
zip(year,
age,
ninetieth_percentile,
seventy_fifth_percentile,
median,
twenty_fifth_percentile,
tenth_percentile)
),
columns=[
'year',
'age',
'90th_percentile',
'75th_percentile',
'median',
'25th_percentile',
'10th_percentile']
)
The randomness we've introduced here is for every year in each of the 10,000 or more simulations a:
- 5% chance of negative returns between -20% and -8%
- 25% chance of negative returns between -7% and -1%
- 40% chance of low returns between 0% and 4%
- 25% chance of medium returns between 5% and 9%
- 5% chance of high returns between 10% and 20%
If you think these are too pessimistic or optimistic please go ahead change the values or weights 👍
The 'get_yearly_percentiles' function takes the 2D list 'results' (all of the histories for all simulations year by year), rotates it to line up year 1, year 2, year 3 and so on, and then finds the percentiles (10th, 25th, median, 75th, 90th) for each year. This effectively shows the range of results from all simulations for each year in a DataFrame:
year | age | 90th_percentile | 75th_percentile | median | 25th_percentile | 10th_percentile |
---|---|---|---|---|---|---|
1 | 21 | 11450 | 11300 | 11100 | 10990 | 10970 |
2 | 22 | 18153 | 17804 | 17399 | 17110 | 16957 |
3 | 23 | 25296.1 | 24570.25 | 23919 | 23395.75 | 23051 |
4 | 24 | 32631 | 31605 | 30632 | 29832 | 29279.8 |
5 | 25 | 40342.1 | 38841.25 | 37513.5 | 36407.75 | 35624.9 |
... | ... | ... | ... | ... | ... | ... |
41 | 61 | 618714.6 | 553043.8 | 492832 | 442674.5 | 403963.4 |
42 | 62 | 645462.7 | 578133 | 514355 | 461004.3 | 420718 |
43 | 63 | 673703 | 602295 | 535547 | 478970 | 437741.3 |
44 | 64 | 703538.1 | 629788.8 | 557324.5 | 498292.3 | 453481.4 |
45 | 65 | 739303.7 | 656680.5 | 580414.5 | 517842 | 470615.8 |
This can then be plotted using Plotly along with the final pot sizes.
Plotting the Monte Carlo results with Plotly
I was using Spyder to carry out this analysis, and saved the plots as html files in the /output directory. You'll need to install Plotly with python -m pip install plotly
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
pio.renderers.default='svg'
def plot_histogram(pot_sizes: list,
upper_confidence:float,
lower_confidence: float):
"""
Plots the frequencies of the final pot sizes.
"""
fig = px.histogram(pot_sizes,
title=f"The final pot size after {len(pot_sizes)} simulations.")
fig.add_vline(x=lower_confidence,
line_width=3,
line_dash="dash",
line_color="green")
fig.add_vline(x=upper_confidence,
line_width=3,
line_dash="dash",
line_color="green")
fig.add_vline(x=np.median(pot_sizes),
line_width=3,
line_dash="dash",
line_color="black",
annotation_text="median",
annotation_font_size=15)
fig.add_vrect(x0=lower_confidence,
x1=upper_confidence,
line_width=0,
fillcolor="green",
opacity=0.2,
annotation_text="95% confidence interval",
annotation_font_size=15)
fig.update_layout(
xaxis_title="Amount (£)",
yaxis_title="Count",
showlegend=False,
font=dict(
family="Arial",
size=14
),
paper_bgcolor='rgba(0,0,0,0)',
plot_bgcolor='rgba(0,0,0,0)',
)
fig.write_html('outputs/mc-histogram.html', auto_open=False)
def plot_yearly_percentiles(inputs, df):
"""
Plots the year by year percentile graph.
"""
exact_np = df[df['90th_percentile'] > inputs['target_amount']].iloc[0]
exact_sfp = df[df['75th_percentile'] > inputs['target_amount']].iloc[0]
exact_median = df[df['median'] > inputs['target_amount']].iloc[0]
exact_tfp = df[df['25th_percentile'] > inputs['target_amount']].iloc[0]
exact_tp = df[df['10th_percentile'] > inputs['target_amount']].iloc[0]
fig = go.Figure()
fig.add_traces(go.Scatter(x=df['age'],
y=df['10th_percentile'],
line = dict(color='#FFA502'),
mode='lines',
name='10th %tile',
fill='none',
fillcolor = '#F7CA77'))
fig.add_traces(go.Scatter(x=df['age'],
y=df['25th_percentile'],
line = dict(color='#7BE56E'),
mode='lines',
name='25th %tile',
fill='tonexty',
fillcolor = '#F7CA77'))
fig.add_traces(go.Scatter(x=df['age'],
y=df['median'],
line=dict(color='black'),
line_width=3,
mode='lines',
name="median",
fill='tonexty',
fillcolor='#00FF66'))
fig.add_traces(go.Scatter(x=df['age'],
y=df['75th_percentile'],
line = dict(color='#7BE56E'),
mode='lines',
name="75th %tile",
fill='tonexty',
fillcolor = '#00FF66'))
fig.add_traces(go.Scatter(x=df['age'],
y=df['90th_percentile'],
line = dict(color='#FFA502'),
mode='lines',
name="90th %tile",
fill='tonexty',
fillcolor = '#F7CA77'))
fig.update_layout(hovermode="x")
fig.update_xaxes(tickangle=0,
dtick=1,
showticklabels=True,
gridcolor='lightgray',
type='category')
fig.update_yaxes(gridcolor='lightgray',
rangemode="tozero")
fig.add_hline(y=inputs['target_amount'],
line_width=2,
line_dash='dash',
line_color='red',
annotation_text='Target amount',
annotation_font=dict(
family="Arial",
size=15,
color="red"
),
annotation_font_size=15,
annotation_position='bottom left',
fillcolor='red')
fig.add_shape(type="line",
x0=int(exact_median['year'] - 1),
y0=0,
x1=int(exact_median['year'] - 1),
y1=inputs['target_amount'],
line_width=2,
line_color='gray',
line_dash='dash')
fig.add_shape(type="line",
x0=int(exact_tp['year'] - 1),
y0=0,
x1=int(exact_tp['year'] - 1),
y1=inputs['target_amount'],
line_width=2,
line_color='orange',
line_dash='dash')
fig.add_shape(type="line",
x0=int(exact_np['year'] - 1),
y0=0,
x1=int(exact_np['year'] - 1),
y1=inputs['target_amount'],
line_width=2,
line_color='orange',
line_dash='dash')
fig.add_shape(type="line",
x0=int(exact_tfp['year'] - 1),
y0=0,
x1=int(exact_tfp['year'] - 1),
y1=inputs['target_amount'],
line_width=2,
line_color='green',
line_dash='dash')
fig.add_shape(type="line",
x0=int(exact_sfp['year'] - 1),
y0=0,
x1=int(exact_sfp['year'] - 1),
y1=inputs['target_amount'],
line_width=2,
line_color='green',
line_dash='dash')
fig.add_annotation(x=int(exact_median['year'] - 1),
y=inputs['target_amount'] * 1.45,
text=f"<b>{int(exact_median['year'])} years</b>",
font=dict(
color="black",
size=21
),
showarrow=False,
yshift=10)
fig.add_annotation(x=int(exact_median['year'] - 1),
y=inputs['target_amount'] * 1.3,
text=f"<b>(Age {int(exact_median['age'])})</b>",
font=dict(
color="black",
size=21
),
showarrow=False,
yshift=10)
fig.add_annotation(x=inputs['end_age'] - inputs['start_age'] - 1.2,
y=df['10th_percentile'].max() - 8000,
text="<b>10%</b>",
font=dict(
color="black",
size=12
),
showarrow=False,
yshift=10)
fig.add_annotation(x=inputs['end_age'] - inputs['start_age'] - 1.2,
y=df['25th_percentile'].max() - 8000,
text="<b>25%</b>",
font=dict(
color="black",
size=12
),
showarrow=False,
yshift=10)
fig.add_annotation(x=inputs['end_age'] - inputs['start_age'] - 1.25,
y=df['median'].max() - 5000,
text="<b>median</b>",
font=dict(
color="black",
size=12
),
showarrow=False,
yshift=10)
fig.add_annotation(x=inputs['end_age'] - inputs['start_age'] - 1.2,
y=df['75th_percentile'].max() - 4000,
text="<b>75%</b>",
font=dict(
color="black",
size=12
),
showarrow=False,
yshift=10)
fig.add_annotation(x=inputs['end_age'] - inputs['start_age'] - 1.2,
y=df['90th_percentile'].max() - 5000,
text="<b>90%</b>",
font=dict(
color="black",
size=12
),
showarrow=False,
yshift=10)
fig.add_annotation(x=.99,
xref='paper',
xanchor='right',
y=0,
yanchor='bottom',
text="<b>shedloadofcode.com</b>",
font=dict(
color="gray",
size=14
),
showarrow=False)
fig.add_annotation(x=0.01,
xref='paper',
yref='paper',
xanchor='left',
y=0.99,
yanchor='top',
text=f"In <b>{inputs['n_simulations']}</b> simulations " +
f"<b>{int(exact_median['age'])}</b> " +
f"is the median age ({int(exact_median['year'])} years)<br>",
font=dict(
color="black",
size=15
),
showarrow=False)
fig.add_annotation(x=0.01,
xref='paper',
yref='paper',
xanchor='left',
y=0.96,
yanchor='top',
text="<span style=\"color:orange\">10th to 90th %ile: " +
f"<b>{int(exact_np['year'])} to {int(exact_tp['year'])} " +
"years</b> to target</span>",
font=dict(
color="black",
size=15
),
showarrow=False)
fig.add_annotation(x=0.01,
xref='paper',
yref='paper',
xanchor='left',
y=0.93,
yanchor='top',
text="<span style=\"color:green\">25th to 75th %ile: " +
f"<b>{int(exact_sfp['year'])} to {int(exact_tfp['year'])} " +
"years</b> to target</span>",
font=dict(
color="black",
size=15
),
showarrow=False)
fig.update_layout(
title=f"Percentiles by year after {inputs['n_simulations']} simulations.",
xaxis_title="Age",
yaxis_title="Amount (£)",
legend_title="Legend Title",
showlegend=False,
font=dict(
family="Arial",
size=14
),
paper_bgcolor='rgba(0,0,0,0)',
plot_bgcolor='rgba(0,0,0,0)',
)
fig.write_html('outputs/mc-percentiles.html', auto_open=False)
This outputs the year by year percentiles to 'outputs/mc-percentiles.html'. The good part about the Plotly HTML output is that after uploading to GitHub it can be viewed via htmlpreview.github.io
Go ahead and take a look at the Monte Carlo percentile graph.
You can also use this to embed the interactive plot in a web page using an iframe like the one below!
<iframe height="800" width="100%" loading="lazy" src="https://htmlpreview.github.io/?https://raw.githubusercontent.com/shedloadofcode/monte-carlo-simulation/main/outputs/mc-percentiles.html"></iframe>
We can see that the median amount crosses the target at age 51 given our inputs. However, given a better or worse outcome it could cross the target amount between ages 47 and 54.
Monte Carlo simulations are a great way to deal with uncertainty when we simply don't know what the expected values (in this case investment returns) will be.
We can also take a look at the frequencies of the final pot sizes at the end age 65 in a histogram.
We can see the 95% confidence interval is between £420k - £850k with the median pot size being £581k. This demonstrates the power of compounding and starting investing from an early age.
Comparing the results
I have used numerous scenarios as inputs to test this model against the calculator from engaging-data.com (ED) to see how the results align, which has been pretty fun. I set the average return on the ED calculator to 4% as my model is a bit more pessimistic. As mentioned earlier, you can change the probability weights for a given set of returns in the 'get_random_returns' function if you feel more optimistic.
Here are my findings in three scenarios:
Scenario 1 inputs
Input | Value |
---|---|
Start age | 20 |
End age | 65 |
Starting pot | 5,000 |
Annual contributions | 6,000 |
Target amount | 300,000 |
Simulations | 10,000 |
Scenario 1 results View calculator results
Model | Years | Age |
---|---|---|
This model | 30 | 50 |
ED calculator | 30.3 | 50 |
Scenario 2 inputs
Input | Value |
---|---|
Start age | 30 |
End age | 65 |
Starting pot | 10,000 |
Annual contributions | 10,000 |
Target amount | 400,000 |
Simulations | 10,000 |
Scenario 2 results View calculator results
Model | Years | Age |
---|---|---|
This model | 26 | 56 |
ED calculator | 25.2 | 55 |
Scenario 3 inputs
Input | Value |
---|---|
Start age | 25 |
End age | 65 |
Starting pot | 20,000 |
Annual contributions | 20,000 |
Target amount | 1,000,000 |
Simulations | 10,000 |
Scenario 3 results View calculator results
Model | Years | Age |
---|---|---|
This model | 30 | 55 |
ED calculator | 30.2 | 55 |
As you can see the results are very closely aligned, so I'm very pleased with how well this model is performing. Of course, as George Box said All models are wrong, but some are useful. We should not forget that models and simulations can only give us an indication of possible outcomes, we should never blindly trust them but use them as tools. I think it's also important to keep them realistic and not introduce too much bias or ego into our assumptions. It would be great to get 10% returns every year, but is that realistically going to happen?
Lowering our model's assumptions ensures we are closer to a 'worst case scenario' and any over-performance is a bonus!
Conclusion
We've learned lots on both Monte Carlo methods and creating / embedding Plotly visualisations with Python.
Some of the techniques used in this article with Plotly can also be used for variations of fan charts typically used for forecasting and acknowledging uncertainty.
I didn't quite get around to incrementing the results by 0.1 and plotting the circles which you can achieve with Plotly shapes. Maybe this is something you can try to replicate if you want to. I actually preferred seeing the vertical lines show which age the amount goes above the target rather than the exact intersection - this also is the foundation of statistical process control charts to make variation in the results explicit.
Thanks to engaging-data.com for giving me the inspiration to try and re-create this awesome model and visualisation with Python and Plotly.
Finally, it's worth mentioning that DataCamp has an interactive course Monte Carlo Simulations in Python and many other great courses on data science and machine learning. Read the full review Developing your data science and analytical coding skills - a review of DataCamp.
If you enjoyed this article you may also be interested in: