Inventory Forecasting with Markov Chain Monte Carlo

Sometimes you want to calculate statistics about some variable which has complex, possibly non linear relationship with another variable for which probability distribution is available, which may be non standard or non parametric. That’s the situation we face when trying predict and plan inventory in the face of demand with some arbitrary probability distribution. For this problem, the goal is to choose an inventory level, given a an arbitrary demand distribution, so that some statistic on earning is maximized. The fact  that the demand  distribution has arbitrary non standard distribution and earning has a complex non linear relation with inventory and demand crushes any hope for an analytical solution.

One way out of this quagmire is simulate earning by sampling from the demand distribution and applying the nonlinear function to convert inventory and demand to earning. That’s the approach I will be elaborating throughout this post. Python implementation of the solution is available in my open source project avenir.

Inventory Planning

Forecasting and planning inventory is double edged sword. If the inventory exceeds demand, there is holding cost. If there is not enough inventory to meet demand, there is cost associated with lost revenue. Even if the item not in inventory is back ordered, there is additional cost for fulfillment of back order.

For some time window e.g. weekly , earning is related to demand and inventory as below for our use case. Some of the functions may be non linear and even non deterministic.

e = d x  p – h(i – d)   when i > d
e = d x p – u(d – i – bo) – b(bo)    when i < d
e = earning
d = demand based on some distribution p(d)
p = per unit profit
h = function to compute holding cost
u = function to compute cost for un fulfillment
bo = number of back ordered items
b = function to compute cost associated with back ordering

The function u  for un-fulfillment cost in our use case is probabilistic. A fraction of unfulfilled demand is back ordered. That fraction is assumed to be random with a normal distribution. This introduces non determinism  in  transformation from inventory and demand to earning and makes the transformation function even more complex.

One popular way to plan for inventory is to maintain a specified  inventory level. Whenever it falls below that level, inventory is replenished. As an inventory planner we may want to know the inventory level  to be  maintained that will maximize the earning given a demand distribution.

If we wanted to calculate average earning conditioned on some inventory level, we have to perform the following integration for an analytical solution.

ea = ∫d e(d,i) p(d)
ea = average earning
e = earning as a function of demand(d) and inventory(i)
p(d) = demand distribution

Any  hope for analytical solution quickly disappears, when we remind ourself of the following inconvenient facts.

  1. Earning depends on inventory and demand non linearly
  2. The demand distribution is  arbitrary and defined non parametrically
  3. Instead of average, we may be interested in some other statistic of earning

Clearly, the analytical solution is intractable. However, we can approach the problem in a direct way by generating samples of earning. We perform the following steps for a given inventory level and demand distribution

  1. Sample demand from the demand distribution.
  2. Calculate earning for a given inventory and the sampled demand, to generate a sample for earning
  3. Repeat 1 and 2 to generate many samples for earning.
  4. From the earning samples, we can calculate any statistic, including average
  5. Repeat 1 through 4 for another inventory level
  6. Once we have earning statistic calculate for various inventory levels, we can choose the inventory level that maximizes such statistic e.g.average earning


Proper sampling is key to the success of our approach. For standard distributions like normal distribution, sampling can be done directly.

All sampling techniques for non parametric or non standard distribution are generally called Monte Carlo sampling. The basic idea is simple and intuitive. We want to traverse the state space (demand in our case) in such a way that number of samples generated from some region will be proportional to the probability in that region.

The three popular sampling algorithms are as below. To use them effectively, requires some prior knowledge of the nature of the distribution

Markov Chain Monte Carlo Sampling

I decided to use sampling technique called Markov Chain Monte Carlo (MCMC) . because It does not require prior knowledge of the nature of distribution distribution. the sampling algorithms is called Metropolis algorithm and it works as follows.

  1. Based on the current sample value di, get the next sample di+1 using the Markov Chain state transition distribution, also called the proposal distribution q(di+1 | di)
  2. Compute A = min(1, p(di+1) / p(di)
  3. Sample from a uniform distribution U(0,1)  between 0 and 1. If  U(0,1) < A, choose di+1 , else stay with the current sample di, 

The Markov Chain distribution should be such that it traverses the sate space well, particularly the important high probability regions . In our case since we are dealing with continuous variable. I used a normal distribution as the Markov chain generator with mean at the current sample value and a pre specified standard deviation.

If the  Markov chain is not symmetric, the algorithm gets more complex and it’s called Metropolis Hastings algorithm. Both q(di | di+1) and q(di+1 | di) enter into the picture. Since in our case, the Markov chain is based on normal distribution, the proposal distribution symmetric. With symmetric proposal distribution, Metropolis Hastings algorithm simplifies  to plain Metropolis. Metropolis Hastings algorithm works as below

  1. Compute A = min(1, p(di+1) x q(di | di+1) / p(di) x q(di+1 | di)
  2. if U(0,1) < A choose di+1 else stay with current sample di

As we can see when the proposal distribution is symmetric i.e. when q(di | di+1) = q(di+1 | di), the Metropolis Hastings algorithms simplifies to the plain Metropolis.

How Many Samples

This is the first issue that arises when embarking on a Markov Chain Monte Carlo solution. How many samples of earning should we generate so that we can extract meaningful statistic from the generated sample. The number samples required will depend on the nature of the demand distribution and the earning transformation function.

The number of samples should be enough so that the distribution of the samples reach a steady converging state. One criteria we could use is the convergence of the average of the samples values.

I ran tests with various sample sizes and checked for convergence of the average to determine the desirable sample size. The output is as below

sample size 20000 earning mean 5038.51  earning mean std dev 25.505
sample size 25000 earning mean 5037.28  earning mean std dev 22.213
sample size 30000 earning mean 5006.31  earning mean std dev 19.972
sample size 35000 earning mean 5052.30  earning mean std dev 18.056
sample size 40000 earning mean 5260.84  earning mean std dev 16.380
sample size 45000 earning mean 5058.52  earning mean std dev 15.816
sample size 50000 earning mean 5176.02  earning mean std dev 14.526
sample size 55000 earning mean 5117.73  earning mean std dev 13.858
sample size 60000 earning mean 5163.73  earning mean std dev 13.085
sample size 65000 earning mean 4836.71  earning mean std dev 13.070

It appears that at around 40000 – 50000 samples we seem to have convergence of the average value earning. For each sample size, I am also displaying the std deviation of the average, which steadily drops with increasing sample size. To understand this let’s digress to  central limit theorem.

According to central limit theorem, whenever n random samples are taken from any distribution with mean μ and variance  σ2 , the sample mean will be approximately  normal distribution with mean μ  and variance σ2 / n.

Earning Statistic at Various Inventory Levels

Just to reiterate, our goal is to calculate certain statistic about earning for various inventory level for a given demand distribution. For each iteration with an inventory level we will use the sample size determined in the last section.

One simple statistic we could use is average earning. To make it more interesting we will use a different earning statistic which will be the earning with a probability of 0.6 or higher i.e. an earning level such that the probability of earning to be equal or more than level is 0.6. Here is the output

inventory 1000  earning 6010.37 excess count 23823 deficit count 21177 transition count 36201
inventory 1050  earning 6081.10 excess count 26323 deficit count 18677 transition count 36079
inventory 1100  earning 6258.78 excess count 28503 deficit count 16497 transition count 36198
inventory 1150  earning 6020.59 excess count 31764 deficit count 13236 transition count 36182
inventory 1200  earning 5920.68 excess count 34093 deficit count 10907 transition count 36194
inventory 1250  earning 5755.85 excess count 36737 deficit count 8263 transition count 36203
inventory 1300  earning 5675.89 excess count 38461 deficit count 6539 transition count 36158
inventory 1350  earning 5570.25 excess count 40433 deficit count 4567 transition count 36243
inventory 1400  earning 4990.23 excess count 41666 deficit count 3334 transition count 36133
inventory 1450  earning 5937.17 excess count 42428 deficit count 2572 transition count 36097
inventory 1500  earning 5343.71 excess count 43636 deficit count 1364 transition count 36068
inventory 1550  earning 5448.19 excess count 44113 deficit count 887 transition count 36221
inventory 1600  earning 4545.37 excess count 44414 deficit count 586 transition count 36230
inventory 1650  earning 5102.07 excess count 45000 deficit count 0 transition count 36136
inventory 1700  earning 4912.56 excess count 45000 deficit count 0 transition count 36146
inventory 1750  earning 5104.34 excess count 45000 deficit count 0 transition count 35959

I used a sample size 45000. It appears that we maximum earning statistic at inventory level of 1100. Stating another way, for the given demand distribution for an item, if we maintain an inventory level of 1100, we will have earning of $6259 or more with a probability of 0.6.

There are some secondary output which shows the number of times inventory exceeded sampled demand and the number of times we had inventory lower than sampled demand. As expected deficit count drops as we increase inventory level. The last parameter tells us how many times MCMC sampler accepted new samples.

Proposal Distribution

The proposal distribution to generate the Markov chain has a strong influence on the quality of the MCMC sampler. Standard deviation of the normal distribution used as the proposal distribution, should be chosen carefully.

  • If the standard deviation is large, the next sample is more likely to be in a low probability region and it’s likely to be rejected, which will result in slow progress
  • On the other hand if the standard deviation is small, the sate space traversal behave more like random walk and too many samples may be required.. If the distribution is multi modal, it may not even visit all the modes.

The standard deviation is provided through a configuration parameter. You can can vary it and run some experiments to find the appropriate value.

One way to address this problem is to have a mixture of 2 proposal distribution one with a low standard deviation and one with a high standard deviation. Most of the time you will use the proposal distribution with low standard deviation.

Occasionally, the proposal distribution with high standard deviation will be used to encourage exploration of  less explored parts of the state space.

Correlated Samples

Since with MCMC, the next sample depends on the current sample as determined by the Markov chain, samples may be correlated. Ideally samples should be independent reflecting the actual distribution we want to sample.

One way to reduce correlation is to make multiple jumps, instead of 1 jump, through the Markov chain before applying the acceptance criteria. We can also perform auto correlation analysis on the samples to find the extent of correlation in the samples.

Burn In

Al though the benefits have been questionable, it’s a common practice to ignore the first n samples in the calculations, known as a burn in process. The initial state when simulation run starts is chosen randomly.

By discarding initial set of samples, we mitigate the impact of the initial random state. The number of burn in samples can be set as a configuration parameter.

For burn in size, any value in the range of 1000 – 5000 has been reported. To be conservative I used 5000.

Convergence Analysis

We need to choose the burn in size and the sample size properly so that we can get convergence in whatever statistic we are calculating from the generate samples. So far, I glossed over the choice of these two parameters.

For burn in size, I used a value as per some heuristic suggested in literature. For total sample size I inspected the mean earning for various sample size and came up with sample size based on convergence of the mean value as detected manually.

In this section I will go through more rigorous convergence analysis to properly choose burn in size and total sample size.  There are several available techniques. I will be using a convergence analysis technique called Geweke convergence.  The algorithm does the following

  1. It takes the sample points excluding the burn in samples
  2. Id defines a widow in the beginning of the sample array from step 1 and and a window at the end of the sample array
  3. It calculates mean and variance of the 2 windows. Based on them it calculates a modified z score

A lower z score indicates better convergence. Here is some sample output for various values of sample size and burn in size

sample size 20000  burn in size 500  z score 3.130
sample size 20000  burn in size 1000  z score 0.329
sample size 20000  burn in size 1500  z score 3.517
sample size 20000  burn in size 2000  z score 1.041
sample size 20000  burn in size 2500  z score 0.413
sample size 20000  burn in size 3000  z score 7.861
sample size 25000  burn in size 500  z score -3.630
sample size 25000  burn in size 1000  z score -7.465
sample size 25000  burn in size 1500  z score -7.089
sample size 25000  burn in size 2000  z score -5.580
sample size 25000  burn in size 2500  z score -5.034
sample size 25000  burn in size 3000  z score -3.520
sample size 30000  burn in size 500  z score 3.677
sample size 30000  burn in size 1000  z score 6.780
sample size 30000  burn in size 1500  z score 2.056
sample size 30000  burn in size 2000  z score 0.122
sample size 30000  burn in size 2500  z score -1.823
sample size 30000  burn in size 3000  z score -0.796
sample size 35000  burn in size 500  z score 1.629
sample size 35000  burn in size 1000  z score -0.479
sample size 35000  burn in size 1500  z score -0.835
sample size 35000  burn in size 2000  z score 0.627
sample size 35000  burn in size 2500  z score 2.427
sample size 35000  burn in size 3000  z score 5.777

According to this analysis, we should use a sample size 30000 and a burn in size of 2000. We have used a sample size of 45000 and a burn in size of 2000.

Parallel Implementation with Spark

Sample generation and statistic calculation may need to be repeated many times.For our particular problem, it needs repeated for each (item, inventory size) combination.

As an example, if we have 1000 items and we try 10 inventory levels for each item, sample generation and statistic calculation will have to be repeated  10000  times. In Spark, we have to partition by (item, inventory size).

I have plans to do a spark implementation. To make it reusable across various application I have to design the Spark implementation in the following way.

  • Provide all the distribution for source variables  as input data or configuration
  • Provided the source to target transformation function as externally provided custom logic.
  • All the parameters needed for transformation should be provided as configuration parameter.

Summing Up

We have used MCMC sampling to calculate statistic about some variable when analytical solution for such statistic is simply intractable, for various reason such as non standard and non parametric arbitrary distribution and complex non linear function.

The Metropolis sampling algorithm is implemented as a Python class in It can be used for any other application with proper configuration of the sampler object.To run the inventory use case, you could follow the tutorial.


For commercial support for any solution in my github repositories, please talk to ThirdEye Data Science Services. Support is available for Hadoop or Spark deployment on cloud including installation, configuration and testing,



About Pranab

I am Pranab Ghosh, a software professional in the San Francisco Bay area. I manipulate bits and bytes for the good of living beings and the planet. I have worked with myriad of technologies and platforms in various business domains for early stage startups, large corporations and anything in between. I am an active blogger and open source project owner. I am passionate about technology and green and sustainable living. My technical interest areas are Big Data, Distributed Processing, NOSQL databases, Machine Learning and Programming languages. I am fascinated by problems that don't have neat closed form solution.
This entry was posted in Data Science, Machine Learning, Optimization, Python, Simulation and tagged , , , . Bookmark the permalink.

1 Response to Inventory Forecasting with Markov Chain Monte Carlo

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s