Sometimes you want to calculate statistics about some variable which has complex, possibly non linear relationship with another variable for which probability distribution is available, which may be non standard or non parametric. That’s the situation we face when trying predict and plan inventory in the face of demand with some arbitrary probability distribution. For this problem, the goal is to choose an inventory level, given a an arbitrary demand distribution, so that some statistic on earning is maximized. The fact that the demand distribution has arbitrary non standard distribution and earning has a complex non linear relation with inventory and demand crushes any hope for an analytical solution.
One way out of this quagmire is simulate earning by sampling from the demand distribution and applying the nonlinear function to convert inventory and demand to earning. That’s the approach I will be elaborating throughout this post. Python implementation of the solution is available in my open source project avenir.
Forecasting and planning inventory is double edged sword. If the inventory exceeds demand, there is holding cost. If there is not enough inventory to meet demand, there is cost associated with lost revenue. Even if the item not in inventory is back ordered, there is additional cost for fulfillment of back order.
For some time window e.g. weekly , earning is related to demand and inventory as below for our use case. Some of the functions may be non linear and even non deterministic.
The function u for un-fulfillment cost in our use case is probabilistic. A fraction of unfulfilled demand is back ordered. That fraction is assumed to be random with a normal distribution. This introduces non determinism in transformation from inventory and demand to earning and makes the transformation function even more complex.
One popular way to plan for inventory is to maintain a specified inventory level. Whenever it falls below that level, inventory is replenished. As an inventory planner we may want to know the inventory level to be maintained that will maximize the earning given a demand distribution.
If we wanted to calculate average earning conditioned on some inventory level, we have to perform the following integration for an analytical solution.
Any hope for analytical solution quickly disappears, when we remind ourself of the following inconvenient facts.
- Earning depends on inventory and demand non linearly
- The demand distribution is arbitrary and defined non parametrically
- Instead of average, we may be interested in some other statistic of earning
Clearly, the analytical solution is intractable. However, we can approach the problem in a direct way by generating samples of earning. We perform the following steps for a given inventory level and demand distribution
- Sample demand from the demand distribution.
- Calculate earning for a given inventory and the sampled demand, to generate a sample for earning
- Repeat 1 and 2 to generate many samples for earning.
- From the earning samples, we can calculate any statistic, including average
- Repeat 1 through 4 for another inventory level
- Once we have earning statistic calculate for various inventory levels, we can choose the inventory level that maximizes such statistic e.g.average earning
Proper sampling is key to the success of our approach. For standard distributions like normal distribution, sampling can be done directly.
All sampling techniques for non parametric or non standard distribution are generally called Monte Carlo sampling. The basic idea is simple and intuitive. We want to traverse the state space (demand in our case) in such a way that number of samples generated from some region will be proportional to the probability in that region.
The three popular sampling algorithms are as below. To use them effectively, requires some prior knowledge of the nature of the distribution
Markov Chain Monte Carlo Sampling
I decided to use sampling technique called Markov Chain Monte Carlo (MCMC) . because It does not require prior knowledge of the nature of distribution distribution. the sampling algorithms is called Metropolis algorithm and it works as follows.
- Based on the current sample value di, get the next sample di+1 using the Markov Chain state transition distribution, also called the proposal distribution q(di+1 | di)
- Compute A = min(1, p(di+1) / p(di)
- Sample from a uniform distribution U(0,1) between 0 and 1. If U(0,1) < A, choose di+1 , else stay with the current sample di,
The Markov Chain distribution should be such that it traverses the sate space well, particularly the important high probability regions . In our case since we are dealing with continuous variable. I used a normal distribution as the Markov chain generator with mean at the current sample value and a pre specified standard deviation.
If the Markov chain is not symmetric, the algorithm gets more complex and it’s called Metropolis Hastings algorithm. Both q(di | di+1) and q(di+1 | di) enter into the picture. Since in our case, the Markov chain is based on normal distribution, the proposal distribution symmetric. With symmetric proposal distribution, Metropolis Hastings algorithm simplifies to plain Metropolis. Metropolis Hastings algorithm works as below
- Compute A = min(1, p(di+1) x q(di | di+1) / p(di) x q(di+1 | di)
- if U(0,1) < A choose di+1 else stay with current sample di
As we can see when the proposal distribution is symmetric i.e. when q(di | di+1) = q(di+1 | di), the Metropolis Hastings algorithms simplifies to the plain Metropolis.
How Many Samples
This is the first issue that arises when embarking on a Markov Chain Monte Carlo solution. How many samples of earning should we generate so that we can extract meaningful statistic from the generated sample. The number samples required will depend on the nature of the demand distribution and the earning transformation function.
The number of samples should be enough so that the distribution of the samples reach a steady converging state. One criteria we could use is the convergence of the average of the samples values.
I ran tests with various sample sizes and checked for convergence of the average to determine the desirable sample size. The output is as below
sample size 20000 earning mean 5038.51 earning mean std dev 25.505 sample size 25000 earning mean 5037.28 earning mean std dev 22.213 sample size 30000 earning mean 5006.31 earning mean std dev 19.972 sample size 35000 earning mean 5052.30 earning mean std dev 18.056 sample size 40000 earning mean 5260.84 earning mean std dev 16.380 sample size 45000 earning mean 5058.52 earning mean std dev 15.816 sample size 50000 earning mean 5176.02 earning mean std dev 14.526 sample size 55000 earning mean 5117.73 earning mean std dev 13.858 sample size 60000 earning mean 5163.73 earning mean std dev 13.085 sample size 65000 earning mean 4836.71 earning mean std dev 13.070
It appears that at around 40000 – 50000 samples we seem to have convergence of the average value earning. For each sample size, I am also displaying the std deviation of the average, which steadily drops with increasing sample size. To understand this let’s digress to central limit theorem.
According to central limit theorem, whenever n random samples are taken from any distribution with mean μ and variance σ2 , the sample mean will be approximately normal distribution with mean μ and variance σ2 / n.
Earning Statistic at Various Inventory Levels
Just to reiterate, our goal is to calculate certain statistic about earning for various inventory level for a given demand distribution. For each iteration with an inventory level we will use the sample size determined in the last section.
One simple statistic we could use is average earning. To make it more interesting we will use a different earning statistic which will be the earning with a probability of 0.6 or higher i.e. an earning level such that the probability of earning to be equal or more than level is 0.6. Here is the output
inventory 1000 earning 6010.37 excess count 23823 deficit count 21177 transition count 36201 inventory 1050 earning 6081.10 excess count 26323 deficit count 18677 transition count 36079 inventory 1100 earning 6258.78 excess count 28503 deficit count 16497 transition count 36198 inventory 1150 earning 6020.59 excess count 31764 deficit count 13236 transition count 36182 inventory 1200 earning 5920.68 excess count 34093 deficit count 10907 transition count 36194 inventory 1250 earning 5755.85 excess count 36737 deficit count 8263 transition count 36203 inventory 1300 earning 5675.89 excess count 38461 deficit count 6539 transition count 36158 inventory 1350 earning 5570.25 excess count 40433 deficit count 4567 transition count 36243 inventory 1400 earning 4990.23 excess count 41666 deficit count 3334 transition count 36133 inventory 1450 earning 5937.17 excess count 42428 deficit count 2572 transition count 36097 inventory 1500 earning 5343.71 excess count 43636 deficit count 1364 transition count 36068 inventory 1550 earning 5448.19 excess count 44113 deficit count 887 transition count 36221 inventory 1600 earning 4545.37 excess count 44414 deficit count 586 transition count 36230 inventory 1650 earning 5102.07 excess count 45000 deficit count 0 transition count 36136 inventory 1700 earning 4912.56 excess count 45000 deficit count 0 transition count 36146 inventory 1750 earning 5104.34 excess count 45000 deficit count 0 transition count 35959
I used a sample size 45000. It appears that we maximum earning statistic at inventory level of 1100. Stating another way, for the given demand distribution for an item, if we maintain an inventory level of 1100, we will have earning of $6259 or more with a probability of 0.6.
There are some secondary output which shows the number of times inventory exceeded sampled demand and the number of times we had inventory lower than sampled demand. As expected deficit count drops as we increase inventory level. The last parameter tells us how many times MCMC sampler accepted new samples.
The proposal distribution to generate the Markov chain has a strong influence on the quality of the MCMC sampler. Standard deviation of the normal distribution used as the proposal distribution, should be chosen carefully.
- If the standard deviation is large, the next sample is more likely to be in a low probability region and it’s likely to be rejected, which will result in slow progress
- On the other hand if the standard deviation is small, the sate space traversal behave more like random walk and too many samples may be required.. If the distribution is multi modal, it may not even visit all the modes.
The standard deviation is provided through a configuration parameter. You can can vary it and run some experiments to find the appropriate value.
One way to address this problem is to have a mixture of 2 proposal distribution one with a low standard deviation and one with a high standard deviation. Most of the time you will use the proposal distribution with low standard deviation.
Occasionally, the proposal distribution with high standard deviation will be used to encourage exploration of less explored parts of the state space.
Since with MCMC, the next sample depends on the current sample as determined by the Markov chain, samples may be correlated. Ideally samples should be independent reflecting the actual distribution we want to sample.
One way to reduce correlation is to make multiple jumps, instead of 1 jump, through the Markov chain before applying the acceptance criteria. We can also perform auto correlation analysis on the samples to find the extent of correlation in the samples.
Al though the benefits have been questionable, it’s a common practice to ignore the first n samples in the calculations, known as a burn in process. The initial state when simulation run starts is chosen randomly.
By discarding initial set of samples, we mitigate the impact of the initial random state. The number of burn in samples can be set as a configuration parameter.
For burn in size, any value in the range of 1000 – 5000 has been reported. To be conservative I used 5000.
We need to choose the burn in size and the sample size properly so that we can get convergence in whatever statistic we are calculating from the generate samples. So far, I glossed over the choice of these two parameters.
For burn in size, I used a value as per some heuristic suggested in literature. For total sample size I inspected the mean earning for various sample size and came up with sample size based on convergence of the mean value as detected manually.
In this section I will go through more rigorous convergence analysis to properly choose burn in size and total sample size. There are several available techniques. I will be using a convergence analysis technique called Geweke convergence. The algorithm does the following
- It takes the sample points excluding the burn in samples
- Id defines a widow in the beginning of the sample array from step 1 and and a window at the end of the sample array
- It calculates mean and variance of the 2 windows. Based on them it calculates a modified z score
A lower z score indicates better convergence. Here is some sample output for various values of sample size and burn in size
sample size 20000 burn in size 500 z score 3.130 sample size 20000 burn in size 1000 z score 0.329 sample size 20000 burn in size 1500 z score 3.517 sample size 20000 burn in size 2000 z score 1.041 sample size 20000 burn in size 2500 z score 0.413 sample size 20000 burn in size 3000 z score 7.861 sample size 25000 burn in size 500 z score -3.630 sample size 25000 burn in size 1000 z score -7.465 sample size 25000 burn in size 1500 z score -7.089 sample size 25000 burn in size 2000 z score -5.580 sample size 25000 burn in size 2500 z score -5.034 sample size 25000 burn in size 3000 z score -3.520 sample size 30000 burn in size 500 z score 3.677 sample size 30000 burn in size 1000 z score 6.780 sample size 30000 burn in size 1500 z score 2.056 sample size 30000 burn in size 2000 z score 0.122 sample size 30000 burn in size 2500 z score -1.823 sample size 30000 burn in size 3000 z score -0.796 sample size 35000 burn in size 500 z score 1.629 sample size 35000 burn in size 1000 z score -0.479 sample size 35000 burn in size 1500 z score -0.835 sample size 35000 burn in size 2000 z score 0.627 sample size 35000 burn in size 2500 z score 2.427 sample size 35000 burn in size 3000 z score 5.777
According to this analysis, we should use a sample size 30000 and a burn in size of 2000. We have used a sample size of 45000 and a burn in size of 2000.
Parallel Implementation with Spark
Sample generation and statistic calculation may need to be repeated many times.For our particular problem, it needs repeated for each (item, inventory size) combination.
As an example, if we have 1000 items and we try 10 inventory levels for each item, sample generation and statistic calculation will have to be repeated 10000 times. In Spark, we have to partition by (item, inventory size).
I have plans to do a spark implementation. To make it reusable across various application I have to design the Spark implementation in the following way.
- Provide all the distribution for source variables as input data or configuration
- Provided the source to target transformation function as externally provided custom logic.
- All the parameters needed for transformation should be provided as configuration parameter.
We have used MCMC sampling to calculate statistic about some variable when analytical solution for such statistic is simply intractable, for various reason such as non standard and non parametric arbitrary distribution and complex non linear function.
The Metropolis sampling algorithm is implemented as a Python class in sampler.py. It can be used for any other application with proper configuration of the sampler object.To run the inventory use case, you could follow the tutorial.
For commercial support for any solution in my github repositories, please talk to ThirdEye Data Science Services. Support is available for Hadoop or Spark deployment on cloud including installation, configuration and testing,