A Learning but Greedy Gambler

In the multi-armed bandit (MAB) problem, a gambler facing K slot machines must decide which arm to pull in each of N sequential rounds so as to maximize the overall return. Many real-life optimization and decision-making problems can be modelled

