For Prob. 21.5-4, use three iterations of the method of successive approximations to approximate an optimal policy.
Prob. 21.5-4
The price of a certain stock is fluctuating between $10, $20, and $30 from month to month. Market analysts have predicted that if the stock is at $10 during any month, it will be at $10 or $20 the next month, with probabilities 4/5 and 1/5 respectively; if the stock is at $20, it will be at $10, $20, or $30 the next month, with probabilities ¼, ¼, and ½ respectively; and if the stock is at $30, it will be at $20 or $30 the next month, with probabilities ¾ and ¼ respectively. Given a discount factor of 0.9, use the policy improvement algorithm to determine when to sell and when to hold the stock to maximize the expected total discounted profit.
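
The three requested iterations of successive approximations can be sketched in Python as below. This is a minimal sketch, not a definitive solution, and it rests on a modeling assumption the problem statement does not spell out: selling at price p earns p and ends the process, while holding earns nothing this month and lets the price move according to the given probabilities. The names `PRICES`, `P`, `DISCOUNT`, and `successive_approximations` are illustrative, and the starting values are taken to be zero.

```python
# ASSUMPTION (not stated in the problem): selling at price p earns p and
# ends the process; holding earns 0 now and the price moves for next month.
PRICES = [10, 20, 30]

# P[i][j] = probability the price moves from PRICES[i] to PRICES[j] in one month.
P = [
    [4 / 5, 1 / 5, 0.0],
    [1 / 4, 1 / 4, 1 / 2],
    [0.0, 3 / 4, 1 / 4],
]

DISCOUNT = 0.9


def successive_approximations(n_iterations):
    """Run n_iterations of successive approximations from zero starting values.

    Returns (values, policy): values[i] is the approximate expected total
    discounted profit when the price is PRICES[i]; policy maps each price
    to "sell" or "hold".
    """
    values = [0.0] * len(PRICES)
    policy = {price: "sell" for price in PRICES}
    for _ in range(n_iterations):
        new_values = []
        for i, price in enumerate(PRICES):
            # Value of holding: discounted expected value of next month's state.
            hold = DISCOUNT * sum(P[i][j] * values[j] for j in range(len(PRICES)))
            # Value of selling: the current price (assumed reward).
            sell = float(price)
            if hold > sell:
                policy[price] = "hold"
                new_values.append(hold)
            else:
                policy[price] = "sell"
                new_values.append(sell)
        values = new_values
    return values, policy


if __name__ == "__main__":
    values, policy = successive_approximations(3)
    for price, value in zip(PRICES, values):
        print(f"price ${price}: {policy[price]}, value about {value:.3f}")
```

Under these assumptions, three iterations suggest holding at $10 and $20 and selling at $30, with approximate values of 11.421, 20.486, and 30. A different reward convention (for example, measuring profit relative to a purchase price) would change the numbers and possibly the policy.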
