Each day you own 0 or 1 units of stock of a certain commodity. The price of the stock is a stochastic process that can be modeled as a Markov chain with the following transition probabilities:
                 day n+1
day n         100      200
100           0.5      0.5
200           0.25     0.75
At the start of a day at which you own a stock you may choose to either sell at the current price, or keep the stock. At the start of a day at which you do not own stock, you may choose to either buy one stock at the current price or do nothing. You have initial capital of 200.
Your objective is to maximize the expected discounted value of the profit over an infinite horizon, using a discount factor of 0.8 per day.
a) Define the states and give for each state the possible decisions.
b) Formulate the optimality equations.
c) Carry out two iterations of value iteration.
d) Formulate the LP model to solve this problem. Describe how you can obtain the optimal policy from the LP formulation.
e) Choose a stationary policy and investigate, using the policy iteration algorithm, whether or not that policy is optimal.
f) Give the number of stationary policies. Motivate your answer by using the definition of a stationary policy.
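
To check hand calculations for part c), the following is a minimal sketch of value iteration in Python. It rests on several assumptions that are not stated in the problem itself: a state is the pair (current price, number of stocks owned); the transition matrix reads 100 -> (0.5, 0.5) and 200 -> (0.25, 0.75); selling yields a reward equal to the price and buying a reward of minus the price; the capital constraint is ignored; and the iteration starts from v = 0. All names (PRICES, P, GAMMA, actions, step, value_iteration) are hypothetical.

```python
# Assumed model data; the transition matrix is a reading of the table,
# with each row summing to 1.
PRICES = (100, 200)
P = {100: {100: 0.5, 200: 0.5},
     200: {100: 0.25, 200: 0.75}}
GAMMA = 0.8  # discount factor per day


def actions(owns):
    """Decisions available at the start of a day, given stock ownership."""
    return ("sell", "keep") if owns else ("buy", "wait")


def step(price, owns, action):
    """Return (immediate reward, stocks owned on the next day)."""
    if action == "sell":
        return price, 0
    if action == "buy":
        return -price, 1
    return 0, owns  # "keep" or "wait": no cash flow, ownership unchanged


def value_iteration(n_iter):
    """Run n_iter Bellman updates starting from v = 0 (an assumed start)."""
    states = [(p, o) for p in PRICES for o in (0, 1)]
    v = dict.fromkeys(states, 0.0)
    for _ in range(n_iter):
        new_v = {}
        for price, owns in states:
            candidates = []
            for a in actions(owns):
                reward, next_owns = step(price, owns, a)
                # Expected discounted value of tomorrow's state.
                future = sum(P[price][q] * v[(q, next_owns)] for q in PRICES)
                candidates.append(reward + GAMMA * future)
            new_v[(price, owns)] = max(candidates)
        v = new_v
    return v


if __name__ == "__main__":
    for state, value in sorted(value_iteration(2).items()):
        print(state, round(value, 4))
```

Calling value_iteration(2) returns a dictionary keyed by (price, owned) that can be compared with the two hand-computed iterations. If the reward convention or the starting vector used in the course differs, step and the initial v should be adjusted accordingly.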