Reference no: EM131420096
Consider the following maze environment. Write a Java program to solve the following questions.
The transition model is as follows: the intended outcome occurs with probability 0.8, and with probability 0.1 each the agent instead moves at one of the two right angles to the intended direction. If the move would make the agent walk into a wall, the agent stays in the same place as before. The rewards are -0.04 for the white squares, +1 for the green squares, and -1 for the brown squares. Note that there are no terminal states; the agent's state sequence is infinite.
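The transition model above can be sketched in Java as follows. This is only a minimal illustration under stated assumptions: the maze figure is not reproduced in this text, so the 3x3 `GRID` below is a hypothetical layout, and the class and method names (`MazeTransition`, `move`, `transitions`) are invented for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of the stochastic transition model; the 3x3 grid is hypothetical. */
class MazeTransition {
    // Hypothetical layout: '#' wall, 'G' green, 'B' brown, '.' white.
    static final String[] GRID = {
        "G.B",
        ".#.",
        "...",
    };
    // Actions in clockwise order: 0 = UP, 1 = RIGHT, 2 = DOWN, 3 = LEFT.
    static final int[] DR = {-1, 0, 1, 0};
    static final int[] DC = {0, 1, 0, -1};

    /** Cells outside the grid count as walls. */
    static boolean isWall(int r, int c) {
        return r < 0 || r >= GRID.length || c < 0 || c >= GRID[0].length()
                || GRID[r].charAt(c) == '#';
    }

    /** State index reached by moving in direction a; a wall bump keeps the agent in place. */
    static int move(int r, int c, int a) {
        int nr = r + DR[a], nc = c + DC[a];
        if (isWall(nr, nc)) {
            nr = r;
            nc = c;
        }
        return nr * GRID[0].length() + nc;
    }

    /** Returns P(s' | s, a) as a map from state index (row * columns + column) to probability. */
    static Map<Integer, Double> transitions(int r, int c, int a) {
        Map<Integer, Double> p = new LinkedHashMap<>();
        p.merge(move(r, c, a), 0.8, Double::sum);           // intended direction
        p.merge(move(r, c, (a + 3) % 4), 0.1, Double::sum); // 90 degrees counter-clockwise
        p.merge(move(r, c, (a + 1) % 4), 0.1, Double::sum); // 90 degrees clockwise
        return p;
    }

    public static void main(String[] args) {
        // Bottom-left cell, action UP: prints {3=0.8, 6=0.1, 7=0.1}
        System.out.println(transitions(2, 0, 0));
    }
}
```

Probabilities for outcomes that land in the same cell are merged, so an agent in a corner can stay put with probability 0.9 when it pushes against a wall.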
Part 1: Assuming the known transition model and reward function listed above, find the optimal policy and the utilities of all the (non-wall) states using both value iteration and policy iteration. Display the optimal policy and the utilities of all the states, and plot the utility estimates as a function of the number of iterations, as in Figure 17.5(a) in the above reference book (for value iteration, you should need no more than 50 iterations to get convergence). In this question, use a discount factor of 0.99. Below are some reference utility values (computed with a different discount factor) to help you check whether the trend of your answers is correct.
Part 2: Design a more complicated maze environment of your own and re-run the algorithms designed for Part 1 on it. How do the number of states and the complexity of the environment affect convergence? How complex can you make the environment and still be able to learn the right policy?
Using the method of value iteration for Part 1
- Description of the implemented solution
- Plot of optimal policy
- Utilities of all states
- Plot of utility estimates as a function of the number of iterations
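As a starting point for the value iteration part, the sketch below applies the Bellman update U(s) = R(s) + γ max_a Σ P(s' | s, a) U(s') with γ = 0.99 and the transition model described earlier. It is an illustration under assumptions, not a reference solution: the 3x3 `GRID` is a hypothetical layout (the assignment's maze figure is not available in this text), the stopping rule (largest change below a tolerance, capped by a maximum iteration count) is a choice made here, and all class and method names are invented for the example.

```java
/** Sketch of value iteration on a hypothetical 3x3 maze (not the assignment's maze). */
class ValueIterationSketch {
    // Hypothetical layout: '#' wall, 'G' green (+1), 'B' brown (-1), '.' white (-0.04).
    static final String[] GRID = {"G.B", ".#.", "..."};
    static final int[] DR = {-1, 0, 1, 0};   // 0 = UP, 1 = RIGHT, 2 = DOWN, 3 = LEFT
    static final int[] DC = {0, 1, 0, -1};
    static final char[] ARROW = {'^', '>', 'v', '<'};
    static final double GAMMA = 0.99;

    static boolean isWall(int r, int c) {
        return r < 0 || r >= GRID.length || c < 0 || c >= GRID[0].length()
                || GRID[r].charAt(c) == '#';
    }

    static double reward(int r, int c) {
        char cell = GRID[r].charAt(c);
        return cell == 'G' ? 1.0 : cell == 'B' ? -1.0 : -0.04;
    }

    /** Utility of the cell reached by moving in direction a; a wall bump stays put. */
    static double utilAfter(double[][] u, int r, int c, int a) {
        int nr = r + DR[a], nc = c + DC[a];
        return isWall(nr, nc) ? u[r][c] : u[nr][nc];
    }

    /** Expected utility of the next state: 0.8 intended, 0.1 for each right angle. */
    static double qValue(double[][] u, int r, int c, int a) {
        return 0.8 * utilAfter(u, r, c, a)
                + 0.1 * utilAfter(u, r, c, (a + 3) % 4)
                + 0.1 * utilAfter(u, r, c, (a + 1) % 4);
    }

    /** Greedy action with respect to the utilities u (ties go to the lowest index). */
    static int bestAction(double[][] u, int r, int c) {
        int best = 0;
        for (int a = 1; a < 4; a++) {
            if (qValue(u, r, c, a) > qValue(u, r, c, best)) best = a;
        }
        return best;
    }

    /** Repeats the Bellman update until the largest change drops below tolerance. */
    static double[][] valueIteration(double tolerance, int maxIterations) {
        int rows = GRID.length, cols = GRID[0].length();
        double[][] u = new double[rows][cols];
        for (int it = 0; it < maxIterations; it++) {
            double[][] next = new double[rows][cols];
            double delta = 0.0;
            for (int r = 0; r < rows; r++) {
                for (int c = 0; c < cols; c++) {
                    if (isWall(r, c)) continue;
                    next[r][c] = reward(r, c) + GAMMA * qValue(u, r, c, bestAction(u, r, c));
                    delta = Math.max(delta, Math.abs(next[r][c] - u[r][c]));
                }
            }
            u = next;
            if (delta < tolerance) break;
        }
        return u;
    }

    public static void main(String[] args) {
        double[][] u = valueIteration(1e-6, 10_000);
        for (int r = 0; r < GRID.length; r++) {
            StringBuilder line = new StringBuilder();
            for (int c = 0; c < GRID[0].length(); c++) {
                line.append(isWall(r, c) ? "  (wall)   "
                        : String.format("%c %8.3f ", ARROW[bestAction(u, r, c)], u[r][c]));
            }
            System.out.println(line);
        }
    }
}
```

To produce the required plot, one option is to record the utility of a few chosen states after each pass of the outer loop and write those values to a file for plotting.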
Using the method of policy iteration for Part 1
- Description of the implemented solution
- Plot of optimal policy
- Utilities of all states
- Plot of utility estimates as a function of the number of iterations
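For the policy iteration part, a minimal sketch is given below. It alternates policy evaluation and greedy policy improvement until the policy stops changing. Several choices here are assumptions rather than requirements of the assignment: the 3x3 `GRID` is a hypothetical layout, policy evaluation is done by repeated in-place sweeps instead of solving the linear system exactly, an action replaces the current one only if it is better by more than a small margin (to avoid flipping between tied actions), and all names are invented for the example.

```java
/** Sketch of policy iteration on a hypothetical 3x3 maze (not the assignment's maze). */
class PolicyIterationSketch {
    // Hypothetical layout: '#' wall, 'G' green (+1), 'B' brown (-1), '.' white (-0.04).
    static final String[] GRID = {"G.B", ".#.", "..."};
    static final int[] DR = {-1, 0, 1, 0};   // 0 = UP, 1 = RIGHT, 2 = DOWN, 3 = LEFT
    static final int[] DC = {0, 1, 0, -1};
    static final double GAMMA = 0.99;

    static boolean isWall(int r, int c) {
        return r < 0 || r >= GRID.length || c < 0 || c >= GRID[0].length()
                || GRID[r].charAt(c) == '#';
    }

    static double reward(int r, int c) {
        char cell = GRID[r].charAt(c);
        return cell == 'G' ? 1.0 : cell == 'B' ? -1.0 : -0.04;
    }

    /** Utility of the cell reached by moving in direction a; a wall bump stays put. */
    static double utilAfter(double[][] u, int r, int c, int a) {
        int nr = r + DR[a], nc = c + DC[a];
        return isWall(nr, nc) ? u[r][c] : u[nr][nc];
    }

    /** Expected utility of the next state: 0.8 intended, 0.1 for each right angle. */
    static double qValue(double[][] u, int r, int c, int a) {
        return 0.8 * utilAfter(u, r, c, a)
                + 0.1 * utilAfter(u, r, c, (a + 3) % 4)
                + 0.1 * utilAfter(u, r, c, (a + 1) % 4);
    }

    /** Iterative policy evaluation: in-place sweeps until the largest change is tiny. */
    static void evaluate(int[][] policy, double[][] u) {
        for (int sweep = 0; sweep < 100_000; sweep++) {
            double delta = 0.0;
            for (int r = 0; r < GRID.length; r++) {
                for (int c = 0; c < GRID[0].length(); c++) {
                    if (isWall(r, c)) continue;
                    double updated = reward(r, c) + GAMMA * qValue(u, r, c, policy[r][c]);
                    delta = Math.max(delta, Math.abs(updated - u[r][c]));
                    u[r][c] = updated;
                }
            }
            if (delta < 1e-10) return;
        }
    }

    /** Fills u with the utilities of the returned policy; the initial policy is all UP. */
    static int[][] policyIteration(double[][] u) {
        int[][] policy = new int[GRID.length][GRID[0].length()];
        for (int round = 0; round < 100; round++) {   // safety cap on improvement rounds
            evaluate(policy, u);
            boolean changed = false;
            for (int r = 0; r < GRID.length; r++) {
                for (int c = 0; c < GRID[0].length(); c++) {
                    if (isWall(r, c)) continue;
                    for (int a = 0; a < 4; a++) {
                        // Switch only on a clear improvement so tied actions do not oscillate.
                        if (qValue(u, r, c, a) > qValue(u, r, c, policy[r][c]) + 1e-6) {
                            policy[r][c] = a;
                            changed = true;
                        }
                    }
                }
            }
            if (!changed) break;
        }
        return policy;
    }

    public static void main(String[] args) {
        double[][] u = new double[GRID.length][GRID[0].length()];
        int[][] policy = policyIteration(u);
        for (int r = 0; r < GRID.length; r++) {
            for (int c = 0; c < GRID[0].length(); c++) {
                if (isWall(r, c)) continue;
                System.out.printf("(%d,%d) action=%d utility=%.3f%n", r, c, policy[r][c], u[r][c]);
            }
        }
    }
}
```

The number of improvement rounds (rather than evaluation sweeps) is the natural x-axis for the utility plot in this variant, since each round yields one utility estimate per state.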
Source code for Part 1.
Part 2 bonus questions
- Answers to the questions in the report
- Source code
Attachment: Assignment Files.rar