Assignment
Learning outcomes assessed:
1. Research and analyse the general nature of artificial intelligence and the problems it solves.
2. Objectively compare the strengths and limitations of various artificial intelligence techniques.
3. Apply and evaluate artificial intelligence techniques for solving a variety of real world problems.
Part 1: Decision Tree Learning
The dataset to be used for this section is: white-clover.arff
It describes a set of 63 measurements taken in farm paddocks from 1991 to 1994. They mainly consist of the percentage coverage of each plant in that paddock, as well as where the measurements were taken. The dataset is unusual in that each data row contains measurements from multiple years. The class variable is the percentage of white clover found in 1994. This is supervised learning, so we are given the answer (the class variable) and can easily evaluate the success of our decision tree.
Your task is as follows:
A. Analysing the Data
Open the data file in a text editor and read the comments to understand the dataset.
Experiment with training several decision trees (start with J48 if unsure) with a variety of settings to see how well you can do at predicting the class variable on this dataset.
NB: Be careful of overfitting. Think about how you might avoid overfitting the data.
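One common guard against overfitting is k-fold cross-validation, where every instance is used for testing exactly once and never by a model that was trained on it. The sketch below is only an illustration of how such a split works, not a description of what Weka does internally. The function names and the seed are hypothetical; the instance count of 63 comes from the dataset description above, and 10 folds is an assumed setting.

```python
import random

def k_fold_indices(n_instances, k, seed=1):
    """Split the indices 0..n_instances-1 into k folds of near-equal size."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]

def train_test_splits(n_instances, k, seed=1):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    folds = k_fold_indices(n_instances, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# 63 instances (as in white-clover.arff) and an assumed 10 folds.
splits = list(train_test_splits(63, 10))
fold_sizes = [len(test) for _, test in splits]
print(fold_sizes)
```

With so few instances, each test fold holds only six or seven rows, so the accuracy of a single fold is noisy; it is the average over all folds that is worth reporting.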
B. Describing the Method
Choose TWO of the decision trees you experimented with.
Describe the method you used to analyse the data, making sure to include the following:
1. What are these decision trees attempting to learn?
2. What settings did you find worked well and why do you think this was? (for each tree)
3. What settings did you find did not work well and why do you think this was? (for each tree)
4. What did you do to avoid overfitting the data?
5. Include a screenshot of your settings pages so I can replicate your results.
Don’t forget to include your Test options!
C. Discussing the Results
You should have generated TWO sets of results in part A, one for each algorithm.
You should evaluate and discuss these results and should include at a minimum:
1. A screenshot of your results for each tree.
2. What do the results mean for each decision tree? What facts can you deduce from this result?
3. Which were the most important features in predicting the class variable? How did you decide this?
4. A comparison: list the main differences between each algorithm in a table.
D. Coming to a Conclusion
Finally, describe any conclusions you think you can draw from these results, including:
1. Which tree was better? How did you make this decision?
2. In your opinion, is the result of the best performing tree practically useful? Why?
3. Anything else interesting you found during your experimentation.
Part 2: Clustering
The dataset is available on Moodle: pasture.arff
This dataset contains a range of measurements from agricultural pastures. There is a wide variety of measurements, including the number of earthworm species, fertiliser and rainfall.
The class variable is a simple Lo/Med/Hi rating for how productive that pasture was.
Your task is as follows:
A. Analysing the Data
Open the data file in a text editor and read the comments to understand the dataset.
Experiment with training several clustering algorithms (start with SimpleKMeans & EM) with a variety of settings to see how well you can do at predicting the class variable on this dataset.
You may want to start by working out how many clusters we are looking for. You can also experiment with ignoring various attributes to see what effect that has.
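Because clustering never sees the class variable, judging how well the clusters "predict" it requires mapping each cluster to a class afterwards. The sketch below shows one simple way to do that: label each cluster with its majority class and count the matches. It is a simplified illustration under stated assumptions, not Weka's own evaluation procedure. The function name and the toy cluster assignments are made up; only the Lo/Med/Hi labels come from the dataset description.

```python
from collections import Counter, defaultdict

def majority_class_accuracy(cluster_ids, class_labels):
    """Label each cluster with its most frequent class, then score the match."""
    members = defaultdict(list)
    for cluster, label in zip(cluster_ids, class_labels):
        members[cluster].append(label)
    # Map every cluster to the class that occurs most often inside it.
    mapping = {c: Counter(labels).most_common(1)[0][0]
               for c, labels in members.items()}
    correct = sum(1 for c, label in zip(cluster_ids, class_labels)
                  if mapping[c] == label)
    return mapping, correct / len(class_labels)

# Made-up toy output from a 3-cluster run; not real results from pasture.arff.
clusters = [0, 0, 0, 1, 1, 1, 2, 2, 2]
classes = ["Lo", "Lo", "Med", "Med", "Med", "Hi", "Hi", "Hi", "Hi"]

mapping, accuracy = majority_class_accuracy(clusters, classes)
print(mapping, round(accuracy, 3))
```

Note that this scheme lets two clusters share the same class label, so asking for more clusters can only keep the score level or raise it. That is one reason to settle on the number of clusters before comparing settings.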
B. Describing the Method
Choose TWO of the clustering algorithms you experimented with.
Describe the method you used to analyse the data, making sure to include the following:
1. What are these algorithms attempting to learn?
2. What settings did you find worked well and why do you think this was? (for each)
3. What settings did you find did not work well and why do you think this was? (for each)
4. Include a screenshot of your settings pages so I can replicate your results.
Don’t forget to include your list of ignored attributes (if you ignored any).
C. Discussing the Results
You should have generated TWO sets of results in part A, one for each algorithm.
You should evaluate and discuss these results and should include at a minimum:
1. A screenshot of your results for each algorithm.
2. What do the results mean for each algorithm? What facts can you deduce from this result?
3. A comparison: list the main differences between each algorithm in a table.
D. Coming to a Conclusion
Finally, describe any conclusions you think you can draw from these results, including:
1. Which algorithm was better? How did you make this decision?
2. In your opinion, is the result of the best performing algorithm practically useful? Why?
3. Anything else interesting you found during your experimentation.
Part 3: Bayesian Learning
The dataset is available on Moodle: squash-stored.arff
This dataset contains a range of measurements taken from squash fruit during maturation, ripening & storage. This dataset has excellent descriptions of its attributes.
The class variable is a simple measure (3 possible values) of the quality of the fruit on arrival in Japan.
Your task is as follows:
A. Analysing the Data
Open the data file in a text editor and read the comments to understand the dataset.
Experiment with training several Bayes classifiers (start with Naïve Bayes) with a variety of settings to see how well you can do at predicting the class variable on this dataset.
NB: Be careful of overfitting. Think about how you might avoid overfitting the data.
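As background for the experiments above, a Naïve Bayes classifier picks the class that maximises the class prior multiplied by the per-attribute likelihoods, treating the attributes as independent given the class. The sketch below is a minimal illustration for nominal attributes with add-one (Laplace) smoothing. It is not Weka's implementation, and the attribute values and class labels are hypothetical stand-ins rather than values from squash-stored.arff.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (attribute index, class) -> value counts
    values_seen = defaultdict(set)        # attribute index -> distinct values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen

def predict(model, row):
    """Return the class with the highest log posterior (up to a constant)."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_label in class_counts.items():
        score = math.log(n_label / total)  # log prior
        for i, value in enumerate(row):
            count = value_counts[(i, label)][value]
            # Add-one smoothing keeps unseen values from zeroing the product.
            score += math.log((count + 1) / (n_label + len(values_seen[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data: (firmness, taste) -> quality.
rows = [("firm", "sweet"), ("firm", "sweet"), ("soft", "bland"),
        ("soft", "sweet"), ("firm", "bland"), ("soft", "bland")]
labels = ["good", "good", "poor", "good", "poor", "poor"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("firm", "sweet")), predict(model, ("soft", "bland")))
```

Numeric attributes would need a different likelihood model, such as a per-class normal distribution or prior discretisation, which this sketch leaves out.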
B. Describing the Method
Choose TWO of the Bayes classifiers you experimented with.
Describe the method you used to analyse the data, making sure to include the following:
1. What are these classifiers attempting to learn?
2. What settings did you find worked well and why do you think this was? (for each)
3. What settings did you find did not work well and why do you think this was? (for each)
4. What did you do to avoid overfitting the data?
5. Include a screenshot of your settings pages so I can replicate your results.
Don’t forget to include your Test options!
C. Discussing the Results
You should have generated TWO sets of results in part A, one for each algorithm.
You should evaluate and discuss these results and should include at a minimum:
1. A screenshot of your results for each algorithm.
2. What do the results mean for each algorithm? What facts can you deduce from this result?
3. A comparison: list the main differences between each algorithm in a table.
D. Coming to a Conclusion
Finally, describe any conclusions you think you can draw from these results, including:
1. Which algorithm was better? How did you make this decision?
2. In your opinion, is the result of the best performing algorithm practically useful? Why (not)?
3. Anything else interesting you found during your experimentation.
Part 4: Multi-Layer Perceptron
The datasets are the same as we have used in the previous examples and are available on Moodle:
squash-stored.arff
pasture.arff
white-clover.arff
Your task is as follows:
A. Analysing the Data
Experiment with training a Multi-Layer Perceptron (there is only one) with a variety of settings to see how well you can do at predicting the class variable on each dataset. Make sure to keep track of the best settings for each dataset.
NB: Be careful of overfitting. Think about how you might avoid overfitting the data.
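For neural networks, one widely used way to limit overfitting is early stopping: hold back part of the training data as a validation set and stop training once the validation error stops improving. The sketch below illustrates only the stopping rule. The function name, the `patience` parameter and the error values are hypothetical and do not correspond to any specific Weka setting.

```python
def early_stopping_epoch(validation_errors, patience=3):
    """Return the epoch whose weights would be kept under early stopping."""
    best_epoch, best_error = 0, float("inf")
    epochs_without_improvement = 0
    for epoch, error in enumerate(validation_errors):
        if error < best_error:
            # New best validation error: remember it and reset the counter.
            best_epoch, best_error = epoch, error
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # the error has not improved for `patience` epochs
    return best_epoch

# Made-up validation errors per epoch, for illustration only.
errors = [0.40, 0.31, 0.27, 0.25, 0.26, 0.28, 0.30, 0.24]

print(early_stopping_epoch(errors, patience=3))
print(early_stopping_epoch(errors, patience=5))
```

The two calls show the trade-off: a small patience stops sooner but can miss a later improvement, while a larger one trains for longer before giving up.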
B. Describing the Method
Describe the method you used to analyse the data, making sure to include the following:
1. What are the perceptrons attempting to learn?
2. What settings did you find worked well and why do you think this was?
3. What settings did you find did not work well and why do you think this was?
4. What did you do to avoid overfitting the data?
5. Include a screenshot of your settings pages (for each dataset) so I can replicate your results.
Don’t forget to include your Test options!
C. Discussing the Results
You should have generated THREE sets of results in part A, one for each data set.
You should evaluate and discuss these results and should include at a minimum:
1. A screenshot of your results for each dataset.
2. What do the results mean for each dataset? What facts can you deduce from this result?
3. Compare (in a table format) each of the three perceptron results with the best result obtained for that dataset in Parts 1, 2 & 3.
D. Coming to a Conclusion
Finally, describe any conclusions you think you can draw from these results, including:
1. In your opinion, is the perceptron result for each of the datasets practically useful? Why (not)?
2. Which algorithm was better for each of the three datasets? How did you make this decision?
3. Anything else interesting you found during your experimentation.
Part 5: Document Presentation
Your report should be presented professionally. This means:
Using an appropriate cover page listing your student details.
Having a table of contents page.
All images are appropriately captioned and resized without warping the original image.
Tables are used where appropriate and cleanly and clearly laid out.
Consistent use of styles throughout the entire document (hint: use Word’s built-in styles).
APA referencing used where appropriate including in-text references.
Appropriate language use.