Reference no: EM133997175
Introduction to Artificial Intelligence
Objectives
This assignment is designed to reinforce the knowledge and skills acquired in Weeks 1 to 4. It is an individual assessment that must be completed and submitted on your own by Friday of Week 5. The assessment task relates to Unit Learning Outcomes 1 and 2.
Overview
This assignment requires you to analyse a dataset of farms across Australian agricultural regions. It contains variables that are related to environmental conditions, soil properties, farm characteristics, and crop yields (i.e., productivity). As with many real-world datasets, the data may contain missing values, inconsistent entries, or input errors that require cleaning before machine learning (ML) can be applied. In addition, some ML algorithms may benefit from feature scaling when numerical variables have very different ranges.
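To illustrate why scaling matters when variables have very different ranges, the sketch below standardises two columns to z-scores using only the Python standard library. The numbers are made up for illustration and are not taken from the dataset; the helper name `standardise` is hypothetical.

```python
from statistics import mean, pstdev


def standardise(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column has no spread, so map every value to zero.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


# Made-up values for illustration only (not from australian_crop_yield.csv).
rainfall_mm = [310.0, 455.0, 520.0, 610.0, 705.0]  # spans hundreds of units
soil_ph = [5.8, 6.1, 6.5, 7.0, 7.4]                # spans less than two units

rainfall_scaled = standardise(rainfall_mm)
soil_ph_scaled = standardise(soil_ph)

# After scaling, both columns are centred on 0 with unit spread.
print([round(z, 2) for z in rainfall_scaled])
print([round(z, 2) for z in soil_ph_scaled])
```

Distance-based methods such as k-nearest neighbours or k-means would otherwise be dominated by the column with the larger raw range. Tree-based models are generally insensitive to this kind of rescaling.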
The dataset variables are:
Region: Australian state where the farm is located (e.g., QLD, NSW, VIC).
Crop_Type: Type of crop grown on the farm (e.g., Wheat, Barley, Canola).
Soil_Type: General soil classification of the farm (e.g., Loam, Sandy, Clay).
Rainfall_mm: Average rainfall during the growing season.
Temperature_C: Average temperature in degrees Celsius during the growing season.
Soil_Nitrogen: Nitrogen content in the soil (numerical).
Soil_Phosphorus: Phosphorus content in the soil (numerical).
Soil_pH: Soil acidity or alkalinity level (numerical).
Fertiliser_kg_ha: Amount of fertiliser applied per hectare.
Pest_Pressure: Index representing the severity of pest activity (numerical).
NDVI: Vegetation index derived from satellite imagery indicating plant health (numerical).
Farm_Size_ha: Size of the farm in hectares.
Irrigation_mm: Amount of irrigation water applied during the season.
Distance_to_Silo_km: Distance from the farm to the nearest grain storage facility.
Total_Production_tonnes: Estimated total crop production in tonnes for the farm.
Yield_t_ha: Actual crop yield in tonnes per hectare.
Yield_Category: Categorical label of yield level (Low, Medium, High).
The dataset is saved in the file 'australian_crop_yield.csv', in the same folder as this assignment document on Moodle.
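A first look at the file might start by counting missing cells per column and listing the distinct category labels. The sketch below does this with the standard library on a few made-up rows held in memory. It assumes that missing values appear as blank cells; the real file may mark them differently (for example with "NA"), so this should be checked. The helper name `count_missing` is hypothetical.

```python
import csv
import io

# Made-up rows standing in for the real file. Blank cells are ASSUMED
# to mark missing values; the real file may use another convention.
SAMPLE_CSV = """Region,Crop_Type,Rainfall_mm,Soil_pH,Yield_t_ha
NSW,Wheat,455,6.1,3.2
VIC,Barley,,6.5,2.8
QLD,Canola,610,,
NSW,wheat,520,7.0,3.5
"""


def count_missing(rows, columns):
    """Count empty or whitespace-only cells in each column."""
    counts = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            if (row.get(col) or "").strip() == "":
                counts[col] += 1
    return counts


reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
rows = list(reader)
missing = count_missing(rows, reader.fieldnames)
print(missing)

# Inconsistent entries such as "Wheat" vs "wheat" show up as extra labels.
crop_labels = sorted({row["Crop_Type"] for row in rows})
print(crop_labels)
```

To run the same check on the actual data, the in-memory string can be swapped for `open("australian_crop_yield.csv", newline="")`.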
Tasks
You will pre-process the dataset to handle missing values, encode categorical variables, and perform feature scaling where necessary. Using the cleaned dataset, you will apply appropriate machine learning algorithms to accomplish the following three (3) objectives:
- Predict the yield category of a farm (Low, Medium, or High) based on environmental conditions, soil properties, and farm management practices.
- Predict the actual crop yield of a farm (in tonnes per hectare) using environmental, soil, and farm management variables.
- Group farms into meaningful clusters based on their environmental conditions, soil characteristics, and farm management practices. What insights can be obtained from these clusters?
Note that some variables may not be appropriate to include when building predictive models. You should carefully consider which variables are suitable for modelling and justify your feature selection decisions in the report.
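One way to spot an unsuitable predictor is to test whether the target can be reconstructed directly from other columns. The sketch below checks the hypothesis that Yield_t_ha equals Total_Production_tonnes divided by Farm_Size_ha. This relationship is an assumption suggested by the variable descriptions, not something the brief states, and the rows are made up. The helper name `looks_derived` is hypothetical.

```python
import math


def looks_derived(rows, tolerance=0.01):
    """Return True if Yield_t_ha matches production / farm size in every row.

    `tolerance` is a relative tolerance, so 0.01 allows a 1% mismatch.
    """
    for row in rows:
        implied = row["Total_Production_tonnes"] / row["Farm_Size_ha"]
        if not math.isclose(implied, row["Yield_t_ha"], rel_tol=tolerance):
            return False
    return True


# Made-up rows for illustration only (not from the dataset).
farms = [
    {"Farm_Size_ha": 200.0, "Total_Production_tonnes": 640.0, "Yield_t_ha": 3.2},
    {"Farm_Size_ha": 150.0, "Total_Production_tonnes": 420.0, "Yield_t_ha": 2.8},
    {"Farm_Size_ha": 320.0, "Total_Production_tonnes": 1120.0, "Yield_t_ha": 3.5},
]

if looks_derived(farms):
    print("Yield_t_ha is recoverable from production and farm size.")
else:
    print("No exact relationship found in these rows.")
```

If the check holds on the real data, keeping Total_Production_tonnes as a feature would let a model recover the yield almost exactly, which is a form of target leakage. The same reasoning would apply to using Yield_t_ha as a feature when predicting Yield_Category, if the category turns out to be a binned version of the yield.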
To complete this assignment, the following stages should be followed:
Data Exploration: Load the dataset and perform some basic exploratory data analysis, such as computing summary statistics, visualising the distributions of the variables, and checking for missing values and outliers. Describe your findings and observations in the report.
Data Pre-processing: Prepare the dataset for machine learning by handling missing values, removing outliers, encoding categorical variables, scaling numeric variables, and applying any other necessary data transformations. Describe the steps taken in the report.
Model Selection: Select a suitable machine learning algorithm for each of the three (3) objectives. Justify your choices in the report.
Model Training: For each objective, train the selected model on a training set, using suitable hyperparameter values and evaluation metrics. Report the training and validation scores of each model and discuss their performance in the report.
Model Evaluation: Evaluate each trained model on a test set. Compare the test scores with the training and validation scores and analyse your results in terms of overfitting, underfitting, and model generalisability.
Model Interpretation: Interpret the results of each model and explain your solution to each of the three (3) objectives in the report.
Assessment Report: Document your work (max. 1,000 words) by explaining:
Data exploration and pre-processing
Model selection and training
Model evaluation and interpretation
Solutions for the three (3) objectives
Conclusion
Reflection: Reflect on your experience in completing this assignment (max. 500 words):
How has the assignment helped reinforce your understanding of machine learning principles and techniques?
Did you encounter any issues when attempting the assignment and how did you overcome them?
What lessons have you learnt?
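The Model Training and Model Evaluation stages listed above rely on separate training, validation, and test sets. A minimal sketch of such a three-way split over row indices is shown below, using only the standard library. The 60/20/20 proportions, the seed, and the function name `three_way_split` are illustrative assumptions; the brief does not prescribe them.

```python
import random


def three_way_split(n_rows, val_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle row indices and split them into train, validation and test.

    The fractions and seed are illustrative defaults, not requirements.
    """
    indices = list(range(n_rows))
    # A seeded generator makes the shuffle reproducible between runs.
    random.Random(seed).shuffle(indices)

    n_test = int(n_rows * test_frac)
    n_val = int(n_rows * val_frac)

    test_idx = indices[:n_test]
    val_idx = indices[n_test:n_test + n_val]
    train_idx = indices[n_test + n_val:]
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = three_way_split(100)
print(len(train_idx), len(val_idx), len(test_idx))
```

Hyperparameters would be tuned against the validation rows only, with the test rows held back for a single final evaluation. Any scaler or imputer should be fitted on the training rows alone and then applied to the other two sets, so that no information from the validation or test data leaks into training. Libraries such as scikit-learn offer an equivalent `train_test_split` helper, which can be called twice to obtain three sets.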