Reference no: EM133764046, Length: 3000 words
Machine Learning in Business
Assessment Task - Analytical Report and Business Report
Description
Purpose
There are two parts in this assignment.
Part A provides you with opportunities to learn a range of machine learning methods and Python skills (GLO1 & ULO1) and apply your digital literacy to research and develop a machine learning solution (GLO3, GLO5, and ULO2). By completing this task, you will gain knowledge and skills in selecting and applying one or more appropriate machine learning algorithm(s) to develop and evaluate a machine learning solution and interpret the outcomes.
In Part B, you will report on your application of machine learning and make recommendations to a business and management audience. By completing this task, you will gain the ability to explain and justify machine learning options and discuss their pros and cons with a business audience.
Project Overview
You're engaged to work with Data2Intel, an Australian learning analytics consulting service that specializes in delivering various data analytics services within the primary education sector. The current project aims to predict primary school students who are at risk of underperforming in writing. This initiative is part of a broader effort by a consortium of forty primary schools to enhance educational outcomes and provide targeted support to students in need.
Context/Scenario
Australian school education comprises K-12, a system akin to those in many other countries. Primary schools encompass scholastic years K-6, while secondary schools cover years 7-12. In Australia, most Kindergarten students must be at least 5 years old by January of the calendar year they commence schooling. The early years of schooling, typically K-2 (Kindergarten to Year 2), encompass children aged 5-7 years.
The National Assessment Program - Literacy and Numeracy (NAPLAN) tests are annual assessments for students in Years 3, 5, 7, and 9 to provide a key benchmark for assessing students' foundational literacy and numeracy skills. Research, although more limited in scope compared to secondary and tertiary education, underscores the profound importance and influence of these early years on future academic performance, employability, wellbeing, and career progression. The foundational skills acquired during these formative years set the stage for long-term educational and professional success.
You are provided with a dataset of 2,000 students from more than forty schools. The dataset focuses on their reading and numeracy skills during the early years of Year 1 and Year 2. These skills were measured through localised, formative assessments, which, while validated and consistent, are not solely "pen and paper" tests.
Given the young age of the students, these assessments often include dialogue or interview-based evaluations administered by trained teachers. In the dataset, students at risk of underperforming in writing, Year3_Writing_At_Risk, were determined by their NAPLAN results in Year 3. In addition, the dataset also provides students' demographic and family backgrounds as well as disability conditions. Further details can be found in the supplied data description.
This comprehensive dataset spans five continuous years, from 2016 to 2020, and has been curated for learning support and research purposes. Importantly, there is no missing data, as records with incomplete information have been removed for this exercise. However, you are still required to check data quality and preprocess the data as needed.
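A first data quality pass could look like the minimal sketch below. It assumes the data is loaded into a pandas DataFrame. The helper name `data_quality_report`, the `StudentID` column, and the sample values are hypothetical and used only for illustration; only `WritingVocab-01-SOY` and `Year3_Writing_At_Risk` are named in this brief.

```python
import pandas as pd


def data_quality_report(df: pd.DataFrame) -> pd.DataFrame:
    """Summarise dtype, missing values, and distinct values for each column."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "n_unique": df.nunique(),
    })


# Made-up rows for illustration only; replace with the supplied dataset.
sample = pd.DataFrame({
    "StudentID": [1, 2, 2, 4],                  # hypothetical column
    "WritingVocab-01-SOY": [12, 30, 30, None],  # one value deliberately missing
    "Year3_Writing_At_Risk": [0, 1, 1, 0],      # assumed 0/1 coding
})

report = data_quality_report(sample)
n_duplicates = int(sample.duplicated().sum())   # fully duplicated rows

print(report)
print("duplicate rows:", n_duplicates)
```

Even when a dataset is described as complete, a check of this kind can surface duplicated records, unexpected data types, or out-of-range codes before modelling begins.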
Specific Requirements
You are tasked with performing two analytical tasks, (1) uncovering data insights and (2) exploring machine learning opportunities, and with reporting the findings to Data2Intel.
Regarding the first task, you are required to respond to the following enquiries:
What are the SES backgrounds of the students in the dataset in Year 1 and Year 2? For reference, in 2018 Catholic system schools had a national average SES of 100, while independent schools had an average SES of 102, according to the Australian Department of Education and Training.
What are students' reading skills, for example, Burt Reading Scores, at the start and end of Year 1 and at the start and end of Year 2?
What are students' writing skills at the start of Year 1, WritingVocab-01-SOY? Is there a relationship between this and Year3_Writing_At_Risk?
Are students' literacy skills and numeracy skills related? And are there relationships between these and their Year3_Writing_At_Risk?
Describe the students' disability conditions in the dataset. And are there relationships between these conditions and their Year3_Writing_At_Risk?
Are there other insights that might inform early interventions to improve students' writing skills?
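Questions of the form "is there a relationship between a score and Year3_Writing_At_Risk?" can be approached in several ways. The sketch below shows one option: group summaries plus a point-biserial correlation (the Pearson correlation between a numeric score and a binary flag). It assumes Year3_Writing_At_Risk is coded 0/1, which should be confirmed against the data description, and it uses synthetic values generated on the spot rather than the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data (assumption: at-risk flag coded 0/1).
rng = np.random.default_rng(0)
n = 200
at_risk = rng.integers(0, 2, size=n)
vocab = rng.normal(loc=30 - 8 * at_risk, scale=5, size=n)
df = pd.DataFrame({
    "WritingVocab-01-SOY": vocab,
    "Year3_Writing_At_Risk": at_risk,
})

# Compare the score distribution between the two groups.
summary = (
    df.groupby("Year3_Writing_At_Risk")["WritingVocab-01-SOY"]
    .agg(["count", "mean", "median"])
)

# Point-biserial correlation: Pearson r between the score and the 0/1 flag.
r = df["WritingVocab-01-SOY"].corr(df["Year3_Writing_At_Risk"])

print(summary)
print(f"point-biserial r = {r:.2f}")
```

A negative r here would indicate that lower start-of-year scores go together with the at-risk flag. A box plot per group and a significance test would normally accompany these figures in the report.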
Regarding the second task, machine learning opportunities, you are requested to develop two predictive models to identify students at risk of underperforming in writing in Year 3 (Year3_Writing_At_Risk) and one clustering model to explore possible clusters of students.
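As a rough illustration of this modelling request, the sketch below compares two classifiers and picks a cluster count by silhouette score using scikit-learn. The choice of logistic regression, random forest, and k-means is an assumption made for the example, not a requirement of this brief, and the data is synthetic (`make_classification`) rather than the student dataset.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score, roc_auc_score, silhouette_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the student features and at-risk target.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4,
    weights=[0.8, 0.2], random_state=0,
)

# Stratify so the minority (at-risk) class is represented in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0,
)

# Two candidate predictive models (illustrative choices only).
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(),
        LogisticRegression(max_iter=1000, class_weight="balanced"),
    ),
    "random_forest": RandomForestClassifier(
        n_estimators=200, class_weight="balanced", random_state=0,
    ),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]
    results[name] = {
        # Recall matters when missing an at-risk student is the costly error.
        "recall": recall_score(y_test, model.predict(X_test)),
        "roc_auc": roc_auc_score(y_test, proba),
    }

# Clustering: try several k values and keep the highest silhouette score.
X_scaled = StandardScaler().fit_transform(X)
silhouettes = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_scaled)
    silhouettes[k] = silhouette_score(X_scaled, labels)
best_k = max(silhouettes, key=silhouettes.get)

print(results)
print("silhouette by k:", silhouettes, "-> chosen k:", best_k)
```

Which metrics to prioritise, and how to trade recall against false alarms, is a judgment to justify against the business context. The silhouette score is only one of several ways to support a choice of cluster count.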
Based on your findings from both tasks, you are required to provide actionable insights and recommendations to primary schools and educators to implement targeted interventions and support mechanisms. In addition, you should advise the Client of the potential ethical and legal implications of the models.
You are required to deliver two (2) reports to the Client, Data2Intel learning analytics:
The first analytics report (Part A) should present your analysis and findings to Dr. Alok Sinha, Data2Intel Director of Data and Insights. This report should detail your approach to exploring the dataset, the machine learning techniques used, and your findings. Your findings should be supported by relevant visualizations and statistical analysis. This report should also develop and compare two predictive machine learning models and one clustering model, and recommend a predictive machine learning model, inform model deployment, and recommend future engagements with the client. See further details in the Specific Requirements section below.
The second consultancy report (Part B) should be developed for Sally Tran, Data2Intel Director of Education and Engagement - a fictitious character. The report should include your response to the client's six (6) questions, the proposed machine learning models, and recommendations for use. You should also discuss the limitations of your approach and any potential areas for future improvements.
Deliverable Requirements
You are required to:
Develop your business and data understandings using BACCM.
Prepare and explore the provided dataset, cleanse and pre-process data as needed. Undertake an exploratory data analysis (EDA) to respond to the client's six questions.
Undertake supervised machine learning model development, evaluation, and selection. Two predictive models should be developed, tested, and compared.
Undertake unsupervised machine learning using clustering analytics.
Develop two reports:
The first technical report (Part A) should present your EDA (Exploratory Data Analysis) and machine learning findings to Dr Alok Sinha.
The second consultancy report (Part B), for Sally Tran, Director of Education and Engagement, should present responses to the six specific requests about data, insight from clustering analytics, and a predictive machine learning model.
Format and present your report professionally. Two sample report templates are provided under Assessment Resources.
Correctly use the APA7 style of referencing.
Part A. Case Study Report
Part A.1 Machine Learning Solution
A cover page (not included in the word count) that includes:
Report Title
Unit code and name
Student name and student ID
A table of contents (not included in the word count)
An executive summary of max. 200 words is required (included in the word count).
The report should include:
Introduction:
Objective: the business problem to be addressed in its business context, and the value proposition of the project.
Approach:
Overview of the machine learning approach, including machine learning types and problem(s), and prediction target(s).
Data preparation and Exploratory Data Analysis (EDA):
Data sources, data size, types, quality, cleansing and pre-processing, and any observations.
EDA: statistical analysis and visualisation.
Key insights gained from EDA to inform feature selection and data splitting.
Model development and evaluation:
Supervised Machine Learning:
Two predictive models and performance metrics.
Model comparison based on your selection criteria.
Unsupervised Machine Learning:
Clustering analytics results and justification of the number of clusters.
Solution recommendation:
Interpretation and discussion of results obtained from the validation and comparison.
Solution recommendation - which model to offer to the client.
Future engagements with the client.
Technical recommendations:
Summary of the development and testing environment, such as software libraries, the programming language and computing environment used.
To inform model deployment, provide your machine learning process diagram and data pre-processing steps.
Suggestions for maintenance of accuracy and relevance over time (based on your research).
References (not included in the word count)
Optional appendices (not included in the word count - not subject to assessment), such as additional technical details, supplementary figures and tables.
Part A.2 Files
A Python notebook with detailed comments to guide the deployment team, AND
A PDF version of the Python notebook.
Part B. Business report
A cover page (not included in the word count) that includes:
Report Title
Unit code and name
Student name and student ID
A table of contents (not included in the word count)
An executive summary of max. 100 words is required (included in the word count).
The report should include:
Introduction:
Business understanding of the project using the Business Analysis Core Concept Model (BACCM) framework.
Insights from Exploratory Data Analysis (EDA):
Answers to the Client's six (6) questions.
Additional insights, such as comments on data quality or observations beyond the client's six questions and possible insights gained from clustering analytics.
Proposed machine learning solution:
The selected machine learning model.
Interpretation of its performance and discussion of pros and cons.
Recommendations and conclusions:
Recommendations of business applications.
Potential benefits to stakeholders and how they relate to the value proposition.
Implications such as changes to business processes and decision making and possible impacts.
Recommendations for further improvements.
References (not included in the word count)
Optional appendices (not included in the word count - not subject to assessment), such as supplementary figures and tables.