Conduct an exploratory data analysis and data preparation

Reference no: EM133727189

CIS6008 Business Intelligence (T2, 2024, University of Southern Queensland)

Assessment - Report

This assessment task sheet provides you with information about the requirements for your assessment. Detailed instructions and resources are included for completing the task. The Criterion Reference Assessment (CRA) Rubric that markers use to grade the assessment task is included.

Task overview

Learning outcome 1: Analyse and apply strategies and technologies for effective data management that supports evidence-based decisions.

Learning outcome 2: Research organisational and societal problems using descriptive, predictive and prescriptive analytics models drawing on both internal and external data sources to generate insight, create value and support evidence-based decision making.

Learning outcome 3: Communicate effectively in a clear and concise written manner for both senior and middle management with correct and appropriate acknowledgment of the main ideas presented and discussed.

Task description: You are required to prepare a report that undertakes three tasks analysing data-driven decision-making and descriptive and predictive analytics. Task 1 examines Google's data-driven practices, focusing on meticulous data analysis for search algorithms and advertising strategies. Task 1.1 evaluates Google's data management strategies, while Task 1.2 explores the technologies used and assesses their overall contribution. Task 2 involves a detailed exploration of the Melbourne_housing.csv dataset, encompassing exploratory data analysis (Task 2.1) and building a Linear Regression model (Task 2.2) for predicting residential property prices. Finally, Task 3 focuses on predictive analytics: predicting the income level (<= 50K or >50K) of a population. Task 3.1 involves exploratory data analysis and data preparation, and Task 3.2 constructs a Decision Tree model. Together, these tasks contribute to a comprehensive understanding of data analytics in diverse business scenarios.

Task details

Task 1 Case Study Analysis
In the business world, Google is a prime example of effective data-driven decision-making. The company relies heavily on data to enhance its search algorithms, advertising strategies, and user experience. Google's success in providing relevant search results, targeted advertisements, and personalized user experiences is attributed to its use of data. The company analyses vast amounts of data to understand user behaviour, preferences, and trends.

You are expected to read the following papers as a starting point and to look for other relevant references on the data management strategies and technologies employed at Google, to support your investigation in Tasks 1.1 and 1.2:

Task 1.1 Investigate the data management strategies employed by Google and evaluate and discuss the effectiveness of these strategies in supporting evidence-based decisions. (10 marks, 400 words)

Task 1.2 Investigate the technologies employed for data management at Google and assess and discuss their contribution to the overall effectiveness of data management. (10 marks, 400 words)

Task 2 Exploratory Data Analysis and Linear Regression Analysis (40 Marks)

Carefully study the Melbourne_housing.csv data set (see Appendix A Data Dictionary for Melbourne Housing Price Data Set) and the accompanying description of each variable. Each record in the Melbourne_housing.csv data set contains twenty-one variables that determine Price (the fifth variable). You should conduct some research to identify the determinants/drivers of the selling price of residential properties so that you can fully understand and interpret the key findings of your exploratory data analysis (EDA) and Linear Regression Model for the Melbourne_housing.csv data set.

Task 2.1 Conduct and report on exploratory data analysis (EDA) of the Melbourne_housing.csv data set using the Altair AI Studio data mining tool. (20 marks, 800 words) (CLO2, CLO5)

You are required to provide the following:

A screen capture of your final EDA process, with a brief description of your EDA process diagram.

A summary of the key results of your exploratory data analysis in Table 2.1 Results of Exploratory Data Analysis for Melbourne_housing.csv. Table 2.1 should include key characteristics of each variable in the Melbourne_housing.csv data set, such as maximum and minimum values, average, standard deviation, most frequent value (mode), missing values, invalid values, etc.
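Altair AI Studio's Statistics Tab reports most of these figures directly. As a cross-check, the per-variable summary for Table 2.1 can be sketched in plain Python. This is a minimal sketch, assuming each column has already been read into a list in which missing cells are `None`; the function name `summarise_column` and the `is_valid` rule are hypothetical illustrations, not part of the assignment or of Altair AI Studio.

```python
import statistics
from collections import Counter


def summarise_column(values, is_valid=None):
    """Summarise one variable for an EDA table such as Table 2.1.

    Assumption: missing cells are represented as None.
    is_valid is an optional, hypothetical rule (e.g. lambda v: v >= 0) used only
    to count invalid values; those values are still included in the statistics.
    """
    present = [v for v in values if v is not None]
    summary = {"missing": len(values) - len(present)}

    # Numeric statistics only make sense when every present value is a number.
    numeric = [v for v in present if isinstance(v, (int, float))]
    if present and len(numeric) == len(present):
        summary["min"] = min(numeric)
        summary["max"] = max(numeric)
        summary["mean"] = statistics.mean(numeric)
        # Sample standard deviation; it needs at least two values.
        summary["std"] = statistics.stdev(numeric) if len(numeric) > 1 else 0.0

    # The mode is reported for both numeric and categorical variables.
    if present:
        summary["mode"] = Counter(present).most_common(1)[0][0]

    if is_valid is not None:
        summary["invalid"] = sum(1 for v in present if not is_valid(v))
    return summary


# Illustrative toy values only, not taken from Melbourne_housing.csv.
print(summarise_column([3, 2, None, 3, 4]))
print(summarise_column(["h", "u", "h", None]))
```

The same function works for numeric and categorical columns; for categorical ones only the missing count and mode are filled in, which mirrors what a table of descriptive statistics can meaningfully report for such variables.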

Discuss the key results of the exploratory data analysis presented in Table 2.1 and provide a rationale for selecting the top 5 variables to predict the selling price of residential properties (Price). Focus on the relationships among independent variables, as well as their connections with the dependent variable (Price). Draw insights from the results of the EDA analysis and relevant literature on determinants affecting the selling price of residential properties.

Hint: The Statistics Tab and Chart Tab in Altair AI Studio provide extensive descriptive statistics and the ability to create useful charts, such as bar charts, scatterplots and boxplots, for EDA. You may also want to run correlations and/or chi-square tests, as appropriate, to determine which variables contribute most to predicting the selling price of residential properties (Price).
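The correlation part of this hint can be illustrated outside the tool. The sketch below assumes numeric columns held as Python lists aligned with a Price list, with `None` for missing cells (dropped pairwise); `pearson` and `rank_by_abs_correlation` are hypothetical helper names. Ranking by the absolute value of Pearson's r is only one input to choosing the top 5 variables: the same `pearson` function can also be applied to pairs of predictors to spot strongly inter-correlated (collinear) variables.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient, dropping pairs where either value is None."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant column has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def rank_by_abs_correlation(columns, target, top=5):
    """Return the `top` (name, r) pairs with the largest |r| against the target."""
    scores = {name: pearson(values, target) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)[:top]


# Hypothetical toy columns; in the assignment these would come from Melbourne_housing.csv.
price = [10, 20, 30, 40]
ranked = rank_by_abs_correlation({"a": [1, 2, 3, 4], "b": [4, 1, 3, 2]}, price, top=2)
print(ranked)
```

Sorting on |r| keeps strong negative relationships (for example, price falling with distance) alongside strong positive ones, which is usually what matters when shortlisting predictors.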

Task 2.2 Build and report on your Linear Regression model for predicting the selling price of residential properties (Price) using an Altair AI Studio data mining process with an appropriate set of data mining operators and a reduced set of variables from the Melbourne_housing.csv data set. (20 marks, 800 words) (CLO2, CLO5)

You are required to provide the following:

A screen capture of the final Linear Regression Model process, with a brief description of the final Linear Regression Model process diagram.

A table named Table 2.2 Results of Final Linear Regression Model for Task 2.2 for Melbourne_housing.csv data set.

Discuss the results of the final Linear Regression Model for the Melbourne_housing.csv data set, drawing on key outputs (coefficients, standardised coefficients, t-statistic values, p-values, significance levels, etc.) for predicting the selling price of residential properties (Price) and on relevant supporting literature on the interpretation of a Linear Regression Model. Include in your report all appropriate outputs, such as Altair AI Studio processes, graphs and tables, that support key aspects of the exploratory data analysis and linear regression model analysis of the Melbourne_housing.csv data set.
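To make the interpretation of these outputs concrete, the arithmetic behind a coefficient, its t-statistic and its standardised coefficient can be sketched for the simplest case of one predictor fitted by ordinary least squares. This is an illustrative sketch under that single-predictor assumption: a final model with several predictors is fitted jointly, so its coefficients will differ, and the p-value for each coefficient is normally obtained from the t distribution with the residual degrees of freedom, which is not computed here. `fit_simple_ols` is a hypothetical name.

```python
import math
import statistics


def fit_simple_ols(xs, ys):
    """Fit y = b0 + b1 * x by ordinary least squares and report key outputs."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need equal-length inputs with at least 3 observations")
    sd_x, sd_y = statistics.stdev(xs), statistics.stdev(ys)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("predictor and target must both vary")
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    b1 = sxy / sxx              # unstandardised coefficient: change in y per unit of x
    b0 = mean_y - b1 * mean_x   # intercept

    # Residual variance uses n - 2 degrees of freedom (two estimated parameters).
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    se_b1 = math.sqrt(sse / (n - 2) / sxx)
    t_stat = b1 / se_b1 if se_b1 > 0 else math.copysign(math.inf, b1)

    # Standardised coefficient: change in y (in SDs of y) per one-SD change in x.
    std_b1 = b1 * sd_x / sd_y
    return {"intercept": b0, "coefficient": b1, "std_error": se_b1,
            "t_stat": t_stat, "std_coefficient": std_b1, "df": n - 2}


# Toy data for illustration only.
fit = fit_simple_ols([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(fit)  # coefficient ≈ 0.6, intercept ≈ 2.2, t_stat ≈ 2.12, std_coefficient ≈ 0.775
```

The t-statistic is simply the coefficient divided by its standard error, which is why a large coefficient can still be non-significant when its estimate is noisy; standardised coefficients put predictors measured in different units on a comparable scale.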

Task 3 Predictive Analytics Case Study

The goal of the Predictive Analytics Case Study is to predict the income of a given population, which is labelled as <= 50K and >50K (refer to Appendix B Data Dictionary for Income Dataset). The study aims to identify the variables that are most likely to predict the income of the population. You will apply the business understanding, data understanding, data preparation, modelling and evaluation phases of the CRISP-DM data mining process. It is important that you understand this data set to complete Tasks 3.1 and 3.2.

Task 3.1 Conduct an exploratory data analysis (EDA) and data preparation of the income.csv data set, summarise the key findings of the EDA and data preparation in a table, and discuss the key findings. (800 words) (CLO2, CLO5)

You are required to summarise the findings of your exploratory data analysis and data preparation in a table named Table 3.1 Results of Exploratory Data Analysis and Data Preparation. The table should describe key characteristics of each variable in the income.csv data set, such as maximum and minimum values, average, standard deviation, most frequent value (mode), missing values and invalid values, as well as relationships with other variables, transformations of existing variables and the creation of new variables. Indicate in Table 3.1 which variables contribute most to predicting the income of a given population, which is labelled as <= 50K and >50K.

Hint: The Statistics Tab and Chart Tab in Altair AI Studio provide extensive descriptive statistics and useful charts, such as bar charts and scatterplots, required for Task 3.1. You may also want to run some correlations and/or chi-square tests, depending on whether a variable is numeric (correlations) or categorical (chi-square tests). You could also consider transforming some variables, creating new variables and converting the target/label variable into a binominal variable to facilitate the analysis in Task 3.2.

Briefly discuss the key findings of your exploratory data analysis and data preparation, and justify your choice of the variables most likely to predict the income of a given population, which is labelled as <= 50K and >50K.
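Two of the preparation steps mentioned here, normalising the label to two values and testing a categorical variable against it, can be sketched in plain Python. The sketch assumes raw labels may carry stray spaces or a trailing full stop and that `"?"` or `None` marks a missing category; both are assumptions about the file, not facts from Appendix B. `to_binominal`, `chi_square` and `p_value_df1` are hypothetical names, and the p-value helper is valid only for exactly one degree of freedom (a two-category variable against the two-value label).

```python
import math
from collections import Counter


def to_binominal(raw_label):
    """Normalise a raw income label to one of two values.

    Assumption: raw labels may contain stray spaces or a trailing full stop
    (e.g. " <=50K." or "<= 50K"); anything else is rejected.
    """
    cleaned = raw_label.replace(" ", "").rstrip(".")
    if cleaned in ("<=50K", ">50K"):
        return cleaned
    raise ValueError(f"unexpected income label: {raw_label!r}")


def chi_square(categories, labels, missing=(None, "?")):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    Rows whose category is in `missing` (assumed missing-value markers) are dropped.
    """
    pairs = [(c, y) for c, y in zip(categories, labels) if c not in missing]
    n = len(pairs)
    if n == 0:
        raise ValueError("no complete rows")
    observed = Counter(pairs)
    row_totals = Counter(c for c, _ in pairs)
    col_totals = Counter(y for _, y in pairs)
    stat = 0.0
    for row, row_total in row_totals.items():
        for col, col_total in col_totals.items():
            expected = row_total * col_total / n  # count expected under independence
            stat += (observed[(row, col)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


def p_value_df1(stat):
    """Upper-tail p-value for a chi-square statistic with exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


# Toy example: a two-category variable that perfectly separates the two labels.
labels = [to_binominal(s) for s in [" <=50K", "<=50K.", ">50K", " >50K"]]
stat, dof = chi_square(["a", "a", "b", "b"], labels)
print(stat, dof, p_value_df1(stat))
```

A larger statistic relative to its degrees of freedom indicates a stronger association with income; in practice the expected count in every cell should not be too small for the test to be reliable.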

Task 3.2 Build a Decision Tree model for predicting the income of a given population, which is labelled as <= 50K and >50K, on the income.csv data set using Altair AI Studio, and provide the following outputs: (1) Decision Tree process, (2) Decision Tree diagram and (3) Decision Tree rules. Discuss the key results of the Decision Tree model, drawing on these outputs. (800 words) (CLO2, CLO5)

You are required to briefly explain your final Decision Tree Model process and discuss the results of the final Decision Tree Model, drawing on key outputs (Decision Tree Diagram, Decision Tree Rules) for predicting the income of a given population, which is labelled as <= 50K and >50K, based on key contributing variables and relevant supporting literature on the interpretation of decision trees.
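When interpreting why a tree splits on one variable before another, it helps to see how a split criterion scores candidate attributes. The sketch below computes information gain and gain ratio for a categorical attribute; Altair AI Studio's Decision Tree operator is understood to offer criteria of this kind among its options, but the criterion used should be checked in your own process. Numeric attributes would first need a threshold or discretisation, which this sketch does not cover, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of the value distribution in a list."""
    n = len(values)
    return -sum((k / n) * math.log2(k / n) for k in Counter(values).values())


def information_gain(attribute_values, labels):
    """Reduction in label entropy obtained by splitting on a categorical attribute."""
    n = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the child nodes after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(attribute_values, labels):
    """Information gain divided by the split information (entropy of the attribute itself)."""
    split_info = entropy(attribute_values)
    if split_info == 0:
        return 0.0
    return information_gain(attribute_values, labels) / split_info


# Toy comparison: attribute A separates the labels perfectly, attribute B not at all.
income = ["<=50K", "<=50K", ">50K", ">50K"]
print(information_gain(["a", "a", "b", "b"], income))  # candidate A
print(information_gain(["a", "b", "a", "b"], income))  # candidate B
```

The attribute with the highest score becomes the root split, and each path from the root to a leaf reads as one of the Decision Tree rules. Gain ratio penalises attributes with many distinct values, which plain information gain tends to favour.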
