Discuss the results of the final linear regression model

Reference no: EM132281271

Assignment relates to the course objectives 1, 2 and 4:

1. demonstrate applied knowledge of people, markets, finances, technology and management in a global context of business intelligence practice (data warehouse design, data mining process, data visualisation and performance management) and resulting organisational change and how these apply to implementation of business intelligence in organisation systems and business processes

2. identify and solve complex organisational problems creatively and practically through the use of business intelligence and critically reflect on how evidence based decision making and sustainable business performance management can effectively address real world problems

4. demonstrate the ability to communicate effectively in a clear and concise manner in written report style for senior management with correct and appropriate acknowledgment of main ideas presented and discussed.

The assignment consists of three main tasks and a number of sub tasks.

Task 1 Data Lakes and Data Warehousing

Conduct desktop research by critically reviewing relevant literature on data lakes in the context of more traditional approaches to data management, including data warehousing. Drawing on the relevant and current literature, write a short essay on data lakes that addresses two sub tasks:

Task 1.1) Provide a concise definition of a data lake in the context of an organisational approach to data management (about 250 words) and

Task 1.2) Explain two advantages and two disadvantages of deploying a data lake as part of an organisational data management strategy (about 500 words)

Task 2 Exploratory Data Analysis and Linear Regression Analysis (Worth 35 Marks)

Carefully study the Data Dictionary for Salary Data Set (See Table 1) and accompanying description of each variable in the salary.csv data set. It is important you understand this data set as it is used for Task 2 and Task 3 in Assignment 2.

Note: You should conduct some desktop research on the determinants/drivers of a person's salary level in order to fully understand and interpret the key findings of the exploratory data analysis (EDA) and Linear Regression Models for the salary.csv data set in Task 2 and the visual presentation of the salary.csv data set in Task 3.

Table 1 Data Dictionary for the salary.csv Data Set

Variable | Description (NA denotes a missing value for a variable) | Unit
salary | Weekly Earnings (dollars) | Integer
hours | Average Hours Worked Per Week | Integer
IQ | IQ Score | Integer
kww | Knowledge of World of Work Score | Integer
education | Number of Years of Education | Integer
wexperience | Years of Work Experience | Integer
tenure | Number of Years with Employer | Integer
age | Age in Years | Integer
married | = 1 If Married | Integer
black | = 1 If Black | Integer
south | = 1 If Live in South | Integer
urban | = 1 If Live in a Standard Metropolitan Statistical Area | Integer
sibs | Number of Siblings | Integer
birthorder | Birth Order | Integer
meducation | Mother's Education (Years) | Integer
feducation | Father's Education (Years) | Integer

Task 2.1) Conduct an exploratory data analysis (EDA) of the salary.csv data set using the RapidMiner Studio data mining tool. Note that this will require the use of a number of RapidMiner operators.

Provide the following for Task 2.1:
(i) a screen capture of your final EDA process, briefly describe your EDA process
(ii) summarise key results of your exploratory data analysis in Table 2.1 Results of Exploratory Data Analysis for salary.csv.
(iii) Discuss the key results of the exploratory data analysis presented in Table 2.1 and provide a rationale for selecting the top 5 variables for predicting a person's salary, in particular their relationship with the dependent/target variable salary, drawing on the results of the EDA and relevant literature (about 300 words).

Table 2.1 should include the key characteristics of each variable in the salary.csv data set such as maximum, minimum values, average, standard deviation, most frequent values (mode), missing values and invalid values etc.
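The assignment requires RapidMiner for this step, but the statistics listed for Table 2.1 can be cross-checked outside the tool. The following is a minimal Python sketch, assuming the data has already been loaded into a list of dictionaries. The rows shown are invented for illustration and are not taken from salary.csv, and the helper name summarise is hypothetical.

```python
import statistics

# Hypothetical rows in the shape of salary.csv. The values are invented for
# illustration; None stands in for the NA marker from the data dictionary.
rows = [
    {"salary": 769, "hours": 40, "education": 12},
    {"salary": 808, "hours": 50, "education": 18},
    {"salary": 825, "hours": 40, "education": 14},
    {"salary": 650, "hours": 40, "education": 12},
    {"salary": 562, "hours": None, "education": 11},
]


def summarise(rows, column):
    """Return Table 2.1 style statistics for one column, skipping missing values."""
    values = [r[column] for r in rows if r[column] is not None]
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "mode": statistics.mode(values),    # most frequent value
        "missing": len(rows) - len(values),
    }


table_2_1 = {column: summarise(rows, column) for column in rows[0]}

for column, stats in table_2_1.items():
    print(column, stats)
```

Invalid values (for example negative hours) are not detected by this sketch; they would need a separate range check per variable.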

Hint: The Statistics Tab and the Chart Tab in RapidMiner Studio provide a lot of descriptive statistical information and the ability to create useful charts such as bar charts, scatterplots etc. for the EDA. You might also like to run some correlations and/or chi-square tests as appropriate for the salary.csv data set to determine which variables contribute most to predicting salary.
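As an illustration of the correlation step in the hint, the sketch below ranks candidate predictors by the absolute value of their Pearson correlation with salary. It is a plain Python cross-check, not the RapidMiner process the task asks for, and the data values are invented rather than taken from salary.csv.

```python
import math

# Hypothetical column values, invented for illustration only.
data = {
    "salary":    [769, 808, 825, 650, 562, 1400],
    "education": [12, 18, 14, 12, 11, 16],
    "hours":     [40, 50, 40, 40, 40, 40],
    "sibs":      [1, 1, 1, 4, 10, 1],
}


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Correlation of every predictor with the target variable salary.
correlations = {
    name: pearson(values, data["salary"])
    for name, values in data.items()
    if name != "salary"
}

# Strongest relationships first, regardless of sign.
ranked = sorted(correlations, key=lambda name: abs(correlations[name]), reverse=True)

for name in ranked:
    print(f"{name}: {correlations[name]:+.3f}")
```

Ranking by absolute value keeps strongly negative relationships in view alongside positive ones. Correlation alone does not justify a choice of top 5 variables, so the ranking is a starting point to be read together with the literature.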

Task 2.2) Build a Linear Regression model for predicting salary of a person using a RapidMiner data mining process and an appropriate set of data mining operators and a reduced set of variables from the salary.csv data set as determined by your exploratory data analysis in Task 2.1. Provide the following for Task 2.2:

(i) A screen capture of Final Linear Regression Model process and briefly describe your Final Linear Regression Model process

(ii) A table named Table 2.2 Results of Final Linear Regression Model for Task 2.2 for the salary.csv data set.

(iii) Discuss the results of the Final Linear Regression Model for salary.csv data set drawing on the key outputs (coefficients, standardised coefficients, t-statistics values, p-values and significance levels etc) for predicting salary and relevant supporting literature on the interpretation of a Linear Regression Model (About 300 words).
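To make the outputs named above concrete, here is a minimal Python sketch of how coefficients, standard errors, t-statistics and standardised coefficients come out of an ordinary least squares fit. It is an assumption-laden illustration and not RapidMiner's implementation: the data is invented, only two predictors are used, and all function names are hypothetical. P-values are not computed here; they would come from comparing each t-statistic with a t distribution on n - k degrees of freedom.

```python
import math
import statistics


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def ols(X, y):
    """Fit y = b0 + b1*x1 + ... and return coefficients, standard errors,
    t-statistics and residuals. X holds predictor rows without an intercept."""
    A = [[1.0] + list(row) for row in X]          # add intercept column
    n, k = len(A), len(A[0])
    At = transpose(A)
    xtx_inv = invert(matmul(At, A))
    beta = [row[0] for row in matmul(xtx_inv, matmul(At, [[v] for v in y]))]
    fitted = [sum(b * a for b, a in zip(beta, row)) for row in A]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    sigma2 = sum(e * e for e in residuals) / (n - k)   # residual variance
    se = [math.sqrt(sigma2 * xtx_inv[j][j]) for j in range(k)]
    t_stats = [b / s for b, s in zip(beta, se)]
    return beta, se, t_stats, residuals


# Hypothetical data, invented for illustration: education, wexperience -> salary.
X = [[12, 10], [12, 5], [14, 8], [16, 3], [16, 12], [18, 6], [11, 15], [13, 9]]
y = [805, 745, 883, 927, 1024, 1056, 802, 838]

beta, se, t_stats, residuals = ols(X, y)

# Standardised coefficient: slope rescaled by sd(predictor) / sd(target).
std_beta = [
    beta[j + 1] * statistics.stdev([row[j] for row in X]) / statistics.stdev(y)
    for j in range(len(X[0]))
]

names = ["intercept", "education", "wexperience"]
for name, b, s, t in zip(names, beta, se, t_stats):
    print(f"{name}: coef={b:.3f} se={s:.3f} t={t:.2f}")
print("standardised:", [round(v, 3) for v in std_beta])
```

Standardised coefficients put predictors measured in different units on a common scale, which is why they are usually the basis for comparing the relative importance of variables in Table 2.2, while the t-statistics and p-values indicate whether each coefficient is distinguishable from zero.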

Include all appropriate outputs such as RapidMiner Processes, Graphs and Tables that support key aspects of exploratory data analysis and linear regression model analysis of the salary.csv data set in your Assignment 2 report.

Note that you need to export Processes and Graphs from RapidMiner using the File/Print/Export Image option and include them in the Task 2 section where relevant or in Appendix 2 of the Assignment 2 report.

Task 3 Tableau Desktop View of salary data set

Task 3.1) Create a Tableau Text Table or Graph view that displays salary values, average hours worked per week and other relevant data using the data set salary.csv. Comment on the (1) process of preparing a Text Table or Graph view using Tableau Desktop and (2) key trends and patterns that are apparent (about 75 words).

Task 3.2) Create a Tableau Text Table or Graph view that displays salary values, education level and other relevant data using salary.csv. Comment on the (1) process of preparing a Text Table or Graph view using Tableau Desktop and (2) key trends and patterns that are apparent (about 75 words).

Note: you need to copy the two Text Table / Graph views you have created in Tableau using the Worksheet Menu Copy or Export Image option and include them in the Task 3 section where relevant or in Appendix 3 of the Assignment 2 report.

Report presentation, writing style and referencing

Your Assignment 2 must be presented in report format, written in an appropriate style and supported where required with appropriate in-text references using Harvard Referencing Style.

Attachment:- Written Practical Report.rar

Verified Expert

A data lake has advantages over a data warehouse because it does not need specialised software and hardware for data storage. A data lake can collect data from any source and in any format, whether the data is structured or unstructured. Age, education, IQ, living in an urban area and knowledge of the world of work (kww) each have a significant effect on salary, with a positive relationship between these predictor variables and the response variable.



Reviews

len2281271

4/12/2019 2:25:08 AM

Your Assignment 2 report must be structured in report format as follows:

Cover/Title Page for Assignment 2
Table of Contents
Body of report – Task 1 main heading with appropriate sub headings Task 1.1, Task 1.2 etc., Task 2 …, Task 3 …
List of References
List of Appendices

You must submit two files for Assignment 2:

1. Assignment 2 Report for Tasks 1, 2 and 3 in Word document format with extension .docx
2. Tableau packaged workbook with extension .twbx containing the required two Text Table / Graph views for Task 3



All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd