Reference no: EM132281271 
                                                                               
                                       
This assignment relates to course objectives 1, 2 and 4:
1. demonstrate applied knowledge of people, markets, finances, technology and management in a global context of business intelligence practice (data warehouse design, data mining process, data visualisation and performance management) and the resulting organisational change, and how these apply to the implementation of business intelligence in organisational systems and business processes;
2. identify and solve complex organisational problems creatively and practically through the use of business intelligence, and critically reflect on how evidence-based decision making and sustainable business performance management can effectively address real-world problems;
4. demonstrate the ability to communicate effectively, clearly and concisely, in a written report style for senior management, with correct and appropriate acknowledgment of the main ideas presented and discussed.
The assignment consists of three main tasks and a number of sub-tasks.
Task 1 Data Lakes and Data Warehousing
Conduct desktop research by critically reviewing the relevant literature on data lakes in the context of more traditional approaches to data management, including data warehousing. Drawing on relevant and current literature, write a short essay on data lakes that addresses two sub-tasks:
Task 1.1) Provide a concise definition of a data lake in the context of an organisational approach to data management (about 250 words); and
Task 1.2) Explain two advantages and two disadvantages of deploying a data lake as part of an organisational data management strategy (about 500 words).
Task 2 Exploratory Data Analysis and Linear Regression Analysis (Worth 35 Marks)
Carefully study the Data Dictionary for the Salary Data Set (see Table 1) and the accompanying description of each variable in the salary.csv data set. It is important that you understand this data set, as it is used for Task 2 and Task 3 in Assignment 2.
Note: You should conduct some desktop research on the determinants (drivers) of a person's salary level in order to fully understand and interpret the key findings of the exploratory data analysis (EDA) and linear regression models for the salary.csv data set in Task 2, and the visual presentation of the salary.csv data set in Task 3.
Table 1 Data Dictionary for the salary.csv Data Set
| Variable | Description | Type |
|---|---|---|
| salary | Weekly Earnings (dollars) | Integer |
| hours | Average Hours Worked Per Week | Integer |
| IQ | IQ Score | Integer |
| kww | Knowledge of World of Work Score | Integer |
| education | Number of Years of Education | Integer |
| wexperience | Years of Work Experience | Integer |
| tenure | Number of Years with Employer | Integer |
| age | Age in Years | Integer |
| married | = 1 If Married | Integer |
| black | = 1 If Black | Integer |
| south | = 1 If Live in South | Integer |
| urban | = 1 If Live in a Standard Metropolitan Statistical Area | Integer |
| sibs | Number of Siblings | Integer |
| birthorder | Birth Order | Integer |
| meducation | Mother's Education (Years) | Integer |
| feducation | Father's Education (Years) | Integer |
Note: NA denotes a missing value for a variable.
Task 2.1) Conduct an exploratory data analysis (EDA) of the salary.csv data set using the RapidMiner Studio data mining tool. Note that this will require the use of a number of RapidMiner operators.
Provide the following for Task 2.1:
(i) a screen capture of your final EDA process, with a brief description of that process;
(ii) a summary of the key results of your exploratory data analysis in Table 2.1 Results of Exploratory Data Analysis for salary.csv;
(iii) a discussion of the key results presented in Table 2.1, with a rationale for selecting the top five variables for predicting a person's salary, focusing in particular on their relationship with the dependent (target) variable salary and drawing on the results of the EDA and relevant literature (about 300 words).
Table 2.1 should include the key characteristics of each variable in the salary.csv data set, such as maximum and minimum values, average, standard deviation, most frequent value (mode), missing values and invalid values.
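RapidMiner's Statistics tab reports these characteristics directly. Purely to make the definitions concrete (for example, to cross-check a value in Table 2.1), the following is a minimal Python sketch using only the standard library. It assumes that every column holds integers, as Table 1 states, that the literal string "NA" marks a missing value, and that anything else that cannot be parsed as an integer counts as invalid. The function names `describe` and `describe_csv` are hypothetical, and the sample column is toy data, not taken from salary.csv.

```python
import csv
import statistics


def describe(values):
    """Summarise one column of raw CSV strings for a Table 2.1-style row.

    Assumption: "NA" marks a missing value (per the Table 1 note); any other
    entry that is not an integer is counted as invalid and excluded.
    """
    parsed, missing, invalid = [], 0, 0
    for raw in values:
        raw = raw.strip()
        if raw == "NA":
            missing += 1
            continue
        try:
            parsed.append(int(raw))
        except ValueError:
            invalid += 1
    return {
        "min": min(parsed),
        "max": max(parsed),
        "mean": statistics.mean(parsed),
        "stdev": statistics.stdev(parsed),  # sample standard deviation
        "mode": statistics.mode(parsed),    # most frequent value
        "missing": missing,
        "invalid": invalid,
    }


def describe_csv(path):
    """Apply describe() to every column of a CSV file such as salary.csv."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    return {col: describe([row[col] for row in rows]) for col in rows[0]}


# Hypothetical toy column (not real data): one missing and one invalid entry.
print(describe(["10", "12", "NA", "12", "15", "x"]))
```

Counting missing and invalid entries separately, rather than silently dropping them, mirrors the distinction Table 2.1 asks you to report.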
Hint: The Statistics tab and the Chart tab in RapidMiner Studio provide a lot of descriptive statistical information and the ability to create useful charts, such as bar charts and scatterplots, for the EDA. You might also run correlations and/or chi-square tests, as appropriate for the salary.csv data set, to determine which variables contribute most to predicting salary.
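In RapidMiner this is typically done with a correlation operator. To show what the ranking means, here is a minimal Python sketch, under stated assumptions, that computes the Pearson correlation between each candidate variable and salary and keeps the five with the largest absolute value. It assumes columns have already been parsed into lists of numbers with `None` for missing values, and it uses only pairs where both values are present. The names `pearson` and `rank_by_correlation` are hypothetical, chi-square tests are not shown, and a zero-variance column would cause a division by zero.

```python
import math


def pearson(xs, ys):
    """Pearson correlation over pairs where neither value is missing (None)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


def rank_by_correlation(columns, target="salary", top=5):
    """Return the `top` predictors with the largest |r| against the target."""
    y = columns[target]
    scores = {name: pearson(x, y) for name, x in columns.items() if name != target}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top]


# Hypothetical toy columns (not salary.csv); "c" has one missing value.
toy = {
    "salary": [1, 2, 3, 4],
    "a": [2, 4, 6, 8],
    "b": [4, 3, 2, 1],
    "c": [1, None, 1, 2],
}
print(rank_by_correlation(toy))
```

Ranking by absolute value keeps strong negative relationships in view, since a variable that lowers salary can be just as useful a predictor as one that raises it.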
Task 2.2) Build a linear regression model for predicting a person's salary using a RapidMiner data mining process, an appropriate set of data mining operators and a reduced set of variables from the salary.csv data set, as determined by your exploratory data analysis in Task 2.1. Provide the following for Task 2.2:
(i) a screen capture of your final linear regression model process, with a brief description of that process;
(ii) Table 2.2 Results of Final Linear Regression Model for the salary.csv data set;
(iii) a discussion of the results of the final linear regression model for the salary.csv data set, drawing on the key outputs for predicting salary (coefficients, standardised coefficients, t-statistic values, p-values, significance levels, etc.) and relevant supporting literature on the interpretation of a linear regression model (about 300 words).
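To clarify how the outputs listed above relate to one another, the sketch below fits a single-predictor ordinary least squares model in plain Python and reports the intercept, coefficient, standard error, t-statistic and standardised coefficient. This is a simplified illustration, not the RapidMiner Linear Regression operator: the model in Task 2.2 uses several predictors, which generalises these formulas with matrix algebra. The p-value is omitted because it requires the t-distribution with n - 2 degrees of freedom, which the Python standard library does not provide; for large samples, |t| above roughly 1.96 corresponds to significance at the 5% level. The function name `simple_ols` and the data are hypothetical.

```python
import math


def simple_ols(x, y):
    """Fit y = b0 + b1*x by ordinary least squares (one predictor only)."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx        # unstandardised coefficient (slope)
    b0 = my - b1 * mx     # intercept
    # Residual sum of squares, then the standard error of the slope.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2) / sxx)
    return {
        "intercept": b0,
        "coefficient": b1,
        "std_error": se_b1,
        "t_stat": b1 / se_b1,  # coefficient divided by its standard error
        # Standardised coefficient: b1 * (s_x / s_y).
        "std_coefficient": b1 * math.sqrt(sxx / syy),
    }


# Hypothetical toy data, not taken from salary.csv.
print(simple_ols([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```

The standardised coefficient rescales the slope into standard-deviation units, which is why it, rather than the raw coefficient, is the usual basis for comparing predictors measured on different scales (for example, years of education versus IQ score).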
Include in your Assignment 2 report all appropriate outputs, such as RapidMiner processes, graphs and tables, that support key aspects of the exploratory data analysis and linear regression analysis of the salary.csv data set.
Note: you need to export processes and graphs from RapidMiner using the File/Print/Export Image option and include them in the Task 2 section where relevant, or in Appendix 2 of the Assignment 2 report.
Task 3 Tableau Desktop View of salary data set
Task 3.1) Create a Tableau Text Table or Graph view that displays salary values, average hours worked per week and other relevant data from the salary.csv data set. Comment on (1) the process of preparing a Text Table or Graph view using Tableau Desktop and (2) the key trends and patterns that are apparent (about 75 words).
Task 3.2) Create a Tableau Text Table or Graph view that displays salary values, education level and other relevant data from the salary.csv data set. Comment on (1) the process of preparing a Text Table or Graph view using Tableau Desktop and (2) the key trends and patterns that are apparent (about 75 words).
Note: you need to copy the two Text Table/Graph views you have created in Tableau using the Worksheet menu Copy or Export Image option, and include them in the Task 3 section where relevant or in Appendix 3 of the Assignment 2 report.
Report presentation, writing style and referencing
Your Assignment 2 must be presented in report format, written in an appropriate style and supported, where required, with appropriate in-text references using the Harvard referencing style.
Attachment: Written Practical Report.rar