You are expected to submit professionally presented word-processed assessment documents.
This includes:
- A title page showing: ID number/s, name/s, lecturers' name/s, and assessment title.
- Correct spelling and appropriate use of grammar.
- Numbered pages and a contents page.
- Stapled or bound (no paper clips, plastic folders or plastic sleeves).
- Questions correctly labelled and numbered, with clear and consistent headings.
- For the main text, use Times New Roman, Arial or Calibri font at 12 pt.
- Line spacing no less than 1.5 and no greater than double.
- A complete reference list in Harvard AGPS style, supported by in-text citations, must be included at the end of the assessment.
Learning Objectives: applicable course objectives
- LO-1: Demonstrate applied knowledge of people, markets, finances, technology and management in a global context of business intelligence, and understand the necessity of data-driven decision making.
- LO-2: Understand the resulting organisational change for business intelligence practice (data warehouse design, data mining process, data visualisation and performance management) and how these apply to ......business processes.
- LO-3: Identify and solve complex organisational problems creatively and practically through ......problems.
- LO-4: Comprehend the changing organisational culture and address complex ethical dilemmas that arise from evidence-based decision making and business performance management.
- LO-5: Demonstrate the ability to communicate effectively in a clear and concise manner in written report style for senior management.
Task 1: Exploratory Data Analysis, model building and cross validation through RapidMiner
The objective of Task 1 is to predict the probability of rainfall tomorrow (the next day) based on today's weather conditions. In Task 1, you are required to use the data mining tool RapidMiner to analyse and report on the weather2008-17.csv data set provided for Assignment 2. You should review the data dictionary for the weather2008-17.csv data set (see Table 1.1 below).
The weather data set contains 138,307 daily observations, from January 2008 through to January 2017, drawn from 49 weather stations. In completing Task 1 of Assignment 2, you will need to apply the business understanding, data understanding, data preparation, modelling and evaluation phases of the CRISP-DM data mining process.
Task 1.1 Conduct an exploratory data analysis of the weather2008-17.csv data set using RapidMiner to understand the characteristics of each variable and the relationship of each variable to the other variables in the data set. Summarise the findings of your exploratory data analysis in a table named ‘Results of Exploratory Data Analysis for weather2008-17 Data Set’, describing the key characteristics of each variable, such as minimum and maximum values, average, standard deviation, most frequent value (mode), missing values, invalid or inconsistent values and other characteristics where appropriate, together with its relationships with other variables.
Briefly discuss the key results of your exploratory data analysis and justify your selection of the top five (5) variables for predicting whether it is likely to rain tomorrow based on today's weather conditions (about 500 words).
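As a hedged illustration of the per-variable statistics requested in Task 1.1, the minimal Python sketch below computes the missing-value count, mode, minimum, maximum, mean and standard deviation for each column of a CSV file using only the standard library. It is not a substitute for the RapidMiner analysis the task requires. It assumes missing values are stored as empty cells or the token `NA`, and the column names in the test data (`MinTemp`, `RainToday`) are hypothetical placeholders.

```python
import csv
import io
import statistics
from collections import Counter

# Assumption: missing values are stored as empty cells or the token "NA".
MISSING_TOKENS = {"", "NA"}


def summarise_column(rows, name):
    """Return missing count, mode and (for numeric columns) min/max/mean/stdev."""
    values = [(row.get(name) or "").strip() for row in rows]
    present = [v for v in values if v not in MISSING_TOKENS]
    summary = {
        "missing": len(values) - len(present),
        # Mode is taken over the raw text values, so "12" and "12.0" count separately.
        "mode": Counter(present).most_common(1)[0][0] if present else None,
    }
    try:
        numbers = [float(v) for v in present]
    except ValueError:
        return summary  # categorical column: keep only missing count and mode
    if numbers:
        summary.update(
            min=min(numbers),
            max=max(numbers),
            mean=statistics.mean(numbers),
            stdev=statistics.stdev(numbers) if len(numbers) > 1 else 0.0,
        )
    return summary


def summarise_csv(text):
    """Summarise every column of CSV text (header row first)."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    return {name: summarise_column(rows, name) for name in reader.fieldnames or []}
```

Applied to the contents of weather2008-17.csv, `summarise_csv` would return one dictionary per column. Relationships between variables (for example, correlations) are not computed here and would still need to be examined in RapidMiner.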
Task 1.2 Build a Decision Tree model for predicting whether it is likely to rain tomorrow based on today's weather conditions, using RapidMiner, an appropriate set of data mining operators and a reduced weather2008-17.csv data set determined by your exploratory data analysis in Task 1.1. Provide the following outputs from RapidMiner: (1) the Final Decision Tree Model process, (2) the Final Decision Tree diagram, and (3) the associated decision tree rules.
Briefly explain your Final Decision Tree Model process, and discuss the results of the Final Decision Tree Model for predicting whether it is likely to rain tomorrow based on today's weather conditions, drawing on the key outputs (Decision Tree diagram, Decision Tree rules) and relevant supporting literature on the interpretation of decision trees (about 250 words).
Task 1.3 Create a Weka Logistic Regression model for predicting whether it is likely to rain tomorrow based on today's weather conditions, using RapidMiner, an appropriate set of data mining operators and a reduced weather2008-17.csv data set determined by your exploratory data analysis in Task 1.1. Provide the following outputs from RapidMiner: (1) the Final Logistic Regression Model process, (2) the coefficients, and (3) the odds ratios.
Briefly explain your Final Logistic Regression Model process, and discuss the results of the Final Logistic Regression Model for predicting whether it is likely to rain tomorrow based on today's weather conditions, drawing on the key outputs (coefficients, odds ratios) and relevant supporting literature on the interpretation of logistic regression models (about 250 words). (2+3+2=7 marks)
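The link between the coefficients and odds ratios requested in Task 1.3 can be illustrated with a short sketch. In logistic regression the predicted probability is p = 1 / (1 + e^(-z)) with z = β0 + Σ βi·xi, and the odds ratio for predictor i is e^(βi): the factor by which the odds of rain tomorrow change for a one-unit increase in that predictor, with the other predictors held constant. The minimal Python sketch below assumes this standard logit form; the predictor name and coefficient values in the test are hypothetical, and the sign of a coefficient depends on which class (for example "Yes" or "No") the tool treats as the reference.

```python
import math


def predict_probability(intercept, coefs, features):
    """Probability of the positive class (e.g. rain tomorrow = Yes) under a logit link."""
    z = intercept + sum(b * features[name] for name, b in coefs.items())
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    """Convert a probability into odds, p / (1 - p)."""
    return p / (1.0 - p)


def odds_ratios(coefs):
    """Odds ratio for a one-unit increase in each predictor: exp(coefficient)."""
    return {name: math.exp(b) for name, b in coefs.items()}
```

An odds ratio above 1 means the odds of rain rise as the predictor increases; below 1, they fall; exactly 1 (a coefficient of 0) means no effect.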
Task 1.4 You will need to validate your Final Decision Tree Model and Final Logistic Regression Model. Note that you will need to use the X-Validation, Apply Model and Performance operators in your data mining process models here.
Discuss and compare the accuracy of your Final Decision Tree Model and your Final Logistic Regression Model for predicting whether it is likely to rain tomorrow based on today's weather conditions, using the confusion matrix, ROC curve and lift chart for each final model. Use a table to compare the key confusion matrix results for the Final Decision Tree Model and the Final Logistic Regression Model (about 250 words).
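The confusion matrix comparison in Task 1.4 rests on a few standard ratios. Accuracy alone can mislead: if "Yes" days are a minority, a model that always predicts "No" can still score high accuracy while never predicting rain. The minimal Python sketch below computes accuracy, precision, recall (sensitivity) and specificity from predicted and actual labels so that two models can be compared row by row. It assumes "Yes" is the positive class; the model names in the test are hypothetical.

```python
def confusion_matrix(actual, predicted, positive="Yes"):
    """Count TP, FP, FN and TN for a binary label such as rain tomorrow."""
    cm = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            cm["tp" if a == positive else "fp"] += 1
        else:
            cm["fn" if a == positive else "tn"] += 1
    return cm


def _ratio(num, den):
    # Return NaN rather than raising when a denominator is zero.
    return num / den if den else float("nan")


def metrics(cm):
    """Accuracy, precision, recall (sensitivity) and specificity from a confusion matrix."""
    return {
        "accuracy": _ratio(cm["tp"] + cm["tn"], sum(cm.values())),
        "precision": _ratio(cm["tp"], cm["tp"] + cm["fp"]),
        "recall": _ratio(cm["tp"], cm["tp"] + cm["fn"]),
        "specificity": _ratio(cm["tn"], cm["tn"] + cm["fp"]),
    }


def compare(models, actual, positive="Yes"):
    """One comparison row per model, each scored on the same actual labels."""
    return {
        name: metrics(confusion_matrix(actual, preds, positive))
        for name, preds in models.items()
    }
```

This sketch scores a single set of predictions; RapidMiner's X-Validation operator instead repeats training and testing across folds and typically reports the averaged performance.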
Task 2 Data Warehousing, Big Data, and Contemporary Issues
- LO1, LO2 (15 marks) & LO4
For Tasks 2.1 to 2.3, research the relevant literature on how a big data analytics capability can be incorporated into an organisational data warehouse architecture, and address the requirements below.
Task 2.1 Develop an advanced high-level data warehouse architecture design for a large state-owned water utility company that incorporates organisational structured data as well as big data capture, processing, storage and presentation, in a single diagram titled ‘Big Data Analytics and Data Warehouse Combined’ (about 50 words).
Task 2.2 Describe and justify, with appropriate literature support, the main components of the high-level data warehouse architecture design with incorporated big data capability that you proposed in Task 2.1 (about 750 words).
Task 2.3 Identify and critically analyse the key security, privacy and ethical concerns for organisations within a specific industry that already use big data analytics and an algorithmic approach to decision making, with appropriate in-text referencing support (about 700 words).
Task 3 Tableau Desktop Dashboard
Assume you are the Tableau specialist at a New Zealand-based data analytics company that is helping its client, US Aviation LLC (an American aircraft manufacturer), to better understand wildlife strikes with aircraft, their causes and their overall impacts. The aviation-wildlife.csv data set lists historical data recorded for the American aviation industry on wildlife strikes with aircraft for the years 2000 to 2011. See Table 3.1, which provides the data dictionary for the aviation-wildlife.csv data set.
 
Table 3.1 Data dictionary for aviation-wildlife.csv

| Variable Name | Data Type | Description |
|---|---|---|
| 1. Aircraft: Type | Categorical | Aircraft, Helicopter |
| 2. Airport: Name | Categorical | Name of Airport |
| 3. Altitude-Bin | Categorical | < 1000 Metres, > 1000 Metres, Unknown |
| 4. Aircraft: Make/Model | Categorical | Make and Model of Aircraft |
| 5. Wildlife: Number struck | Categorical | Range of numbers |
| 6. Effect: Impact to flight | Categorical | None, Aborted Take-off, Engine Shut Down, Precautionary Landing, Other |
| 7. Effect: Other | Categorical | Text remarks recorded for flight |
| 8. Location: Nearby if en route | Categorical | State Abbreviation |
| 9. Aircraft: Flight Number | Real | |
| 10. Flight Date | Date | Date of Flight |
| 11. Record ID | Integer | Record ID (unique integer number) |
| 12. Effect: Indicated Damage | Categorical | No Damage, Caused Damage |
| 13. Location: Freeform en route | Categorical | Text remark recorded for flight |
| 14. Aircraft: Number of engines? | Integer | 1, 2, 3 or 4 |
| 15. Aircraft: Airline/Operator | Categorical | Airline Operator |
| 16. Origin State | Categorical | Flight Origin State |
| 17. When: Phase of flight | Categorical | Take-off run, Approach, Climb, En-route, Landing roll |
| 18. Conditions: Precipitation | Categorical | Fog, None, Rain, Snow |
| 19. Remains of wildlife collected? | Categorical | False, True |
| 20. Remains of wildlife sent to Smithsonian | Categorical | False, True |
| 21. Remarks | Categorical | Text remarks recorded regarding the aviation-wildlife collision |
| 22. Reported: Date | Date | Date the aircraft collision with wildlife was reported |
| 23. Wildlife: Size | Categorical | Small, Medium, Large |
| 24. Conditions: Sky | Categorical | No Cloud, Overcast, Some Cloud |
| 25. Wildlife: Species | Categorical | Different types of wildlife, mainly birds |
| 26. When: Time (HHMM) | Categorical | 24-hour format |
| 27. When: Time of day | Categorical | Dawn, Day, Night, Dusk |
It is important for you to understand the variables in this data set in order to build the required Aircraft Wildlife Strikes (AWS) dashboard with four specified Tableau views.
Task 3 requires you to build a Tableau dashboard that includes four different views of the aviation-wildlife.csv data set for the years 2000-2011, as specified in sub-tasks 3.1, 3.2, 3.3 and 3.4.
Task 3.1 Create a Tableau view of the impact of wildlife strikes with aircraft over time for a specific origin state. Provide a screen capture of the Tableau view you have created, describe it, and comment on the different types of impact to aircraft from wildlife strikes over time and whether this differs much between origin states (about 125 words).
Task 3.2 Create a Tableau view of flight phase by time of day that shows when wildlife strikes with aircraft occur. Provide a screen capture of the Tableau view you have created, describe it, and comment on the phases of flight and times of day in which wildlife strikes with aircraft are most likely to occur (about 125 words).
Task 3.3 Create a Tableau view that compares wildlife species in order of aircraft strike frequency and the chance of damage occurring. Provide a screen capture of the view and comment on which wildlife species are most frequently involved in aircraft strikes and which are likely to have the greatest impact in terms of damage (total cost) when an aircraft strike occurs (about 125 words).
Task 3.4 Create a Tableau geo-map view of flights by origin state that displays the number of wildlife strikes and the total monetary cost for each origin state over different periods of time. Provide a screen capture of the Tableau view you have created, describe it, and comment on the number of wildlife strikes by origin state and the total monetary cost over time. A number of origin states cannot be plotted on the geo-map view because they are outside the USA; comment on how you can deal with this issue (about 125 words).
Task 3.5 Provide a screen snapshot of your AWS Dashboard and an accompanying rationale, drawing on the relevant literature on good dashboard design, for the graphic design and functionality that your AWS Dashboard provides for the four Tableau views specified in sub-tasks 3.1, 3.2, 3.3 and 3.4 (about 500 words).
You will need to submit your Tableau workbook, containing your dashboard, in .twbx format as a separate document from your main report for Assignment 2.
Report Presentation, Writing Style and Quality References
- LO5
Presentation: use of formatting, spacing, paragraphs, table of contents, list of tables and diagrams, introduction, conclusion and appendix
Writing style: use of English (correct use of language, grammar and spelling, and proofreading)
Referencing: appropriate level of in-text referencing where required, a reference list provided, and Harvard referencing style used correctly
Your assignment 2 report must be structured in report format as follows:
- UUNZ Cover page for assignment 2 report
- Title Page
- Executive Summary
- Table of Contents (Including List of Tables and Figures)
1. Body of report: main sections and subsections for each task and sub-task
1.1 Task 1.1, with appropriate subheadings, then likewise for each sub-task 1.2, 1.3 and so on.
Writing Style and Online Assignment submission
This assignment must be the expression of your own work. Use English correctly, including correct language and grammar, spell-checking and proofreading.
All assignments must first be submitted electronically via the course study desk link "Turnitin Check Link: Assignment-2". Turnitin (plagiarism detection software) then performs an automated check for plagiarism, collusion and cheating. After that, submit the Turnitin-generated originality report (.pdf) together with your Tableau file (.twbx) via the uPortal "Assignment-2 submission link".
Note carefully the UUNZ policy on academic misconduct, such as plagiarism, collusion and cheating. Any such cases will be identified and dealt with under the UUNZ Academic Integrity Procedures.
Harvard AGPS Referencing Requirement:
The Harvard AGPS referencing style and in-text citations must be used in appropriate places. Study the referencing techniques for Harvard AGPS referencing. UUNZ TPS (Tertiary Programme Support) classes will help you to present your assignment in the correct report writing format and Harvard AGPS style of referencing.
Attachment: Specifications.rar