Build a model for predicting the survival of passengers on the Titanic

Reference no: EM13838108

Assignment: Written and Practical Report

The key frameworks and concepts covered in modules 1-5 are particularly relevant for this assignment. The assignment relates to course learning objectives 1, 2 and 4 and the associated MBA program learning goals and skills (Global Content, Problem Solving, Critical Thinking, and Written Communication) at level 3:

1. Demonstrate applied knowledge of people, markets, finances, technology and management in a global context of business intelligence practice (data warehouse design, data mining process, data visualisation and performance management) and resulting organisational change and how these apply to implementation of business intelligence in organisation systems and business processes.

2. Identify and solve complex organisational problems creatively and practically through the use of business intelligence and critically reflect on how evidence based decision making and sustainable business performance management can effectively address real world problems.

4. Demonstrate the ability to communicate effectively in a clear and concise manner in written report style for senior management with correct and appropriate acknowledgment of main ideas presented and discussed.

The assignment consists of three main tasks, each with a number of sub tasks.

Task 1: Consists of the following sub tasks

The sinking of the Titanic is a famous event. You may find it useful to research the facts surrounding the sinking to inform your understanding of the problem and your interpretation of the data analysis of the factors that determined passenger survival. Use the data mining tool RapidMiner to conduct an exploratory analysis of the titanic_train.csv data set, which is provided on the course study desk Assignment 2 folder link, and then build a simple predictive model of survival on the Titanic using a Decision Tree.

a) Identify the five key variables that contribute most to determining the survival rate of passengers on the ill-fated Titanic on its maiden voyage. Refer to the data dictionary provided with the titanic3_train.csv file, which describes each of the variables and its range of values. (Hint: an exploratory analysis should be based on summary statistics, histograms, crosstab tables and scatterplots of individual variables and of the relationship between individual variables and the target variable survived. Which variables are correlated with the target variable survived, and with each other?) You might also need to reformat some of the variables to facilitate the next stage of analysis of the titanic3_train.csv and titanic3_score.csv data sets using a Decision Tree. (Hint: you will need to convert the survival variable to a nominal variable with the values Yes = 1, No = 0 in titanic_train.csv.) See Data Mining for the Masses, Chapters 3 and 4, for guidance on Exploratory Data Analysis using RapidMiner.

Discuss each of your top five predictor variables and the general results of your exploratory data analysis in RapidMiner. Explain how you dealt with missing data and unusual data, informed by relevant supporting literature on the survival rate of passengers on the Titanic. Your discussion should include appropriate statistical analysis results, such as graphs and results tables from the exploratory data analysis in RapidMiner, with some supporting references on predictive model building and interpretation using Decision Trees in data mining (about 600 words).
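RapidMiner is the tool required for this task, so the following Python sketch is only an illustration of what the exploratory steps in sub task (a) compute: converting survived to a nominal label, counting missing values, and building a crosstab-style survival rate per variable. The rows are hypothetical toy records, not values from titanic_train.csv, and the function names are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical toy rows; the real titanic_train.csv is not reproduced here.
rows = [
    {"pclass": 1, "sex": "female", "age": 29.0, "survived": 1},
    {"pclass": 1, "sex": "male", "age": None, "survived": 0},
    {"pclass": 2, "sex": "female", "age": 34.0, "survived": 1},
    {"pclass": 3, "sex": "male", "age": 22.0, "survived": 0},
    {"pclass": 3, "sex": "female", "age": None, "survived": 0},
    {"pclass": 3, "sex": "male", "age": 4.0, "survived": 1},
]


def to_nominal(flag):
    """Map the numeric survived flag to a nominal label (1 -> Yes, 0 -> No)."""
    return "Yes" if flag == 1 else "No"


def missing_counts(records):
    """Count missing (None) values for each variable that has any."""
    counts = defaultdict(int)
    for record in records:
        for name, value in record.items():
            if value is None:
                counts[name] += 1
    return dict(counts)


def survival_rate_by(records, variable):
    """Crosstab-style summary: share of survivors for each value of a variable."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for record in records:
        key = record[variable]
        totals[key] += 1
        survivors[key] += record["survived"]
    return {key: survivors[key] / totals[key] for key in totals}


print([to_nominal(r["survived"]) for r in rows])
print(missing_counts(rows))
print(survival_rate_by(rows, "sex"))
print(survival_rate_by(rows, "pclass"))
```

Comparing the per-value survival rates across variables is one simple way to shortlist candidate predictors before building the tree; the histograms and scatterplots the hint mentions would still be produced in RapidMiner.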

The following table lists the data dictionary for the data set titanic_train.csv. (Note: titanic_score.csv is the same as titanic_train.csv but does not contain any values for the target variable survived, which is referred to as a label variable in RapidMiner.)

Variable: Description
pclass: Passenger Class (1 = 1st class; 2 = 2nd class; 3 = 3rd class)
survived: Survived (0 = No; 1 = Yes)
name: Name
Sex: Sex
Age: Age
sibsp: Number of Siblings/Spouses Aboard
parch: Number of Parents/Children Aboard
ticket: Ticket Number
fare: Passenger Fare
cabin: Cabin
embarked: Port of Embarkation (C = Cherbourg; Q = Queenstown; S = Southampton)
boat: Lifeboat
body: Body Identification Number
home.dest: Home/Destination

SPECIAL NOTES: Pclass is a proxy for socio-economic status (SES): 1st ~ Upper; 2nd ~ Middle; 3rd ~ Lower.

Age is in Years; Fractional if Age is less than One (1). If the Age is Estimated, it is in the form xx.5.
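The age conventions in this note can be made explicit with a small helper. This is a minimal sketch under the stated rules only; the function name and category labels are assumptions for illustration, and an age of exactly 0.5 is ambiguous under the note (it could be a fractional infant age or an estimate), so the sketch treats every value below one as an infant age.

```python
def describe_age(age):
    """Classify an age value using the conventions in the data dictionary note."""
    if age is None:
        return "missing"
    if age < 1:
        # Ages below one year are recorded as fractions of a year.
        return "infant (fractional age)"
    if age % 1 == 0.5:
        # Estimated ages are stored in the form xx.5.
        return "estimated"
    return "recorded"


for value in (None, 0.75, 28.5, 30.0):
    print(value, "->", describe_age(value))
```

Flagging estimated and missing ages separately may help when deciding how to treat unusual or missing values before modelling.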

Fare is in Pre-1970 British Pounds (£). Conversion Factors: £1 = 20s = 240d and 1s = 12d.

With respect to the family relation variables (i.e. sibsp and parch), some relations were ignored. The following are the definitions used for sibsp and parch.

Sibling: Brother, Sister, Stepbrother, or Stepsister of Passenger Aboard Titanic
Spouse: Husband or Wife of Passenger Aboard Titanic (Mistresses and Fiancées Ignored)
Parent: Mother or Father of Passenger Aboard Titanic
Child: Son, Daughter, Stepson, or Stepdaughter of Passenger Aboard Titanic

Other family relatives excluded from this study include cousins, nephews/nieces, aunts/uncles, and in-laws. Some children travelled only with a nanny, therefore parch=0 for them. As well, some travelled with very close friends or neighbours in a village, however, the definitions do not support such relations.
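Returning to the fare note in the special notes: pre-decimal British currency used 20 shillings to the pound and 12 pence to the shilling, so 240 pence made one pound. The sketch below converts a pounds, shillings and pence amount to decimal pounds; the function name is an assumption for illustration, and whether the fare column is already stored in decimal pounds should be checked against the data set itself.

```python
SHILLINGS_PER_POUND = 20
PENCE_PER_SHILLING = 12


def to_decimal_pounds(pounds, shillings=0, pence=0):
    """Convert a pre-decimal amount (pounds, shillings, pence) to decimal pounds."""
    total_pence = (pounds * SHILLINGS_PER_POUND + shillings) * PENCE_PER_SHILLING + pence
    return total_pence / (SHILLINGS_PER_POUND * PENCE_PER_SHILLING)


print(to_decimal_pounds(7, 5, 0))   # 7 pounds 5 shillings
print(to_decimal_pounds(0, 0, 240)) # 240 pence
```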

STORY BEHIND THE DATA: This dataset is based on the Titanic Passenger List edited by Michael A. Findlay, originally published in Eaton & Haas (1994) Titanic: Triumph and Tragedy, Patrick Stephens Ltd, and expanded with the help of the internet community.

b) Build a model for predicting the survival of passengers on the Titanic using a decision tree in RapidMiner (see Chapter 10 of the Data Mining for the Masses textbook for guidance on Decision Trees in RapidMiner) using the two data sets, titanic3_train.csv and titanic3_score.csv. Then present and discuss the results of your Decision Tree analysis, including a diagram showing your final Decision Tree. Comment on the relative predictive strength of this model and on what you believe are the most significant variables that determined whether a passenger on the Titanic survived. Include some supporting references on using Decision Trees in data mining (about 400 words).
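To reason about why a decision tree prefers one variable over another at a split, it can help to compute a splitting criterion by hand. The sketch below uses information gain, one common criterion; it is not a description of RapidMiner's Decision Tree operator, whose criterion is configurable. The records are hypothetical toy data and the function names are assumptions made for this example.

```python
import math
from collections import Counter

# Hypothetical toy records, not taken from the assignment data sets.
records = [
    {"sex": "female", "pclass": 1, "survived": "Yes"},
    {"sex": "female", "pclass": 2, "survived": "Yes"},
    {"sex": "female", "pclass": 3, "survived": "Yes"},
    {"sex": "female", "pclass": 3, "survived": "Yes"},
    {"sex": "male", "pclass": 1, "survived": "No"},
    {"sex": "male", "pclass": 2, "survived": "No"},
    {"sex": "male", "pclass": 3, "survived": "No"},
    {"sex": "male", "pclass": 3, "survived": "Yes"},
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target="survived"):
    """Reduction in entropy of the target after splitting on one attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - weighted


for name in ("sex", "pclass"):
    print(name, round(information_gain(records, name), 4))
```

In this toy sample the split on sex yields the larger gain, so a tree grown with this criterion would place it nearer the root; the ranking in the real data has to come from your own RapidMiner analysis.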

Task 2: Consists of the following two sub tasks

Big data is a hot topic and is generating enormous interest in industry and academia; however, there is no agreement on the definition of the term, and the application of big data analytics in practice is currently more hype than reality.

Your task is twofold:

a) Research and critique the current literature available on the Internet and in academic journals and conferences, and provide a comprehensive definition and description of the term 'Big Data' that is supported by the reference literature (approx. 500 words).

b) Research and critique the current literature available on the Internet and in academic journals and conferences, and provide a comprehensive discussion of one specific application of big data analytics in an industry sector. Emphasise how, in this specific application, big data analytics is providing business value to organisations in that industry sector (approx. 1000 words).

Your discussion and analysis here should be underpinned by an appropriate level of in-text referencing using the Harvard referencing style.

Task 3: Consists of the following sub tasks

Using the Excel file SalesSuperstore.xlsx, provided on the course study desk Assignment Folder link, and Tableau Desktop 8.3, produce the following four reports with appropriate accompanying graphs, each based on a Tableau workbook sheet view. Briefly comment on each report, in about 125 words, on the trends and patterns apparent in it.

The SalesSuperstore.xlsx file contains the following dimensions and information:

1. Customer Name, Customer Segment

2. Location- Region, State, City, Zipcode

3. Product Category, Sub Category, Product Name, Product Container, Unit Price

4. Order Information

5. Shipping Information

6. Sales Information

7. Profit

a) Create a report and accompanying graph using Tableau that shows a trend analysis of sales by Product Category over the years 2009 to 2012, and comment on the key trends and patterns apparent in this report (approx. 125 words).

b) Create a report and accompanying graph using Tableau that shows, for each Product Category, Average Profit and Total Sales for each month over the years 2009 to 2012, and comment on the key trends and patterns apparent in this report (approx. 125 words).

c) Create a geographical map presentation using Tableau that shows graphically the relative size of Product Sales for the year 2012 by City within each State, and comment on the key trends and patterns in this report (approx. 125 words).

d) Create a report and accompanying graph using Tableau that shows, for the technology-based Product Sub Categories, Unit Price, Sales and Profit for each month over the years 2009 to 2012, and comment on the key trends and patterns in this report (approx. 125 words).
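The reports themselves must be built in Tableau, but it can be useful to know what aggregation each view performs so the figures can be sanity-checked. The sketch below reproduces the groupings behind reports (a) and (b) in plain Python. The order rows and the field names (order_date, category, sales, profit) are hypothetical and are not the actual column names or values in SalesSuperstore.xlsx.

```python
from collections import defaultdict
from datetime import date

# Hypothetical toy orders; field names are assumed for illustration only.
orders = [
    {"order_date": date(2009, 3, 14), "category": "Technology", "sales": 1200.0, "profit": 300.0},
    {"order_date": date(2009, 3, 20), "category": "Technology", "sales": 800.0, "profit": 100.0},
    {"order_date": date(2009, 7, 2), "category": "Furniture", "sales": 500.0, "profit": -50.0},
    {"order_date": date(2010, 1, 9), "category": "Technology", "sales": 1500.0, "profit": 400.0},
    {"order_date": date(2010, 1, 15), "category": "Office Supplies", "sales": 200.0, "profit": 40.0},
]


def total_sales_by_category_year(records):
    """Sum sales per (category, year), as in a yearly trend view."""
    totals = defaultdict(float)
    for record in records:
        totals[(record["category"], record["order_date"].year)] += record["sales"]
    return dict(totals)


def average_profit_by_category_month(records):
    """Average profit per (category, year, month)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for record in records:
        key = (record["category"], record["order_date"].year, record["order_date"].month)
        sums[key] += record["profit"]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


print(total_sales_by_category_year(orders))
print(average_profit_by_category_month(orders))
```

Note that an average of profit and a sum of sales are different aggregations over the same rows, which is why report (b) asks for both measures side by side.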

Your assignment 2 report must be structured as follows, which is similar to the report structure detailed in Summers & Smith 2010:

Cover page for assignment 2 report
1. Title Page
2. Table of Contents
3. Body of report - main sections and subsections for the assignment 2 tasks and sub tasks
3.1 Task 1 (a main heading, with appropriate sub headings for each sub task)
3.2 Task 2 ...
3.3 Task 3 ...
4. List of References
5. List of Appendices

You need to submit two files when you submit Assignment 2:
1. Your Assignment 2 Report for Tasks 1, 2 and 3 in Word document format with the extension .docx
2. Your Assignment 2 Task 3 as a Tableau packaged workbook with the extension .twbx

Use the following file naming convention:
1. Student_no_Student_name_CIS8008_Ass2.docx
2. Student_no_Student_name_CIS8008_Ass2.twbx

Online Assignment submission

All assignments must be submitted electronically via the course study desk Assignment 2 submission link. Documents submitted through this link are subject to automated checking for plagiarism and collusion by Turnitin.

Note carefully University policy on plagiarism, collusion and cheating. If any of these occur they will be found and dealt with.

Harvard referencing resources

Install a reference tool (for example, EndNote) that integrates with your word processor. These tools are a great help for referencing and citing sources in your assignments. For more information on how to get EndNote, visit the following webpage: https://www.usq.edu.au/library/referencing/endnote-bibliographic-software.

Study the referencing techniques in the Communication skills handbook (Summers & Smith 2010). The USQ Librarian has compiled the following resources on how to reference correctly using the Harvard referencing system; make use of them if you are unsure how to reference correctly. Library Harvard Referencing Guide: https://www.usq.edu.au/library/referencing/harvardagps-referencing-guide
