Analyzing what sorts of people were likely to survive

Reference no: EM13934799

Data Science Project Report

Submit the files listed below in a single ZIP file:

• Titanic.rmd - R Markdown document used to generate your Data Science Project Report. An initial Sample_RMD.rmd template file has been provided for you.

• Titanic.html - standalone HTML document (with embedded images and code) generated in RStudio using Knitr from your Titanic.rmd R Markdown file.

• /data directory with your dataset files

Reproducible Research

For your Data Science Project Report you are expected to meet the criteria of a reproducible research project. Your project report will document your analysis of the Titanic dataset. It will include your initial data exploration, your model building and evaluation, and your final predicted outcomes for the test dataset. For your research to be considered reproducible, you must provide:

• The data used for your analysis

• All final code files, with appropriate comments

• A report of your analysis that includes background information explaining the question you are trying to answer, a discussion of the analysis, and the conclusions reached for your project, with appropriate supporting explanations and figures.

To meet this last requirement, your report must be a standalone HTML document created in RStudio with the Knitr and R Markdown tools. Using Knitr with R Markdown lets you create a report that interweaves your discussion with your code and figures. See R Markdown: Dynamic Documents for R in the list of online resources below for further information.

Data Analysis Project

This assessment is based on a Kaggle competition. For this assignment you are asked to predict which of the Titanic's passengers survived the disaster. More information on the competition is available at the Kaggle competition site, Titanic: Machine Learning from Disaster (https://www.kaggle.com/c/titanic).

The sinking of the RMS Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1502 out of 2224 passengers and crew. This sensational tragedy shocked the international community and led to better safety regulations for ships.

One of the reasons that the shipwreck led to such loss of life was that there were not enough lifeboats for the passengers and crew. Although there was some element of luck involved in surviving the sinking, some groups of people were more likely to survive than others, such as women, children, and the upper-class.

In this challenge, we ask you to complete the analysis of what sorts of people were likely to survive. In particular, we ask you to apply the tools of machine learning to predict which passengers survived the tragedy. (Kaggle 2012)

Project Report Outline

Please use the project report outline provided below as a general guide to the specific sections and content that you should include in your project report.

1. Background

Introduce and discuss the background and purpose of your project. What information does the dataset provide? What question(s) are you trying to answer?

2. Exploratory analysis

Conduct exploratory analysis to discover which of the independent variables are most informative. You are required to explore and report on at least four variables. Three of the four must be Age, Sex and Class. You are free to explore and report on any other independent variables in the dataset. Your discussion should include at least one table or figure for each variable illustrating the relationship between each variable and passenger survival.
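Each of the per-variable tables this section asks for boils down to a survival rate per group. The sketch below shows that calculation in Python for illustration only (the report itself is expected to be R Markdown). It assumes each passenger is a row read from Kaggle's train.csv using that file's column names (Survived, Sex, Pclass, Age); the helper names, the data/train.csv path and the under-18 cut-off for children are hypothetical choices, and blank ages are kept as their own "unknown" group rather than silently dropped.

```python
import csv
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Mapping

Row = Mapping[str, str]


def load_passengers(path: str) -> List[Dict[str, str]]:
    """Read a Kaggle-style CSV (for example a hypothetical data/train.csv) into row dicts."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def age_band(row: Row) -> str:
    """Hypothetical banding: under 18 is 'child', otherwise 'adult'; a blank Age is 'unknown'."""
    age = row.get("Age", "").strip()
    if not age:
        return "unknown"
    return "child" if float(age) < 18 else "adult"


def survival_rate_by(rows: Iterable[Row], key: Callable[[Row], str]) -> Dict[str, float]:
    """Return {group: share of passengers in that group with Survived == 1}."""
    totals: Dict[str, int] = defaultdict(int)
    survivors: Dict[str, int] = defaultdict(int)
    for row in rows:
        group = key(row)
        totals[group] += 1
        survivors[group] += int(row["Survived"])
    return {group: survivors[group] / totals[group] for group in sorted(totals)}
```

For example, `survival_rate_by(load_passengers("data/train.csv"), lambda r: r["Sex"])` gives one rate per sex; swapping the key for `lambda r: r["Pclass"]` or `age_band` covers Class and Age, and each result can then be shown as a table or bar chart in the report.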

3. Building and evaluating the model

a. Discuss your choice of model. Explain why you've chosen this specific model. What are its strengths? What are its limitations?

b. Evaluate your model. The evaluation section should answer the following questions: How well does your model predict? Is it overfitting to the training set? Do you trust this model?

c. This section should include at least two tables or figures that summarize or illustrate your discussion.
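One common way to answer "Is it overfitting?" is to hold back part of the training data, fit only on the rest, and compare accuracy on the two parts; a large gap between them suggests the model has memorized the training set. The Python sketch below (illustrative only; the report itself is R) uses a deliberately simple, hypothetical model that predicts survival for a (Sex, Pclass) group when more than half of that group survived in the training split. The 30% validation share, the seed, the fallback for unseen groups and all function names are assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, List, Mapping, Sequence, Tuple

Row = Mapping[str, str]
Group = Tuple[str, str]


def group_key(row: Row) -> Group:
    # Hypothetical feature choice: the model only looks at Sex and Pclass.
    return (row["Sex"], row["Pclass"])


def fit_group_majority(rows: Sequence[Row]) -> Dict[Group, int]:
    """Predict 1 for a group if more than half of its training rows survived, else 0."""
    counts: Dict[Group, List[int]] = defaultdict(lambda: [0, 0])  # [survivors, total]
    for row in rows:
        tally = counts[group_key(row)]
        tally[0] += int(row["Survived"])
        tally[1] += 1
    return {group: int(s / n > 0.5) for group, (s, n) in counts.items()}


def predict(model: Dict[Group, int], row: Row) -> int:
    # A group never seen in training defaults to 0 (hypothetical fallback).
    return model.get(group_key(row), 0)


def accuracy(model: Dict[Group, int], rows: Sequence[Row]) -> float:
    return sum(predict(model, r) == int(r["Survived"]) for r in rows) / len(rows)


def holdout_check(rows: Sequence[Row], valid_fraction: float = 0.3,
                  seed: int = 42) -> Tuple[float, float]:
    """Fit on a random training split; return (training accuracy, validation accuracy)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_valid = int(len(shuffled) * valid_fraction)
    if n_valid == 0 or n_valid == len(shuffled):
        raise ValueError("need rows in both the training and validation splits")
    valid, train = shuffled[:n_valid], shuffled[n_valid:]
    model = fit_group_majority(train)
    return accuracy(model, train), accuracy(model, valid)
```

Reporting the two accuracies side by side in a small table is one way to support the overfitting discussion; repeating the check with several seeds, or switching to k-fold cross-validation, gives a steadier estimate than a single split.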

4. Predicting passenger survival

Finally, use the model you've built to predict the outcomes for the test set and compare these results with your training data. Although it is optional, I encourage you to submit your predictions to the Kaggle competition site and include your results in your report.
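Kaggle scores a submission file with two columns, PassengerId and Survived (0 or 1), one row per test passenger. As an illustrative Python sketch (the report itself is R Markdown), the functions below build that file from any prediction function and also compute the share of test passengers predicted to survive, which can be set beside the survival rate observed in the training data. The function names are hypothetical, and the prediction function stands in for whatever model you built.

```python
import csv
import io
from typing import Callable, Iterable, Mapping

Row = Mapping[str, str]


def submission_csv(test_rows: Iterable[Row], predict_fn: Callable[[Row], int]) -> str:
    """Build the two-column PassengerId,Survived CSV text, one line per test passenger."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["PassengerId", "Survived"])
    for row in test_rows:
        writer.writerow([row["PassengerId"], int(predict_fn(row))])
    return buf.getvalue()


def save_submission(text: str, path: str) -> None:
    # newline="" stops Python from translating the "\n" line endings on Windows.
    with open(path, "w", newline="") as f:
        f.write(text)


def predicted_survival_rate(test_rows: Iterable[Row], predict_fn: Callable[[Row], int]) -> float:
    """Share of test passengers predicted to survive, for comparison with the training-set rate."""
    preds = [int(predict_fn(r)) for r in test_rows]
    if not preds:
        raise ValueError("no test rows")
    return sum(preds) / len(preds)
```

For instance, `save_submission(submission_csv(test_rows, my_model), "submission.csv")` writes the file, where `test_rows` holds the rows of test.csv and `my_model` is a hypothetical function returning 0 or 1 for one row.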

5. Conclusions

Discuss the conclusions you've drawn based on your analysis.

List of online resources

• Titanic: Machine Learning from Disaster. Kaggle competition site. https://www.kaggle.com/c/titanic

• R Markdown: Dynamic Documents for R. https://rmarkdown.rstudio.com/

• Getting Started with R: Kaggle's Titanic Competition. A list of four excellent tutorials for using R to compete in the Titanic competition. https://www.kaggle.com/c/titanic/details/new-getting-started-with-r

• Kaggle and DataCamp R Tutorial on Machine Learning. An interactive tutorial by Kaggle and DataCamp with coding exercises to help you predict passenger survival for Kaggle's Titanic competition. https://www.datacamp.com/courses/kaggle-tutorial-on-machine-learing-the-sinking-of-thetitanic

References

Titanic: Machine Learning from Disaster 2012, Kaggle, viewed 8 Oct 2015, https://www.kaggle.com/c/titanic.
