Reference no: EM132581000, Length: 2500 words
PROF-740 Fundamentals of Data Analytics Assignment - Rochester Institute of Technology, RIT Dubai, UAE
Assessment Title - Revised Exploratory Data Analysis (EDA) and Model Building
Purpose of the assessment (with Course Learning Outcome Mapping): This assignment is designed to assess students' knowledge and skills related to the following learning outcomes:
1- Comprehend the principles and purposes of data analytics and articulate the different dimensions of the area.
2- Analyze the challenges of scaling analytics to large data sets, and identify appropriate techniques to scale up the computation.
3- Demonstrate the use of analytical tools to manipulate a data set to extract statistics and features, coping with missing and dirty data.
4- Apply basic learning techniques to build regression models and classifiers, and use them to predict new data values.
5- Identify the design techniques used to mitigate the vulnerabilities in data privacy and security in sharing.
Assignment Description: Exploratory Data Analysis (EDA) and Model Building
On April 15, 1912, the Titanic, then the largest passenger liner afloat, sank during her maiden voyage after colliding with an iceberg. The sinking killed 1502 of the 2224 passengers and crew. This sensational tragedy shocked the international community and led to better safety regulations for ships. One reason the shipwreck caused such loss of life was that there were not enough lifeboats for the passengers and crew. Although there was some element of luck involved in surviving the sinking, some groups of people were more likely to survive than others.
The titanic.csv file contains data for 1310 of the real Titanic passengers. Each row represents one person. The columns describe different attributes of the person, including whether they survived (S), their age (A), their passenger class (C), their sex (G) and the fare they paid (X).
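As a starting point for the EDA, the sketch below reads a file with this column layout, counts missing ages, and computes the survival rate per group. It is written in Python with the standard library only, purely for illustration (the submission itself is an R Markdown file). It rests on several assumptions that the brief does not state: that the header row uses the short names S, A, C, G and X, that S is coded 0/1, that G holds text such as "male" and "female", and that missing values appear as blank cells. The rows are made up for the example and are not taken from titanic.csv.

```python
import csv
import io
from collections import defaultdict

# Toy rows for illustration only; these are NOT taken from titanic.csv.
# Assumed header: S (survived, 0/1), A (age), C (class), G (sex), X (fare).
SAMPLE_CSV = """S,A,C,G,X
1,29,1,female,211.34
0,,3,male,7.25
1,2,2,female,26.00
0,40,3,male,8.05
0,54,1,male,51.86
1,,3,female,7.75
"""


def load_rows(handle):
    """Read passenger rows, turning blank numeric cells into None."""
    rows = []
    for raw in csv.DictReader(handle):
        rows.append({
            "S": int(raw["S"]),
            "A": float(raw["A"]) if raw["A"] else None,
            "C": int(raw["C"]),
            "G": raw["G"],
            "X": float(raw["X"]) if raw["X"] else None,
        })
    return rows


def survival_rate_by(rows, column):
    """Return {group value: share of survivors} for one column."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for row in rows:
        totals[row[column]] += 1
        survivors[row[column]] += row["S"]
    return {key: survivors[key] / totals[key] for key in totals}


rows = load_rows(io.StringIO(SAMPLE_CSV))
missing_age = sum(1 for row in rows if row["A"] is None)
by_sex = survival_rate_by(rows, "G")
by_class = survival_rate_by(rows, "C")

print("rows:", len(rows), "missing ages:", missing_age)
print("survival rate by sex:", by_sex)
print("survival rate by class:", by_class)
```

To try it on the real file, replace io.StringIO(SAMPLE_CSV) with open("titanic.csv", newline="") and adjust the column names to match the actual header.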
1- Part 1- Using the Titanic data set, conduct a final exploratory data analysis (EDA) and build a model that answers the questions raised in the revised EDA of your midterm project. Make sure to submit the following:
- R Markdown file
- Final report in MS Word
2- Part 2- Write a summary that identifies the design techniques used to mitigate vulnerabilities in data privacy and security when sharing data.
You can refer back to chapters 5, 18 and 19 of the textbook for help with the different stages of EDA and model building, and to the article by the Cloud Security Alliance Big Data Working Group, "Expanded Top Ten Big Data Security and Privacy Challenges", April 2013.
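For the model-building step in Part 1, one possible choice (the brief does not prescribe a method) is a logistic regression classifier that predicts survival from the other columns. The sketch below fits one by batch gradient descent, in Python with the standard library only; in R Markdown the same idea is usually expressed with a built-in model-fitting function instead. The feature encoding (sex as 0/1, passenger class as a number), the learning rate, the epoch count and all function names are illustrative assumptions, and the eight rows are invented, not taken from titanic.csv.

```python
import math

# Toy rows, NOT from titanic.csv: (is_female, passenger class) -> survived.
# Encoding sex as 0/1 and treating class as a number are assumptions.
FEATURES = [(1, 1), (0, 3), (1, 2), (0, 3), (0, 1), (1, 3), (0, 2), (1, 1)]
LABELS = [1, 0, 1, 0, 0, 1, 0, 1]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(weights, bias, x):
    """Estimated probability of survival for one feature tuple."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def fit_logistic(features, labels, learning_rate=0.5, epochs=5000):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = predict_proba(weights, bias, x) - y
            grad_b += error
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
        bias -= learning_rate * grad_b / n
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
    return weights, bias


weights, bias = fit_logistic(FEATURES, LABELS)
predicted = [1 if predict_proba(weights, bias, x) >= 0.5 else 0 for x in FEATURES]
accuracy = sum(p == y for p, y in zip(predicted, LABELS)) / len(LABELS)

print("weights (is_female, class):", weights, "bias:", bias)
print("training accuracy:", accuracy)
```

The toy labels are perfectly separated by sex, so the fitted weights keep growing with more epochs; real data will not behave this way. Accuracy measured on the training rows also says little about how the model predicts new passengers, so a report would normally evaluate on rows held out from fitting.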