Reference no: EM132372226 
                                                                               
                                       
Big Data and Analytics Assignment
Analytic Report
Purpose: This task gives students practical experience in working in teams to write an analytical report that provides useful insights, patterns and trends in the chosen or given dataset. It also gives students the opportunity to show innovation and creativity in applying SAS Analytics and in designing useful visualizations and predictive solutions for various analytics problems.
Project Details:
This is a group assignment, and you will complete the task with your team. Each team will have at most 3 members, all enrolled in the same laboratory; teams will be allocated by your tutor. Each team member is expected to contribute equally to the project.
Your team will use SAS Visual Analytics to explore, analyze and visualize the dataset provided. You will receive feedback on your draft covering presentation choices, content, analysis and style.
The aim is to use the allocated dataset to provide interesting insights, trends and patterns in the data. Your intended audience is the CEO and middle management of the U.S. Department of Health and Human Services, who are responsible for overseeing the health industry in America.
In addition, each team member will write a short individual reflection, included in the report, on their experience of working on the project.
Tasks:
-  Task 1 - Background information - Write a description of the dataset and the project, and explain its importance for the organization. Discuss the main benefits of using visual analytics to explore big data. Include a justification for the visualizations you will use and describe how they have been successful in similar projects. This discussion should be suitable for a general audience. Information must come from at least 6 appropriate sources (2 per student) and be appropriately referenced. [2 to 3 pages].
-  Task 2 - Reporting / Dashboards - Perform the relevant data analysis tasks by answering the guided questions provided (see the Appendix for the questions and dataset), and identify the visualizations you need to develop. Note: remove any missing data points from your visualizations where possible and suitable.
-  Task 3 - Additional Visualizations - In addition to the guided questions, each student is expected to provide at least two other visualizations of the data (i.e. for a group of 3 students, 6 extra visualizations). These additional visualizations will be judged on the quality of the findings and the complexity of the analysis. They should use multi-dimensional, filtering and advanced calculation techniques.
-  Task 4 - Justification - Justify why the visualizations in Tasks 2 and 3 were chosen. Note: to discuss this task properly, you must include visual samples of the reports you produce (i.e. screenshots of the BI report/dashboard must be presented and explained in the written report; use the 'Snipping tool'). Also include any assumptions you made about the analysis in Task 2 (i.e. the report to the operational team of the company). [1 to 2 pages].
-  Task 5 - Discussion of findings - Using the visualizations created, discuss the findings from the dataset. Explain what each visualization shows, then summarize the main findings. [3 to 4 pages].
-  Task 6 - Executive Summary - A summary of the data analysis, including a brief introduction, the methods used and a list of the key findings. [1 page only].
-  Task 7 - The Reflection (Individual Task) - Each team member is expected to write a brief reflection on the project in terms of challenges, learning and contribution. [1 to 2 pages].
The report will be approximately 8 to 12 pages long (not counting the cover page and references) and will include the following sections in the order given below:
-  A cover page including the names and student IDs of all team members
-  Table of Contents
-  Table of Figures / Tables
-  Executive Summary
-  Background
-  The body of the report including reports, insights, justifications and visuals
-  Discussion of findings
-  Conclusion
-  References
-  Appendices
Appendix: Data Set and Guided Questions
-  Teradata - SAS Visual Analytics Data Source - READMIT-HISTORICAL
Guided Questions
1. GROUP TASK: Create a data dictionary for the data source.
2. What is the average number of ICU days by diagnosis group and gender?
3. For each region, what are the most and least common diagnosis groups?
4. For each diagnosis group, what are the most and least common diseases?
5. What are the top 5 departments with respect to number of patients?
6. What are the top 3 regions with respect to number of female patients?
7. What are the top 5 places to which patients are discharged?
8. What are the top 3 regions with respect to patients of "black" race?
9. What are the top 5 hospitals with respect to the number of visits by Asthma patients?
10. What are the active and inactive months in terms of admissions, for both male and female patients?
11. What are the top 3 regions with respect to average days spent in hospital? Hint: you need to create a measure that calculates the number of days spent in hospital.
12. What are the top 10 cities with respect to number of patients?
13. What is the trend in the number of patient admissions from October 2011 to June 2012 by region, for both male and female patients?
14. Display only the most and least popular month in question 9 at a time.
15. What is the trend in the number of patients diagnosed with "CHF" only, from January 2012 to June 2012?
16. What is the trend for each diagnosis group over the months?
17. What are the top 5 departments in terms of number of operations, and how do these operations vary across months?
18. What are the most appropriate predictors of heart disease? Hint: use a decision tree.
19. Create a geomap of the hospitals and their patient numbers.
20. Perform a cluster analysis on patient-related data.
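Question 11 asks for a calculated measure of days spent in hospital. The sketch below illustrates the logic outside SAS, in Python, under stated assumptions: the field names (region, admit_date, discharge_date) and the records are hypothetical toy values, not the actual READMIT-HISTORICAL columns or data. Length of stay is taken as the discharge date minus the admission date, rows with a missing date are skipped (in line with the Task 2 note on missing data), and regions are ranked by average stay.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical toy records; field names are assumptions for illustration only.
records = [
    {"region": "North", "admit_date": date(2012, 1, 3), "discharge_date": date(2012, 1, 8)},
    {"region": "North", "admit_date": date(2012, 1, 10), "discharge_date": date(2012, 1, 13)},
    {"region": "South", "admit_date": date(2012, 2, 1), "discharge_date": date(2012, 2, 2)},
    {"region": "East", "admit_date": date(2012, 3, 5), "discharge_date": date(2012, 3, 12)},
    {"region": "West", "admit_date": date(2012, 4, 1), "discharge_date": date(2012, 4, 3)},
    {"region": "West", "admit_date": date(2012, 4, 5), "discharge_date": None},  # missing date
]


def days_in_hospital(rec):
    """Length of stay in whole days: discharge date minus admission date."""
    return (rec["discharge_date"] - rec["admit_date"]).days


def top_regions_by_avg_stay(rows, n=3):
    """Return the n regions with the highest average length of stay, highest first."""
    stays = defaultdict(list)
    for rec in rows:
        # Skip rows where either date is missing.
        if rec.get("admit_date") is None or rec.get("discharge_date") is None:
            continue
        stays[rec["region"]].append(days_in_hospital(rec))
    averages = {region: mean(values) for region, values in stays.items()}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)[:n]


print(top_regions_by_avg_stay(records))
```

With these toy records the ranking is East, North, West. Whether a same-day discharge should count as 0 or 1 day is a modelling choice; it is worth stating as an assumption in Task 4.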