Explain the concept of clustering and k-means

Reference no: EM131995703, Length: 8 pages

Background

A research team plans to study the health development of the world over the past 15 years. The team retrieved a dataset of Health and Population Statistics from the World Bank, covering the years 2001 to 2015.

The dataset contains the following attributes:
- Birth rate, crude (per 1,000 people)
- Fertility rate, total (births per woman)
- Adolescent fertility rate (births per 1,000 women ages 15-19)
- Death rate, crude (per 1,000 people)
- Cause of death, by communicable diseases and maternal, prenatal and nutrition conditions (% of total)
- Cause of death, by injury (% of total)
- Cause of death, by non-communicable diseases (% of total)
- Mortality caused by road traffic injury (per 100,000 people)
- Health expenditure per capita (current US$)
- GNI per capita, Atlas method (current US$)
- Health expenditure, private (% of GDP)
- Health expenditure, public (% of GDP)
- Health expenditure, total (% of GDP)
- Maternal mortality ratio (national estimate, per 100,000 live births)
- Immunization, BCG (% of one-year-old children)
- Life expectancy at birth, male (years)
- Life expectancy at birth, female (years)
- Life expectancy at birth, total (years)
- School enrollment, primary (% gross)
- School enrollment, secondary (% gross)
- School enrollment, tertiary (% gross)
- School enrollment, tertiary, female (% gross)
- Total alcohol consumption per capita (liters of pure alcohol, projected estimates, 15+ years of age)
- Unemployment, female (% of female labor force) (modeled ILO estimate)
- Unemployment, male (% of male labor force) (modeled ILO estimate)
- Unemployment, total (% of total labor force) (modeled ILO estimate)
More details about the data attributes and data content can be found in the attached documents.

Assignment Task

You are a member of the team, and need to perform data analysis on countries in the region of East Asia & Pacific.

The team has not set any specific goal for the analysis. You are therefore free to explore the data and report anything you find interesting or significant.

You have been requested to prepare a data analysis report about your work and explain your findings. The potential audiences include other researchers, business representatives, and government agencies. They may have limited ICT or mathematical knowledge.

To prepare the report, please use the following outline:

1. Introduction

Provide an introduction to the problem. Include background material as appropriate: who cares about this problem, what impact it has, and where the data comes from.

2. Data Setup
Describe how to load the data and which libraries are needed. Provide an overview of the data, including its dimensions and structure.
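As an illustration of the Data Setup step, the following is a minimal sketch that reads a CSV file and reports its dimensions. It uses only the Python standard library. The column names, the "long" layout (one row per country and year) and the sample values are assumptions made up for this example; the attached data files may be organised differently, and a data-frame library would be an equally reasonable choice.

```python
import csv
import io

# Made-up sample in an assumed layout; NOT real World Bank figures.
SAMPLE_CSV = """country,year,life_expectancy_total,birth_rate_crude
Australia,2001,79.0,13.0
Australia,2002,79.5,12.5
Japan,2001,81.0,9.5
Japan,2002,,9.0
"""

def load_rows(handle):
    """Read CSV rows into a list of dicts, converting numeric fields."""
    rows = []
    for record in csv.DictReader(handle):
        row = {}
        for key, value in record.items():
            if key == "country":
                row[key] = value
            elif key == "year":
                row[key] = int(value)
            else:
                # Treat empty cells as missing values (None).
                row[key] = float(value) if value and value.strip() else None
        rows.append(row)
    return rows

# In a real report, replace io.StringIO(...) with open("your_file.csv", newline="").
rows = load_rows(io.StringIO(SAMPLE_CSV))
n_rows, n_cols = len(rows), len(rows[0])
countries = sorted({r["country"] for r in rows})
print(f"{n_rows} rows x {n_cols} columns; countries: {countries}")
```

Reporting the number of rows, the number of columns and the list of countries gives readers a quick sense of the data before any analysis starts.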

3. Exploratory Data Analysis
Perform 3 one-variable analyses. Plot at least one graph for each variable. Explain why the selected graph is appropriate.

Perform 2 two-variable analyses. Plot at least one graph for each analysis. Explain why the selected graph is appropriate.

The analysis can be performed on all years and all countries, or on a subset of your interest.
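The numbers behind a one-variable or two-variable analysis can be sketched as below: summary statistics for a single attribute, and the Pearson correlation between two attributes. The values are made up for illustration and the helper names `summarise` and `pearson` are hypothetical. The graphs themselves (for example a histogram for one variable, a scatter plot for two) would be drawn with a plotting library, which this sketch leaves out.

```python
import math
import statistics

# Made-up values, one per (hypothetical) country; NOT real data.
life_expectancy = [70.0, 72.0, 75.0, 80.0, 83.0]
health_spend = [100.0, 200.0, 400.0, 3000.0, 4000.0]

def summarise(values):
    """One-variable summary: location and spread of a single attribute."""
    return {
        "min": min(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "max": max(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }

def pearson(xs, ys):
    """Two-variable summary: Pearson correlation coefficient in [-1, 1]."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

summary = summarise(life_expectancy)
r = pearson(life_expectancy, health_spend)
print(summary)
print(f"correlation: {r:.2f}")
```

A correlation close to 1 or -1 suggests a strong linear relationship between the two attributes, while a value near 0 suggests little linear relationship.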

4. Advanced Analysis
Clustering
Briefly explain the concept of clustering and k-means.
Try to do a clustering analysis to group countries according to some selected attributes.
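As a rough illustration: clustering groups observations so that members of the same group are more similar to each other than to members of other groups. k-means does this for a chosen number of groups k by repeating two steps: assign every point to its nearest centroid, then move each centroid to the mean of the points assigned to it. The sketch below implements that loop with the standard library only. The six "countries", their two attributes (life expectancy and crude birth rate) and the function names are assumptions made up for this example; in practice a library implementation would normally be used.

```python
import math
import random

def standardise(points):
    """Rescale each column to mean 0 and standard deviation 1,
    so that no attribute dominates the distance calculation."""
    columns = list(zip(*points))
    means = [sum(c) / len(c) for c in columns]
    stdevs = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
              for c, m in zip(columns, means)]
    return [[(v - m) / s for v, m, s in zip(p, means, stdevs)] for p in points]

def kmeans(points, k, max_iter=100, seed=0):
    """Basic k-means: returns (labels, centroids)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the loop has converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Made-up (life expectancy, birth rate) pairs; NOT real data.
names = ["A", "B", "C", "D", "E", "F"]
points = [[82.0, 9.0], [81.0, 10.0], [83.0, 8.5],
          [66.0, 28.0], [68.0, 26.0], [65.0, 30.0]]

scaled = standardise(points)
labels, centroids = kmeans(scaled, k=2)
for name, label in zip(names, labels):
    print(name, "-> cluster", label)
```

Standardising first matters because the attributes in this dataset use very different units (years, percentages, US$), and the result can depend on the random starting centroids, so it is worth trying more than one seed and more than one value of k.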

Linear Regression
Briefly explain the concept of linear regression.
Try to do 2 linear regression analyses. Plot the learned models.
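Simple linear regression fits a straight line y = intercept + slope * x to the data by choosing the slope and intercept that minimise the sum of squared differences between the observed and predicted values. The sketch below computes that least-squares line in closed form and reports R squared as a measure of fit. The yearly values and the function names are made up for illustration. Plotting the learned model would mean drawing this line over a scatter plot of the data with a plotting library, which is not shown here.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """Share of the variation in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Made-up life expectancy values for one hypothetical country; NOT real data.
years = [2001, 2002, 2003, 2004, 2005]
life_expectancy = [70.0, 70.6, 70.9, 71.6, 72.0]

slope, intercept = fit_line(years, life_expectancy)
r2 = r_squared(years, life_expectancy, slope, intercept)
print(f"slope per year: {slope:.3f}, R squared: {r2:.3f}")
```

Here the slope can be read as the average change in life expectancy per year, which is the kind of plain-language interpretation a non-technical audience needs.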

The analysis can be performed on all years and all countries, or on a subset of your interest.

5. Conclusion

6. Reflections
In this part, discuss any difficulties you had performing the analysis and how you solved those difficulties. Reflect on how the analysis process went for you, what you learnt, and what you might do differently next time.

Report Format

Your report should be at least 1,200 words and preferably no more than 2,000 words long. All comments and graph titles count towards the word total.

The report MUST be formatted using the following guidelines:
- Paragraph text - 12 point Calibri single line spacing
- Headings - Arial in an appropriate type size
- Margins - 2.5 cm on all sides
- Header - Report title
- Footer - page number (including the word "Page")
- Page numbering - roman numerals (i, ii, iii, iv) up to and including the Table of Contents, restart numbering using conventional numerals (1, 2, 3, 4) from the first page after the Table of Contents.
- Title Page - Must not contain headers or footers. Include your name as the report's author but DO NOT include any reference to your student ID, course code or course name.

Attachment: Health and Population Statistics Data.rar


Course: ICT110 - Introduction to Data Science, University of the Sunshine Coast

Reviews

This assignment will take a number of weeks to complete and will require a good understanding of data science and management for successful completion. It is imperative that students take heed of the following points in relation to doing this assignment:
1. Ensure that you clearly understand the requirements for the assignment: what has to be done and what the deliverables are.
2. If you do not understand any of the assignment requirements, please ASK the course coordinator or your tutor.
3. Each time you work on any aspect of the assignment, reread the assignment requirements to ensure that what is required is clearly understood.

Requests for an extension to an assignment MUST be made to the course coordinator prior to the date of submission. Requests made on the day of submission or after the submission date will only be considered in exceptional circumstances.

Marking scale:
- Outstanding (100%): An outstanding attempt; a well-formatted and professionally presented piece of work.
- High Distinction (90%): An excellent piece of work that meets all the specified criteria with very minor omissions or mistakes.
- Distinction (75%): More than competently meets the criteria specified with only minor mistakes or omissions.
- Credit (65%): Competently meets the criteria as specified with few minor mistakes or omissions.
- Pass (50%): Satisfactorily meets the criteria.
- Fail (25%): Did not sufficiently meet the criteria to pass.
- Not Submitted (0): No attempt made or different from what is acceptable.

Please read all the attached documents carefully and write the report according to the details given. Submit your assignment to Blackboard (Task 2), following the submission instructions on Blackboard. The assignment will be marked out of a total of 100 marks and forms 30% of the total assessment for the course. ALL assignments will be checked for plagiarism automatically by the SafeAssign system provided by Blackboard. Refer to your Course Outline or the Course Web Site for a copy of the "Student Misconduct, Plagiarism and Collusion" guidelines. Assignment submission extensions will only be granted under the official Faculty of Arts, Business and Law guidelines.


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd