List of the aliments and their cluster membership

Reference no: EM131058156

Question 1

Get the data set "food.txt" from GauchoSpace and read it into R. Alternatively, you can load the same data set from the R package cluster.datasets with the following code:

library(cluster.datasets)
data(nutrients.meat.fish.fowl.1959)
The data set contains the quantities of energy, protein, fat, calcium and iron in 27 different aliments (foods).
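A minimal sketch of turning the data into a numeric matrix for clustering. It assumes the data frame has one non-numeric column of food names and five numeric nutrient columns (check with `str()` first). Standardizing with `scale()` is a modelling choice assumed here, not something the assignment prescribes: energy and calcium are on very different scales and would otherwise dominate Euclidean distances. `prepare_food` is a hypothetical helper name.

```r
# Hypothetical helper: convert the raw data frame into a numeric matrix
# whose row names are the food names.
prepare_food <- function(df, standardize = TRUE) {
  is_num <- vapply(df, is.numeric, logical(1))
  X <- as.matrix(df[, is_num, drop = FALSE])
  # Use the first non-numeric column (assumed to hold the names) as row labels
  if (any(!is_num)) {
    rownames(X) <- as.character(df[[which(!is_num)[1]]])
  }
  # Optionally put every nutrient on a common scale (mean 0, sd 1)
  if (standardize) X <- scale(X)
  X
}

# Usage with the package data set (runs only if cluster.datasets is installed)
if (requireNamespace("cluster.datasets", quietly = TRUE)) {
  data("nutrients.meat.fish.fowl.1959", package = "cluster.datasets")
  X <- prepare_food(nutrients.meat.fish.fowl.1959)
  print(dim(X))
}
```

If you work from food.txt instead, something like `read.table("food.txt", header = TRUE)` may be enough, but the header and separator are assumptions; inspect the file before relying on them.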

The task is to find meaningful clusters in the data. To this end, do the following:
1. Find clusters using the K-means algorithm. Try different values of K and determine your best solution. The number of clusters you choose should be based on both an appropriate measure of fit (for example, the SSE as defined in the book IDM) and the interpretability of the results. For each value of K that you try, provide:

a. the centroids
b. the size of each cluster and a list of the aliments and their cluster membership
c. the ratio between-SS/total-SS
d. an interpretation (use your imagination) of each cluster formed; e.g., what characteristics summarize the aliments in group 1?
e. to answer part d, you may find a parallel coordinate plot of the centroids useful
2. Apply hierarchical clustering using the min, max and average distances (respectively the single, complete and average methods in R).
a. For each method, produce a dendrogram labeled with the names of the aliments.
b. What are the differences, if any, between the results obtained with the three distance measures?
c. Can you identify clusters similar to those obtained by K-means clustering?
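A minimal base-R sketch of items 1 and 2, assuming `X` is a numeric matrix of the (standardized) nutrients with the food names as row names. The range `ks = 2:6`, `nstart = 25`, the seed, K = 3 in the usage part, and the helper names `kmeans_summary`, `plot_centroids` and `hclust_all` are illustrative choices, not part of the assignment. `kmeans()` reports the SSE as `tot.withinss`, so plotting it against K gives the usual elbow curve.

```r
# Hypothetical helpers for items 1 and 2. X: numeric matrix, one row per food,
# with food names as row names (assumed already standardized if desired).

# Item 1: run K-means for each K and keep the quantities asked for in a-c
kmeans_summary <- function(X, ks = 2:6, nstart = 25, seed = 1) {
  labs <- if (is.null(rownames(X))) as.character(seq_len(nrow(X))) else rownames(X)
  lapply(setNames(ks, paste0("K=", ks)), function(k) {
    set.seed(seed)  # K-means starts from random centroids
    fit <- kmeans(X, centers = k, nstart = nstart)
    list(
      centers    = fit$centers,               # a. centroids
      size       = fit$size,                  # b. cluster sizes
      cluster    = fit$cluster,               # b. cluster label of each food
      membership = split(labs, fit$cluster),  # b. foods listed by cluster
      sse        = fit$tot.withinss,          # total within-cluster SSE
      ratio      = fit$betweenss / fit$totss  # c. between-SS / total-SS
    )
  })
}

# Item 1e: parallel coordinate plot, one line per centroid
plot_centroids <- function(centers) {
  k <- nrow(centers)
  matplot(t(centers), type = "b", lty = 1, pch = 19, col = seq_len(k),
          xaxt = "n", xlab = "", ylab = "centroid value")
  axis(1, at = seq_len(ncol(centers)), labels = colnames(centers))
  legend("topright", legend = paste("cluster", seq_len(k)),
         col = seq_len(k), lty = 1, pch = 19, bty = "n")
}

# Item 2: single (min), complete (max) and average linkage on Euclidean distances
hclust_all <- function(X, methods = c("single", "complete", "average")) {
  d <- dist(X)
  lapply(setNames(methods, methods), function(m) hclust(d, method = m))
}

# Usage with the package data (runs only if cluster.datasets is installed)
if (requireNamespace("cluster.datasets", quietly = TRUE)) {
  data("nutrients.meat.fish.fowl.1959", package = "cluster.datasets")
  food   <- nutrients.meat.fish.fowl.1959
  is_num <- vapply(food, is.numeric, logical(1))
  X <- scale(food[, is_num])
  rownames(X) <- as.character(food[[which(!is_num)[1]]])

  res <- kmeans_summary(X)
  print(sapply(res, function(r) c(SSE = r$sse, ratio = r$ratio)))
  plot_centroids(res[["K=3"]]$centers)  # K = 3 is only a placeholder

  hcs <- hclust_all(X)
  op <- par(mfrow = c(1, 3))
  for (m in names(hcs)) plot(hcs[[m]], main = m, xlab = "", sub = "", cex = 0.6)
  par(op)

  # Item 2c: cross-tabulate complete-linkage groups against K-means labels
  print(table(hclust = cutree(hcs$complete, k = 3),
              kmeans = res[["K=3"]]$cluster))
}
```

In the final cross-tabulation, rows whose counts fall mostly in one column point to a hierarchical cluster that matches a K-means cluster.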

Additional exercises for PStat 231
Question 2
Perform a PCA of the food.txt data and use a biplot to visualize the first two principal components (PCs) and the variables. Based on the biplot, one could still identify groups (clusters) of aliments with similar characteristics.

a. Is the grouping obtained by PCA similar to or different from that obtained by the clustering algorithms above? Explain in some detail.
b. Which technique do you find most useful in describing the data set? Why?
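A minimal sketch of the PCA step with base R's `prcomp()` and `biplot()`. Centering and scaling the variables (`scale. = TRUE`) is an assumption made because the nutrients are in different units; `pca_food` is a hypothetical helper name, and colouring the scores by a K-means labelling with K = 3 is just one illustrative way to approach part (a).

```r
# Hypothetical helper: PCA on the nutrient columns. Centering and scaling
# (scale. = TRUE) is an assumption, since the nutrients have different units.
pca_food <- function(X) {
  prcomp(X, center = TRUE, scale. = TRUE)
}

# Usage with the package data (runs only if cluster.datasets is installed)
if (requireNamespace("cluster.datasets", quietly = TRUE)) {
  data("nutrients.meat.fish.fowl.1959", package = "cluster.datasets")
  food   <- nutrients.meat.fish.fowl.1959
  is_num <- vapply(food, is.numeric, logical(1))
  X <- as.matrix(food[, is_num])
  rownames(X) <- as.character(food[[which(!is_num)[1]]])

  pc <- pca_food(X)
  print(summary(pc))  # proportion of variance explained by each PC
  # Foods appear as labelled points, nutrients as arrows (loadings)
  biplot(pc, choices = 1:2, cex = 0.6)

  # Compare with Question 1: colour the PC1/PC2 scores by a K-means
  # labelling (K = 3 is only a placeholder)
  set.seed(1)
  km <- kmeans(scale(X), centers = 3, nstart = 25)
  plot(pc$x[, 1:2], col = km$cluster, pch = 19)
  text(pc$x[, 1:2], labels = rownames(X), pos = 3, cex = 0.6)
}
```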
Question 3
Suppose that we have four observations, for which we compute a dissimilarity matrix, given by

$$
\begin{bmatrix}
     & 0.3  & 0.4  & 0.7  \\
0.3  &      & 0.5  & 0.8  \\
0.4  & 0.5  &      & 0.45 \\
0.7  & 0.8  & 0.45 &
\end{bmatrix}
$$
For instance, the dissimilarity between the first and second observations is 0.3, and the dissimilarity between the second and fourth observations is 0.8.
a. On the basis of this dissimilarity matrix, sketch the dendrogram that results from hierarchically clustering these four observations using complete linkage. Be sure to indicate on the plot the height at which each fusion occurs, as well as the observations corresponding to each leaf in the dendrogram.

b. Suppose that we cut the dendrogram obtained in (a) such that two clusters result. Which observations are in each cluster?
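A short base-R check for a hand-drawn sketch: enter the matrix with zeros on the diagonal, convert it with `as.dist()`, and let `hclust()` perform complete linkage. The fusion heights and the two-cluster cut can then be compared with your answers to (a) and (b).

```r
# The 4 x 4 dissimilarity matrix from Question 3 (diagonal set to 0)
D <- matrix(c(0,   0.3, 0.4,  0.7,
              0.3, 0,   0.5,  0.8,
              0.4, 0.5, 0,    0.45,
              0.7, 0.8, 0.45, 0),
            nrow = 4, byrow = TRUE)

# Complete linkage: the distance between two clusters is the largest
# dissimilarity between any pair of their members
hc <- hclust(as.dist(D), method = "complete")
print(hc$height)  # fusion heights: 0.30 0.45 0.80
plot(hc, main = "Complete linkage", xlab = "", sub = "")

# Cut the tree so that two clusters result
groups <- cutree(hc, k = 2)
print(groups)     # 1 1 2 2
```

By hand, the first fusion joins observations 1 and 2 at 0.3. Complete linkage then gives d({1,2},3) = max(0.4, 0.5) = 0.5 and d({1,2},4) = max(0.7, 0.8) = 0.8, so the smallest remaining dissimilarity is d(3,4) = 0.45, and the final fusion occurs at max(0.4, 0.5, 0.7, 0.8) = 0.8.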

