Explain the contents of some or all of the given clusters?

Reference no: EM131926012

Problem

University Rankings. The dataset on American College and University Rankings contains information on 1302 American colleges and universities offering an undergraduate program. For each university, there are 17 measurements, including continuous measurements (such as tuition and graduation rate) and categorical measurements (such as location by state and whether it is a private or public school).

Note that many records are missing some measurements. Our first goal is to estimate these missing values from "similar" records. This will be done by clustering the complete records and then finding the closest cluster for each of the partial records. The missing values will be imputed from the information in that cluster.
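The imputation idea in the paragraph above can be sketched on toy numbers. This is a minimal illustration, not a prescribed solution: it assumes that "closest cluster" means the smallest Euclidean distance to the cluster mean (centroid), measured only over the measurements the partial record actually has. The function name `impute_from_nearest_cluster` and the centroid values are hypothetical.

```python
import numpy as np

def impute_from_nearest_cluster(record, centroids):
    """Fill NaN entries of `record` from the closest centroid.

    Distance is Euclidean and uses only the measurements that are
    present in `record` (an assumption about how to handle the gaps).
    """
    record = np.asarray(record, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    known = ~np.isnan(record)
    # Distance from the record to each centroid over the known columns only.
    dists = np.sqrt(((centroids[:, known] - record[known]) ** 2).sum(axis=1))
    nearest = int(np.argmin(dists))
    filled = record.copy()
    # Missing entries take the nearest cluster's average on those measurements.
    filled[~known] = centroids[nearest, ~known]
    return nearest, dists, filled

# Hypothetical normalized cluster means: one row per cluster, one column per measurement.
centroids = np.array([
    [1.0, 1.0, 1.0],
    [-1.0, -1.0, -1.0],
])
partial = np.array([0.8, np.nan, 1.2])  # second measurement is missing

nearest, dists, filled = impute_from_nearest_cluster(partial, centroids)
print(nearest, filled)
```

If the clustering was run on normalized measurements, the partial record would need to be normalized with the same means and standard deviations before the distances are computed, and the imputed values converted back to the original scale afterwards.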

a. Remove all records with missing measurements from the dataset.

b. For all the continuous measurements, run hierarchical clustering using complete linkage and Euclidean distance. Make sure to normalize the measurements. From the dendrogram, how many clusters seem reasonable for describing these data?

c. Compare the summary statistics for each cluster and describe each cluster in this context (e.g., "Universities with high tuition, low acceptance rate...").

d. Use the categorical measurements that were not used in the analysis (State and Private/Public) to characterize the different clusters. Is there any relationship between the clusters and the categorical information?

e. What other external information can explain the contents of some or all of these clusters?

f. Consider Tufts University, which is missing some information. Compute the Euclidean distance of this record from each of the clusters that you found above (using only the measurements that you have). Which cluster is it closest to? Impute the missing values for Tufts by taking the average of the cluster on those measurements.
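Steps a through d can be sketched with pandas and SciPy. The data below are synthetic stand-ins, the column names (`tuition`, `grad_rate`, `private`) are hypothetical, and the choice of two clusters is an assumption that suits this toy data only; on the real dataset the number of clusters would be read off the dendrogram.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster

# Synthetic stand-in for the university data (not the real dataset).
rng = np.random.default_rng(0)
n = 30
df = pd.DataFrame({
    "tuition": np.r_[rng.normal(30000, 2000, n), rng.normal(8000, 1000, n)],
    "grad_rate": np.r_[rng.normal(85, 4, n), rng.normal(55, 6, n)],
    "private": ["yes"] * n + ["no"] * n,
})
df.loc[[3, 40], "grad_rate"] = np.nan  # simulate two partial records

# a. Keep only the complete records.
complete = df.dropna().copy()

# b. Normalize the continuous columns to z-scores, then run hierarchical
#    clustering with complete linkage on Euclidean distance.
cont_cols = ["tuition", "grad_rate"]
z = (complete[cont_cols] - complete[cont_cols].mean()) / complete[cont_cols].std()
Z = linkage(z.to_numpy(), method="complete", metric="euclidean")

# Cut the tree into at most k clusters (k = 2 is an assumption for this toy data).
complete["cluster"] = fcluster(Z, t=2, criterion="maxclust")

# c. Per-cluster summary statistics on the original scale.
summary = complete.groupby("cluster")[cont_cols].mean()

# d. Cross-tabulate cluster labels against a categorical column that was
#    left out of the clustering.
crosstab = pd.crosstab(complete["cluster"], complete["private"])

print(summary)
print(crosstab)
```

The linkage matrix `Z` can be passed to `scipy.cluster.hierarchy.dendrogram` to draw the tree for step b. The summary is computed on the original scale rather than on z-scores so that the cluster descriptions in step c read in natural units.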






All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd