What does an agglomeration schedule tell us in general?

Reference no: EM131439769

Question 1: Cluster Analysis

The SPSS file "metropolitan areas.sav" contains a data set taken from "Cities - Life in the World's 100 Largest Metropolitan Areas" (Population Crisis Committee, Washington, 1990). The data set includes the following variables:

Population = population in millions

Murders = number of murders per year per 100,000 people

Food = percentage of income spent on food

Pproom = average number of persons living in one room

Water = % of homes with access to water and electricity

Telephone = number of telephones per 100 people

School = % of children completing education to age 18 years

Infant death = infant deaths per 100 live births

Noise = ambient noise level on scale 1 (quietest) to 10 (noisiest)

Traffic = traffic flow: average mph of traffic in rush hour

area = area code: 1 = USA, Canada, Europe, Japan, Australia

In order to reduce the complexity of the data, I have conducted a cluster analysis.

a) What does an agglomeration schedule tell us in general? Provide a brief hypothetical example (using the Metropolitan Areas case), outlining the circumstances in which we might be interested in interpreting the agglomeration schedule.

b) When performing the hierarchical cluster analysis, I decided to select a 4-cluster solution. Would you have chosen the same number of clusters? What are the criteria for making this decision?

c) Please briefly summarize the key findings from the K-Means cluster solution. Do you believe it is a good solution? How would you label the clusters? What could be done to try to improve the cluster solution?

d) As you can see in the dialog box for the K-Means cluster analysis, I did not specify any initial cluster means before performing the analysis. Why does it normally make sense to predetermine these values? What kinds of cluster means would make sense here as an input to the K-means cluster model?

e) Imagine that we obtain data from additional cities that are not currently included in our data set. How can I assign these new observations to one of the clusters identified in our previous analysis?
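
To make the ideas behind parts (a), (b) and (e) concrete, here is a minimal Python sketch of an agglomeration schedule. It is only an illustration under stated assumptions: the six cities and their standardized scores are invented, average linkage with Euclidean distance is one of several possible choices, and the "largest jump" rule is a common heuristic rather than the method used in the SPSS analysis.

```python
from itertools import combinations
from math import dist

# Hypothetical standardized scores (Telephone, Infant death) for six made-up
# cities; illustrative only, not values from "metropolitan areas.sav".
cities = {
    "A": (1.2, -1.0), "B": (1.0, -0.9), "C": (1.5, -1.4),
    "D": (-0.9, 1.0), "E": (-1.2, 1.3), "F": (-1.0, 0.6),
}

def average_linkage(c1, c2):
    """Mean Euclidean distance over all cross-cluster pairs of cities."""
    total = sum(dist(cities[a], cities[b]) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))

def agglomeration_schedule():
    """At each stage, merge the two closest clusters and record the step."""
    clusters = [(name,) for name in cities]
    schedule = []  # rows of (stage, cluster 1, cluster 2, coefficient)
    stage = 1
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: average_linkage(*pair))
        schedule.append((stage, c1, c2, average_linkage(c1, c2)))
        merged = tuple(sorted(c1 + c2))
        clusters = [c for c in clusters if c not in (c1, c2)] + [merged]
        stage += 1
    return schedule

schedule = agglomeration_schedule()
for stage, c1, c2, coef in schedule:
    print(f"stage {stage}: merge {c1} + {c2} at coefficient {coef:.3f}")

# Heuristic for the number of clusters: stop just before the merge that
# produces the largest increase in the coefficient.
jumps = [schedule[i + 1][3] - schedule[i][3] for i in range(len(schedule) - 1)]
biggest = max(range(len(jumps)), key=jumps.__getitem__)
suggested_k = len(cities) - (biggest + 1)
print("suggested number of clusters:", suggested_k)

def centroid(cluster):
    """Mean of each variable over the cities in a cluster."""
    points = [cities[name] for name in cluster]
    return tuple(sum(values) / len(points) for values in zip(*points))

# Assigning a new observation: pick the cluster with the nearest centroid.
# The last schedule row holds the final two clusters, which matches the
# two-cluster solution suggested for this toy data.
final_clusters = schedule[-1][1:3]
new_city = (0.9, -0.7)  # hypothetical new city, standardized the same way
assigned = min(final_clusters, key=lambda c: dist(new_city, centroid(c)))
print("new city joins cluster:", assigned)
```

With this toy data the coefficient rises slowly over the first four stages and then jumps sharply at the final merge, so the heuristic suggests two clusters. A new city must be standardized with the same means and standard deviations as the original data before it is compared with the cluster centroids.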

Question 2: Logistic Regression

A study was done to examine the characteristics of MBA graduates from four top US business schools. From the study, a subset of 100 students was selected. The data sample includes information on each student's profile with respect to

1. Grade Point Average (GPA)

2. GMAT Score

3. College Major

a. Humanities/Social Science (binary: 1=yes, 0=no)

b. Maths/Engineering (binary: 1=yes, 0=no)

c. Business (binary: 1=yes, 0=no)

4. Gender (1=Male, 2=Female)

5. Work Experience (1=1 year, 2=2 years, ..., 6=more than 6 years)

One of the business schools (variable name: School_B), which is located on the East Coast, has analyzed the data in order to better understand the profile of its MBA students in comparison to students at other top schools. In particular, a logistic regression analysis was performed using a binary variable (attendance=1; non-attendance=0) to predict the probability that a student in the survey attended School_B (instead of one of the other three schools).

The following screenshots display the steps taken when performing the logistic regression analysis in SPSS. The SPSS output report can be found in a separate file called Appendix 2.

a) Based on the SPSS output provided in Appendix 2, is this a good model for predicting whether MBA students in the sample attended School_B? Please justify your answer from a statistical point of view by assessing model fit and overall model significance.

b) According to the output report, the significance level of the Hosmer-Lemeshow test is p=0.713. What does this mean? Is this good or bad news?

c) What types of students does School B attract? What are the most important predictors for attendance of School B?

d) In the output report you can see that GPA is a significant predictor of attendance at School B. Moreover, the exponentiated unstandardized slope coefficient for GPA is Exp(B)=22.794. What does this mean?

e) According to the classification plot at the end of the SPSS output report, does the model seem to be better at predicting "attendance" or "non-attendance" at School B? Would you say that 0.5 is a reasonable cut-off value as a classification threshold?
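
As a rough illustration of the quantities behind parts (d) and (e), the Python sketch below works through an odds ratio and a classification table. Only the value Exp(B)=22.794 comes from the question; the predicted probabilities, the outcomes and the rule "predict attendance when the probability is at least the cut-off" are assumptions made for the example, not figures or settings from Appendix 2.

```python
import math

EXP_B_GPA = 22.794  # Exp(B) for GPA as quoted in part (d)

# The unstandardized coefficient B is the natural log of Exp(B).
b_gpa = math.log(EXP_B_GPA)

def odds_multiplier(delta_gpa):
    """Factor by which the odds of attending change when GPA rises by
    delta_gpa points, holding the other predictors constant."""
    return math.exp(b_gpa * delta_gpa)

print(f"B for GPA: {b_gpa:.4f}")
print(f"odds multiplier for +1.0 GPA: {odds_multiplier(1.0):.3f}")
print(f"odds multiplier for +0.1 GPA: {odds_multiplier(0.1):.3f}")

def classification_table(probs, actual, cutoff=0.5):
    """Cross-tabulate predicted against observed attendance at a cut-off."""
    table = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for p, y in zip(probs, actual):
        predicted = 1 if p >= cutoff else 0
        if predicted == 1 and y == 1:
            table["TP"] += 1
        elif predicted == 1 and y == 0:
            table["FP"] += 1
        elif predicted == 0 and y == 0:
            table["TN"] += 1
        else:
            table["FN"] += 1
    return table

# Hypothetical predicted probabilities and observed outcomes for ten students.
probs = [0.10, 0.20, 0.35, 0.45, 0.55, 0.30, 0.40, 0.60, 0.80, 0.90]
actual = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

for cutoff in (0.5, 0.35):
    print(cutoff, classification_table(probs, actual, cutoff))
```

Lowering the cut-off in this made-up sample catches more actual attendees but misclassifies more non-attendees, which is the trade-off to weigh when judging whether 0.5 is a reasonable threshold.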

Assignment Files -

https://www.dropbox.com/s/szbkh90yj0f8kk6/Assignment%20Files.zip?dl=0



Reviews

len1439769

3/25/2017 2:06:31 AM

I need help with the attached homework. Most importantly, you need to follow the instructions exactly. Please read the instructions below carefully: The exam paper consists of 2 sections, each of which needs to be answered. For your answers, please use the space provided. All questions are equally weighted and must be answered. Please make explicit any assumptions underlying your answers, interpret your results and justify your answers, conclusions and recommendations.

